CN111242883A - Dynamic scene HDR reconstruction method based on deep learning - Google Patents

Dynamic scene HDR reconstruction method based on deep learning

Info

Publication number
CN111242883A
CN111242883A (application CN202010026179.3A)
Authority
CN
China
Prior art keywords
images
image
network
hdr
exposure
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202010026179.3A
Other languages
Chinese (zh)
Other versions
CN111242883B (en)
Inventor
何刚
卢星星
宋嘉轩
李云松
谢卫莹
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Xidian University
Original Assignee
Xidian University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Xidian University
Priority to CN202010026179.3A
Publication of CN111242883A
Application granted
Publication of CN111242883B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T5/00 Image enhancement or restoration
    • G06T5/50 Image enhancement or restoration using two or more images, e.g. averaging or subtraction
    • G06T7/00 Image analysis
    • G06T7/30 Determination of transform parameters for the alignment of images, i.e. image registration
    • G06T7/33 Image registration using feature-based methods
    • G06T7/337 Image registration using feature-based methods involving reference images or patches
    • G06T7/97 Determining parameters from multiple pictures
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/20 Special algorithmic details
    • G06T2207/20081 Training; Learning
    • G06T2207/20084 Artificial neural networks [ANN]
    • G06T2207/20212 Image combination
    • G06T2207/20221 Image fusion; Image merging

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Image Processing (AREA)

Abstract

The invention discloses a dynamic scene HDR reconstruction method based on deep learning, which addresses the problem that the image processing effect of the prior art needs improvement. The method comprises the following steps: acquire three images of the same static scene (underexposed, normally exposed and overexposed) with a fixed camera; in a dynamic scene, acquire the same three exposures with a handheld camera and record them as D1, D2 and D3, replacing D2 with the static normal-exposure image S2; register D1, S2 and D3 with the LK optical flow method, and combine the registered sequence R1, R2, R3 with the Ground Truth obtained in step 1 to form a paired training set; transform R1, R2 and R3 into the linear domain with the camera response curve, denoted H1, H2 and H3; extract the brightness information of H1, H2 and H3 with a contrast operator, denoted M1, M2 and M3; extract the detail information of R1, R2 and R3 with a gradient operator, denoted L1, L2 and L3; and design an Attention module based on ResNet. The HDR images generated by this technique have rich detail, high contrast, a wide color gamut and a high dynamic range.

Description

Dynamic scene HDR reconstruction method based on deep learning
Technical Field
The invention relates to the field of digital video and computational photography image processing, and in particular to a dynamic scene HDR reconstruction method based on deep learning.
Background
Dynamic range refers to the ratio of the maximum to the minimum luminance in a scene. In a real scene, the dynamic range from the brightest sunlight to the darkest starlight can reach 10^8, and the luminance range distinguishable by the human eye is as high as 10^5. However, the dynamic range captured by a common sensor does not exceed 10^3, and the dynamic range of a display is only about 10^2. Because the dynamic range of a real scene does not match that of common digital devices, images captured by imaging equipment commonly suffer from overexposure, underexposure and loss of detail. In practical applications, HDR images are difficult to obtain, which reduces the usefulness of images in applications such as digital television, computational photography and game rendering. Research on HDR image reconstruction algorithms therefore has strong practical significance.
Currently, HDR image reconstruction algorithms fall into two main directions: traditional image fusion and deep learning. Traditional fusion methods for acquiring HDR images mainly include direct fusion, block-based fusion and hierarchical fusion. The direct fusion approach mainly follows the paper Recovering High Dynamic Range Radiance Maps from Photographs: it assumes that the brightness values and exposure times of images with different exposures are related to the illuminance at corresponding pixel positions, builds a camera response model from the brightness values and exposure times, solves for the camera response function, and recovers the illuminance of the real scene by inverse mapping. After the real-scene illuminance is obtained, the multiple images are fused into a high dynamic range image, which is finally displayed on a common screen through tone mapping. This method is computationally complex, and the resulting high dynamic range image cannot be displayed directly on a common screen. The region-based fusion method divides the image into blocks and uses information-entropy theory to select the blocks containing the most information for fusion. However, this method handles block boundaries poorly, and the fused image is prone to obvious blocking artifacts. The hierarchical fusion approach is mainly represented by Exposure Fusion: that paper proposes a Laplacian-pyramid-based method that decomposes several multi-exposure images across scales, combines three evaluation indices (contrast, saturation and exposure) to obtain a weight map for each image, averages the weight maps to obtain composite pyramid coefficients, and finally reconstructs the Laplacian pyramid to obtain the fused image. This is currently the most effective fusion method, but it has a clear drawback: image detail is severely lost in very bright and very dark areas.
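For reference, the Exposure Fusion scheme described above is available as a prior-art baseline in OpenCV. The snippet below is a minimal illustration of that baseline only, not of the invention; the function name exposure_fusion is an illustrative wrapper.

import cv2

def exposure_fusion(ldr_images):
    # Mertens et al. exposure fusion: per-pixel weights from contrast, saturation
    # and exposedness, blended over a Laplacian pyramid.
    merger = cv2.createMergeMertens()
    return merger.process(ldr_images)   # float32 image, roughly in [0, 1]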
Deep-learning-based methods for acquiring high dynamic range images mainly include the following. The paper HDR Image Reconstruction from a Single Exposure using Deep CNNs proposes an autoencoder that takes a single LDR image as input, first downsampling to extract features and then upsampling to reconstruct the HDR image. The paper Robust Patch-Based HDR Reconstruction of Dynamic Scenes uses a group of images with different exposures, including N overexposed images, N underexposed images and one normally exposed image. The method proposed in that paper first uses the camera response function to adjust the overexposed and underexposed images so that they have the same exposure as the normally exposed image, then uses MBDS (multi-source bidirectional similarity) to select, from the exposure-adjusted underexposed and overexposed images, the two images whose content is closest to the normally exposed image, and converts the normally exposed image and the two selected images to 10 bits for fusion, yielding the final high dynamic range image. The contrast of the images obtained by this algorithm is clearly improved and the highlight and shadow details are effectively enhanced, but when there is slight motion across the multi-frame images or the camera shakes, the resulting high dynamic range image exhibits ghosting.
Disclosure of Invention
The invention overcomes the problem that the image processing effect of the prior art needs improvement, and provides a deep-learning-based dynamic scene HDR reconstruction method that produces images with rich detail and high definition.
The technical solution of the present invention is a dynamic scene HDR reconstruction method based on deep learning, which comprises the following steps:
Step 1: in the same static scene, acquire three images with identical detail and extent (underexposed, normally exposed and overexposed) using a tripod-mounted camera, record them as S1, S2 and S3 together with their exposure times, and fuse them with a weighted fusion algorithm to obtain the Ground Truth, recorded as T;
Step 2: in a dynamic scene, acquire three images (underexposed, normally exposed and overexposed) with a handheld camera, record them as D1, D2 and D3, and replace D2 with the image S2 obtained in step 1;
Step 3: register D1, S2 and D3 using the LK optical flow method, record the registered image sequence as R1, R2 and R3, and form a paired training set with the Ground Truth obtained in step 1;
Step 4: transform R1, R2 and R3 into the linear domain using the camera response curve, and record the transformed images as H1, H2 and H3;
Step 5: extract the brightness information of the images H1, H2 and H3 using a contrast operator, and record the resulting brightness maps as M1, M2 and M3;
Step 6: extract the detail information of the images R1, R2 and R3 using a gradient operator, and record the resulting detail maps as L1, L2 and L3;
Step 7: design an Attention module based on ResNet;
Step 8: construct an HDR reconstruction network based on U-Net and ResNet, and design a mixed-structure loss function;
Step 9: concatenate the channels of the images R1, R2 and R3 obtained in step 3 with the channels of the images H1, H2 and H3 obtained in step 4 as the input of step 8, concatenate the channels of the images M1, M2 and M3 obtained in step 5 with the channels of the images L1, L2 and L3 obtained in step 6 as the input of the Attention module constructed in step 7, and train the network using the image T obtained in step 1 as the label;
Step 10: input the test images into the reconstruction network trained in step 9 to obtain an HDR image;
Step 11: tone-map the generated HDR image using the Reinhard tone mapping algorithm, and display the reconstructed image on an 8-bit display screen.
Preferably, the specific steps of step 1 are:
Step 1-1: perform exposure adjustment on the acquired images S1, S2 and S3, obtaining L1, L2 and L3 as shown in the following formula:
Figure BDA0002362545150000021
Step 1-2: fuse the L1, L2 and L3 obtained in step 1-1 according to a simple fusion algorithm to generate an HDR image as the Ground Truth, with the specific formula as follows:
Figure BDA0002362545150000031
preferably, the specific steps of step 3 are:
Step 3-1: perform exposure adjustment on the three images D1, S2 and D3 obtained in step 2. Using the exposure response curve of the camera, adjust the exposure of S2 to be the same as that of D1 and record the result as D2-1. The exposure response curve of the camera is Ev = f(Bv, Sv), where Ev is the exposure of the image, determined by the exposure F of the camera and the exposure time T, calculated as shown in the following formula:
Figure BDA0002362545150000032
the exposure of the camera is determined by the focal length f and the aperture diameter D of the camera, and the calculation method is shown as the following formula:
Figure BDA0002362545150000033
Bv is the brightness value of the image, i.e. the pixel value, and Sv is the ISO sensitivity coefficient of the camera, which is a constant, set here to 100;
step 3-2: detecting characteristic points in D1 and D2-1 by using a Harris corner detection method;
step 3-3: calculating an optical flow vector between D1 and D2-1 by using an LK optical flow method;
step 3-4: aligning D1 and D2-1 by using a bicubic interpolation method and the optical flow vector obtained in the step 3-3;
step 3-5: repeating the step 3-1, adjusting the exposure amount of S2 to be the same as D3 by using the camera response curve, and recording as D2-3;
step 3-6: repeat steps 3-2, 3-3 and 3-4 above to align D3 with D2-3.
Preferably, step 4 uses a gamma curve to map the image from the non-linear LDR domain into the linear HDR domain, as shown in the following formula: f = x^γ, where γ = 2, x is the LDR image, and f is the HDR-domain image obtained after the transformation.
Preferably, in the step 5, the luminance information of the images H1, H2 and H3 obtained in the step 4 is extracted by using a contrast operator, which is specifically expressed as follows:
Figure BDA0002362545150000034
Figure BDA0002362545150000035
preferably, the step 6 uses a gradient operator to extract detail information of the images R1, R2 and R3 obtained in the step 3, as shown in the following formula:
Figure BDA0002362545150000036
preferably, the specific steps of step 7 are:
step 7-1: combining the brightness characteristic maps M1, M2 and M3 obtained in the step 5 and the detail characteristic maps L1, L2 and L3 obtained in the step 6 as the input of the Attention module;
step 7-2: constructing an Attention module, and passing the input obtained in the step 7-1 through a Resnet module;
Step 7-3: pass the output obtained in step 7-2 through three convolution layers, each with a 3 x 3 convolution kernel and a stride of 2; the activation function used is ReLU, whose expression is max(0, x);
Step 7-4: pass the feature map obtained in step 7-3 through a Sigmoid activation function and record the output feature map as f_A, as shown in the following formula:
Figure BDA0002362545150000041
preferably, the specific steps of step 8 are:
Step 8-1: construct an encoding network, i.e. a down-sampling network, consisting of four convolution blocks, where each convolution block comprises a convolution layer, a batch normalization (BN) layer and a ReLU activation layer;
Step 8-2: concatenate (concat) the channels of the images H1, H2 and H3 obtained in step 4 with those of the corresponding images R1, R2 and R3 obtained in step 3 as the inputs of the encoding network; after each of the 3 image groups has been down-sampled by two convolution blocks, concatenate the output channels of the three encoders and record the output feature map as f_U;
Step 8-3: multiply the output feature map of step 8-2 element-wise by the output feature map of step 7-4, and record the result as F: F = f_A · f_U;
Step 8-4: add the output feature map obtained in step 8-3 to the output feature map obtained in step 8-2, and record the resulting feature map as F_R: F_R = F + f_U;
Step 8-5: construct a fusion network consisting of a residual block, whose input is the output feature map obtained in step 8-4;
Step 8-6: construct a decoding network, i.e. an up-sampling network, consisting of four convolution blocks and symmetric to the encoding network; each convolution block has a deconvolution layer, a BN layer and a ReLU layer, and skip connections are established with the corresponding encoding-network layers whose feature maps have the same size;
Step 8-7: the loss function of the network consists of two parts, an MSE loss and a VGG loss, as follows. The MSE loss is computed between the image obtained by tone-mapping the HDR image generated by the network,
Figure BDA0002362545150000042
and the Ground Truth obtained in step 1; the mean square error is given by:
Figure BDA0002362545150000043
The perceptual loss function uses a VGG-16 network pre-trained on the ImageNet dataset, denoted φ, and is calculated as follows:
Figure BDA0002362545150000044
Step 8-8: train the network with the network inputs and the label obtained in step 9 to complete the HDR reconstruction process.
Preferably, step 11 performs tone mapping on the HDR image obtained in step 10 using the Reinhard tone mapping algorithm.
Compared with the prior art, the deep-learning-based dynamic scene HDR reconstruction method of the invention has the following advantages. It can reconstruct a high dynamic range (HDR) image from low dynamic range (LDR) images that contain small moving objects. To address the ghosting and halo artifacts that traditional fusion-based methods and existing deep-learning methods produce in dynamic scenes, the images are first registered with an optical flow method; at the same time, brightness and detail information is extracted from the low dynamic range images, a deep learning model based on U-Net and ResNet is constructed, and the extracted brightness and detail information is used to assist in training the model, so that the HDR image generated after multi-exposure fusion contains rich detail and higher contrast. To address the loss of highlight and shadow detail in existing HDR reconstruction algorithms, a mixed-structure loss function is designed to ensure detail reconstruction, thereby achieving the aim of the invention.
The HDR reconstruction algorithm combines dynamic and static scenes to produce a usable dataset and real HDR images without relying on special hardware. A CNN based on a U-ResNet framework is designed, and multiple LDR frames with different exposures are fused by deep learning to reconstruct the HDR image. An attention module is also designed: the detail and brightness information of the LDR images is extracted with traditional image algorithms and used as the input of the attention module to assist in training the reconstruction network. The designed algorithm improves the detail and brightness of the image while expanding its dynamic range. The method can handle images with large motion in the scene and images with many saturated regions, and the generated HDR images have rich detail, high contrast, a wide color gamut and a high dynamic range.
Drawings
FIG. 1 is a schematic diagram of the network architecture of the present invention;
FIG. 2 is a schematic diagram of a network structure of the Attention module in the present invention;
FIG. 3 is an LDR image with an exposure of -2EV in simulation test one of the present invention;
FIG. 4 is an LDR image with an exposure of 0EV in simulation test one of the present invention;
FIG. 5 is an LDR image with an exposure of +2EV in simulation test one of the present invention;
FIG. 6 is an HDR image tone-mapped using the Deep-HDR method in simulation test one of the present invention;
FIG. 7 is an HDR image tone-mapped using the Expand-HDR method in simulation test one of the present invention;
FIG. 8 is an HDR image tone-mapped using the Sen method in simulation test one of the present invention;
FIG. 9 is an HDR image tone-mapped using the method of the present invention in simulation test one;
FIG. 10 is the Ground Truth image in simulation test one of the present invention;
FIG. 11 is an LDR image with an exposure of -2EV in simulation test two of the present invention;
FIG. 12 is an LDR image with an exposure of 0EV in simulation test two of the present invention;
FIG. 13 is an LDR image with an exposure of +2EV in simulation test two of the present invention;
FIG. 14 is an HDR image tone-mapped using the Deep-HDR method in simulation test two of the present invention;
FIG. 15 is an HDR image tone-mapped using the Expand-HDR method in simulation test two of the present invention;
FIG. 16 is an HDR image tone-mapped using the Sen method in simulation test two of the present invention;
FIG. 17 is an HDR image tone-mapped using the method of the present invention in simulation test two;
FIG. 18 is the Ground Truth image in simulation test two of the present invention;
wherein Deep-HDR refers to the method proposed in the paper Deep High Dynamic Range Imaging with Large Foreground Motions;
Expand-HDR refers to the method proposed in the paper ExpandNet: A Deep Convolutional Neural Network for High Dynamic Range Expansion from Low Dynamic Range Content, Computer Graphics Forum;
Sen refers to the method proposed in the paper Robust Patch-Based HDR Reconstruction of Dynamic Scenes;
Ours refers to the method set forth herein;
Ground Truth refers to the real image.
Detailed Description
The deep-learning-based dynamic scene HDR reconstruction method of the present invention is further described below with reference to the accompanying drawings and the detailed description. As shown in the figures, the present embodiment comprises the following steps:
Step 1: in the same static scene, acquire three images with identical detail and extent (underexposed, normally exposed and overexposed) using a tripod-mounted camera, record them as S1, S2 and S3 together with their exposure times, and fuse them with a weighted fusion algorithm to obtain the Ground Truth, recorded as T;
Step 2: in a dynamic scene, acquire three images (underexposed, normally exposed and overexposed) with a handheld camera, record them as D1, D2 and D3, and replace D2 with the image S2 obtained in step 1;
Step 3: register D1, S2 and D3 using the LK optical flow method, record the registered image sequence as R1, R2 and R3, and form a paired training set with the Ground Truth obtained in step 1;
Step 4: transform R1, R2 and R3 into the linear domain using the camera response curve, and record the transformed images as H1, H2 and H3;
Step 5: extract the brightness information of the images H1, H2 and H3 using a contrast operator, and record the resulting brightness maps as M1, M2 and M3;
Step 6: extract the detail information of the images R1, R2 and R3 using a gradient operator, and record the resulting detail maps as L1, L2 and L3;
Step 7: design an Attention module based on ResNet;
Step 8: construct an HDR reconstruction network based on U-Net and ResNet, and design a mixed-structure loss function;
Step 9: concatenate the channels of the images R1, R2 and R3 obtained in step 3 with the channels of the images H1, H2 and H3 obtained in step 4 as the input of step 8, concatenate the channels of the images M1, M2 and M3 obtained in step 5 with the channels of the images L1, L2 and L3 obtained in step 6 as the input of the Attention module constructed in step 7, and train the network using the image T obtained in step 1 as the label;
Step 10: input the test images into the reconstruction network trained in step 9 to obtain an HDR image;
Step 11: tone-map the generated HDR image using the Reinhard tone mapping algorithm, and display the reconstructed image on an 8-bit display screen.
The specific steps of the step 1 are as follows:
Step 1-1: perform exposure adjustment on the acquired images S1, S2 and S3, obtaining L1, L2 and L3 as shown in the following formula:
Figure BDA0002362545150000061
Step 1-2: fuse the L1, L2 and L3 obtained in step 1-1 according to a simple fusion algorithm to generate an HDR image as the Ground Truth, with the specific formula as follows:
Figure BDA0002362545150000071
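As a concrete illustration of steps 1-1 and 1-2, the following Python sketch fuses the three static exposures into a Ground Truth image. The exposure-adjustment and fusion formulas above appear in this text only as images, so the triangle weighting, the reuse of the step-4 gamma value and the division by exposure time below are illustrative assumptions rather than the patented formulas.

import numpy as np

def weight(ldr):
    # Triangle weight: favours well-exposed pixels, down-weights near-black and near-white ones.
    return 1.0 - np.abs(2.0 * ldr - 1.0)

def fuse_ground_truth(ldr_images, exposure_times, gamma=2.0):
    # ldr_images: list of float32 arrays in [0, 1]; exposure_times: exposure times in seconds.
    num = np.zeros_like(ldr_images[0])
    den = np.zeros_like(ldr_images[0])
    for ldr, t in zip(ldr_images, exposure_times):
        radiance = np.power(ldr, gamma) / t   # step 1-1: adjustment into a common radiance domain
        w = weight(ldr)
        num += w * radiance
        den += w
    return num / np.maximum(den, 1e-6)        # step 1-2: weighted fusion

The Ground Truth T would then be obtained as T = fuse_ground_truth([S1, S2, S3], [t1, t2, t3]), with t1 to t3 the recorded exposure times.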
the specific steps of the step 3 are as follows:
step 3-1: exposure adjustment is performed on the three images D1, S2, and D3 obtained in step 2, and a camera is usedThe exposure response curve of (a) adjusts the exposure amount of S2 to be the same as the exposure amount of D1, and is denoted as D2-1, and the exposure response curve of the camera is Ev-f (Bv, Sv), where E isVThe exposure of the image is determined by the exposure F and the exposure time T of the camera, and the calculation method is shown as the following formula:
Figure BDA0002362545150000072
the exposure of the camera is determined by the focal length f and the aperture diameter D of the camera, and the calculation method is shown as the following formula:
Figure BDA0002362545150000073
Bv is the brightness value of the image, i.e. the pixel value, and Sv is the ISO sensitivity coefficient of the camera, which is a constant, set here to 100;
step 3-2: detecting characteristic points in D1 and D2-1 by using a Harris corner detection method;
step 3-3: calculating an optical flow vector between D1 and D2-1 by using an LK optical flow method;
step 3-4: aligning D1 and D2-1 by using a bicubic interpolation method and the optical flow vector obtained in the step 3-3;
step 3-5: repeating the step 3-1, adjusting the exposure amount of S2 to be the same as D3 by using the camera response curve, and recording as D2-3;
step 3-6: repeat steps 3-2, 3-3 and 3-4 above to align D3 with D2-3.
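The registration of steps 3-2 to 3-4 can be sketched in OpenCV as follows. The corner count, the LK window size and the use of scattered-data interpolation to densify the sparse flow before the bicubic warp are illustrative assumptions; only the overall pipeline (Harris corners, LK optical flow, bicubic-interpolated alignment) is taken from the text.

import cv2
import numpy as np
from scipy.interpolate import griddata

def register_to_reference(moving_bgr, reference_bgr):
    mov = cv2.cvtColor(moving_bgr, cv2.COLOR_BGR2GRAY)
    ref = cv2.cvtColor(reference_bgr, cv2.COLOR_BGR2GRAY)
    # Step 3-2: Harris corners detected in the (exposure-adjusted) reference image.
    pts_ref = cv2.goodFeaturesToTrack(ref, maxCorners=2000, qualityLevel=0.01,
                                      minDistance=7, useHarrisDetector=True, k=0.04)
    # Step 3-3: sparse LK optical flow from the reference to the moving image.
    pts_mov, status, _err = cv2.calcOpticalFlowPyrLK(ref, mov, pts_ref, None,
                                                     winSize=(21, 21), maxLevel=3)
    good = status.ravel() == 1
    pts = pts_ref.reshape(-1, 2)[good]
    flow = pts_mov.reshape(-1, 2)[good] - pts
    # Step 3-4: densify the sparse flow and warp the moving image with bicubic sampling.
    h, w = ref.shape
    grid_y, grid_x = np.mgrid[0:h, 0:w]
    dx = griddata(pts, flow[:, 0], (grid_x, grid_y), method='cubic', fill_value=0.0)
    dy = griddata(pts, flow[:, 1], (grid_x, grid_y), method='cubic', fill_value=0.0)
    map_x = (grid_x + dx).astype(np.float32)
    map_y = (grid_y + dy).astype(np.float32)
    return cv2.remap(moving_bgr, map_x, map_y, interpolation=cv2.INTER_CUBIC)

R1 would then be obtained as register_to_reference(D1, D2-1), R3 analogously from D3 and D2-3 (steps 3-5 and 3-6), and R2 is S2 itself.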
Step 4 uses a gamma curve to map the image from the non-linear LDR domain into the linear HDR domain, as shown in the following formula: f = x^γ, where γ = 2, x is the LDR image, and f is the HDR-domain image obtained after the transformation.
In the step 5, the contrast operator is used to extract the brightness information of the images H1, H2, and H3 obtained in the step 4, which is specifically shown as follows:
Figure BDA0002362545150000074
Figure BDA0002362545150000075
the step 6 utilizes a gradient operator to extract detail information of the images R1, R2 and R3 obtained in the step 3, and the following formula is shown as follows:
Figure BDA0002362545150000076
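The preprocessing of steps 4 to 6 can be sketched as follows. The contrast and gradient operators are given above only as formula images, so the local-standard-deviation contrast measure and the Sobel gradient magnitude used here are stand-in assumptions; the gamma mapping with γ = 2 follows step 4.

import cv2
import numpy as np

GAMMA = 2.0  # gamma value stated in step 4

def to_hdr_domain(ldr):
    # Step 4: map an LDR image with values in [0, 1] into the linear HDR domain, f = x ** gamma.
    return np.power(ldr.astype(np.float32), GAMMA)

def brightness_map(hdr, ksize=7):
    # Step 5 (assumed form): local contrast of the luminance channel as the brightness map M.
    lum = cv2.cvtColor(hdr, cv2.COLOR_BGR2GRAY)
    mean = cv2.blur(lum, (ksize, ksize))
    sq_mean = cv2.blur(lum * lum, (ksize, ksize))
    return np.sqrt(np.maximum(sq_mean - mean * mean, 0.0))

def detail_map(ldr):
    # Step 6 (assumed form): gradient magnitude of the registered LDR image as the detail map L.
    gray = cv2.cvtColor(ldr.astype(np.float32), cv2.COLOR_BGR2GRAY)
    gx = cv2.Sobel(gray, cv2.CV_32F, 1, 0, ksize=3)
    gy = cv2.Sobel(gray, cv2.CV_32F, 0, 1, ksize=3)
    return cv2.magnitude(gx, gy)

For each registered exposure R_i in [0, 1], H_i = to_hdr_domain(R_i), M_i = brightness_map(H_i) and L_i = detail_map(R_i).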
the specific steps of the step 7 are as follows:
step 7-1: combining the brightness characteristic maps M1, M2 and M3 obtained in the step 5 and the detail characteristic maps L1, L2 and L3 obtained in the step 6 as the input of the Attention module;
step 7-2: constructing an Attention module, and passing the input obtained in the step 7-1 through a Resnet module;
Step 7-3: pass the output obtained in step 7-2 through three convolution layers, each with a 3 x 3 convolution kernel and a stride of 2; the activation function used is ReLU, whose expression is max(0, x);
Step 7-4: pass the feature map obtained in step 7-3 through a Sigmoid activation function and record the output feature map as f_A, as shown in the following formula:
Figure BDA0002362545150000081
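A hedged tf.keras sketch of the Attention module of step 7 (cf. FIG. 2) is given below. Only the stated structure (concatenated brightness and detail maps as input, a ResNet module, three 3 x 3 stride-2 convolutions with ReLU, and a Sigmoid output f_A) comes from the text; the channel widths, the single-channel M/L maps and the exact residual-block layout are assumptions.

import tensorflow as tf
from tensorflow.keras import layers

def res_block(x, filters):
    # Simple residual block with a 1x1 projection on the skip path so channel counts match.
    y = layers.Conv2D(filters, 3, padding='same', activation='relu')(x)
    y = layers.Conv2D(filters, 3, padding='same')(y)
    skip = layers.Conv2D(filters, 1, padding='same')(x)
    return layers.ReLU()(layers.Add()([skip, y]))

def build_attention_module(in_channels=6, filters=192):
    # Step 7-1: input is the channel concatenation of M1-M3 and L1-L3 (assumed single-channel each).
    inp = layers.Input(shape=(None, None, in_channels))
    x = res_block(inp, filters)                           # step 7-2: ResNet module
    for _ in range(3):                                    # step 7-3: three 3x3, stride-2 ReLU convolutions
        x = layers.Conv2D(filters, 3, strides=2, padding='same', activation='relu')(x)
    f_a = layers.Activation('sigmoid')(x)                 # step 7-4: attention map f_A
    return tf.keras.Model(inp, f_a, name='attention_module')

The 192-channel width is chosen here only so that f_A can be multiplied element-wise with the 3 x 64-channel feature map f_U assumed in the reconstruction sketch after step 8-8.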
the specific steps of the step 8 are as follows:
Step 8-1: construct an encoding network, i.e. a down-sampling network, consisting of four convolution blocks, where each convolution block comprises a convolution layer, a batch normalization (BN) layer and a ReLU activation layer;
Step 8-2: concatenate (concat) the channels of the images H1, H2 and H3 obtained in step 4 with those of the corresponding images R1, R2 and R3 obtained in step 3 as the inputs of the encoding network; after each of the 3 image groups has been down-sampled by two convolution blocks, concatenate the output channels of the three encoders and record the output feature map as f_U;
Step 8-3: multiply the output feature map of step 8-2 element-wise by the output feature map of step 7-4, and record the result as F: F = f_A · f_U;
Step 8-4: add the output feature map obtained in step 8-3 to the output feature map obtained in step 8-2, and record the resulting feature map as F_R: F_R = F + f_U;
Step 8-5: construct a fusion network consisting of a residual block, whose input is the output feature map obtained in step 8-4;
Step 8-6: construct a decoding network, i.e. an up-sampling network, consisting of four convolution blocks and symmetric to the encoding network; each convolution block has a deconvolution layer, a BN layer and a ReLU layer, and skip connections are established with the corresponding encoding-network layers whose feature maps have the same size (an illustrative code sketch of this architecture follows step 8-8);
Step 8-7: the loss function of the network consists of two parts, an MSE loss and a VGG loss, as follows. The MSE loss is computed between the image obtained by tone-mapping the HDR image generated by the network,
Figure BDA0002362545150000082
and the Ground Truth obtained in step 1; the mean square error is given by:
Figure BDA0002362545150000083
The perceptual loss function uses a VGG-16 network pre-trained on the ImageNet dataset, denoted φ, and is calculated as follows:
Figure BDA0002362545150000084
Step 8-8: train the network with the network inputs and the label obtained in step 9 to complete the HDR reconstruction process.
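For illustration, the following simplified tf.keras sketch wires up the encoder-attention-fusion-decoder structure of steps 8-1 to 8-6. Filter counts, the depth of the per-exposure encoders, the single skip connection and the assumption that the attention map f_A has already been brought to the spatial size and channel count of f_U are all choices made for this sketch; the element-wise product F = f_A · f_U and the sum F_R = F + f_U follow steps 8-3 and 8-4.

import tensorflow as tf
from tensorflow.keras import layers

def down_block(x, filters):
    x = layers.Conv2D(filters, 3, strides=2, padding='same')(x)
    x = layers.BatchNormalization()(x)
    return layers.ReLU()(x)

def up_block(x, filters):
    x = layers.Conv2DTranspose(filters, 3, strides=2, padding='same')(x)
    x = layers.BatchNormalization()(x)
    return layers.ReLU()(x)

def res_block(x, filters):
    y = layers.Conv2D(filters, 3, padding='same', activation='relu')(x)
    y = layers.Conv2D(filters, 3, padding='same')(y)
    return layers.ReLU()(layers.Add()([x, y]))

def build_reconstruction_net():
    # Step 8-2: one 6-channel input per exposure (R_i concatenated with H_i).
    inputs = [layers.Input(shape=(None, None, 6)) for _ in range(3)]
    # Attention map f_A from the step-7 module, assumed to match f_U in size and channels.
    f_a = layers.Input(shape=(None, None, 192))

    skips, codes = [], []
    for inp in inputs:
        s = down_block(inp, 32)              # first encoder block (kept for the skip connection)
        skips.append(s)
        codes.append(down_block(s, 64))      # second encoder block
    f_u = layers.Concatenate()(codes)        # f_U: 3 x 64 = 192 channels

    f = layers.Multiply()([f_a, f_u])        # step 8-3: F = f_A . f_U
    f_r = layers.Add()([f, f_u])             # step 8-4: F_R = F + f_U

    x = res_block(f_r, 192)                  # step 8-5: fusion network (residual block)

    x = up_block(x, 96)                      # step 8-6: decoder, symmetric to the encoder
    x = layers.Concatenate()([x, layers.Concatenate()(skips)])   # skip connection to the encoder
    x = up_block(x, 48)
    hdr = layers.Conv2D(3, 3, padding='same')(x)                 # linear-domain HDR output
    return tf.keras.Model(inputs + [f_a], hdr, name='hdr_reconstruction_net')

The mixed loss of step 8-7 could be sketched as follows; the tone-mapping operator applied inside the loss and the VGG-16 layer providing the perceptual features are not specified in this text, so the mu-law compression and the block3_conv3 features below are assumptions.

import tensorflow as tf
from tensorflow.keras.applications.vgg16 import preprocess_input

MU = 5000.0  # assumed mu-law compression constant for the tone mapping used inside the loss

def tonemap(h):
    return tf.math.log(1.0 + MU * h) / tf.math.log(1.0 + MU)

_vgg = tf.keras.applications.VGG16(include_top=False, weights='imagenet')
_phi = tf.keras.Model(_vgg.input, _vgg.get_layer('block3_conv3').output)  # perceptual feature extractor
_phi.trainable = False

def mixed_loss(ground_truth, prediction, vgg_weight=0.01):
    t_pred = tonemap(prediction)
    t_gt = tonemap(ground_truth)
    mse = tf.reduce_mean(tf.square(t_pred - t_gt))                            # MSE on tone-mapped images
    vgg = tf.reduce_mean(tf.square(_phi(preprocess_input(t_pred * 255.0))
                                   - _phi(preprocess_input(t_gt * 255.0))))   # VGG perceptual term
    return mse + vgg_weight * vgg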
In step 11, tone mapping is performed on the HDR image obtained in step 10 using the Reinhard tone mapping algorithm.
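Step 11 can be illustrated with OpenCV's built-in Reinhard operator; the function name and the parameter values below are generic illustrative choices, not values prescribed by the patent.

import cv2
import numpy as np

def display_hdr(hdr_bgr_float32):
    # Reinhard tone mapping of a linear-domain HDR image, then conversion to 8-bit for the screen.
    tonemapper = cv2.createTonemapReinhard(gamma=2.2, intensity=0.0,
                                           light_adapt=1.0, color_adapt=0.0)
    ldr = tonemapper.process(hdr_bgr_float32)        # output approximately in [0, 1]
    return np.clip(ldr * 255.0, 0, 255).astype(np.uint8)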
The effect of the present invention is further described below with the simulation experiment:
1. simulation experiment conditions are as follows:
the hardware environment of the invention simulation is as follows: intel Core (TM) i5-4570 CPU @3.20GHz x 8, GPU NVIDIAGeForce GTX 10808G run memory; software environment: ubuntu16.04, python 3.6; experiment framework: tensorflow.
2. Simulation and instance content and result analysis
The invention selects the test set of a public HDR dataset as the experimental samples and inputs them into the trained network; the comparison images are obtained after a tone mapping algorithm. The inputs are LDR images of dynamic scenes at three different exposures of -2EV, 0EV and +2EV; the first group of images is shown in FIGS. 3-5 and the second group in FIGS. 11-13. The tone-mapped HDR outputs are as follows:
the green boxes in the two sets of comparison result figures indicate that our algorithm recovers better than the existing algorithm at the details of the highlights and the darks. As can be seen from the first set of comparison results shown in FIGS. 6-10, the spill-HDR, Expand-HDR and Sen methods all have the overflow phenomenon in the marked highlight region, and the details of the highlight region are completely recovered by our algorithm, so as to achieve the same effect as the real image. From the second set of comparison results, fig. 14-18, it can be seen that the existing algorithm is wrong in the recovery at highlights, especially the Sen method, and the details of the recovery of our method are closest to the real scene. Meanwhile, compared with the existing algorithm, the objective index PSNR is averagely improved by 0.1 dB.
References in the practice of the invention:
[1] Wu, Shangzhe, Xu, Jiarui, Tai, Yu-Wing, & Tang, Chi-Keung. Deep high dynamic range imaging with large foreground motions.
[2] Marnerides, D., Bashford-Rogers, T., Hatchett, J., & Debattista, K. ExpandNet: a deep convolutional neural network for high dynamic range expansion from low dynamic range content. Computer Graphics Forum, 37(2), 37-49.
[3] Sen, P., Kalantari, N. K., Yaesoubi, M., Darabi, S., & Shechtman, E. (2012). Robust patch-based HDR reconstruction of dynamic scenes. ACM Transactions on Graphics, 31(6).

Claims (9)

1. A dynamic scene HDR reconstruction method based on deep learning, characterized in that it comprises the following steps:
Step 1: in the same static scene, acquire three images with identical detail and extent (underexposed, normally exposed and overexposed) using a tripod-mounted camera, record them as S1, S2 and S3 together with their exposure times, and fuse them with a weighted fusion algorithm to obtain the Ground Truth, recorded as T;
Step 2: in a dynamic scene, acquire three images (underexposed, normally exposed and overexposed) with a handheld camera, record them as D1, D2 and D3, and replace D2 with the image S2 obtained in step 1;
Step 3: register D1, S2 and D3 using the LK optical flow method, record the registered image sequence as R1, R2 and R3, and form a paired training set with the Ground Truth obtained in step 1;
Step 4: transform R1, R2 and R3 into the linear domain using the camera response curve, and record the transformed images as H1, H2 and H3;
Step 5: extract the brightness information of the images H1, H2 and H3 using a contrast operator, and record the resulting brightness maps as M1, M2 and M3;
Step 6: extract the detail information of the images R1, R2 and R3 using a gradient operator, and record the resulting detail maps as L1, L2 and L3;
Step 7: design an Attention module based on ResNet;
Step 8: construct an HDR reconstruction network based on U-Net and ResNet, and design a mixed-structure loss function;
Step 9: concatenate the channels of the images R1, R2 and R3 obtained in step 3 with the channels of the images H1, H2 and H3 obtained in step 4 as the input of step 8, concatenate the channels of the images M1, M2 and M3 obtained in step 5 with the channels of the images L1, L2 and L3 obtained in step 6 as the input of the Attention module constructed in step 7, and train the network using the image T obtained in step 1 as the label;
Step 10: input the test images into the reconstruction network trained in step 9 to obtain an HDR image;
Step 11: tone-map the generated HDR image using the Reinhard tone mapping algorithm, and display the reconstructed image on an 8-bit display screen.
2. The deep learning based dynamic scene HDR reconstruction method as claimed in claim 1, wherein: the specific steps of the step 1 are as follows:
Step 1-1: perform exposure adjustment on the acquired images S1, S2 and S3, obtaining L1, L2 and L3 as shown in the following formula:
Figure FDA0002362545140000011
Step 1-2: fuse the L1, L2 and L3 obtained in step 1-1 according to a simple fusion algorithm to generate an HDR image as the Ground Truth, with the specific formula as follows:
Figure FDA0002362545140000012
3. The deep learning based dynamic scene HDR reconstruction method as claimed in claim 1, wherein the specific steps of step 3 are as follows:
Step 3-1: perform exposure adjustment on the three images D1, S2 and D3 obtained in step 2. Using the exposure response curve of the camera, adjust the exposure of S2 to be the same as that of D1 and record the result as D2-1. The exposure response curve of the camera is Ev = f(Bv, Sv), where Ev is the exposure of the image, determined by the exposure F of the camera and the exposure time T, calculated as shown in the following formula:
Figure FDA0002362545140000021
the exposure of the camera is determined by the focal length f and the aperture diameter D of the camera, and the calculation method is shown as the following formula:
Figure FDA0002362545140000022
Bv is the brightness value of the image, i.e. the pixel value, and Sv is the ISO sensitivity coefficient of the camera, which is a constant, set here to 100;
step 3-2: detecting characteristic points in D1 and D2-1 by using a Harris corner detection method;
step 3-3: calculating an optical flow vector between D1 and D2-1 by using an LK optical flow method;
step 3-4: aligning D1 and D2-1 by using a bicubic interpolation method and the optical flow vector obtained in the step 3-3;
step 3-5: repeating the step 3-1, adjusting the exposure amount of S2 to be the same as D3 by using the camera response curve, and recording as D2-3;
step 3-6: repeat steps 3-2, 3-3 and 3-4 above to align D3 with D2-3.
4. The deep learning based dynamic scene HDR reconstruction method as claimed in claim 1, wherein step 4 uses a gamma curve to map the image from the non-linear LDR domain into the linear HDR domain, as shown in the following formula: f = x^γ, where γ = 2, x is the LDR image, and f is the HDR-domain image obtained after the transformation.
5. The deep learning based dynamic scene HDR reconstruction method as claimed in claim 1, wherein: in the step 5, the contrast operator is used to extract the brightness information of the images H1, H2, and H3 obtained in the step 4, which is specifically shown as follows:
Figure FDA0002362545140000023
Figure FDA0002362545140000024
6. The method of claim 1 for deep learning based HDR reconstruction of dynamic scenes, wherein step 6 uses a gradient operator to extract the detail information of the images R1, R2 and R3 obtained in step 3, as shown in the following formula:
Figure FDA0002362545140000025
7. The deep learning based dynamic scene HDR reconstruction method as claimed in claim 1, wherein the specific steps of step 7 are as follows:
step 7-1: combining the brightness characteristic maps M1, M2 and M3 obtained in the step 5 and the detail characteristic maps L1, L2 and L3 obtained in the step 6 as the input of the Attention module;
step 7-2: constructing an Attention module, and passing the input obtained in the step 7-1 through a Resnet module;
Step 7-3: pass the output obtained in step 7-2 through three convolution layers, each with a 3 x 3 convolution kernel and a stride of 2; the activation function used is ReLU, whose expression is max(0, x);
Step 7-4: pass the feature map obtained in step 7-3 through a Sigmoid activation function and record the output feature map as f_A, as shown in the following formula:
Figure FDA0002362545140000031
8. The deep learning based dynamic scene HDR reconstruction method as claimed in claim 1, wherein the specific steps of step 8 are as follows:
Step 8-1: construct an encoding network, i.e. a down-sampling network, consisting of four convolution blocks, where each convolution block comprises a convolution layer, a batch normalization (BN) layer and a ReLU activation layer;
Step 8-2: concatenate (concat) the channels of the images H1, H2 and H3 obtained in step 4 with those of the corresponding images R1, R2 and R3 obtained in step 3 as the inputs of the encoding network; after each of the 3 image groups has been down-sampled by two convolution blocks, concatenate the output channels of the three encoders and record the output feature map as f_U;
Step 8-3: multiply the output feature map of step 8-2 element-wise by the output feature map of step 7-4, and record the result as F: F = f_A · f_U;
Step 8-4: add the output feature map obtained in step 8-3 to the output feature map obtained in step 8-2, and record the resulting feature map as F_R: F_R = F + f_U;
Step 8-5: construct a fusion network consisting of a residual block, whose input is the output feature map obtained in step 8-4;
Step 8-6: construct a decoding network, i.e. an up-sampling network, consisting of four convolution blocks and symmetric to the encoding network; each convolution block has a deconvolution layer, a BN layer and a ReLU layer, and skip connections are established with the corresponding encoding-network layers whose feature maps have the same size;
Step 8-7: the loss function of the network consists of two parts, an MSE loss and a VGG loss, as follows. The MSE loss is computed between the image obtained by tone-mapping the HDR image generated by the network,
Figure FDA0002362545140000032
and the Ground Truth obtained in step 1; the mean square error is given by:
Figure FDA0002362545140000033
The perceptual loss function uses a VGG-16 network pre-trained on the ImageNet dataset, denoted φ, and is calculated as follows:
Figure FDA0002362545140000034
Step 8-8: train the network with the network inputs and the label obtained in step 9 to complete the HDR reconstruction process.
9. The deep learning based dynamic scene HDR reconstruction method as claimed in claim 1, wherein step 11 performs tone mapping on the HDR image obtained in step 10 using the Reinhard tone mapping algorithm.
CN202010026179.3A 2020-01-10 2020-01-10 Dynamic scene HDR reconstruction method based on deep learning Active CN111242883B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010026179.3A CN111242883B (en) 2020-01-10 2020-01-10 Dynamic scene HDR reconstruction method based on deep learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010026179.3A CN111242883B (en) 2020-01-10 2020-01-10 Dynamic scene HDR reconstruction method based on deep learning

Publications (2)

Publication Number Publication Date
CN111242883A true CN111242883A (en) 2020-06-05
CN111242883B CN111242883B (en) 2023-03-28

Family

ID=70872293

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010026179.3A Active CN111242883B (en) 2020-01-10 2020-01-10 Dynamic scene HDR reconstruction method based on deep learning

Country Status (1)

Country Link
CN (1) CN111242883B (en)

Cited By (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111709896A (en) * 2020-06-18 2020-09-25 三星电子(中国)研发中心 Method and equipment for mapping LDR video into HDR video
CN111986134A (en) * 2020-08-26 2020-11-24 中国空间技术研究院 Remote sensing imaging method and device for area-array camera
CN112435306A (en) * 2020-11-20 2021-03-02 上海北昂医药科技股份有限公司 G banding chromosome HDR image reconstruction method
CN113132655A (en) * 2021-03-09 2021-07-16 浙江工业大学 HDR video synthesis method based on deep learning
CN113379698A (en) * 2021-06-08 2021-09-10 武汉大学 Illumination estimation method based on step-by-step joint supervision
CN113971639A (en) * 2021-08-27 2022-01-25 天津大学 Depth estimation based under-exposed LDR image reconstruction HDR image
CN114189633A (en) * 2021-12-22 2022-03-15 北京紫光展锐通信技术有限公司 HDR image imaging method and device and electronic equipment
WO2023178610A1 (en) * 2022-03-24 2023-09-28 京东方科技集团股份有限公司 Image processing method, computing system, device and readable storage medium
WO2023246392A1 (en) * 2022-06-22 2023-12-28 京东方科技集团股份有限公司 Image acquisition method, apparatus and device, and non-transient computer storage medium
CN117745603A (en) * 2024-02-20 2024-03-22 湖南科洛德科技有限公司 Product image correction method and device based on linear array scanning device and storage medium
CN117876282A (en) * 2024-03-08 2024-04-12 昆明理工大学 High dynamic range imaging method based on multi-task interaction promotion

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20120201456A1 (en) * 2009-10-08 2012-08-09 International Business Machines Corporation Transforming a digital image from a low dynamic range (ldr) image to a high dynamic range (hdr) image
CN108805836A (en) * 2018-05-31 2018-11-13 大连理工大学 Method for correcting image based on the reciprocating HDR transformation of depth
US20190096046A1 (en) * 2017-09-25 2019-03-28 The Regents Of The University Of California Generation of high dynamic range visual media

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20120201456A1 (en) * 2009-10-08 2012-08-09 International Business Machines Corporation Transforming a digital image from a low dynamic range (ldr) image to a high dynamic range (hdr) image
US20190096046A1 (en) * 2017-09-25 2019-03-28 The Regents Of The University Of California Generation of high dynamic range visual media
CN108805836A (en) * 2018-05-31 2018-11-13 大连理工大学 Method for correcting image based on the reciprocating HDR transformation of depth

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Zhang Shufang et al., "High dynamic range image generation method using principal component analysis and gradient pyramid", Journal of Xi'an Jiaotong University *
Du Lin et al., "Research on high dynamic range image fusion algorithms for dynamic targets", Acta Optica Sinica *

Cited By (15)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111709896A (en) * 2020-06-18 2020-09-25 三星电子(中国)研发中心 Method and equipment for mapping LDR video into HDR video
CN111709896B (en) * 2020-06-18 2023-04-07 三星电子(中国)研发中心 Method and equipment for mapping LDR video into HDR video
CN111986134B (en) * 2020-08-26 2023-11-24 中国空间技术研究院 Remote sensing imaging method and device for area-array camera
CN111986134A (en) * 2020-08-26 2020-11-24 中国空间技术研究院 Remote sensing imaging method and device for area-array camera
CN112435306A (en) * 2020-11-20 2021-03-02 上海北昂医药科技股份有限公司 G banding chromosome HDR image reconstruction method
CN113132655A (en) * 2021-03-09 2021-07-16 浙江工业大学 HDR video synthesis method based on deep learning
CN113379698A (en) * 2021-06-08 2021-09-10 武汉大学 Illumination estimation method based on step-by-step joint supervision
CN113379698B (en) * 2021-06-08 2022-07-05 武汉大学 Illumination estimation method based on step-by-step joint supervision
CN113971639A (en) * 2021-08-27 2022-01-25 天津大学 Depth estimation based under-exposed LDR image reconstruction HDR image
CN114189633A (en) * 2021-12-22 2022-03-15 北京紫光展锐通信技术有限公司 HDR image imaging method and device and electronic equipment
WO2023178610A1 (en) * 2022-03-24 2023-09-28 京东方科技集团股份有限公司 Image processing method, computing system, device and readable storage medium
WO2023246392A1 (en) * 2022-06-22 2023-12-28 京东方科技集团股份有限公司 Image acquisition method, apparatus and device, and non-transient computer storage medium
CN117745603A (en) * 2024-02-20 2024-03-22 湖南科洛德科技有限公司 Product image correction method and device based on linear array scanning device and storage medium
CN117876282A (en) * 2024-03-08 2024-04-12 昆明理工大学 High dynamic range imaging method based on multi-task interaction promotion
CN117876282B (en) * 2024-03-08 2024-05-14 昆明理工大学 High dynamic range imaging method based on multi-task interaction promotion

Also Published As

Publication number Publication date
CN111242883B (en) 2023-03-28

Similar Documents

Publication Publication Date Title
CN111242883B (en) Dynamic scene HDR reconstruction method based on deep learning
Fan et al. Integrating semantic segmentation and retinex model for low-light image enhancement
Lee et al. Deep chain hdri: Reconstructing a high dynamic range image from a single low dynamic range image
Cai et al. Learning a deep single image contrast enhancer from multi-exposure images
Zhou et al. Cross-view enhancement network for underwater images
Pan et al. Multi-exposure high dynamic range imaging with informative content enhanced network
CN113592726A (en) High dynamic range imaging method, device, electronic equipment and storage medium
Rasheed et al. LSR: Lightening super-resolution deep network for low-light image enhancement
Lv et al. Low-light image enhancement via deep Retinex decomposition and bilateral learning
Yin et al. Two exposure fusion using prior-aware generative adversarial network
Chen et al. End-to-end single image enhancement based on a dual network cascade model
Zhang et al. Multi-branch and progressive network for low-light image enhancement
CN115035011A (en) Low-illumination image enhancement method for self-adaptive RetinexNet under fusion strategy
Cao et al. A brightness-adaptive kernel prediction network for inverse tone mapping
Tan et al. High dynamic range imaging for dynamic scenes with large-scale motions and severe saturation
Chen et al. Improving dynamic hdr imaging with fusion transformer
CN117237207A (en) Ghost-free high dynamic range light field imaging method for dynamic scene
Tian et al. Deformable convolutional network constrained by contrastive learning for underwater image enhancement
Ye et al. Single exposure high dynamic range image reconstruction based on deep dual-branch network
Hu et al. High dynamic range imaging with short-and long-exposures based on artificial remapping using multiscale exposure fusion
Van Vo et al. High dynamic range video synthesis using superpixel-based illuminance-invariant motion estimation
Ma et al. Image Dehazing Based on Improved Color Channel Transfer and Multiexposure Fusion
Singh et al. Variational approach for intensity domain multi-exposure image fusion
Kinoshita et al. Deep inverse tone mapping using LDR based learning for estimating HDR images with absolute luminance
Yang et al. Multi-scale extreme exposure images fusion based on deep learning

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant