CN111242883A - Dynamic scene HDR reconstruction method based on deep learning - Google Patents
Dynamic scene HDR reconstruction method based on deep learning
- Publication number
- CN111242883A (application number CN202010026179.3A)
- Authority
- CN
- China
- Prior art keywords
- images
- image
- network
- hdr
- exposure
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Links
- 238000000034 method Methods 0.000 title claims abstract description 63
- 238000013135 deep learning Methods 0.000 title claims abstract description 21
- 230000003287 optical effect Effects 0.000 claims abstract description 14
- 238000012549 training Methods 0.000 claims abstract description 9
- 230000003068 static effect Effects 0.000 claims abstract description 5
- 230000001131 transforming effect Effects 0.000 claims abstract description 4
- 238000010586 diagram Methods 0.000 claims description 29
- 238000004422 calculation algorithm Methods 0.000 claims description 26
- 238000012360 testing method Methods 0.000 claims description 23
- 230000006870 function Effects 0.000 claims description 22
- 230000004927 fusion Effects 0.000 claims description 18
- 238000013507 mapping Methods 0.000 claims description 18
- 230000004913 activation Effects 0.000 claims description 12
- 238000005070 sampling Methods 0.000 claims description 11
- 238000004364 calculation method Methods 0.000 claims description 9
- 230000004044 response Effects 0.000 claims description 9
- 230000008569 process Effects 0.000 claims description 4
- 238000001514 detection method Methods 0.000 claims description 3
- 238000010606 normalization Methods 0.000 claims description 3
- 230000036211 photosensitivity Effects 0.000 claims description 3
- 238000005096 rolling process Methods 0.000 claims description 3
- 230000009466 transformation Effects 0.000 claims description 3
- 230000000694 effects Effects 0.000 abstract description 7
- 238000012545 processing Methods 0.000 abstract description 5
- 238000004088 simulation Methods 0.000 description 20
- 238000007500 overflow downdraw method Methods 0.000 description 6
- 238000013527 convolutional neural network Methods 0.000 description 3
- 238000003384 imaging method Methods 0.000 description 3
- 230000033001 locomotion Effects 0.000 description 3
- 238000011084 recovery Methods 0.000 description 3
- 238000002474 experimental method Methods 0.000 description 2
- 238000005316 response function Methods 0.000 description 2
- 230000002457 bidirectional effect Effects 0.000 description 1
- 238000000354 decomposition reaction Methods 0.000 description 1
- 238000013136 deep learning model Methods 0.000 description 1
- 238000011156 evaluation Methods 0.000 description 1
- 238000005286 illumination Methods 0.000 description 1
- 230000005855 radiation Effects 0.000 description 1
- 238000009877 rendering Methods 0.000 description 1
- 238000011160 research Methods 0.000 description 1
- 230000000007 visual effect Effects 0.000 description 1
Images
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T5/00—Image enhancement or restoration
- G06T5/50—Image enhancement or restoration using two or more images, e.g. averaging or subtraction
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/30—Determination of transform parameters for the alignment of images, i.e. image registration
- G06T7/33—Determination of transform parameters for the alignment of images, i.e. image registration using feature-based methods
- G06T7/337—Determination of transform parameters for the alignment of images, i.e. image registration using feature-based methods involving reference images or patches
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/97—Determining parameters from multiple pictures
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/20—Special algorithmic details
- G06T2207/20081—Training; Learning
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/20—Special algorithmic details
- G06T2207/20084—Artificial neural networks [ANN]
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/20—Special algorithmic details
- G06T2207/20212—Image combination
- G06T2207/20221—Image fusion; Image merging
Landscapes
- Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Theoretical Computer Science (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Image Processing (AREA)
Abstract
The invention discloses a dynamic scene HDR reconstruction method based on deep learning, which addresses the need to improve image processing quality in the prior art. The method comprises the following steps: acquiring three images (underexposed, normally exposed and overexposed) of the same static scene with a fixed camera; in a dynamic scene, acquiring the same three exposures with a handheld camera, denoted D1, D2 and D3; registering D1, S2 and D3 using the LK optical flow method, and pairing the registered image sequences R1, R2 and R3 with the Ground Truth obtained in step 1 to form a training set; transforming R1, R2 and R3 into the linear domain using the camera response curve, denoted H1, H2 and H3; extracting brightness information from the H1, H2 and H3 images with a contrast operator, denoted M1, M2 and M3; extracting detail information from the R1, R2 and R3 images with a gradient operator, denoted L1, L2 and L3; and designing an Attention module based on ResNet. The HDR images generated by this technique have rich detail, high contrast, a wide color gamut and a high dynamic range.
Description
Technical Field
The invention relates to the field of digital video and computational photography image processing, in particular to a dynamic scene HDR reconstruction method based on deep learning.
Background
Dynamic range is the ratio of the maximum to the minimum luminance in a scene. In a real scene, the dynamic range from the brightest sunlight to the darkest starlight can reach 10^8, and the luminance range distinguishable by the human eye is as high as 10^5. However, the dynamic range captured by a common sensor does not exceed 10^3, and that of a typical display is only about 10^2. Because the dynamic ranges of real scenes and common digital devices do not match, images captured by imaging devices often suffer from overexposure, underexposure and loss of detail. In practice, HDR images are difficult to obtain, which limits their usefulness in applications such as digital television, computational photography and game rendering. Research on HDR image reconstruction algorithms therefore has strong practical significance.
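For illustration, those luminance ratios can be expressed in photographic stops via a base-2 logarithm (function and variable names are ours, not the patent's):

```python
import math

def dynamic_range_stops(l_max, l_min):
    """Express a luminance ratio as photographic stops (log base 2)."""
    return math.log2(l_max / l_min)

# Orders of magnitude quoted in the text:
scene_stops = dynamic_range_stops(1e8, 1.0)    # real scene, ~10^8 : 1
sensor_stops = dynamic_range_stops(1e3, 1.0)   # common sensor, ~10^3 : 1
```

A real scene thus spans roughly 26-27 stops while a common sensor captures about 10, which is the gap the reconstruction method aims to bridge.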
Currently, HDR image reconstruction algorithms fall into two main directions: traditional image fusion and deep learning. Traditional fusion methods for acquiring HDR images comprise direct fusion, block-based fusion and hierarchical fusion. Direct fusion mainly follows the paper "Recovering High Dynamic Range Radiance Maps from Photographs", which observes that the brightness values and exposure times of differently exposed images are related to the illuminance at the corresponding pixel positions, establishes a camera response curve model from brightness and exposure time, solves the camera response function, and obtains the illuminance of the real scene by inverse operation. After the real-scene illuminance is obtained, multiple images are fused into a high dynamic range image, which is finally displayed on a common screen via tone mapping. This method is computationally complex, and the resulting high dynamic range image cannot be displayed directly on a common screen. Block-based fusion divides the image into blocks and, using information-entropy theory, selects the blocks with the most information for fusion. However, it handles block boundaries poorly, and the fused image is prone to obvious blocking artifacts.
The hierarchical fusion approach is represented by Exposure Fusion. That paper proposes Laplacian pyramid fusion: multiple multi-exposure images are decomposed by scale, three quality measures (contrast, saturation and well-exposedness) are combined into a weight map for each image, the weight maps are averaged into a composite pyramid coefficient, and finally the Laplacian pyramid is reconstructed to obtain the fused image. This is currently the most effective fusion method. It has one notable drawback, however: detail is seriously lost in very bright and very dark areas.
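The three Exposure Fusion quality measures described above can be sketched as follows; `naive_fusion` blends at a single scale, whereas the paper uses a Laplacian pyramid to avoid seams (function names are illustrative):

```python
import numpy as np

def fusion_weight(img, sigma=0.2):
    """Per-pixel Mertens-style weight for one LDR image in [0, 1].

    Combines the three quality measures named in the text:
    contrast (Laplacian magnitude), saturation (channel std-dev)
    and well-exposedness (closeness of each channel to 0.5).
    """
    gray = img.mean(axis=2)
    # Contrast: magnitude of a discrete Laplacian of the grayscale image.
    lap = np.abs(
        -4 * gray
        + np.roll(gray, 1, 0) + np.roll(gray, -1, 0)
        + np.roll(gray, 1, 1) + np.roll(gray, -1, 1)
    )
    saturation = img.std(axis=2)
    well_exposed = np.exp(-((img - 0.5) ** 2) / (2 * sigma ** 2)).prod(axis=2)
    return lap * saturation * well_exposed + 1e-12  # avoid all-zero weights

def naive_fusion(images):
    """Weighted per-pixel average of the exposure stack (the paper blends
    pyramid levels instead of this single-scale average to avoid seams)."""
    weights = np.stack([fusion_weight(im) for im in images])
    weights /= weights.sum(axis=0, keepdims=True)
    return (weights[..., None] * np.stack(images)).sum(axis=0)
```

Because the normalized weights form a convex combination at every pixel, the fused result stays within the input range; the blocking and halo issues discussed above arise only at the blending stage, which the pyramid variant addresses.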
Deep-learning-based methods for acquiring high dynamic range images mainly include the following. The paper "HDR image reconstruction from a single exposure using deep CNNs" proposes an autoencoder that takes a single LDR image as input, first down-sampling to extract features and then up-sampling to reconstruct the HDR image. The paper "Robust Patch-Based HDR Reconstruction of Dynamic Scenes" uses a group of differently exposed images comprising N overexposed images, N underexposed images and one normally exposed image. It first uses the camera response function to adjust the over- and underexposed images so that they have the same exposure as the normally exposed image, then uses MBDS (from "Summarizing Visual Data Using Bidirectional Similarity") to select, from the exposure-adjusted under- and overexposed images, the two images closest in content to the normally exposed image, and converts the normally exposed image and the two selected images to 10 bits for fusion, yielding the final high dynamic range image. The contrast of images produced by this algorithm is clearly improved and highlight and shadow details are effectively enhanced, but when there is slight motion across the frames or the camera shakes, the resulting high dynamic range image exhibits ghosting.
Disclosure of Invention
The invention overcomes the problem in the prior art that image processing quality needs improvement, and provides a dynamic scene HDR reconstruction method based on deep learning that yields rich detail and high definition.
The technical solution of the present invention is a dynamic scene HDR reconstruction method based on deep learning, comprising the following steps:
Step 1: in the same static scene, acquire three images (underexposed, normally exposed and overexposed) with identical detail and range using a tripod-mounted camera, denoted S1, S2 and S3; record the exposure time of each image, and fuse the images with a weighted fusion algorithm to obtain the Ground Truth, denoted T;
Step 2: in a dynamic scene, acquire three images (underexposed, normally exposed and overexposed) with a handheld camera, denoted D1, D2 and D3, and replace D2 with the image S2 obtained in step 1;
Step 3: register D1, S2 and D3 using the LK optical flow method; the registered image sequences are denoted R1, R2 and R3 and, together with the Ground Truth obtained in step 1, form a paired training set;
Step 4: transform R1, R2 and R3 into the linear domain using the camera response curve; the transformed images are denoted H1, H2 and H3;
Step 5: extract brightness information from images H1, H2 and H3 with a contrast operator; the resulting brightness maps are denoted M1, M2 and M3;
Step 6: extract detail information from images R1, R2 and R3 with a gradient operator; the resulting detail maps are denoted L1, L2 and L3;
Step 7: design an Attention module based on ResNet;
Step 8: construct an HDR reconstruction network based on U-Net and ResNet, and design a mixed-structure loss function;
Step 9: concatenate the channels of images R1, R2 and R3 obtained in step 3 with those of images H1, H2 and H3 obtained in step 4 as the input of the network of step 8; concatenate the channels of images M1, M2 and M3 obtained in step 5 with those of images L1, L2 and L3 obtained in step 6 as the input of the Attention module constructed in step 7; and train the network with the image T obtained in step 1 as the label;
Step 10: input the test images into the reconstruction network trained in step 9 to obtain an HDR image;
Step 11: tone-map the generated HDR image with the Reinhard tone mapping algorithm, and display the reconstructed image on an 8-bit display screen.
Preferably, the specific steps of step 1 are:
step 1-1: exposure adjustment was performed on the resulting images S1, S2, and S3, which are denoted as L1, L2, and L3 as shown in the following formula:
step 1-2: and (3) fusing the L1, the L2 and the L3 obtained in the step 1-1 according to a simple fusion algorithm to generate an HDR image as a group Truth, wherein a specific formula is as follows:
preferably, the specific steps of step 3 are:
step 3-1: exposure adjustment is carried out on the three images D1, S2 and D3 obtained in the step 2, the exposure amount of S2 is adjusted to be the same as the exposure amount of D1 by using the exposure response curve of the camera, which is recorded as D2-1, and the exposure response curve of the camera is Ev-f (Bv, Sv), wherein E isVThe exposure of the image is determined by the exposure F and the exposure time T of the camera, and the calculation method is shown as the following formula:the exposure of the camera is determined by the focal length f and the aperture diameter D of the camera, and the calculation method is shown as the following formula:Bvis the brightness value of the image, i.e. the pixel value, SvIs the ISO photosensitivity coefficient of the camera, which is a constant, where the value is 100;
step 3-2: detecting characteristic points in D1 and D2-1 by using a Harris corner detection method;
step 3-3: calculating an optical flow vector between D1 and D2-1 by using an LK optical flow method;
step 3-4: aligning D1 and D2-1 by using a bicubic interpolation method and the optical flow vector obtained in the step 3-3;
step 3-5: repeating the step 3-1, adjusting the exposure amount of S2 to be the same as D3 by using the camera response curve, and recording as D2-3;
step 3-6: repeat steps 3-2, 3-3 and 3-4 above to align D3 with D2-3.
Preferably, step 4 uses a gamma curve to transform the image from the non-linear domain to the linear domain, as shown in the following formula: f = x^γ, where γ = 2, x is the LDR image, and f is the HDR-domain image obtained after the transformation.
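The step-4 linearization is a one-liner; γ = 2 follows the value given in the text, though a calibrated per-device camera response curve would replace it in practice:

```python
import numpy as np

def to_linear(ldr, gamma=2.0):
    """Map a nonlinear LDR image in [0, 1] to the linear (HDR) domain.

    gamma = 2 is the constant stated in the text; a real camera response
    curve would be calibrated per device.
    """
    return np.clip(ldr, 0.0, 1.0) ** gamma
```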
Preferably, in the step 5, the luminance information of the images H1, H2 and H3 obtained in the step 4 is extracted by using a contrast operator, which is specifically expressed as follows:
preferably, the step 6 uses a gradient operator to extract detail information of the images R1, R2 and R3 obtained in the step 3, as shown in the following formula:
preferably, the specific steps of step 7 are:
step 7-1: combining the brightness characteristic maps M1, M2 and M3 obtained in the step 5 and the detail characteristic maps L1, L2 and L3 obtained in the step 6 as the input of the Attention module;
step 7-2: constructing an Attention module, and passing the input obtained in the step 7-1 through a Resnet module;
and 7-3: the output obtained in the step 7-2 passes through three layers of convolution layers, the size of a convolution kernel is 3 x 3, the convolution step size is 2, the used activation function is Relu, and the expression of the activation function is as follows: max (0, x);
and 7-4: and (4) outputting the characteristic diagram obtained in the step (7-3) as f _ A through a Sigmoid activation function, wherein the output characteristic diagram is specifically shown as the following formula:
preferably, the specific steps of step 8 are:
step 8-1: constructing an encoding network, namely a down-sampling network, wherein the network consists of four layers of convolution blocks, and the structure of each convolution block comprises a convolution layer, a batch normalization BN layer and an activation function Relu layer;
step 8-2: merging, namely concat, the images H1, H2 and H3 obtained in the step 4 and the corresponding channels of the images R1, R2 and R3 obtained in the step 3 respectively as the input of a coding network, merging the output channels of the three encoders after down-sampling of two layers of rolling blocks respectively after 3 groups of images, and recording the output characteristic image as f _ U;
step 8-3: and performing dot multiplication on the output characteristic diagram of the step 8-2 and the output characteristic diagram of the step 7-4, wherein the output characteristic diagram is recorded as F: f — a · F _ U;
step 8-4: adding the output characteristic diagram obtained in the step 8-3 and the output characteristic diagram obtained in the step 8-2, and recording the obtained characteristic diagram as F _ R, wherein F _ R is F + F _ u;
and 8-5: constructing a fusion network, wherein the network consists of a residual block, and the input is the output characteristic diagram obtained in the step 8-4;
and 8-6: constructing a decoding network, namely an up-sampling network, wherein the network consists of four layers of convolution blocks and is symmetrical to a coding network, each convolution block has a BN layer, a Relu layer and a deconvolution layer, and jump connection is established between corresponding layers with the same size as the image size of the coding network;
and 8-7: the loss function of the network consists of two parts, including MSE loss and VGG loss, as follows: MSE loss calculation is carried out on an image obtained by carrying out tone mapping on a HDR image generated by a networkAnd the mean square error between the groups Truth obtained in the step 1 is as follows:the perceptual loss function is a VGG-16 network pre-trained on the ImageNet dataset and is denoted as φ, and is calculated as follows:
and 8-8: and (4) training the network by using the network input and the label obtained in the step (9) to complete the reconstruction process of the HDR.
Preferably, step 11 performs tone mapping on the test image obtained in step 10 by using a Reinhard tone mapping algorithm.
Compared with the prior art, the deep-learning-based dynamic scene HDR reconstruction method has the following advantages. The method reconstructs a high dynamic range (HDR) image from low dynamic range (LDR) images containing small moving objects. To counter the ghosting and halo artifacts that occur when traditional fusion-based methods and existing deep learning methods process dynamic scenes, the images are first registered with an optical flow method; at the same time, brightness and detail information are extracted from the LDR images, a deep learning model based on U-Net and ResNet is constructed, and the extracted brightness and detail information assists in training the model, so that the HDR image generated after multi-exposure fusion contains rich detail and higher contrast. To address the loss of highlight and shadow detail in existing HDR reconstruction algorithms, a mixed-structure loss function is designed to ensure detail reconstruction, thereby fulfilling the aim of the invention.
The HDR reconstruction algorithm combines dynamic and static scenes to produce a usable data set and real HDR images without relying on special hardware; a CNN based on a U-ResNet framework is designed, and multiple LDR frames with different exposures are fused by deep learning to reconstruct an HDR image. An attention module is also designed: the detail and brightness information of the LDR images, extracted by traditional image algorithms, serves as the input of the attention module to assist in training the reconstruction network. The designed algorithm improves the detail and brightness of the image while expanding its dynamic range. The method can handle scenes with larger motion and images with more saturated regions, and the generated HDR images have rich detail, high contrast, a wide color gamut and a high dynamic range.
Drawings
FIG. 1 is a schematic diagram of the network architecture of the present invention;
FIG. 2 is a schematic diagram of a network structure of the Attention module in the present invention;
FIG. 3 is an LDR image with an exposure of-2 EV according to a first simulation test of the present invention;
FIG. 4 is an LDR image with an exposure of 0EV according to a first simulation test of the present invention;
FIG. 5 is an LDR image with an exposure of +2EV according to a first simulation test of the present invention;
FIG. 6 is a tone-mapped HDR image obtained with the Deep-HDR method in simulation test one of the present invention;
FIG. 7 is a tone-mapped HDR image obtained with the Expand-HDR method in simulation test one of the present invention;
FIG. 8 is a tone-mapped HDR image obtained with the Sen method in simulation test one of the present invention;
FIG. 9 is a tone-mapped HDR image obtained with the method of the present invention in simulation test one;
FIG. 10 is an image of the Ground Truth in simulation test one of the present invention;
FIG. 11 is an LDR image with an exposure of-2 EV according to a second simulation test of the present invention;
FIG. 12 is an LDR image with an exposure of 0EV according to a second simulation test of the present invention;
FIG. 13 is an LDR image with an exposure of +2EV according to a second simulation test of the present invention;
FIG. 14 is a tone-mapped HDR image obtained with the Deep-HDR method in simulation test two of the present invention;
FIG. 15 is a tone-mapped HDR image obtained with the Expand-HDR method in simulation test two of the present invention;
FIG. 16 is a tone-mapped HDR image obtained with the Sen method in simulation test two of the present invention;
FIG. 17 is a tone-mapped HDR image obtained with the method of the present invention in simulation test two;
FIG. 18 is an image of the Ground Truth in simulation test two of the present invention;
wherein Deep-HDR refers to the method proposed in the paper "Deep high dynamic range imaging with large foreground motions";
Expand-HDR refers to the method proposed in the paper "ExpandNet: a deep convolutional neural network for high dynamic range expansion from low dynamic range content", Computer Graphics Forum;
Sen refers to the method proposed in the paper "Robust patch-based HDR reconstruction of dynamic scenes";
Ours refers to the method set forth herein;
Ground Truth refers to the real image.
Detailed Description
The deep-learning-based dynamic scene HDR reconstruction method of the present invention is further described below with reference to the accompanying drawings and the detailed description. As shown in the figures, this embodiment comprises the following steps:
Step 1: in the same static scene, acquire three images (underexposed, normally exposed and overexposed) with identical detail and range using a tripod-mounted camera, denoted S1, S2 and S3; record the exposure time of each image, and fuse the images with a weighted fusion algorithm to obtain the Ground Truth, denoted T;
Step 2: in a dynamic scene, acquire three images (underexposed, normally exposed and overexposed) with a handheld camera, denoted D1, D2 and D3, and replace D2 with the image S2 obtained in step 1;
Step 3: register D1, S2 and D3 using the LK optical flow method; the registered image sequences are denoted R1, R2 and R3 and, together with the Ground Truth obtained in step 1, form a paired training set;
Step 4: transform R1, R2 and R3 into the linear domain using the camera response curve; the transformed images are denoted H1, H2 and H3;
Step 5: extract brightness information from images H1, H2 and H3 with a contrast operator; the resulting brightness maps are denoted M1, M2 and M3;
Step 6: extract detail information from images R1, R2 and R3 with a gradient operator; the resulting detail maps are denoted L1, L2 and L3;
Step 7: design an Attention module based on ResNet;
Step 8: construct an HDR reconstruction network based on U-Net and ResNet, and design a mixed-structure loss function;
Step 9: concatenate the channels of images R1, R2 and R3 obtained in step 3 with those of images H1, H2 and H3 obtained in step 4 as the input of the network of step 8; concatenate the channels of images M1, M2 and M3 obtained in step 5 with those of images L1, L2 and L3 obtained in step 6 as the input of the Attention module constructed in step 7; and train the network with the image T obtained in step 1 as the label;
Step 10: input the test images into the reconstruction network trained in step 9 to obtain an HDR image;
Step 11: tone-map the generated HDR image with the Reinhard tone mapping algorithm, and display the reconstructed image on an 8-bit display screen.
The specific steps of the step 1 are as follows:
step 1-1: exposure adjustment was performed on the resulting images S1, S2, and S3, which are denoted as L1, L2, and L3 as shown in the following formula:
step 1-2: and (3) fusing the L1, the L2 and the L3 obtained in the step 1-1 according to a simple fusion algorithm to generate an HDR image as a group Truth, wherein a specific formula is as follows:
the specific steps of the step 3 are as follows:
step 3-1: exposure adjustment is performed on the three images D1, S2, and D3 obtained in step 2, and a camera is usedThe exposure response curve of (a) adjusts the exposure amount of S2 to be the same as the exposure amount of D1, and is denoted as D2-1, and the exposure response curve of the camera is Ev-f (Bv, Sv), where E isVThe exposure of the image is determined by the exposure F and the exposure time T of the camera, and the calculation method is shown as the following formula:the exposure of the camera is determined by the focal length f and the aperture diameter D of the camera, and the calculation method is shown as the following formula:Bvis the brightness value of the image, i.e. the pixel value, SvIs the ISO photosensitivity coefficient of the camera, which is a constant, where the value is 100;
step 3-2: detecting characteristic points in D1 and D2-1 by using a Harris corner detection method;
step 3-3: calculating an optical flow vector between D1 and D2-1 by using an LK optical flow method;
step 3-4: aligning D1 and D2-1 by using a bicubic interpolation method and the optical flow vector obtained in the step 3-3;
step 3-5: repeating the step 3-1, adjusting the exposure amount of S2 to be the same as D3 by using the camera response curve, and recording as D2-3;
step 3-6: repeat steps 3-2, 3-3 and 3-4 above to align D3 with D2-3.
Step 4 uses a gamma curve to transform the image from the non-linear domain to the linear domain, as shown in the following formula: f = x^γ, where γ = 2, x is the LDR image, and f is the HDR-domain image obtained after the transformation.
In the step 5, the contrast operator is used to extract the brightness information of the images H1, H2, and H3 obtained in the step 4, which is specifically shown as follows:
the step 6 utilizes a gradient operator to extract detail information of the images R1, R2 and R3 obtained in the step 3, and the following formula is shown as follows:
the specific steps of the step 7 are as follows:
step 7-1: combining the brightness characteristic maps M1, M2 and M3 obtained in the step 5 and the detail characteristic maps L1, L2 and L3 obtained in the step 6 as the input of the Attention module;
step 7-2: constructing an Attention module, and passing the input obtained in the step 7-1 through a Resnet module;
and 7-3: the output obtained in the step 7-2 passes through three layers of convolution layers, the size of a convolution kernel is 3 x 3, the convolution step size is 2, the used activation function is Relu, and the expression of the activation function is as follows: max (0, x);
and 7-4: and (4) outputting the characteristic diagram obtained in the step (7-3) as f _ A through a Sigmoid activation function, wherein the output characteristic diagram is specifically shown as the following formula:
the specific steps of the step 8 are as follows:
step 8-1: constructing an encoding network, namely a down-sampling network, wherein the network consists of four layers of convolution blocks, and the structure of each convolution block comprises a convolution layer, a batch normalization BN layer and an activation function Relu layer;
step 8-2: merging, namely concat, the images H1, H2 and H3 obtained in the step 4 and the corresponding channels of the images R1, R2 and R3 obtained in the step 3 respectively as the input of a coding network, merging the output channels of the three encoders after down-sampling of two layers of rolling blocks respectively after 3 groups of images, and recording the output characteristic image as f _ U;
step 8-3: and performing dot multiplication on the output characteristic diagram of the step 8-2 and the output characteristic diagram of the step 7-4, wherein the output characteristic diagram is recorded as F: f — a · F _ U;
step 8-4: adding the output characteristic diagram obtained in the step 8-3 and the output characteristic diagram obtained in the step 8-2, and recording the obtained characteristic diagram as F _ R, wherein F _ R is F + F _ u;
step 8-5: construct the fusion network, which consists of residual blocks; its input is the feature map obtained in step 8-4;
step 8-6: construct the decoding (up-sampling) network, which consists of four convolution blocks and is symmetric to the encoding network; each convolution block contains a BN layer, a ReLU layer and a deconvolution layer, and skip connections are established to the encoding-network layers whose feature maps have the same spatial size;
step 8-7: the loss function of the network consists of two parts, an MSE loss and a VGG perceptual loss. Let H denote the image obtained by tone mapping the HDR image generated by the network and T the Ground Truth obtained in step 1. The MSE loss is the mean square error between them, L_MSE = (1/N) Σ_i (H_i − T_i)^2, where N is the number of pixels. The perceptual loss uses a VGG-16 network pre-trained on the ImageNet dataset, denoted φ, and is calculated as L_VGG = ‖φ(H) − φ(T)‖_2^2;
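The hybrid loss of step 8-7 can be sketched as follows. The text does not state which tone-mapping operator is applied inside the loss, so the μ-law compressor below (and the weight `lam`) are assumptions for illustration, and the VGG-16 feature extractor φ is stood in by a placeholder function:

```python
import numpy as np

MU = 5000.0  # assumed mu-law compression constant

def tonemap(h):
    # mu-law range compression of a linear HDR image into [0, 1]
    return np.log(1.0 + MU * h) / np.log(1.0 + MU)

def mse_loss(h_pred, t_gt):
    # L_MSE: mean square error between the tone-mapped network output
    # and the Ground Truth from step 1 (tone-mapped the same way here)
    return np.mean((tonemap(h_pred) - tonemap(t_gt)) ** 2)

def phi(img):
    # Placeholder for the ImageNet-pretrained VGG-16 feature extractor;
    # a real implementation would return deep feature maps.
    return img.ravel()

def vgg_loss(h_pred, t_gt):
    # L_VGG: squared L2 distance between the phi features
    d = phi(tonemap(h_pred)) - phi(tonemap(t_gt))
    return float(d @ d)

def total_loss(h_pred, t_gt, lam=0.01):  # lam is an assumed weight
    return mse_loss(h_pred, t_gt) + lam * vgg_loss(h_pred, t_gt)

h = np.full((4, 4), 0.5)
assert total_loss(h, h) == 0.0  # identical images give zero loss
```

In training, φ would be swapped for actual VGG-16 activations and the whole expression evaluated inside the deep-learning framework so gradients flow to the reconstruction network.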
step 8-8: train the network with the network inputs and labels obtained in step 9 to complete the HDR reconstruction process.
Step 11: perform tone mapping on the test image obtained in step 10 using the Reinhard tone mapping algorithm.
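The Reinhard operator referenced in step 11 can be sketched in its simple global form; the key value a = 0.18 is a conventional choice, not a parameter stated in the text:

```python
import numpy as np

def reinhard_tonemap(lum, a=0.18, eps=1e-6):
    """Global Reinhard operator: scale luminance by the key relative to
    the log-average, then compress with L_d = L / (1 + L)."""
    log_avg = np.exp(np.mean(np.log(lum + eps)))  # log-average luminance
    L = (a / log_avg) * lum                       # key-scaled luminance
    return L / (1.0 + L)                          # maps [0, inf) -> [0, 1)

hdr = np.array([0.01, 0.18, 1.0, 50.0])  # hypothetical HDR luminances
ldr = reinhard_tonemap(hdr)
assert (ldr >= 0.0).all() and (ldr < 1.0).all()
assert (np.diff(ldr) > 0).all()  # brightness ordering is preserved
```

The compressed result lies in [0, 1) and can be quantized directly to the 8-bit range of an ordinary display, which is the purpose of step 11.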
The effect of the present invention is further described below with a simulation experiment:
1. Simulation experiment conditions:
The hardware environment of the simulation is: Intel Core(TM) i5-4570 CPU @ 3.20 GHz x 8; GPU: NVIDIA GeForce GTX 1080 with 8 GB memory. Software environment: Ubuntu 16.04, Python 3.6. Experiment framework: TensorFlow.
2. Simulation content and result analysis
The invention selects the test set of a public HDR dataset as experimental samples and inputs them into the trained network. The inputs are three differently exposed LDR images of a dynamic scene, with exposures of -2 EV, 0 EV and +2 EV; the first group of images is shown in Figs. 3-5 and the second group in Figs. 11-13. The network outputs the tone-mapped HDR images:
The green boxes in the two sets of comparison figures show that our algorithm recovers highlight and shadow details better than the existing algorithms. As can be seen from the first set of comparison results in Figs. 6-10, the Deep-HDR [1], Expand-HDR [2] and Sen [3] methods all exhibit overflow in the marked highlight region, while our algorithm fully recovers the highlight details and achieves the same effect as the real image. The second set of comparison results, Figs. 14-18, shows that the existing algorithms recover the highlights incorrectly, especially the Sen method, while the details recovered by our method are closest to the real scene. Meanwhile, compared with the existing algorithms, the objective index PSNR is improved by 0.1 dB on average.
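The PSNR figure quoted above is the standard peak signal-to-noise ratio; a minimal sketch for 8-bit tone-mapped outputs:

```python
import numpy as np

def psnr(img, ref, peak=255.0):
    # Peak signal-to-noise ratio in dB: 10 * log10(peak^2 / MSE)
    mse = np.mean((img.astype(np.float64) - ref.astype(np.float64)) ** 2)
    if mse == 0:
        return float("inf")
    return 10.0 * np.log10(peak ** 2 / mse)

ref = np.zeros((8, 8))
noisy = ref + 16.0  # uniform error of 16 grey levels -> MSE = 256
assert abs(psnr(noisy, ref) - 10.0 * np.log10(255.0 ** 2 / 256.0)) < 1e-9
```

A 0.1 dB average gain therefore corresponds to a measurable but modest reduction in mean square error against the Ground Truth.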
References cited in the practice of the invention:
[1] Wu, S., Xu, J., Tai, Y.-W., & Tang, C.-K. (2018). Deep high dynamic range imaging with large foreground motions. ECCV 2018.
[2] Marnerides, D., Bashford-Rogers, T., Hatchett, J., & Debattista, K. (2018). ExpandNet: a deep convolutional neural network for high dynamic range expansion from low dynamic range content. Computer Graphics Forum, 37(2), 37-49.
[3] Sen, P., Kalantari, N.K., Yaesoubi, M., Darabi, S., & Shechtman, E. (2012). Robust patch-based HDR reconstruction of dynamic scenes. ACM Transactions on Graphics, 31(6).
Claims (9)
1. A dynamic scene HDR reconstruction method based on deep learning, characterized by comprising the following steps:
step 1: in the same static scene, capture three images (underexposed, normally exposed and overexposed) with identical details and range using a tripod-mounted camera, denote them S1, S2 and S3, and record the exposure time of each image; fuse the images with a weighted fusion algorithm to obtain the Ground Truth, denoted T;
step 2: in a dynamic scene, capture three images (underexposed, normally exposed and overexposed) with a handheld camera, denote them D1, D2 and D3, and replace D2 with the image S2 obtained in step 1;
step 3: register D1, S2 and D3 using the LK optical flow method, denote the registered image sequence R1, R2 and R3, and form a paired training set with the Ground Truth obtained in step 1;
step 4: transform R1, R2 and R3 into the linear domain using the camera response curve, and denote the transformed images H1, H2 and H3;
step 5: extract the brightness information of the H1, H2 and H3 images using a contrast operator, and denote the resulting brightness maps M1, M2 and M3;
step 6: extract the detail information of the R1, R2 and R3 images using a gradient operator, and denote the resulting detail maps L1, L2 and L3;
step 7: design an Attention module based on ResNet;
step 8: construct an HDR reconstruction network based on U-Net and ResNet, and design a hybrid loss function;
step 9: concatenate the channels of the images R1, R2 and R3 obtained in step 3 with those of the images H1, H2 and H3 obtained in step 4 as the input of step 8; concatenate the channels of the images M1, M2 and M3 obtained in step 5 with those of the images L1, L2 and L3 obtained in step 6 as the input of the Attention module constructed in step 7; and train the network with the image T obtained in step 1 as the label;
step 10: input the test images into the reconstruction network trained in step 9 to obtain an HDR image;
step 11: perform tone mapping on the generated HDR image using the Reinhard tone mapping algorithm, and display the reconstructed image on an 8-bit display screen.
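The weighted fusion of step 1 above is not specified further in the claim; a Mertens-style well-exposedness weighting is one plausible sketch, where the Gaussian weight centred at mid-grey 0.5 with sigma = 0.2 is an assumption for illustration:

```python
import numpy as np

def well_exposedness(img, sigma=0.2):
    # Assumed weighting: favour pixels close to mid-grey 0.5
    return np.exp(-((img - 0.5) ** 2) / (2.0 * sigma ** 2))

def weighted_fusion(images):
    """Fuse aligned LDR exposures (values in [0, 1]) into one image T."""
    imgs = np.stack(images)                          # (3, H, W)
    w = well_exposedness(imgs)
    w = w / (w.sum(axis=0, keepdims=True) + 1e-12)   # normalise per pixel
    return (w * imgs).sum(axis=0)

s1 = np.full((2, 2), 0.1)  # underexposed
s2 = np.full((2, 2), 0.5)  # normal exposure
s3 = np.full((2, 2), 0.9)  # overexposed
t = weighted_fusion([s1, s2, s3])
# The normally exposed image dominates, so T lies near 0.5
assert (np.abs(t - 0.5) < 1e-6).all()
```

In practice each well-exposed region of S1, S2 and S3 contributes most where the others are clipped, which is what makes the fused T usable as a Ground Truth label.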
2. The deep learning based dynamic scene HDR reconstruction method as claimed in claim 1, characterized in that the specific steps of step 1 are as follows:
step 1-1: perform exposure adjustment on the images S1, S2 and S3; the adjusted images are denoted L1, L2 and L3 as shown in the following formula:
3. The deep learning based dynamic scene HDR reconstruction method as claimed in claim 1, characterized in that the specific steps of step 3 are as follows:
step 3-1: perform exposure adjustment on the three images D1, S2 and D3 obtained in step 2; using the exposure response curve of the camera, E_v = f(B_v, S_v), adjust the exposure of S2 to be the same as that of D1, and denote the result D2-1. Here E_v is the exposure value of the image, determined by the aperture number F and the exposure time T of the camera as E_v = log2(F^2 / T); the aperture number F is determined by the focal length f and the aperture diameter D of the camera as F = f / D; B_v is the brightness value of the image, i.e. the pixel value; and S_v is the ISO photosensitivity coefficient of the camera, a constant set to 100 here;
step 3-2: detect feature points in D1 and D2-1 using the Harris corner detection method;
step 3-3: calculate the optical flow vectors between D1 and D2-1 using the LK optical flow method;
step 3-4: align D1 with D2-1 using bicubic interpolation and the optical flow vectors obtained in step 3-3;
step 3-5: repeat step 3-1, adjusting the exposure of S2 to be the same as that of D3 using the camera response curve, and denote the result D2-3;
step 3-6: repeat steps 3-2, 3-3 and 3-4 to align D3 with D2-3.
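One Lucas-Kanade step of the kind used in step 3-3 can be sketched as follows. This toy version recovers a single constant translation for a whole patch from the brightness-constancy equation Ix·u + Iy·v + It = 0, whereas a real implementation (e.g. a pyramidal LK over the Harris corners of step 3-2) estimates a flow vector per feature point:

```python
import numpy as np

def lk_flow(prev, curr):
    """One Lucas-Kanade least-squares step for a single patch.

    Solves A v = b in the least-squares sense, where the rows of A are
    the spatial gradients (Ix, Iy) and b = -It, giving one (u, v)."""
    Ix = np.gradient(prev, axis=1)   # horizontal image gradient
    Iy = np.gradient(prev, axis=0)   # vertical image gradient
    It = curr - prev                 # temporal gradient
    A = np.stack([Ix.ravel(), Iy.ravel()], axis=1)
    b = -It.ravel()
    v, *_ = np.linalg.lstsq(A, b, rcond=None)
    return v  # (u, v) displacement in pixels

# Synthetic check: a smooth horizontal ramp shifted right by one pixel
x = np.arange(16, dtype=float)
prev = np.tile(x, (16, 1))
curr = np.tile(x - 1.0, (16, 1))  # same content moved +1 px in x
u, v = lk_flow(prev, curr)
assert abs(u - 1.0) < 1e-6 and abs(v) < 1e-6
```

The recovered flow then drives the bicubic warping of step 3-4 so that D1 is resampled into alignment with D2-1.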
4. The deep learning based dynamic scene HDR reconstruction method as claimed in claim 1, characterized in that step 4 converts the images from the nonlinear domain to the linear domain using a gamma curve, as shown in the following formula: f = x^γ, where γ = 2, x is the LDR image, and f is the HDR-domain image obtained after the transformation.
7. The deep learning based dynamic scene HDR reconstruction method as claimed in claim 1, characterized in that the specific steps of step 7 are as follows:
step 7-1: concatenate the brightness feature maps M1, M2 and M3 obtained in step 5 with the detail feature maps L1, L2 and L3 obtained in step 6 as the input of the Attention module;
step 7-2: construct the Attention module and pass the input obtained in step 7-1 through a ResNet block;
step 7-3: pass the output obtained in step 7-2 through three convolution layers with kernel size 3 x 3 and convolution stride 2; the activation function used is ReLU, whose expression is f(x) = max(0, x);
8. The deep learning based dynamic scene HDR reconstruction method as claimed in claim 1, characterized in that the specific steps of step 8 are as follows:
step 8-1: construct the encoding (down-sampling) network, which consists of four convolution blocks; each convolution block contains a convolution layer, a batch normalization (BN) layer and a ReLU activation layer;
step 8-2: concatenate (concat) the images H1, H2 and H3 obtained in step 4 with the corresponding images R1, R2 and R3 obtained in step 3 along the channel dimension as the input of the encoding network; after each of the 3 image groups is down-sampled by two convolution blocks, concatenate the output channels of the three encoders, and denote the output feature map as f_U;
step 8-3: multiply the output feature map of step 8-2 element-wise by the output feature map of step 7-4, and denote the result as F: F = f_A · f_U;
step 8-4: add the feature map obtained in step 8-3 to the output feature map of step 8-2, and denote the result as F_R: F_R = F + f_U;
step 8-5: construct the fusion network, which consists of residual blocks; its input is the feature map obtained in step 8-4;
step 8-6: construct the decoding (up-sampling) network, which consists of four convolution blocks and is symmetric to the encoding network; each convolution block contains a BN layer, a ReLU layer and a deconvolution layer, and skip connections are established to the encoding-network layers whose feature maps have the same spatial size;
step 8-7: the loss function of the network consists of two parts, an MSE loss and a VGG perceptual loss: letting H denote the image obtained by tone mapping the HDR image generated by the network and T the Ground Truth obtained in step 1, the MSE loss is the mean square error between them, L_MSE = (1/N) Σ_i (H_i − T_i)^2; the perceptual loss uses a VGG-16 network pre-trained on the ImageNet dataset, denoted φ, and is calculated as L_VGG = ‖φ(H) − φ(T)‖_2^2;
step 8-8: train the network with the network inputs and labels obtained in step 9 to complete the HDR reconstruction process.
9. The deep learning based dynamic scene HDR reconstruction method as claimed in claim 1, characterized in that step 11 performs tone mapping on the test image obtained in step 10 using the Reinhard tone mapping algorithm.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010026179.3A CN111242883B (en) | 2020-01-10 | 2020-01-10 | Dynamic scene HDR reconstruction method based on deep learning |
Publications (2)
Publication Number | Publication Date |
---|---|
CN111242883A true CN111242883A (en) | 2020-06-05 |
CN111242883B CN111242883B (en) | 2023-03-28 |
Family
ID=70872293
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202010026179.3A Active CN111242883B (en) | 2020-01-10 | 2020-01-10 | Dynamic scene HDR reconstruction method based on deep learning |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN111242883B (en) |
Cited By (11)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111709896A (en) * | 2020-06-18 | 2020-09-25 | 三星电子(中国)研发中心 | Method and equipment for mapping LDR video into HDR video |
CN111986134A (en) * | 2020-08-26 | 2020-11-24 | 中国空间技术研究院 | Remote sensing imaging method and device for area-array camera |
CN112435306A (en) * | 2020-11-20 | 2021-03-02 | 上海北昂医药科技股份有限公司 | G banding chromosome HDR image reconstruction method |
CN113132655A (en) * | 2021-03-09 | 2021-07-16 | 浙江工业大学 | HDR video synthesis method based on deep learning |
CN113379698A (en) * | 2021-06-08 | 2021-09-10 | 武汉大学 | Illumination estimation method based on step-by-step joint supervision |
CN113971639A (en) * | 2021-08-27 | 2022-01-25 | 天津大学 | Depth estimation based under-exposed LDR image reconstruction HDR image |
CN114189633A (en) * | 2021-12-22 | 2022-03-15 | 北京紫光展锐通信技术有限公司 | HDR image imaging method and device and electronic equipment |
WO2023178610A1 (en) * | 2022-03-24 | 2023-09-28 | 京东方科技集团股份有限公司 | Image processing method, computing system, device and readable storage medium |
WO2023246392A1 (en) * | 2022-06-22 | 2023-12-28 | 京东方科技集团股份有限公司 | Image acquisition method, apparatus and device, and non-transient computer storage medium |
CN117745603A (en) * | 2024-02-20 | 2024-03-22 | 湖南科洛德科技有限公司 | Product image correction method and device based on linear array scanning device and storage medium |
CN117876282A (en) * | 2024-03-08 | 2024-04-12 | 昆明理工大学 | High dynamic range imaging method based on multi-task interaction promotion |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20120201456A1 (en) * | 2009-10-08 | 2012-08-09 | International Business Machines Corporation | Transforming a digital image from a low dynamic range (ldr) image to a high dynamic range (hdr) image |
CN108805836A (en) * | 2018-05-31 | 2018-11-13 | 大连理工大学 | Method for correcting image based on the reciprocating HDR transformation of depth |
US20190096046A1 (en) * | 2017-09-25 | 2019-03-28 | The Regents Of The University Of California | Generation of high dynamic range visual media |
Non-Patent Citations (2)
Title |
---|
Zhang Shufang et al., "High dynamic range image generation method using principal component analysis and gradient pyramid", Journal of Xi'an Jiaotong University * |
Du Lin et al., "Research on high dynamic range image fusion algorithms for dynamic targets", Acta Optica Sinica * |
Also Published As
Publication number | Publication date |
---|---|
CN111242883B (en) | 2023-03-28 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN111242883B (en) | Dynamic scene HDR reconstruction method based on deep learning | |
Fan et al. | Integrating semantic segmentation and retinex model for low-light image enhancement | |
Lee et al. | Deep chain hdri: Reconstructing a high dynamic range image from a single low dynamic range image | |
Cai et al. | Learning a deep single image contrast enhancer from multi-exposure images | |
Zhou et al. | Cross-view enhancement network for underwater images | |
Pan et al. | Multi-exposure high dynamic range imaging with informative content enhanced network | |
CN113592726A (en) | High dynamic range imaging method, device, electronic equipment and storage medium | |
Rasheed et al. | LSR: Lightening super-resolution deep network for low-light image enhancement | |
Lv et al. | Low-light image enhancement via deep Retinex decomposition and bilateral learning | |
Yin et al. | Two exposure fusion using prior-aware generative adversarial network | |
Chen et al. | End-to-end single image enhancement based on a dual network cascade model | |
Zhang et al. | Multi-branch and progressive network for low-light image enhancement | |
CN115035011A (en) | Low-illumination image enhancement method for self-adaptive RetinexNet under fusion strategy | |
Cao et al. | A brightness-adaptive kernel prediction network for inverse tone mapping | |
Tan et al. | High dynamic range imaging for dynamic scenes with large-scale motions and severe saturation | |
Chen et al. | Improving dynamic hdr imaging with fusion transformer | |
CN117237207A (en) | Ghost-free high dynamic range light field imaging method for dynamic scene | |
Tian et al. | Deformable convolutional network constrained by contrastive learning for underwater image enhancement | |
Ye et al. | Single exposure high dynamic range image reconstruction based on deep dual-branch network | |
Hu et al. | High dynamic range imaging with short-and long-exposures based on artificial remapping using multiscale exposure fusion | |
Van Vo et al. | High dynamic range video synthesis using superpixel-based illuminance-invariant motion estimation | |
Ma et al. | Image Dehazing Based on Improved Color Channel Transfer and Multiexposure Fusion | |
Singh et al. | Variational approach for intensity domain multi-exposure image fusion | |
Kinoshita et al. | Deep inverse tone mapping using LDR based learning for estimating HDR images with absolute luminance | |
Yang et al. | Multi-scale extreme exposure images fusion based on deep learning |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||