CN113971639A

CN113971639A - HDR image reconstruction from under-exposed LDR images based on depth estimation

Info

Publication number: CN113971639A
Application number: CN202110993297.6A
Authority: CN
Inventors: 张涛; 梁杰; 王昊
Original assignee: Tianjin University
Current assignee: Tianjin University
Priority date: 2021-08-27
Filing date: 2021-08-27
Publication date: 2022-01-25

Abstract

Digital cameras can capture the brightness of a real scene only within a limited range. Texture details present in high dynamic range (HDR) data are lost to quantization and saturation of the camera sensor, which creates great difficulties for subsequent image-based applications such as depth estimation. Depth estimation errors caused by incorrect tones in low dynamic range (LDR) images can serve as an indicator of HDR reconstruction quality and can therefore help improve the learning efficiency of HDR image generation networks. We therefore propose an end-to-end framework consisting of two connected CNNs that addresses the most critical issues in HDR image reconstruction. Extensive quantitative and qualitative experiments on benchmark datasets and on recently released challenging datasets demonstrate that the proposed method achieves superior performance compared to the latest single-image HDR reconstruction and depth estimation algorithms.

Description

HDR image reconstruction from under-exposed LDR images based on depth estimation

Technical Field

The invention relates to HDR image reconstruction technology. Depth estimation is introduced into the image reconstruction model, and the whole network, trained in an unsupervised manner, outperforms other HDR image reconstruction methods.

Background

Digital cameras can capture the brightness of a real scene only within a limited range. Because of quantization and saturation of the camera sensor, texture details present in high dynamic range (HDR) data are lost, which creates great difficulties for subsequent image-based applications such as depth estimation. Unlike existing learning-based methods, the proposed method introduces depth estimation into the learning model for HDR image reconstruction so that the two tasks facilitate each other. On the one hand, depth estimation errors caused by incorrect tones in low dynamic range (LDR) images can serve as an indicator of HDR reconstruction quality and can therefore help improve the learning efficiency of the HDR image reconstruction network. On the other hand, a good HDR image can also improve depth estimation accuracy. We therefore propose an end-to-end framework consisting of two connected CNNs, one for HDR image reconstruction and one for depth estimation. Since under-exposed images are the most critical issue in HDR image reconstruction, this work mainly studies the reconstruction of HDR images from under-exposed LDR images. Extensive quantitative and qualitative experiments on benchmark datasets and on recently released challenging datasets demonstrate that the proposed method achieves superior performance compared to the latest single-image HDR reconstruction and depth estimation algorithms.

HDR image reconstruction techniques are currently in wide use. HDR images generally contain more texture information and give viewers a better visual experience, so they are widely used in photography, physics-based rendering, computer games, movies, medical and industrial imaging, and similar fields. Because digital HDR cameras are typically very expensive, HDR image reconstruction algorithms that generate HDR images from images captured by conventional digital cameras have become a common approach to HDR imaging. Because HDR imaging preserves details in both extremely dark and extremely bright regions, it has great potential to aid various tasks, in particular depth estimation, which relies on accurate color and intensity information to explore the spatial relationships between pixels and yields much smaller depth errors on normally exposed LDR images. Under-exposed LDR images are usually noisy and lose illumination information, which hinders the extraction of effective features. In contrast, HDR images retain rich texture and color information. Intuitively, humans cannot accurately estimate the depth of a scene in a dark environment either. Inspired by how the depth estimation performance of both computer vision and human vision varies with illumination, HDR technology can be used to address the lack of depth information in under-exposed regions: the lost depth information is recovered and the depth estimation error is reduced, which in turn benefits the reconstruction of the HDR image.

Inverse tone mapping can recover information lost to under-exposure, over-exposure, and color quantization. The most common approaches generate an HDR image either by fusing a series of low dynamic range (LDR) images captured by a conventional camera at different exposure times or by directly learning a mapping from a single LDR image to an HDR image. Although the HDR images reconstructed by these methods can meet visual requirements, in many applications they do not give satisfactory results.

Disclosure of Invention

The present invention proposes an efficient method for reconstructing HDR images and the corresponding depth maps from under-exposed LDR images. First, HDR reconstruction is combined with depth estimation to solve the HDR reconstruction problem. Second, whereas most other methods take a normally exposed image as input, the over-exposure model network here takes an under-exposed LDR image as input and generates LDR images of different exposures. Third, unlike most methods for reconstructing HDR images, the learning framework does not need HDR images as ground truth; it trains the HDR reconstruction network with the depth loss as a constraint.

Detailed Description

The present invention combines knowledge of HDR reconstruction and depth estimation to design a model. The core idea of the invention is to integrate depth estimation into the HDR image reconstruction model and to develop two connected deep neural networks, one for each task. The specific steps are as follows:

The first step is the HDR reconstruction network.

Directly inferring a 32-bit HDR image from an 8-bit LDR image is difficult to train with unsupervised learning. The HDR reconstruction network therefore does not map the LDR image directly to an HDR image; it takes an indirect route, generating LDR images of different exposures that are then merged into an HDR image. Specifically, given an under-exposed input LDR image, the network infers LDR images with higher exposure than the input, outputs the over-exposed images, and finally generates the HDR image from this set of LDR images.
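The merging stage described above can be illustrated with a minimal sketch. The patent does not state how the generated exposures are merged, so the code below assumes a linear camera response, known relative exposure times, and a simple hat-shaped weighting; the function name `merge_exposures` and all of its parameters are hypothetical and chosen only for illustration.

```python
import numpy as np

def merge_exposures(ldr_stack, exposure_times, eps=1e-6):
    """Merge LDR images (values in [0, 1]) into one relative-radiance HDR image.

    Assumption (not from the patent): the camera response is linear, so a
    pixel value divided by its exposure time estimates scene radiance.
    """
    ldr_stack = np.asarray(ldr_stack, dtype=np.float64)
    times = np.asarray(exposure_times, dtype=np.float64)
    # Hat weighting: trust mid-tones, distrust near-black and saturated pixels.
    weights = 1.0 - np.abs(2.0 * ldr_stack - 1.0)
    # Reshape the times so they broadcast over the image dimensions.
    times = times.reshape((-1,) + (1,) * (ldr_stack.ndim - 1))
    radiance = ldr_stack / times
    # Weighted average of the per-exposure radiance estimates.
    return (weights * radiance).sum(axis=0) / (weights.sum(axis=0) + eps)

# Usage: simulate three exposures of a small scene and merge them back.
scene = np.array([[0.05, 0.10], [0.20, 0.24]])
times = [1.0, 2.0, 4.0]
stack = [np.clip(scene * t, 0.0, 1.0) for t in times]
hdr = merge_exposures(stack, times)
```

In this sketch a saturated pixel receives zero weight, so its radiance is taken from the shorter exposures, which is the reason for generating several exposures before merging.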

The second step is a depth estimation network.

The HDR image is then used as the input of the depth estimation network, which estimates the depth of the scene. The framework provided by the invention adopts a CNN-based unsupervised depth estimation network that learns to predict a depth map from an input image. Specifically, a pair of stereo HDR images is fed into the same CNN model, which outputs the corresponding disparity maps; reconstructed left and right HDR images are then obtained through a sampler, so that the predicted disparity can be aligned with the opposite input image.
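The sampler step can be sketched as a horizontal warp driven by a disparity map. This is only an illustration under stated assumptions: the images are rectified, so the disparity is purely horizontal, and the sign convention (sampling the source view at `x - disparity`) is assumed rather than taken from the patent. With integer row coordinates, bilinear sampling reduces to linear interpolation between the two nearest pixels in a row. The name `warp_horizontal` is hypothetical.

```python
import numpy as np

def warp_horizontal(src, disparity):
    """Reconstruct a view by sampling `src` at x - disparity along each row.

    src: (H, W) or (H, W, C) image; disparity: (H, W) array in pixels.
    Assumption: rectified stereo, so only horizontal shifts are needed.
    """
    src = np.asarray(src, dtype=np.float64)
    disparity = np.asarray(disparity, dtype=np.float64)
    h, w = disparity.shape
    # Sampling positions, clamped to the image border.
    xs = np.clip(np.arange(w, dtype=np.float64)[None, :] - disparity, 0.0, w - 1.0)
    x0 = np.floor(xs).astype(int)       # nearest pixel on the left
    x1 = np.minimum(x0 + 1, w - 1)      # nearest pixel on the right
    frac = xs - x0                      # interpolation weight of x1
    rows = np.arange(h)[:, None]
    if src.ndim == 3:
        frac = frac[..., None]          # broadcast over color channels
    return (1.0 - frac) * src[rows, x0] + frac * src[rows, x1]

# Usage: shift a horizontal ramp by 1.5 pixels.
right = np.tile(np.arange(5, dtype=np.float64), (2, 1))
left_reconstructed = warp_horizontal(right, np.full((2, 5), 1.5))
```

Because the interpolation is piecewise linear in the disparity, the warp is differentiable almost everywhere, which is what allows a reconstruction error to train the disparity prediction without depth labels.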

The CNN first uses an encoder structure to extract geometric features, and a decoder structure then further constrains the output disparity. Bilinear sampling interpolates linearly between the nearest pixels around each sampling position to form a warped image, ultimately generating a reconstructed HDR image. The reconstructed HDR image, the warped image, and the predicted disparity map are related to the original input image I as follows:

[Formula not reproduced in the source text.]

Claims

1. A method that combines knowledge of HDR reconstruction and depth estimation to design a model, the core idea of which is to integrate depth estimation into an HDR image reconstruction model and to develop two connected deep neural networks, one for each task, the method being characterized by comprising the following steps:

1) since directly inferring a 32-bit HDR image from an 8-bit LDR image is difficult to train with unsupervised learning, the invention proposes an efficient method for generating an HDR image and a corresponding accurate depth map from an under-exposed LDR image;

2) the method jointly trains an HDR reconstruction network and a depth estimation network with an unsupervised end-to-end training strategy, integrating depth estimation into the HDR image reconstruction model and developing two connected deep networks, one for each task;

3) residual connections are added between the convolution layers and the deconvolution layers to pass information between images of different exposures; the feature maps contain a large amount of image detail and can be shared between the different-exposure images, which helps the deconvolution layers recover better images; in addition, the residual connections back-propagate gradients to the lower layers, which makes the network easier to train;

4) the influence of the depth loss function on the HDR image and the depth map is examined by means of an ablation study.

2. The method of generating an HDR image of claim 1, wherein step 1) comprises:

(1) using the under-exposed image as input, inferring LDR images with higher exposure than the input image, wherein the information recorded by each LDR image corresponds to the content of a different dynamic range in the real scene;

(2) outputting the over-exposed images and finally generating the HDR image from these LDR images.

3. The method of claim 1, wherein step 2) comprises:

(1) first, given an input LDR image, recovering the detail information lost to under-exposure by using an HDR image generation network, wherein the HDR image generation network consists of convolution layers and deconvolution layers and learns representations of different exposures; the convolution layers serve as a feature extractor and retain the main components of the objects in the image, and the deconvolution layers are then combined to recover the details of the image;

(2) taking the HDR image as the input of a depth estimation network and performing depth estimation on the scene, wherein the whole network is learned in an unsupervised manner, the learning process is constrained by the relationship between the HDR image and the LDR images, and training uses a loss on the consistency between the LDR images derived from the HDR image and the LDR images generated at different exposures, until both the HDR image and the LDR images are generated well;

(3) feeding a pair of stereo HDR images into the same network model, outputting the corresponding disparity maps, then obtaining reconstructed left and right HDR images through a sampler, and aligning the predicted disparity with the opposite input image;

(4) the depth estimation network uses an encoder to extract geometric features, and the decoder structure further constrains the output disparity; bilinear sampling interpolates linearly between the nearest pixels around each sampling position, and a reconstructed image is finally generated;

(5) in order to make the reconstructed image consistent with the original image, an appearance matching loss, a disparity smoothness loss, and a left-right disparity consistency loss are defined.
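The three loss terms named in item (5) can be sketched as follows. The text does not give their formulas, so the forms below are assumptions: a plain L1 appearance term, an edge-aware smoothness term, and a left-right consistency term that uses nearest-pixel lookup. The function names, the grayscale (H, W) image layout, and the weights in `total_loss` are hypothetical.

```python
import numpy as np

def appearance_loss(image, reconstruction):
    """Mean absolute difference between an image and its reconstruction."""
    return float(np.mean(np.abs(image - reconstruction)))

def smoothness_loss(disparity, image):
    """Penalize disparity gradients, down-weighted where the image has edges."""
    d_dx = np.abs(np.diff(disparity, axis=1))
    d_dy = np.abs(np.diff(disparity, axis=0))
    i_dx = np.abs(np.diff(image, axis=1))
    i_dy = np.abs(np.diff(image, axis=0))
    return float(np.mean(d_dx * np.exp(-i_dx)) + np.mean(d_dy * np.exp(-i_dy)))

def lr_consistency_loss(disp_left, disp_right):
    """Compare the left disparity with the right disparity seen from the left view."""
    h, w = disp_left.shape
    # Nearest-pixel lookup at x - d_left (assumed sign convention).
    xs = np.clip(np.rint(np.arange(w)[None, :] - disp_left), 0, w - 1).astype(int)
    projected = disp_right[np.arange(h)[:, None], xs]
    return float(np.mean(np.abs(disp_left - projected)))

def total_loss(image, reconstruction, disp_left, disp_right,
               weights=(1.0, 0.1, 1.0)):
    """Weighted sum of the three terms; the weights are illustrative only."""
    w_ap, w_sm, w_lr = weights
    return (w_ap * appearance_loss(image, reconstruction)
            + w_sm * smoothness_loss(disp_left, image)
            + w_lr * lr_consistency_loss(disp_left, disp_right))

# Usage: a perfect reconstruction with constant, consistent disparities.
img = np.tile(np.linspace(0.0, 1.0, 6), (4, 1))
disp = np.full((4, 6), 2.0)
loss = total_loss(img, img, disp, disp)
```

A weighted sum keeps the terms independent, so the contribution of each can be switched off to study its effect, in the spirit of the ablation study mentioned in claim 1.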