CN113973175A - Rapid HDR video reconstruction method - Google Patents
- Publication number: CN113973175A
- Application number: CN202110993299.5A
- Authority
- CN
- China
- Prior art keywords
- hdr
- foreground
- background
- video
- frame
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N23/00—Cameras or camera modules comprising electronic image sensors; Control thereof
- H04N23/95—Computational photography systems, e.g. light-field imaging systems
- H04N23/951—Computational photography systems, e.g. light-field imaging systems by using two or more images to influence resolution, frame rate or aspect ratio
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- Computing Systems (AREA)
- General Health & Medical Sciences (AREA)
- General Engineering & Computer Science (AREA)
- Biophysics (AREA)
- Computational Linguistics (AREA)
- Data Mining & Analysis (AREA)
- Evolutionary Computation (AREA)
- Artificial Intelligence (AREA)
- Molecular Biology (AREA)
- Life Sciences & Earth Sciences (AREA)
- Biomedical Technology (AREA)
- General Physics & Mathematics (AREA)
- Mathematical Physics (AREA)
- Software Systems (AREA)
- Health & Medical Sciences (AREA)
- Multimedia (AREA)
- Signal Processing (AREA)
- Image Analysis (AREA)
Abstract
High-quality video is important in SLAM and in video surveillance, where it allows objects or activities of interest to be monitored clearly and improves SLAM positioning accuracy. Converting low-quality video to high-quality video is increasingly central to robotic tasks, and High Dynamic Range (HDR) imaging, which produces high-quality images, is finding ever wider application. The pixels of HDR images and videos represent a larger range of colors and luminances than those of conventional Low Dynamic Range (LDR) images and videos. HDR imaging is a photographic technique that captures a greater dynamic luminance range than a standard digital camera. Its goal is to reproduce a range of real-world luminances similar to that perceived by the human visual system, yielding a more realistic and engaging experience for the observer.
Description
1. Technical Field
The invention provides a method that adopts a divide-and-conquer strategy to accelerate the reconstruction of HDR (High Dynamic Range) video from LDR (Low Dynamic Range) video, and belongs to the field of HDR video reconstruction.
2. Background Art
High-quality video is important in SLAM and in video surveillance, where it allows objects or activities of interest to be monitored clearly and improves SLAM positioning accuracy. Converting low-quality video to high-quality video is increasingly central to robotic tasks, and High Dynamic Range (HDR) imaging, which produces high-quality images, is finding ever wider application. The pixels of HDR images and videos represent a larger range of colors and luminances than those of conventional Low Dynamic Range (LDR) images and videos. The brightness of a real scene may range from bright sunlight to deep shadow. Standard digital cameras typically use 8 bits per color channel, and a main disadvantage of this representation is that it cannot capture the full luminance range of a real scene. HDR imaging is a photographic technique that captures a greater dynamic luminance range than a standard digital camera. Its goal is to reproduce a range of real-world luminances similar to that perceived by the human visual system, yielding a more realistic and engaging experience for the observer.
To achieve HDR reconstruction from LDR images, inverse tone mapping algorithms have great potential: they convert ordinary LDR images into HDR images by recovering the missing signal in a given image, but they work only for certain types of scene. In recent years, with the rapid development of deep learning, neural-network-based high dynamic range reconstruction methods have been proposed. Endo et al., Lee et al., and Eilertsen et al. successfully restored the lost dynamic range using deep neural networks.
An HDR video frame reconstruction method based on a divide-and-conquer strategy is provided for static (surveillance) cameras observing fast-moving objects. The divide-and-conquer strategy has been applied in other fields and research topics, but it has not previously been discussed for the HDR video reconstruction problem. We use this efficient strategy to exploit temporal information in video, which had not been explored effectively before. Specifically, we first use an object detection method to separate the foreground from the whole image, and then train dedicated background and foreground networks connected by a context-aware constraint. Once both are trained, the framework combines the background and foreground into an HDR video frame. To overcome the problem of inconsistent foreground and background color tones during synthesis, a context-aware loss constraint is designed to provide background context information for foreground HDR training. Experimental results show that the method reconstructs high-quality HDR video frames in less time.
3. Summary of the Invention
For a surveillance camera observing moving objects, the invention provides a divide-and-conquer method that accelerates HDR video frame reconstruction. The framework consists of two connected CNN branches that together model the whole HDR video reconstruction process. An object detection algorithm separates the foreground and background of each video frame; the two CNN branches are then trained to reconstruct the background and foreground frames and are connected to each other. The enhanced HDR foreground and background are synthesized to reconstruct the final HDR video frame. A context-aware loss constraint under an end-to-end framework is proposed to eliminate foreground/background color inconsistency during synthesis. The proposed method is verified on a benchmark dataset; the results show that it reconstructs HDR video frames accurately and greatly shortens HDR video reconstruction time while preserving the HDR quality of each frame.
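The patent does not give the exact form of the context-aware loss. The sketch below shows one plausible form, under the assumption that it penalises the colour-statistics mismatch between a reconstructed foreground patch and the background region surrounding its bounding box; the function name, box format `(x0, y0, x1, y1)`, and band width are all illustrative, not the patent's formula:

```python
import numpy as np

def context_aware_loss(fg_patch, hdr_bg, box, band=4):
    """Illustrative context-aware constraint (assumed form, not the
    patent's exact formula): penalise the per-channel mean-colour
    mismatch between a reconstructed foreground HDR patch and the
    background window surrounding (and including) its bounding box."""
    x0, y0, x1, y1 = box
    # Background context window: the box expanded by `band` pixels,
    # clipped at the image border.
    ctx = hdr_bg[max(y0 - band, 0):y1 + band, max(x0 - band, 0):x1 + band]
    # Squared difference of per-channel mean colour.
    return float(np.mean((fg_patch.mean(axis=(0, 1)) - ctx.mean(axis=(0, 1))) ** 2))
```

In training, such a term would be added to each branch's reconstruction loss so that the foreground branch sees background context, which is consistent with the colour-consistency goal stated above.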
Table 1. Quantitative comparison of different methods
5. Detailed Description of the Invention
1. Target video scene preparation. The invention provides a fast method for reconstructing HDR video frames from LDR video frames. For a video sequence, a fast object detection algorithm first separates the background and foreground; the background network and foreground network are then trained to generate background HDR frames and foreground HDR frames, which are synthesized to reconstruct the final HDR video frame.
2. Background/foreground separation. The trained detection model extracts a bounding box for each frame in the video sequence, obtains the position coordinates inside the box, and segments the foreground from the whole scene. If no object is detected, the background HDR image is output directly as the current HDR frame, which is the simplest and fastest processing path.
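The separation step above can be sketched as follows; the `(x0, y0, x1, y1)` box format is an assumption about the detector's output, and an empty box list corresponds to the fast "no object detected" path:

```python
import numpy as np

def separate(frame, boxes):
    """Split an LDR frame into foreground crops and a background copy,
    given detector bounding boxes (x0, y0, x1, y1). If `boxes` is
    empty, no foreground work is needed and the cached background HDR
    frame can be emitted directly."""
    foregrounds = [frame[y0:y1, x0:x1].copy() for (x0, y0, x1, y1) in boxes]
    return foregrounds, frame.copy()
```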
3. Background/foreground reconstruction. After the foreground is separated from the scene with the YOLOv5 algorithm, the background and foreground are trained separately with the network. Since the target scenario is a surveillance/static camera with moving objects, the background is reconstructed only once per scene, while the number of foreground reconstructions is determined by the number of frames in the scene.
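A minimal sketch of this per-scene schedule, with `detect`, `bg_net`, and `fg_net` as hypothetical stand-ins for the YOLOv5 detector and the two trained CNN branches:

```python
import numpy as np

def reconstruct_scene(frames, detect, bg_net, fg_net):
    """Run the background branch once per scene and the foreground
    branch once per detected box per frame, returning (hdr_bg, crops)
    pairs to be composited into final HDR frames."""
    hdr_bg = bg_net(frames[0])  # background reconstructed only once per scene
    results = []
    for frame in frames:
        crops = [(box, fg_net(frame[box[1]:box[3], box[0]:box[2]]))
                 for box in detect(frame)]
        results.append((hdr_bg, crops))
    return results
```

The speed-up claimed above follows from this schedule: for an N-frame scene, the background branch runs once rather than N times.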
4. Background/foreground frame synthesis. For a video scene, the background is first reconstructed once; the reconstructed foreground HDR frame and background HDR frame are then fused. When the foreground is extracted with the YOLOv5 algorithm, its coordinate position in the scene is recorded. After foreground reconstruction is complete, the corresponding region of the background HDR is replaced according to the position given by the bounding box, forming the HDR video frame corresponding to the original LDR video frame.
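The paste-back in step 4 amounts to replacing the recorded box region of the background HDR frame with the reconstructed foreground patch; a minimal sketch, again assuming the `(x0, y0, x1, y1)` box format:

```python
import numpy as np

def compose_hdr_frame(hdr_bg, fg_patches):
    """Replace each recorded bounding-box region of the background HDR
    frame with its reconstructed foreground HDR patch."""
    out = hdr_bg.copy()  # keep the cached background untouched for reuse
    for (x0, y0, x1, y1), patch in fg_patches:
        out[y0:y1, x0:x1] = patch
    return out
```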
Claims (4)
1. A method for converting an LDR video into an HDR video and accelerating HDR video reconstruction, characterized by comprising the following steps:
1) a fast method for reconstructing HDR video frames from LDR video frames is proposed;
2) an object detection algorithm separates the foreground and background, which shortens the reconstruction time of HDR video frames;
3) after the foreground and background are separated, they are trained separately with the network;
4) for a video scene, the background is reconstructed once, and the reconstructed foreground HDR frame and background HDR frame are then fused.
2. A fast method for reconstructing HDR video frames from LDR video frames as claimed in claim 1, wherein step 1) comprises:
(1) a fast object detection algorithm first separates the foreground and background;
(2) the background network and foreground network are trained to generate background HDR and foreground HDR frames;
(3) the generated background HDR and foreground HDR frames are synthesized to reconstruct the final HDR video frame.
3. The method for separately training the background and the foreground according to claim 1, wherein step 3) comprises:
(1) to avoid a visible difference between the background and the foreground boundary after synthesis, the invention provides a context-aware loss algorithm; for a video scene, the background is reconstructed only once, and the number of foreground reconstructions is determined by the number of frames in the scene;
(2) the reconstruction network proposed by the invention trains the foreground and background separately; let L_t denote the LDR video frame at time t and H_t the corresponding HDR video frame; the invention takes L_t as input and predicts H_t, thereby reconstructing the HDR video;
(3) the foreground network and the background network have the same network structure; the only differences are the training data and the loss function;
(4) after the foreground reconstruction is completed, the scene in the background HDR is replaced according to the position given by the scene bounding box, so as to form an HDR video frame corresponding to the original LDR video frame.
4. The method of fusing the reconstructed foreground HDR frame and the reconstructed background HDR frame as claimed in claim 1, wherein step 4) comprises:
(1) when the background and foreground are fused, stitching seams at the boundary are eliminated to ensure color consistency at the boundary, and a dedicated loss function for the image segmentation and combination stage is designed so that the output image is more natural and clearer;
(2) after the foreground reconstruction is completed, the scene in the background HDR is replaced according to the position given by the scene boundary, so as to form an HDR video frame corresponding to the original LDR video frame.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110993299.5A CN113973175A (en) | 2021-08-27 | 2021-08-27 | Rapid HDR video reconstruction method |
Publications (1)
Publication Number | Publication Date |
---|---|
CN113973175A (en) | 2022-01-25 |
Family
ID=79586382
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202110993299.5A Pending CN113973175A (en) | 2021-08-27 | 2021-08-27 | Rapid HDR video reconstruction method |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN113973175A (en) |
Citations (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20070237393A1 (en) * | 2006-03-30 | 2007-10-11 | Microsoft Corporation | Image segmentation using spatial-color gaussian mixture models |
US20110044537A1 (en) * | 2009-08-18 | 2011-02-24 | Wesley Kenneth Cobb | Background model for complex and dynamic scenes |
CN104052905A (en) * | 2013-03-12 | 2014-09-17 | 三星泰科威株式会社 | Method and apparatus for processing image |
US20190347776A1 (en) * | 2018-05-08 | 2019-11-14 | Altek Corporation | Image processing method and image processing device |
CN110677558A (en) * | 2018-07-02 | 2020-01-10 | 华晶科技股份有限公司 | Image processing method and electronic device |
CN111709896A (en) * | 2020-06-18 | 2020-09-25 | 三星电子(中国)研发中心 | Method and equipment for mapping LDR video into HDR video |
CN113096029A (en) * | 2021-03-05 | 2021-07-09 | 电子科技大学 | High dynamic range image generation method based on multi-branch codec neural network |
CN113112452A (en) * | 2021-03-09 | 2021-07-13 | 北京迈格威科技有限公司 | Image processing method and device, electronic equipment and computer readable storage medium |
Non-Patent Citations (2)
Title |
---|
Li Zhuo et al., "Foreground object extraction from complex surveillance video in multiple scenes", Mathematics in Practice and Theory (《数学的实践与认识》) * |
Fan Jinsong et al., "Research on high dynamic range image (HDRI) encoding and tone mapping", Journal of Engineering Graphics (《工程图学学报》) * |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN115100043A (en) * | 2022-08-25 | 2022-09-23 | 天津大学 | HDR image reconstruction method based on deep learning |
CN115100043B (en) * | 2022-08-25 | 2022-11-15 | 天津大学 | HDR image reconstruction method based on deep learning |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
Johnston et al. | A review of digital video tampering: From simple editing to full synthesis | |
CN111489372B (en) | Video foreground and background separation method based on cascade convolution neural network | |
US20190379883A1 (en) | Stereoscopic video generation method based on 3d convolution neural network | |
US11037308B2 (en) | Intelligent method for viewing surveillance videos with improved efficiency | |
WO2021177324A1 (en) | Image generating device, image generating method, recording medium generating method, learning model generating device, learning model generating method, learning model, data processing device, data processing method, inferring method, electronic instrument, generating method, program, and non-transitory computer-readable medium | |
CN111539884A (en) | Neural network video deblurring method based on multi-attention machine mechanism fusion | |
KR102142567B1 (en) | Image composition apparatus using virtual chroma-key background, method and computer program | |
KR20110084025A (en) | Apparatus and method for image fusion | |
CN115393227B (en) | Low-light full-color video image self-adaptive enhancement method and system based on deep learning | |
CN113034413A (en) | Low-illumination image enhancement method based on multi-scale fusion residual error codec | |
Zhou et al. | Evunroll: Neuromorphic events based rolling shutter image correction | |
Han et al. | Hybrid high dynamic range imaging fusing neuromorphic and conventional images | |
CN114494050A (en) | Self-supervision video deblurring and image frame inserting method based on event camera | |
CN113973175A (en) | Rapid HDR video reconstruction method | |
Shaw et al. | Hdr reconstruction from bracketed exposures and events | |
US11044399B2 (en) | Video surveillance system | |
CN110062132B (en) | Theater performance reconstruction method and device | |
CN111626944A (en) | Video deblurring method based on space-time pyramid network and natural prior resistance | |
CN111429375A (en) | Night monitoring video quality improving method assisted by daytime image reference | |
CN111161189A (en) | Single image re-enhancement method based on detail compensation network | |
WO2022257184A1 (en) | Method for acquiring image generation apparatus, and image generation apparatus | |
CN115100218A (en) | Video consistency fusion method based on deep learning | |
CN113077385A (en) | Video super-resolution method and system based on countermeasure generation network and edge enhancement | |
KR102496362B1 (en) | System and method for producing video content based on artificial intelligence | |
Lin et al. | RE2L: A real-world dataset for outdoor low-light image enhancement |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| PB01 | Publication | |
| SE01 | Entry into force of request for substantive examination | |
| WD01 | Invention patent application deemed withdrawn after publication | Application publication date: 20220125 |