CN116389912A - Method for reconstructing high-frame-rate high-dynamic-range video by fusing pulse camera with common camera - Google Patents

Method for reconstructing high-frame-rate high-dynamic-range video by fusing pulse camera with common camera

Info

Publication number
CN116389912A
CN116389912A (application CN202310448820.6A)
Authority
CN
China
Prior art keywords
image
pulse
images
frame
camera
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202310448820.6A
Other languages
Chinese (zh)
Other versions
CN116389912B (en)
Inventor
施柏鑫 (Boxin Shi)
常亚坤 (Yakun Chang)
黄铁军 (Tiejun Huang)
许超 (Chao Xu)
周矗 (Chu Zhou)
洪雨辰 (Yuchen Hong)
胡力文 (Liwen Hu)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Peking University
Original Assignee
Peking University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Peking University
Priority to CN202310448820.6A
Publication of CN116389912A
Application granted
Publication of CN116389912B
Legal status: Active
Anticipated expiration

Classifications

    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N23/00Cameras or camera modules comprising electronic image sensors; Control thereof
    • H04N23/70Circuitry for compensating brightness variation in the scene
    • H04N23/741Circuitry for compensating brightness variation in the scene by increasing the dynamic range of the image compared to the dynamic range of the electronic image sensors
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N23/00Cameras or camera modules comprising electronic image sensors; Control thereof
    • H04N23/70Circuitry for compensating brightness variation in the scene
    • H04N23/73Circuitry for compensating brightness variation in the scene by influencing the exposure time
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N23/00Cameras or camera modules comprising electronic image sensors; Control thereof
    • H04N23/70Circuitry for compensating brightness variation in the scene
    • H04N23/76Circuitry for compensating brightness variation in the scene by influencing the image signals

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Image Processing (AREA)
  • Studio Devices (AREA)
  • Transforming Light Signals Into Electric Signals (AREA)

Abstract

The invention discloses a method for reconstructing high-frame-rate high-dynamic-range video by fusing a pulse camera with a common camera, comprising the following steps: S1, pulse signal processing: integrating, in the time domain, the pulses asynchronously fired by the pulse camera to obtain pulse reconstructed images, and performing optical flow estimation; S2, color image preprocessing: setting a color industrial camera to a cyclically alternating exposure mode with different exposure times, performing brightness correction on the images, deblurring the blurred images to obtain a group of latent images, and reconstructing a high-frame-rate color sequence from the latent images to obtain blur-free images; S3, pulse-guided frame interpolation and fusion: using a fusion and frame-interpolation module based on a recurrent convolutional neural network to colorize the pulse reconstructed images obtained in step S1 while interpolating frames between the blur-free images obtained in step S2, and outputting a color HDR video frame by frame. The invention effectively improves the quality and frame rate of the fused images and realizes high-frame-rate, high-dynamic-range imaging.

Description

Method for reconstructing high-frame-rate high-dynamic-range video by fusing pulse camera with common camera
Technical Field
The invention relates to the technical field of video generation, in particular to a method for reconstructing a high-frame-rate high-dynamic-range video by fusing a pulse camera with a common camera.
Background
How to capture high-frame-rate, high-dynamic-range color video of high-speed scenes is a problem that the low-level imaging field has yet to solve satisfactorily. It requires the camera system to combine a high frame rate, high resolution, high dynamic range, and low blur. Commercial cameras based on conventional image frames, however, still suffer from large data redundancy and low dynamic range in high-speed scenarios. Consecutive multi-bit video frames carry a large amount of redundancy, and transmission bandwidth is limited: when a camera reads out at a high frame rate, the resolution of the video frames drops sharply. On-chip storage alleviates this to some extent, but the data can then no longer be read out in real time. A high frame rate also necessarily forces short exposures owing to the frame-rate/shutter limit, so the camera perceives low-light scenes poorly; increasing the shutter time lowers the frame rate and introduces motion blur and inter-frame information jumps. These defects of traditional cameras in high-speed scenes make it difficult to meet practical demands in urgent application fields and in major frontier scientific exploration.
The dynamic range visible to the human eye is approximately 10000:1, whereas the dynamic range of an ordinary camera reaches only about 1000:1, so the dynamic range a single photograph can capture is extremely limited. To expose bright regions correctly, an exposure time that is too short darkens the low-brightness parts of the scene and introduces noise; conversely, to expose dark regions correctly, an exposure time that is too long overexposes the high-brightness parts and loses detail. Fusing alternately exposed LDR video frames is the dominant method of obtaining HDR video, but constrained by the long exposures, the frame rate is typically limited to tens of FPS, so scenes moving at high speed cannot be captured.
Classical high-dynamic-range imaging has two main approaches. One builds dedicated equipment to capture a wider dynamic range: a complex optical path inside the camera splits the light onto multiple photosensitive devices, and several groups of LDR videos acquired with different exposure parameters are synthesized into HDR. The other cyclically alternates a single camera between different exposure durations, maps the images to a linear space, aligns any group of adjacent alternately exposed images with pixel-level motion alignment, computes fusion parameters, and weights each pixel of each image to obtain the HDR video frames.
With the development of deep learning, using the ability of neural networks to model implicit data distributions to solve low-level vision problems has become mainstream in recent years, producing a series of inverse-tone-mapping high-dynamic-range imaging methods with purpose-designed network structures. Most of these methods train a neural network on multiple LDR images to compute an optical flow field, obtaining alignment parameters between video frames, and then use a fusion network to produce the HDR video frames. Compared with acquiring HDR video with dedicated equipment, they capture data with lighter devices, reducing the hardware cost of shooting. However, the process of fusing alternately exposed LDR video frames is very complex: the output easily exhibits ghosting artifacts, the frame rate is hard to raise substantially, nonlinear scene motion makes the deblurring of long-exposure images inaccurate, and the dependence of deep learning on training data makes performance unstable when testing on scenes not covered by the training set.
Disclosure of Invention
Aiming at the problems of the existing high-dynamic-range video generation techniques, such as limited frame-rate improvement and image blur, the invention provides a method for reconstructing high-frame-rate high-dynamic-range video by fusing a pulse camera with a common camera, which successfully combines the advantages of the high-frame-rate, low-resolution pulse signal with those of the low-frame-rate, high-resolution color camera signal to obtain high-quality, high-dynamic-range color video.
In order to achieve the above object, the present invention provides the following technical solutions:
The invention provides a method for reconstructing high-frame-rate high-dynamic-range video by fusing a pulse camera with a common camera, comprising the following steps:
S1, pulse signal processing: integrating, in the time domain, the pulses asynchronously fired by the pulse camera to obtain pulse reconstructed images, and computing the pulse optical flow field;
S2, color image preprocessing: setting a color industrial camera to a cyclically alternating exposure mode with different exposure times, performing brightness correction on the images, deblurring the blurred images to obtain a group of latent images, and reconstructing a high-frame-rate color sequence from the latent images to obtain blur-free images;
S3, pulse-guided frame interpolation and fusion: using a fusion and frame-interpolation module based on a recurrent convolutional neural network to colorize the pulse reconstructed images obtained in step S1 while interpolating frames between the blur-free images obtained in step S2, and outputting a color HDR video frame by frame.
Further, in step S1, the pulse camera's continuously fired pulses are represented in the time domain as:
S(x,y)={s(x,y,t)}
where {·} denotes a set, (x, y) are the spatial coordinates and t is the temporal coordinate; when a pixel's photon integral reaches the set threshold, s takes the value 1 and the integrator is reset to zero to await the next integration cycle.
Further, in step S1, motion field information included in the pulse stream is obtained by image reconstruction, expressed as:
$$I_i(x,y)=\frac{1}{t_f}\sum_{t=t_i-t_f/2}^{t_i+t_f/2} s(x,y,t)$$
wherein $I_i$ is the pulse reconstructed image corresponding to time $t_i$ and $t_f$ is the integration time window.
Further, in step S2, the color industrial camera's shutter time is set to a cyclically alternating exposure mode of 1 ms, 4 ms, and 12 ms, and the frame rate is set to 60 FPS.
Further, in step S2, the SG-deblur module is used to deblur the images, and the relationship between the blurred image and the latent images is:
$$B_{\mathrm{blur}} \le \frac{1}{N}\sum_{i=1}^{N} B_i$$
wherein $B_{\mathrm{blur}}$ is the blurred image, $N$ is the number of latent images to be solved, and $B_i$ is the $i$-th latent image;
and a 1000 FPS color high-frame-rate sequence is reconstructed through the SG-interpolation module to obtain blur-free images.
Further, in step S3, super-resolution is performed on the pulse reconstructed images so that their spatial resolution matches that of the color images.
Further, in step S3, the PixelShuffle module P(·) performs feature extraction and spatial upsampling on the pulse reconstructed image, an encoder based on a convolutional neural network extracts features from the color image, the features extracted from the pulse reconstructed image are fused with those of the color image, a ConvLSTM-based recurrent convolution operation retains the temporal information, and finally a CNN-based decoder produces the color image $C_i$ corresponding to the pulse reconstructed image $I_i$.
Further, the neural network of step S3 is trained with synthetic data, which is synthesized as follows:
a) Set the frame rate of the color industrial camera to 80 FPS and shoot video sequences of low-speed moving scenes with exposure times of 1 ms, 4 ms, and 12 ms in turn, each group of video containing 303 LDR images; 160 groups of training data and 40 groups of test data are shot in total;
b) Compress the color images along the time dimension: every 3 color images are combined into one group and synthesized into 1 gold-standard HDR image, so each group yields 101 HDR images, equivalent to 0.101 seconds of high-speed video;
c) Sample from the synthesized 0.101-second video at 60 FPS: collect 3 1ms image frames, taken from the 1ms exposure images of the 1st, 50th, and 101st groups; collect 2 4ms image frames, the first averaged from the 4ms images of the 17th to 20th groups and the second from the 4ms images of the 67th to 70th groups; collect 2 12ms image frames, the first averaged from the 4 12ms images of the 21st to 24th groups and the second from the 4 12ms images of the 71st to 74th groups;
d) Convert the 101 color images into gray-scale images, downsample them to 400×250 resolution, and synthesize pulse data according to the pulse firing principle.
Further, the neural network of step S3 comprises two SG-deblur modules, an SG-interpolation module, and a fusion and frame-interpolation module; the SG-deblur modules deblur the long-exposure images, the SG-interpolation module obtains the motion optical-flow field from the spike-camera optical-flow module and interpolates frames between the deblurred color images, and the fusion and frame-interpolation module fuses the interpolated color images with the pulse reconstructed images.
Further, the neural network of step S3 is trained as follows:
a) The long-exposure image $B_{\mathrm{blur}}$ and the corresponding N pulse reconstructed images $\{I_j\}$ are concatenated into a tensor of dimension (b, c+N, H, W), where b is the number of images per training batch, c is the number of color channels, and (H, W) is the image size; this tensor is input to the SG-deblur module and features are extracted by the CNN-based encoder, the convolution operation of each layer being:
$$y = f\big(\mathrm{IN}(k_w \otimes x + b)\big)$$
wherein $k_w$ is a convolution kernel with window size $w$, $d$ is the number of channels of the output tensor, $\otimes$ denotes the convolution operation, $x$ is the input tensor or the output of the previous feature convolution layer, $b$ is a bias term, IN(·) is a batch normalization operation that normalizes the current tensor, $f$ is a linear rectification function, and $y$ is the tensor with channel number $d$, i.e., the features extracted by the layer. The PixelShuffle layer then reduces the spatial size of the features to 1/2 and increases the number of channels to 2 times, residual dense blocks further extract features, and finally a CNN-based decoder forms the blur-free images; after the deblurring operation, each 4ms-exposure image yields 4 blur-free images and each 12ms-exposure image yields 12, raising the color video frame rate from the original 60 FPS to 340 FPS;
b) The 340 FPS image sequence obtained in step a) is interpolated to 1000 FPS by the following algorithm: for any time $t_i$, query forward and backward for the two color frames $B^{+}$ and $B^{-}$ nearest to $t_i$, compute the optical flow fields from time $t_i$ to each of the two frames, and warp $B^{+}$ and $B^{-}$ with these optical flow fields;
c) Five pulse reconstructed images are input to the fusion and frame-interpolation module at a time; the module first uses PixelShuffle to enlarge the spatial size of the features to 2 times, then a CNN-based encoding network extracts high-dimensional features from them, while another group of CNN-based encoders extracts two groups of features from $B^{+}$ and $B^{-}$ respectively; a multi-scale fusion strategy fuses the features from the pulse images and the color images, and finally a CNN-based decoder produces the optimized color image $C_i$;
d) The color image $C_i$ output by the neural network and the real HDR image $G_i$ are compressed in dynamic range using the formula
$$T(x)=\frac{\log(1+\mu x)}{\log(1+\mu)}$$
wherein $\mu$ denotes the degree of compression and is set to 5000; after dynamic range compression the L1 loss between the two maps is computed as $\mathcal{L}_{1}=\lVert T(C_i)-T(G_i)\rVert_1$, the SSIM loss as $\mathcal{L}_{\mathrm{SSIM}}=1-\mathrm{SSIM}\big(T(C_i),T(G_i)\big)$, and the LPIPS loss as $\mathcal{L}_{\mathrm{LPIPS}}=\mathrm{LPIPS}\big(T(C_i),T(G_i)\big)$; the total loss is $\mathcal{L}=\beta_1\mathcal{L}_{1}+\beta_2\mathcal{L}_{\mathrm{SSIM}}+\beta_3\mathcal{L}_{\mathrm{LPIPS}}$, wherein the loss weights $\beta_1=1$ and $\beta_2=1$; the weights of all layers of the SG-deblur module, the SG-interpolation module, and the fusion and frame-interpolation module are then continuously updated by a back propagation algorithm.
Compared with the prior art, the invention has the beneficial effects that:
according to the method for reconstructing the high-frame-rate high-dynamic-range video by fusing the pulse camera with the common camera, disclosed by the invention, the frame rate of reconstructing the HDR from the LDR video of the single color camera is greatly improved by fusing a plurality of alternately exposed LDR images with the output of the pulse camera, and the high-frame-rate high-dynamic-range imaging is realized. Meanwhile, a deep learning method is used, a network module is designed independently aiming at the difference of the LDR color image blurring problem and the inter-frame color missing problem in each aspect, and corresponding fusion flow steps are provided. In addition, the invention reserves rich time domain information by utilizing the high-speed pulse signal, greatly reduces the difficulty of resolving the blurring of the color image, can more accurately solve the nonlinear motion and enhances the robustness of the algorithm.
Drawings
In order to more clearly illustrate the embodiments of the present application or the technical solutions in the prior art, the drawings needed in the embodiments are briefly introduced below. Obviously, the drawings described below are only some embodiments of the present invention; a person of ordinary skill in the art could obtain other drawings from them without creative effort.
Fig. 1 is a flowchart of a method for reconstructing a high-frame-rate high-dynamic-range video by fusing a pulse camera with a common camera.
Fig. 2 is a network structure diagram of a method for reconstructing a high-frame-rate high-dynamic-range video by fusing a pulse camera with a common camera.
Fig. 3 is a flowchart of a method according to embodiment 1 of the present invention.
Detailed Description
In order to better understand the technical solution, it is clearly and completely described below with reference to the accompanying drawings of the embodiments of the present invention. Obviously, the described examples are only some, not all, of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of the present invention without creative effort fall within the scope of the present invention.
The method provided by the invention for reconstructing high-frame-rate high-dynamic-range video by fusing a pulse camera with a common camera, as shown in fig. 1, comprises three stages: pulse signal preprocessing (a in fig. 1), color image preprocessing (c in fig. 1), and pulse-guided frame interpolation and fusion (b in fig. 1).
S1, pulse signal processing: the pulses asynchronously fired by the pulse camera are integrated in the time domain to obtain pulse reconstructed images, and the pulse optical flow field is computed.
Specifically, the pulse camera's continuously fired pulses are represented in the time domain as:
S(x,y)={s(x,y,t)}
where {·} denotes a set, (x, y) are the spatial coordinates and t is the temporal coordinate; when a pixel's photon integral reaches the set threshold, s takes the value 1 and the integrator is reset to zero to await the next integration cycle.
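For illustration, a minimal NumPy sketch of this integrate-and-fire model follows (the threshold value and the dense (T, H, W) input format are assumptions of the example, not parameters disclosed by the invention):

```python
import numpy as np

def simulate_spikes(light, threshold=2.0):
    """Integrate-and-fire simulation of a pulse (spike) camera.

    light: array of shape (T, H, W) -- per-timestep incident intensity.
    Returns the binary spike stream s(x, y, t) of the same shape: each
    pixel accumulates intensity, and when the accumulator reaches the
    threshold it fires (s = 1) and is reset to zero.
    """
    T, H, W = light.shape
    acc = np.zeros((H, W), dtype=np.float64)
    spikes = np.zeros((T, H, W), dtype=np.uint8)
    for t in range(T):
        acc += light[t]
        fired = acc >= threshold
        spikes[t][fired] = 1
        acc[fired] = 0.0  # reset and wait for the next integration cycle
    return spikes
```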
The pulse stream cannot be viewed directly by the human eye; the motion-field information it contains is obtained by image reconstruction, expressed as:
$$I_i(x,y)=\frac{1}{t_f}\sum_{t=t_i-t_f/2}^{t_i+t_f/2} s(x,y,t)$$
wherein $I_i$ is the pulse reconstructed image corresponding to time $t_i$ and $t_f$ is the integration time window.
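A corresponding sketch of the windowed reconstruction: averaging the spike stream over a window of $t_f$ timesteps around $t_i$ approximates scene brightness, since the firing rate is proportional to the incident light (the centered window is this example's choice):

```python
import numpy as np

def reconstruct_image(spikes, t_i, t_f):
    """Windowed reconstruction of I_i from a binary spike stream.

    spikes: array (T, H, W); t_i: center timestep; t_f: window length.
    The mean firing rate within the window is proportional to brightness.
    """
    lo = max(0, t_i - t_f // 2)
    hi = min(spikes.shape[0], t_i + t_f // 2 + 1)
    return spikes[lo:hi].astype(np.float32).mean(axis=0)
```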
S2, color image preprocessing: a color industrial camera is set to a cyclically alternating exposure mode with different exposure times, brightness correction is performed on the images, the blurred images are deblurred to obtain a group of latent images, and a high-frame-rate color sequence is reconstructed from the latent images to obtain blur-free images.
Specifically, the color industrial camera's shutter time is set to a cyclically alternating exposure mode of 1 ms, 4 ms, and 12 ms, and the frame rate is set to 60 FPS.
In the absence of overexposure, a pixel's response value is proportional to the exposure time, so image frames shot at different exposure times have different brightnesses and must be brightness-corrected to a common scale.
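A minimal sketch of this correction, assuming a linear sensor response and an arbitrarily chosen reference exposure (saturated pixels cannot be recovered by such scaling):

```python
import numpy as np

def brightness_correct(frame, exposure_ms, ref_ms=4.0):
    """Scale a linear-response LDR frame to a common exposure level.

    Valid only for unsaturated pixels: overexposed values are clipped
    by the sensor and carry no recoverable brightness information.
    """
    return frame.astype(np.float32) * (ref_ms / exposure_ms)
```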
When the shutter time is long, motion displacement blurs the image, so the long-exposure images must be deblurred to obtain a group of latent images. The SG-deblur module is used for this, and the relationship between the blurred image and the latent images is:
$$B_{\mathrm{blur}} \le \frac{1}{N}\sum_{i=1}^{N} B_i$$
wherein $B_{\mathrm{blur}}$ is the blurred image, $N$ is the number of latent images to be solved, and $B_i$ is the $i$-th latent image. Because the blurred image may contain overexposed regions, the relationship between the blurred image and the latent images is an inequality.
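This blur model can be made concrete: a long exposure averages the N latent frames and then clips at the sensor's saturation level, which is exactly why equality holds only in unsaturated regions. A sketch under that linear model:

```python
import numpy as np

def synthesize_blur(latent_frames, saturation=1.0):
    """Form a blurred LDR frame from N latent frames.

    latent_frames: array (N, H, W[, 3]) of linear-domain latent images.
    Averaging models motion blur over the exposure; clipping models
    overexposure, making B_blur <= mean(B_i) pixelwise.
    """
    mean = latent_frames.astype(np.float32).mean(axis=0)
    return np.minimum(mean, saturation)
```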
A 1000 FPS color high-frame-rate sequence is then reconstructed through the SG-interpolation module to obtain blur-free images.
S3, pulse-guided frame interpolation and fusion: a fusion and frame-interpolation module based on a recurrent convolutional neural network colorizes the pulse reconstructed images obtained in step S1 while interpolating frames between the blur-free images obtained in step S2, and outputs a color HDR video frame by frame.
Specifically, the resolution of the pulse reconstructed images is 400×250 while that of the color images is 800×500, so the pulse reconstructed images must be super-resolved to match the spatial resolution of the color images.
The PixelShuffle module P(·) performs feature extraction and spatial upsampling on the pulse reconstructed image; an encoder based on a convolutional neural network is designed to extract features from the color image; the features extracted from the pulse reconstructed image are fused with those of the color image; a ConvLSTM-based recurrent convolution operation retains the temporal information; and finally a CNN-based decoder produces the color image $C_i$ corresponding to the pulse reconstructed image $I_i$:
$$C_i=\mathcal{R}\big(P(I_i),\,B^{+},\,B^{-}\big)$$
wherein $\mathcal{R}(\cdot)$ is the fusion and frame-interpolation module based on a recurrent convolutional neural network.
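For illustration, the following is a compact PyTorch sketch of the fusion pipeline just described: PixelShuffle upsampling of spike features, a small CNN encoder for the two warped neighbor color frames, concatenation-based fusion, a ConvLSTM cell carrying temporal state, and a CNN decoder. All layer widths, kernel sizes, and the ConvLSTM gate layout here are illustrative assumptions; the patent does not disclose these hyperparameters.

```python
import torch
import torch.nn as nn

class ConvLSTMCell(nn.Module):
    """Minimal ConvLSTM cell (gate layout follows Shi et al., 2015)."""
    def __init__(self, ch):
        super().__init__()
        self.gates = nn.Conv2d(2 * ch, 4 * ch, 3, padding=1)

    def forward(self, x, state):
        h, c = state
        i, f, o, g = torch.chunk(self.gates(torch.cat([x, h], dim=1)), 4, dim=1)
        c = torch.sigmoid(f) * c + torch.sigmoid(i) * torch.tanh(g)
        h = torch.sigmoid(o) * torch.tanh(c)
        return h, (h, c)

class FusionInterpolationNet(nn.Module):
    """Colorize a spike-reconstructed frame guided by two neighbor color frames."""
    def __init__(self, ch=32):
        super().__init__()
        # Spike branch: lift 1 channel to 4*ch, then PixelShuffle for the 2x
        # spatial upsampling (matching 400x250 -> 800x500 described above).
        self.spike_feat = nn.Sequential(
            nn.Conv2d(1, 4 * ch, 3, padding=1), nn.ReLU(), nn.PixelShuffle(2))
        # Color branch: encode the two warped neighbor frames (2 x 3 channels).
        self.color_feat = nn.Sequential(nn.Conv2d(6, ch, 3, padding=1), nn.ReLU())
        self.fuse = nn.Conv2d(2 * ch, ch, 3, padding=1)
        self.lstm = ConvLSTMCell(ch)           # retains temporal information
        self.decode = nn.Conv2d(ch, 3, 3, padding=1)

    def forward(self, spike_img, color_pair, state):
        s = self.spike_feat(spike_img)         # (B, ch, 2H, 2W)
        c = self.color_feat(color_pair)        # (B, ch, 2H, 2W)
        x = torch.relu(self.fuse(torch.cat([s, c], dim=1)))
        h, state = self.lstm(x, state)
        return self.decode(h), state           # color frame C_i, carried state
```

Usage: initialize `state = (torch.zeros(B, 32, 2*H, 2*W), torch.zeros(B, 32, 2*H, 2*W))` and feed the spike frames sequentially, so that temporal information is carried across time steps as described.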
As shown in fig. 2, the recurrent convolutional neural network designed by the invention comprises two SG-deblur modules, an SG-interpolation module, and a fusion and frame-interpolation module; the SG-deblur modules deblur the long-exposure images, the SG-interpolation module obtains the motion optical-flow field from the pulse optical flow (spike camera optical flow) module and interpolates frames between the deblurred color images, and the fusion and frame-interpolation module fuses the interpolated color images with the pulse reconstructed images.
As shown in fig. 2, the neural network of step S3 is trained with synthetic data, which is synthesized as follows:
a) Set the frame rate of the color industrial camera to 80 FPS and shoot video sequences of low-speed moving scenes with exposure times of 1 ms, 4 ms, and 12 ms in turn, each group of video containing 303 LDR images; 160 groups of training data and 40 groups of test data are shot in total;
b) Compress the color images along the time dimension: every 3 color images are combined into one group and synthesized into 1 gold-standard HDR image, so each group yields 101 HDR images, equivalent to 0.101 seconds of high-speed video;
c) Sample from the synthesized 0.101-second video at 60 FPS (see the sketch after this list): collect 3 1ms image frames, taken from the 1ms exposure images of the 1st, 50th, and 101st groups; collect 2 4ms image frames, the first averaged from the 4ms images of the 17th to 20th groups and the second from the 4ms images of the 67th to 70th groups; collect 2 12ms image frames, the first averaged from the 4 12ms images of the 21st to 24th groups and the second from the 4 12ms images of the 71st to 74th groups;
d) Convert the 101 color images into gray-scale images, downsample them to 400×250 resolution, and synthesize pulse data according to the pulse firing principle.
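A sketch of the sampling schedule of step c), operating on the 101 LDR triplets of one group (mapping the text's 1-based group indices to 0-based array indices is this example's reading):

```python
import numpy as np

def sample_alternating_exposures(ldr_1ms, ldr_4ms, ldr_12ms):
    """Simulate one 60 FPS alternating-exposure cycle from 101 LDR triplets.

    Each input: array (101, H, W, 3), one exposure per group.
    Short (1 ms) frames are single samples; 4 ms / 12 ms frames are
    averages over 4 consecutive groups, modeling motion blur.
    """
    ms1 = [ldr_1ms[0], ldr_1ms[49], ldr_1ms[100]]               # groups 1, 50, 101
    ms4 = [ldr_4ms[16:20].mean(0), ldr_4ms[66:70].mean(0)]      # 17-20, 67-70
    ms12 = [ldr_12ms[20:24].mean(0), ldr_12ms[70:74].mean(0)]   # 21-24, 71-74
    return ms1, ms4, ms12
```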
The training steps of the neural network of the invention are as follows:
a) The long-exposure image $B_{\mathrm{blur}}$ and the corresponding N pulse reconstructed images $\{I_j\}$ are concatenated into a tensor of dimension (b, c+N, H, W), where b is the number of images per training batch, c is the number of color channels, and (H, W) is the image size; this tensor is input to the SG-deblur module and features are extracted by the CNN-based encoder, the convolution operation of each layer being:
$$y = f\big(\mathrm{IN}(k_w \otimes x + b)\big)$$
wherein $k_w$ is a convolution kernel with window size $w$, $d$ is the number of channels of the output tensor, $\otimes$ denotes the convolution operation, $x$ is the input tensor or the output of the previous feature convolution layer, $b$ is a bias term, IN(·) is a batch normalization operation that normalizes the current tensor, $f$ is a linear rectification function (Rectified Linear Unit, ReLU), and $y$ is the tensor with channel number $d$, i.e., the features extracted by the layer. The PixelShuffle layer then reduces the spatial size of the features to 1/2 and increases the number of channels to 2 times, residual dense blocks further extract features, and finally a CNN-based decoder forms the blur-free images; after the deblurring operation, each 4ms-exposure image yields 4 blur-free images and each 12ms-exposure image yields 12, raising the color video frame rate from the original 60 FPS to (1+4+12)×20 = 340 FPS (one such encoder stage is sketched below);
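As a sketch, one such encoder stage could be written in PyTorch as follows. The text's IN(·) is instantiated here as InstanceNorm2d (reading "IN" as instance normalization is an assumption; batch normalization is the alternative reading), and the shuffle-based 1/2 downsampling is realized with PixelUnshuffle plus a 1×1 convolution to obtain the described 2× channel count:

```python
import torch.nn as nn

class EncoderLayer(nn.Module):
    """One encoder stage: y = f(IN(k (*) x + b)), then shuffle downsampling."""
    def __init__(self, in_ch, out_ch, w=3):
        super().__init__()
        self.conv = nn.Conv2d(in_ch, out_ch, kernel_size=w, padding=w // 2)
        self.norm = nn.InstanceNorm2d(out_ch)  # assumed reading of IN(.)
        self.act = nn.ReLU(inplace=True)       # linear rectification function f
        # PixelUnshuffle halves H and W and multiplies channels by 4; the 1x1
        # convolution brings the channel count back down to the described 2x.
        self.down = nn.Sequential(
            nn.PixelUnshuffle(2), nn.Conv2d(4 * out_ch, 2 * out_ch, 1))

    def forward(self, x):
        return self.down(self.act(self.norm(self.conv(x))))
```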
b) The pulse optical flow module interpolates the 340 FPS image sequence obtained in step a) to 1000 FPS by the following algorithm: for any time $t_i$, query forward and backward for the two color frames $B^{+}$ and $B^{-}$ nearest to $t_i$, compute the optical flow fields from time $t_i$ to each of the two frames, and warp $B^{+}$ and $B^{-}$ with these optical flow fields (a warping sketch follows);
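The deformation (warping) operation in step b) can be sketched with standard backward warping via grid_sample: each output pixel at time $t_i$ samples the neighbor frame at its flow-displaced location. This is a common formulation assumed for illustration; the patent computes the flow itself with the spike optical-flow module, which is not shown here:

```python
import torch
import torch.nn.functional as F

def warp(frame, flow):
    """Backward-warp `frame` (B, C, H, W) with `flow` (B, 2, H, W).

    flow[:, 0] / flow[:, 1] are x / y displacements in pixels from the
    target time t_i to the source frame; output is the frame seen at t_i.
    """
    B, _, H, W = frame.shape
    ys, xs = torch.meshgrid(torch.arange(H, dtype=torch.float32),
                            torch.arange(W, dtype=torch.float32),
                            indexing="ij")
    grid_x = xs.unsqueeze(0) + flow[:, 0]
    grid_y = ys.unsqueeze(0) + flow[:, 1]
    # grid_sample expects coordinates normalized to [-1, 1]
    grid = torch.stack([2 * grid_x / (W - 1) - 1,
                        2 * grid_y / (H - 1) - 1], dim=-1)
    return F.grid_sample(frame, grid, align_corners=True)
```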
c) Five pulse reconstructed images are input to the fusion and frame-interpolation module at a time; the module first uses PixelShuffle to enlarge the spatial size of the features to 2 times, then a CNN-based encoding network extracts high-dimensional features from them, while another group of CNN-based encoders extracts two groups of features from $B^{+}$ and $B^{-}$ respectively; a multi-scale fusion strategy fuses the features from the pulse images and the color images, and finally a CNN-based decoder produces the optimized color image $C_i$;
d) The color image $C_i$ output by the neural network and the real HDR image $G_i$ are compressed in dynamic range using the formula
$$T(x)=\frac{\log(1+\mu x)}{\log(1+\mu)}$$
wherein $\mu$ denotes the degree of compression and is set to 5000 in this embodiment; after dynamic range compression the L1 loss between the two maps is computed as $\mathcal{L}_{1}=\lVert T(C_i)-T(G_i)\rVert_1$, the SSIM loss as $\mathcal{L}_{\mathrm{SSIM}}=1-\mathrm{SSIM}\big(T(C_i),T(G_i)\big)$, and the LPIPS loss as $\mathcal{L}_{\mathrm{LPIPS}}=\mathrm{LPIPS}\big(T(C_i),T(G_i)\big)$; the total loss is $\mathcal{L}=\beta_1\mathcal{L}_{1}+\beta_2\mathcal{L}_{\mathrm{SSIM}}+\beta_3\mathcal{L}_{\mathrm{LPIPS}}$, wherein the loss weights $\beta_1=1$ and $\beta_2=1$; the weights of all layers of the SG-deblur module, the SG-interpolation module, and the fusion and frame-interpolation module are then continuously updated by a back propagation algorithm.
Example 1
The method for reconstructing the high-frame-rate high-dynamic-range video by fusing the pulse camera with the common camera, as shown in fig. 3, comprises the following steps:
a) Building a hybrid camera system: the hybrid camera system is tested with a common RGB industrial camera (model Basler acA800-510uc) and a pulse camera; the two cameras use the same HIKROBOT lens, and a beam splitter divides the incident light into two beams that enter the fields of view of the two camera sensors synchronously.
b) HDR gray-scale image reconstruction from the pulse camera: the pulse data are integrated with a sliding-window method to obtain gray-scale images, with a window size of 1 ms and a resolution of 400×250.
c) The alternately exposed LDR images shot by the common RGB industrial camera and the HDR gray-scale images reconstructed from the pulse camera are input into the trained neural network: the color images are first deblurred by the SG-deblur module, a 1000 FPS color high-frame-rate sequence is reconstructed by the SG-interpolation module, and finally the fusion and frame-interpolation module fuses the color images with the pulse reconstructed images and outputs the final high-resolution, high-frame-rate color HDR video.
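Putting the pieces of this embodiment together, the inference loop might be organized as follows. The names brightness_correct and reconstruct_image refer to the sketches earlier in this description; deblur, interpolate, and fuse stand for the three trained modules, passed as callables; and the 40 spikes-per-millisecond figure assumes a 40 kHz pulse camera. All of these are illustrative assumptions rather than disclosed interfaces:

```python
def reconstruct_hdr_video(color_frames, exposures_ms, spikes,
                          deblur, interpolate, fuse, spikes_per_ms=40):
    """Orchestration sketch of the embodiment's inference pipeline."""
    # Bring all alternately exposed LDR frames to a common brightness scale.
    corrected = [brightness_correct(f, e)
                 for f, e in zip(color_frames, exposures_ms)]
    # 1 ms frames are already sharp; longer exposures split into latent frames.
    latent = []
    for f, e in zip(corrected, exposures_ms):
        latent.extend([f] if e <= 1.0 else deblur(f, spikes))
    # SG-interpolation: ~340 FPS latent sequence -> 1000 FPS blur-free sequence.
    dense = interpolate(latent, spikes)
    # Sliding-window gray reconstruction (1 ms windows), then per-frame fusion.
    window = spikes_per_ms
    grays = [reconstruct_image(spikes, t, window)
             for t in range(window // 2, spikes.shape[0], window)]
    return [fuse(g, dense) for g in grays]  # one color HDR frame per millisecond
```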
The foregoing description is only of the preferred embodiments of the present invention and is not intended to limit the scope of the present invention. Any modification, equivalent replacement, improvement, etc. made within the spirit and principle of the present invention are included in the protection scope of the present invention.

Claims (10)

1. A method for reconstructing high-frame-rate high-dynamic-range video by fusing a pulse camera with a common camera, characterized by comprising the following steps:
S1, pulse signal processing: integrating, in the time domain, the pulses asynchronously fired by the pulse camera to obtain pulse reconstructed images, and computing the pulse optical flow field;
S2, color image preprocessing: setting a color industrial camera to a cyclically alternating exposure mode with different exposure times, performing brightness correction on the images, deblurring the blurred images to obtain a group of latent images, and reconstructing a high-frame-rate color sequence from the latent images to obtain blur-free images;
S3, pulse-guided frame interpolation and fusion: using a fusion and frame-interpolation module based on a recurrent convolutional neural network to colorize the pulse reconstructed images obtained in step S1 while interpolating frames between the blur-free images obtained in step S2, and outputting a color HDR video frame by frame.
2. The method for reconstructing high-frame-rate high-dynamic-range video by fusing a pulse camera with a common camera according to claim 1, wherein in step S1 the pulse camera's continuously fired pulses are represented in the time domain as:
S(x,y)={s(x,y,t)}
where {·} denotes a set, (x, y) are the spatial coordinates and t is the temporal coordinate; when a pixel's photon integral reaches the set threshold, s takes the value 1 and the integrator is reset to zero to await the next integration cycle.
3. The method for reconstructing high-frame-rate high-dynamic-range video by fusing a pulse camera with a common camera according to claim 1, wherein in step S1 the motion field information contained in the pulse stream is obtained by image reconstruction, expressed as:
$$I_i(x,y)=\frac{1}{t_f}\sum_{t=t_i-t_f/2}^{t_i+t_f/2} s(x,y,t)$$
wherein $I_i$ is the pulse reconstructed image corresponding to time $t_i$ and $t_f$ is the integration time window.
4. The method for reconstructing high-frame-rate high-dynamic-range video by fusing a pulse camera with a common camera according to claim 1, wherein in step S2 the color industrial camera's shutter time is set to a cyclically alternating exposure mode of 1 ms, 4 ms, and 12 ms, and the frame rate is set to 60 FPS.
5. The method for reconstructing high-frame-rate high-dynamic-range video by fusing a pulse camera with a common camera according to claim 1, wherein in step S2 an SG-deblur module is used to deblur the images, the relationship between the blurred image and the latent images being:
$$B_{\mathrm{blur}} \le \frac{1}{N}\sum_{i=1}^{N} B_i$$
wherein $B_{\mathrm{blur}}$ is the blurred image, $N$ is the number of latent images to be solved, and $B_i$ is the $i$-th latent image;
and a 1000 FPS color high-frame-rate sequence is reconstructed through the SG-interpolation module to obtain blur-free images.
6. The method of claim 1, wherein in step S3 super-resolution is performed on the pulse reconstructed image so that its spatial resolution matches that of the color image.
7. The method for reconstructing high-frame-rate high-dynamic-range video by fusing a pulse camera with a common camera according to claim 1, wherein in step S3 a PixelShuffle module P(·) performs feature extraction and spatial upsampling on the pulse reconstructed image, a convolutional-neural-network-based encoder extracts features from the color image, the features extracted from the pulse reconstructed image are fused with those of the color image, a ConvLSTM-based recurrent convolution operation retains the temporal information, and finally a CNN-based decoder produces the color image $C_i$ corresponding to the pulse reconstructed image $I_i$.
8. The method for reconstructing high-frame-rate high-dynamic-range video by fusing a pulse camera with a common camera according to claim 1, wherein the neural network of step S3 is trained with synthetic data, which is synthesized as follows:
a) Set the frame rate of the color industrial camera to 80 FPS and shoot video sequences of low-speed moving scenes with exposure times of 1 ms, 4 ms, and 12 ms in turn, each group of video containing 303 LDR images; 160 groups of training data and 40 groups of test data are shot in total;
b) Compress the color images along the time dimension: every 3 color images are combined into one group and synthesized into 1 gold-standard HDR image, so each group yields 101 HDR images, equivalent to 0.101 seconds of high-speed video;
c) Sample from the synthesized 0.101-second video at 60 FPS: collect 3 1ms image frames, taken from the 1ms exposure images of the 1st, 50th, and 101st groups; collect 2 4ms image frames, the first averaged from the 4ms images of the 17th to 20th groups and the second from the 4ms images of the 67th to 70th groups; collect 2 12ms image frames, the first averaged from the 4 12ms images of the 21st to 24th groups and the second from the 4 12ms images of the 71st to 74th groups;
d) Convert the 101 color images into gray-scale images, downsample them to 400×250 resolution, and synthesize pulse data according to the pulse firing principle.
9. The method for reconstructing high-frame-rate high-dynamic-range video by fusing a pulse camera with a common camera according to claim 1, wherein the neural network of step S3 comprises two SG-deblur modules, an SG-interpolation module, and a fusion and frame-interpolation module; the SG-deblur modules deblur the long-exposure images, the SG-interpolation module obtains the motion optical-flow field from the spike-camera optical-flow module and interpolates frames between the deblurred color images, and the fusion and frame-interpolation module fuses the interpolated color images with the pulse reconstructed images.
10. The method for reconstructing a high frame rate high dynamic range video by a pulse camera fused with a normal camera according to claim 9, wherein the step of training the neural network of step S3 is as follows:
a) The long-exposure image $B_{\mathrm{blur}}$ and the corresponding N pulse reconstructed images $\{I_j\}$ are concatenated into a tensor of dimension (b, c+N, H, W), where b is the number of images per training batch, c is the number of color channels, and (H, W) is the image size; this tensor is input to the SG-deblur module and features are extracted by the CNN-based encoder, the convolution operation of each layer being:
$$y = f\big(\mathrm{IN}(k_w \otimes x + b)\big)$$
wherein $k_w$ is a convolution kernel with window size $w$, $d$ is the number of channels of the output tensor, $\otimes$ denotes the convolution operation, $x$ is the input tensor or the output of the previous feature convolution layer, $b$ is a bias term, IN(·) is a batch normalization operation that normalizes the current tensor, $f$ is a linear rectification function, and $y$ is the tensor with channel number $d$, i.e., the features extracted by the layer. The PixelShuffle layer then reduces the spatial size of the features to 1/2 and increases the number of channels to 2 times, residual dense blocks further extract features, and finally a CNN-based decoder forms the blur-free images; after the deblurring operation, each 4ms-exposure image yields 4 blur-free images and each 12ms-exposure image yields 12, raising the color video frame rate from the original 60 FPS to 340 FPS;
b) The 340 FPS image sequence obtained in step a) is interpolated to 1000 FPS by the following algorithm: for any time $t_i$, query forward and backward for the two color frames $B^{+}$ and $B^{-}$ nearest to $t_i$, compute the optical flow fields from time $t_i$ to each of the two frames, and warp $B^{+}$ and $B^{-}$ with these optical flow fields;
c) Five pulse reconstructed images are input to the fusion and frame-interpolation module at a time; the module first uses PixelShuffle to enlarge the spatial size of the features to 2 times, then a CNN-based encoding network extracts high-dimensional features from them, while another group of CNN-based encoders extracts two groups of features from $B^{+}$ and $B^{-}$ respectively; a multi-scale fusion strategy fuses the features from the pulse images and the color images, and finally a CNN-based decoder produces the optimized color image $C_i$;
d) The color image $C_i$ output by the neural network and the real HDR image $G_i$ are compressed in dynamic range using the formula
$$T(x)=\frac{\log(1+\mu x)}{\log(1+\mu)}$$
wherein $\mu$ denotes the degree of compression and is selected to be 5000; after dynamic range compression the L1 loss between the two maps is computed as $\mathcal{L}_{1}=\lVert T(C_i)-T(G_i)\rVert_1$, the SSIM loss as $\mathcal{L}_{\mathrm{SSIM}}=1-\mathrm{SSIM}\big(T(C_i),T(G_i)\big)$, and the LPIPS loss as $\mathcal{L}_{\mathrm{LPIPS}}=\mathrm{LPIPS}\big(T(C_i),T(G_i)\big)$; the total loss is $\mathcal{L}=\beta_1\mathcal{L}_{1}+\beta_2\mathcal{L}_{\mathrm{SSIM}}+\beta_3\mathcal{L}_{\mathrm{LPIPS}}$, wherein the loss weights $\beta_1=1$ and $\beta_2=1$; the weights of all layers of the SG-deblur module, the SG-interpolation module, and the fusion and frame-interpolation module are continuously updated by a back propagation algorithm.
CN202310448820.6A 2023-04-24 2023-04-24 Method for reconstructing high-frame-rate high-dynamic-range video by fusing pulse camera with common camera Active CN116389912B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310448820.6A CN116389912B (en) 2023-04-24 2023-04-24 Method for reconstructing high-frame-rate high-dynamic-range video by fusing pulse camera with common camera

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310448820.6A CN116389912B (en) 2023-04-24 2023-04-24 Method for reconstructing high-frame-rate high-dynamic-range video by fusing pulse camera with common camera

Publications (2)

Publication Number Publication Date
CN116389912A true CN116389912A (en) 2023-07-04
CN116389912B CN116389912B (en) 2023-10-10

Family

ID=86965669

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310448820.6A Active CN116389912B (en) 2023-04-24 2023-04-24 Method for reconstructing high-frame-rate high-dynamic-range video by fusing pulse camera with common camera

Country Status (1)

Country Link
CN (1) CN116389912B (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117745584A (en) * 2024-02-21 2024-03-22 北京大学 Image deblurring method and electronic equipment

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1370289A (en) * 1999-08-17 2002-09-18 应用视像系统有限公司 Improved dynamic range video camera, recording system, and recording method
CN109978808A (en) * 2019-04-25 2019-07-05 北京迈格威科技有限公司 A kind of method, apparatus and electronic equipment for image co-registration
CN111669514A (en) * 2020-06-08 2020-09-15 北京大学 High dynamic range imaging method and apparatus
CN113329146A (en) * 2021-04-25 2021-08-31 北京大学 Pulse camera simulation method and device
US20220156532A1 (en) * 2020-11-13 2022-05-19 Samsung Electronics Co., Ltd Fusing fbis & dvs data streams using a neural network
CN115883764A (en) * 2023-02-08 2023-03-31 吉林大学 Underwater high-speed video frame interpolation method and system based on data cooperation
CN115984124A (en) * 2022-11-29 2023-04-18 北京大学 Method and device for de-noising and super-resolution of neuromorphic pulse signals

Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1370289A (en) * 1999-08-17 2002-09-18 应用视像系统有限公司 Improved dynamic range video camera, recording system, and recording method
CN109978808A (en) * 2019-04-25 2019-07-05 北京迈格威科技有限公司 A kind of method, apparatus and electronic equipment for image co-registration
CN111669514A (en) * 2020-06-08 2020-09-15 北京大学 High dynamic range imaging method and apparatus
US20220156532A1 (en) * 2020-11-13 2022-05-19 Samsung Electronics Co., Ltd Fusing fbis & dvs data streams using a neural network
CN113329146A (en) * 2021-04-25 2021-08-31 北京大学 Pulse camera simulation method and device
CN115984124A (en) * 2022-11-29 2023-04-18 北京大学 Method and device for de-noising and super-resolution of neuromorphic pulse signals
CN115883764A (en) * 2023-02-08 2023-03-31 吉林大学 Underwater high-speed video frame interpolation method and system based on data cooperation

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
J. Han et al.: "Neuromorphic Camera Guided High Dynamic Range Imaging", 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 1727-1736 *
Y. Chang et al.: "1000 FPS HDR Video with a Spike-RGB Hybrid Camera", 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 1-14 *
Du Lin; Sun Huayan; Wang Shuai; Gao Yuxuan; Qi Yingying: "Research on high dynamic range image fusion algorithms for dynamic targets" (《针对动态目标的高动态范围图像融合算法研究》), Acta Optica Sinica (《光学学报》), pages 1-9 *
Huang Tiejun, Yu Zhaofei, Li Yuan, et al.: "Progress in spike vision" (《脉冲视觉研究进展》), Journal of Image and Graphics (《中国图象图形学报》), vol. 27, no. 6, pages 1823-1839 *

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117745584A (en) * 2024-02-21 2024-03-22 北京大学 Image deblurring method and electronic equipment

Also Published As

Publication number Publication date
CN116389912B (en) 2023-10-10

Similar Documents

Publication Publication Date Title
Mildenhall et al. Nerf in the dark: High dynamic range view synthesis from noisy raw images
US11721054B2 (en) Systems, methods, and media for high dynamic range quanta burst imaging
CN111986084B (en) Multi-camera low-illumination image quality enhancement method based on multi-task fusion
CN108055452A (en) Image processing method, device and equipment
CN107493432A (en) Image processing method, device, mobile terminal and computer-readable recording medium
CN108024054A (en) Image processing method, device and equipment
CN111986106A (en) High dynamic image reconstruction method based on neural network
CN110225260B (en) Three-dimensional high dynamic range imaging method based on generation countermeasure network
CN116389912B (en) Method for reconstructing high-frame-rate high-dynamic-range video by fusing pulse camera with common camera
CN114972061B (en) Method and system for denoising and enhancing dim light video
CN111724317A (en) Method for constructing Raw domain video denoising supervision data set
CN115393227B (en) Low-light full-color video image self-adaptive enhancement method and system based on deep learning
CN114245007A (en) High frame rate video synthesis method, device, equipment and storage medium
Yang et al. Learning event guided high dynamic range video reconstruction
CN115082341A (en) Low-light image enhancement method based on event camera
CN116563183A (en) High dynamic range image reconstruction method and system based on single RAW image
Fu et al. Low-light raw video denoising with a high-quality realistic motion dataset
Chang et al. 1000 fps hdr video with a spike-rgb hybrid camera
CN115115516A (en) Real-world video super-resolution algorithm based on Raw domain
Zou et al. Rawhdr: High dynamic range image reconstruction from a single raw image
US20230325974A1 (en) Image processing method, apparatus, and non-transitory computer-readable medium
CN117237207A (en) Ghost-free high dynamic range light field imaging method for dynamic scene
Suda et al. Deep snapshot hdr imaging using multi-exposure color filter array
CN116402908A (en) Dense light field image reconstruction method based on heterogeneous imaging
CN116208812A (en) Video frame inserting method and system based on stereo event and intensity camera

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant