CN116389912A - Method for reconstructing high-frame-rate high-dynamic-range video by fusing pulse camera with common camera - Google Patents

Method for reconstructing high-frame-rate high-dynamic-range video by fusing pulse camera with common camera

Info

Publication number
CN116389912A
CN116389912A (application CN202310448820.6A)
Authority
CN
China
Prior art keywords
image
pulse
images
frame
camera
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202310448820.6A
Other languages
Chinese (zh)
Other versions
CN116389912B (en)
Inventor
施柏鑫 (Boxin Shi)
常亚坤 (Yakun Chang)
黄铁军 (Tiejun Huang)
许超 (Chao Xu)
周矗 (Chu Zhou)
洪雨辰 (Yuchen Hong)
胡力文 (Liwen Hu)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Peking University
Original Assignee
Peking University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Peking University
Priority to CN202310448820.6A
Publication of CN116389912A
Application granted
Publication of CN116389912B
Legal status: Active
Anticipated expiration

Classifications

    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N23/00Cameras or camera modules comprising electronic image sensors; Control thereof
    • H04N23/70Circuitry for compensating brightness variation in the scene
    • H04N23/741Circuitry for compensating brightness variation in the scene by increasing the dynamic range of the image compared to the dynamic range of the electronic image sensors
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N23/00Cameras or camera modules comprising electronic image sensors; Control thereof
    • H04N23/70Circuitry for compensating brightness variation in the scene
    • H04N23/73Circuitry for compensating brightness variation in the scene by influencing the exposure time
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N23/00Cameras or camera modules comprising electronic image sensors; Control thereof
    • H04N23/70Circuitry for compensating brightness variation in the scene
    • H04N23/76Circuitry for compensating brightness variation in the scene by influencing the image signals

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Image Processing (AREA)
  • Studio Devices (AREA)
  • Transforming Light Signals Into Electric Signals (AREA)

Abstract

The invention discloses a method for reconstructing high-frame-rate high-dynamic-range video by fusing a pulse camera with a common camera, comprising the following steps: S1, pulse signal processing: integrating, in the time domain, the pulses asynchronously fired by the pulse camera to obtain pulse reconstructed images, and performing optical flow estimation; S2, color image preprocessing: setting a color industrial camera to a cyclically alternating exposure mode with different exposure times, performing brightness correction on the images, deblurring the blurred images to obtain a group of latent images, and reconstructing a high-frame-rate color sequence from the latent images to obtain blur-free images; S3, pulse-guided frame interpolation and fusion: using a fusion and frame-interpolation module based on a recurrent convolutional neural network to colorize the pulse reconstructed images obtained in step S1 while interpolating frames between the blur-free images obtained in step S2, and outputting a color HDR video frame by frame. The invention effectively improves the quality and frame rate of the fused images and realizes high-frame-rate, high-dynamic-range imaging.

Description

Method for reconstructing high-frame-rate high-dynamic-range video by fusing pulse camera with common camera
Technical Field
The invention relates to the technical field of video generation, in particular to a method for reconstructing a high-frame-rate high-dynamic-range video by fusing a pulse camera with a common camera.
Background
How to capture high-frame-rate, high-dynamic-range color video of high-speed scenes is a problem that the low-level imaging field has yet to solve satisfactorily. It requires the camera system to combine a high frame rate, high resolution, high dynamic range, and low blur. Commercial cameras based on conventional image frames, however, still suffer from large data redundancy and low dynamic range in high-speed scenarios. Consecutive multi-bit video frames carry a large amount of redundancy, and transmission bandwidth is limited: when a camera reads out at a high frame rate, the resolution of the video frames drops sharply. On-chip storage alleviates this to some extent, but the data can then no longer be read out in real time. A high frame rate also necessarily forces short exposures owing to the frame-rate/shutter limit, so the camera perceives low-light scenes poorly; increasing the shutter time lowers the frame rate and introduces motion blur and inter-frame information jumps. These defects of traditional cameras in high-speed scenes make it difficult to meet practical demands in urgent application fields and in major frontier scientific exploration.
The dynamic range visible to the human eye is approximately 10000:1, whereas the dynamic range of an ordinary camera reaches only about 1000:1, so the dynamic range a single photograph can capture is extremely limited. To expose bright regions correctly, an exposure time that is too short darkens the low-brightness parts of the scene and introduces noise; conversely, to expose dark regions correctly, an exposure time that is too long overexposes the high-brightness parts and loses detail. Fusing alternately exposed LDR video frames is the dominant method of obtaining HDR video, but constrained by the long exposures, the frame rate is typically limited to tens of FPS, so scenes moving at high speed cannot be captured.
Classical high-dynamic-range imaging has two main approaches. One builds dedicated equipment to capture a wider dynamic range: a complex optical path inside the camera splits the light onto multiple photosensitive devices, and several groups of LDR videos acquired with different exposure parameters are synthesized into HDR. The other cyclically alternates a single camera between different exposure durations, maps the images to a linear space, aligns any group of adjacent alternately exposed images with pixel-level motion alignment, computes fusion parameters, and weights each pixel of each image to obtain the HDR video frames.
With the development of deep learning, using the ability of neural networks to model implicit data distributions to solve low-level vision problems has become mainstream in recent years, producing a series of inverse-tone-mapping high-dynamic-range imaging methods with purpose-designed network structures. Most of these methods train a neural network on multiple LDR images to compute an optical flow field, obtaining alignment parameters between video frames, and then use a fusion network to produce the HDR video frames. Compared with acquiring HDR video with dedicated equipment, they capture data with lighter devices, reducing the hardware cost of shooting. However, the process of fusing alternately exposed LDR video frames is very complex: the output easily exhibits ghosting artifacts, the frame rate is hard to raise substantially, nonlinear scene motion makes the deblurring of long-exposure images inaccurate, and the dependence of deep learning on training data makes performance unstable when testing on scenes not covered by the training set.
Disclosure of Invention
Aiming at the problems of the existing high-dynamic-range video generation techniques, such as limited frame-rate improvement and image blur, the invention provides a method for reconstructing high-frame-rate high-dynamic-range video by fusing a pulse camera with a common camera, which successfully combines the advantages of the high-frame-rate, low-resolution pulse signal with those of the low-frame-rate, high-resolution color camera signal to obtain high-quality, high-dynamic-range color video.
In order to achieve the above object, the present invention provides the following technical solutions:
The invention provides a method for reconstructing high-frame-rate high-dynamic-range video by fusing a pulse camera with a common camera, comprising the following steps:
S1, pulse signal processing: integrating, in the time domain, the pulses asynchronously fired by the pulse camera to obtain pulse reconstructed images, and computing the pulse optical flow field;
S2, color image preprocessing: setting a color industrial camera to a cyclically alternating exposure mode with different exposure times, performing brightness correction on the images, deblurring the blurred images to obtain a group of latent images, and reconstructing a high-frame-rate color sequence from the latent images to obtain blur-free images;
S3, pulse-guided frame interpolation and fusion: using a fusion and frame-interpolation module based on a recurrent convolutional neural network to colorize the pulse reconstructed images obtained in step S1 while interpolating frames between the blur-free images obtained in step S2, and outputting a color HDR video frame by frame.
Further, in step S1, the pulse camera's continuously fired pulses are represented in the time domain as:
S(x,y)={s(x,y,t)}
where {·} denotes a set, (x, y) are the spatial coordinates and t is the temporal coordinate; when a pixel's photon integral reaches the set threshold, s takes the value 1 and the integrator is reset to zero to await the next integration cycle.
Further, in step S1, motion field information included in the pulse stream is obtained by image reconstruction, expressed as:
$$I_i(x,y)=\frac{1}{t_f}\sum_{t=t_i-t_f/2}^{t_i+t_f/2} s(x,y,t)$$
wherein $I_i$ is the pulse reconstructed image corresponding to time $t_i$ and $t_f$ is the integration time window.
Further, in step S2, the color industrial camera's shutter time is set to a cyclically alternating exposure mode of 1 ms, 4 ms, and 12 ms, and the frame rate is set to 60 FPS.
Further, in step S2, the SG-deblur module is used to deblur the images, and the relationship between the blurred image and the latent images is:
$$B_{\mathrm{blur}} \le \frac{1}{N}\sum_{i=1}^{N} B_i$$
wherein $B_{\mathrm{blur}}$ is the blurred image, $N$ is the number of latent images to be solved, and $B_i$ is the $i$-th latent image;
and a 1000 FPS color high-frame-rate sequence is reconstructed through the SG-interpolation module to obtain blur-free images.
Further, in step S3, super-resolution is performed on the pulse reconstructed images so that their spatial resolution matches that of the color images.
Further, in step S3, the PixelShuffle module P(·) performs feature extraction and spatial upsampling on the pulse reconstructed image, an encoder based on a convolutional neural network extracts features from the color image, the features extracted from the pulse reconstructed image are fused with those of the color image, a ConvLSTM-based recurrent convolution operation retains the temporal information, and finally a CNN-based decoder produces the color image $C_i$ corresponding to the pulse reconstructed image $I_i$.
Further, the neural network of step S3 is trained with synthetic data, which is synthesized as follows:
a) Set the frame rate of the color industrial camera to 80 FPS and shoot video sequences of low-speed moving scenes with exposure times of 1 ms, 4 ms, and 12 ms in turn, each group of video containing 303 LDR images; 160 groups of training data and 40 groups of test data are shot in total;
b) Compress the color images along the time dimension: every 3 color images are combined into one group and synthesized into 1 gold-standard HDR image, so each group yields 101 HDR images, equivalent to 0.101 seconds of high-speed video;
c) Sample from the synthesized 0.101-second video at 60 FPS: collect 3 1ms image frames, taken from the 1ms exposure images of the 1st, 50th, and 101st groups; collect 2 4ms image frames, the first averaged from the 4ms images of the 17th to 20th groups and the second from the 4ms images of the 67th to 70th groups; collect 2 12ms image frames, the first averaged from the 4 12ms images of the 21st to 24th groups and the second from the 4 12ms images of the 71st to 74th groups;
d) Convert the 101 color images into gray-scale images, downsample them to 400×250 resolution, and synthesize pulse data according to the pulse firing principle.
Further, the neural network of step S3 comprises two SG-deblur modules, an SG-interpolation module, and a fusion and frame-interpolation module; the SG-deblur modules deblur the long-exposure images, the SG-interpolation module obtains the motion optical-flow field from the spike-camera optical-flow module and interpolates frames between the deblurred color images, and the fusion and frame-interpolation module fuses the interpolated color images with the pulse reconstructed images.
Further, the neural network of step S3 is trained as follows:
a) The long-exposure image $B_{\mathrm{blur}}$ and the corresponding N pulse reconstructed images $\{I_j\}$ are concatenated into a tensor of dimension (b, c+N, H, W), where b is the number of images per training batch, c is the number of color channels, and (H, W) is the image size; this tensor is input to the SG-deblur module and features are extracted by the CNN-based encoder, the convolution operation of each layer being:
$$y = f\big(\mathrm{IN}(k_w \otimes x + b)\big)$$
wherein $k_w$ is a convolution kernel with window size $w$, $d$ is the number of channels of the output tensor, $\otimes$ denotes the convolution operation, $x$ is the input tensor or the output of the previous feature convolution layer, $b$ is a bias term, IN(·) is a batch normalization operation that normalizes the current tensor, $f$ is a linear rectification function, and $y$ is the tensor with channel number $d$, i.e., the features extracted by the layer. The PixelShuffle layer then reduces the spatial size of the features to 1/2 and increases the number of channels to 2 times, residual dense blocks further extract features, and finally a CNN-based decoder forms the blur-free images; after the deblurring operation, each 4ms-exposure image yields 4 blur-free images and each 12ms-exposure image yields 12, raising the color video frame rate from the original 60 FPS to 340 FPS;
b) The 340 FPS image sequence obtained in step a) is interpolated to 1000 FPS by the following algorithm: for any time $t_i$, query forward and backward for the two color frames $B^{+}$ and $B^{-}$ nearest to $t_i$, compute the optical flow fields from time $t_i$ to each of the two frames, and warp $B^{+}$ and $B^{-}$ with these optical flow fields;
c) Five pulse reconstructed images are input to the fusion and frame-interpolation module at a time; the module first uses PixelShuffle to enlarge the spatial size of the features to 2 times, then a CNN-based encoding network extracts high-dimensional features from them, while another group of CNN-based encoders extracts two groups of features from $B^{+}$ and $B^{-}$ respectively; a multi-scale fusion strategy fuses the features from the pulse images and the color images, and finally a CNN-based decoder produces the optimized color image $C_i$;
d) The color image $C_i$ output by the neural network and the real HDR image $G_i$ are compressed in dynamic range using the formula
$$T(x)=\frac{\log(1+\mu x)}{\log(1+\mu)}$$
wherein $\mu$ denotes the degree of compression and is set to 5000; after dynamic range compression the L1 loss between the two maps is computed as $\mathcal{L}_{1}=\lVert T(C_i)-T(G_i)\rVert_1$, the SSIM loss as $\mathcal{L}_{\mathrm{SSIM}}=1-\mathrm{SSIM}\big(T(C_i),T(G_i)\big)$, and the LPIPS loss as $\mathcal{L}_{\mathrm{LPIPS}}=\mathrm{LPIPS}\big(T(C_i),T(G_i)\big)$; the total loss is $\mathcal{L}=\beta_1\mathcal{L}_{1}+\beta_2\mathcal{L}_{\mathrm{SSIM}}+\beta_3\mathcal{L}_{\mathrm{LPIPS}}$, wherein the loss weights $\beta_1=1$ and $\beta_2=1$; the weights of all layers of the SG-deblur module, the SG-interpolation module, and the fusion and frame-interpolation module are then continuously updated by a back propagation algorithm.
Compared with the prior art, the invention has the beneficial effects that:
according to the method for reconstructing the high-frame-rate high-dynamic-range video by fusing the pulse camera with the common camera, disclosed by the invention, the frame rate of reconstructing the HDR from the LDR video of the single color camera is greatly improved by fusing a plurality of alternately exposed LDR images with the output of the pulse camera, and the high-frame-rate high-dynamic-range imaging is realized. Meanwhile, a deep learning method is used, a network module is designed independently aiming at the difference of the LDR color image blurring problem and the inter-frame color missing problem in each aspect, and corresponding fusion flow steps are provided. In addition, the invention reserves rich time domain information by utilizing the high-speed pulse signal, greatly reduces the difficulty of resolving the blurring of the color image, can more accurately solve the nonlinear motion and enhances the robustness of the algorithm.
Drawings
In order to more clearly illustrate the embodiments of the present application or the technical solutions in the prior art, the drawings needed in the embodiments are briefly introduced below. Obviously, the drawings described below are only some embodiments of the present invention; a person of ordinary skill in the art could obtain other drawings from them without creative effort.
Fig. 1 is a flowchart of a method for reconstructing a high-frame-rate high-dynamic-range video by fusing a pulse camera with a common camera.
Fig. 2 is a network structure diagram of a method for reconstructing a high-frame-rate high-dynamic-range video by fusing a pulse camera with a common camera.
Fig. 3 is a flowchart of a method according to embodiment 1 of the present invention.
Detailed Description
In order to better understand the technical solution, it is clearly and completely described below with reference to the accompanying drawings of the embodiments of the present invention. Obviously, the described examples are only some, not all, of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of the present invention without creative effort fall within the scope of the present invention.
The method provided by the invention for reconstructing high-frame-rate high-dynamic-range video by fusing a pulse camera with a common camera, as shown in fig. 1, comprises three stages: pulse signal preprocessing (a in fig. 1), color image preprocessing (c in fig. 1), and pulse-guided frame interpolation and fusion (b in fig. 1).
S1, pulse signal processing: the pulses asynchronously fired by the pulse camera are integrated in the time domain to obtain pulse reconstructed images, and the pulse optical flow field is computed.
Specifically, the pulse camera's continuously fired pulses are represented in the time domain as:
S(x,y)={s(x,y,t)}
where {·} denotes a set, (x, y) are the spatial coordinates and t is the temporal coordinate; when a pixel's photon integral reaches the set threshold, s takes the value 1 and the integrator is reset to zero to await the next integration cycle.
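For illustration, a minimal NumPy sketch of this integrate-and-fire model follows (the threshold value and the dense (T, H, W) input format are assumptions of the example, not parameters disclosed by the invention):

```python
import numpy as np

def simulate_spikes(light, threshold=2.0):
    """Integrate-and-fire simulation of a pulse (spike) camera.

    light: array of shape (T, H, W) -- per-timestep incident intensity.
    Returns the binary spike stream s(x, y, t) of the same shape: each
    pixel accumulates intensity, and when the accumulator reaches the
    threshold it fires (s = 1) and is reset to zero.
    """
    T, H, W = light.shape
    acc = np.zeros((H, W), dtype=np.float64)
    spikes = np.zeros((T, H, W), dtype=np.uint8)
    for t in range(T):
        acc += light[t]
        fired = acc >= threshold
        spikes[t][fired] = 1
        acc[fired] = 0.0  # reset and wait for the next integration cycle
    return spikes
```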
The pulse stream cannot be viewed directly by the human eye; the motion-field information it contains is obtained by image reconstruction, expressed as:
$$I_i(x,y)=\frac{1}{t_f}\sum_{t=t_i-t_f/2}^{t_i+t_f/2} s(x,y,t)$$
wherein $I_i$ is the pulse reconstructed image corresponding to time $t_i$ and $t_f$ is the integration time window.
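A corresponding sketch of the windowed reconstruction: averaging the spike stream over a window of $t_f$ timesteps around $t_i$ approximates scene brightness, since the firing rate is proportional to the incident light (the centered window is this example's choice):

```python
import numpy as np

def reconstruct_image(spikes, t_i, t_f):
    """Windowed reconstruction of I_i from a binary spike stream.

    spikes: array (T, H, W); t_i: center timestep; t_f: window length.
    The mean firing rate within the window is proportional to brightness.
    """
    lo = max(0, t_i - t_f // 2)
    hi = min(spikes.shape[0], t_i + t_f // 2 + 1)
    return spikes[lo:hi].astype(np.float32).mean(axis=0)
```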
S2, color image preprocessing: a color industrial camera is set to a cyclically alternating exposure mode with different exposure times, brightness correction is performed on the images, the blurred images are deblurred to obtain a group of latent images, and a high-frame-rate color sequence is reconstructed from the latent images to obtain blur-free images.
Specifically, the color industrial camera's shutter time is set to a cyclically alternating exposure mode of 1 ms, 4 ms, and 12 ms, and the frame rate is set to 60 FPS.
In the absence of overexposure, a pixel's response value is proportional to the exposure time, so image frames shot at different exposure times have different brightnesses and must be brightness-corrected to a common scale.
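A minimal sketch of this correction, assuming a linear sensor response and an arbitrarily chosen reference exposure (saturated pixels cannot be recovered by such scaling):

```python
import numpy as np

def brightness_correct(frame, exposure_ms, ref_ms=4.0):
    """Scale a linear-response LDR frame to a common exposure level.

    Valid only for unsaturated pixels: overexposed values are clipped
    by the sensor and carry no recoverable brightness information.
    """
    return frame.astype(np.float32) * (ref_ms / exposure_ms)
```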
When the shutter time is long, motion displacement blurs the image, so the long-exposure images must be deblurred to obtain a group of latent images. The SG-deblur module is used for this, and the relationship between the blurred image and the latent images is:
$$B_{\mathrm{blur}} \le \frac{1}{N}\sum_{i=1}^{N} B_i$$
wherein $B_{\mathrm{blur}}$ is the blurred image, $N$ is the number of latent images to be solved, and $B_i$ is the $i$-th latent image. Because the blurred image may contain overexposed regions, the relationship between the blurred image and the latent images is an inequality.
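This blur model can be made concrete: a long exposure averages the N latent frames and then clips at the sensor's saturation level, which is exactly why equality holds only in unsaturated regions. A sketch under that linear model:

```python
import numpy as np

def synthesize_blur(latent_frames, saturation=1.0):
    """Form a blurred LDR frame from N latent frames.

    latent_frames: array (N, H, W[, 3]) of linear-domain latent images.
    Averaging models motion blur over the exposure; clipping models
    overexposure, making B_blur <= mean(B_i) pixelwise.
    """
    mean = latent_frames.astype(np.float32).mean(axis=0)
    return np.minimum(mean, saturation)
```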
A 1000 FPS color high-frame-rate sequence is then reconstructed through the SG-interpolation module to obtain blur-free images.
S3, pulse-guided frame interpolation and fusion: a fusion and frame-interpolation module based on a recurrent convolutional neural network colorizes the pulse reconstructed images obtained in step S1 while interpolating frames between the blur-free images obtained in step S2, and outputs a color HDR video frame by frame.
Specifically, the resolution of the pulse reconstructed images is 400×250 while that of the color images is 800×500, so the pulse reconstructed images must be super-resolved to match the spatial resolution of the color images.
The PixelShuffle module P(·) performs feature extraction and spatial upsampling on the pulse reconstructed image; an encoder based on a convolutional neural network is designed to extract features from the color image; the features extracted from the pulse reconstructed image are fused with those of the color image; a ConvLSTM-based recurrent convolution operation retains the temporal information; and finally a CNN-based decoder produces the color image $C_i$ corresponding to the pulse reconstructed image $I_i$:
$$C_i=\mathcal{R}\big(P(I_i),\,B^{+},\,B^{-}\big)$$
wherein $\mathcal{R}(\cdot)$ is the fusion and frame-interpolation module based on a recurrent convolutional neural network.
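For illustration, the following is a compact PyTorch sketch of the fusion pipeline just described: PixelShuffle upsampling of spike features, a small CNN encoder for the two warped neighbor color frames, concatenation-based fusion, a ConvLSTM cell carrying temporal state, and a CNN decoder. All layer widths, kernel sizes, and the ConvLSTM gate layout here are illustrative assumptions; the patent does not disclose these hyperparameters.

```python
import torch
import torch.nn as nn

class ConvLSTMCell(nn.Module):
    """Minimal ConvLSTM cell (gate layout follows Shi et al., 2015)."""
    def __init__(self, ch):
        super().__init__()
        self.gates = nn.Conv2d(2 * ch, 4 * ch, 3, padding=1)

    def forward(self, x, state):
        h, c = state
        i, f, o, g = torch.chunk(self.gates(torch.cat([x, h], dim=1)), 4, dim=1)
        c = torch.sigmoid(f) * c + torch.sigmoid(i) * torch.tanh(g)
        h = torch.sigmoid(o) * torch.tanh(c)
        return h, (h, c)

class FusionInterpolationNet(nn.Module):
    """Colorize a spike-reconstructed frame guided by two neighbor color frames."""
    def __init__(self, ch=32):
        super().__init__()
        # Spike branch: lift 1 channel to 4*ch, then PixelShuffle for the 2x
        # spatial upsampling (matching 400x250 -> 800x500 described above).
        self.spike_feat = nn.Sequential(
            nn.Conv2d(1, 4 * ch, 3, padding=1), nn.ReLU(), nn.PixelShuffle(2))
        # Color branch: encode the two warped neighbor frames (2 x 3 channels).
        self.color_feat = nn.Sequential(nn.Conv2d(6, ch, 3, padding=1), nn.ReLU())
        self.fuse = nn.Conv2d(2 * ch, ch, 3, padding=1)
        self.lstm = ConvLSTMCell(ch)           # retains temporal information
        self.decode = nn.Conv2d(ch, 3, 3, padding=1)

    def forward(self, spike_img, color_pair, state):
        s = self.spike_feat(spike_img)         # (B, ch, 2H, 2W)
        c = self.color_feat(color_pair)        # (B, ch, 2H, 2W)
        x = torch.relu(self.fuse(torch.cat([s, c], dim=1)))
        h, state = self.lstm(x, state)
        return self.decode(h), state           # color frame C_i, carried state
```

Usage: initialize `state = (torch.zeros(B, 32, 2*H, 2*W), torch.zeros(B, 32, 2*H, 2*W))` and feed the spike frames sequentially, so that temporal information is carried across time steps as described.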
As shown in fig. 2, the recurrent convolutional neural network designed by the invention comprises two SG-deblur modules, an SG-interpolation module, and a fusion and frame-interpolation module; the SG-deblur modules deblur the long-exposure images, the SG-interpolation module obtains the motion optical-flow field from the pulse optical flow (spike camera optical flow) module and interpolates frames between the deblurred color images, and the fusion and frame-interpolation module fuses the interpolated color images with the pulse reconstructed images.
As shown in fig. 2, the neural network of step S3 is trained with synthetic data, which is synthesized as follows:
a) Set the frame rate of the color industrial camera to 80 FPS and shoot video sequences of low-speed moving scenes with exposure times of 1 ms, 4 ms, and 12 ms in turn, each group of video containing 303 LDR images; 160 groups of training data and 40 groups of test data are shot in total;
b) Compress the color images along the time dimension: every 3 color images are combined into one group and synthesized into 1 gold-standard HDR image, so each group yields 101 HDR images, equivalent to 0.101 seconds of high-speed video;
c) Sample from the synthesized 0.101-second video at 60 FPS (see the sketch after this list): collect 3 1ms image frames, taken from the 1ms exposure images of the 1st, 50th, and 101st groups; collect 2 4ms image frames, the first averaged from the 4ms images of the 17th to 20th groups and the second from the 4ms images of the 67th to 70th groups; collect 2 12ms image frames, the first averaged from the 4 12ms images of the 21st to 24th groups and the second from the 4 12ms images of the 71st to 74th groups;
d) Convert the 101 color images into gray-scale images, downsample them to 400×250 resolution, and synthesize pulse data according to the pulse firing principle.
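A sketch of the sampling schedule of step c), operating on the 101 LDR triplets of one group (mapping the text's 1-based group indices to 0-based array indices is this example's reading):

```python
import numpy as np

def sample_alternating_exposures(ldr_1ms, ldr_4ms, ldr_12ms):
    """Simulate one 60 FPS alternating-exposure cycle from 101 LDR triplets.

    Each input: array (101, H, W, 3), one exposure per group.
    Short (1 ms) frames are single samples; 4 ms / 12 ms frames are
    averages over 4 consecutive groups, modeling motion blur.
    """
    ms1 = [ldr_1ms[0], ldr_1ms[49], ldr_1ms[100]]               # groups 1, 50, 101
    ms4 = [ldr_4ms[16:20].mean(0), ldr_4ms[66:70].mean(0)]      # 17-20, 67-70
    ms12 = [ldr_12ms[20:24].mean(0), ldr_12ms[70:74].mean(0)]   # 21-24, 71-74
    return ms1, ms4, ms12
```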
The training steps of the neural network of the invention are as follows:
a) The long-exposure image $B_{\mathrm{blur}}$ and the corresponding N pulse reconstructed images $\{I_j\}$ are concatenated into a tensor of dimension (b, c+N, H, W), where b is the number of images per training batch, c is the number of color channels, and (H, W) is the image size; this tensor is input to the SG-deblur module and features are extracted by the CNN-based encoder, the convolution operation of each layer being:
$$y = f\big(\mathrm{IN}(k_w \otimes x + b)\big)$$
wherein $k_w$ is a convolution kernel with window size $w$, $d$ is the number of channels of the output tensor, $\otimes$ denotes the convolution operation, $x$ is the input tensor or the output of the previous feature convolution layer, $b$ is a bias term, IN(·) is a batch normalization operation that normalizes the current tensor, $f$ is a linear rectification function (Rectified Linear Unit, ReLU), and $y$ is the tensor with channel number $d$, i.e., the features extracted by the layer. The PixelShuffle layer then reduces the spatial size of the features to 1/2 and increases the number of channels to 2 times, residual dense blocks further extract features, and finally a CNN-based decoder forms the blur-free images; after the deblurring operation, each 4ms-exposure image yields 4 blur-free images and each 12ms-exposure image yields 12, raising the color video frame rate from the original 60 FPS to (1+4+12)×20 = 340 FPS (one such encoder stage is sketched below);
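As a sketch, one such encoder stage could be written in PyTorch as follows. The text's IN(·) is instantiated here as InstanceNorm2d (reading "IN" as instance normalization is an assumption; batch normalization is the alternative reading), and the shuffle-based 1/2 downsampling is realized with PixelUnshuffle plus a 1×1 convolution to obtain the described 2× channel count:

```python
import torch.nn as nn

class EncoderLayer(nn.Module):
    """One encoder stage: y = f(IN(k (*) x + b)), then shuffle downsampling."""
    def __init__(self, in_ch, out_ch, w=3):
        super().__init__()
        self.conv = nn.Conv2d(in_ch, out_ch, kernel_size=w, padding=w // 2)
        self.norm = nn.InstanceNorm2d(out_ch)  # assumed reading of IN(.)
        self.act = nn.ReLU(inplace=True)       # linear rectification function f
        # PixelUnshuffle halves H and W and multiplies channels by 4; the 1x1
        # convolution brings the channel count back down to the described 2x.
        self.down = nn.Sequential(
            nn.PixelUnshuffle(2), nn.Conv2d(4 * out_ch, 2 * out_ch, 1))

    def forward(self, x):
        return self.down(self.act(self.norm(self.conv(x))))
```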
b) The pulse optical flow module interpolates the 340 FPS image sequence obtained in step a) to 1000 FPS by the following algorithm: for any time $t_i$, query forward and backward for the two color frames $B^{+}$ and $B^{-}$ nearest to $t_i$, compute the optical flow fields from time $t_i$ to each of the two frames, and warp $B^{+}$ and $B^{-}$ with these optical flow fields (a warping sketch follows);
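The deformation (warping) operation in step b) can be sketched with standard backward warping via grid_sample: each output pixel at time $t_i$ samples the neighbor frame at its flow-displaced location. This is a common formulation assumed for illustration; the patent computes the flow itself with the spike optical-flow module, which is not shown here:

```python
import torch
import torch.nn.functional as F

def warp(frame, flow):
    """Backward-warp `frame` (B, C, H, W) with `flow` (B, 2, H, W).

    flow[:, 0] / flow[:, 1] are x / y displacements in pixels from the
    target time t_i to the source frame; output is the frame seen at t_i.
    """
    B, _, H, W = frame.shape
    ys, xs = torch.meshgrid(torch.arange(H, dtype=torch.float32),
                            torch.arange(W, dtype=torch.float32),
                            indexing="ij")
    grid_x = xs.unsqueeze(0) + flow[:, 0]
    grid_y = ys.unsqueeze(0) + flow[:, 1]
    # grid_sample expects coordinates normalized to [-1, 1]
    grid = torch.stack([2 * grid_x / (W - 1) - 1,
                        2 * grid_y / (H - 1) - 1], dim=-1)
    return F.grid_sample(frame, grid, align_corners=True)
```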
c) Five pulse reconstructed images are input to the fusion and frame-interpolation module at a time; the module first uses PixelShuffle to enlarge the spatial size of the features to 2 times, then a CNN-based encoding network extracts high-dimensional features from them, while another group of CNN-based encoders extracts two groups of features from $B^{+}$ and $B^{-}$ respectively; a multi-scale fusion strategy fuses the features from the pulse images and the color images, and finally a CNN-based decoder produces the optimized color image $C_i$;
d) The color image $C_i$ output by the neural network and the real HDR image $G_i$ are compressed in dynamic range using the formula
$$T(x)=\frac{\log(1+\mu x)}{\log(1+\mu)}$$
wherein $\mu$ denotes the degree of compression and is set to 5000 in this embodiment; after dynamic range compression the L1 loss between the two maps is computed as $\mathcal{L}_{1}=\lVert T(C_i)-T(G_i)\rVert_1$, the SSIM loss as $\mathcal{L}_{\mathrm{SSIM}}=1-\mathrm{SSIM}\big(T(C_i),T(G_i)\big)$, and the LPIPS loss as $\mathcal{L}_{\mathrm{LPIPS}}=\mathrm{LPIPS}\big(T(C_i),T(G_i)\big)$; the total loss is $\mathcal{L}=\beta_1\mathcal{L}_{1}+\beta_2\mathcal{L}_{\mathrm{SSIM}}+\beta_3\mathcal{L}_{\mathrm{LPIPS}}$, wherein the loss weights $\beta_1=1$ and $\beta_2=1$; the weights of all layers of the SG-deblur module, the SG-interpolation module, and the fusion and frame-interpolation module are then continuously updated by a back propagation algorithm.
Example 1
The method for reconstructing the high-frame-rate high-dynamic-range video by fusing the pulse camera with the common camera, as shown in fig. 3, comprises the following steps:
a) Building a hybrid camera system: the hybrid camera system is tested with a common RGB industrial camera (model Basler acA800-510uc) and a pulse camera; the two cameras use the same HIKROBOT lens, and a beam splitter divides the incident light into two beams that enter the fields of view of the two camera sensors synchronously.
b) HDR gray-scale image reconstruction from the pulse camera: the pulse data are integrated with a sliding-window method to obtain gray-scale images, with a window size of 1 ms and a resolution of 400×250.
c) The alternately exposed LDR images shot by the common RGB industrial camera and the HDR gray-scale images reconstructed from the pulse camera are input into the trained neural network: the color images are first deblurred by the SG-deblur module, a 1000 FPS color high-frame-rate sequence is reconstructed by the SG-interpolation module, and finally the fusion and frame-interpolation module fuses the color images with the pulse reconstructed images and outputs the final high-resolution, high-frame-rate color HDR video.
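Putting the pieces of this embodiment together, the inference loop might be organized as follows. The names brightness_correct and reconstruct_image refer to the sketches earlier in this description; deblur, interpolate, and fuse stand for the three trained modules, passed as callables; and the 40 spikes-per-millisecond figure assumes a 40 kHz pulse camera. All of these are illustrative assumptions rather than disclosed interfaces:

```python
def reconstruct_hdr_video(color_frames, exposures_ms, spikes,
                          deblur, interpolate, fuse, spikes_per_ms=40):
    """Orchestration sketch of the embodiment's inference pipeline."""
    # Bring all alternately exposed LDR frames to a common brightness scale.
    corrected = [brightness_correct(f, e)
                 for f, e in zip(color_frames, exposures_ms)]
    # 1 ms frames are already sharp; longer exposures split into latent frames.
    latent = []
    for f, e in zip(corrected, exposures_ms):
        latent.extend([f] if e <= 1.0 else deblur(f, spikes))
    # SG-interpolation: ~340 FPS latent sequence -> 1000 FPS blur-free sequence.
    dense = interpolate(latent, spikes)
    # Sliding-window gray reconstruction (1 ms windows), then per-frame fusion.
    window = spikes_per_ms
    grays = [reconstruct_image(spikes, t, window)
             for t in range(window // 2, spikes.shape[0], window)]
    return [fuse(g, dense) for g in grays]  # one color HDR frame per millisecond
```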
The foregoing description is only of the preferred embodiments of the present invention and is not intended to limit the scope of the present invention. Any modification, equivalent replacement, improvement, etc. made within the spirit and principle of the present invention are included in the protection scope of the present invention.

Claims (10)

1. A method for reconstructing high-frame-rate high-dynamic-range video by fusing a pulse camera with a common camera, characterized by comprising the following steps:
S1, pulse signal processing: integrating, in the time domain, the pulses asynchronously fired by the pulse camera to obtain pulse reconstructed images, and computing the pulse optical flow field;
S2, color image preprocessing: setting a color industrial camera to a cyclically alternating exposure mode with different exposure times, performing brightness correction on the images, deblurring the blurred images to obtain a group of latent images, and reconstructing a high-frame-rate color sequence from the latent images to obtain blur-free images;
S3, pulse-guided frame interpolation and fusion: using a fusion and frame-interpolation module based on a recurrent convolutional neural network to colorize the pulse reconstructed images obtained in step S1 while interpolating frames between the blur-free images obtained in step S2, and outputting a color HDR video frame by frame.
2. The method for reconstructing high-frame-rate high-dynamic-range video by fusing a pulse camera with a common camera according to claim 1, wherein in step S1 the pulse camera's continuously fired pulses are represented in the time domain as:
S(x,y)={s(x,y,t)}
where {·} denotes a set, (x, y) are the spatial coordinates and t is the temporal coordinate; when a pixel's photon integral reaches the set threshold, s takes the value 1 and the integrator is reset to zero to await the next integration cycle.
3. The method for reconstructing high-frame-rate high-dynamic-range video by fusing a pulse camera with a common camera according to claim 1, wherein in step S1 the motion field information contained in the pulse stream is obtained by image reconstruction, expressed as:
$$I_i(x,y)=\frac{1}{t_f}\sum_{t=t_i-t_f/2}^{t_i+t_f/2} s(x,y,t)$$
wherein $I_i$ is the pulse reconstructed image corresponding to time $t_i$ and $t_f$ is the integration time window.
4. The method for reconstructing high-frame-rate high-dynamic-range video by fusing a pulse camera with a common camera according to claim 1, wherein in step S2 the color industrial camera's shutter time is set to a cyclically alternating exposure mode of 1 ms, 4 ms, and 12 ms, and the frame rate is set to 60 FPS.
5. The method for reconstructing high-frame-rate high-dynamic-range video by fusing a pulse camera with a common camera according to claim 1, wherein in step S2 an SG-deblur module is used to deblur the images, the relationship between the blurred image and the latent images being:
$$B_{\mathrm{blur}} \le \frac{1}{N}\sum_{i=1}^{N} B_i$$
wherein $B_{\mathrm{blur}}$ is the blurred image, $N$ is the number of latent images to be solved, and $B_i$ is the $i$-th latent image;
and a 1000 FPS color high-frame-rate sequence is reconstructed through the SG-interpolation module to obtain blur-free images.
6. The method of claim 1, wherein in step S3 super-resolution is performed on the pulse reconstructed image so that its spatial resolution matches that of the color image.
7. The method for reconstructing high-frame-rate high-dynamic-range video by fusing a pulse camera with a common camera according to claim 1, wherein in step S3 a PixelShuffle module P(·) performs feature extraction and spatial upsampling on the pulse reconstructed image, a convolutional-neural-network-based encoder extracts features from the color image, the features extracted from the pulse reconstructed image are fused with those of the color image, a ConvLSTM-based recurrent convolution operation retains the temporal information, and finally a CNN-based decoder produces the color image $C_i$ corresponding to the pulse reconstructed image $I_i$.
8. The method for reconstructing high-frame-rate high-dynamic-range video by fusing a pulse camera with a common camera according to claim 1, wherein the neural network of step S3 is trained with synthetic data, which is synthesized as follows:
a) Set the frame rate of the color industrial camera to 80 FPS and shoot video sequences of low-speed moving scenes with exposure times of 1 ms, 4 ms, and 12 ms in turn, each group of video containing 303 LDR images; 160 groups of training data and 40 groups of test data are shot in total;
b) Compress the color images along the time dimension: every 3 color images are combined into one group and synthesized into 1 gold-standard HDR image, so each group yields 101 HDR images, equivalent to 0.101 seconds of high-speed video;
c) Sample from the synthesized 0.101-second video at 60 FPS: collect 3 1ms image frames, taken from the 1ms exposure images of the 1st, 50th, and 101st groups; collect 2 4ms image frames, the first averaged from the 4ms images of the 17th to 20th groups and the second from the 4ms images of the 67th to 70th groups; collect 2 12ms image frames, the first averaged from the 4 12ms images of the 21st to 24th groups and the second from the 4 12ms images of the 71st to 74th groups;
d) Convert the 101 color images into gray-scale images, downsample them to 400×250 resolution, and synthesize pulse data according to the pulse firing principle.
9. The method for reconstructing high-frame-rate high-dynamic-range video by fusing a pulse camera with a common camera according to claim 1, wherein the neural network of step S3 comprises two SG-deblur modules, an SG-interpolation module, and a fusion and frame-interpolation module; the SG-deblur modules deblur the long-exposure images, the SG-interpolation module obtains the motion optical-flow field from the spike-camera optical-flow module and interpolates frames between the deblurred color images, and the fusion and frame-interpolation module fuses the interpolated color images with the pulse reconstructed images.
10. The method for reconstructing a high frame rate high dynamic range video by a pulse camera fused with a normal camera according to claim 9, wherein the step of training the neural network of step S3 is as follows:
a) The long-exposure image $B_{\mathrm{blur}}$ and the corresponding N pulse reconstructed images $\{I_j\}$ are concatenated into a tensor of dimension (b, c+N, H, W), where b is the number of images per training batch, c is the number of color channels, and (H, W) is the image size; this tensor is input to the SG-deblur module and features are extracted by the CNN-based encoder, the convolution operation of each layer being:
$$y = f\big(\mathrm{IN}(k_w \otimes x + b)\big)$$
wherein $k_w$ is a convolution kernel with window size $w$, $d$ is the number of channels of the output tensor, $\otimes$ denotes the convolution operation, $x$ is the input tensor or the output of the previous feature convolution layer, $b$ is a bias term, IN(·) is a batch normalization operation that normalizes the current tensor, $f$ is a linear rectification function, and $y$ is the tensor with channel number $d$, i.e., the features extracted by the layer. The PixelShuffle layer then reduces the spatial size of the features to 1/2 and increases the number of channels to 2 times, residual dense blocks further extract features, and finally a CNN-based decoder forms the blur-free images; after the deblurring operation, each 4ms-exposure image yields 4 blur-free images and each 12ms-exposure image yields 12, raising the color video frame rate from the original 60 FPS to 340 FPS;
b) The 340 FPS image sequence obtained in step a) is interpolated to 1000 FPS by the following algorithm: for any time $t_i$, query forward and backward for the two color frames $B^{+}$ and $B^{-}$ nearest to $t_i$, compute the optical flow fields from time $t_i$ to each of the two frames, and warp $B^{+}$ and $B^{-}$ with these optical flow fields;
c) Five pulse reconstructed images are input to the fusion and frame-interpolation module at a time; the module first uses PixelShuffle to enlarge the spatial size of the features to 2 times, then a CNN-based encoding network extracts high-dimensional features from them, while another group of CNN-based encoders extracts two groups of features from $B^{+}$ and $B^{-}$ respectively; a multi-scale fusion strategy fuses the features from the pulse images and the color images, and finally a CNN-based decoder produces the optimized color image $C_i$;
d) The color image $C_i$ output by the neural network and the real HDR image $G_i$ are compressed in dynamic range using the formula
$$T(x)=\frac{\log(1+\mu x)}{\log(1+\mu)}$$
wherein $\mu$ denotes the degree of compression and is selected to be 5000; after dynamic range compression the L1 loss between the two maps is computed as $\mathcal{L}_{1}=\lVert T(C_i)-T(G_i)\rVert_1$, the SSIM loss as $\mathcal{L}_{\mathrm{SSIM}}=1-\mathrm{SSIM}\big(T(C_i),T(G_i)\big)$, and the LPIPS loss as $\mathcal{L}_{\mathrm{LPIPS}}=\mathrm{LPIPS}\big(T(C_i),T(G_i)\big)$; the total loss is $\mathcal{L}=\beta_1\mathcal{L}_{1}+\beta_2\mathcal{L}_{\mathrm{SSIM}}+\beta_3\mathcal{L}_{\mathrm{LPIPS}}$, wherein the loss weights $\beta_1=1$ and $\beta_2=1$; the weights of all layers of the SG-deblur module, the SG-interpolation module, and the fusion and frame-interpolation module are continuously updated by a back propagation algorithm.
CN202310448820.6A 2023-04-24 2023-04-24 Method for reconstructing high-frame-rate high-dynamic-range video by fusing pulse camera with common camera Active CN116389912B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310448820.6A CN116389912B (en) 2023-04-24 2023-04-24 Method for reconstructing high-frame-rate high-dynamic-range video by fusing pulse camera with common camera

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310448820.6A CN116389912B (en) 2023-04-24 2023-04-24 Method for reconstructing high-frame-rate high-dynamic-range video by fusing pulse camera with common camera

Publications (2)

Publication Number Publication Date
CN116389912A true CN116389912A (en) 2023-07-04
CN116389912B CN116389912B (en) 2023-10-10

Family

ID=86965669

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310448820.6A Active CN116389912B (en) 2023-04-24 2023-04-24 Method for reconstructing high-frame-rate high-dynamic-range video by fusing pulse camera with common camera

Country Status (1)

Country Link
CN (1) CN116389912B (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117745584A (en) * 2024-02-21 2024-03-22 北京大学 Image deblurring method and electronic equipment

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1370289A (en) * 1999-08-17 2002-09-18 应用视像系统有限公司 Improved dynamic range video camera, recording system, and recording method
CN109978808A (en) * 2019-04-25 2019-07-05 北京迈格威科技有限公司 A kind of method, apparatus and electronic equipment for image co-registration
CN111669514A (en) * 2020-06-08 2020-09-15 北京大学 High dynamic range imaging method and apparatus
CN113329146A (en) * 2021-04-25 2021-08-31 北京大学 Pulse camera simulation method and device
US20220156532A1 (en) * 2020-11-13 2022-05-19 Samsung Electronics Co., Ltd Fusing fbis & dvs data streams using a neural network
CN115883764A (en) * 2023-02-08 2023-03-31 吉林大学 Underwater high-speed video frame interpolation method and system based on data cooperation
CN115984124A (en) * 2022-11-29 2023-04-18 北京大学 Method and device for de-noising and super-resolution of neuromorphic pulse signals

Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1370289A (en) * 1999-08-17 2002-09-18 应用视像系统有限公司 Improved dynamic range video camera, recording system, and recording method
CN109978808A (en) * 2019-04-25 2019-07-05 北京迈格威科技有限公司 A kind of method, apparatus and electronic equipment for image co-registration
CN111669514A (en) * 2020-06-08 2020-09-15 北京大学 High dynamic range imaging method and apparatus
US20220156532A1 (en) * 2020-11-13 2022-05-19 Samsung Electronics Co., Ltd Fusing fbis & dvs data streams using a neural network
CN113329146A (en) * 2021-04-25 2021-08-31 北京大学 Pulse camera simulation method and device
CN115984124A (en) * 2022-11-29 2023-04-18 北京大学 Method and device for de-noising and super-resolution of neuromorphic pulse signals
CN115883764A (en) * 2023-02-08 2023-03-31 吉林大学 Underwater high-speed video frame interpolation method and system based on data cooperation

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
J. Han et al.: "Neuromorphic Camera Guided High Dynamic Range Imaging", 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 1727-1736 *
Y. Chang et al.: "1000 FPS HDR Video with a Spike-RGB Hybrid Camera", 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 1-14 *
Du Lin; Sun Huayan; Wang Shuai; Gao Yuxuan; Qi Yingying: "Research on high dynamic range image fusion algorithms for dynamic targets" (《针对动态目标的高动态范围图像融合算法研究》), Acta Optica Sinica (《光学学报》), pages 1-9 *
Huang Tiejun, Yu Zhaofei, Li Yuan, et al.: "Progress in spike vision" (《脉冲视觉研究进展》), Journal of Image and Graphics (《中国图象图形学报》), vol. 27, no. 6, pages 1823-1839 *

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117745584A (en) * 2024-02-21 2024-03-22 北京大学 Image deblurring method and electronic equipment

Also Published As

Publication number Publication date
CN116389912B (en) 2023-10-10

Similar Documents

Publication Publication Date Title
Mildenhall et al. Nerf in the dark: High dynamic range view synthesis from noisy raw images
US11721054B2 (en) Systems, methods, and media for high dynamic range quanta burst imaging
CN111986084B (en) Multi-camera low-illumination image quality enhancement method based on multi-task fusion
CN108055452A (en) Image processing method, device and equipment
CN107493432A (en) Image processing method, device, mobile terminal and computer-readable recording medium
CN108024054A (en) Image processing method, device and equipment
CN111986106A (en) High dynamic image reconstruction method based on neural network
CN110225260B (en) Three-dimensional high dynamic range imaging method based on generation countermeasure network
CN116389912B (en) Method for reconstructing high-frame-rate high-dynamic-range video by fusing pulse camera with common camera
CN114972061B (en) Method and system for denoising and enhancing dim light video
CN111724317A (en) Method for constructing Raw domain video denoising supervision data set
CN115393227B (en) Low-light full-color video image self-adaptive enhancement method and system based on deep learning
CN114245007A (en) High frame rate video synthesis method, device, equipment and storage medium
Yang et al. Learning event guided high dynamic range video reconstruction
CN115082341A (en) Low-light image enhancement method based on event camera
CN116563183A (en) High dynamic range image reconstruction method and system based on single RAW image
Fu et al. Low-light raw video denoising with a high-quality realistic motion dataset
Chang et al. 1000 fps hdr video with a spike-rgb hybrid camera
CN115115516A (en) Real-world video super-resolution algorithm based on Raw domain
Zou et al. Rawhdr: High dynamic range image reconstruction from a single raw image
US20230325974A1 (en) Image processing method, apparatus, and non-transitory computer-readable medium
CN117237207A (en) Ghost-free high dynamic range light field imaging method for dynamic scene
Suda et al. Deep snapshot hdr imaging using multi-exposure color filter array
CN116402908A (en) Dense light field image reconstruction method based on heterogeneous imaging
CN116208812A (en) Video frame inserting method and system based on stereo event and intensity camera

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant