CN116389912A - Method for reconstructing high-frame-rate high-dynamic-range video by fusing pulse camera with common camera - Google Patents
- Publication number: CN116389912A (application CN202310448820.6A)
- Authority: CN (China)
- Legal status: Granted (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Classifications
- H—Electricity; H04—Electric communication technique; H04N—Pictorial communication, e.g. television; H04N23/00—Cameras or camera modules comprising electronic image sensors; control thereof; H04N23/70—Circuitry for compensating brightness variation in the scene
- H04N23/741—Circuitry for compensating brightness variation in the scene by increasing the dynamic range of the image compared to the dynamic range of the electronic image sensors
- H04N23/73—Circuitry for compensating brightness variation in the scene by influencing the exposure time
- H04N23/76—Circuitry for compensating brightness variation in the scene by influencing the image signals
Abstract
The invention discloses a method for reconstructing high-frame-rate, high-dynamic-range video by fusing a pulse camera with an ordinary camera, comprising the following steps. S1, pulse signal processing: integrate the photons accumulated by the pulse camera over the time domain to obtain pulse reconstruction images, and perform optical flow estimation. S2, color image preprocessing: set a color industrial camera to a cyclically alternating exposure mode with different exposure times, apply brightness correction to the images, deblur the blurred images to obtain a group of latent images, and perform color high-frame-rate reconstruction on the latent images to obtain blur-free images. S3, pulse-guided frame interpolation and fusion: use a fusion-interpolation module based on a recurrent convolutional neural network to colorize the pulse reconstruction images obtained in step S1 while interpolating frames into the blur-free images obtained in step S2, outputting a color HDR video frame by frame. The invention effectively improves the quality and frame rate of the fused image and realizes high-frame-rate, high-dynamic-range imaging.
Description
Technical Field
The invention relates to the technical field of video generation, and in particular to a method for reconstructing high-frame-rate, high-dynamic-range video by fusing a pulse camera with an ordinary camera.
Background
How to shoot high-frame-rate, high-dynamic-range color video of a high-speed moving scene is a problem that has yet to be solved satisfactorily in low-level imaging. It requires the camera system to combine a high frame rate, high resolution, high dynamic range, low blur, and similar basic characteristics. Commercial cameras based on conventional image frames, however, still suffer from large data redundancy and low dynamic range in high-speed scenarios. Continuous multi-bit video frames contain a large amount of redundancy, and readout is limited by transmission technology: when a camera reads out at a high frame rate, the resolution of the video frames drops sharply. On-chip storage alleviates this to some extent, but the data readout is then no longer real-time. Because of the frame-rate/shutter constraint, a high frame rate necessarily forces short exposures, so the camera perceives low-light scenes poorly; increasing the shutter time lowers the frame rate and introduces motion blur and inter-frame information jumps. These shortcomings of conventional cameras in high-speed scenes make it difficult for them to meet real needs in demanding application fields and in major frontier scientific exploration.
The dynamic range visible to the human eye is approximately 10000:1, but the dynamic range of an ordinary camera only reaches about 1000:1, so the dynamic range a single photograph can capture is extremely limited. To expose bright regions correctly, a very short exposure time must be used, which darkens the low-brightness parts of the scene and introduces noise; conversely, to expose dark regions correctly, a long exposure time must be used, which overexposes the high-brightness parts of the scene and loses detail. Fusing alternately exposed LDR video frames is the dominant method of obtaining HDR video, but it is constrained by the long exposures: the frame rate is typically limited to tens of FPS, so scenes moving at high speed cannot be captured.
Classical high-dynamic-range imaging mainly follows two approaches. One builds special equipment to capture a wider dynamic range: a complex optical path inside the camera splits the light onto several photosensitive devices, which acquire several groups of LDR video with different exposure parameters for HDR synthesis. The other cyclically alternates the exposure time of a single camera, maps the images to a linear space, aligns any group of adjacent alternately exposed images with pixel-level motion alignment, computes fusion parameters, and weights each pixel of each image to obtain the HDR video frames.
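The single-camera alternating-exposure fusion described above can be sketched in a few lines. This is an illustrative toy example, not the patent's algorithm: frames are assumed aligned, pixel response is assumed linear in exposure time, and the triangle weighting and function names are this sketch's own choices.

```python
import numpy as np

def triangle_weight(p):
    # Weight mid-range pixels highest; pixels near 0 (noisy) or 1 (clipped) lowest.
    return 1.0 - np.abs(2.0 * p - 1.0)

def merge_ldr(frames, exposures, eps=1e-8):
    """Merge aligned LDR frames (values in [0, 1]) into one HDR radiance map."""
    num = np.zeros_like(frames[0], dtype=np.float64)
    den = np.zeros_like(frames[0], dtype=np.float64)
    for img, t in zip(frames, exposures):
        w = triangle_weight(img)
        num += w * (img / t)      # per-frame estimate of scene radiance
        den += w
    return num / (den + eps)

# A bright pixel clips in the long exposure but is recovered from the short one.
short = np.array([[0.05, 0.50]])   # 1 ms exposure
long_ = np.array([[0.60, 1.00]])   # 12 ms exposure (second pixel overexposed)
hdr = merge_ldr([short, long_], [1.0, 12.0])
```

The clipped pixel gets zero weight in the long exposure, so its radiance comes entirely from the short frame.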
With the development of deep learning, using the modeling capability of neural networks over implicit data distributions to solve low-level vision problems has become the mainstream approach in recent years, producing a series of inverse-tone-mapping high-dynamic-range imaging methods with purpose-designed network structures. Most of these methods train a neural network on multiple LDR images to compute an optical-flow field and obtain alignment parameters between video frames, and then use a fusion network to obtain the HDR video frames. Compared with acquiring HDR video with special equipment, they acquire data with lighter equipment and so reduce the hardware cost of shooting. However, the process of fusing alternately exposed LDR video frames is very complex, the output is prone to ghosting, and the achievable frame-rate improvement is severely limited; for long-exposure images, nonlinear scene motion makes deblurring inaccurate, and the dependence of deep learning on training data makes performance unstable when testing on scenes not contained in the training set.
Disclosure of Invention
Addressing the problems of the existing high-dynamic-range video generation technology, such as insufficient frame-rate improvement and inaccurate image deblurring, the invention provides a method for reconstructing high-frame-rate, high-dynamic-range video by fusing a pulse camera with an ordinary camera. It successfully combines the advantages of the high-frame-rate, low-resolution pulse signal with the low-frame-rate, high-resolution color camera signal to obtain high-quality, high-dynamic-range color video.
In order to achieve the above object, the present invention provides the following technical solutions:
The invention provides a method for reconstructing high-frame-rate, high-dynamic-range video by fusing a pulse camera with an ordinary camera, comprising the following steps:
S1, pulse signal processing: integrate the photons asynchronously emitted by the pulse camera over the time domain to obtain pulse reconstruction images, and compute the pulse optical-flow field;
S2, color image preprocessing: set a color industrial camera to a cyclically alternating exposure mode with different exposure times, apply brightness correction to the images, deblur the blurred images to obtain a group of latent images, and perform color high-frame-rate reconstruction on the latent images to obtain blur-free images;
S3, pulse-guided frame interpolation and fusion: use a fusion-interpolation module based on a recurrent convolutional neural network to colorize the pulse reconstruction images obtained in step S1 while interpolating frames into the blur-free images obtained in step S2, outputting a color HDR video frame by frame.
Further, in step S1, the stream of photons continuously emitted by the pulse camera is represented in the time domain as:
S(x, y) = {s(x, y, t)}
where {·} denotes a set, (x, y) are the spatial coordinates and t is the temporal coordinate; when a pixel's photon integral reaches the set threshold, s takes the value 1 and the integral is reset to zero to await the next integration process.
Further, in step S1, the motion-field information contained in the pulse stream is obtained by image reconstruction, expressed as:
I_i(x, y) = (1/t_f) Σ_{t ∈ (t_i − t_f, t_i]} s(x, y, t)
where I_i is the pulse reconstruction image corresponding to time t_i, and t_f is the integration time window.
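The integrate-and-fire pulse model and the windowed reconstruction above can be simulated in a few lines (a toy sketch with assumed names; the real sensor fires asynchronously in hardware): each pixel accumulates brightness until the threshold is crossed, emits a 1, resets, and an image I_i is recovered by normalizing the pulse count inside a window of length t_f.

```python
import numpy as np

def simulate_spikes(radiance, steps, threshold=1.0):
    """Integrate-and-fire: accumulate per-pixel radiance each step,
    fire a pulse (1) and subtract the threshold when it is crossed."""
    acc = np.zeros_like(radiance, dtype=np.float64)
    spikes = np.zeros((steps,) + radiance.shape, dtype=np.uint8)
    for t in range(steps):
        acc += radiance
        fired = acc >= threshold
        spikes[t][fired] = 1
        acc[fired] -= threshold   # reset while keeping residual charge
    return spikes

def reconstruct(spikes, t0, t_f):
    """I_i: pulse count in the window (t0 - t_f, t0], normalized by t_f."""
    window = spikes[max(t0 - t_f, 0):t0]
    return window.sum(axis=0) / float(t_f)

radiance = np.array([[0.25, 0.5]])      # dark pixel vs. bright pixel
spk = simulate_spikes(radiance, steps=100)
img = reconstruct(spk, t0=100, t_f=100)  # recovers [[0.25, 0.5]]
```

The brighter pixel fires twice as often, so the normalized count reproduces the relative radiance.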
Further, in step S2, the color industrial camera shutter time is set to a cyclically alternating exposure mode of 1 ms, 4 ms and 12 ms, and the frame rate is set to 60 FPS.
Further, in step S2, the SG-deblur module is used to deblur the image; the relation between the blurred image and the latent images is:
B_blur = (1/N) Σ_{i=1}^{N} B_i
where B_blur is the blurred image, N is the number of latent images to be solved, and B_i is the i-th latent image;
and a blur-free 1000 FPS color high-frame-rate sequence is reconstructed by the SG-interpolation module.
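The relation behind the deblurring step, namely that a long-exposure frame is approximately the average of its latent sharp frames, can be illustrated with a toy forward model (an assumption-level sketch; the SG-deblur module itself is a learned network that inverts this relation):

```python
import numpy as np

def synthesize_blur(latent_frames):
    """Forward blur model: B_blur ~ (1/N) * sum_i B_i over the exposure window."""
    return np.mean(np.stack(latent_frames, axis=0), axis=0)

# A bright dot moving one pixel per latent frame smears into a uniform streak.
latents = [np.roll(np.array([1.0, 0.0, 0.0, 0.0]), k) for k in range(4)]
blur = synthesize_blur(latents)   # -> [0.25, 0.25, 0.25, 0.25]
```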
Further, in step S3, super-resolution is performed on the pulse reconstruction image so that its spatial resolution matches that of the color image.
Further, in step S3, the PixelShuffle module P(·) performs feature extraction and spatial up-sampling on the pulse reconstruction image, a convolutional-neural-network-based encoder extracts features from the color image, the features extracted from the pulse reconstruction image are fused with the color-image features, a ConvLSTM-based recurrent convolution operation retains temporal information, and finally a CNN-based decoder obtains the color image C_i corresponding to the pulse reconstruction image I_i.
Further, the neural network in step S3 is trained with synthetic data, synthesized as follows:
a) Set the frame rate of the color industrial camera to 80 FPS and shoot video sequences of a low-speed moving scene with exposure times of 1 ms, 4 ms and 12 ms in turn; each group of video contains 303 LDR images, and 160 groups of training data and 40 groups of test data are shot in total;
b) Compress the color images along the time dimension: every 3 color images form a group from which 1 gold-standard HDR image is synthesized, so each group of video yields 101 HDR images, equivalent to 0.101 seconds of high-speed video;
c) Sample from the synthesized 0.101-second video at 60 FPS. Collect 3 1 ms image frames, taken from the 1 ms exposure images in the 1st, 50th and 101st groups; collect 2 4 ms image frames, the first from the average of the 4 ms images in groups 17 to 20 and the second from the average of the 4 ms images in groups 67 to 70; collect 2 12 ms image frames, the first from the average of the four 12 ms images in groups 21 to 24 and the second from the average of the four 12 ms images in groups 71 to 74;
d) Convert the 101 color images into gray-scale images, downsample them to 400×250 resolution, and synthesize the pulse data according to the pulse-emission principle.
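The sampling schedule of step c) can be made concrete with a small sketch. Group numbers in the text are 1-based; converting them to 0-based array slices is this sketch's assumption, and the long exposures are simulated by averaging consecutive frames as described:

```python
import numpy as np

def sample_alternating_exposure(groups_1ms, groups_4ms, groups_12ms):
    """Assemble one 60 FPS alternating-exposure cycle from 101 HDR groups.
    Long exposures are simulated by averaging consecutive frames."""
    frames_1ms = [groups_1ms[i - 1] for i in (1, 50, 101)]
    frames_4ms = [np.mean(groups_4ms[16:20], axis=0),    # groups 17-20
                  np.mean(groups_4ms[66:70], axis=0)]    # groups 67-70
    frames_12ms = [np.mean(groups_12ms[20:24], axis=0),  # groups 21-24
                   np.mean(groups_12ms[70:74], axis=0)]  # groups 71-74
    return frames_1ms, frames_4ms, frames_12ms

g = np.arange(101, dtype=np.float64).reshape(101, 1, 1)  # 101 dummy frames
f1, f4, f12 = sample_alternating_exposure(g, g, g)
```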
Further, the neural network in step S3 comprises two SG-deblur modules, an SG-interpolation module and a fusion-interpolation module. The SG-deblur modules deblur the long-exposure images; the SG-interpolation module obtains the motion optical-flow field from the Spike camera optical flow module and interpolates frames into the deblurred color images; and the fusion-interpolation module fuses the interpolated color images with the pulse reconstruction images.
Further, the neural network training of step S3 is as follows:
a) The long-exposure image B_blur and the corresponding N pulse reconstruction images {I_j} are concatenated into a tensor of dimension (b, c+N, H, W), where b is the number of images per training batch, c is the number of color channels, and (H, W) is the image size. This tensor is input to the SG-deblur module and features are extracted by a CNN-based encoder, the convolution operation of each encoder layer being:
y = f(IN(k_d^{w×w} ⊛ x + b))
where k_d^{w×w} is a convolution kernel with window size w, d is the number of channels of the output tensor, ⊛ denotes the convolution operation, x is the input tensor or the output of the previous feature convolution layer, b is a bias term, IN(·) is a batch normalization operation that normalizes the current tensor, f is a linear rectification function, and y is the tensor with d channels, i.e., the features extracted by this convolution layer. A PixelShuffle layer adjusts the spatial size of the features to 1/2 and doubles the number of channels, residual dense blocks then extract features further, and finally a CNN-based decoder forms the blur-free images. After the deblurring operation, each 4 ms exposure image yields 4 blur-free images and each 12 ms exposure image yields 12, raising the color video frame rate from the original 60 FPS to 340 FPS;
b) The 340 FPS image sequence obtained in step a) is interpolated to 1000 FPS. The specific algorithm is: for any time t_i, query forward and backward for the two color frames B_+ and B_− nearest to t_i, compute the optical-flow fields from time t_i to each of the two frames, and warp B_+ and B_− with the optical-flow fields;
c) Five pulse reconstruction images at a time are input to the fusion-interpolation module, which first uses PixelShuffle to double the spatial size of the features, then uses a CNN-based encoding network to extract high-dimensional features from them, while another group of CNN-based encoders extracts two groups of features from B_+ and B_− respectively; a multi-scale fusion strategy fuses the features from the pulse images and the color images, and finally a CNN-based decoder produces the optimized color image C_i;
d) The color image C_i output by the neural network and the real HDR image G_i are dynamic-range compressed using the formula
T(x) = log(1 + μx) / log(1 + μ)
where μ denotes the degree of compression and μ is selected as 5000. After compressing the dynamic range, the L1 loss between the two images is computed as
ℓ_1 = ‖T(C_i) − T(G_i)‖_1,
the SSIM loss as
ℓ_SSIM = 1 − SSIM(T(C_i), T(G_i)),
and the LPIPS loss as
ℓ_LPIPS = LPIPS(T(C_i), T(G_i)).
The total loss is
L_total = ℓ_1 + β_1·ℓ_SSIM + β_2·ℓ_LPIPS
with loss weights β_1 = 1 and β_2 = 1; the weights of all layers of the SG-deblur modules, the SG-interpolation module and the fusion-interpolation module are then continuously updated by the back-propagation algorithm.
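The dynamic-range compression and the L1 term of step d) can be written out directly (SSIM and LPIPS require structural and learned models, so only the μ-law mapping and the L1 loss are sketched; names are illustrative):

```python
import numpy as np

MU = 5000.0  # compression strength stated in the text

def mu_law(x, mu=MU):
    """T(x) = log(1 + mu*x) / log(1 + mu): compress HDR range before the loss."""
    return np.log1p(mu * x) / np.log1p(mu)

def l1_loss(pred_hdr, gt_hdr):
    """Mean absolute difference between tone-mapped prediction and ground truth."""
    return np.mean(np.abs(mu_law(pred_hdr) - mu_law(gt_hdr)))

pred = np.array([0.0, 0.5, 1.0])
gt = np.array([0.0, 0.5, 1.0])
zero = l1_loss(pred, gt)        # identical images give zero loss
some = l1_loss(pred, gt * 0.9)  # a mismatch gives a positive loss
```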
Compared with the prior art, the invention has the beneficial effects that:
according to the method for reconstructing the high-frame-rate high-dynamic-range video by fusing the pulse camera with the common camera, disclosed by the invention, the frame rate of reconstructing the HDR from the LDR video of the single color camera is greatly improved by fusing a plurality of alternately exposed LDR images with the output of the pulse camera, and the high-frame-rate high-dynamic-range imaging is realized. Meanwhile, a deep learning method is used, a network module is designed independently aiming at the difference of the LDR color image blurring problem and the inter-frame color missing problem in each aspect, and corresponding fusion flow steps are provided. In addition, the invention reserves rich time domain information by utilizing the high-speed pulse signal, greatly reduces the difficulty of resolving the blurring of the color image, can more accurately solve the nonlinear motion and enhances the robustness of the algorithm.
Drawings
In order to illustrate the embodiments of the present application or the technical solutions in the prior art more clearly, the drawings needed in the embodiments are briefly described below. Obviously, the drawings in the following description are only some embodiments of the present invention, and a person of ordinary skill in the art can obtain other drawings from them without creative effort.
Fig. 1 is a flowchart of a method for reconstructing a high-frame-rate high-dynamic-range video by fusing a pulse camera with a common camera.
Fig. 2 is a network structure diagram of a method for reconstructing a high-frame-rate high-dynamic-range video by fusing a pulse camera with a common camera.
Fig. 3 is a flowchart of a method according to embodiment 1 of the present invention.
Detailed Description
In order to better understand the technical solution, it is described clearly and completely below with reference to the accompanying drawings of the embodiments of the present invention. Obviously, the described examples are only some, not all, embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of the present invention without creative effort fall within the protection scope of the present invention.
As shown in Fig. 1, the method for reconstructing high-frame-rate, high-dynamic-range video by fusing a pulse camera with an ordinary camera provided by the invention comprises three parts: pulse signal preprocessing (a in Fig. 1), color image preprocessing (c in Fig. 1), and pulse-guided frame interpolation and fusion (b in Fig. 1).
S1, pulse signal processing: integrate the photons asynchronously emitted by the pulse camera over the time domain to obtain pulse reconstruction images, and compute the pulse optical-flow field.
Specifically, the stream of photons continuously emitted by the pulse camera is represented in the time domain as:
S(x, y) = {s(x, y, t)}
where {·} denotes a set, (x, y) are the spatial coordinates and t is the temporal coordinate; when a pixel's photon integral reaches the set threshold, s takes the value 1 and the integral is reset to zero to await the next integration process.
The pulse stream is not directly viewable by the human eye; the motion-field information it contains is obtained by image reconstruction, expressed as:
I_i(x, y) = (1/t_f) Σ_{t ∈ (t_i − t_f, t_i]} s(x, y, t)
where I_i is the pulse reconstruction image corresponding to time t_i, and t_f is the integration time window.
S2, color image preprocessing: set a color industrial camera to a cyclically alternating exposure mode with different exposure times, apply brightness correction to the images, deblur the blurred images to obtain a group of latent images, and perform color high-frame-rate reconstruction on the latent images to obtain blur-free images.
Specifically, the color industrial camera shutter time is set to a cyclically alternating exposure mode of 1 ms, 4 ms and 12 ms, and the frame rate is set to 60 FPS.
Under the premise of no overexposure, the response value of a pixel is proportional to the exposure time, so image frames shot with different exposure times have different brightness, and brightness correction must be performed on the images. When the shutter time is long, motion displacement blurs the image, so deblurring must be performed on the long-exposure image to obtain a group of latent images. The SG-deblur module is used to deblur the image; the relation between the blurred image and the latent images is:
B_blur = (1/N) Σ_{i=1}^{N} B_i
where B_blur is the blurred image, N is the number of latent images to be solved, and B_i is the i-th latent image. Because the blurred image may contain overexposed regions, the relation between the blurred image and the latent images becomes an inequality there.
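The brightness correction described in this step follows from the stated linearity: when no pixel is clipped, rescaling each frame by its shutter time puts the 1 ms, 4 ms and 12 ms frames on one brightness scale. A minimal sketch under that assumption (function names are illustrative):

```python
import numpy as np

def brightness_correct(img, exposure_ms, reference_ms=12.0):
    """Scale a linear-response frame to the brightness of the reference exposure."""
    return img * (reference_ms / exposure_ms)

scene = np.array([[0.01, 0.02]])            # per-millisecond response of two pixels
f1 = brightness_correct(scene * 1.0, 1.0)   # 1 ms capture, corrected
f4 = brightness_correct(scene * 4.0, 4.0)   # 4 ms capture, corrected
# After correction both frames match the 12 ms brightness scale.
```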
A blur-free 1000 FPS color high-frame-rate sequence is then reconstructed by the SG-interpolation module.
S3, pulse-guided frame interpolation and fusion: use a fusion-interpolation module based on a recurrent convolutional neural network to colorize the pulse reconstruction images obtained in step S1 while interpolating frames into the blur-free images obtained in step S2, outputting a color HDR video frame by frame.
Specifically, the resolution of the pulse reconstruction image is 400×250 and the resolution of the color image is 800×500, so super-resolution must be applied to the pulse reconstruction image to match the spatial resolutions of the two.
A PixelShuffle module P(·) performs feature extraction and spatial up-sampling on the pulse reconstruction image; a convolutional-neural-network-based encoder extracts features from the color image; the features extracted from the pulse reconstruction image are fused with the color-image features; a ConvLSTM-based recurrent convolution operation retains temporal information; and finally a CNN-based decoder obtains the color image C_i corresponding to the pulse reconstruction image I_i.
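The PixelShuffle operation P(·) used on the pulse branch rearranges channels into space: a tensor of shape (C·r², H, W) becomes (C, H·r, W·r). A minimal NumPy re-implementation of the standard operator (not the patent's exact module):

```python
import numpy as np

def pixel_shuffle(x, r):
    """Rearrange (C*r*r, H, W) -> (C, H*r, W*r), as in sub-pixel convolution."""
    c_r2, h, w = x.shape
    assert c_r2 % (r * r) == 0, "channel count must be divisible by r^2"
    c = c_r2 // (r * r)
    x = x.reshape(c, r, r, h, w)       # split channels into the r x r sub-grid
    x = x.transpose(0, 3, 1, 4, 2)     # -> (c, h, r, w, r)
    return x.reshape(c, h * r, w * r)

x = np.arange(16, dtype=np.float64).reshape(4, 2, 2)  # 4 channels, 2x2
y = pixel_shuffle(x, 2)                               # -> 1 channel, 4x4
```

Each output 2×2 block gathers one spatial position from all four input channels, which is how the network trades channel depth for spatial resolution.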
As shown in FIG. 2, the recurrent convolutional neural network designed by the invention comprises two SG-deblur modules, an SG-interpolation module and a fusion-interpolation module. The SG-deblur modules deblur the long-exposure images; the SG-interpolation module obtains the motion optical-flow field from the pulse optical flow (Spike camera optical flow) module and interpolates frames into the deblurred color images; and the fusion-interpolation module fuses the interpolated color images with the pulse reconstruction images.
As shown in fig. 2, the neural network of step S3 is trained with synthetic data; the steps for synthesizing the training data are as follows:
a) Set the frame rate of the color industrial camera to 80 FPS and shoot video sequences of a low-speed moving scene with exposure times of 1 ms, 4 ms and 12 ms in turn; each group of video contains 303 LDR images, and 160 groups of training data and 40 groups of test data are shot in total;
b) Compress the color images along the time dimension: every 3 color images form a group from which 1 gold-standard HDR image is synthesized, so each group of video yields 101 HDR images, equivalent to 0.101 seconds of high-speed video;
c) Sample from the synthesized 0.101-second video at 60 FPS. Collect 3 1 ms image frames, taken from the 1 ms exposure images in the 1st, 50th and 101st groups; collect 2 4 ms image frames, the first from the average of the 4 ms images in groups 17 to 20 and the second from the average of the 4 ms images in groups 67 to 70; collect 2 12 ms image frames, the first from the average of the four 12 ms images in groups 21 to 24 and the second from the average of the four 12 ms images in groups 71 to 74;
d) Convert the 101 color images into gray-scale images, downsample them to 400×250 resolution, and synthesize the pulse data according to the pulse-emission principle.
The training steps of the neural network of the invention are as follows:
a) The long-exposure image B_blur and the corresponding N pulse reconstruction images {I_j} are concatenated into a tensor of dimension (b, c+N, H, W), where b is the number of images per training batch, c is the number of color channels, and (H, W) is the image size. This tensor is input to the SG-deblur module and features are extracted by a CNN-based encoder, the convolution operation of each encoder layer being:
y = f(IN(k_d^{w×w} ⊛ x + b))
where k_d^{w×w} is a convolution kernel with window size w, d is the number of channels of the output tensor, ⊛ denotes the convolution operation, x is the input tensor or the output of the previous feature convolution layer, b is a bias term, IN(·) is a batch normalization operation that normalizes the current tensor, f is a linear rectification function (Rectified Linear Unit, ReLU), and y is the tensor with d channels, i.e., the features extracted by this convolution layer. A PixelShuffle layer adjusts the spatial size of the features to 1/2 and doubles the number of channels, residual dense blocks then extract features further, and finally a CNN-based decoder forms the blur-free images. After the deblurring operation, each 4 ms exposure image yields 4 blur-free images and each 12 ms exposure image yields 12, raising the color video frame rate from the original 60 FPS to (1+4+12)×20 = 340 FPS;
b) The pulse optical flow module is used to interpolate the 340 FPS image sequence obtained in step a) to 1000 FPS. The specific algorithm is: for any time t_i, query forward and backward for the two color frames B_+ and B_− nearest to t_i, compute the optical-flow fields from time t_i to each of the two frames, and warp B_+ and B_− with the optical-flow fields;
c) Five pulse reconstruction images at a time are input to the fusion-interpolation module, which first uses PixelShuffle to double the spatial size of the features, then uses a CNN-based encoding network to extract high-dimensional features from them, while another group of CNN-based encoders extracts two groups of features from B_+ and B_− respectively; a multi-scale fusion strategy fuses the features from the pulse images and the color images, and finally a CNN-based decoder produces the optimized color image C_i;
d) The color image C_i output by the neural network and the real HDR image G_i are dynamic-range compressed using the formula
T(x) = log(1 + μx) / log(1 + μ)
where μ denotes the degree of compression; in this embodiment μ is selected as 5000. After compressing the dynamic range, the L1 loss between the two images is computed as
ℓ_1 = ‖T(C_i) − T(G_i)‖_1,
the SSIM loss as
ℓ_SSIM = 1 − SSIM(T(C_i), T(G_i)),
and the LPIPS loss as
ℓ_LPIPS = LPIPS(T(C_i), T(G_i)).
The total loss is
L_total = ℓ_1 + β_1·ℓ_SSIM + β_2·ℓ_LPIPS
with loss weights β_1 = 1 and β_2 = 1; the weights of all layers of the SG-deblur modules, the SG-interpolation module and the fusion-interpolation module are then continuously updated by the back-propagation algorithm.
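The warping used in the interpolation step b) samples each neighboring color frame at positions displaced by the optical flow. A minimal nearest-neighbor backward warp (real implementations use bilinear sampling and a learned flow; names here are illustrative):

```python
import numpy as np

def backward_warp(img, flow):
    """Sample img at (x + u, y + v) for each target pixel (nearest neighbor).
    flow has shape (2, H, W): flow[0] = horizontal u, flow[1] = vertical v."""
    h, w = img.shape
    ys, xs = np.mgrid[0:h, 0:w]
    src_x = np.clip(np.rint(xs + flow[0]).astype(int), 0, w - 1)
    src_y = np.clip(np.rint(ys + flow[1]).astype(int), 0, h - 1)
    return img[src_y, src_x]

img = np.array([[1.0, 2.0, 3.0, 4.0]])
flow = np.zeros((2, 1, 4))
flow[0] += 1.0                      # scene shifted one pixel: sample from x + 1
warped = backward_warp(img, flow)   # -> [[2, 3, 4, 4]] (border clamped)
```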
Example 1
The method for reconstructing the high-frame-rate high-dynamic-range video by fusing the pulse camera with the common camera, as shown in fig. 3, comprises the following steps:
a) Building the hybrid camera system: the hybrid camera system is tested with an ordinary RGB industrial camera of model Basler acA800-510uc and a pulse camera; the two cameras use the same HIKROBOT lens, and a beam splitter divides the incident light into two beams that enter the fields of view of the two camera sensors synchronously.
b) HDR gray-scale reconstruction from the pulse camera: the pulse data are integrated with a sliding-window method to obtain gray-scale images, with a window size of 1 ms and a resolution of 400 × 250.
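The sliding-window integration above can be sketched as follows. This is a minimal numpy illustration, not the patented implementation; in particular, centering the window around the query time is an assumption.

```python
import numpy as np

def reconstruct_gray(spikes, t_center, window):
    """Integrate a binary spike stream s(x, y, t) over a sliding time window.

    spikes: (T, H, W) binary array, one slice per time step.
    Returns the per-pixel spike count in a window of length `window`
    centered on t_center, normalized by the actual window length to
    approximate scene radiance.
    """
    t0 = max(0, t_center - window // 2)
    t1 = min(spikes.shape[0], t_center + window // 2)
    return spikes[t0:t1].sum(axis=0) / float(t1 - t0)

spikes = np.zeros((10, 2, 2), dtype=np.uint8)
spikes[::2, 0, 0] = 1            # pixel (0, 0) fires every other step
gray = reconstruct_gray(spikes, t_center=5, window=10)
```

A brighter pixel fires more often inside the window, so its reconstructed gray value is higher; here pixel (0, 0) fires on half the steps and reconstructs to 0.5.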
c) The alternately exposed LDR images shot by the common RGB industrial camera and the HDR gray-scale images reconstructed from the pulse camera are input into the trained neural network. The color images are first deblurred by the SG-deblur module, then interpolated to a 1000 FPS color high-frame-rate sequence by the SG-interpolation module; finally, the fusion frame-insertion module fuses the color images with the pulse reconstruction images and outputs the final high-resolution, high-frame-rate color HDR video.
The foregoing description is only of the preferred embodiments of the present invention and is not intended to limit the scope of the present invention. Any modification, equivalent replacement, improvement, etc. made within the spirit and principle of the present invention are included in the protection scope of the present invention.
Claims (10)
1. The method for reconstructing the high-frame-rate high-dynamic-range video by fusing the pulse camera with the common camera is characterized by comprising the following steps of:
s1, pulse signal processing: integrating photons asynchronously issued by a pulse camera in a time domain to obtain a pulse reconstruction image, and calculating a pulse light flow field;
s2, preprocessing a color image: setting a cyclic alternating exposure mode of different exposure time of a color industrial camera, carrying out brightness correction on an image, carrying out deblurring calculation on a blurred image to obtain a group of potential images, and reconstructing color high frame rate on the potential images to obtain a blur-free image;
s3, pulse guided frame insertion and fusion: and (2) coloring the pulse reconstruction image obtained in the step (S1) by using a fusion frame inserting module based on a cyclic convolution neural network, and simultaneously inserting frames into the non-blurred image obtained in the step (S2), and outputting a color HDR video frame by frame.
2. The method for reconstructing a high-frame-rate high-dynamic-range video by fusing a pulse camera with a normal camera according to claim 1, wherein in step S1 the pulse camera represents the continuously arriving photons in the time domain as:
S(x,y)={s(x,y,t)}
where {·} denotes a set, (x, y) are the spatial coordinates and t is the temporal coordinate; when a pixel's photon integral reaches the set threshold, s takes the value 1 and the integrator is reset to zero to await the next integration process.
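The integrate-and-fire behaviour described in this claim can be sketched as a simple simulation. This is illustrative only; the threshold value and the per-step radiance scaling are my assumptions, not values from the patent.

```python
import numpy as np

def simulate_spikes(radiance, steps, threshold=1.0):
    """Integrate-and-fire model of the pulse camera pixel described above.

    radiance: (H, W) photon accumulation per time step. At each step the
    accumulator grows by the radiance; when it reaches the threshold the
    pixel emits s = 1 and the accumulator is reset to zero to await the
    next integration process.
    """
    acc = np.zeros_like(radiance, dtype=float)
    out = []
    for _ in range(steps):
        acc += radiance
        fired = acc >= threshold
        out.append(fired.astype(np.uint8))
        acc[fired] = 0.0          # reset and wait for the next integration
    return np.stack(out)          # (steps, H, W) binary spike stream S(x, y)

# A pixel at half the threshold per step fires every other step;
# a pixel at a quarter of the threshold fires every fourth step.
s = simulate_spikes(np.array([[0.5, 0.25]]), steps=4)
```

The firing rate is proportional to radiance, which is why integrating spikes over a window (claim 3) recovers a gray-scale image.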
3. The method for reconstructing a high frame rate high dynamic range video by a pulse camera fused with a normal camera according to claim 1, wherein in step S1, motion field information contained in a pulse stream is obtained by image reconstruction, expressed as:
I_i(x, y) = (1 / t_f) · Σ_{t ∈ [t_i − t_f/2, t_i + t_f/2]} s(x, y, t)

wherein I_i is the pulse reconstruction image corresponding to time t_i, and t_f is the integration time window.
4. The method for reconstructing a high frame rate high dynamic range video by a pulse camera fused with a normal camera according to claim 1, wherein in step S2 the shutter time of the color industrial camera is set to a cyclic alternating exposure mode of 1 ms, 4 ms and 12 ms, and the frame rate is set to 60 FPS.
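The cyclic alternating exposure schedule of this claim can be sketched as follows; the helper name `exposure_schedule` is hypothetical, used only to illustrate the pattern.

```python
from itertools import cycle, islice

def exposure_schedule(n_frames, pattern=(1, 4, 12)):
    """Return the shutter time in ms for each of n_frames frames,
    cycling through the alternating-exposure pattern 1 ms, 4 ms, 12 ms."""
    return list(islice(cycle(pattern), n_frames))

# At 60 FPS each frame slot is 1000/60 ≈ 16.7 ms, so even the longest
# 12 ms exposure fits inside its slot.
sched = exposure_schedule(7)
```

Short exposures capture highlights without saturation while long exposures capture shadows, which is what lets the later fusion stage recover a high dynamic range.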
5. The method for reconstructing a high-frame-rate high-dynamic-range video by fusing a pulse camera with a common camera according to claim 1, wherein in step S2 an SG-deblur module is adopted to deblur the images, and the relationship between a blurred image and its latent images is:

B_blur = (1/N) · Σ_{i=1}^{N} B_i

wherein B_blur is the blurred image, N is the number of latent images to be solved, and B_i is the i-th latent image;
and a blur-free 1000 FPS color high-frame-rate sequence is then reconstructed through the SG-interpolation module.
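The blur model of claim 5, a long exposure as the combination of its latent images, can be sketched as below. The uniform average B_blur = (1/N)·Σ B_i is an assumption on my part; the patent's original formula image is not recoverable here, so any non-uniform weighting it might specify is not reflected.

```python
import numpy as np

def blur_from_latents(latents):
    """Model a long-exposure blurred frame as the mean of its N latent
    images, i.e. B_blur = (1/N) * sum_i B_i (assumed uniform weighting)."""
    return np.mean(latents, axis=0)

# A 12 ms exposure covers N = 12 latent 1 ms images.
latents = np.random.default_rng(1).random((12, 4, 4))
b_blur = blur_from_latents(latents)
```

Deblurring is then the inverse problem: given one B_blur, recover the N sharp latent frames, which is what the SG-deblur module learns to do with pulse-stream guidance.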
6. The method of claim 1, wherein in step S3, super-resolution is performed on the pulse reconstructed image to match the spatial resolutions of the pulse reconstructed image and the color image.
7. The method for reconstructing a high-frame-rate high-dynamic-range video by fusing a pulse camera with a common camera according to claim 1, wherein in step S3 a PixelShuffle module P(·) performs feature extraction and spatial upsampling on the pulse reconstruction images, an encoder based on a convolutional neural network extracts features from the color images, the features extracted from the pulse reconstruction images are fused with the color image features, a ConvLSTM-based cyclic convolution operation retains temporal information, and finally a CNN-based decoder obtains the color image C_i corresponding to the pulse reconstruction image I_i.
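PixelShuffle itself is a fixed rearrangement with no learned parameters. A numpy sketch of the operation, following the usual (C·r², H, W) → (C, H·r, W·r) semantics (e.g. as in PyTorch's `nn.PixelShuffle`), is:

```python
import numpy as np

def pixel_shuffle(x, r):
    """Rearrange a (C*r^2, H, W) tensor into (C, H*r, W*r): trade channel
    depth for spatial resolution, the upsampling used on pulse features."""
    c2, h, w = x.shape
    c = c2 // (r * r)
    x = x.reshape(c, r, r, h, w)      # split channels into the r x r block
    x = x.transpose(0, 3, 1, 4, 2)    # -> (c, h, r, w, r)
    return x.reshape(c, h * r, w * r)

x = np.arange(16, dtype=float).reshape(4, 2, 2)   # C*r^2 = 4, r = 2
y = pixel_shuffle(x, 2)                           # -> (1, 4, 4)
```

Each output 2×2 block is assembled from the same spatial position of the four input channels, so channel 0 lands at the top-left of every block, channel 1 top-right, and so on.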
8. The method for reconstructing a high frame rate high dynamic range video by a pulse camera fused with a normal camera according to claim 1, wherein the neural network of step S3 is trained by using synthesized data, and the step of synthesizing the training data is as follows:
a) Setting the frame rate of the color industrial camera to 80 FPS and shooting video sequences of low-speed moving scenes with exposure times of 1 ms, 4 ms and 12 ms in turn, each group of video containing 303 LDR images; 160 groups of training data and 40 groups of test data are shot in total;
b) Compressing the color images along the time dimension: every 3 color images are merged into one group to synthesize 1 gold-standard HDR image, so each group yields 101 HDR images, equivalent to 0.101 seconds of high-speed video;
c) Sampling from the synthesized 0.101-second video at 60 FPS: 3 1 ms image frames are collected, taken from the 1 ms exposure images in the 1st, 50th and 101st groups respectively; 2 4 ms image frames are collected, the first averaged from the 4 ms images in the 17th to 20th groups and the second averaged from the 4 ms images in the 67th to 70th groups; 2 12 ms image frames are collected, the first averaged from the 4 12 ms images in the 21st to 24th groups and the second averaged from the 4 12 ms images in the 71st to 74th groups;
d) Converting the 101 color images into gray-scale images, downsampling them to 400 × 250 resolution, and synthesizing pulse data according to the pulse emission principle.
9. The method for reconstructing a high-frame-rate high-dynamic-range video by fusing a pulse camera with a common camera according to claim 1, wherein the neural network of step S3 comprises two SG-deblur modules, one SG-interpolation module and one fusion frame-insertion module; the SG-deblur modules deblur the long-exposure images, the SG-interpolation module obtains the motion optical flow field from the pulse camera optical flow module and interpolates frames into the deblurred color images, and the fusion frame-insertion module fuses the frame-interpolated color images with the pulse reconstruction images.
10. The method for reconstructing a high frame rate high dynamic range video by a pulse camera fused with a normal camera according to claim 9, wherein the step of training the neural network of step S3 is as follows:
a) The long-exposure image B_blur and the corresponding N pulse reconstruction images {I_j} are concatenated into a tensor of dimension (b, c + N, H, W), where b is the number of images per training batch, c is the number of color channels and (H, W) is the image size. This tensor is input to the SG-deblur module and features are extracted by a CNN-based encoder, each layer of which computes:

y = f(IN(K_w^d ⊛ x + b))

wherein K_w^d is a convolution kernel with window size w, d is the number of channels of the output tensor, ⊛ denotes the convolution operation, x is the input tensor or the output of the previous convolutional layer, b is a bias term, IN(·) is a batch normalization operation that normalizes the current tensor, f is the linear rectification function, and y is the tensor with channel number d, i.e. the feature extracted by that convolutional layer. A PixelShuffle layer then adjusts the spatial size of the features to 1/2 and increases the number of channels to 2 times, residual dense blocks further extract features, and finally a CNN-based decoder forms the blur-free images. After the deblurring operation, each image with a 4 ms exposure time generates 4 blur-free images and each image with a 12 ms exposure time generates 12, raising the color video frame rate from the original 60 FPS to 340 FPS;
b) Interpolating the 340 FPS image sequence obtained in step a) to 1000 FPS. The specific algorithm is as follows: for any time t_i, query forward and backward for the two color frames B+ and B- nearest to t_i, compute the optical flow fields from time t_i to each of the two frames, and use the flow fields to warp B+ and B-;
c) Five pulse reconstruction images at a time are input into the fusion frame-insertion module. The module first applies PixelShuffle to double the spatial size of the features, then uses a CNN-based encoding network to extract high-dimensional features from them, while another group of CNN-based encoders extracts two groups of features from B+ and B- respectively; a multi-scale fusion strategy fuses the features from the pulse images and the color images, and finally a CNN-based decoder produces the optimized color image C_i;
d) The color image C_i output by the neural network and the real HDR image G_i are dynamic-range compressed using the μ-law formula T(x) = log(1 + μx) / log(1 + μ), wherein μ represents the degree of compression and is selected to be 5000. The L1 loss between the two compressed gray maps is L_1 = ||T(C_i) - T(G_i)||_1, the SSIM loss is L_SSIM = 1 - SSIM(T(C_i), T(G_i)), and the LPIPS loss is L_LPIPS = LPIPS(T(C_i), T(G_i)); the total loss is L = L_1 + β_1 · L_SSIM + β_2 · L_LPIPS, wherein the loss weights β_1 = 1 and β_2 = 1, and then the weights of each layer of the SG-deblur module, the SG-interpolation module and the fusion frame-insertion module are continuously updated by the back-propagation algorithm.
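The per-layer encoder computation in step a) of claim 10, convolution, then normalization, then ReLU, can be sketched naively in numpy. This is illustrative only: it uses a slow direct-loop convolution, and the normalization here is computed per channel over the single sample, which coincides with batch normalization only at batch size 1.

```python
import numpy as np

def conv_in_relu(x, kernel, bias, eps=1e-5):
    """One encoder layer as in the claim: y = f(IN(K (*) x + b)).

    x: (C_in, H, W); kernel: (d, C_in, w, w) with odd window w (zero
    padding keeps the spatial size); bias: (d,). IN normalizes each
    output channel; f is the linear rectification function (ReLU).
    """
    d, c_in, kw, _ = kernel.shape
    p = kw // 2
    xp = np.pad(x, ((0, 0), (p, p), (p, p)))      # zero padding
    h, w = x.shape[1:]
    y = np.zeros((d, h, w))
    for o in range(d):                            # naive direct convolution
        for i in range(h):
            for j in range(w):
                y[o, i, j] = np.sum(xp[:, i:i+kw, j:j+kw] * kernel[o]) + bias[o]
    mu = y.mean(axis=(1, 2), keepdims=True)       # per-channel normalization
    sd = y.std(axis=(1, 2), keepdims=True)
    y = (y - mu) / (sd + eps)
    return np.maximum(y, 0.0)                     # ReLU

out = conv_in_relu(np.random.default_rng(2).random((3, 8, 8)),
                   np.random.default_rng(3).random((16, 3, 3, 3)),
                   np.zeros(16))
```

A real encoder would stack several such layers and use an optimized convolution (e.g. a deep-learning framework); the point here is only the y = f(IN(K ⊛ x + b)) structure.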
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310448820.6A CN116389912B (en) | 2023-04-24 | 2023-04-24 | Method for reconstructing high-frame-rate high-dynamic-range video by fusing pulse camera with common camera |
Publications (2)
Publication Number | Publication Date |
---|---|
CN116389912A true CN116389912A (en) | 2023-07-04 |
CN116389912B CN116389912B (en) | 2023-10-10 |
Family
ID=86965669
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN117745584A (en) * | 2024-02-21 | 2024-03-22 | 北京大学 | Image deblurring method and electronic equipment |
Citations (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN1370289A (en) * | 1999-08-17 | 2002-09-18 | 应用视像系统有限公司 | Improved dynamic range video camera, recording system, and recording method |
CN109978808A (en) * | 2019-04-25 | 2019-07-05 | 北京迈格威科技有限公司 | A kind of method, apparatus and electronic equipment for image co-registration |
CN111669514A (en) * | 2020-06-08 | 2020-09-15 | 北京大学 | High dynamic range imaging method and apparatus |
CN113329146A (en) * | 2021-04-25 | 2021-08-31 | 北京大学 | Pulse camera simulation method and device |
US20220156532A1 (en) * | 2020-11-13 | 2022-05-19 | Samsung Electronics Co., Ltd | Fusing fbis & dvs data streams using a neural network |
CN115883764A (en) * | 2023-02-08 | 2023-03-31 | 吉林大学 | Underwater high-speed video frame interpolation method and system based on data cooperation |
CN115984124A (en) * | 2022-11-29 | 2023-04-18 | 北京大学 | Method and device for de-noising and super-resolution of neuromorphic pulse signals |
Non-Patent Citations (4)
Title |
---|
J. Han et al.: "Neuromorphic Camera Guided High Dynamic Range Imaging", 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 1727-1736 *
Y. Chang et al.: "1000 FPS HDR Video with a Spike-RGB Hybrid Camera", 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 1-14 *
Du Lin; Sun Huayan; Wang Shuai; Gao Yuxuan; Qi Yingying: "Research on High Dynamic Range Image Fusion Algorithms for Dynamic Targets", Acta Optica Sinica, pages 1-9 *
Huang Tiejun, Yu Zhaofei, Li Yuan, et al.: "Advances in Spike Vision", Journal of Image and Graphics, vol. 27, no. 6, pages 1823-1839 *
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||