CN114885144A - High frame rate 3D video generation method and device based on data fusion - Google Patents
- Publication number
- CN114885144A (application CN202210293645.3A)
- Authority
- CN
- China
- Prior art keywords
- frame rate
- video
- event stream
- event
- frames
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N13/00—Stereoscopic video systems; Multi-view video systems; Details thereof
- H04N13/10—Processing, recording or transmission of stereoscopic or multi-view image signals
- H04N13/106—Processing image signals
- H04N13/111—Transformation of image signals corresponding to virtual viewpoints, e.g. spatial image interpolation
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/049—Temporal neural networks, e.g. delay elements, oscillating neurons or pulsed inputs
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N7/00—Television systems
- H04N7/01—Conversion of standards, e.g. involving analogue television standards or digital television standards processed at pixel level
- H04N7/0127—Conversion of standards, e.g. involving analogue television standards or digital television standards processed at pixel level by changing the field or frame frequency of the incoming video signal, e.g. frame rate converter
Abstract
Description
Technical Field

The present application relates to the technical fields of computer vision and neuromorphic computing, and in particular to a data-fusion-based high frame rate 3D video generation method and device.
Background

On the one hand, conventional cameras are limited by their frame rate, and the professional high-speed cameras required to capture high frame rate video are extremely expensive; on the other hand, generating high frame rate 3D video, i.e. high frame rate depth-map video, from low frame rate video in order to achieve high-speed 3D observation still suffers from certain shortcomings.

Related technologies generate video from pure event streams: the event stream is stacked into a grid-like tensor representation, and deep learning methods are then used to generate images for the purpose of high-speed 3D observation.

However, the related art uses only the event stream as input and lacks the initial brightness value of each pixel; estimating brightness solely from records of brightness changes is an underdetermined problem, which leads to low quality of the generated images and leaves room for improvement.
Summary of the Invention

The present application provides a data-fusion-based high frame rate 3D video generation method and device, so as to solve the technical problem in the related art that only the event stream is used as input and the initial brightness value of each pixel is missing, resulting in low quality of the generated images.

An embodiment of the first aspect of the present application provides a data-fusion-based high frame rate 3D video generation method, including the following steps: acquiring video with a frame rate lower than a preset frame rate and event data from an event camera; combining adjacent image frames of the video in pairs to generate multiple groups of adjacent image frames, and calculating the set of timestamps of all desired intermediate frames; intercepting, according to the timestamp set, a first event stream and a second event stream from the two boundary frames to the desired intermediate frame, and inputting the first event stream and the second event stream into a preset spiking neural network for forward propagation to obtain a first event stream feature vector and a second event stream feature vector; concatenating the adjacent image frames, the first event stream feature vector and the second event stream feature vector, and inputting them into a preset multi-modal fusion network for forward propagation to obtain all intermediate frames and generate a high frame rate video above a second preset frame rate; and, based on the high frame rate video, performing forward propagation with a preset 3D depth estimation network to obtain all high frame rate depth maps, and combining all the high frame rate depth maps to form a high frame rate 3D video.
Optionally, in an embodiment of the present application, before the first event stream and the second event stream are input into the preset spiking neural network for forward propagation, the method further includes: constructing the spiking neural network using the Spike Response Model as the neuron dynamics model.

Optionally, in an embodiment of the present application, the multi-modal fusion network includes a coarse synthesis sub-network and a fine-tuning sub-network, wherein the coarse synthesis sub-network uses a first U-Net structure whose input layer has 64+2×k input channels and whose output layer has k output channels, and the fine-tuning sub-network uses a second U-Net structure whose input layer has 3×k input channels and whose output layer has k output channels, k being the number of channels of an image frame of the video below the preset frame rate.

Optionally, in an embodiment of the present application, the 3D depth estimation network uses a third U-Net structure, the input layer has 3×k input channels, and the output layer has 1 output channel.
Optionally, in an embodiment of the present application, the timestamps in the set of all intermediate frames are calculated as:

τ_i^{j,j+1} = t_j + ((i - 1) / n) × (t_{j+1} - t_j), i = 1, 2, ..., n, j = 1, 2, ..., N-1,

where N is the total number of frames of the input low frame rate video, n is the factor by which the frame rate is to be increased, and t_j is the timestamp of the j-th frame of the input low frame rate video.
An embodiment of the second aspect of the present application provides a data-fusion-based high frame rate 3D video generation device, including: a first acquisition module configured to acquire video with a frame rate lower than a preset frame rate and event data from an event camera; a calculation module configured to combine adjacent image frames of the video in pairs to generate multiple groups of adjacent image frames, and to calculate the set of timestamps of all desired intermediate frames; a second acquisition module configured to intercept, according to the timestamp set, a first event stream and a second event stream from the two boundary frames to the desired intermediate frame, and to input the first event stream and the second event stream into a preset spiking neural network for forward propagation to obtain a first event stream feature vector and a second event stream feature vector; a fusion module configured to concatenate the adjacent image frames, the first event stream feature vector and the second event stream feature vector, and to input them into a preset multi-modal fusion network for forward propagation to obtain all intermediate frames and generate a high frame rate video above a second preset frame rate; and a generation module configured to perform forward propagation with a preset 3D depth estimation network based on the high frame rate video to obtain all high frame rate depth maps, and to combine all the high frame rate depth maps to form a high frame rate 3D video.

Optionally, in an embodiment of the present application, the device further includes: a construction module configured to construct the spiking neural network using the Spike Response Model as the neuron dynamics model.

Optionally, in an embodiment of the present application, the multi-modal fusion network includes a coarse synthesis sub-network and a fine-tuning sub-network, wherein the coarse synthesis sub-network uses a first U-Net structure whose input layer has 64+2×k input channels and whose output layer has k output channels, and the fine-tuning sub-network uses a second U-Net structure whose input layer has 3×k input channels and whose output layer has k output channels, k being the number of channels of an image frame of the video below the preset frame rate.

Optionally, in an embodiment of the present application, the 3D depth estimation network uses a third U-Net structure, the input layer has 3×k input channels, and the output layer has 1 output channel.
Optionally, in an embodiment of the present application, the first event stream and the second event stream are determined from the intermediate frame timestamps, where N is the total number of frames of the input low frame rate video, n is the factor by which the frame rate is to be increased, and t_j is the timestamp of the j-th frame of the input low frame rate video.
An embodiment of the third aspect of the present application provides an electronic device, including: a memory, a processor, and a computer program stored on the memory and executable on the processor, where the processor executes the program to implement the data-fusion-based high frame rate 3D video generation method described in the above embodiments.

An embodiment of the fourth aspect of the present application provides a computer-readable storage medium storing computer instructions, where the computer instructions are used to cause a computer to execute the data-fusion-based high frame rate 3D video generation method described in the above embodiments.

In the embodiments of the present application, event data can be used to provide inter-frame motion information, a spiking neural network is used to encode the event stream, all intermediate frames are obtained through a multi-modal fusion network to generate a high frame rate video, and a 3D depth estimation network is then used to form a high frame rate 3D video, achieving effective stereoscopic observation of high-speed scenes. By using the event stream and the low frame rate video image frames as input, multi-modal data information can be exploited more fully, thereby improving the quality of the high frame rate 3D video. This solves the technical problem in the related art that only the event stream is used as input and the initial brightness value of each pixel is missing, resulting in low quality of the generated images.

Additional aspects and advantages of the present application will be set forth in part in the following description, and in part will become apparent from the following description or be learned by practice of the present application.
Brief Description of the Drawings

The above and/or additional aspects and advantages of the present application will become apparent and easy to understand from the following description of the embodiments in conjunction with the accompanying drawings, in which:

Fig. 1 is a flowchart of a data-fusion-based high frame rate 3D video generation method according to an embodiment of the present application;

Fig. 2 is a flowchart of a data-fusion-based high frame rate 3D video generation method according to one embodiment of the present application;

Fig. 3 is a schematic diagram of the low frame rate video data and event stream data of the data-fusion-based high frame rate 3D video generation method according to one embodiment of the present application;

Fig. 4 is a schematic diagram of the intermediate frame video data of the data-fusion-based high frame rate 3D video generation method according to one embodiment of the present application;

Fig. 5 is a schematic diagram of the input event stream, the low frame rate video and the generated high frame rate video data of the data-fusion-based high frame rate 3D video generation method according to one embodiment of the present application;

Fig. 6 is a high frame rate depth map obtained with a 10x frame rate increase by the data-fusion-based high frame rate 3D video generation method according to one embodiment of the present application;

Fig. 7 is a schematic structural diagram of a data-fusion-based high frame rate 3D video generation device according to an embodiment of the present application;

Fig. 8 is a schematic structural diagram of an electronic device according to an embodiment of the present application.
Detailed Description of the Embodiments

Embodiments of the present application are described in detail below, and examples of the embodiments are illustrated in the accompanying drawings, in which identical or similar reference numerals throughout denote identical or similar elements or elements having identical or similar functions. The embodiments described below with reference to the accompanying drawings are exemplary and are intended to explain the present application; they should not be construed as limiting the present application.

The data-fusion-based high frame rate 3D video generation method and device according to embodiments of the present application are described below with reference to the accompanying drawings. In view of the technical problem mentioned in the Background above, namely that the related art uses only the event stream as input and lacks the initial brightness value of each pixel, resulting in low quality of the generated images, the present application provides a data-fusion-based high frame rate 3D video generation method. In this method, event data can be used to provide inter-frame motion information, a spiking neural network is used to encode the event stream, all intermediate frames are obtained through a multi-modal fusion network to generate a high frame rate video, and a 3D depth estimation network is then used to form a high frame rate 3D video, achieving effective stereoscopic observation of high-speed scenes. By using the event stream and the low frame rate video image frames as input, multi-modal data information can be exploited more fully, thereby improving the quality of the high frame rate 3D video. This solves the technical problem in the related art that only the event stream is used as input and the initial brightness value of each pixel is missing, resulting in low quality of the generated images.
Specifically, Fig. 1 is a schematic flowchart of a data-fusion-based high frame rate 3D video generation method provided by an embodiment of the present application.

As shown in Fig. 1, the data-fusion-based high frame rate 3D video generation method includes the following steps:

In step S101, video with a frame rate lower than a preset frame rate and event data are acquired from an event camera.

In actual execution, an embodiment of the present application can acquire video with a frame rate lower than the preset frame rate together with event data from an event camera, thereby obtaining the raw data and laying the data foundation for the subsequent generation of the high frame rate video.

It can be understood that an event camera is a bio-inspired sensor whose working principle differs greatly from that of a conventional camera. Unlike a conventional camera, which captures the absolute light intensity of a scene at a fixed frame rate, an event camera outputs an event stream only when the light intensity in the scene changes. Compared with conventional cameras, event cameras offer high dynamic range, high temporal resolution and freedom from motion blur, which helps ensure the generation of high frame rate video.

As a new type of visual sensor, an event camera cannot directly use the various algorithms developed for conventional cameras and images. An event camera has no concept of frame rate: each pixel works asynchronously and outputs an event whenever a change in light intensity is detected. Each event is a quadruple (x, y, t, p) containing the pixel coordinates (x, y), the timestamp t and the event polarity p (p = -1 indicates that the light intensity at the pixel decreases, and p = 1 indicates that it increases). Aggregating the event data output by all pixels yields an event list composed of individual events, which is the event stream data output by the camera.
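As a small illustration (not part of the patent text), the sketch below shows one way such an event list could be held in memory and split by polarity; the array layout and helper name are assumptions made for this example only.

```python
import numpy as np

# Hypothetical layout: one row per event with columns (x, y, t, p),
# matching the quadruple described above; t is a timestamp in milliseconds.
events = np.array([
    [12.0, 40.0, 0.8, +1],
    [13.0, 40.0, 1.1, -1],
    [12.0, 41.0, 2.5, +1],
])

def split_polarity(events: np.ndarray):
    """Separate an event list into positive-polarity and negative-polarity events."""
    pos = events[events[:, 3] > 0]
    neg = events[events[:, 3] < 0]
    return pos, neg

pos_events, neg_events = split_polarity(events)
print(len(pos_events), len(neg_events))  # -> 2 1
```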
The preset frame rate can be set as appropriate by those skilled in the art and is not specifically limited here.

In step S102, adjacent image frames in the video are combined in pairs to generate multiple groups of adjacent image frames, and the set of timestamps of all desired intermediate frames is calculated.
As a possible implementation, an embodiment of the present application combines adjacent image frames of the low frame rate video in pairs to generate multiple groups of adjacent image frames, and for each group of adjacent image frames calculates the set T of timestamps of all desired intermediate frames, denoted as:

T = {τ_1^{1,2}, τ_2^{1,2}, ..., τ_n^{1,2}, τ_1^{2,3}, τ_2^{2,3}, ..., τ_n^{2,3}, ..., τ_1^{N-1,N}, τ_2^{N-1,N}, ..., τ_n^{N-1,N}},

where τ_i^{j,j+1} denotes the timestamp of the i-th desired intermediate frame between the j-th and (j+1)-th input frames.
Optionally, in an embodiment of the present application, the timestamps in the set of all intermediate frames are calculated by the formula below, where N is the total number of frames of the input low frame rate video, n is the factor by which the frame rate is to be increased, and t_j is the timestamp of the j-th frame of the input low frame rate video.

Specifically, the timestamp of each desired intermediate frame can be calculated as:

τ_i^{j,j+1} = t_j + ((i - 1) / n) × (t_{j+1} - t_j), i = 1, 2, ..., n, j = 1, 2, ..., N-1.
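A minimal Python sketch (not part of the patent) of this timestamp calculation under the formula given above; the function name is hypothetical, and the printed values reproduce the 31-frame, 20 FPS, 10x example discussed later in the description.

```python
def intermediate_timestamps(frame_times, n):
    """Timestamps of the n desired intermediate frames between every pair of
    adjacent input frames: tau_i = t_j + (i - 1) / n * (t_{j+1} - t_j)."""
    T = []
    for j in range(len(frame_times) - 1):
        t_j, t_next = frame_times[j], frame_times[j + 1]
        for i in range(1, n + 1):
            T.append(t_j + (i - 1) / n * (t_next - t_j))
    return T

# 31 input frames at 20 FPS (50 ms apart), frame rate increased 10x:
frame_times = [50.0 * j for j in range(31)]
T = intermediate_timestamps(frame_times, n=10)
print(len(T), T[:3], T[-1])  # -> 300 [0.0, 5.0, 10.0] 1495.0
```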
In the embodiment of the present application, the set of timestamps of all intermediate frames is obtained by this calculation, which completes the preprocessing of the data and provides the basis for the subsequent data fusion.
In step S103, a first event stream and a second event stream from the two boundary frames to the desired intermediate frame are intercepted according to the timestamp set, and the first event stream and the second event stream are input into a preset spiking neural network for forward propagation to obtain a first event stream feature vector and a second event stream feature vector.

Further, according to the intermediate frame timestamp set calculated in step S102, the embodiment of the present application intercepts the first event stream ε1 and the second event stream ε2 from the two boundary frames to the desired intermediate frame, and inputs the first event stream and the second event stream into the preset spiking neural network for forward propagation to obtain the first event stream feature vector F1 and the second event stream feature vector F2. By encoding the event stream with a spiking neural network, the embodiment of the present application can better denoise the event stream data and thus improve the quality of the generated video.

Here the first event stream ε1 and the second event stream ε2 can be determined as follows: ε1 consists of the events whose timestamps lie between t_j and τ_i^{j,j+1}, and ε2 consists of the events whose timestamps lie between τ_i^{j,j+1} and t_{j+1}, where τ_i^{j,j+1} is the timestamp of the desired intermediate frame, and t_j and t_{j+1} are the timestamps of the input low frame rate video frames adjacent to the desired intermediate frame.
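For illustration only (an assumed helper, not the patent's code), the two event stream segments could be cut from a time-ordered event array as follows, reading "from the two boundary frames to the desired intermediate frame" as the events falling in [t_j, τ) and [τ, t_{j+1}), respectively:

```python
import numpy as np

def slice_event_streams(events: np.ndarray, t_j: float, tau: float, t_next: float):
    """events: array of shape (M, 4) with columns (x, y, t, p), sorted by t.
    Returns eps1 (events between the left boundary frame and the desired
    intermediate frame) and eps2 (events between the intermediate frame and
    the right boundary frame)."""
    t = events[:, 2]
    eps1 = events[(t >= t_j) & (t < tau)]
    eps2 = events[(t >= tau) & (t < t_next)]
    return eps1, eps2
```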
It should be noted that the preset spiking neural network is described in detail below.

Optionally, in an embodiment of the present application, before the first event stream and the second event stream are input into the preset spiking neural network for forward propagation, the method further includes: constructing the spiking neural network using the Spike Response Model as the neuron dynamics model.

The spiking neural network is described in detail here.

It can be understood that spiking neural networks are the third generation of artificial neural networks. A neuron in a spiking neural network is not activated in every iteration of propagation, but only when its membrane potential reaches a specific value. When a neuron is activated, it generates a spike that is transmitted to other neurons, raising or lowering their membrane potentials. Spiking neural networks therefore model neurons more realistically and are better suited to processing temporally coded spike signals.

In actual execution, an embodiment of the present application can use the Spike Response Model as the neuron dynamics model to construct a spiking convolutional neural network.

Specifically, the spiking neural network may include an input convolutional layer, a hidden convolutional layer and an output convolutional layer. The input convolutional layer has 2 input channels, corresponding to the positive-polarity and negative-polarity events of the event stream, a 3×3 convolution kernel, a stride of 1 and 16 output channels; the hidden convolutional layer has 16 input channels, a 3×3 convolution kernel, a stride of 1 and 16 output channels; the output convolutional layer has 16 input channels, a 3×3 convolution kernel, a stride of 1 and 32 output channels.
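For orientation, the following PyTorch-style sketch mirrors only the layer dimensions listed above; it is an assumption-laden stand-in that replaces the Spike Response Model dynamics with an ordinary activation, rather than an implementation of the patented spiking network.

```python
import torch.nn as nn

class EventEncoderSketch(nn.Module):
    """Convolutional stack with the channel sizes described in the text.
    A real implementation would use Spike Response Model neurons and process
    the event stream over time; ReLU is only a placeholder here."""
    def __init__(self):
        super().__init__()
        self.input_conv = nn.Conv2d(2, 16, kernel_size=3, stride=1, padding=1)
        self.hidden_conv = nn.Conv2d(16, 16, kernel_size=3, stride=1, padding=1)
        self.output_conv = nn.Conv2d(16, 32, kernel_size=3, stride=1, padding=1)
        self.act = nn.ReLU()

    def forward(self, event_tensor):  # (batch, 2, H, W): positive / negative polarity channels
        x = self.act(self.input_conv(event_tensor))
        x = self.act(self.hidden_conv(x))
        return self.output_conv(x)    # (batch, 32, H, W) event feature map
```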
In step S104, the adjacent image frames, the first event stream feature vector and the second event stream feature vector are concatenated and input into a preset multi-modal fusion network for forward propagation to obtain all intermediate frames and generate a high frame rate video above a second preset frame rate.

As a possible implementation, an embodiment of the present application concatenates the adjacent image frames of the low frame rate video obtained in step S102 with the first event stream feature vector F1 and the second event stream feature vector F2 obtained in step S103, and inputs them into the preset multi-modal fusion network for forward propagation to generate one intermediate frame, thereby completing the calculation of a single high frame rate image frame.

Specifically, in this embodiment, the adjacent image frames of the low frame rate video and the event stream feature vectors F1 and F2 are first concatenated and input into the coarse synthesis sub-network to obtain a coarse output; the coarse output is then concatenated with the input adjacent image frames and input into the fine-tuning sub-network to obtain the final output.

Further, for each desired intermediate frame timestamp calculated in step S102, the embodiment of the present application repeats the above steps until all intermediate frames have been computed, thereby generating a high frame rate video above the second preset frame rate.

It should be noted that the preset multi-modal fusion network is described in detail below.
Optionally, in an embodiment of the present application, the multi-modal fusion network includes a coarse synthesis sub-network and a fine-tuning sub-network, wherein the coarse synthesis sub-network uses a first U-Net structure whose input layer has 64+2×k input channels and whose output layer has k output channels, and the fine-tuning sub-network uses a second U-Net structure whose input layer has 3×k input channels and whose output layer has k output channels, k being the number of channels of an image frame of the video below the preset frame rate.

The multi-modal fusion network is described in detail here.

It can be understood that the data fusion network contains a coarse synthesis sub-network and a fine-tuning sub-network. The coarse synthesis sub-network uses the first U-Net structure, with 64+2×k input channels in its input layer and k output channels in its output layer; the fine-tuning sub-network uses the second U-Net structure, with 3×k input channels in its input layer and k output channels in its output layer.

Here k is the number of channels of an image frame of the low frame rate video input in step S101, i.e. k = 1 when the image frames of the low frame rate video input in step S101 are grayscale images, and k = 3 when they are RGB images.
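To make the channel bookkeeping concrete, here is a hedged sketch of the two-stage fusion forward pass in the same PyTorch style; `unet_factory(in_ch, out_ch)` stands in for any standard U-Net implementation (not defined here), and the 64 extra input channels are assumed to be the two 32-channel event feature maps F1 and F2 produced by the spiking encoder.

```python
import torch
import torch.nn as nn

class FusionNetworkSketch(nn.Module):
    """Coarse synthesis + fine-tuning, following the channel counts in the text.
    `unet_factory(in_ch, out_ch)` must return an off-the-shelf U-Net module."""
    def __init__(self, k: int, unet_factory):
        super().__init__()
        self.coarse = unet_factory(64 + 2 * k, k)  # two frames + two event feature maps -> coarse frame
        self.refine = unet_factory(3 * k, k)       # coarse frame + two boundary frames -> final frame

    def forward(self, frame_prev, frame_next, f1, f2):
        # frame_prev, frame_next: (B, k, H, W); f1, f2: (B, 32, H, W)
        coarse_in = torch.cat([frame_prev, frame_next, f1, f2], dim=1)  # (B, 64+2k, H, W)
        coarse = self.coarse(coarse_in)
        refine_in = torch.cat([coarse, frame_prev, frame_next], dim=1)  # (B, 3k, H, W)
        return self.refine(refine_in)                                   # (B, k, H, W) intermediate frame
```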
In step S105, based on the high frame rate video, forward propagation is performed with a preset 3D depth estimation network to obtain all high frame rate depth maps, and all the high frame rate depth maps are combined to form a high frame rate 3D video.

In actual execution, each high frame rate image frame obtained in the above steps is concatenated with its preceding and following high frame rate image frames and passed through the preset 3D depth estimation network for forward propagation, generating a series of high frame rate depth maps; the generated series of high frame rate depth maps is then combined to form a high frame rate 3D video, achieving high frame rate 3D video generation. In the embodiments of the present application, event data can be used to provide inter-frame motion information, a spiking neural network is used to encode the event stream, all intermediate frames are obtained through a multi-modal fusion network to generate a high frame rate video, and a 3D depth estimation network is then used to form a high frame rate 3D video, achieving effective stereoscopic observation of high-speed scenes. By using the event stream and the low frame rate video image frames as input, multi-modal data information can be exploited more fully, thereby improving the quality of the high frame rate 3D video.

Optionally, in an embodiment of the present application, the 3D depth estimation network uses a third U-Net structure, the input layer has 3×k input channels, and the output layer has 1 output channel.

The construction of the 3D depth estimation network is described in detail here.

Specifically, the 3D depth estimation network constructed in the embodiment of the present application may use the third U-Net structure, with 3×k input channels in its input layer and 1 output channel in its output layer, where k is the number of channels of an image frame of the low frame rate video input in step S101, i.e. k = 1 when the image frames of the low frame rate video input in step S101 are grayscale images, and k = 3 when they are RGB images.
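The 3×k input channels are consistent with stacking each high frame rate frame with its two temporal neighbours before depth estimation; the sketch below makes that reading explicit and is an assumption, not a verbatim part of the patent.

```python
import torch

def estimate_depth_sequence(frames, depth_net):
    """frames: list of (k, H, W) tensors of the generated high frame rate video.
    depth_net: a U-Net-style module mapping (B, 3k, H, W) -> (B, 1, H, W).
    Each frame is concatenated with its previous and next neighbour (edge
    frames reuse themselves) before depth estimation."""
    depth_maps = []
    for i, frame in enumerate(frames):
        prev_f = frames[max(i - 1, 0)]
        next_f = frames[min(i + 1, len(frames) - 1)]
        stacked = torch.cat([prev_f, frame, next_f], dim=0).unsqueeze(0)  # (1, 3k, H, W)
        depth_maps.append(depth_net(stacked).squeeze(0))                  # (1, H, W) depth map
    return depth_maps
```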
The embodiments of the present application are described in detail below with reference to Figs. 2 to 7, using one embodiment as an example. As shown in Fig. 2, this embodiment includes the following steps:

Step S201: acquisition of low frame rate video data and event stream data. In actual execution, the embodiment of the present application acquires low frame rate video and event data from an event camera, thereby obtaining the raw data and laying the data foundation for the subsequent generation of the high frame rate video.

It can be understood that an event camera has no concept of frame rate: each pixel works asynchronously and outputs an event whenever a change in light intensity is detected. Each event is a quadruple (x, y, t, p) containing the pixel coordinates (x, y), the timestamp t and the event polarity p (p = -1 indicates that the light intensity at the pixel decreases, and p = 1 indicates that it increases). Aggregating the event data output by all pixels forms an event list composed of individual events, which is the event stream data output by the camera.

For example, as shown in Fig. 3, the low frame rate video acquired from the event camera in this embodiment may have a frame rate of 20 FPS (frames per second) and a total of 31 frames, and the corresponding event stream lasts 1500 ms.
Step S202: data preprocessing. In this embodiment, adjacent image frames of the low frame rate video are combined in pairs, and for each group of adjacent image frames the set T of timestamps of all desired intermediate frames is calculated, denoted as:

T = {τ_1^{1,2}, τ_2^{1,2}, ..., τ_n^{1,2}, τ_1^{2,3}, τ_2^{2,3}, ..., τ_n^{2,3}, ..., τ_1^{N-1,N}, τ_2^{N-1,N}, ..., τ_n^{N-1,N}},

where the timestamp of each desired intermediate frame is calculated as:

τ_i^{j,j+1} = t_j + ((i - 1) / n) × (t_{j+1} - t_j), i = 1, 2, ..., n, j = 1, 2, ..., N-1,

with N the total number of frames of the input low frame rate video, n the factor by which the frame rate is to be increased, and t_j the timestamp of the j-th frame of the input low frame rate video.

For example, the input low frame rate video in this embodiment may contain N = 31 frames at a frame rate of 20 FPS, so the timestamp of the j-th input frame is t_j = (j - 1) × 50 ms. To obtain a high frame rate video with the frame rate increased by a factor of n = 10, the calculated set of timestamps of all intermediate frames is T = {0, 5, 10, 15, 20, ..., 1495}, which contains 300 elements.
Step S203: construction of the spiking neural network. In actual execution, this embodiment uses the Spike Response Model as the neuron dynamics model to construct a spiking convolutional neural network.

Specifically, the spiking neural network may include an input convolutional layer, a hidden convolutional layer and an output convolutional layer. The input convolutional layer has 2 input channels, corresponding to the positive-polarity and negative-polarity events of the event stream, a 3×3 convolution kernel, a stride of 1 and 16 output channels; the hidden convolutional layer has 16 input channels, a 3×3 convolution kernel, a stride of 1 and 16 output channels; the output convolutional layer has 16 input channels, a 3×3 convolution kernel, a stride of 1 and 32 output channels.
Step S204: event stream encoding. According to the intermediate frame timestamp τ_i^{j,j+1} calculated in step S202, this embodiment intercepts the event streams ε1 and ε2 from the two boundary frames to the desired intermediate frame, and inputs ε1 and ε2 separately into the spiking neural network obtained in step S203 for forward propagation to obtain the event stream feature vectors F1 and F2.

Here ε1 consists of the events whose timestamps lie between t_j and τ_i^{j,j+1}, and ε2 consists of the events whose timestamps lie between τ_i^{j,j+1} and t_{j+1}, where τ_i^{j,j+1} is the timestamp of the desired intermediate frame, and t_j and t_{j+1} are the timestamps of the input low frame rate video frames adjacent to the desired intermediate frame.

For example, for the 15th desired intermediate frame, i.e. the 5th frame inserted between the 2nd and 3rd frames of the input low frame rate video, with τ_5^{2,3} = 75 ms, the event streams ε1 and ε2 from the two boundary frames to the desired intermediate frame can be as shown in Table 1 and Table 2, which list the event data of ε1 and ε2, respectively.

Table 1
Step S205: construction of the multi-modal fusion network. It can be understood that the data fusion network contains a coarse synthesis sub-network and a fine-tuning sub-network. The coarse synthesis sub-network uses a U-Net structure, with 64+2×k input channels in its input layer and k output channels in its output layer; the fine-tuning sub-network uses a U-Net structure, with 3×k input channels in its input layer and k output channels in its output layer.

Here k is the number of channels of an image frame of the low frame rate video input in step S201, i.e. k = 1 when the image frames of the low frame rate video input in step S201 are grayscale images, and k = 3 when they are RGB images.

For example, in this embodiment the image frames of the low frame rate video input in step S201 may be grayscale images, i.e. k = 1; in this case the input layer of the coarse synthesis sub-network has 66 input channels and its output layer has 1 output channel, while the input layer of the fine-tuning sub-network has 3 input channels and its output layer has 1 output channel.
Step S206: calculation of a single high frame rate image frame. As a possible implementation, this embodiment concatenates the adjacent image frames of the low frame rate video obtained in step S202 with the first event stream feature vector F1 and the second event stream feature vector F2 obtained in step S204, and inputs them into the preset multi-modal fusion network for forward propagation to generate one intermediate frame, completing the calculation of a single high frame rate image frame.

Specifically, in this embodiment, the adjacent image frames of the low frame rate video and the event stream feature vectors F1 and F2 are first concatenated and input into the coarse synthesis sub-network to obtain a coarse output; the coarse output is then concatenated with the input adjacent image frames and input into the fine-tuning sub-network to obtain the final output.

For example, taking the 15th desired intermediate frame as an example, the generated intermediate frame is shown in Fig. 4.

Step S207: calculation of all high frame rate image frames. Further, for each desired intermediate frame timestamp calculated in step S202, this embodiment repeats steps S202 to S206 above until the calculation of all intermediate frames is complete.

For example, the input low frame rate video in this embodiment may contain N = 31 frames of images; to obtain a high frame rate video with the frame rate increased by a factor of n = 10, steps S202 to S206 need to be repeated 300 times in total.
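Purely to illustrate the bookkeeping of this repetition (all helper names are hypothetical and the real networks are substituted by callables), the outer loop over all intermediate frames could look like the following sketch:

```python
def generate_high_frame_rate_video(frames, frame_times, events, n,
                                   slice_events, encode_events, fuse):
    """frames: list of N low frame rate frames; frame_times: their timestamps.
    slice_events(events, t_j, tau, t_next) -> (eps1, eps2);
    encode_events(eps) -> event feature map F;
    fuse(frame_prev, frame_next, F1, F2) -> one intermediate frame.
    All three callables are placeholders for the components sketched earlier."""
    high_rate_frames = []
    for j in range(len(frames) - 1):
        t_j, t_next = frame_times[j], frame_times[j + 1]
        for i in range(1, n + 1):
            tau = t_j + (i - 1) / n * (t_next - t_j)
            eps1, eps2 = slice_events(events, t_j, tau, t_next)
            F1, F2 = encode_events(eps1), encode_events(eps2)
            high_rate_frames.append(fuse(frames[j], frames[j + 1], F1, F2))
    return high_rate_frames  # (N - 1) * n frames, e.g. 300 for N = 31, n = 10
```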
All the intermediate frames obtained in step S207 are then combined in this embodiment to form a high frame rate video, completing high frame rate video generation.

Taking a high frame rate video with the frame rate increased by a factor of n = 10 as an example, the input event stream, the low frame rate video and the generated high frame rate video can be as shown in Fig. 5.
Step S208: construction of the 3D depth estimation network. Specifically, the 3D depth estimation network constructed in this embodiment may use a third U-Net structure, with 3×k input channels in its input layer and 1 output channel in its output layer, where k is the number of channels of an image frame of the low frame rate video input in step S201, i.e. k = 1 when the image frames of the low frame rate video input in step S201 are grayscale images, and k = 3 when they are RGB images.

Step S209: high frame rate 3D depth estimation.

Step S210: data post-processing. In actual execution, each high frame rate image frame obtained in the above steps is concatenated with its preceding and following high frame rate image frames and passed through the preset 3D depth estimation network for forward propagation, generating a series of high frame rate depth maps; the generated series of high frame rate depth maps is then combined to form a high frame rate 3D video, achieving high frame rate 3D video generation.
For example, as shown in Fig. 6, this embodiment can generate high frame rate depth-map video with a 10x frame rate increase, achieving effective stereoscopic scene observation in high-speed environments.
According to the data-fusion-based high frame rate 3D video generation method proposed in the embodiments of the present application, event data can be used to provide inter-frame motion information, a spiking neural network is used to encode the event stream, all intermediate frames are obtained through a multi-modal fusion network to generate a high frame rate video, and a 3D depth estimation network is then used to form a high frame rate 3D video, achieving effective stereoscopic observation of high-speed scenes. By using the event stream and the low frame rate video image frames as input, multi-modal data information can be exploited more fully, thereby improving the quality of the high frame rate 3D video. This solves the technical problem in the related art that only the event stream is used as input and the initial brightness value of each pixel is missing, resulting in low quality of the generated images.

Next, the data-fusion-based high frame rate 3D video generation device proposed in the embodiments of the present application is described with reference to the accompanying drawings.

Fig. 7 is a block diagram of a data-fusion-based high frame rate 3D video generation device according to an embodiment of the present application.

As shown in Fig. 7, the data-fusion-based high frame rate 3D video generation device 10 includes: a first acquisition module 100, a calculation module 200, a second acquisition module 300, a fusion module 400 and a generation module 500.
Specifically, the first acquisition module 100 is configured to acquire video with a frame rate lower than a preset frame rate and event data from an event camera.

The calculation module 200 is configured to combine adjacent image frames of the video in pairs to generate multiple groups of adjacent image frames, and to calculate the set of timestamps of all desired intermediate frames.

The second acquisition module 300 is configured to intercept, according to the timestamp set, a first event stream and a second event stream from the two boundary frames to the desired intermediate frame, and to input the first event stream and the second event stream into a preset spiking neural network for forward propagation to obtain a first event stream feature vector and a second event stream feature vector.

The fusion module 400 is configured to concatenate the adjacent image frames, the first event stream feature vector and the second event stream feature vector, and to input them into a preset multi-modal fusion network for forward propagation to obtain all intermediate frames and generate a high frame rate video above a second preset frame rate.

The generation module 500 is configured to perform forward propagation with a preset 3D depth estimation network based on the high frame rate video to obtain all high frame rate depth maps, and to combine all the high frame rate depth maps to form a high frame rate 3D video.

Optionally, in an embodiment of the present application, the data-fusion-based high frame rate 3D video generation device 10 further includes a construction module.

The construction module is configured to construct the spiking neural network using the Spike Response Model as the neuron dynamics model.
Optionally, in an embodiment of the present application, the multi-modal fusion network includes a coarse synthesis sub-network and a fine-tuning sub-network, wherein the coarse synthesis sub-network uses a first U-Net structure whose input layer has 64+2×k input channels and whose output layer has k output channels, and the fine-tuning sub-network uses a second U-Net structure whose input layer has 3×k input channels and whose output layer has k output channels, k being the number of channels of an image frame of the video below the preset frame rate.

Optionally, in an embodiment of the present application, the 3D depth estimation network uses a third U-Net structure, the input layer has 3×k input channels, and the output layer has 1 output channel.

Optionally, in an embodiment of the present application, the timestamps in the set of all intermediate frames are calculated as:

τ_i^{j,j+1} = t_j + ((i - 1) / n) × (t_{j+1} - t_j), i = 1, 2, ..., n, j = 1, 2, ..., N-1,

where N is the total number of frames of the input low frame rate video, n is the factor by which the frame rate is to be increased, and t_j is the timestamp of the j-th frame of the input low frame rate video.
It should be noted that the foregoing explanation of the embodiments of the data-fusion-based high frame rate 3D video generation method also applies to the data-fusion-based high frame rate 3D video generation device of this embodiment, and is not repeated here.

According to the data-fusion-based high frame rate 3D video generation device proposed in the embodiments of the present application, event data can be used to provide inter-frame motion information, a spiking neural network is used to encode the event stream, all intermediate frames are obtained through a multi-modal fusion network to generate a high frame rate video, and a 3D depth estimation network is then used to form a high frame rate 3D video, achieving effective stereoscopic observation of high-speed scenes. By using the event stream and the low frame rate video image frames as input, multi-modal data information can be exploited more fully, thereby improving the quality of the high frame rate 3D video. This solves the technical problem in the related art that only the event stream is used as input and the initial brightness value of each pixel is missing, resulting in low quality of the generated images.
Fig. 8 is a schematic structural diagram of an electronic device provided by an embodiment of the present application. The electronic device may include:

a memory 801, a processor 802, and a computer program stored on the memory 801 and executable on the processor 802.

When the processor 802 executes the program, the data-fusion-based high frame rate 3D video generation method provided in the above embodiments is implemented.

Further, the electronic device also includes:

a communication interface 803 for communication between the memory 801 and the processor 802.

The memory 801 is used to store a computer program that can run on the processor 802.

The memory 801 may include a high-speed RAM memory, and may also include a non-volatile memory, for example at least one disk memory.
If the memory 801, the processor 802 and the communication interface 803 are implemented independently, the communication interface 803, the memory 801 and the processor 802 may be connected to one another through a bus and communicate with one another. The bus may be an Industry Standard Architecture (ISA) bus, a Peripheral Component Interconnect (PCI) bus, an Extended Industry Standard Architecture (EISA) bus, or the like. The bus may be divided into an address bus, a data bus, a control bus, and so on. For ease of representation, only one thick line is used in Fig. 8, but this does not mean that there is only one bus or only one type of bus.

Optionally, in a specific implementation, if the memory 801, the processor 802 and the communication interface 803 are integrated on a single chip, the memory 801, the processor 802 and the communication interface 803 may communicate with one another through internal interfaces.

The processor 802 may be a Central Processing Unit (CPU), an Application Specific Integrated Circuit (ASIC), or one or more integrated circuits configured to implement the embodiments of the present application.

This embodiment also provides a computer-readable storage medium on which a computer program is stored; when the program is executed by a processor, the above data-fusion-based high frame rate 3D video generation method is implemented.
在本说明书的描述中,参考术语“一个实施例”、“一些实施例”、“示例”、“具体示例”、或“一些示例”等的描述意指结合该实施例或示例描述的具体特征、结构、材料或者特点包含于本申请的至少一个实施例或示例中。在本说明书中,对上述术语的示意性表述不必须针对的是相同的实施例或示例。而且,描述的具体特征、结构、材料或者特点可以在任一个或N个实施例或示例中以合适的方式结合。此外,在不相互矛盾的情况下,本领域的技术人员可以将本说明书中描述的不同实施例或示例以及不同实施例或示例的特征进行结合和组合。In the description of this specification, description with reference to the terms "one embodiment," "some embodiments," "example," "specific example," or "some examples", etc., mean specific features described in connection with the embodiment or example , structure, material or feature is included in at least one embodiment or example of the present application. In this specification, schematic representations of the above terms are not necessarily directed to the same embodiment or example. Furthermore, the particular features, structures, materials or characteristics described may be combined in any suitable manner in any one or N of the embodiments or examples. Furthermore, those skilled in the art may combine and combine the different embodiments or examples described in this specification, as well as the features of the different embodiments or examples, without conflicting each other.
In addition, the terms "first" and "second" are used for descriptive purposes only and shall not be construed as indicating or implying relative importance or implicitly indicating the number of technical features indicated. Thus, a feature defined as "first" or "second" may explicitly or implicitly include at least one such feature. In the description of the present application, "N" means at least two, for example two or three, unless otherwise expressly and specifically defined.
Any process or method description in a flowchart, or otherwise described herein, may be understood as representing a module, segment, or portion of code that includes one or N executable instructions for implementing custom logical functions or steps of the process, and the scope of the preferred embodiments of the present application includes additional implementations in which functions may be performed out of the order shown or discussed, including in a substantially concurrent manner or in the reverse order, depending on the functions involved, as should be understood by those skilled in the art to which the embodiments of the present application belong.
The logic and/or steps represented in the flowcharts or otherwise described herein may, for example, be regarded as an ordered listing of executable instructions for implementing the logical functions, and may be embodied in any computer-readable medium for use by, or in connection with, an instruction execution system, apparatus, or device (such as a computer-based system, a system including a processor, or another system that can fetch instructions from an instruction execution system, apparatus, or device and execute them). For the purposes of this specification, a "computer-readable medium" may be any means that can contain, store, communicate, propagate, or transmit a program for use by, or in connection with, an instruction execution system, apparatus, or device. More specific examples (a non-exhaustive list) of the computer-readable medium include: an electrical connection (electronic device) having one or N wires, a portable computer diskette (magnetic device), a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber device, and a portable compact disc read-only memory (CD-ROM). In addition, the computer-readable medium may even be paper or another suitable medium on which the program can be printed, since the program can be obtained electronically, for example by optically scanning the paper or other medium and then editing, interpreting, or otherwise processing it in a suitable manner if necessary, and then stored in a computer memory.
It should be understood that each part of the present application may be implemented in hardware, software, firmware, or a combination thereof. In the above embodiments, N steps or methods may be implemented with software or firmware stored in a memory and executed by a suitable instruction execution system. For example, if implemented in hardware, as in another embodiment, they may be implemented by any one of the following technologies known in the art, or a combination thereof: a discrete logic circuit having logic gate circuits for implementing logic functions on data signals, an application-specific integrated circuit having suitable combinational logic gate circuits, a programmable gate array (PGA), a field programmable gate array (FPGA), and the like.
Those of ordinary skill in the art can understand that all or part of the steps carried by the methods of the above embodiments can be completed by instructing the relevant hardware through a program, and the program can be stored in a computer-readable storage medium. When executed, the program includes one of the steps of the method embodiments or a combination thereof.
In addition, the functional units in the embodiments of the present application may be integrated into one processing module, or each unit may exist physically on its own, or two or more units may be integrated into one module. The above integrated module may be implemented in the form of hardware, or in the form of a software functional module. If the integrated module is implemented in the form of a software functional module and sold or used as an independent product, it may also be stored in a computer-readable storage medium.
The above-mentioned storage medium may be a read-only memory, a magnetic disk, an optical disc, or the like. Although the embodiments of the present application have been shown and described above, it should be understood that the above embodiments are exemplary and shall not be construed as limiting the present application, and those of ordinary skill in the art may make changes, modifications, substitutions, and variations to the above embodiments within the scope of the present application.
Claims (10)
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210293645.3A CN114885144B (en) | 2022-03-23 | 2022-03-23 | High frame rate 3D video generation method and device based on data fusion |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210293645.3A CN114885144B (en) | 2022-03-23 | 2022-03-23 | High frame rate 3D video generation method and device based on data fusion |
Publications (2)
Publication Number | Publication Date |
---|---|
CN114885144A true CN114885144A (en) | 2022-08-09 |
CN114885144B CN114885144B (en) | 2023-02-07 |
Family
ID=82667857
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202210293645.3A Active CN114885144B (en) | 2022-03-23 | 2022-03-23 | High frame rate 3D video generation method and device based on data fusion |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN114885144B (en) |
Patent Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20160140733A1 (en) * | 2014-11-13 | 2016-05-19 | Futurewei Technologies, Inc. | Method and systems for multi-view high-speed motion capture |
CN111667442A (en) * | 2020-05-21 | 2020-09-15 | 武汉大学 | High-quality high-frame-rate image reconstruction method based on event camera |
CN113888639A (en) * | 2021-10-22 | 2022-01-04 | 上海科技大学 | Visual odometry positioning method and system based on event camera and depth camera |
CN114071114A (en) * | 2022-01-17 | 2022-02-18 | 季华实验室 | Event camera, depth event point diagram acquisition method, device, equipment and medium |
Non-Patent Citations (1)
Title |
---|
宋彩霞 (SONG Caixia) et al.: "Research on the transmission method of VR video under 5G networks", 《电视技术》 *
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN115883764A (en) * | 2023-02-08 | 2023-03-31 | 吉林大学 | Underwater high-speed video frame interpolation method and system based on data cooperation |
WO2024179078A1 (en) * | 2023-02-28 | 2024-09-06 | 万有引力(宁波)电子科技有限公司 | Fused display method and system, and storage medium |
CN118628892A (en) * | 2024-06-05 | 2024-09-10 | 桂林康基大数据智能研究院 | Image processing system with image sensor |
Also Published As
Publication number | Publication date |
---|---|
CN114885144B (en) | 2023-02-07 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN114885144B (en) | High frame rate 3D video generation method and device based on data fusion | |
CN109803175B (en) | Video processing method and device, video processing equipment and storage medium | |
US11663691B2 (en) | Method and apparatus for restoring image | |
CN110610486B (en) | Monocular image depth estimation method and device | |
CN111835983B (en) | A method and system for multi-exposure high dynamic range imaging based on generative adversarial network | |
CN114881921B (en) | De-occlusion imaging method and device based on event and video fusion | |
US20200195910A1 (en) | Apparatus including multiple cameras and image processing method | |
CN111652921B (en) | Monocular depth prediction model generation method and monocular depth prediction method | |
Han et al. | Hybrid high dynamic range imaging fusing neuromorphic and conventional images | |
WO2019056549A1 (en) | Image enhancement method, and image processing device | |
CN114885112B (en) | Method and device for generating high frame rate video based on data fusion | |
CN111079507B (en) | Behavior recognition method and device, computer device and readable storage medium | |
CN116485863A (en) | Depth image video generation method and device based on data fusion | |
WO2022247394A1 (en) | Image splicing method and apparatus, and storage medium and electronic device | |
CN114782596A (en) | Voice-driven facial animation generation method, device, device and storage medium | |
WO2023160426A1 (en) | Video frame interpolation method and apparatus, training method and apparatus, and electronic device | |
CN117956130A (en) | Video processing method, device, equipment, system and readable storage medium | |
WO2023061187A1 (en) | Optical flow estimation method and device | |
Ding et al. | Video Frame Interpolation with Stereo Event and Intensity Cameras | |
JP2008113292A (en) | Motion estimation method and device, program thereof and recording medium thereof | |
CN113962964B (en) | Specified object erasing method and device based on time sequence image data | |
CN114881866B (en) | De-occlusion 3D imaging method and device based on event data | |
CN110516681A (en) | Image Feature Extraction Method and Salient Object Prediction Method | |
CN114881868B (en) | De-occlusion 3D imaging method and device based on event and video fusion | |
CN115239584A (en) | Method and device for removing portrait glasses and shadows thereof by using three-dimensional synthetic data |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||