CN116452649A - Event data enhancement-based moving object reconstruction method and device

Event data enhancement-based moving object reconstruction method and device

Info

Publication number
CN116452649A
CN116452649A, CN202310269687.8A
Authority
CN
China
Prior art keywords
point cloud
event
view
frame
frame rate
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310269687.8A
Other languages
Chinese (zh)
Inventor
高跃
卢嘉轩
万海
赵曦滨
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tsinghua University
Original Assignee
Tsinghua University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tsinghua University filed Critical Tsinghua University
Priority to CN202310269687.8A priority Critical patent/CN116452649A/en
Publication of CN116452649A publication Critical patent/CN116452649A/en
Pending legal-status Critical Current

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 7/00 - Image analysis
    • G06T 7/50 - Depth or shape recovery
    • G06T 7/55 - Depth or shape recovery from multiple images
    • G06T 7/593 - Depth or shape recovery from multiple images from stereo images
    • G06T 7/596 - Depth or shape recovery from multiple images from stereo images from three or more stereo images
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 17/00 - Three dimensional [3D] modelling, e.g. data description of 3D objects
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 7/00 - Image analysis
    • G06T 7/20 - Analysis of motion
    • G06T 7/269 - Analysis of motion using gradient-based methods
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 7/00 - Image analysis
    • G06T 7/20 - Analysis of motion
    • G06T 7/285 - Analysis of motion using a sequence of stereo image pairs
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 - Arrangements for image or video recognition or understanding
    • G06V 10/70 - Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V 10/77 - Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V 10/80 - Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 2207/00 - Indexing scheme for image analysis or image enhancement
    • G06T 2207/10 - Image acquisition modality
    • G06T 2207/10016 - Video; Image sequence
    • G06T 2207/10021 - Stereoscopic video; Stereoscopic image sequence
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 2207/00 - Indexing scheme for image analysis or image enhancement
    • G06T 2207/30 - Subject of image; Context of image processing
    • G06T 2207/30241 - Trajectory

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Software Systems (AREA)
  • Geometry (AREA)
  • Computer Graphics (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Computing Systems (AREA)
  • Databases & Information Systems (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Medical Informatics (AREA)
  • Image Analysis (AREA)

Abstract

The application relates to a method and a device for reconstructing a moving object based on event data enhancement, wherein the method comprises the following steps: collecting motion video data containing a moving object; generating a high-frame-rate reconstructed image sequence satisfying a first preset frame rate condition based on the motion video data; generating, for the high-frame-rate reconstructed image sequence at each view angle, a single-view depth map based on the corresponding event intermediate representation; inputting the single-view depth map into a preset point cloud generation network to generate a single-view dense point cloud; inputting the single-view dense point cloud at each view angle into a preset multi-view point cloud fusion network to obtain a fused dense point cloud; and processing the fused dense point cloud obtained at each moment and rendering to obtain the three-dimensional motion trail of the object. This solves the problems in the related art that an accurate motion trail of a high-speed moving object is difficult to obtain through a detection or segmentation algorithm, that the fixed frame rate at which a traditional camera collects data hinders the tracking of high-speed moving objects, and that the refinement level of moving object reconstruction is reduced as a result.

Description

Event data enhancement-based moving object reconstruction method and device
Technical Field
The present disclosure relates to the field of computer vision and moving object reconstruction technologies, and in particular to a moving object reconstruction method and device based on event data enhancement.
Background
Three-dimensional reconstruction is a research direction of computer vision and graphics in which three-dimensional scenes are reconstructed from images or videos acquired by monocular or binocular cameras; it has wide application potential in the metaverse, virtual reality and human-computer interaction. While there has been much prior work on the reconstruction of objects or scenes, effective methods for the reconstruction of moving objects are still lacking.
In the related art, a traditional camera can be used to reconstruct a moving object at a low frame rate by means of a detection or segmentation algorithm, or different sensors can be combined for data fusion to obtain high-quality perception information about an object target, including detection and tracking.
However, in the related art, it is difficult to obtain an accurate motion track of a high-speed moving object through a detection or segmentation algorithm, and the fixed frame rate at which a traditional camera collects data hinders the tracking of high-speed moving objects and reduces the refinement level of moving object reconstruction; improvement is therefore needed.
Disclosure of Invention
The application provides a method and a device for reconstructing a moving object based on event data enhancement, which are used for solving the problems in the related art that an accurate motion track of a high-speed moving object is difficult to obtain through a detection or segmentation algorithm, that the fixed frame rate at which a traditional camera collects data hinders its tracking of high-speed motion, and that the refinement level of moving object reconstruction is reduced as a result.
An embodiment of a first aspect of the present application provides a method for reconstructing a moving object based on event data enhancement, including the steps of: collecting motion video data containing a moving object; generating a high-frame-rate reconstructed image sequence meeting a first preset frame rate condition based on the motion video data; for the high-frame-rate reconstructed image sequence at each view angle, generating a single-view depth map based on the corresponding event intermediate representation; inputting the single-view depth map into a preset point cloud generation network to generate a single-view dense point cloud; inputting the single-view dense point cloud at each view angle into a preset multi-view point cloud fusion network to obtain a fused dense point cloud; and processing the fused dense point cloud obtained at each moment and rendering to obtain the three-dimensional motion trail of the object.
Optionally, in one embodiment of the present application, the generating, based on the motion video data, a high frame rate reconstructed image sequence that satisfies a preset frame rate condition includes: preprocessing the motion video data to respectively obtain a low-frame-rate image sequence and an event stream which meet a second preset frame rate condition; inputting the low-frame-rate image sequence and the event stream into a preset video interpolation network to generate a high-frame-rate reconstructed image sequence meeting the preset frame rate condition.
Optionally, in an embodiment of the present application, the set of event points of the event stream includes event point coordinates, event point time stamps and event point polarities.
Optionally, in one embodiment of the present application, the inputting the low frame rate image sequence and the event stream into a preset video interpolation network to generate a high frame rate reconstructed image sequence that meets the preset frame rate condition includes: for each pair of adjacent image frames in the low-frame-rate image sequence, accumulating the event stream in the forward direction and the backward direction respectively to obtain a forward event intermediate representation and a backward event intermediate representation; sending the forward event intermediate representation and the backward event intermediate representation into a preset optical flow prediction network respectively to obtain a forward optical flow and a backward optical flow; for each pair of adjacent image frames in the low frame rate image sequence, splitting the forward optical flow and the backward optical flow according to a linear proportion, adding the image frame at the previous moment and the split forward optical flow to obtain a forward reconstructed image frame, and subtracting the split backward optical flow from the image frame at the next moment to obtain a backward reconstructed image frame; and inputting the forward reconstructed image frame and the backward reconstructed image frame into a preset bidirectional image composition layer to obtain a reconstructed image, so as to generate the high-frame-rate reconstructed image sequence.
Optionally, in one embodiment of the present application, the generating, for the high frame rate reconstructed image sequence at each view angle, a single-view depth map based on the corresponding event intermediate representation includes: traversing the high frame rate reconstructed image sequence at each view angle and accumulating the event stream according to the frame rate to obtain the event intermediate representation at each frame; and traversing each image frame in the high frame rate reconstructed image sequence and inputting the image frame together with the corresponding event intermediate representation into a depth map prediction network to generate the single-view depth map.
Optionally, in an embodiment of the present application, the inputting the single-view depth map into a preset point cloud generation network to generate a single-view dense point cloud includes: traversing each view angle and inputting the single-view depth map and the corresponding event intermediate representation into the point cloud generation network to obtain a single-view sparse point cloud; and traversing each view angle and carrying out up-sampling and fine-tuning on the single-view sparse point cloud to obtain the single-view dense point cloud.
Optionally, in an embodiment of the present application, the processing the fused dense point cloud obtained at each moment, and rendering to obtain a three-dimensional motion track of an object, includes: splicing the fused dense point clouds obtained at each moment to generate a continuous object point cloud motion track; and converting the object point cloud motion trail into a grid model to obtain the object three-dimensional motion trail.
An embodiment of a second aspect of the present application provides a moving object reconstruction device based on event data enhancement, including: the acquisition module is used for acquiring motion video data containing a moving object; the first generation module is used for generating a high-frame-rate reconstructed image sequence meeting a first preset frame rate condition based on the motion video data; the second generation module is used for generating, for the high-frame-rate reconstructed image sequence at each view angle, a single-view depth map based on the corresponding event intermediate representation; the third generation module is used for inputting the single-view depth map into a preset point cloud generation network to generate a single-view dense point cloud; the fusion module is used for inputting the single-view dense point cloud at each view angle into a preset multi-view point cloud fusion network to obtain a fused dense point cloud; and the reconstruction module is used for processing the fused dense point cloud obtained at each moment and rendering to obtain the three-dimensional motion trail of the object.
Optionally, in one embodiment of the present application, the first generating module includes: the first acquisition unit is used for preprocessing the motion video data to respectively obtain a low-frame-rate image sequence and an event stream that meet a second preset frame rate condition; the first generation unit is used for inputting the low-frame-rate image sequence and the event stream into a preset video frame interpolation network to generate a high-frame-rate reconstructed image sequence meeting the preset frame rate condition.
Optionally, in an embodiment of the present application, the set of event points of the event stream includes event point coordinates, event point time stamps and event point polarities.
Optionally, in one embodiment of the present application, the first generating unit includes: the first acquisition subunit is used for accumulating the event stream in the forward direction and the backward direction respectively for each pair of adjacent image frames in the low-frame-rate image sequence to obtain a forward event intermediate representation and a backward event intermediate representation; the second acquisition subunit is used for sending the forward event intermediate representation and the backward event intermediate representation into a preset optical flow prediction network respectively to obtain a forward optical flow and a backward optical flow; the third acquisition subunit is used for splitting, for each pair of adjacent image frames in the low frame rate image sequence, the forward optical flow and the backward optical flow according to a linear proportion, adding the image frame at the previous moment and the split forward optical flow to obtain a forward reconstructed image frame, and subtracting the split backward optical flow from the image frame at the next moment to obtain a backward reconstructed image frame; and the generation subunit is used for inputting the forward reconstructed image frame and the backward reconstructed image frame into a preset bidirectional image composition layer to obtain a reconstructed image, so as to generate the high-frame-rate reconstructed image sequence.
Optionally, in one embodiment of the present application, the third generating module includes: the first traversing unit is used for traversing the high frame rate reconstructed image sequence at each view angle and accumulating the event stream according to the frame rate to obtain the event intermediate representation at each frame; and the second generation unit is used for traversing each image frame in the high-frame-rate reconstructed image sequence and inputting the image frame together with the corresponding event intermediate representation into a depth map prediction network to generate the single-view depth map.
Optionally, in an embodiment of the present application, the third generating module further includes: the second obtaining unit is used for traversing each view angle, inputting the single-view depth map and the corresponding event intermediate representation into the point cloud generating network, and obtaining single-view sparse point cloud; and the second traversing unit is used for traversing each view angle, and carrying out up-sampling and fine tuning operation on the single-view sparse point cloud to obtain the single-view dense point cloud.
Optionally, in one embodiment of the present application, the reconstruction module includes: the splicing unit is used for splicing the fused dense point clouds obtained at each moment to generate a continuous object point cloud motion track; the conversion unit is used for converting the object point cloud motion trail into a grid model to obtain the object three-dimensional motion trail.
An embodiment of a third aspect of the present application provides an electronic device, including: the system comprises a memory, a processor and a computer program stored in the memory and capable of running on the processor, wherein the processor executes the program to realize the method for reconstructing the moving object based on the event data enhancement as described in the embodiment.
A fourth aspect of the present application provides a computer-readable storage medium storing a computer program which, when executed by a processor, implements the above method for reconstructing a moving object based on event data enhancement.
According to the method and the device for reconstructing the moving object, motion video data containing a moving object can be collected, an image sequence can be reconstructed, a single-view depth map can be generated, and finally the three-dimensional motion trail of the object can be obtained, so that the advantage of the high temporal resolution of event data is fully utilized, the limitation of a purely traditional camera in high-speed moving object tracking is broken through, and finer moving object reconstruction is obtained. This solves the problems in the related art that an accurate motion trail of a high-speed moving object is difficult to obtain through a detection or segmentation algorithm, that the fixed frame rate at which a traditional camera collects data hinders the tracking of high-speed moving objects, and that the refinement level of moving object reconstruction is reduced as a result.
Additional aspects and advantages of the application will be set forth in part in the description which follows and, in part, will be obvious from the description, or may be learned by practice of the application.
Drawings
The foregoing and/or additional aspects and advantages of the present application will become apparent and readily appreciated from the following description of the embodiments, taken in conjunction with the accompanying drawings, in which:
FIG. 1 is a flow chart of a method for reconstructing a moving object based on event data enhancement according to an embodiment of the present application;
FIG. 2 is a flow chart of a method for reconstructing a moving object based on event data enhancement according to one embodiment of the present application;
FIG. 3 is a schematic diagram of a method for reconstructing a moving object based on event data enhancement according to one embodiment of the present application;
FIG. 4 is a schematic structural diagram of a moving object reconstruction device based on event data enhancement according to an embodiment of the present application;
fig. 5 is a schematic structural diagram of an electronic device according to an embodiment of the present application.
Detailed Description
Embodiments of the present application are described in detail below, examples of which are illustrated in the accompanying drawings, wherein the same or similar reference numerals refer to the same or similar elements or elements having the same or similar functions throughout. The embodiments described below by referring to the drawings are exemplary and intended for the purpose of explaining the present application and are not to be construed as limiting the present application.
The following describes a method and an apparatus for reconstructing a moving object based on event data enhancement according to embodiments of the present application with reference to the accompanying drawings. In the related art mentioned in the Background section, it is difficult to obtain an accurate motion trail of a high-speed moving object by means of a detection or segmentation algorithm, and the fixed frame rate at which a traditional camera collects data hinders its tracking of high-speed moving objects and reduces the refinement level of moving object reconstruction. In view of these problems, the application provides a method for reconstructing a moving object based on event data enhancement: motion video data containing the moving object is collected, an image sequence is reconstructed, a single-view depth map is generated, and finally the three-dimensional motion trail of the object is obtained, so that the advantage of the high temporal resolution of event data is fully utilized, the limitation of a purely traditional camera in tracking a high-speed moving object is broken through, and finer reconstruction of the moving object is obtained. This solves the problems in the related art that an accurate motion trail of a high-speed moving object is difficult to obtain through a detection or segmentation algorithm, that the fixed frame rate at which a traditional camera collects data hinders the tracking of high-speed moving objects, and that the refinement level of moving object reconstruction is reduced as a result.
Specifically, fig. 1 is a schematic flow chart of a method for reconstructing a moving object based on event data enhancement according to an embodiment of the present application.
As shown in fig. 1, the method for reconstructing a moving object based on event data enhancement includes the steps of:
in step S101, motion video data including a moving object is acquired.
It can be appreciated that the embodiment of the application may collect motion video data including a moving object, for example through a conventional camera and an event camera, where the conventional camera is a common camera that collects three-channel color information and is based on equidistant exposure, and the event camera is a novel dynamic vision sensor whose pixels operate asynchronously and which only records changes in light intensity.
In the actual implementation process, the embodiment of the application can acquire the motion video data containing the motion object through the traditional camera and the event camera, provides a basis for obtaining a high-frame-rate reconstruction image sequence according to the motion video data, and further breaks through the limitation of the traditional camera on high-speed object tracking to obtain finer motion object reconstruction.
In step S102, a high frame rate reconstructed image sequence satisfying a first preset frame rate condition is generated based on the motion video data.
It may be appreciated that, in the embodiment of the present application, satisfying the first preset frame rate condition may mean that the reconstructed image sequence reaches a certain high frame rate.
In an actual execution process, the embodiment of the application can generate a high-frame-rate reconstructed image sequence, and generate the high-frame-rate reconstructed image sequence meeting the first preset frame rate condition according to the acquired motion video data, so as to further obtain finer motion object reconstruction.
It should be noted that the first preset frame rate may be set by those skilled in the art according to actual situations, and is not particularly limited herein.
Optionally, in one embodiment of the present application, generating a high frame rate reconstructed image sequence that satisfies a preset frame rate condition based on the motion video data includes: preprocessing the motion video data to respectively obtain a low-frame-rate image sequence and an event stream which meet a second preset frame rate condition; and inputting the low-frame-rate image sequence and the event stream into a preset video frame interpolation network to generate a high-frame-rate reconstructed image sequence meeting the preset frame rate condition.
It may be appreciated that, in the embodiment of the present application, satisfying the second preset frame rate condition may mean that the image sequence is at a certain low frame rate, where the low-frame-rate image sequence is an image sequence collected by a traditional camera and represents a fixed number of images per second, and the event stream is a set of event points collected by an event camera.
In the actual execution process, the embodiment of the application can preprocess the collected motion video data to obtain a low-frame-rate image sequence and an event stream respectively, that is, an image sequence collected by a traditional camera and an event point set collected by an event camera. The embodiment of the application can then input the low-frame-rate image sequence and the event stream into a video frame interpolation network to generate a high-frame-rate reconstructed image sequence meeting a certain frame rate condition, so that the advantage of the high temporal resolution of the event data is fully utilized and the reconstruction of the moving object is enhanced.
It should be noted that the second preset frame rate condition may be set by those skilled in the art according to actual situations, and is not particularly limited herein.
Optionally, in one embodiment of the present application, the set of event points of the event stream includes event point coordinates, event point time stamps and event point polarities.
In some embodiments, the event point set of the event stream includes, but is not limited to, event point coordinates, an event point timestamp and an event point polarity, where the event point coordinates specifically include an event point abscissa and an event point ordinate, and the event point polarity is either positive or negative. The event point set of the event stream provides the basis for the subsequent video frame interpolation network, thereby promoting moving object reconstruction.
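The event point structure just described can be sketched as follows; the dataclass, the container alias EventStream, and the field names x, y, t and p are illustrative assumptions rather than definitions taken from the patent.

```python
# A minimal sketch of the event representation assumed above: each event point
# carries pixel coordinates, a timestamp, and a polarity (+1 positive / -1 negative).
from dataclasses import dataclass
from typing import List

@dataclass
class EventPoint:
    x: int          # event point abscissa (pixel column)
    y: int          # event point ordinate (pixel row)
    t: float        # event point timestamp
    p: int          # event point polarity: +1 (positive) or -1 (negative)

EventStream = List[EventPoint]
```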
Optionally, in one embodiment of the present application, inputting the low frame rate image sequence and the event stream into a preset video interpolation network to generate the high frame rate reconstructed image sequence meeting the preset frame rate condition includes: accumulating the event stream in the forward direction and the backward direction respectively for each pair of adjacent image frames in the low-frame-rate image sequence to obtain a forward event intermediate representation and a backward event intermediate representation; sending the forward event intermediate representation and the backward event intermediate representation into a preset optical flow prediction network respectively to obtain a forward optical flow and a backward optical flow; for each pair of adjacent image frames in the low frame rate image sequence, splitting the forward optical flow and the backward optical flow according to a linear proportion, adding the image frame at the previous moment and the split forward optical flow to obtain a forward reconstructed image frame, and subtracting the split backward optical flow from the image frame at the next moment to obtain a backward reconstructed image frame; and inputting the forward reconstructed image frame and the backward reconstructed image frame into a preset bidirectional image composition layer to obtain a reconstructed image, so as to generate a high-frame-rate reconstructed image sequence.
For example, the embodiment of the application can count the number of pixel-level event points between each pair of adjacent image frames in the forward direction and the backward direction respectively, forming matrices consistent with the resolution of the original event camera that serve as the forward event intermediate representation and the backward event intermediate representation. The optical flow prediction network in the embodiment of the application can be composed of convolution layers and deconvolution layers; the forward event intermediate representation and the backward event intermediate representation are respectively sent into such an optical flow prediction network to obtain a forward optical flow and a backward optical flow. For each pair of adjacent image frames in the low frame rate image sequence, the embodiment of the application can split the forward optical flow and the backward optical flow according to a linear proportion, add the image frame at the previous moment and the split forward optical flow to obtain a forward reconstructed image frame, and subtract the split backward optical flow from the image frame at the next moment to obtain a backward reconstructed image frame. The bidirectional image composition layer in the embodiment of the application can apply a convolution layer to fine-tune the forward reconstructed image frame and the backward reconstructed image frame respectively and then perform a pixel-level averaging operation to obtain a single reconstructed image, thereby using the event data as an enhancement for reconstructing the moving object.
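The event accumulation and linear flow-splitting steps can be illustrated with the sketch below. It assumes events carry x, y and t fields (see the EventPoint sketch earlier); the optical flow network is not shown, the warp and compose callables stand in for the image warping and the bidirectional image composition layer, and interpreting "adding the frame and the split flow" as a flow-based warp is this sketch's own assumption rather than the patent's exact formulation.

```python
import numpy as np

def accumulate_events(events, t_start, t_end, height, width):
    """Per-pixel event count in [t_start, t_end), at the event-camera resolution.
    Called once per adjacent frame pair: forward from the earlier frame and
    backward from the later frame (same window, opposite reference frame)."""
    rep = np.zeros((height, width), dtype=np.float32)
    for e in events:
        if t_start <= e.t < t_end:
            rep[e.y, e.x] += 1.0
    return rep

def split_flow(flow, alpha):
    """Linear split of a dense flow field for an intermediate time alpha in (0, 1)."""
    return alpha * flow

def interpolate_pair(frame_prev, frame_next, flow_fwd, flow_bwd, alpha, warp, compose):
    # warp() applies a dense flow to an image; compose() stands in for the
    # bidirectional image composition layer (convolutional fine-tuning followed
    # by pixel-wise averaging).
    recon_fwd = warp(frame_prev, split_flow(flow_fwd, alpha))
    recon_bwd = warp(frame_next, split_flow(flow_bwd, 1.0 - alpha))
    return compose(recon_fwd, recon_bwd)
```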
In step S103, for the high frame rate reconstructed image sequence at each view angle, a single-view depth map is generated based on the corresponding event intermediate representation.
As a possible implementation manner, the embodiment of the application can generate a single-view depth map: for the high frame rate reconstructed image sequence at each view angle, the single-view depth map is generated with the help of the corresponding event intermediate representation, thereby providing a basis for the subsequent point cloud generation network and further ensuring finer moving object reconstruction.
Optionally, in one embodiment of the present application, generating, for the high frame rate reconstructed image sequence at each view angle, the single-view depth map based on the corresponding event intermediate representation includes: traversing the high frame rate reconstructed image sequence at each view angle and accumulating the event stream according to the frame rate to obtain the event intermediate representation at each frame; and traversing each image frame in the high-frame-rate reconstructed image sequence and inputting the image frame together with the corresponding event intermediate representation into a depth map prediction network to generate a single-view depth map.
For example, the embodiment of the application can reconstruct the frame rate of the image sequence for the high frame rate, count the number of the pixel-level event points between any two adjacent images, and form a matrix consistent with the resolution of the original event camera as the event intermediate representation of the next image at the frame rate.
Further, the depth map prediction network in the embodiment of the application may be formed of convolution layers and deconvolution layers. Each image frame in the high frame rate reconstructed image sequence is traversed, and the image frame and the corresponding event intermediate representation are taken together as input to the depth map prediction network to generate a single-view depth map; for example, the image frame and the event intermediate representation each pass through several convolution layer operations and are then concatenated, and the result is restored through several deconvolution layers to obtain a single-view depth map with the same resolution as the original image sequence.
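A minimal PyTorch sketch of such a two-branch convolution/deconvolution depth map prediction network is given below; the channel counts, layer depths and the class name DepthPredictionNet are illustrative assumptions, not values taken from the patent.

```python
import torch
import torch.nn as nn

class DepthPredictionNet(nn.Module):
    def __init__(self):
        super().__init__()
        # Image frame branch: a few strided convolution layers.
        self.image_branch = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
        )
        # Event intermediate representation branch (single-channel count map).
        self.event_branch = nn.Sequential(
            nn.Conv2d(1, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
        )
        # Deconvolution layers restore the concatenated features to input resolution.
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(128, 64, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(64, 1, 4, stride=2, padding=1),
        )

    def forward(self, image, event_rep):
        feat = torch.cat([self.image_branch(image), self.event_branch(event_rep)], dim=1)
        return self.decoder(feat)  # single-channel depth map at the original resolution
```

With these assumed strides, the network expects input heights and widths divisible by four, e.g. `DepthPredictionNet()(image, event_rep)` on tensors of shape `(B, 3, H, W)` and `(B, 1, H, W)`.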
In step S104, the single-view depth map is input into a preset point cloud generating network to generate a single-view dense point cloud.
Specifically, the embodiment of the application can generate the single-view point cloud, input the single-view depth map into a point cloud generation network, further generate the single-view dense point cloud, and guarantee three-dimensional reconstruction of the moving object.
Optionally, in one embodiment of the present application, inputting the single-view depth map into a preset point cloud generation network to generate a single-view dense point cloud includes: traversing each view angle and inputting the single-view depth map and the corresponding event intermediate representation into the point cloud generation network to obtain a single-view sparse point cloud; and traversing each view angle and carrying out up-sampling and fine-tuning on the single-view sparse point cloud to obtain a single-view dense point cloud.
As a possible implementation manner, the embodiment of the application may traverse each view angle and input the single-view depth map and the corresponding event intermediate representation into the point cloud generation network, where the point cloud generation network may be composed of transformation operations based on camera intrinsic parameters: the image coordinate system of the single-view depth map is converted into the point cloud coordinate system through a transformation matrix formed from the camera intrinsic parameters, so as to generate a single-view sparse point cloud.
Further, the embodiment of the application may traverse each view angle and perform up-sampling and fine-tuning operations on the single-view sparse point cloud, where the up-sampling and fine-tuning operations may be implemented by a point-based multi-layer perceptron network, for example by fine-tuning after sampling, so as to obtain a single-view dense point cloud.
By traversing each view angle, the embodiment of the application can obtain the single-view sparse point cloud and the single-view dense point cloud, so that the motion trail of the object is obtained by utilizing the point cloud features, and the planar reconstruction effect of the single-view sparse point cloud and the three-dimensional reconstruction effect of the single-view dense point cloud are improved.
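The intrinsics-based back-projection that the point cloud generation network is described as performing can be sketched as follows, assuming a standard pinhole camera model with intrinsic matrix K; the upsampling/fine-tuning perceptron is only stubbed out, since its architecture is not specified here and any concrete choice would be an assumption.

```python
import numpy as np

def depth_to_point_cloud(depth, K):
    """Back-project a single-view depth map (H, W) into an (N, 3) point cloud."""
    h, w = depth.shape
    fx, fy = K[0, 0], K[1, 1]
    cx, cy = K[0, 2], K[1, 2]
    us, vs = np.meshgrid(np.arange(w), np.arange(h))  # pixel column / row indices
    valid = depth > 0
    z = depth[valid]
    x = (us[valid] - cx) * z / fx
    y = (vs[valid] - cy) * z / fy
    return np.stack([x, y, z], axis=-1)  # single-view sparse point cloud

def upsample_and_refine(sparse_points, mlp=None):
    # Placeholder for the point-based multi-layer perceptron that upsamples and
    # fine-tunes the sparse point cloud into the single-view dense point cloud.
    return sparse_points if mlp is None else mlp(sparse_points)
```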
In step S105, the single-view dense point cloud under each view angle is input into a preset multi-view point cloud fusion network, so as to obtain a fused dense point cloud.
In some embodiments, multi-view point cloud fusion can be performed, and single-view dense point clouds under each view angle are input into a certain multi-view point cloud fusion network to obtain fused dense point clouds, wherein the multi-view point cloud fusion network comprises regional point cloud feature extraction operation, multi-view point cloud registration operation and multi-view point cloud fusion operation.
Further, the region-level point cloud feature extraction operation in the embodiment of the application can be realized through a multi-layer perceptron network with multi-scale grouping, so as to divide the point cloud into several regions and extract fine-grained structural information of the point cloud within each region through a multi-layer perceptron, thereby obtaining high-dimensional region-level point cloud features. The multi-view point cloud registration operation in the embodiment of the application can perform feature matching based on the region-level point cloud features of different views, minimize the geometric projection error by iteratively performing correspondence search and transformation estimation, and finally obtain the correspondences between the region-level point cloud features under different views. The multi-view point cloud fusion operation in the embodiment of the application can fuse the point clouds based on the region-level point cloud features under different views and the correspondences between them, splicing the region-level point cloud features under different views according to the correspondences to generate a fused dense point cloud, thereby further guaranteeing three-dimensional reconstruction of the moving object.
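A simplified sketch of the registration-and-splicing idea is shown below. Treating the transformation estimation as a closed-form Kabsch/SVD fit over already-matched points is this sketch's assumption, and the learned multi-scale grouping feature extractor is not implemented; the matched index pairs are assumed to come from region-level feature matching.

```python
import numpy as np

def estimate_rigid_transform(src, dst):
    """Least-squares rigid transform (R, t) aligning matched points src -> dst."""
    src_c, dst_c = src.mean(axis=0), dst.mean(axis=0)
    H = (src - src_c).T @ (dst - dst_c)
    U, _, Vt = np.linalg.svd(H)
    R = Vt.T @ U.T
    if np.linalg.det(R) < 0:            # fix a possible reflection
        Vt[-1, :] *= -1
        R = Vt.T @ U.T
    t = dst_c - R @ src_c
    return R, t

def fuse_views(clouds, correspondences):
    """Register every view onto the first one and splice the result.
    correspondences[i] is a pair (src_idx, ref_idx) of matched point indices
    between view i and view 0 (entry 0 is unused)."""
    fused = [clouds[0]]
    for i in range(1, len(clouds)):
        src_idx, ref_idx = correspondences[i]
        R, t = estimate_rigid_transform(clouds[i][src_idx], clouds[0][ref_idx])
        fused.append(clouds[i] @ R.T + t)
    return np.concatenate(fused, axis=0)  # fused dense point cloud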
In step S106, the fused dense point cloud obtained at each moment is processed, and the three-dimensional motion trail of the object is rendered.
In the actual execution process, the embodiment of the application can reconstruct a moving object, process the fused dense point cloud obtained at each moment, and render to obtain a three-dimensional moving track of the object, so as to obtain finer moving object reconstruction according to the three-dimensional moving track of the object.
Optionally, in an embodiment of the present application, processing the fused dense point cloud obtained at each moment, and rendering to obtain a three-dimensional motion track of the object includes: splicing the fused dense point clouds obtained at each moment to generate a continuous object point cloud motion track; and converting the motion trail of the object point cloud into a grid model to obtain the three-dimensional motion trail of the object.
In other embodiments, the fused dense point clouds obtained at each moment may be spliced to form a continuous object point cloud motion track, and the object point cloud motion track is converted into a grid model to obtain a three-dimensional object motion track.
For example, the embodiment of the application may convert the object point cloud motion trail into a grid model through a point cloud smoothing operation, a normal computation operation and a grid generation operation, wherein the point cloud smoothing operation may use a statistical analysis method to filter irregular data in the point cloud, the normal computation operation may estimate normals with an approximate estimation method, and the grid generation operation may generate the grid model through a greedy triangulation algorithm, so as to enhance the reconstruction of the moving object.
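A hedged Open3D sketch of this post-processing chain follows. Open3D ships no greedy projection triangulation, so ball pivoting is used here purely as a stand-in for the greedy triangulation algorithm named above (PCL's GreedyProjectionTriangulation would be the closer match); the filter and radius parameters are illustrative.

```python
import numpy as np
import open3d as o3d

def point_cloud_track_to_mesh(points: np.ndarray) -> o3d.geometry.TriangleMesh:
    pcd = o3d.geometry.PointCloud(o3d.utility.Vector3dVector(points))
    # Point cloud smoothing: statistical analysis filters irregular (outlier) points.
    pcd, _ = pcd.remove_statistical_outlier(nb_neighbors=20, std_ratio=2.0)
    # Normal computation by local-neighbourhood approximation.
    pcd.estimate_normals(
        search_param=o3d.geometry.KDTreeSearchParamHybrid(radius=0.05, max_nn=30))
    # Grid model generation: ball pivoting as a stand-in for greedy triangulation.
    radii = o3d.utility.DoubleVector([0.02, 0.04, 0.08])
    return o3d.geometry.TriangleMesh.create_from_point_cloud_ball_pivoting(pcd, radii)
```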
Specifically, with reference to fig. 2 and fig. 3, the working principle of the method for reconstructing a moving object based on event data enhancement according to the embodiments of the present application will be described in detail with a specific embodiment.
As shown in fig. 2, an embodiment of the present application may include the following steps:
step S201: video and event data acquisition: video containing moving objects is captured using a conventional camera and an event camera, respectively. The embodiment of the application can respectively use a traditional camera and an event camera to acquire videos containing moving objects, wherein the traditional camera is a common camera for acquiring three-channel color information and based on equidistant exposure, and the event camera is a novel dynamic vision sensor which is operated asynchronously by pixels and only records light intensity changes.
Step S202: high frame rate reconstructed image sequence generation: and generating a high frame rate reconstructed image sequence by using the data acquired by the traditional camera and the event camera. According to the embodiment of the application, the data acquired by the traditional camera and the event camera can be preprocessed to respectively obtain a low-frame-rate image sequence and an event stream, wherein the low-frame-rate image sequence is the image sequence acquired by the traditional camera and is expressed as a fixed number of images per second; the event stream is an event point set collected by an event camera; the event points comprise event point coordinates, event point time stamps and event point polarities; the event point coordinates comprise an event point abscissa and an event point ordinate; event point polarity includes event positive polarity and event negative polarity.
Step S203: single view depth map generation: for the high frame rate reconstructed image sequence at each view angle, a single-view depth map is generated with the help of the corresponding event intermediate representation. According to the embodiment of the application, the low-frame-rate image sequence and the event stream can be input into a video frame interpolation network to generate the high-frame-rate reconstructed image sequence, and the single-view depth map is then generated, for the high-frame-rate reconstructed image sequence at each view angle, with the help of the corresponding event intermediate representation.
Step S204: single view point cloud generation: the single-view depth map is input into a point cloud generation network to generate a single-view dense point cloud. According to the embodiment of the application, each view angle can be traversed, and the single-view depth map and the corresponding event intermediate representation are input into the point cloud generation network to obtain a single-view sparse point cloud; each view angle is then traversed, and up-sampling and fine-tuning operations are performed on the single-view sparse point cloud to obtain a single-view dense point cloud.
Step S205: multi-view point cloud fusion: inputting the single-view dense point cloud under each view angle into a multi-view point cloud fusion network to obtain fused dense point cloud. According to the embodiment of the application, the single-view dense point cloud under each view angle can be input into the multi-view point cloud fusion network to obtain the fused dense point cloud, wherein the multi-view point cloud fusion network comprises regional point cloud feature extraction operation, multi-view point cloud registration operation and multi-view point cloud fusion operation.
Step S206: reconstruction of moving objects: and processing the fused dense point cloud obtained at each moment, and rendering to obtain the three-dimensional motion trail of the object. According to the embodiment of the application, the fused dense point clouds obtained at each moment can be spliced to form a continuous object point cloud motion track, the object point cloud motion track is converted into a grid model, and the object three-dimensional motion track is obtained.
As shown in fig. 3, the embodiment of the present application includes: the system comprises a video and event data acquisition module, a high-frame-rate reconstruction image sequence generation module, a single-view depth map generation module, a single-view point cloud generation module, a multi-view point cloud fusion module and a moving object reconstruction module.
Specifically, the video and event data acquisition module acquires video containing moving objects by using a traditional camera and an event camera respectively; the high-frame-rate reconstructed image sequence generating module generates a high-frame-rate reconstructed image sequence by utilizing data acquired by a traditional camera and an event camera; the single-view depth map generation module reconstructs an image sequence according to the high frame rate under each view angle, and generates a single-view depth map by means of corresponding event intermediate characterization; the single-view point cloud generation module inputs the single-view depth map into a point cloud generation network to generate a single-view dense point cloud; the multi-view point cloud fusion module inputs the single-view dense point cloud under each view angle into a multi-view point cloud fusion network to obtain fused dense point cloud; and the moving object reconstruction module processes the fused dense point cloud obtained at each moment and renders to obtain the three-dimensional moving track of the object.
According to the event data enhancement-based moving object reconstruction method provided by the embodiment of the application, the moving video data containing the moving object can be collected, the image sequence is reconstructed, the single-view depth map is generated, and finally the three-dimensional moving track of the object is obtained, so that the advantage of high time resolution of the event data is fully utilized, the limitation of a pure traditional camera on high-speed moving object tracking is broken through, and finer moving object reconstruction is obtained. Therefore, the problems that in the related art, accurate motion trail of a high-speed moving object is difficult to obtain through a detection or segmentation algorithm, the tracking capability of the traditional camera to the high-speed motion is hindered based on a fixed frame rate when the traditional camera collects data, and the refinement level of the reconstruction of the moving object is reduced are solved.
Next, a moving object reconstruction device based on event data enhancement according to an embodiment of the present application will be described with reference to the accompanying drawings.
Fig. 4 is a schematic structural diagram of a moving object reconstruction device based on event data enhancement according to an embodiment of the present application.
As shown in fig. 4, the moving object reconstruction device 10 based on event data enhancement includes: the system comprises an acquisition module 100, a first generation module 200, a second generation module 300, a third generation module 400, a fusion module 500 and a reconstruction module 600.
Specifically, the acquisition module 100 is configured to acquire motion video data including a moving object.
The first generation module 200 is configured to generate, based on the motion video data, a high frame rate reconstructed image sequence that satisfies a first preset frame rate condition.
A second generation module 300 is configured to generate, for the high frame rate reconstructed image sequence at each view angle, a single-view depth map based on the corresponding event intermediate representation.
The third generating module 400 is configured to input the single-view depth map into a preset point cloud generating network to generate a single-view dense point cloud.
The fusion module 500 is configured to input the single-view dense point cloud under each view angle into a preset multi-view point cloud fusion network, so as to obtain a fused dense point cloud.
The reconstruction module 600 is configured to process the fused dense point cloud obtained at each moment, and render to obtain a three-dimensional motion track of the object.
Optionally, in one embodiment of the present application, the first generating module 200 includes: a first acquisition unit and a first generation unit.
The first acquisition unit is used for preprocessing the motion video data to respectively acquire a low-frame-rate image sequence and an event stream which meet the second preset frame rate condition.
The first generation unit is used for inputting the low-frame-rate image sequence and the event stream into a preset video frame interpolation network to generate a high-frame-rate reconstructed image sequence meeting the preset frame rate condition.
Optionally, in one embodiment of the present application, the set of event points of the event stream includes event point coordinates, event point time stamps and event point polarities.
Optionally, in one embodiment of the present application, the first generating unit includes: the device comprises a first acquisition subunit, a second acquisition subunit, a third acquisition subunit and a generation subunit.
And the first acquisition subunit is used for respectively accumulating the event streams forward and backward according to each adjacent image frame in the low-frame-rate image sequence to obtain forward event intermediate characterization and backward event intermediate characterization.
The second acquisition subunit is used for respectively sending the forward event intermediate representation and the backward event intermediate representation into a preset optical flow prediction network to obtain a forward optical flow and a backward optical flow.
And the third acquisition subunit is used for splitting the forward optical flow and the backward optical flow according to the linear proportion for each adjacent image frame in the low-frame-rate image sequence, adding the image frame at the previous time and the split forward optical flow to obtain a forward reconstructed image frame, and subtracting the image frame at the next time and the split backward optical flow to obtain a backward reconstructed image frame.
And the generation subunit is used for inputting the forward reconstruction image frame and the reverse reconstruction image frame into a preset bidirectional image composition layer to obtain a reconstruction image so as to generate a high-frame-rate reconstruction image sequence.
Optionally, in an embodiment of the present application, the third generating module 400 includes: a first traversing unit and a second generating unit.
The first traversing unit is used for traversing the high frame rate reconstructed image sequence at each view angle and accumulating the event stream according to the frame rate to obtain the event intermediate representation at each frame.
The second generation unit is used for traversing each image frame in the high-frame-rate reconstructed image sequence and inputting the image frame together with the corresponding event intermediate representation into the depth map prediction network to generate a single-view depth map.
Optionally, in an embodiment of the present application, the third generating module 400 further includes: a second acquisition unit and a second traversal unit.
The second obtaining unit is used for traversing each view angle and inputting the single-view depth map and the corresponding event intermediate representation into the point cloud generation network to obtain a single-view sparse point cloud.
And the second traversing unit is used for traversing each view angle, and carrying out up-sampling and fine tuning operation on the single-view sparse point cloud to obtain a single-view dense point cloud.
Optionally, in one embodiment of the present application, the reconstruction module 600 includes: a splicing unit and a converting unit.
The splicing unit is used for splicing the fused dense point clouds obtained at each moment to generate continuous object point cloud motion tracks.
The conversion unit is used for converting the motion trail of the object point cloud into a grid model to obtain the three-dimensional motion trail of the object.
It should be noted that the foregoing explanation of the embodiment of the method for reconstructing a moving object based on event data enhancement is also applicable to the apparatus for reconstructing a moving object based on event data enhancement of this embodiment, and will not be repeated here.
According to the event data enhancement-based moving object reconstruction device provided by the embodiment of the application, the moving video data containing a moving object can be acquired, an image sequence is reconstructed, a single-view depth image is generated, and finally, the three-dimensional moving track of the object is obtained, so that the advantage of high time resolution of the event data is fully utilized, the limitation of a pure traditional camera on high-speed moving object tracking is broken through, and finer moving object reconstruction is obtained. Therefore, the problems that in the related art, accurate motion trail of a high-speed moving object is difficult to obtain through a detection or segmentation algorithm, the tracking capability of the traditional camera to the high-speed motion is hindered based on a fixed frame rate when the traditional camera collects data, and the refinement level of the reconstruction of the moving object is reduced are solved.
Fig. 5 is a schematic structural diagram of an electronic device according to an embodiment of the present application. The electronic device may include:
Memory 501, processor 502, and a computer program stored on memory 501 and executable on processor 502.
The processor 502 implements the moving object reconstruction method based on event data enhancement provided in the above-described embodiment when executing a program.
Further, the electronic device further includes:
a communication interface 503 for communication between the memory 501 and the processor 502.
Memory 501 for storing a computer program executable on processor 502.
The memory 501 may include high-speed RAM memory and may also include non-volatile memory, such as at least one disk memory.
If the memory 501, the processor 502 and the communication interface 503 are implemented independently, the communication interface 503, the memory 501 and the processor 502 may be connected to each other via a bus and communicate with each other. The bus may be an Industry Standard Architecture (ISA) bus, a Peripheral Component Interconnect (PCI) bus, an Extended Industry Standard Architecture (EISA) bus, or the like. Buses may be divided into address buses, data buses, control buses and so on. For ease of illustration, only one thick line is shown in fig. 5, but this does not mean that there is only one bus or one type of bus.
Alternatively, in a specific implementation, if the memory 501, the processor 502, and the communication interface 503 are integrated on a chip, the memory 501, the processor 502, and the communication interface 503 may perform communication with each other through internal interfaces.
The processor 502 may be a central processing unit (Central Processing Unit, abbreviated as CPU), or an application specific integrated circuit (Application Specific Integrated Circuit, abbreviated as ASIC), or one or more integrated circuits configured to implement embodiments of the present application.
The present embodiment also provides a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements the moving object reconstruction method based on event data enhancement as above.
In the description of the present specification, a description referring to terms "one embodiment," "some embodiments," "examples," "specific examples," or "some examples," etc., means that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the present application. In this specification, schematic representations of the above terms are not necessarily directed to the same embodiment or example. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or N embodiments or examples. Furthermore, the different embodiments or examples described in this specification and the features of the different embodiments or examples may be combined and combined by those skilled in the art without contradiction.
Furthermore, the terms "first," "second," and the like, are used for descriptive purposes only and are not to be construed as indicating or implying a relative importance or implicitly indicating the number of technical features indicated. Thus, a feature defining "a first" or "a second" may explicitly or implicitly include at least one such feature. In the description of the present application, the meaning of "N" is at least two, such as two, three, etc., unless explicitly defined otherwise.
Any process or method descriptions in flowcharts or otherwise described herein may be understood as representing modules, segments, or portions of code that include one or more executable instructions for implementing specific logical functions or steps of the process. Moreover, the scope of the preferred embodiments of the present application includes additional implementations in which functions may be executed out of the order shown or discussed, including substantially concurrently or in the reverse order, depending on the functionality involved, as would be understood by those skilled in the art of the embodiments of the present application.
Logic and/or steps represented in the flowcharts or otherwise described herein, for example an ordered listing of executable instructions for implementing logical functions, can be embodied in any computer-readable medium for use by or in connection with an instruction execution system, apparatus, or device, such as a computer-based system, a processor-containing system, or another system that can fetch the instructions from the instruction execution system, apparatus, or device and execute them. For the purposes of this description, a "computer-readable medium" can be any means that can contain, store, communicate, propagate, or transport the program for use by or in connection with the instruction execution system, apparatus, or device. More specific examples (a non-exhaustive list) of the computer-readable medium include the following: an electrical connection (electronic device) having one or N wires, a portable computer diskette (magnetic device), a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber device, and a portable compact disc read-only memory (CD-ROM). In addition, the computer-readable medium may even be paper or another suitable medium on which the program is printed, since the program can be electronically captured, for example via optical scanning of the paper or other medium, then compiled, interpreted, or otherwise processed in a suitable manner if necessary, and then stored in a computer memory.
It is to be understood that portions of the present application may be implemented in hardware, software, firmware, or a combination thereof. In the above-described embodiments, the N steps or methods may be implemented in software or firmware stored in a memory and executed by a suitable instruction execution system. If implemented in hardware, as in another embodiment, they may be implemented using any one, or a combination, of the following techniques well known in the art: discrete logic circuits having logic gates for implementing logic functions on data signals, application-specific integrated circuits having suitable combinational logic gates, programmable gate arrays (PGAs), field-programmable gate arrays (FPGAs), and the like.
Those of ordinary skill in the art will appreciate that all or a portion of the steps of the above-described method embodiments may be implemented by a program instructing related hardware, where the program may be stored in a computer-readable storage medium and, when executed, performs one of or a combination of the steps of the method embodiments.
In addition, each functional unit in each embodiment of the present application may be integrated in one processing module, or each unit may exist alone physically, or two or more units may be integrated in one module. The integrated modules may be implemented in hardware or in software functional modules. The integrated modules may also be stored in a computer readable storage medium if implemented in the form of software functional modules and sold or used as a stand-alone product.
The above-mentioned storage medium may be a read-only memory, a magnetic disk, an optical disk, or the like. Although embodiments of the present application have been shown and described above, it will be understood that the above embodiments are illustrative and are not to be construed as limiting the application; changes, modifications, substitutions, and variations may be made to the above embodiments by those of ordinary skill in the art within the scope of the present application.

Claims (10)

1. An event data enhancement-based moving object reconstruction method, characterized by comprising the following steps:
collecting motion video data containing a moving object;
generating a high frame rate reconstructed image sequence meeting a first preset frame rate condition based on the motion video data;
for the high frame rate reconstructed image sequence at each view angle, generating a single-view depth map based on the corresponding event intermediate representation;
inputting the single-view depth map into a preset point cloud generation network to generate a single-view dense point cloud;
inputting the single-view dense point cloud under each view angle into a preset multi-view point cloud fusion network to obtain fused dense point cloud; and
processing the fused dense point cloud obtained at each moment, and rendering to obtain a three-dimensional motion trail of the object.
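The six claimed steps map onto a simple processing loop. The following is a minimal, non-authoritative Python sketch of that data flow only; interpolate_net, depth_net, pcd_net, and fusion_net are hypothetical placeholders standing in for the "preset" networks named in claim 1, not implementations of them.

```python
def reconstruct_moving_object(views, interpolate_net, depth_net, pcd_net, fusion_net):
    """views: one dict per camera view with a low-frame-rate image sequence and an event stream."""
    per_view_clouds = []
    for view in views:
        # Step 2: high-frame-rate image reconstruction from frames + events.
        frames, event_reprs = interpolate_net(view["frames"], view["events"])
        # Step 3: single-view depth map per high-frame-rate frame.
        depths = [depth_net(f, e) for f, e in zip(frames, event_reprs)]
        # Step 4: single-view dense point cloud per depth map.
        per_view_clouds.append([pcd_net(d, e) for d, e in zip(depths, event_reprs)])
    # Step 5: fuse the single-view dense point clouds of all views at each moment.
    fused_per_moment = [fusion_net(list(clouds)) for clouds in zip(*per_view_clouds)]
    # Step 6: the fused clouds over time form the object's three-dimensional motion trail
    # (splicing and mesh rendering are treated in claim 7).
    return fused_per_moment
```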
2. The method of claim 1, wherein the generating a high frame rate reconstructed image sequence meeting a first preset frame rate condition based on the motion video data comprises:
preprocessing the motion video data to respectively obtain a low-frame-rate image sequence and an event stream which meet a second preset frame rate condition;
inputting the low-frame-rate image sequence and the event stream into a preset video interpolation network to generate the high frame rate reconstructed image sequence meeting the first preset frame rate condition.
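As an illustration only, one way to read this preprocessing step is to subsample the captured video to a low frame rate and to derive an event stream by thresholding log-intensity changes between consecutive frames, in the style of an event-camera simulator. The patent does not prescribe this particular scheme; the threshold and subsampling factor below are arbitrary choices.

```python
import numpy as np

def preprocess(video, keep_every=4, threshold=0.2):
    """video: (T, H, W) grayscale frames in [0, 1]. Returns (low_fps_frames, events)."""
    low_fps = video[::keep_every]                       # low-frame-rate image sequence
    log_v = np.log(video + 1e-6)                        # events respond to log-intensity change
    events = []
    for t in range(1, len(video)):
        diff = log_v[t] - log_v[t - 1]
        ys, xs = np.nonzero(np.abs(diff) >= threshold)  # pixels whose change crossed the threshold
        pol = np.sign(diff[ys, xs])                     # +1 brighter, -1 darker
        ts = np.full(xs.shape, float(t))
        events.append(np.stack([xs, ys, ts, pol], axis=1))
    event_stream = np.concatenate(events, axis=0) if events else np.empty((0, 4))
    return low_fps, event_stream
```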
3. The method of claim 2, wherein the set of event points of the event stream comprises event point coordinates, event point time stamps, and event point polarities.
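A structured NumPy dtype is one convenient in-memory layout for such an event stream; the field names below are illustrative and are not taken from the patent.

```python
import numpy as np

# One event point = coordinates (x, y), timestamp t, polarity p.
event_dtype = np.dtype([
    ("x", np.uint16),   # event point column coordinate
    ("y", np.uint16),   # event point row coordinate
    ("t", np.float64),  # event point timestamp (seconds)
    ("p", np.int8),     # event point polarity: +1 brightness increase, -1 decrease
])

events = np.zeros(0, dtype=event_dtype)   # an (initially empty) event stream
events = np.sort(events, order="t")       # events are consumed in timestamp order
```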
4. The method of claim 2, wherein the inputting the low-frame-rate image sequence and the event stream into a preset video interpolation network to generate the high frame rate reconstructed image sequence meeting the first preset frame rate condition comprises:
for each pair of adjacent image frames in the low-frame-rate image sequence, accumulating the event stream in the forward direction and the backward direction respectively, to obtain a forward event intermediate representation and a backward event intermediate representation;
feeding the forward event intermediate representation and the backward event intermediate representation respectively into a preset optical flow prediction network to obtain a forward optical flow and a backward optical flow;
for each pair of adjacent image frames in the low-frame-rate image sequence, splitting the forward optical flow and the backward optical flow according to a linear proportion, adding the split forward optical flow to the image frame at the previous moment to obtain a forward reconstructed image frame, and subtracting the split backward optical flow from the image frame at the next moment to obtain a backward reconstructed image frame;
and inputting the forward reconstructed image frame and the backward reconstructed image frame into a preset bidirectional image composition layer to obtain a reconstructed image, so as to generate the high frame rate reconstructed image sequence.
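A highly simplified sketch of this interpolation step follows, assuming the forward and backward optical flows have already been predicted. backward_warp is a nearest-neighbour warp used only for illustration, and the final linear blend merely stands in for the preset bidirectional image composition layer.

```python
import numpy as np

def backward_warp(img, flow):
    """Sample img at positions displaced by flow (nearest-neighbour, border-clamped)."""
    h, w = img.shape[:2]
    ys, xs = np.meshgrid(np.arange(h), np.arange(w), indexing="ij")
    src_x = np.clip(np.round(xs + flow[..., 0]).astype(int), 0, w - 1)
    src_y = np.clip(np.round(ys + flow[..., 1]).astype(int), 0, h - 1)
    return img[src_y, src_x]

def interpolate_pair(frame_prev, frame_next, flow_fwd, flow_bwd, num_insert):
    """Insert num_insert intermediate frames between two adjacent low-frame-rate frames."""
    out = []
    for k in range(1, num_insert + 1):
        tau = k / (num_insert + 1)                               # linear proportion for splitting
        fwd = backward_warp(frame_prev, tau * flow_fwd)          # forward reconstructed frame
        bwd = backward_warp(frame_next, (1.0 - tau) * flow_bwd)  # backward reconstructed frame
        out.append((1.0 - tau) * fwd + tau * bwd)                # stand-in for the composition layer
    return out
```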
5. The method of claim 1, wherein the generating, for the high frame rate reconstructed image sequence at each view angle, a single-view depth map based on the corresponding event intermediate representation comprises:
traversing the high frame rate reconstructed image sequence at each view angle, and accumulating the event stream according to the frame rate to obtain the event intermediate representation at each frame rate;
and traversing each image frame in the high frame rate reconstructed image sequence, and feeding the image frame, together with its corresponding event intermediate representation, into a depth map prediction network to generate the single-view depth map.
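The traversal in claim 5 can be pictured as the loop below: for each high-frame-rate frame, the events falling in that frame's time window are accumulated into an intermediate representation and passed, together with the frame, to a depth prediction network. depth_net is again a hypothetical callable, and the accumulation reuses the structured event layout sketched under claim 3.

```python
import numpy as np

def depth_maps_for_view(frames, frame_times, events, depth_net, resolution):
    """frames / frame_times: the high-frame-rate sequence of one view; events: structured array."""
    depth_maps = []
    prev_t = -np.inf
    for frame, t in zip(frames, frame_times):
        window = events[(events["t"] > prev_t) & (events["t"] <= t)]    # events of this frame interval
        event_repr = np.zeros(resolution, dtype=np.float32)
        np.add.at(event_repr, (window["y"], window["x"]), window["p"])  # signed per-pixel accumulation
        depth_maps.append(depth_net(frame, event_repr))                 # frame + event repr -> depth
        prev_t = t
    return depth_maps
```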
6. The method of claim 1, wherein the inputting the single-view depth map into a preset point cloud generation network to generate a single-view dense point cloud comprises:
traversing each view angle, and inputting the single-view depth map and the corresponding event intermediate representation into the point cloud generation network to obtain a single-view sparse point cloud;
and traversing each view angle, and carrying out up-sampling and fine tuning on the single-view sparse point cloud to obtain the single-view dense point cloud.
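The learned point cloud generation and up-sampling networks are not reproduced here; the purely geometric stand-in below, assuming known camera intrinsics K, only illustrates the shape of the data: a single-view depth map is back-projected into a sparse cloud, which is then naively densified.

```python
import numpy as np

def depth_to_sparse_cloud(depth, K, stride=4):
    """Back-project every stride-th pixel of a single-view depth map through intrinsics K."""
    ys, xs = np.meshgrid(np.arange(0, depth.shape[0], stride),
                         np.arange(0, depth.shape[1], stride), indexing="ij")
    z = depth[ys, xs].reshape(-1)
    x = (xs.reshape(-1) - K[0, 2]) * z / K[0, 0]
    y = (ys.reshape(-1) - K[1, 2]) * z / K[1, 1]
    return np.stack([x, y, z], axis=1)           # (N, 3) single-view sparse point cloud

def densify(cloud, rounds=1):
    """Naive up-sampling: repeatedly insert midpoints between neighbouring points."""
    for _ in range(rounds):
        mids = 0.5 * (cloud + np.roll(cloud, 1, axis=0))
        cloud = np.concatenate([cloud, mids], axis=0)
    return cloud                                  # denser (but not yet fine-tuned) point cloud
```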
7. The method of claim 1, wherein the processing the fused dense point cloud obtained at each moment and rendering to obtain a three-dimensional motion trail of the object comprises:
splicing the fused dense point clouds obtained at each moment to generate a continuous object point cloud motion track;
and converting the object point cloud motion trail into a mesh model to obtain the three-dimensional motion trail of the object.
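Splicing and mesh conversion could look like the sketch below. It assumes the Open3D library is available and uses Poisson surface reconstruction as one possible way to turn a fused dense point cloud into a mesh model; this is an assumed technique, not necessarily the patent's choice.

```python
import numpy as np
import open3d as o3d  # assumed available; any surface-reconstruction library would do

def splice_trajectory(clouds_per_moment):
    """Tag each fused dense cloud with its time index and stack them into one (N, 4) motion track."""
    stamped = [np.hstack([c, np.full((len(c), 1), float(t))])
               for t, c in enumerate(clouds_per_moment)]
    return np.concatenate(stamped, axis=0)

def cloud_to_mesh(points):
    """Convert one fused dense point cloud (N, 3) into a triangle mesh."""
    pcd = o3d.geometry.PointCloud(o3d.utility.Vector3dVector(points))
    pcd.estimate_normals()
    mesh, _ = o3d.geometry.TriangleMesh.create_from_point_cloud_poisson(pcd, depth=8)
    return mesh
```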
8. A moving object reconstruction device based on event data enhancement, comprising:
the acquisition module is used for acquiring motion video data containing a moving object;
the first generation module is used for generating a high-frame-rate reconstructed image sequence meeting a first preset frame rate condition based on the motion video data;
the second generation module is used for generating, for the high frame rate reconstructed image sequence at each view angle, a single-view depth map based on the corresponding event intermediate representation;
the third generation module is used for inputting the single-view depth map into a preset point cloud generation network to generate a single-view dense point cloud;
the fusion module is used for inputting the single-view dense point cloud under each view angle into a preset multi-view point cloud fusion network to obtain fused dense point clouds; and
the reconstruction module is used for processing the fused dense point cloud obtained at each moment and rendering to obtain the three-dimensional motion trail of the object.
9. An electronic device, comprising: a memory, a processor, and a computer program stored on the memory and executable on the processor, wherein the processor executes the program to implement the event data enhancement-based moving object reconstruction method according to any one of claims 1 to 7.
10. A computer-readable storage medium having stored thereon a computer program, wherein the program, when executed by a processor, implements the event data enhancement-based moving object reconstruction method according to any one of claims 1 to 7.
CN202310269687.8A 2023-03-16 2023-03-16 Event data enhancement-based moving object reconstruction method and device Pending CN116452649A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310269687.8A CN116452649A (en) 2023-03-16 2023-03-16 Event data enhancement-based moving object reconstruction method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310269687.8A CN116452649A (en) 2023-03-16 2023-03-16 Event data enhancement-based moving object reconstruction method and device

Publications (1)

Publication Number Publication Date
CN116452649A true CN116452649A (en) 2023-07-18

Family

ID=87131191

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310269687.8A Pending CN116452649A (en) 2023-03-16 2023-03-16 Event data enhancement-based moving object reconstruction method and device

Country Status (1)

Country Link
CN (1) CN116452649A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116912148A (en) * 2023-09-12 2023-10-20 深圳思谋信息科技有限公司 Image enhancement method, device, computer equipment and computer readable storage medium
CN116912148B (en) * 2023-09-12 2024-01-05 深圳思谋信息科技有限公司 Image enhancement method, device, computer equipment and computer readable storage medium

Similar Documents

Publication Publication Date Title
CN114708585B (en) Attention mechanism-based millimeter wave radar and vision fusion three-dimensional target detection method
EP2806396B1 (en) Sparse light field representation
JP7058277B2 (en) Reconstruction method and reconfiguration device
US8634637B2 (en) Method and apparatus for reducing the memory requirement for determining disparity values for at least two stereoscopically recorded images
CN107209938A (en) For the method and apparatus for the initial super-pixel label figure for generating image
CN112818955B (en) Image segmentation method, device, computer equipment and storage medium
EP3216006B1 (en) An image processing apparatus and method
CN116452649A (en) Event data enhancement-based moving object reconstruction method and device
CN115035235A (en) Three-dimensional reconstruction method and device
US20230394834A1 (en) Method, system and computer readable media for object detection coverage estimation
CN115035240B (en) Real-time three-dimensional scene reconstruction method and device
CN114419568A (en) Multi-view pedestrian detection method based on feature fusion
CN114881921A (en) Event and video fusion based occlusion-removing imaging method and device
CN112270701B (en) Parallax prediction method, system and storage medium based on packet distance network
WO2022231693A1 (en) Systems and methods for adding persistence to single photon avalanche diode imagery
CN114885144A (en) High frame rate 3D video generation method and device based on data fusion
WO2022177707A1 (en) Image acquisition techniques with reduced noise using single photon avalanche diodes
Chae et al. Siamevent: Event-based object tracking via edge-aware similarity learning with siamese networks
Cho et al. Depth map up-sampling using cost-volume filtering
CN114119678A (en) Optical flow estimation method, computer program product, storage medium, and electronic device
CN117561715A (en) Method and device for generating, data processing, encoding and decoding multi-plane image
CN109328373B (en) Image processing method, related device and storage medium thereof
US10325378B2 (en) Image processing apparatus, image processing method, and non-transitory storage medium
Zhao et al. 3dfill: Reference-guided image inpainting by self-supervised 3d image alignment
Yao et al. 2D-to-3D conversion using optical flow based depth generation and cross-scale hole filling algorithm

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination