CN113192185B - Dynamic light field reconstruction method, device and equipment

Info

Publication number: CN113192185B (grant); other version: CN113192185A (application publication)
Application number: CN202110540712.2A
Authority: CN (China)
Original language: Chinese (zh)
Legal status: Active
Inventors: 方璐, 季梦奇, 郑添, 袁肖赟, 王生进
Assignee (original and current): Tsinghua University
Priority and filing date: 2021-05-18; application filed by Tsinghua University

Classifications

    • G06T17/00: Three-dimensional [3D] modelling, e.g. data description of 3D objects
    • G06T7/557: Depth or shape recovery from multiple images from light fields, e.g. from plenoptic cameras
    • G06T7/80: Analysis of captured images to determine intrinsic or extrinsic camera parameters, i.e. camera calibration
    • H04N19/597: Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding, specially adapted for multi-view video sequence encoding

Abstract

The invention discloses a dynamic light field reconstruction method, device and equipment. The method comprises the following steps: performing inter-frame motion estimation on each view of a multi-view video to determine the dynamic region of each view, obtaining a dynamic region set; performing depth estimation on each frame of the multi-view video based on the dynamic region set to obtain a depth map of each frame of the main-view video in the multi-view video; performing RGB compression coding on each frame of the multi-view video based on the dynamic region set to obtain first RGB information of each frame of the main-view video; determining second RGB information and distance field (SDF) information of the spatial voxels for each frame from the depth map and the first RGB information; and constructing a three-dimensional dynamic model from the second RGB information and the SDF information. With this method, three-dimensional construction is carried out directly from the first RGB information and the depth map without decompressing them, so compression and three-dimensional construction can be designed as a whole, the data throughput of the three-dimensional construction process is reduced, and the efficiency of light field reconstruction is effectively improved.

Description

Dynamic light field reconstruction method, device and equipment
Technical Field
Embodiments of the invention relate to the technical field of three-dimensional construction, and in particular to a dynamic light field reconstruction method, device and equipment.
Background
By its nature, a light field can be regarded as the fusion of a series of two-dimensional images shot simultaneously from different angles. As light field imaging technology continues to advance, existing light field imaging can reach spatial resolutions of billions of pixels. The cost of obtaining such high-resolution light field information is extremely high data throughput: in current top gigapixel-level light field imaging, a single output frame can occupy more than 2 GB, which places a heavy burden on storage and subsequent computation.
In the prior art, light field compression and three-dimensional reconstruction are treated as two unrelated tasks: to reconstruct a three-dimensional model, light field data stored in a compressed format must first be decompressed before the decompressed light field can be reconstructed in three dimensions. Compressing and decompressing light field data is computationally expensive, and the decompressed data also occupies additional storage space.
Disclosure of Invention
Embodiments of the invention provide a dynamic light field reconstruction method, device and equipment, which can reduce the data throughput of the three-dimensional construction process and effectively improve the efficiency of light field reconstruction.
In a first aspect, an embodiment of the present invention provides a dynamic light field reconstruction method, including:
performing interframe motion estimation on the multi-view video respectively, determining a dynamic area of each view, and obtaining a dynamic area set;
performing depth estimation on each frame of the multi-view video based on the dynamic region set to obtain a depth map of each frame of a main view video in the multi-view video;
performing RGB compression coding on each frame of the multi-view video based on the dynamic region set to obtain first RGB information of each frame of the main view video;
determining second RGB information and distance field SDF information for each frame of spatial voxels from the depth map and the first RGB information;
and constructing a three-dimensional dynamic model based on the second RGB information and the SDF information.
In a second aspect, an embodiment of the present invention further provides a dynamic light field reconstruction apparatus, including:
the first acquisition module is used for respectively carrying out interframe motion estimation on the multi-view video, determining the dynamic area of each view and acquiring a dynamic area set;
a second obtaining module, configured to perform depth estimation on each frame of the multi-view video based on the dynamic region set, and obtain a depth map of each frame of a main view video in the multi-view video;
a third obtaining module, configured to perform RGB compression coding on each frame of the multi-view video based on the dynamic region set, to obtain first RGB information of each frame of the main view video;
a determination module to determine second RGB information and distance field SDF information for each frame of spatial voxels from the depth map and the first RGB information;
and the construction module is used for constructing a three-dimensional dynamic model based on the second RGB information and the SDF information.
In a third aspect, an embodiment of the present invention further provides a computer device, where the computer device includes:
one or more processing devices;
a storage device to store one or more instructions;
when the one or more instructions are executed by the one or more processing devices, the one or more processing devices are caused to implement the dynamic light field reconstruction method described in the embodiments of the invention.
Embodiments of the invention provide a dynamic light field reconstruction method, device, equipment and medium. Inter-frame motion estimation is first performed on each view of a multi-view video to determine the dynamic region of each view and obtain a dynamic region set; depth estimation is then performed on each frame of the multi-view video based on the dynamic region set to obtain a depth map of each frame of the main-view video in the multi-view video; RGB compression coding is then performed on each frame of the multi-view video based on the dynamic region set to obtain first RGB information of each frame of the main-view video; second RGB information and distance field SDF information of the spatial voxels are then determined for each frame from the depth map and the first RGB information; and finally a three-dimensional dynamic model is constructed from the second RGB information and the SDF information. With this technical scheme, three-dimensional construction can be carried out directly from the first RGB information and the depth map without decompressing them, compression and three-dimensional construction can be designed as a whole, the data throughput of the three-dimensional construction process is reduced, and the efficiency of light field reconstruction is effectively improved.
Drawings
Fig. 1 is a schematic flowchart of a dynamic light field reconstruction method according to an embodiment of the present invention;
fig. 2 is a schematic structural diagram of a dynamic light field reconstruction apparatus according to a second embodiment of the present invention;
fig. 3 is a schematic structural diagram of a computer device according to a third embodiment of the present invention.
Detailed Description
The present invention will be described in further detail with reference to the accompanying drawings and examples. It is to be understood that the specific embodiments described herein are merely illustrative of the invention and are not limiting of the invention. It should be further noted that, for the convenience of description, only some of the structures related to the present invention are shown in the drawings, not all of the structures.
Before discussing exemplary embodiments in more detail, it should be noted that some exemplary embodiments are described as processes or methods depicted as flowcharts. Although a flowchart may describe the operations (or steps) as a sequential process, many of the operations can be performed in parallel, concurrently or simultaneously. In addition, the order of the operations may be re-arranged. The process may be terminated when its operations are completed, but may have additional steps not included in the figure. The processes may correspond to methods, functions, procedures, subroutines, and the like. In addition, the embodiments and features of the embodiments in the present invention may be combined with each other without conflict.
The term "include" and variations thereof as used herein are intended to be open-ended, i.e., "including but not limited to". The term "based on" is "based, at least in part, on". The term "one embodiment" means "at least one embodiment".
Example one
Fig. 1 is a flowchart of a dynamic light field reconstruction method according to an embodiment of the present invention, where the method is applicable to a case of performing three-dimensional reconstruction on a light field, and the method may be performed by a dynamic light field reconstruction apparatus, where the apparatus may be implemented by software and/or hardware, and is generally integrated on a computer device.
As shown in fig. 1, a dynamic light field reconstruction method provided in an embodiment of the present invention includes the following steps:
s110, performing inter-frame motion estimation on the multi-view video respectively, determining a dynamic area of each view, and obtaining a dynamic area set.
In this embodiment, the multi-view video may be video captured by a light field camera from a plurality of different viewing angles; for example, it may be video captured from 7 × 7 different viewing angles at a resolution of 2560 × 1600 per view. A light field camera is an imaging device that can capture the complete light field information of a scene: not only the intensity of the light, but also the direction in which the light propagates in space.
Inter-frame motion estimation may determine the dynamic region of each view as follows: compute an optical flow value inside each pixel grid area of each view's video using an optical flow estimation method, mark areas whose internal optical flow value is 0 as static areas, and mark areas whose internal optical flow value is non-zero as dynamic areas.
Specifically, performing inter-frame motion estimation on the multi-view video and determining the dynamic region of each view may include: for each view's video, dividing the video into a grid to obtain a plurality of areas; acquiring the optical flow values corresponding to the plurality of areas; and determining the areas with non-zero optical flow values as dynamic areas.
For example, taking view k for detailed description: the pixel grid of view k is divided into a plurality of areas of size M × M, and the optical flow value inside each M × M area is calculated by an optical flow estimation method, as follows. Let the optical flow field of view k at time t be $F_t^k$, where $F_t^k(x)$ is a two-dimensional vector; the pixel value of view k at position $x$ at time t is $I_t^k(x)$, and at time t+1 the pixel value of view k at position $x + F_t^k(x)$ is $I_{t+1}^k\big(x + F_t^k(x)\big)$. If the optical flow field $F_t^k$ between time t and time t+1 satisfies the (linearized) brightness-constancy constraint

$$\nabla I_t^k(x) \cdot F_t^k(x) + I_{t+1}^k(x) - I_t^k(x) = 0,$$

then the region containing position $x + F_t^k(x)$ is a static region; if the result obtained by the above formula is not 0, the region containing position $x + F_t^k(x)$ is a dynamic region. Here $\nabla I_t^k(x)$ denotes the gradient of $I_t^k(x)$.
And determining the dynamic area of each view according to the calculation process, and combining the dynamic areas of all the views into a dynamic area set.
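As a non-limiting illustration, this grid-based classification can be sketched in a few lines of Python. The use of OpenCV's Farneback dense optical flow, the grid size M = 16, and the small threshold eps (standing in for the strict "non-0" test, since estimated flow is rarely exactly zero) are all assumptions of the sketch rather than requirements of the method:

```python
import cv2
import numpy as np

def dynamic_cells(prev_frame, next_frame, M=16, eps=0.5):
    """Classify each MxM grid cell of one view as dynamic (True) or
    static (False) from the dense optical flow between two frames."""
    prev_gray = cv2.cvtColor(prev_frame, cv2.COLOR_BGR2GRAY)
    next_gray = cv2.cvtColor(next_frame, cv2.COLOR_BGR2GRAY)
    # Dense flow field F_t^k: one 2D vector per pixel.
    flow = cv2.calcOpticalFlowFarneback(prev_gray, next_gray, None,
                                        0.5, 3, 15, 3, 5, 1.2, 0)
    mag = np.linalg.norm(flow, axis=2)
    H, W = mag.shape
    cells = mag[:H - H % M, :W - W % M].reshape(H // M, M, W // M, M)
    # A cell is dynamic if any pixel inside it moves noticeably.
    return cells.max(axis=(1, 3)) > eps

# The dynamic region set is then the collection of per-view cell masks:
# dynamic_set = {k: dynamic_cells(prev[k], cur[k]) for k in views}
```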
And S120, performing depth estimation on each frame of the multi-view video based on the dynamic region set to obtain a depth map of each frame of a main view video in the multi-view video.
In this embodiment, after depth estimation is performed on each frame of the multi-view video, the depth of the multi-view video can be compressed into the main-view video, yielding a depth map for each frame of the main-view video. Any depth estimation algorithm may be used, for example a phase-shift-based sub-pixel multi-view stereo matching algorithm. The depth map of each frame can be understood as an information encoding of that video frame.
Specifically, the depth estimation of each frame of the multi-view video based on the dynamic region set, and the manner of obtaining the depth map of each frame of the main view video in the multi-view video may include: acquiring camera calibration parameters respectively corresponding to videos of all visual angles; for a first frame of the multi-view video, depth estimation is carried out on each pixel point of the first frame based on the camera calibration parameters, and a depth map of the first frame of the main view video in the multi-view video is obtained; for the Nth frame of the multi-view video, depth estimation is carried out on pixel points in the dynamic region set in the Nth frame based on the camera calibration parameters, and a depth map of the Nth frame of the main view video is obtained; wherein N is greater than or equal to 2.
The first frame of the multi-view video is to be understood as the first frames of all the view videos; that is, assuming there are D view videos, the first frame comprises D first frames. Similarly, the Nth frame of the multi-view video is to be understood as the Nth frames of all the view videos, i.e. the Nth frame also comprises D frames. The camera calibration parameters for each view video may be obtained by determining the extrinsic and intrinsic parameters between the light field cameras through multi-modal calibration. The extrinsic parameters of each view's light field camera are calibrated relative to those of the main-view light field camera, and this calibration only needs to be performed once. Note that any method may be used to calibrate the multi-view light field cameras and the RGB depth camera; this is not limited here.
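For illustration, the relative extrinsic calibration can be performed with OpenCV's stereo calibration from detections of a shared calibration target; the function and flag below are standard OpenCV APIs, while the surrounding setup (pre-detected target points, pre-calibrated intrinsics) is assumed:

```python
import cv2

def calibrate_view_to_main(obj_pts, img_pts_main, img_pts_k,
                           K_main, dist_main, K_k, dist_k, image_size):
    """One-time estimation of view k's extrinsics relative to the main
    view; intrinsics are assumed pre-calibrated and held fixed."""
    ret, _, _, _, _, R, T, E, F = cv2.stereoCalibrate(
        obj_pts, img_pts_main, img_pts_k,
        K_main, dist_main, K_k, dist_k, image_size,
        flags=cv2.CALIB_FIX_INTRINSIC)
    # R, T map points from the main-view camera frame into view k's frame.
    return R, T
```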
Illustratively, for the first frame of the multi-view video, complete depth estimation is performed on every pixel of the first frame according to the camera calibration parameters of each view video and a phase-shift-based sub-pixel multi-view stereo matching algorithm. Depth estimation covers both the dynamic and static regions, thereby producing the depth map of the first frame of the main-view video. In this embodiment, performing complete depth estimation on every pixel of the first frame can be understood as compressing the depth information of every pixel of the multi-view video into the depth information of the first frame of the main-view video.
Note that after depth estimation of both the static and dynamic regions has been performed for the first frame of the multi-view video, depth estimation is performed only on the dynamic regions for all other frames; the static regions are not re-estimated.
For example, for the Nth frame of each view video, depth estimation may be performed on the pixels inside the dynamic region set of the Nth frame according to the calibration parameters of each view camera and the phase-shift-based sub-pixel multi-view stereo matching algorithm. In other words, when matching corresponding points between different views, only the dynamic region set is searched, each view contributing one dynamic region. Performing depth estimation on the Nth frame of the multi-view video yields the depth map of the Nth frame of the main view. When processing the Nth frame, the dynamic region set can be used as a mask to quickly extract the dynamic region of each view's video, so that only the extracted dynamic regions undergo depth estimation while static regions are skipped, which greatly reduces the amount of computation.
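Structurally, the per-frame depth step can be sketched as follows; estimate_depth is a hypothetical stand-in for the phase-shift-based sub-pixel multi-view stereo matcher (whose internals the text does not fix), and static pixels are assumed to keep the previous frame's depth, since they are never re-estimated:

```python
import numpy as np

def depth_for_frame(n, frames, dyn_mask, prev_depth, estimate_depth):
    """Main-view depth map for frame n (0-indexed). dyn_mask is the
    pixel-level dynamic mask; estimate_depth(frames, pixels) returns
    one depth value per requested (row, col) pixel."""
    h, w = dyn_mask.shape
    if n == 0:
        # First frame: full depth estimation over every pixel.
        all_px = np.indices((h, w)).reshape(2, -1).T
        return estimate_depth(frames, all_px).reshape(h, w)
    # Later frames: the dynamic region set acts as a mask, so only
    # dynamic pixels are re-estimated; static ones are carried forward.
    depth = prev_depth.copy()
    depth[dyn_mask] = estimate_depth(frames, np.argwhere(dyn_mask))
    return depth
```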
S130, performing RGB compression coding on each frame of the multi-view video based on the dynamic region set to obtain first RGB information of each frame of the main view video.
In this embodiment, RGB information over the entire area (both static and dynamic regions) may be retained for the first frame of the multi-view video; for the Nth frame, only RGB information inside the dynamic region set is retained. The RGB information of each frame of every view video is compressed into the main view, yielding the first RGB information of each frame of the main view.
The first RGB information comprises the RGB information of each frame, i.e. the color information of the red, green and blue channels of each video frame.
Specifically, the RGB compression encoding of each frame of the multiview video based on the dynamic region set may obtain the first RGB information of each frame of the main view video in a manner including: for a first frame of the multi-view video, performing compression coding on RGB information of each pixel point of the first frame by adopting a set video coding algorithm to obtain first RGB information of a first frame of a main view video in the multi-view video; and for the Nth frame of the multi-view video, performing compression coding on RGB information of pixel points in the dynamic region of the Nth frame by adopting a set video coding algorithm to obtain first RGB information of the Nth frame of the main view video in the multi-view video.
The set video coding algorithm may be any video coding algorithm; for example, it may be High Efficiency Video Coding (HEVC).
For example, for the first frame of the multi-view video, HEVC may be used to compress and encode the RGB information of every pixel of the first frame, compressing the RGB of each view video's first frame into the RGB of the main-view video's first frame to obtain the first RGB information of the main view's first frame. For the Nth frame, the set video coding algorithm compresses only the RGB information of the pixels inside the dynamic region, compressing the RGB of each view video's Nth frame into the RGB of the main-view video's Nth frame. As in the depth step, the dynamic region set can be used as a mask to quickly extract the dynamic region of each view's video, so that RGB compression coding is applied only to the extracted dynamic regions while static regions are left unprocessed, greatly reducing the amount of computation.
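The masking half of this step can be sketched as below; the HEVC entropy coding itself is delegated to an external encoder and is not reproduced, and the zero-filling of static blocks is an illustrative choice (a block-based encoder then spends almost no bits on them):

```python
import numpy as np

def rgb_payload(n, frame_rgb, cell_mask, M=16):
    """RGB data retained for frame n before encoding: the whole image
    for the first frame, only dynamic MxM blocks afterwards."""
    if n == 0:
        return frame_rgb
    H, W, _ = frame_rgb.shape
    # Upsample the per-cell dynamic mask back to pixel resolution.
    px = np.kron(cell_mask.astype(np.uint8),
                 np.ones((M, M), np.uint8)).astype(bool)
    px_mask = np.zeros((H, W), dtype=bool)
    px_mask[:px.shape[0], :px.shape[1]] = px
    masked = np.zeros_like(frame_rgb)
    masked[px_mask] = frame_rgb[px_mask]
    return masked
```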
S140, determining second RGB information and distance field SDF information of each frame of the spatial voxel according to the depth map and the first RGB information.
Wherein the second RGB information may include RGB information for each frame of spatial voxels; the SDF information may be understood as SDF information for each frame of voxels.
In the embodiment, the SDF information of each frame of the spatial voxel in the dynamic region set can be determined according to the depth map of each frame of the main view video; the RGB information of each frame of the spatial voxel in the dynamic region set can be determined according to the RGB information of each frame of the main visual angle video.
Determining the second RGB information and the distance field SDF information can be divided into two parts. The first part includes determining the initial SDF information of the spatial voxels for each frame and the target SDF information of the first frame, as well as determining the second initial RGB information of the spatial voxels and the target RGB information of the first frame. The second part includes calculating the target SDF information and the second RGB information of the spatial voxels for the Nth frame.
A voxel is short for volume element; a solid made of voxels can be displayed by direct volume rendering or by extracting a polygonal isosurface at a given threshold contour.
It should be noted that, for the nth frame of the spatial voxel, only the target SDF information and the second RGB information of the frame in the dynamic region under the main view angle need to be calculated.
Specifically, the way to determine the distance field SDF information for each frame of spatial voxels may be: projecting a space voxel into the depth map, and acquiring a projection distance and a pixel point projected by the center of the space voxel; determining initial SDF information of each frame of the space voxel according to the projection distance and the depth value of the pixel point projected by the center; for a first frame of the spatial voxels, determining the initial SDF information as target SDF information for the first frame; for the nth frame of the spatial voxels, determining target SDF information for the nth frame according to the following formula:
$$S_N = \frac{W_{N-1}\,S_{N-1} + w_N\,s_N}{W_{N-1} + w_N}$$

wherein $S_{N-1}$ is the target SDF information of the (N-1)th frame, $W_{N-1}$ is the weight value of the (N-1)th frame, $s_N$ is the initial SDF information of the Nth frame, $w_N$ is the weight value of the Nth frame, and $S_N$ is the target SDF information of the Nth frame; wherein N is greater than or equal to 2.
In this embodiment, the central point of each spatial voxel may be projected onto the main-view depth map using the initialized camera pose and the camera intrinsic and extrinsic parameters.
The initial SDF information of a spatial voxel is determined from the projection distance and the depth value of the pixel onto which the voxel center is projected: the difference between the projection distance and that depth value is taken as the initial SDF information of each frame.
Specifically, the difference between the projection distance of the first frame of a spatial voxel and the depth value of the pixel onto which the voxel center projects is used as the initial SDF information of the first frame; likewise, the initial SDF information of the Nth frame is the difference between the Nth frame's projection distance and the depth value at the projected pixel.
In this embodiment, the above formula computes the target SDF information of the Nth frame from the initial SDF information of the Nth frame and the target SDF information of the (N-1)th frame; the target SDF information of the (N-1)th frame is in turn obtained from the target SDF information of the (N-2)th frame and the initial SDF information of the (N-1)th frame by the same formula. Note that the target SDF information is computed for the Nth frame only inside the dynamic region under the main view.
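A numpy sketch of the two SDF steps follows: projecting voxel centers to obtain the per-frame initial SDF, and the weighted running average of the formula above. The companion weight update W_N = W_{N-1} + w_N is an assumption consistent with standard TSDF fusion; the text only gives the SDF formula itself:

```python
import numpy as np

def initial_sdf(voxel_centers, K, R, t, depth_map):
    """Initial SDF per voxel for one frame: project each voxel center
    into the main-view depth map and take the difference between the
    projection distance and the depth at the hit pixel."""
    cam = (R @ voxel_centers.T + t.reshape(3, 1)).T  # world -> camera
    z = cam[:, 2]                                    # projection distance
    uv = (K @ cam.T).T
    u = np.round(uv[:, 0] / uv[:, 2]).astype(int)
    v = np.round(uv[:, 1] / uv[:, 2]).astype(int)
    h, w = depth_map.shape
    ok = (z > 0) & (u >= 0) & (u < w) & (v >= 0) & (v < h)
    sdf = np.full(len(z), np.nan)                    # outside the view
    sdf[ok] = z[ok] - depth_map[v[ok], u[ok]]
    return sdf

def fuse_sdf(S_prev, W_prev, s_new, w_new):
    """Target SDF of frame N:
    S_N = (W_{N-1}*S_{N-1} + w_N*s_N) / (W_{N-1} + w_N)."""
    W_new = W_prev + w_new
    S_new = (W_prev * S_prev + w_new * s_new) / np.maximum(W_new, 1e-8)
    return S_new, W_new
```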
Specifically, the second RGB information of each frame of the spatial voxels may be determined as follows: the first RGB information of the pixel onto which the center of a spatial voxel is projected is taken as the second initial RGB information of the spatial voxel; for the first frame of the spatial voxels, the second initial RGB information is taken as the target RGB information of the first frame; and for the Nth frame of the spatial voxels, the second target RGB information of the Nth frame is determined according to the following formula:

$$C_N = \frac{W_{N-1}\,C_{N-1} + w_N\,c_N}{W_{N-1} + w_N}$$

wherein $C_{N-1}$ is the second target RGB information of the (N-1)th frame, $c_N$ is the second initial RGB information of the Nth frame, $W_{N-1}$ and $w_N$ are the weight values of the (N-1)th and Nth frames as above, and $C_N$ is the second target RGB information of the Nth frame.
In this embodiment, the central point of each spatial voxel may be projected onto the main-view RGB map using the initialized camera pose and the camera intrinsic and extrinsic parameters.
The second initial RGB information can be understood as the initial RGB information of the first frame of the spatial voxels. It serves as the target RGB information of the first frame, and for the Nth frame the target RGB information of the spatial voxels, i.e. the second target RGB information, is calculated by the above formula.
The calculation of the second target RGB information of the Nth frame parallels the calculation of the target SDF information described above: the formula combines the second initial RGB information of the Nth frame with the second target RGB information of the (N-1)th frame, which was itself computed from the second target RGB information of the (N-2)th frame and the second initial RGB information of the (N-1)th frame. As with the SDF, the second target RGB information is computed for the Nth frame only inside the dynamic region under the main view.
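Because the text notes that the color update mirrors the SDF update, the per-voxel RGB fusion can reuse the same recurrence; sharing the per-voxel weights with the SDF fusion is an assumption of the sketch:

```python
import numpy as np

def fuse_rgb(C_prev, W_prev, c_new, w_new):
    """Second target RGB of frame N, analogous to fuse_sdf:
    C_N = (W_{N-1}*C_{N-1} + w_N*c_N) / (W_{N-1} + w_N).
    C_* has shape (num_voxels, 3); weights broadcast over channels."""
    W_new = W_prev + w_new
    C_new = ((W_prev[:, None] * C_prev + w_new[:, None] * c_new)
             / np.maximum(W_new, 1e-8)[:, None])
    return C_new, W_new
```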
S150, constructing a three-dimensional dynamic model based on the second RGB information and the SDF information.
In this embodiment, the three-dimensional dynamic model may be constructed from the computed per-frame RGB information and per-frame SDF information of the spatial voxels in the dynamic region of the main view.
Specifically, constructing the three-dimensional dynamic model may include: for each frame of the spatial voxels, constructing a three-dimensional model of that frame with a set mesh extraction algorithm based on the target SDF information and the second target RGB information; and combining the per-frame three-dimensional models to obtain the three-dimensional dynamic model.
The set mesh extraction algorithm may be any mesh extraction algorithm and is not limited here. The per-frame three-dimensional models may be combined by ordering them according to their timestamps, yielding the dynamic three-dimensional model.
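As one concrete choice of mesh extraction algorithm, the sketch below uses marching cubes via scikit-image (an assumption; the text leaves the algorithm open) to extract each frame's surface from the zero level set of its fused SDF volume, then assembles the per-frame meshes in timestamp order:

```python
import numpy as np
from skimage import measure

def build_dynamic_model(sdf_volumes, rgb_volumes, timestamps):
    """One colored mesh per frame, assembled in timestamp order."""
    model = []
    for sdf, rgb, ts in sorted(zip(sdf_volumes, rgb_volumes, timestamps),
                               key=lambda item: item[2]):
        # The surface is the zero level set of the SDF (the volume is
        # assumed to cross zero and contain no NaNs).
        verts, faces, normals, _ = measure.marching_cubes(sdf, level=0.0)
        # Nearest-voxel color lookup for each vertex.
        idx = np.clip(np.round(verts).astype(int), 0,
                      np.array(sdf.shape) - 1)
        colors = rgb[idx[:, 0], idx[:, 1], idx[:, 2]]
        model.append({"t": ts, "verts": verts, "faces": faces,
                      "normals": normals, "colors": colors})
    return model
```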
The dynamic light field reconstruction method provided by this embodiment of the invention first performs inter-frame motion estimation on each view of the multi-view video to determine the dynamic region of each view and obtain a dynamic region set; it then performs depth estimation on each frame of the multi-view video based on the dynamic region set to obtain a depth map of each frame of the main-view video; it then performs RGB compression coding on each frame of the multi-view video based on the dynamic region set to obtain first RGB information of each frame of the main-view video; it then determines second RGB information and distance field SDF information of the spatial voxels for each frame from the depth map and the first RGB information; and it finally constructs a three-dimensional dynamic model from the second RGB information and the SDF information. With this technical scheme, three-dimensional construction can be carried out directly from the first RGB information and the depth map without decompressing them, compression and three-dimensional construction can be designed as a whole, the data throughput of the three-dimensional construction process is reduced, and the efficiency of light field reconstruction is effectively improved.
Example two
Fig. 2 is a schematic structural diagram of a dynamic light field reconstruction apparatus according to a second embodiment of the present invention, which may be applied to three-dimensional reconstruction of a light field, where the apparatus may be implemented by software and/or hardware and is generally integrated on a computer device.
As shown in fig. 2, the apparatus includes: a first acquisition module 210, a second acquisition module 220, a third acquisition module 230, a determination module 240, and a construction module 250.
A first obtaining module 210, configured to perform inter-frame motion estimation on a multi-view video, determine a dynamic region of each view, and obtain a dynamic region set;
a second obtaining module 220, configured to perform depth estimation on each frame of the multi-view video based on the dynamic region set, so as to obtain a depth map of each frame of a main view video in the multi-view video;
a third obtaining module 230, configured to perform RGB compression coding on each frame of the multi-view video based on the dynamic region set, so as to obtain first RGB information of each frame of the main view video;
a determination module 240 to determine second RGB information and distance field SDF information for each frame of spatial voxels based on the depth map and the first RGB information;
a construction module 250 configured to construct a three-dimensional dynamic model based on the second RGB information and the SDF information.
In this embodiment, the device first determines, via the first acquisition module, the dynamic region of each view by performing inter-frame motion estimation on the multi-view video, obtaining a dynamic region set; next, the second acquisition module performs depth estimation on each frame of the multi-view video based on the dynamic region set to obtain a depth map of each frame of the main-view video; then the third acquisition module performs RGB compression coding on each frame of the multi-view video based on the dynamic region set to obtain first RGB information of each frame of the main-view video; the determination module then determines second RGB information and distance field SDF information of the spatial voxels for each frame from the depth map and the first RGB information; and finally the construction module constructs a three-dimensional dynamic model based on the second RGB information and the SDF information.
This embodiment provides a dynamic light field reconstruction device. With this device, three-dimensional construction can be performed directly from the first RGB information and the depth map without decompressing them, compression and three-dimensional construction can be designed as a whole, the data throughput of the three-dimensional construction process is reduced, and the efficiency of light field reconstruction is effectively improved.
Further, the first obtaining module 210 is specifically configured to: for each view's video, divide the video into a grid to obtain a plurality of regions; acquire the optical flow values corresponding to the plurality of regions; and determine the regions with non-zero optical flow values as dynamic regions.
Further, the second obtaining module 220 is specifically configured to: obtain the camera calibration parameters corresponding to each view video; for the first frame of the multi-view video, perform depth estimation on each pixel of the first frame based on the camera calibration parameters to obtain a depth map of the first frame of the main-view video; and for the Nth frame of the multi-view video, perform depth estimation on the pixels inside the dynamic region set of the Nth frame based on the camera calibration parameters to obtain a depth map of the Nth frame of the main-view video, where N is greater than or equal to 2.
Based on the above technical solution, the third obtaining module 230 is specifically configured to, for a first frame of the multi-view video, perform compression coding on RGB information of each pixel point of the first frame by using a set video coding algorithm, so as to obtain first RGB information of the first frame of the main view video in the multi-view video; and for the Nth frame of the multi-view video, performing compression coding on RGB information of pixel points in the dynamic region of the Nth frame by adopting a set video coding algorithm to obtain first RGB information of the Nth frame of the main view video in the multi-view video.
Further, the determining module 240 is further configured to project a spatial voxel into the depth map, and obtain a projection distance and a pixel point projected by the center of the spatial voxel;
determining initial SDF information of each frame of the space voxel according to the projection distance and the depth value of the pixel point projected by the center; for a first frame of the spatial voxels, determining the initial SDF information as target SDF information for the first frame; for the nth frame of the spatial voxels, determining target SDF information for the nth frame according to the following formula:
$$S_N = \frac{W_{N-1}\,S_{N-1} + w_N\,s_N}{W_{N-1} + w_N}$$

wherein $S_{N-1}$ is the target SDF information of the (N-1)th frame, $W_{N-1}$ is the weight value of the (N-1)th frame, $s_N$ is the initial SDF information of the Nth frame, $w_N$ is the weight value of the Nth frame, and $S_N$ is the target SDF information of the Nth frame; wherein N is greater than or equal to 2.
Further, the determining module 240 is further configured to determine the first RGB information of the pixel point to which the center of the spatial voxel is projected as the second initial RGB information of the spatial voxel; for a first frame of the spatial voxels, determining the initial RGB information as target RGB information for the first frame; for an nth frame of the spatial voxels, determining target RGB information for the nth frame according to the following formula:
$$C_N = \frac{W_{N-1}\,C_{N-1} + w_N\,c_N}{W_{N-1} + w_N}$$

wherein $C_{N-1}$ is the second target RGB information of the (N-1)th frame, $c_N$ is the second initial RGB information of the Nth frame, $W_{N-1}$ and $w_N$ are the weight values of the (N-1)th and Nth frames, and $C_N$ is the target RGB information of the Nth frame.
Further, the constructing module 250 is specifically configured to, for each frame of the spatial voxels, construct a three-dimensional model of each frame by using a set mesh extraction algorithm based on the target SDF information and the second target RGB information; and combining the three-dimensional models of each frame to obtain a three-dimensional dynamic model.
The dynamic light field reconstruction device can execute the dynamic light field reconstruction method provided by any embodiment of the invention, and has corresponding functional modules and beneficial effects of the execution method.
EXAMPLE III
Fig. 3 is a schematic structural diagram of a computer device according to a third embodiment of the present invention. As shown in fig. 3, a computer device provided in a third embodiment of the present invention includes: one or more processors 31 and storage devices 32; the processor 31 in the computer device may be one or more, and fig. 3 illustrates one processor 31 as an example; storage 32 is used to store one or more programs; the one or more programs are executed by the one or more processors 31, so that the one or more processors 31 implement the dynamic light field reconstruction method according to any of the embodiments of the present invention.
The computer device may further include: an input device 33 and an output device 34.
The processor 31, the storage means 32, the input means 33 and the output means 34 in the computer apparatus may be connected by a bus or other means, which is exemplified in fig. 3.
The storage device 32 in the computer device serves as a computer-readable storage medium and may be used to store one or more programs, which may be software programs, computer-executable programs, and modules, such as program instructions/modules corresponding to the dynamic light field reconstruction method provided in an embodiment of the present invention (for example, the modules in the dynamic light field reconstruction apparatus shown in fig. 2 include the first obtaining module 210, the second obtaining module 220, the third obtaining module 230, the determining module 240, and the constructing module 250). The processor 31 executes various functional applications and data processing of the computer device by executing the software programs, instructions and modules stored in the storage device 32, namely, implements the dynamic light field reconstruction method in the above method embodiment.
The storage device 32 may include a storage program area and a storage data area, wherein the storage program area may store an operating system, an application program required for at least one function; the storage data area may store data created according to use of the computer device, and the like. Further, the storage device 32 may include high speed random access memory, and may also include non-volatile memory, such as at least one magnetic disk storage device, flash memory device, or other non-volatile solid state storage device. In some examples, the storage 32 may further include memory located remotely from the processor 31, which may be connected to the device over a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.
The input device 33 may be used to receive input numeric or character information and to generate key signal inputs related to user settings and function controls of the computer apparatus. The output device 34 may include a display device such as a display screen.
And, when one or more programs included in the above-mentioned computer apparatus are executed by the one or more processors 31, the programs perform the following operations:
performing interframe motion estimation on the multi-view video respectively, determining a dynamic area of each view, and obtaining a dynamic area set;
performing depth estimation on each frame of the multi-view video based on the dynamic region set to obtain a depth map of each frame of a main view video in the multi-view video;
performing RGB compression coding on each frame of the multi-view video based on the dynamic region set to obtain first RGB information of each frame of the main view video;
determining second RGB information and distance field SDF information for each frame of spatial voxels from the depth map and the first RGB information;
and constructing a three-dimensional dynamic model based on the second RGB information and the SDF information.
Example four
A fourth embodiment of the present invention provides a computer-readable storage medium, on which a computer program is stored, where the computer program is used to execute a dynamic light field reconstruction method when executed by a processor, and the method includes:
performing interframe motion estimation on the multi-view video respectively, determining a dynamic area of each view, and obtaining a dynamic area set;
performing depth estimation on each frame of the multi-view video based on the dynamic region set to obtain a depth map of each frame of a main view video in the multi-view video;
performing RGB compression coding on each frame of the multi-view video based on the dynamic region set to obtain first RGB information of each frame of the main view video;
determining second RGB information and distance field SDF information for each frame of spatial voxels from the depth map and the first RGB information;
and constructing a three-dimensional dynamic model based on the second RGB information and the SDF information.
Optionally, the program, when executed by a processor, may be further configured to perform a dynamic light field reconstruction method provided in any embodiment of the present invention.
Computer storage media for embodiments of the invention may employ any combination of one or more computer-readable media. The computer readable medium may be a computer readable signal medium or a computer readable storage medium. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples (a non-exhaustive list) of the computer readable storage medium would include the following: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a Read Only Memory (ROM), an Erasable Programmable Read Only Memory (EPROM), a flash Memory, an optical fiber, a portable CD-ROM, an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. A computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device.
A computer readable signal medium may include a propagated data signal with computer readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take a variety of forms, including, but not limited to: an electromagnetic signal, an optical signal, or any suitable combination of the foregoing. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device.
Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: wireless, wire, fiber optic cable, Radio Frequency (RF), etc., or any suitable combination of the foregoing.
Computer program code for carrying out operations for aspects of the present invention may be written in any combination of one or more programming languages, including an object-oriented programming language such as Java, Smalltalk, C++ or the like, and conventional procedural programming languages such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on a remote computer or server. In the latter case, the remote computer may be connected to the user's computer through any type of network, including a local area network (LAN) or a wide area network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet service provider).
It is to be noted that the foregoing is only illustrative of the preferred embodiments of the present invention and the technical principles employed. Those skilled in the art will appreciate that the present invention is not limited to the particular embodiments described herein, and that various obvious changes, rearrangements and substitutions will now be apparent to those skilled in the art without departing from the scope of the invention. Therefore, although the present invention has been described in greater detail by the above embodiments, the present invention is not limited to the above embodiments, and may include other equivalent embodiments without departing from the spirit of the present invention, and the scope of the present invention is determined by the scope of the appended claims.

Claims (9)

1. A method of dynamic light field reconstruction, comprising:
performing interframe motion estimation on the multi-view video respectively, determining a dynamic area of each view, and obtaining a dynamic area set;
performing depth estimation on each frame of the multi-view video based on the dynamic region set to obtain a depth map of each frame of a main view video in the multi-view video;
performing RGB compression coding on each frame of the multi-view video based on the dynamic region set to obtain first RGB information of each frame of the main view video;
determining second RGB information and distance field SDF information for each frame of spatial voxels from the depth map and the first RGB information;
and constructing a three-dimensional dynamic model based on the second RGB information and the SDF information.
2. The method of claim 1, wherein performing inter-frame motion estimation on the multi-view video to determine the dynamic region of each view comprises:
for each visual angle video, carrying out grid division on the visual angle video to obtain a plurality of areas;
acquiring optical flow values corresponding to the plurality of areas respectively;
and determining the area with the optical flow value of non-0 as a dynamic area.
3. The method of claim 1, wherein performing depth estimation on each frame of the multi-view video based on the set of dynamic regions to obtain a depth map for each frame of a main-view video in the multi-view video comprises:
acquiring camera calibration parameters respectively corresponding to videos of all visual angles;
for a first frame of the multi-view video, depth estimation is carried out on each pixel point of the first frame based on the camera calibration parameters, and a depth map of the first frame of the main view video in the multi-view video is obtained;
for the Nth frame of the multi-view video, depth estimation is carried out on pixel points in the dynamic region set in the Nth frame based on the camera calibration parameters, and a depth map of the Nth frame of the main view video is obtained; wherein N is greater than or equal to 2.
4. The method of claim 1, wherein performing RGB compression encoding on each frame of the multiview video based on the set of dynamic regions to obtain first RGB information for each frame of the main view video comprises:
for a first frame of the multi-view video, performing compression coding on RGB information of each pixel point of the first frame by adopting a set video coding algorithm to obtain first RGB information of a first frame of a main view video in the multi-view video;
and for the Nth frame of the multi-view video, performing compression coding on RGB information of pixel points in the dynamic region of the Nth frame by adopting a set video coding algorithm to obtain first RGB information of the Nth frame of the main view video in the multi-view video.
5. The method of claim 1, wherein determining distance field (SDF) information for each frame of spatial voxels from the depth map and the first RGB information comprises:
projecting a space voxel into the depth map, and acquiring a projection distance and a pixel point projected by the center of the space voxel;
determining initial SDF information of each frame of the space voxel according to the projection distance and the depth value of the pixel point projected by the center;
for a first frame of the spatial voxels, determining the initial SDF information as target SDF information for the first frame;
for the nth frame of the spatial voxels, determining target SDF information for the nth frame according to the following formula:
$$S_N = \frac{W_{N-1}\,S_{N-1} + w_N\,s_N}{W_{N-1} + w_N}$$

wherein $S_{N-1}$ is the target SDF information of the (N-1)th frame, $W_{N-1}$ is the weight value of the (N-1)th frame, $s_N$ is the initial SDF information of the Nth frame, $w_N$ is the weight value of the Nth frame, and $S_N$ is the target SDF information of the Nth frame; wherein N is greater than or equal to 2.
6. The method of claim 5, wherein determining second RGB information for each frame of spatial voxels from the depth map and the first RGB information comprises:
determining first RGB information of a pixel point projected by the center of a spatial voxel as second initial RGB information of the spatial voxel;
for a first frame of the spatial voxels, determining the initial RGB information as target RGB information for the first frame;
for an nth frame of the spatial voxels, determining target RGB information for the nth frame according to the following formula:
$$C_N = \frac{W_{N-1}\,C_{N-1} + w_N\,c_N}{W_{N-1} + w_N}$$

wherein $C_{N-1}$ is the second target RGB information of the (N-1)th frame, $c_N$ is the second initial RGB information of the Nth frame, $W_{N-1}$ and $w_N$ are the weight values of the (N-1)th and Nth frames, and $C_N$ is the target RGB information of the Nth frame.
7. The method of claim 6, wherein constructing a three-dimensional dynamic model based on the second RGB information and the SDF information comprises:
for each frame of the spatial voxels, constructing a three-dimensional model of each frame by adopting a set grid extraction algorithm based on the target SDF information and the second target RGB information;
and combining the three-dimensional models of each frame to obtain a three-dimensional dynamic model.
8. A dynamic light field reconstruction apparatus, comprising:
the first acquisition module is used for respectively carrying out interframe motion estimation on the multi-view video, determining the dynamic area of each view and acquiring a dynamic area set;
a second obtaining module, configured to perform depth estimation on each frame of the multi-view video based on the dynamic region set, and obtain a depth map of each frame of a main view video in the multi-view video;
a third obtaining module, configured to perform RGB compression coding on each frame of the multi-view video based on the dynamic region set, to obtain first RGB information of each frame of the main view video;
a determination module to determine second RGB information and distance field SDF information for each frame of spatial voxels from the depth map and the first RGB information;
and the construction module is used for constructing a three-dimensional dynamic model based on the second RGB information and the SDF information.
9. A computer device, comprising:
one or more processors;
storage means for storing one or more programs;
the one or more programs are executable by the one or more processors to cause the one or more processors to perform the dynamic light field reconstruction method of any of claims 1-7.
Application CN202110540712.2A, priority date 2021-05-18, filing date 2021-05-18: Dynamic light field reconstruction method, device and equipment. Status: Active. Granted publication: CN113192185B (en).

Priority Applications (1)

Application Number: CN202110540712.2A; Priority Date: 2021-05-18; Filing Date: 2021-05-18; Title: Dynamic light field reconstruction method, device and equipment
Publications (2)

CN113192185A (application publication): 2021-07-30
CN113192185B (granted publication): 2022-05-17

Family ID: 76982370

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113989432A (en) * 2021-10-25 2022-01-28 北京字节跳动网络技术有限公司 3D image reconstruction method and device, electronic equipment and storage medium
CN114173106B (en) * 2021-12-01 2022-08-05 北京拙河科技有限公司 Real-time video stream fusion processing method and system based on light field camera
CN114579776B (en) * 2022-03-14 2023-02-07 武汉工程大学 Optical field data storage method and device, electronic equipment and computer medium
CN115022613A (en) * 2022-05-19 2022-09-06 北京字节跳动网络技术有限公司 Video reconstruction method and device, electronic equipment and storage medium

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9883112B1 (en) * 2016-09-22 2018-01-30 Pinnacle Imaging Corporation High dynamic range imaging
CN110874851A (en) * 2019-10-25 2020-03-10 深圳奥比中光科技有限公司 Method, device, system and readable storage medium for reconstructing three-dimensional model of human body
CN112017228A (en) * 2019-05-31 2020-12-01 华为技术有限公司 Method for three-dimensional reconstruction of object and related equipment

Legal Events

PB01: Publication
SE01: Entry into force of request for substantive examination
GR01: Patent grant