CN113192185B - Dynamic light field reconstruction method, device and equipment

Info

Publication number: CN113192185B (grant); other version: CN113192185A (application publication)
Application number: CN202110540712.2A
Authority: CN (China)
Original language: Chinese (zh)
Legal status: Active
Inventors: 方璐, 季梦奇, 郑添, 袁肖赟, 王生进
Assignee (original and current): Tsinghua University
Priority and filing date: 2021-05-18; application filed by Tsinghua University

Classifications

    • G06T17/00: Three-dimensional [3D] modelling, e.g. data description of 3D objects
    • G06T7/557: Depth or shape recovery from multiple images from light fields, e.g. from plenoptic cameras
    • G06T7/80: Analysis of captured images to determine intrinsic or extrinsic camera parameters, i.e. camera calibration
    • H04N19/597: Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding, specially adapted for multi-view video sequence encoding

Abstract

The invention discloses a dynamic light field reconstruction method, device and equipment. The method comprises the following steps: performing inter-frame motion estimation on each view of a multi-view video to determine the dynamic region of each view, obtaining a dynamic region set; performing depth estimation on each frame of the multi-view video based on the dynamic region set to obtain a depth map of each frame of the main-view video in the multi-view video; performing RGB compression coding on each frame of the multi-view video based on the dynamic region set to obtain first RGB information of each frame of the main-view video; determining second RGB information and distance field (SDF) information of the spatial voxels for each frame from the depth map and the first RGB information; and constructing a three-dimensional dynamic model from the second RGB information and the SDF information. With this method, three-dimensional construction is carried out directly from the first RGB information and the depth map without decompressing them, so compression and three-dimensional construction can be designed as a whole, the data throughput of the three-dimensional construction process is reduced, and the efficiency of light field reconstruction is effectively improved.

Description

Dynamic light field reconstruction method, device and equipment
Technical Field
Embodiments of the invention relate to the technical field of three-dimensional construction, and in particular to a dynamic light field reconstruction method, device and equipment.
Background
By its nature, a light field can be regarded as the fusion of a series of two-dimensional images shot simultaneously from different angles. As light field imaging technology continues to advance, existing light field imaging can reach spatial resolutions of billions of pixels. The cost of obtaining such high-resolution light field information is extremely high data throughput: in current top gigapixel-level light field imaging, a single output frame can occupy more than 2 GB, which places a heavy burden on storage and subsequent computation.
In the prior art, light field compression and three-dimensional reconstruction are treated as two unrelated tasks: to reconstruct a three-dimensional model, light field data stored in a compressed format must first be decompressed before the decompressed light field can be reconstructed in three dimensions. Compressing and decompressing light field data is computationally expensive, and the decompressed data also occupies additional storage space.
Disclosure of Invention
Embodiments of the invention provide a dynamic light field reconstruction method, device and equipment, which can reduce the data throughput of the three-dimensional construction process and effectively improve the efficiency of light field reconstruction.
In a first aspect, an embodiment of the present invention provides a dynamic light field reconstruction method, including:
performing interframe motion estimation on the multi-view video respectively, determining a dynamic area of each view, and obtaining a dynamic area set;
performing depth estimation on each frame of the multi-view video based on the dynamic region set to obtain a depth map of each frame of a main view video in the multi-view video;
performing RGB compression coding on each frame of the multi-view video based on the dynamic region set to obtain first RGB information of each frame of the main view video;
determining second RGB information and distance field SDF information for each frame of spatial voxels from the depth map and the first RGB information;
and constructing a three-dimensional dynamic model based on the second RGB information and the SDF information.
In a second aspect, an embodiment of the present invention further provides a dynamic light field reconstruction apparatus, including:
the first acquisition module is used for respectively carrying out interframe motion estimation on the multi-view video, determining the dynamic area of each view and acquiring a dynamic area set;
a second obtaining module, configured to perform depth estimation on each frame of the multi-view video based on the dynamic region set, and obtain a depth map of each frame of a main view video in the multi-view video;
a third obtaining module, configured to perform RGB compression coding on each frame of the multi-view video based on the dynamic region set, to obtain first RGB information of each frame of the main view video;
a determination module to determine second RGB information and distance field SDF information for each frame of spatial voxels from the depth map and the first RGB information;
and the construction module is used for constructing a three-dimensional dynamic model based on the second RGB information and the SDF information.
In a third aspect, an embodiment of the present invention further provides a computer device, where the computer device includes:
one or more processing devices;
a storage device to store one or more instructions;
when the one or more instructions are executed by the one or more processing devices, the one or more processing devices are caused to implement the dynamic light field reconstruction method described in the embodiments of the invention.
Embodiments of the invention provide a dynamic light field reconstruction method, device, equipment and medium. Inter-frame motion estimation is first performed on each view of a multi-view video to determine the dynamic region of each view and obtain a dynamic region set; depth estimation is then performed on each frame of the multi-view video based on the dynamic region set to obtain a depth map of each frame of the main-view video in the multi-view video; RGB compression coding is then performed on each frame of the multi-view video based on the dynamic region set to obtain first RGB information of each frame of the main-view video; second RGB information and distance field SDF information of the spatial voxels are then determined for each frame from the depth map and the first RGB information; and finally a three-dimensional dynamic model is constructed from the second RGB information and the SDF information. With this technical scheme, three-dimensional construction can be carried out directly from the first RGB information and the depth map without decompressing them, compression and three-dimensional construction can be designed as a whole, the data throughput of the three-dimensional construction process is reduced, and the efficiency of light field reconstruction is effectively improved.
Drawings
Fig. 1 is a schematic flowchart of a dynamic light field reconstruction method according to an embodiment of the present invention;
fig. 2 is a schematic structural diagram of a dynamic light field reconstruction apparatus according to a second embodiment of the present invention;
fig. 3 is a schematic structural diagram of a computer device according to a third embodiment of the present invention.
Detailed Description
The present invention will be described in further detail with reference to the accompanying drawings and examples. It is to be understood that the specific embodiments described herein are merely illustrative of the invention and are not limiting of the invention. It should be further noted that, for the convenience of description, only some of the structures related to the present invention are shown in the drawings, not all of the structures.
Before discussing exemplary embodiments in more detail, it should be noted that some exemplary embodiments are described as processes or methods depicted as flowcharts. Although a flowchart may describe the operations (or steps) as a sequential process, many of the operations can be performed in parallel, concurrently or simultaneously. In addition, the order of the operations may be re-arranged. The process may be terminated when its operations are completed, but may have additional steps not included in the figure. The processes may correspond to methods, functions, procedures, subroutines, and the like. In addition, the embodiments and features of the embodiments in the present invention may be combined with each other without conflict.
The term "include" and variations thereof as used herein are intended to be open-ended, i.e., "including but not limited to". The term "based on" is "based, at least in part, on". The term "one embodiment" means "at least one embodiment".
Example one
Fig. 1 is a flowchart of a dynamic light field reconstruction method according to an embodiment of the present invention, where the method is applicable to a case of performing three-dimensional reconstruction on a light field, and the method may be performed by a dynamic light field reconstruction apparatus, where the apparatus may be implemented by software and/or hardware, and is generally integrated on a computer device.
As shown in fig. 1, a dynamic light field reconstruction method provided in an embodiment of the present invention includes the following steps:
s110, performing inter-frame motion estimation on the multi-view video respectively, determining a dynamic area of each view, and obtaining a dynamic area set.
In this embodiment, the multi-view video may be video captured by a light field camera from a plurality of different viewing angles; for example, it may be video captured from 7 × 7 different viewing angles at a resolution of 2560 × 1600 per view. A light field camera is an imaging device that can capture the complete light field information of a scene: not only the intensity of the light, but also the direction in which the light propagates in space.
Inter-frame motion estimation may determine the dynamic region of each view as follows: compute an optical flow value inside each pixel grid area of each view's video using an optical flow estimation method, mark areas whose internal optical flow value is 0 as static areas, and mark areas whose internal optical flow value is non-zero as dynamic areas.
Specifically, performing inter-frame motion estimation on the multi-view video and determining the dynamic region of each view may include: for each view's video, dividing the video into a grid to obtain a plurality of areas; acquiring the optical flow values corresponding to the plurality of areas; and determining the areas with non-zero optical flow values as dynamic areas.
For example, taking view k for detailed description: the pixel grid of view k is divided into a plurality of areas of size M × M, and the optical flow value inside each M × M area is calculated by an optical flow estimation method, as follows. Let the optical flow field of view k at time t be $F_t^k$, where $F_t^k(x)$ is a two-dimensional vector; the pixel value of view k at position $x$ at time t is $I_t^k(x)$, and at time t+1 the pixel value of view k at position $x + F_t^k(x)$ is $I_{t+1}^k\big(x + F_t^k(x)\big)$. If the optical flow field $F_t^k$ between time t and time t+1 satisfies the (linearized) brightness-constancy constraint

$$\nabla I_t^k(x) \cdot F_t^k(x) + I_{t+1}^k(x) - I_t^k(x) = 0,$$

then the region containing position $x + F_t^k(x)$ is a static region; if the result obtained by the above formula is not 0, the region containing position $x + F_t^k(x)$ is a dynamic region. Here $\nabla I_t^k(x)$ denotes the gradient of $I_t^k(x)$.
And determining the dynamic area of each view according to the calculation process, and combining the dynamic areas of all the views into a dynamic area set.
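As a non-limiting illustration, this grid-based classification can be sketched in a few lines of Python. The use of OpenCV's Farneback dense optical flow, the grid size M = 16, and the small threshold eps (standing in for the strict "non-0" test, since estimated flow is rarely exactly zero) are all assumptions of the sketch rather than requirements of the method:

```python
import cv2
import numpy as np

def dynamic_cells(prev_frame, next_frame, M=16, eps=0.5):
    """Classify each MxM grid cell of one view as dynamic (True) or
    static (False) from the dense optical flow between two frames."""
    prev_gray = cv2.cvtColor(prev_frame, cv2.COLOR_BGR2GRAY)
    next_gray = cv2.cvtColor(next_frame, cv2.COLOR_BGR2GRAY)
    # Dense flow field F_t^k: one 2D vector per pixel.
    flow = cv2.calcOpticalFlowFarneback(prev_gray, next_gray, None,
                                        0.5, 3, 15, 3, 5, 1.2, 0)
    mag = np.linalg.norm(flow, axis=2)
    H, W = mag.shape
    cells = mag[:H - H % M, :W - W % M].reshape(H // M, M, W // M, M)
    # A cell is dynamic if any pixel inside it moves noticeably.
    return cells.max(axis=(1, 3)) > eps

# The dynamic region set is then the collection of per-view cell masks:
# dynamic_set = {k: dynamic_cells(prev[k], cur[k]) for k in views}
```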
And S120, performing depth estimation on each frame of the multi-view video based on the dynamic region set to obtain a depth map of each frame of a main view video in the multi-view video.
In this embodiment, after depth estimation is performed on each frame of the multi-view video, the depth of the multi-view video can be compressed into the main-view video, yielding a depth map for each frame of the main-view video. Any depth estimation algorithm may be used, for example a phase-shift-based sub-pixel multi-view stereo matching algorithm. The depth map of each frame can be understood as an information encoding of that video frame.
Specifically, the depth estimation of each frame of the multi-view video based on the dynamic region set, and the manner of obtaining the depth map of each frame of the main view video in the multi-view video may include: acquiring camera calibration parameters respectively corresponding to videos of all visual angles; for a first frame of the multi-view video, depth estimation is carried out on each pixel point of the first frame based on the camera calibration parameters, and a depth map of the first frame of the main view video in the multi-view video is obtained; for the Nth frame of the multi-view video, depth estimation is carried out on pixel points in the dynamic region set in the Nth frame based on the camera calibration parameters, and a depth map of the Nth frame of the main view video is obtained; wherein N is greater than or equal to 2.
The first frame of the multi-view video is to be understood as the first frames of all the view videos; that is, assuming there are D view videos, the first frame comprises D first frames. Similarly, the Nth frame of the multi-view video is to be understood as the Nth frames of all the view videos, i.e. the Nth frame also comprises D frames. The camera calibration parameters for each view video may be obtained by determining the extrinsic and intrinsic parameters between the light field cameras through multi-modal calibration. The extrinsic parameters of each view's light field camera are calibrated relative to those of the main-view light field camera, and this calibration only needs to be performed once. Note that any method may be used to calibrate the multi-view light field cameras and the RGB depth camera; this is not limited here.
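For illustration, the relative extrinsic calibration can be performed with OpenCV's stereo calibration from detections of a shared calibration target; the function and flag below are standard OpenCV APIs, while the surrounding setup (pre-detected target points, pre-calibrated intrinsics) is assumed:

```python
import cv2

def calibrate_view_to_main(obj_pts, img_pts_main, img_pts_k,
                           K_main, dist_main, K_k, dist_k, image_size):
    """One-time estimation of view k's extrinsics relative to the main
    view; intrinsics are assumed pre-calibrated and held fixed."""
    ret, _, _, _, _, R, T, E, F = cv2.stereoCalibrate(
        obj_pts, img_pts_main, img_pts_k,
        K_main, dist_main, K_k, dist_k, image_size,
        flags=cv2.CALIB_FIX_INTRINSIC)
    # R, T map points from the main-view camera frame into view k's frame.
    return R, T
```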
Illustratively, for the first frame of the multi-view video, complete depth estimation is performed on every pixel of the first frame according to the camera calibration parameters of each view video and a phase-shift-based sub-pixel multi-view stereo matching algorithm. Depth estimation covers both the dynamic and static regions, thereby producing the depth map of the first frame of the main-view video. In this embodiment, performing complete depth estimation on every pixel of the first frame can be understood as compressing the depth information of every pixel of the multi-view video into the depth information of the first frame of the main-view video.
Note that after depth estimation of both the static and dynamic regions has been performed for the first frame of the multi-view video, depth estimation is performed only on the dynamic regions for all other frames; the static regions are not re-estimated.
For example, for the Nth frame of each view video, depth estimation may be performed on the pixels inside the dynamic region set of the Nth frame according to the calibration parameters of each view camera and the phase-shift-based sub-pixel multi-view stereo matching algorithm. In other words, when matching corresponding points between different views, only the dynamic region set is searched, each view contributing one dynamic region. Performing depth estimation on the Nth frame of the multi-view video yields the depth map of the Nth frame of the main view. When processing the Nth frame, the dynamic region set can be used as a mask to quickly extract the dynamic region of each view's video, so that only the extracted dynamic regions undergo depth estimation while static regions are skipped, which greatly reduces the amount of computation.
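Structurally, the per-frame depth step can be sketched as follows; estimate_depth is a hypothetical stand-in for the phase-shift-based sub-pixel multi-view stereo matcher (whose internals the text does not fix), and static pixels are assumed to keep the previous frame's depth, since they are never re-estimated:

```python
import numpy as np

def depth_for_frame(n, frames, dyn_mask, prev_depth, estimate_depth):
    """Main-view depth map for frame n (0-indexed). dyn_mask is the
    pixel-level dynamic mask; estimate_depth(frames, pixels) returns
    one depth value per requested (row, col) pixel."""
    h, w = dyn_mask.shape
    if n == 0:
        # First frame: full depth estimation over every pixel.
        all_px = np.indices((h, w)).reshape(2, -1).T
        return estimate_depth(frames, all_px).reshape(h, w)
    # Later frames: the dynamic region set acts as a mask, so only
    # dynamic pixels are re-estimated; static ones are carried forward.
    depth = prev_depth.copy()
    depth[dyn_mask] = estimate_depth(frames, np.argwhere(dyn_mask))
    return depth
```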
S130, performing RGB compression coding on each frame of the multi-view video based on the dynamic region set to obtain first RGB information of each frame of the main view video.
In this embodiment, RGB information over the entire area (both static and dynamic regions) may be retained for the first frame of the multi-view video; for the Nth frame, only RGB information inside the dynamic region set is retained. The RGB information of each frame of every view video is compressed into the main view, yielding the first RGB information of each frame of the main view.
The first RGB information comprises the RGB information of each frame, i.e. the color information of the red, green and blue channels of each video frame.
Specifically, the RGB compression encoding of each frame of the multiview video based on the dynamic region set may obtain the first RGB information of each frame of the main view video in a manner including: for a first frame of the multi-view video, performing compression coding on RGB information of each pixel point of the first frame by adopting a set video coding algorithm to obtain first RGB information of a first frame of a main view video in the multi-view video; and for the Nth frame of the multi-view video, performing compression coding on RGB information of pixel points in the dynamic region of the Nth frame by adopting a set video coding algorithm to obtain first RGB information of the Nth frame of the main view video in the multi-view video.
The set video coding algorithm may be any video coding algorithm; for example, it may be High Efficiency Video Coding (HEVC).
For example, for the first frame of the multi-view video, HEVC may be used to compress and encode the RGB information of every pixel of the first frame, compressing the RGB of each view video's first frame into the RGB of the main-view video's first frame to obtain the first RGB information of the main view's first frame. For the Nth frame, the set video coding algorithm compresses only the RGB information of the pixels inside the dynamic region, compressing the RGB of each view video's Nth frame into the RGB of the main-view video's Nth frame. As in the depth step, the dynamic region set can be used as a mask to quickly extract the dynamic region of each view's video, so that RGB compression coding is applied only to the extracted dynamic regions while static regions are left unprocessed, greatly reducing the amount of computation.
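The masking half of this step can be sketched as below; the HEVC entropy coding itself is delegated to an external encoder and is not reproduced, and the zero-filling of static blocks is an illustrative choice (a block-based encoder then spends almost no bits on them):

```python
import numpy as np

def rgb_payload(n, frame_rgb, cell_mask, M=16):
    """RGB data retained for frame n before encoding: the whole image
    for the first frame, only dynamic MxM blocks afterwards."""
    if n == 0:
        return frame_rgb
    H, W, _ = frame_rgb.shape
    # Upsample the per-cell dynamic mask back to pixel resolution.
    px = np.kron(cell_mask.astype(np.uint8),
                 np.ones((M, M), np.uint8)).astype(bool)
    px_mask = np.zeros((H, W), dtype=bool)
    px_mask[:px.shape[0], :px.shape[1]] = px
    masked = np.zeros_like(frame_rgb)
    masked[px_mask] = frame_rgb[px_mask]
    return masked
```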
S140, determining second RGB information and distance field SDF information of each frame of the spatial voxel according to the depth map and the first RGB information.
Wherein the second RGB information may include RGB information for each frame of spatial voxels; the SDF information may be understood as SDF information for each frame of voxels.
In the embodiment, the SDF information of each frame of the spatial voxel in the dynamic region set can be determined according to the depth map of each frame of the main view video; the RGB information of each frame of the spatial voxel in the dynamic region set can be determined according to the RGB information of each frame of the main visual angle video.
Determining the second RGB information and the distance field SDF information can be divided into two parts. The first part includes determining the initial SDF information of the spatial voxels for each frame and the target SDF information of the first frame, as well as determining the second initial RGB information of the spatial voxels and the target RGB information of the first frame. The second part includes calculating the target SDF information and the second RGB information of the spatial voxels for the Nth frame.
A voxel is short for volume element; a solid made of voxels can be displayed by direct volume rendering or by extracting a polygonal isosurface at a given threshold contour.
It should be noted that, for the nth frame of the spatial voxel, only the target SDF information and the second RGB information of the frame in the dynamic region under the main view angle need to be calculated.
Specifically, the way to determine the distance field SDF information for each frame of spatial voxels may be: projecting a space voxel into the depth map, and acquiring a projection distance and a pixel point projected by the center of the space voxel; determining initial SDF information of each frame of the space voxel according to the projection distance and the depth value of the pixel point projected by the center; for a first frame of the spatial voxels, determining the initial SDF information as target SDF information for the first frame; for the nth frame of the spatial voxels, determining target SDF information for the nth frame according to the following formula:
$$S_N = \frac{W_{N-1}\,S_{N-1} + w_N\,s_N}{W_{N-1} + w_N}$$

wherein $S_{N-1}$ is the target SDF information of the (N-1)th frame, $W_{N-1}$ is the weight value of the (N-1)th frame, $s_N$ is the initial SDF information of the Nth frame, $w_N$ is the weight value of the Nth frame, and $S_N$ is the target SDF information of the Nth frame; wherein N is greater than or equal to 2.
In this embodiment, the central point of each spatial voxel may be projected onto the main-view depth map using the initialized camera pose and the camera intrinsic and extrinsic parameters.
The initial SDF information of a spatial voxel is determined from the projection distance and the depth value of the pixel onto which the voxel center is projected: the difference between the projection distance and that depth value is taken as the initial SDF information of each frame.
Specifically, the difference between the projection distance of the first frame of a spatial voxel and the depth value of the pixel onto which the voxel center projects is used as the initial SDF information of the first frame; likewise, the initial SDF information of the Nth frame is the difference between the Nth frame's projection distance and the depth value at the projected pixel.
In this embodiment, the above formula computes the target SDF information of the Nth frame from the initial SDF information of the Nth frame and the target SDF information of the (N-1)th frame; the target SDF information of the (N-1)th frame is in turn obtained from the target SDF information of the (N-2)th frame and the initial SDF information of the (N-1)th frame by the same formula. Note that the target SDF information is computed for the Nth frame only inside the dynamic region under the main view.
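A numpy sketch of the two SDF steps follows: projecting voxel centers to obtain the per-frame initial SDF, and the weighted running average of the formula above. The companion weight update W_N = W_{N-1} + w_N is an assumption consistent with standard TSDF fusion; the text only gives the SDF formula itself:

```python
import numpy as np

def initial_sdf(voxel_centers, K, R, t, depth_map):
    """Initial SDF per voxel for one frame: project each voxel center
    into the main-view depth map and take the difference between the
    projection distance and the depth at the hit pixel."""
    cam = (R @ voxel_centers.T + t.reshape(3, 1)).T  # world -> camera
    z = cam[:, 2]                                    # projection distance
    uv = (K @ cam.T).T
    u = np.round(uv[:, 0] / uv[:, 2]).astype(int)
    v = np.round(uv[:, 1] / uv[:, 2]).astype(int)
    h, w = depth_map.shape
    ok = (z > 0) & (u >= 0) & (u < w) & (v >= 0) & (v < h)
    sdf = np.full(len(z), np.nan)                    # outside the view
    sdf[ok] = z[ok] - depth_map[v[ok], u[ok]]
    return sdf

def fuse_sdf(S_prev, W_prev, s_new, w_new):
    """Target SDF of frame N:
    S_N = (W_{N-1}*S_{N-1} + w_N*s_N) / (W_{N-1} + w_N)."""
    W_new = W_prev + w_new
    S_new = (W_prev * S_prev + w_new * s_new) / np.maximum(W_new, 1e-8)
    return S_new, W_new
```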
Specifically, the second RGB information of each frame of the spatial voxels may be determined as follows: the first RGB information of the pixel onto which the center of a spatial voxel is projected is taken as the second initial RGB information of the spatial voxel; for the first frame of the spatial voxels, the second initial RGB information is taken as the target RGB information of the first frame; and for the Nth frame of the spatial voxels, the second target RGB information of the Nth frame is determined according to the following formula:

$$C_N = \frac{W_{N-1}\,C_{N-1} + w_N\,c_N}{W_{N-1} + w_N}$$

wherein $C_{N-1}$ is the second target RGB information of the (N-1)th frame, $c_N$ is the second initial RGB information of the Nth frame, $W_{N-1}$ and $w_N$ are the weight values of the (N-1)th and Nth frames as above, and $C_N$ is the second target RGB information of the Nth frame.
In this embodiment, the central point of each spatial voxel may be projected onto the main-view RGB map using the initialized camera pose and the camera intrinsic and extrinsic parameters.
The second initial RGB information can be understood as the initial RGB information of the first frame of the spatial voxels. It serves as the target RGB information of the first frame, and for the Nth frame the target RGB information of the spatial voxels, i.e. the second target RGB information, is calculated by the above formula.
The calculation of the second target RGB information of the Nth frame parallels the calculation of the target SDF information described above: the formula combines the second initial RGB information of the Nth frame with the second target RGB information of the (N-1)th frame, which was itself computed from the second target RGB information of the (N-2)th frame and the second initial RGB information of the (N-1)th frame. As with the SDF, the second target RGB information is computed for the Nth frame only inside the dynamic region under the main view.
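Because the text notes that the color update mirrors the SDF update, the per-voxel RGB fusion can reuse the same recurrence; sharing the per-voxel weights with the SDF fusion is an assumption of the sketch:

```python
import numpy as np

def fuse_rgb(C_prev, W_prev, c_new, w_new):
    """Second target RGB of frame N, analogous to fuse_sdf:
    C_N = (W_{N-1}*C_{N-1} + w_N*c_N) / (W_{N-1} + w_N).
    C_* has shape (num_voxels, 3); weights broadcast over channels."""
    W_new = W_prev + w_new
    C_new = ((W_prev[:, None] * C_prev + w_new[:, None] * c_new)
             / np.maximum(W_new, 1e-8)[:, None])
    return C_new, W_new
```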
S150, constructing a three-dimensional dynamic model based on the second RGB information and the SDF information.
In this embodiment, the three-dimensional dynamic model may be constructed from the computed per-frame RGB information and per-frame SDF information of the spatial voxels in the dynamic region of the main view.
Specifically, constructing the three-dimensional dynamic model may include: for each frame of the spatial voxels, constructing a three-dimensional model of that frame with a set mesh extraction algorithm based on the target SDF information and the second target RGB information; and combining the per-frame three-dimensional models to obtain the three-dimensional dynamic model.
The set mesh extraction algorithm may be any mesh extraction algorithm and is not limited here. The per-frame three-dimensional models may be combined by ordering them according to their timestamps, yielding the dynamic three-dimensional model.
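As one concrete choice of mesh extraction algorithm, the sketch below uses marching cubes via scikit-image (an assumption; the text leaves the algorithm open) to extract each frame's surface from the zero level set of its fused SDF volume, then assembles the per-frame meshes in timestamp order:

```python
import numpy as np
from skimage import measure

def build_dynamic_model(sdf_volumes, rgb_volumes, timestamps):
    """One colored mesh per frame, assembled in timestamp order."""
    model = []
    for sdf, rgb, ts in sorted(zip(sdf_volumes, rgb_volumes, timestamps),
                               key=lambda item: item[2]):
        # The surface is the zero level set of the SDF (the volume is
        # assumed to cross zero and contain no NaNs).
        verts, faces, normals, _ = measure.marching_cubes(sdf, level=0.0)
        # Nearest-voxel color lookup for each vertex.
        idx = np.clip(np.round(verts).astype(int), 0,
                      np.array(sdf.shape) - 1)
        colors = rgb[idx[:, 0], idx[:, 1], idx[:, 2]]
        model.append({"t": ts, "verts": verts, "faces": faces,
                      "normals": normals, "colors": colors})
    return model
```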
The dynamic light field reconstruction method provided by this embodiment of the invention first performs inter-frame motion estimation on each view of the multi-view video to determine the dynamic region of each view and obtain a dynamic region set; it then performs depth estimation on each frame of the multi-view video based on the dynamic region set to obtain a depth map of each frame of the main-view video; it then performs RGB compression coding on each frame of the multi-view video based on the dynamic region set to obtain first RGB information of each frame of the main-view video; it then determines second RGB information and distance field SDF information of the spatial voxels for each frame from the depth map and the first RGB information; and it finally constructs a three-dimensional dynamic model from the second RGB information and the SDF information. With this technical scheme, three-dimensional construction can be carried out directly from the first RGB information and the depth map without decompressing them, compression and three-dimensional construction can be designed as a whole, the data throughput of the three-dimensional construction process is reduced, and the efficiency of light field reconstruction is effectively improved.
Example two
Fig. 2 is a schematic structural diagram of a dynamic light field reconstruction apparatus according to a second embodiment of the present invention, which may be applied to three-dimensional reconstruction of a light field, where the apparatus may be implemented by software and/or hardware and is generally integrated on a computer device.
As shown in fig. 2, the apparatus includes: a first acquisition module 210, a second acquisition module 220, a third acquisition module 230, a determination module 240, and a construction module 250.
A first obtaining module 210, configured to perform inter-frame motion estimation on a multi-view video, determine a dynamic region of each view, and obtain a dynamic region set;
a second obtaining module 220, configured to perform depth estimation on each frame of the multi-view video based on the dynamic region set, so as to obtain a depth map of each frame of a main view video in the multi-view video;
a third obtaining module 230, configured to perform RGB compression coding on each frame of the multi-view video based on the dynamic region set, so as to obtain first RGB information of each frame of the main view video;
a determination module 240 to determine second RGB information and distance field SDF information for each frame of spatial voxels based on the depth map and the first RGB information;
a construction module 250 configured to construct a three-dimensional dynamic model based on the second RGB information and the SDF information.
In this embodiment, the device first determines, via the first acquisition module, the dynamic region of each view by performing inter-frame motion estimation on the multi-view video, obtaining a dynamic region set; next, the second acquisition module performs depth estimation on each frame of the multi-view video based on the dynamic region set to obtain a depth map of each frame of the main-view video; then the third acquisition module performs RGB compression coding on each frame of the multi-view video based on the dynamic region set to obtain first RGB information of each frame of the main-view video; the determination module then determines second RGB information and distance field SDF information of the spatial voxels for each frame from the depth map and the first RGB information; and finally the construction module constructs a three-dimensional dynamic model based on the second RGB information and the SDF information.
This embodiment provides a dynamic light field reconstruction device. With this device, three-dimensional construction can be performed directly from the first RGB information and the depth map without decompressing them, compression and three-dimensional construction can be designed as a whole, the data throughput of the three-dimensional construction process is reduced, and the efficiency of light field reconstruction is effectively improved.
Further, the first obtaining module 210 is specifically configured to: for each view's video, divide the video into a grid to obtain a plurality of regions; acquire the optical flow values corresponding to the plurality of regions; and determine the regions with non-zero optical flow values as dynamic regions.
Further, the second obtaining module 220 is specifically configured to: obtain the camera calibration parameters corresponding to each view video; for the first frame of the multi-view video, perform depth estimation on each pixel of the first frame based on the camera calibration parameters to obtain a depth map of the first frame of the main-view video; and for the Nth frame of the multi-view video, perform depth estimation on the pixels inside the dynamic region set of the Nth frame based on the camera calibration parameters to obtain a depth map of the Nth frame of the main-view video, where N is greater than or equal to 2.
Based on the above technical solution, the third obtaining module 230 is specifically configured to, for a first frame of the multi-view video, perform compression coding on RGB information of each pixel point of the first frame by using a set video coding algorithm, so as to obtain first RGB information of the first frame of the main view video in the multi-view video; and for the Nth frame of the multi-view video, performing compression coding on RGB information of pixel points in the dynamic region of the Nth frame by adopting a set video coding algorithm to obtain first RGB information of the Nth frame of the main view video in the multi-view video.
Further, the determining module 240 is further configured to project a spatial voxel into the depth map, and obtain a projection distance and a pixel point projected by the center of the spatial voxel;
determining initial SDF information of each frame of the space voxel according to the projection distance and the depth value of the pixel point projected by the center; for a first frame of the spatial voxels, determining the initial SDF information as target SDF information for the first frame; for the nth frame of the spatial voxels, determining target SDF information for the nth frame according to the following formula:
$$S_N = \frac{W_{N-1}\,S_{N-1} + w_N\,s_N}{W_{N-1} + w_N}$$

wherein $S_{N-1}$ is the target SDF information of the (N-1)th frame, $W_{N-1}$ is the weight value of the (N-1)th frame, $s_N$ is the initial SDF information of the Nth frame, $w_N$ is the weight value of the Nth frame, and $S_N$ is the target SDF information of the Nth frame; wherein N is greater than or equal to 2.
Further, the determining module 240 is further configured to determine the first RGB information of the pixel point to which the center of the spatial voxel is projected as the second initial RGB information of the spatial voxel; for a first frame of the spatial voxels, determining the initial RGB information as target RGB information for the first frame; for an nth frame of the spatial voxels, determining target RGB information for the nth frame according to the following formula:
$$C_N = \frac{W_{N-1}\,C_{N-1} + w_N\,c_N}{W_{N-1} + w_N}$$

wherein $C_{N-1}$ is the second target RGB information of the (N-1)th frame, $c_N$ is the second initial RGB information of the Nth frame, $W_{N-1}$ and $w_N$ are the weight values of the (N-1)th and Nth frames, and $C_N$ is the target RGB information of the Nth frame.
Further, the constructing module 250 is specifically configured to, for each frame of the spatial voxels, construct a three-dimensional model of each frame by using a set mesh extraction algorithm based on the target SDF information and the second target RGB information; and combining the three-dimensional models of each frame to obtain a three-dimensional dynamic model.
The dynamic light field reconstruction device can execute the dynamic light field reconstruction method provided by any embodiment of the invention, and has corresponding functional modules and beneficial effects of the execution method.
EXAMPLE III
Fig. 3 is a schematic structural diagram of a computer device according to a third embodiment of the present invention. As shown in fig. 3, a computer device provided in a third embodiment of the present invention includes: one or more processors 31 and storage devices 32; the processor 31 in the computer device may be one or more, and fig. 3 illustrates one processor 31 as an example; storage 32 is used to store one or more programs; the one or more programs are executed by the one or more processors 31, so that the one or more processors 31 implement the dynamic light field reconstruction method according to any of the embodiments of the present invention.
The computer device may further include: an input device 33 and an output device 34.
The processor 31, the storage means 32, the input means 33 and the output means 34 in the computer apparatus may be connected by a bus or other means, which is exemplified in fig. 3.
The storage device 32 in the computer device serves as a computer-readable storage medium and may be used to store one or more programs, which may be software programs, computer-executable programs, and modules, such as program instructions/modules corresponding to the dynamic light field reconstruction method provided in an embodiment of the present invention (for example, the modules in the dynamic light field reconstruction apparatus shown in fig. 2 include the first obtaining module 210, the second obtaining module 220, the third obtaining module 230, the determining module 240, and the constructing module 250). The processor 31 executes various functional applications and data processing of the computer device by executing the software programs, instructions and modules stored in the storage device 32, namely, implements the dynamic light field reconstruction method in the above method embodiment.
The storage device 32 may include a storage program area and a storage data area, wherein the storage program area may store an operating system, an application program required for at least one function; the storage data area may store data created according to use of the computer device, and the like. Further, the storage device 32 may include high speed random access memory, and may also include non-volatile memory, such as at least one magnetic disk storage device, flash memory device, or other non-volatile solid state storage device. In some examples, the storage 32 may further include memory located remotely from the processor 31, which may be connected to the device over a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.
The input device 33 may be used to receive input numeric or character information and to generate key signal inputs related to user settings and function controls of the computer apparatus. The output device 34 may include a display device such as a display screen.
And, when one or more programs included in the above-mentioned computer apparatus are executed by the one or more processors 31, the programs perform the following operations:
performing interframe motion estimation on the multi-view video respectively, determining a dynamic area of each view, and obtaining a dynamic area set;
performing depth estimation on each frame of the multi-view video based on the dynamic region set to obtain a depth map of each frame of a main view video in the multi-view video;
performing RGB compression coding on each frame of the multi-view video based on the dynamic region set to obtain first RGB information of each frame of the main view video;
determining second RGB information and distance field SDF information for each frame of spatial voxels from the depth map and the first RGB information;
and constructing a three-dimensional dynamic model based on the second RGB information and the SDF information.
Example four
A fourth embodiment of the present invention provides a computer-readable storage medium, on which a computer program is stored, where the computer program is used to execute a dynamic light field reconstruction method when executed by a processor, and the method includes:
performing interframe motion estimation on the multi-view video respectively, determining a dynamic area of each view, and obtaining a dynamic area set;
performing depth estimation on each frame of the multi-view video based on the dynamic region set to obtain a depth map of each frame of a main view video in the multi-view video;
performing RGB compression coding on each frame of the multi-view video based on the dynamic region set to obtain first RGB information of each frame of the main view video;
determining second RGB information and distance field SDF information for each frame of spatial voxels from the depth map and the first RGB information;
and constructing a three-dimensional dynamic model based on the second RGB information and the SDF information.
Optionally, the program, when executed by a processor, may be further configured to perform a dynamic light field reconstruction method provided in any embodiment of the present invention.
Computer storage media for embodiments of the invention may employ any combination of one or more computer-readable media. The computer readable medium may be a computer readable signal medium or a computer readable storage medium. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples (a non-exhaustive list) of the computer readable storage medium would include the following: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a Read Only Memory (ROM), an Erasable Programmable Read Only Memory (EPROM), a flash Memory, an optical fiber, a portable CD-ROM, an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. A computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device.
A computer readable signal medium may include a propagated data signal with computer readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take a variety of forms, including, but not limited to: an electromagnetic signal, an optical signal, or any suitable combination of the foregoing. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device.
Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: wireless, wire, fiber optic cable, Radio Frequency (RF), etc., or any suitable combination of the foregoing.
Computer program code for carrying out operations for aspects of the present invention may be written in any combination of one or more programming languages, including an object-oriented programming language such as Java, Smalltalk, C++ or the like, and conventional procedural programming languages such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on a remote computer or server. In the latter case, the remote computer may be connected to the user's computer through any type of network, including a local area network (LAN) or a wide area network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet service provider).
It is to be noted that the foregoing is only illustrative of the preferred embodiments of the present invention and the technical principles employed. Those skilled in the art will appreciate that the present invention is not limited to the particular embodiments described herein, and that various obvious changes, rearrangements and substitutions will now be apparent to those skilled in the art without departing from the scope of the invention. Therefore, although the present invention has been described in greater detail by the above embodiments, the present invention is not limited to the above embodiments, and may include other equivalent embodiments without departing from the spirit of the present invention, and the scope of the present invention is determined by the scope of the appended claims.

Claims (9)

1. A method of dynamic light field reconstruction, comprising:
performing interframe motion estimation on the multi-view video respectively, determining a dynamic area of each view, and obtaining a dynamic area set;
performing depth estimation on each frame of the multi-view video based on the dynamic region set to obtain a depth map of each frame of a main view video in the multi-view video;
performing RGB compression coding on each frame of the multi-view video based on the dynamic region set to obtain first RGB information of each frame of the main view video;
determining second RGB information and distance field SDF information for each frame of spatial voxels from the depth map and the first RGB information;
and constructing a three-dimensional dynamic model based on the second RGB information and the SDF information.
2. The method of claim 1, wherein performing inter-frame motion estimation on the multi-view video to determine the dynamic region of each view comprises:
for each visual angle video, carrying out grid division on the visual angle video to obtain a plurality of areas;
acquiring optical flow values corresponding to the plurality of areas respectively;
and determining the area with the optical flow value of non-0 as a dynamic area.
3. The method of claim 1, wherein performing depth estimation on each frame of the multi-view video based on the set of dynamic regions to obtain a depth map for each frame of a main-view video in the multi-view video comprises:
acquiring camera calibration parameters respectively corresponding to videos of all visual angles;
for a first frame of the multi-view video, depth estimation is carried out on each pixel point of the first frame based on the camera calibration parameters, and a depth map of the first frame of the main view video in the multi-view video is obtained;
for the Nth frame of the multi-view video, depth estimation is carried out on pixel points in the dynamic region set in the Nth frame based on the camera calibration parameters, and a depth map of the Nth frame of the main view video is obtained; wherein N is greater than or equal to 2.
4. The method of claim 1, wherein performing RGB compression encoding on each frame of the multiview video based on the set of dynamic regions to obtain first RGB information for each frame of the main view video comprises:
for a first frame of the multi-view video, performing compression coding on RGB information of each pixel point of the first frame by adopting a set video coding algorithm to obtain first RGB information of a first frame of a main view video in the multi-view video;
and for the Nth frame of the multi-view video, performing compression coding on RGB information of pixel points in the dynamic region of the Nth frame by adopting a set video coding algorithm to obtain first RGB information of the Nth frame of the main view video in the multi-view video.
5. The method of claim 1, wherein determining distance field (SDF) information for each frame of spatial voxels from the depth map and the first RGB information comprises:
projecting a space voxel into the depth map, and acquiring a projection distance and a pixel point projected by the center of the space voxel;
determining initial SDF information of each frame of the space voxel according to the projection distance and the depth value of the pixel point projected by the center;
for a first frame of the spatial voxels, determining the initial SDF information as target SDF information for the first frame;
for the nth frame of the spatial voxels, determining target SDF information for the nth frame according to the following formula:
$$S_N = \frac{W_{N-1}\,S_{N-1} + w_N\,s_N}{W_{N-1} + w_N}$$

wherein $S_{N-1}$ is the target SDF information of the (N-1)th frame, $W_{N-1}$ is the weight value of the (N-1)th frame, $s_N$ is the initial SDF information of the Nth frame, $w_N$ is the weight value of the Nth frame, and $S_N$ is the target SDF information of the Nth frame; wherein N is greater than or equal to 2.
6. The method of claim 5, wherein determining second RGB information for each frame of spatial voxels from the depth map and the first RGB information comprises:
determining first RGB information of a pixel point projected by the center of a spatial voxel as second initial RGB information of the spatial voxel;
for a first frame of the spatial voxels, determining the initial RGB information as target RGB information for the first frame;
for an nth frame of the spatial voxels, determining target RGB information for the nth frame according to the following formula:
$$C_N = \frac{W_{N-1}\,C_{N-1} + w_N\,c_N}{W_{N-1} + w_N}$$

wherein $C_{N-1}$ is the second target RGB information of the (N-1)th frame, $c_N$ is the second initial RGB information of the Nth frame, $W_{N-1}$ and $w_N$ are the weight values of the (N-1)th and Nth frames, and $C_N$ is the target RGB information of the Nth frame.
7. The method of claim 6, wherein constructing a three-dimensional dynamic model based on the second RGB information and the SDF information comprises:
for each frame of the spatial voxels, constructing a three-dimensional model of each frame by adopting a set grid extraction algorithm based on the target SDF information and the second target RGB information;
and combining the three-dimensional models of each frame to obtain a three-dimensional dynamic model.
8. A dynamic light field reconstruction apparatus, comprising:
the first acquisition module is used for respectively carrying out interframe motion estimation on the multi-view video, determining the dynamic area of each view and acquiring a dynamic area set;
a second obtaining module, configured to perform depth estimation on each frame of the multi-view video based on the dynamic region set, and obtain a depth map of each frame of a main view video in the multi-view video;
a third obtaining module, configured to perform RGB compression coding on each frame of the multi-view video based on the dynamic region set, to obtain first RGB information of each frame of the main view video;
a determination module to determine second RGB information and distance field SDF information for each frame of spatial voxels from the depth map and the first RGB information;
and the construction module is used for constructing a three-dimensional dynamic model based on the second RGB information and the SDF information.
9. A computer device, comprising:
one or more processors;
storage means for storing one or more programs;
the one or more programs are executable by the one or more processors to cause the one or more processors to perform the dynamic light field reconstruction method of any of claims 1-7.
Application CN202110540712.2A, priority date 2021-05-18, filing date 2021-05-18: Dynamic light field reconstruction method, device and equipment. Status: Active. Granted publication: CN113192185B (en).

Priority Applications (1)

Application Number: CN202110540712.2A; Priority Date: 2021-05-18; Filing Date: 2021-05-18; Title: Dynamic light field reconstruction method, device and equipment
Publications (2)

CN113192185A (application publication): 2021-07-30
CN113192185B (granted publication): 2022-05-17

Family ID: 76982370

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113989432A (en) * 2021-10-25 2022-01-28 北京字节跳动网络技术有限公司 3D image reconstruction method and device, electronic equipment and storage medium
CN114173106B (en) * 2021-12-01 2022-08-05 北京拙河科技有限公司 Real-time video stream fusion processing method and system based on light field camera
CN114579776B (en) * 2022-03-14 2023-02-07 武汉工程大学 Optical field data storage method and device, electronic equipment and computer medium
CN115022613A (en) * 2022-05-19 2022-09-06 北京字节跳动网络技术有限公司 Video reconstruction method and device, electronic equipment and storage medium

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9883112B1 (en) * 2016-09-22 2018-01-30 Pinnacle Imaging Corporation High dynamic range imaging
CN110874851A (en) * 2019-10-25 2020-03-10 深圳奥比中光科技有限公司 Method, device, system and readable storage medium for reconstructing three-dimensional model of human body
CN112017228A (en) * 2019-05-31 2020-12-01 华为技术有限公司 Method for three-dimensional reconstruction of object and related equipment

Legal Events

PB01: Publication
SE01: Entry into force of request for substantive examination
GR01: Patent grant