CN112348957A - Three-dimensional portrait real-time reconstruction and rendering method based on multi-view depth camera

Three-dimensional portrait real-time reconstruction and rendering method based on multi-view depth camera

Info

Publication number
CN112348957A
CN112348957A
Authority
CN
China
Prior art keywords
dimensional
portrait
human body
cameras
neural network
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202011225534.6A
Other languages
Chinese (zh)
Inventor
徐迪
王凯
毛文涛
孙立
张旭
李臻
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai Shadow Creator Information Technology Co Ltd
Original Assignee
Shanghai Shadow Creator Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shanghai Shadow Creator Information Technology Co Ltd filed Critical Shanghai Shadow Creator Information Technology Co Ltd
Priority to CN202011225534.6A priority Critical patent/CN112348957A/en
Publication of CN112348957A publication Critical patent/CN112348957A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T17/00 Three dimensional [3D] modelling, e.g. data description of 3D objects
    • G06T17/20 Finite element generation, e.g. wire-frame surface description, tesselation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T15/00 3D [Three Dimensional] image rendering

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Graphics (AREA)
  • Software Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • General Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Biophysics (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Computational Linguistics (AREA)
  • Biomedical Technology (AREA)
  • Mathematical Physics (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Geometry (AREA)
  • Testing, Inspecting, Measuring Of Stereoscopic Televisions And Televisions (AREA)
  • Image Processing (AREA)

Abstract

The invention discloses a three-dimensional portrait real-time reconstruction and rendering method based on a multi-view depth camera, relating to the field of computer vision. The method comprises: calibrating a plurality of cameras to obtain the relative poses among the cameras; transforming the depth information output by the cameras into a unified three-dimensional coordinate system according to the relative poses to form a point cloud; judging whether each voxel is occupied by the point cloud and, if so, constructing a truncated signed distance function for each occupied voxel according to the relative poses and a pre-created portrait mask; performing a weighted average of the truncated signed distance functions corresponding to the voxels and applying a point cloud meshing algorithm to obtain a three-dimensional human body mesh; performing a weighted average of the color information to obtain a three-dimensional human body mesh carrying color information; and inputting the three-dimensional human body mesh into a pre-constructed adversarial neural network for rendering, so as to obtain the corresponding two-dimensional portrait. Real-time capture of the three-dimensional portrait is thereby realized, and the imaging quality is improved.

Description

Three-dimensional portrait real-time reconstruction and rendering method based on multi-view depth camera
Technical Field
The invention relates to the field of computer vision, in particular to a three-dimensional portrait real-time reconstruction and rendering method based on a multi-view depth camera.
Background
At present, most three-dimensional whole-body portrait volumetric capture systems adopt one of two types of schemes. The first type uses high-definition color cameras: it lacks depth information, relies on traditional feature-point matching with a huge amount of computation, and is therefore difficult to run in real time. The second type uses depth cameras, but fails to address the poor depth measurements that depth cameras produce on black objects and on objects with fine structures, resulting in lower quality.
Therefore, the prior art fails to effectively solve the problems of real-time capture and high-quality rendering of three-dimensional portraits.
Disclosure of Invention
In order to overcome the defects of the prior art, an embodiment of the invention provides a three-dimensional portrait real-time reconstruction and rendering method based on a multi-view depth camera, comprising the following steps:
calibrating a plurality of cameras to obtain the relative poses among the cameras, and constructing a unified three-dimensional coordinate system according to the relative poses;
dividing the three-dimensional coordinate system into a plurality of voxels, and transforming the depth information output by the plurality of cameras into the unified three-dimensional coordinate system to form a point cloud;
judging, for each voxel, whether it is occupied by the point cloud and, if so, constructing a truncated signed distance function (TSDF) for that voxel according to the relative poses and a pre-created portrait mask;
performing a weighted average of the truncated signed distance functions corresponding to the voxels, and obtaining a three-dimensional human body mesh by a point cloud meshing algorithm;
constructing UV maps according to the relative poses, and projecting the color information of each camera onto the three-dimensional human body mesh in the form of texture maps;
for each overlapping region of the UV maps, calculating weights from the viewing angles and distances between the three-dimensional points on the mesh and the cameras, and performing a weighted average of the color information with these weights to obtain a three-dimensional human body mesh carrying color information;
and inputting the three-dimensional human body mesh into a pre-constructed adversarial neural network for rendering, so as to obtain the corresponding two-dimensional portrait.
Preferably, calibrating the plurality of cameras to obtain the relative poses among the cameras includes:
placing a plurality of calibration checkerboards vertically within the visible range of the cameras, detecting the checkerboard corners of each board and refining them to sub-pixel accuracy, and computing the relative poses of the cameras from the corner information.
Preferably, the creation process of the portrait mask includes:
treating the pixels whose depth values fall within a preset range as the human body, and generating a soft-segmented first portrait mask.
Preferably, the creation process of the adversarial neural network includes:
constructing a first generative adversarial network, used for view synthesis, from a generator with a U-Net architecture and a discriminator with a DenseNet-121 architecture;
and constructing a second generative adversarial network, used for image enhancement, from a generator combining a U-Net architecture with super-resolution convolutional layers and a discriminator that is a 5-layer convolutional neural network.
Preferably, after generating the soft-segmented first portrait mask, the method further comprises:
optimizing the first portrait mask with an active contour model algorithm to obtain a second portrait mask.
The three-dimensional portrait real-time reconstruction and rendering method based on a multi-view depth camera provided by the embodiments of the invention has the following beneficial effects:
(1) a visual hull fusion scheme is adopted: a portrait mask and a visual hull are generated from the color and depth information, and regions that depth cameras capture poorly (such as black objects and thin structures) are completed, so that the portrait information becomes complete, real-time capture of the three-dimensional portrait is realized, and the imaging quality is improved;
(2) the three-dimensional portrait is rendered with generative adversarial networks, which reduces noise, fills in missing information, and raises the resolution, further improving the imaging quality.
Detailed Description
The present invention will be described in detail with reference to the following embodiments.
An embodiment of the invention provides a three-dimensional portrait real-time reconstruction and rendering method based on a multi-view depth camera, comprising the following steps.
S101, calibrating the plurality of cameras to obtain the relative poses among the cameras, and constructing a unified three-dimensional coordinate system according to the relative poses.
S102, dividing the three-dimensional coordinate system into a plurality of voxels, and transforming the depth information output by the plurality of cameras into the unified three-dimensional coordinate system to form a point cloud.
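As an illustration of S102, the following is a minimal Python sketch of back-projecting one camera's depth image into the unified coordinate system; the function name, array layout, and the camera-to-world pose convention are assumptions of this sketch, not specified by the patent.

import numpy as np

def depth_to_world_points(depth, K, T_world_cam):
    """Back-project a metric depth image into world coordinates.

    depth: (H, W) float array of metric depths (0 where invalid).
    K: (3, 3) camera intrinsic matrix.
    T_world_cam: (4, 4) camera-to-world pose from the calibration step.
    """
    H, W = depth.shape
    u, v = np.meshgrid(np.arange(W), np.arange(H))
    z = depth.ravel()
    valid = z > 0                                   # drop missing depth pixels
    x = (u.ravel() - K[0, 2]) * z / K[0, 0]         # pinhole back-projection
    y = (v.ravel() - K[1, 2]) * z / K[1, 1]
    pts_cam = np.stack([x, y, z, np.ones_like(z)], axis=0)[:, valid]
    return (T_world_cam @ pts_cam)[:3].T            # (N, 3) world-space points

Concatenating the outputs of this function over all calibrated cameras yields the fused point cloud used in the following steps.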
S103, judging, for each voxel, whether it is occupied by the point cloud and, if so, constructing a truncated signed distance function for that voxel according to the relative poses and a pre-created portrait mask.
The truncated signed distance function is defined as sdf(x) = depth(pic(x)) - cam(x), where pic(x) is the projection of the voxel center x onto the depth image, depth(pic(x)) is the measured depth between the camera and the nearest object surface point along the observation ray through x, and cam(x) is the distance between the voxel and the camera along the optical axis; sdf(x) is therefore also a distance along the optical axis.
The two-dimensional portrait masks are aggregated to generate a visual hull, from which the corresponding truncation distances are computed. (A visual hull is the three-dimensional shape obtained by intersecting, in space, the viewing cones of all known two-dimensional silhouettes of an object, here the portrait masks; it can be regarded as a reasonable approximation of the object's three-dimensional shape.)
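The sketch below samples the truncated signed distance function at one voxel center, combining the formula above with a visual-hull test against the portrait mask; the helper name, the truncation distance, and the normalization to [-1, 1] are assumptions of this sketch.

import numpy as np

def tsdf_at_voxel(x_world, depth, mask, K, T_cam_world, trunc=0.03):
    """TSDF sample for one voxel center (hypothetical helper).

    Returns None when the voxel projects outside the image or outside
    the portrait mask (i.e. outside the visual hull); otherwise the
    truncated value of sdf(x) = depth(pic(x)) - cam(x).
    """
    p = T_cam_world @ np.append(x_world, 1.0)        # voxel in camera frame
    cam_x = p[2]                                     # distance along optical axis
    if cam_x <= 0:
        return None                                  # behind the camera
    u = int(round(K[0, 0] * p[0] / p[2] + K[0, 2]))  # pic(x): projection
    v = int(round(K[1, 1] * p[1] / p[2] + K[1, 2]))
    H, W = depth.shape
    if not (0 <= u < W and 0 <= v < H) or not mask[v, u]:
        return None                                  # outside the visual hull
    sdf = depth[v, u] - cam_x                        # measured minus voxel depth
    return max(-1.0, min(1.0, sdf / trunc))          # truncate to [-1, 1]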
S104, performing a weighted average of the truncated signed distance functions corresponding to the voxels, and obtaining a three-dimensional human body mesh by a point cloud meshing algorithm.
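The patent does not name the meshing algorithm; marching cubes is the usual way to extract the zero level set of a fused TSDF volume, so a hedged sketch using scikit-image is given here. The tsdf_volume input, voxel_size, and volume_origin are assumptions standing in for the outputs of the previous steps.

import numpy as np
from skimage import measure

tsdf_volume = np.load("tsdf_volume.npy")     # hypothetical fused TSDF grid (X, Y, Z)
voxel_size, volume_origin = 0.01, np.zeros(3)
# Extract the zero-crossing surface of the TSDF as a triangle mesh.
verts, faces, normals, _ = measure.marching_cubes(tsdf_volume, level=0.0)
verts = verts * voxel_size + volume_origin   # voxel indices -> world coordinates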
S105, constructing UV maps according to the relative poses, and projecting the color information of each camera onto the three-dimensional human body mesh in the form of texture maps.
S106, for each overlapping region of the UV maps, calculating weights from the viewing angles and distances between the three-dimensional points on the mesh and the cameras, and performing a weighted average of the color information with these weights to obtain a three-dimensional human body mesh carrying color information.
As a specific embodiment, the three-dimensional human body mesh carrying color information is generated as follows:
for each region of the map where overlap occurs, the viewing directions v_l and v_r from the corresponding three-dimensional point p toward the two adjacent cameras are computed; their dot products with the unit normal vector n at p give the weights w_l and w_r; the color values of the two cameras at the point p are then averaged with the weights w_l and w_r, yielding the three-dimensional human body mesh with color information.
S107, inputting the three-dimensional human body mesh into the pre-constructed adversarial neural network for rendering, so as to obtain the corresponding two-dimensional portrait.
Optionally, calibrating the plurality of cameras to obtain the relative poses among the cameras includes:
placing a plurality of calibration checkerboards vertically within the visible range of the cameras, detecting the checkerboard corners of each board and refining them to sub-pixel accuracy, and computing the relative poses among the cameras from the corner information.
Optionally, the creation process of the portrait mask includes:
treating the pixels whose depth values fall within a preset range as the human body, and generating a soft-segmented first portrait mask.
Optionally, the creation process of the adversarial neural network includes:
constructing a first generative adversarial network, used for view synthesis, from a generator with a U-Net architecture and a discriminator with a DenseNet-121 architecture;
and constructing a second generative adversarial network, used for image enhancement, from a generator combining a U-Net architecture with super-resolution convolutional layers and a discriminator that is a 5-layer convolutional neural network.
As a specific embodiment of the invention, the inputs of the first generative adversarial network are the human body image captured by each camera, the corresponding skeleton map generated by the depth camera, and the skeleton map corresponding to the new view angle to be rendered; its output is the synthesized human body image at the new view angle. The inputs of the second generative adversarial network are the synthesized human body image obtained from the first network, the two-dimensional image of the three-dimensional human body mesh at the current view angle, the corresponding human body normal map, and a confidence map; its output is a high-quality two-dimensional rendering.
Optionally, after generating the soft-segmented first portrait mask, the method further comprises:
optimizing the first portrait mask with an active contour model algorithm to obtain a second portrait mask.
In summary, the method for real-time reconstruction and rendering of a three-dimensional portrait based on a multi-view depth camera provided by the embodiments of the invention proceeds as follows: a plurality of cameras are calibrated to obtain the relative poses among them, and a unified three-dimensional coordinate system is constructed according to the relative poses; the coordinate system is divided into voxels, and the depth information output by the cameras is transformed into the unified coordinate system to form a point cloud; each voxel is tested for occupancy by the point cloud and, if occupied, a truncated signed distance function is constructed for it according to the relative poses and the pre-created portrait mask; the truncated signed distance functions of the voxels are weight-averaged, and a point cloud meshing algorithm yields a three-dimensional human body mesh; UV maps are constructed according to the relative poses, and the color information of each camera is projected onto the mesh as texture maps; for each overlapping region of the UV maps, weights are computed from the viewing angles and distances between the mesh points and the cameras, and the color information is weight-averaged accordingly to obtain a mesh carrying color information; finally, the mesh is fed into the pre-constructed adversarial neural network for rendering, producing the corresponding two-dimensional portrait. Real-time capture of the three-dimensional portrait is thereby realized, and the imaging quality is improved.
In the foregoing embodiments, the descriptions of the respective embodiments have respective emphasis, and for parts that are not described in detail in a certain embodiment, reference may be made to related descriptions of other embodiments.
It should also be noted that the terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising an … …" does not exclude the presence of other identical elements in the process, method, article, or apparatus that comprises the element.
As will be appreciated by one skilled in the art, embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The above are merely examples of the present application and are not intended to limit the present application. Various modifications and changes may occur to those skilled in the art. Any modification, equivalent replacement, improvement, etc. made within the spirit and principle of the present application should be included in the scope of the claims of the present application.

Claims (6)

1. A three-dimensional portrait real-time reconstruction and rendering method based on a multi-view depth camera, characterized by comprising the following steps:
calibrating a plurality of cameras to obtain the relative poses among the cameras, and constructing a unified three-dimensional coordinate system according to the relative poses;
dividing the three-dimensional coordinate system into a plurality of voxels, and transforming the depth information output by the plurality of cameras into the unified three-dimensional coordinate system to form a point cloud;
judging, for each voxel, whether it is occupied by the point cloud and, if so, constructing a truncated signed distance function for that voxel according to the relative poses and a pre-created portrait mask;
performing a weighted average of the truncated signed distance functions corresponding to the voxels, and obtaining a three-dimensional human body mesh by a point cloud meshing algorithm;
constructing UV maps according to the relative poses, and projecting the color information of each camera onto the three-dimensional human body mesh in the form of texture maps;
for each overlapping region of the UV maps, calculating weights from the viewing angles and distances between the three-dimensional points on the mesh and the cameras, and performing a weighted average of the color information with these weights to obtain a three-dimensional human body mesh carrying color information;
and inputting the three-dimensional human body mesh into a pre-constructed adversarial neural network for rendering, so as to obtain the corresponding two-dimensional portrait.
2. The three-dimensional portrait real-time reconstruction and rendering method based on a multi-view depth camera according to claim 1, characterized in that calibrating the plurality of cameras to obtain the relative poses among the cameras comprises:
placing a plurality of calibration checkerboards vertically within the visible range of the cameras, detecting the checkerboard corners of each board and refining them to sub-pixel accuracy, and computing the relative poses of the cameras from the corner information.
3. The three-dimensional portrait real-time reconstruction and rendering method based on a multi-view depth camera according to claim 1, characterized in that the creation process of the portrait mask comprises:
treating the pixels whose depth values fall within a preset range as the human body, and generating a soft-segmented first portrait mask.
4. The three-dimensional portrait real-time reconstruction and rendering method based on a multi-view depth camera according to claim 1, characterized in that the creation process of the adversarial neural network comprises:
constructing a first generative adversarial network, used for view synthesis, from a generator with a U-Net architecture and a discriminator with a DenseNet-121 architecture;
and constructing a second generative adversarial network, used for image enhancement, from a generator combining a U-Net architecture with super-resolution convolutional layers and a discriminator that is a 5-layer convolutional neural network.
5. The three-dimensional portrait real-time reconstruction and rendering method based on a multi-view depth camera according to claim 3, characterized in that after generating the soft-segmented first portrait mask, the method further comprises:
optimizing the first portrait mask with an active contour model algorithm to obtain a second portrait mask.
6. A computer device comprising a memory, a processor, and a computer program stored on the memory and executable on the processor, characterized in that the steps of the method according to any one of claims 1 to 5 are implemented when the computer program is executed by the processor.
CN202011225534.6A 2020-11-05 2020-11-05 Three-dimensional portrait real-time reconstruction and rendering method based on multi-view depth camera Pending CN112348957A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011225534.6A CN112348957A (en) 2020-11-05 2020-11-05 Three-dimensional portrait real-time reconstruction and rendering method based on multi-view depth camera

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011225534.6A CN112348957A (en) 2020-11-05 2020-11-05 Three-dimensional portrait real-time reconstruction and rendering method based on multi-view depth camera

Publications (1)

Publication Number Publication Date
CN112348957A 2021-02-09

Family

ID=74429822

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011225534.6A Pending CN112348957A (en) 2020-11-05 2020-11-05 Three-dimensional portrait real-time reconstruction and rendering method based on multi-view depth camera

Country Status (1)

Country Link
CN (1) CN112348957A (en)



Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2019196308A1 (en) * 2018-04-09 2019-10-17 平安科技(深圳)有限公司 Device and method for generating face recognition model, and computer-readable storage medium
CN109242954A (en) * 2018-08-16 2019-01-18 叠境数字科技(上海)有限公司 Multi-view angle three-dimensional human body reconstruction method based on template deformation
CN111815757A (en) * 2019-06-29 2020-10-23 浙江大学山东工业技术研究院 Three-dimensional reconstruction method for large component based on image sequence

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
席小霞; 宋文爱; 邱子璇; 史磊: "Research on a 3D Image Reconstruction System Based on RGB-D Values" (基于RGB-D值的三维图像重建系统研究), Journal of Test and Measurement Technology (测试技术学报), no. 05, 30 October 2015 (2015-10-30) *

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112991458A (en) * 2021-03-09 2021-06-18 武汉大学 Rapid three-dimensional modeling method and system based on voxels
CN112991458B (en) * 2021-03-09 2023-02-24 武汉大学 Rapid three-dimensional modeling method and system based on voxels
WO2024001961A1 (en) * 2022-06-29 2024-01-04 先临三维科技股份有限公司 Scanned image rendering method and apparatus, electronic device and storage medium


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination