CN112330795A - Human body three-dimensional reconstruction method and system based on single RGBD image - Google Patents

Human body three-dimensional reconstruction method and system based on single RGBD image

Info

Publication number
CN112330795A
CN112330795A
Authority
CN
China
Prior art keywords
image
neural network
human body
network
rgbd
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202011080171.1A
Other languages
Chinese (zh)
Other versions
CN112330795B (en)
Inventor
刘烨斌
李哲
于涛
王松涛
戴琼海
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tsinghua University
Original Assignee
Tsinghua University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tsinghua University filed Critical Tsinghua University
Priority to CN202011080171.1A priority Critical patent/CN112330795B/en
Publication of CN112330795A publication Critical patent/CN112330795A/en
Application granted granted Critical
Publication of CN112330795B publication Critical patent/CN112330795B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T17/00Three dimensional [3D] modelling, e.g. data description of 3D objects
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2200/00Indexing scheme for image data processing or generation, in general
    • G06T2200/08Indexing scheme for image data processing or generation, in general involving all processing steps from image acquisition to 3D model generation

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Software Systems (AREA)
  • General Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Biomedical Technology (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Biophysics (AREA)
  • Mathematical Physics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Computer Graphics (AREA)
  • Geometry (AREA)
  • Image Analysis (AREA)
  • Image Processing (AREA)

Abstract

The invention discloses a human body three-dimensional reconstruction method and system based on a single RGBD image. The method comprises the following steps: shooting a single RGBD image with a depth camera; rendering single RGBD images from a high-quality human body three-dimensional model and sampling spatial points to obtain training data; building a neural network and training it with the training data; and inputting a single RGBD test image into the trained neural network for testing to generate a human body three-dimensional model. The method establishes an end-to-end neural network based on an implicit function representation and can infer the geometric details of the human body and clothing through supervised learning; compared with traditional reconstruction methods, it is simple and convenient.

Description

Human body three-dimensional reconstruction method and system based on single RGBD image
Technical Field
The invention relates to the technical field of computer vision and computer graphics, in particular to a human body three-dimensional reconstruction method and a human body three-dimensional reconstruction system based on a single RGBD image.
Background
Human body three-dimensional reconstruction based on a single RGBD image has attracted extensive attention in recent years. Human body three-dimensional models have broad application prospects and important value in fields such as film and television entertainment, virtual fitting, holographic communication, and demographic data analysis. However, common multi-view reconstruction techniques require building a multi-camera array system; although the precision is high, such systems have two significant disadvantages: they are inconvenient and they are expensive. Reconstruction based on a single RGB image, on the other hand, is not robust enough, which limits its application value. Depth cameras are now widely available, and adding depth information can greatly improve the robustness of reconstruction, so human body three-dimensional reconstruction based on a single RGBD image has a very broad application prospect.
Disclosure of Invention
The present invention is directed to solving, at least to some extent, one of the technical problems in the related art.
Therefore, one object of the present invention is to provide a human body three-dimensional reconstruction method based on a single RGBD image, which is simple and convenient, and can realize more accurate human body reconstruction through the human body RGBD image captured by a single depth camera.
Another objective of the present invention is to provide a human body three-dimensional reconstruction system based on a single RGBD image.
In order to achieve the above object, an embodiment of the present invention provides a human body three-dimensional reconstruction method based on a single RGBD image, including the following steps: S1, shooting a single RGBD image with a depth camera; S2, rendering single RGBD images by using a high-quality human body three-dimensional model, and sampling spatial points to obtain training data; S3, building a neural network, and training the neural network with the training data; and S4, inputting a single RGBD test image into the trained neural network for testing to obtain a human body three-dimensional model.
According to the human body three-dimensional reconstruction method based on a single RGBD image, a depth camera is used to photograph the human body and acquire an RGBD image, and three-dimensional reconstruction of the human body is realized from this image. The required input is very easy to obtain, the human body three-dimensional model can be acquired conveniently and quickly, and the solution is robust, simple, and easy to implement; it has broad application prospects and can be quickly realized on hardware systems such as a PC or a workstation.
In addition, the human body three-dimensional reconstruction method based on a single RGBD image according to the above embodiment of the present invention may further have the following additional technical features:
further, in an embodiment of the present invention, in the step S2, a combination of spatially uniform sampling and adaptive sampling is used to acquire spatial points of the rendered RGBD image.
Further, in an embodiment of the present invention, the neural network includes an image coding network and a fully connected mapping network. The image coding network is a fully convolutional neural network that convolves and deconvolves the RGBD four-channel images in the training data to obtain a feature map. The fully connected mapping network is a multilayer perceptron that, through the interconnection of multiple layers of neurons, maps the vector formed by concatenating a feature vector sampled from the feature map with z(X) to an implicit function value. A loss function is constructed according to the feature map and the implicit function values, and the neural network is optimized through the loss function to obtain the trained neural network.
Further, in one embodiment of the present invention, the loss function is:
L = (1/n) · Σ_{i=1}^{n} ( f(F(x_i), z(X_i)) − gt(X_i) )^2
where n is the total number of samples, f(F(x_i), z(X_i)) is the network output, F(x_i) is the feature vector of x_i, the 2D projection point of X_i on the image, z(X_i) is the depth value of X_i relative to the depth camera, and gt(X_i) is the ground-truth value (0 or 1) corresponding to X_i.
Further, in an embodiment of the present invention, the step S4 further includes: encoding the input single RGBD test image into a feature map by using the image coding network; uniformly dividing the normalized space into an N×N×N voxel volume, where each cell is a voxel; inputting each voxel into the trained neural network, sampling the corresponding feature vector from the feature map according to the voxel's projection position, and inputting the feature vector together with the voxel's depth value relative to the depth camera into the fully connected mapping network of the trained neural network to obtain the implicit function value corresponding to each voxel; and extracting an isosurface from the volume of implicit function values by using the Marching Cubes isosurface extraction method to obtain the inferred human body three-dimensional model.
In order to achieve the above object, another embodiment of the present invention provides a human body three-dimensional reconstruction system based on a single RGBD image, including: a single depth camera for capturing a single RGBD image; a rendering module for rendering single RGBD images by using a high-quality human body three-dimensional model and sampling spatial points to obtain training data; a neural network training module for building a neural network and training the neural network with the training data; and a model generation module for inputting a single RGBD test image into the trained neural network for testing to obtain the human body three-dimensional model.
The human body three-dimensional reconstruction system based on a single RGBD image provided by the embodiment of the invention uses a depth camera to photograph the human body and acquire an RGBD image, and realizes three-dimensional reconstruction of the human body from this image; the required input is very easy to obtain, and the human body three-dimensional model can be acquired conveniently and quickly.
In addition, the human body three-dimensional reconstruction system based on a single RGBD image according to the above embodiment of the present invention may further have the following additional technical features:
further, in an embodiment of the present invention, in the rendering module, a combination of spatially uniform sampling and adaptive sampling is adopted to acquire spatial points of the rendered RGBD image.
Further, in an embodiment of the present invention, the neural network includes an image coding network and a fully connected mapping network. The image coding network is a fully convolutional neural network that convolves and deconvolves the RGBD four-channel images in the training data to obtain a feature map. The fully connected mapping network is a multilayer perceptron that, through the interconnection of multiple layers of neurons, maps the vector formed by concatenating a feature vector sampled from the feature map with z(X) to an implicit function value. A loss function is constructed according to the feature map and the implicit function values, and the image coding network and the fully connected mapping network are optimized through the loss function to obtain the trained neural network.
Further, in one embodiment of the present invention, the loss function is:
L = (1/n) · Σ_{i=1}^{n} ( f(F(x_i), z(X_i)) − gt(X_i) )^2
where n is the total number of samples, f(F(x_i), z(X_i)) is the network output, F(x_i) is the feature vector of x_i, the 2D projection point of X_i on the image, z(X_i) is the depth value of X_i relative to the depth camera, and gt(X_i) is the ground-truth value (0 or 1) corresponding to X_i.
Further, in an embodiment of the present invention, the model generation module is further configured to: encode the input single RGBD test image into a feature map by using the image coding network; uniformly divide the normalized space into an N×N×N voxel volume, where each cell is a voxel; input each voxel into the trained neural network, sample the corresponding feature vector from the feature map according to the voxel's projection position, and input the feature vector together with the voxel's depth value relative to the depth camera into the fully connected mapping network of the trained neural network to obtain the implicit function value corresponding to each voxel; and extract an isosurface from the volume of implicit function values by using the Marching Cubes isosurface extraction method to obtain the inferred human body three-dimensional model.
Additional aspects and advantages of the invention will be set forth in part in the description which follows and, in part, will be obvious from the description, or may be learned by practice of the invention.
Drawings
The foregoing and/or additional aspects and advantages of the present invention will become apparent and readily appreciated from the following description of the embodiments, taken in conjunction with the accompanying drawings of which:
FIG. 1 is a flow chart of a human body three-dimensional reconstruction method based on a single RGBD image according to an embodiment of the present invention;
fig. 2 is a schematic structural diagram of a human body three-dimensional reconstruction system based on a single RGBD image according to an embodiment of the present invention.
Detailed Description
Reference will now be made in detail to embodiments of the present invention, examples of which are illustrated in the accompanying drawings, wherein like or similar reference numerals refer to the same or similar elements or elements having the same or similar function throughout. The embodiments described below with reference to the drawings are illustrative and intended to be illustrative of the invention and are not to be construed as limiting the invention.
The following describes a three-dimensional human body reconstruction method and system based on a single RGBD image according to an embodiment of the present invention with reference to the drawings, and first, the three-dimensional human body reconstruction method based on a single RGBD image according to an embodiment of the present invention will be described with reference to the drawings.
Fig. 1 is a flowchart of a human body three-dimensional reconstruction method based on a single RGBD image according to an embodiment of the present invention.
As shown in fig. 1, the human body three-dimensional reconstruction method based on a single RGBD image comprises the following steps:
in step S1, a single RGBD image is captured with the depth camera.
In step S2, single RGBD images are rendered by using a high-quality three-dimensional human body model, and spatial points are sampled to obtain training data.
Further, in an embodiment of the present invention, a combination of spatially uniform sampling and adaptive sampling is adopted to collect spatial points of the rendered RGBD image.
For example, 60 viewpoints are selected horizontally around the three-dimensional human body model to render images according to the camera imaging principle:
u = f_x · x / z + c_x
v = f_y · y / z + c_y
where (u, v) are the pixel coordinates on the image, (x, y, z) are the coordinates of the spatial point, and f_x, f_y, c_x, c_y are the camera intrinsic parameters. The three-dimensional model is normalized to the range (-0.5, -0.5, -0.5) to (0.5, 0.5, 0.5), the focal lengths are set to f_x = f_y = 5000, and the distance between the camera and the model is 10. Meanwhile, spatial points are sampled by combining spatially uniform sampling with adaptive sampling to obtain the training data. Specifically, 5000 points are randomly sampled in the 1 × 1 × 1 cube between (-0.5, -0.5, -0.5) and (0.5, 0.5, 0.5), and in addition 16000 points are randomly sampled on the surface of the human body three-dimensional model (more than the number of its surface vertices); all sampled spatial points are used as training data. This sampling scheme, which combines uniform and adaptive sampling, ensures both the general coverage of the sampling points and a sufficient number of samples near the model surface, so that the network can learn a more accurate representation of the human body surface near the model.
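A minimal sketch of this rendering-camera setup and the combined sampling strategy is given below; it assumes the scan is loaded with trimesh, and the principal point (c_x = c_y = 256 for an assumed 512 × 512 image) and the Gaussian jitter scale sigma are illustrative assumptions not specified above.

```python
# Sketch, under stated assumptions, of the uniform + surface-adaptive point sampling.
import numpy as np
import trimesh

def project_points(points, fx=5000.0, fy=5000.0, cx=256.0, cy=256.0, cam_dist=10.0):
    """Pinhole projection u = fx*x/z + cx, v = fy*y/z + cy, for a camera at distance cam_dist."""
    x, y, z = points[:, 0], points[:, 1], points[:, 2] + cam_dist
    return np.stack([fx * x / z + cx, fy * y / z + cy], axis=1), z   # pixel coords and depths

def sample_training_points(mesh, n_uniform=5000, n_surface=16000, sigma=0.01):
    # Uniform samples inside the normalized (-0.5, 0.5)^3 cube.
    uniform_pts = np.random.uniform(-0.5, 0.5, size=(n_uniform, 3))
    # Adaptive samples on the scan surface, jittered slightly so that their labels are informative
    # (the jitter is an assumption; the text only says surface points are sampled).
    surface_pts, _ = trimesh.sample.sample_surface(mesh, n_surface)
    surface_pts = np.asarray(surface_pts) + np.random.normal(scale=sigma, size=(n_surface, 3))
    points = np.concatenate([uniform_pts, surface_pts], axis=0)
    # Ground-truth labels following the patent's convention: 0 inside the body, 1 outside.
    inside = mesh.contains(points)
    labels = np.where(inside, 0.0, 1.0).astype(np.float32)
    return points.astype(np.float32), labels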
In step S3, a neural network is constructed and trained using the training data.
Further, in an embodiment of the present invention, the neural network includes an image coding network and a fully connected mapping network. The image coding network is a fully convolutional neural network: the RGBD four-channel images in the training data are convolved and deconvolved to obtain a feature map. The fully connected mapping network is a multilayer perceptron: through the interconnection of multiple layers of neurons, the vector formed by concatenating a feature vector sampled from the feature map with z(X) is mapped to an implicit function value. A loss function is constructed according to the feature map and the implicit function values, and the neural network is optimized through the loss function to obtain the trained neural network.
For example, the surface of the human body can be described by an implicit function f, i.e.
f(X) = 0, X ∈ R³,
which is expressed in the form of a composite function
s = f(F(x), z(X))
where x = π(X) is the 2D projection point of X on the image, F(x) is the feature vector at x obtained from the fully convolutional network, and z(X) is the depth value of X relative to the camera. Based on this representation, the neural network mainly comprises two parts: an image coding network and a fully connected mapping network. The image coding network is a fully convolutional network whose input is an RGBD four-channel image and whose output is a feature map. The fully connected mapping network is a multilayer perceptron whose input is the vector formed by concatenating a feature vector sampled from the feature map with z(X), and whose output is a scalar value, where 1 indicates that X is outside the human body and 0 indicates that X is inside the human body. A loss function is constructed from the network output values and the ground-truth values:
L = (1/n) · Σ_{i=1}^{n} ( f(F(x_i), z(X_i)) − gt(X_i) )^2
where n is the total number of samples, f(F(x_i), z(X_i)) is the network output, F(x_i) is the feature vector of x_i, the 2D projection point of X_i on the image, z(X_i) is the depth value of X_i relative to the depth camera, and gt(X_i) is the ground-truth value (0 or 1) corresponding to X_i.
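A minimal PyTorch sketch of this two-part network and loss is shown below. The layer counts, channel widths, the expectation that projections are already normalized to [-1, 1] for feature sampling, and the squared-error form of the per-point loss are illustrative assumptions rather than the exact architecture of the patent.

```python
# Sketch: fully convolutional encoder + fully connected mapping network on pixel-aligned features.
import torch
import torch.nn as nn
import torch.nn.functional as F

class RGBDImplicitNet(nn.Module):
    def __init__(self, feat_dim=64):
        super().__init__()
        # Image coding network: convolution then deconvolution on the 4-channel RGBD image.
        self.encoder = nn.Sequential(
            nn.Conv2d(4, 32, 4, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(64, feat_dim, 4, stride=2, padding=1), nn.ReLU(),
        )
        # Fully connected mapping network: MLP on the concatenation [feature vector, z(X)].
        self.mlp = nn.Sequential(
            nn.Linear(feat_dim + 1, 256), nn.ReLU(),
            nn.Linear(256, 256), nn.ReLU(),
            nn.Linear(256, 1), nn.Sigmoid(),
        )

    def forward(self, rgbd, uv, z):
        """rgbd: (B,4,H,W); uv: (B,N,2) projected pixel coords in [-1,1]; z: (B,N,1) camera depths."""
        feat_map = self.encoder(rgbd)
        # Pixel-aligned feature sampling at the 2D projections of the query points.
        feat = F.grid_sample(feat_map, uv.unsqueeze(2), align_corners=True)   # (B,C,N,1)
        feat = feat.squeeze(-1).permute(0, 2, 1)                              # (B,N,C)
        return self.mlp(torch.cat([feat, z], dim=-1)).squeeze(-1)             # (B,N) values in [0,1]

def implicit_loss(pred, gt):
    # Per-point squared error against the 0/1 ground truth (assumed form of the loss).
    return ((pred - gt) ** 2).mean()
```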
In step S4, a single RGBD test image is input into the trained neural network for testing, and a three-dimensional model of the human body is obtained.
Specifically, the image coding network encodes the input RGBD image into a feature map. Then the 1 × 1 × 1 cube is uniformly divided into an N × N × N voxel volume, where each cell is a voxel. Each voxel is fed into the network: the corresponding feature vector is sampled from the feature map according to the voxel's projection position, and the feature vector together with the voxel's depth value relative to the camera is input into the fully connected mapping network to obtain the implicit function value of each voxel. Finally, the Marching Cubes isosurface extraction method is applied to the volume of implicit function values to obtain the inferred human body three-dimensional model.
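The inference step can be sketched as follows, reusing the network interface from the previous sketch; the grid resolution N, the 0.5 iso-level, the chunked evaluation, and the use of scikit-image's marching_cubes are illustrative assumptions.

```python
# Sketch: evaluate the implicit function on an N x N x N voxel grid and run Marching Cubes.
import numpy as np
import torch
from skimage import measure

@torch.no_grad()
def reconstruct(net, rgbd, N=256, level=0.5, chunk=100000,
                fx=5000.0, fy=5000.0, cx=256.0, cy=256.0, cam_dist=10.0, img_size=512):
    # Voxel centres of an N x N x N grid covering the normalized cube (-0.5, 0.5)^3.
    lin = (np.arange(N) + 0.5) / N - 0.5
    grid = np.stack(np.meshgrid(lin, lin, lin, indexing="ij"), axis=-1).reshape(-1, 3)
    occ = np.empty(len(grid), dtype=np.float32)
    for i in range(0, len(grid), chunk):                        # evaluate the grid chunk by chunk
        pts = torch.from_numpy(grid[i:i + chunk]).float().unsqueeze(0)   # (1, M, 3)
        z = pts[..., 2:3] + cam_dist                            # depth of each voxel w.r.t. the camera
        u = fx * pts[..., 0:1] / z + cx                         # pinhole projection of the voxel centre
        v = fy * pts[..., 1:2] / z + cy
        uv = torch.cat([u, v], dim=-1) / img_size * 2.0 - 1.0   # normalize to [-1, 1] for grid_sample
        occ[i:i + chunk] = net(rgbd, uv, z).squeeze(0).numpy()
    volume = occ.reshape(N, N, N)
    # Marching Cubes isosurface extraction on the volume of implicit function values.
    verts, faces, _, _ = measure.marching_cubes(volume, level=level)
    verts = verts / N - 0.5                                     # map voxel indices back to model space
    return verts, faces
```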
On the other hand, it can be understood that, since the depth image is known, the implicit function values of all visible regions are already known, so there is no need to infer the implicit function values at these points. A CUDA program can be written for fast screening so that only the invisible voxels are fed into the network, which greatly improves the inference speed of the network.
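The screening idea can be illustrated with vectorized NumPy (the patent uses a CUDA program for this step; the margin value and the nearest-pixel rounding below are assumptions): voxels that project in front of the measured depth lie in visible free space and are therefore known to be outside the body, so only the remaining, invisible voxels need to be evaluated by the network.

```python
# Sketch: depth-based screening of voxels whose occupancy is already determined by the depth map.
import numpy as np

def screen_visible_voxels(grid, depth_map, fx=5000.0, fy=5000.0, cx=256.0, cy=256.0,
                          cam_dist=10.0, margin=0.005):
    """grid: (M, 3) voxel centres; depth_map: (H, W) depth channel of the input RGBD image."""
    h, w = depth_map.shape
    z = grid[:, 2] + cam_dist                                  # voxel depth w.r.t. the camera
    u = np.clip(np.round(fx * grid[:, 0] / z + cx).astype(int), 0, w - 1)
    v = np.clip(np.round(fy * grid[:, 1] / z + cy).astype(int), 0, h - 1)
    observed = depth_map[v, u]                                 # measured depth at each projection
    visible = (observed > 0) & (z < observed - margin)         # strictly in front of the surface
    occ = np.full(len(grid), np.nan, dtype=np.float32)
    occ[visible] = 1.0                                         # known value: outside the human body
    return occ, ~visible                                       # ~visible marks voxels for the network
```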
It should be noted that, the steps S1-S3 are specific steps for implementing the whole system, and in actual use, only a single RGBD image needs to be input into the network, so that the three-dimensional model of the human body can be obtained.
According to the human body three-dimensional reconstruction method based on a single RGBD image provided by the embodiment of the invention, a depth camera is used to photograph the human body and acquire an RGBD image, and three-dimensional reconstruction of the human body is realized from this image; the required input is very easy to obtain, and the human body three-dimensional model can be acquired conveniently and quickly.
Next, a three-dimensional human body reconstruction system based on a single RGBD image according to an embodiment of the present invention will be described with reference to the drawings.
Fig. 2 is a schematic structural diagram of a human body three-dimensional reconstruction system based on a single RGBD image according to an embodiment of the present invention.
As shown in fig. 2, the system 10 includes: a single depth camera 100, a rendering module 200, a neural network training module 300, and a model generation module 400.
Therein, the single depth camera 100 is used to capture a single RGBD image. The rendering module 200 is configured to render single RGBD images by using a high-quality three-dimensional human body model and to sample spatial points to obtain training data. The neural network training module 300 is configured to build a neural network and train it with the training data. The model generation module 400 is configured to input a single RGBD test image into the trained neural network for testing to obtain the three-dimensional model of the human body.
Further, in an embodiment of the present invention, in the rendering module, a combination of spatially uniform sampling and adaptive sampling is adopted to collect spatial points of the rendered RGBD image.
Further, in an embodiment of the present invention, the neural network includes an image coding network and a fully connected mapping network. The image coding network is a fully convolutional neural network that convolves and deconvolves the RGBD four-channel images in the training data to obtain a feature map. The fully connected mapping network is a multilayer perceptron that, through the interconnection of multiple layers of neurons, maps the vector formed by concatenating a feature vector sampled from the feature map with z(X) to an implicit function value. A loss function is constructed according to the feature map and the implicit function values, and the image coding network and the fully connected mapping network are optimized through the loss function to obtain the trained neural network.
Further, in one embodiment of the present invention, the loss function is:
L = (1/n) · Σ_{i=1}^{n} ( f(F(x_i), z(X_i)) − gt(X_i) )^2
where n is the total number of samples, f(F(x_i), z(X_i)) is the network output, F(x_i) is the feature vector of x_i, the 2D projection point of X_i on the image, z(X_i) is the depth value of X_i relative to the depth camera, and gt(X_i) is the ground-truth value (0 or 1) corresponding to X_i.
Further, in an embodiment of the present invention, the model generation module is further configured to: encode the input single RGBD test image into a feature map by using the image coding network; uniformly divide the normalized space into an N×N×N voxel volume, where each cell is a voxel; input each voxel into the trained neural network, sample the corresponding feature vector from the feature map according to the voxel's projection position, and input the feature vector together with the voxel's depth value relative to the depth camera into the fully connected mapping network of the trained neural network to obtain the implicit function value corresponding to each voxel; and extract an isosurface from the volume of implicit function values by using the Marching Cubes isosurface extraction method to obtain the inferred human body three-dimensional model.
According to the human body three-dimensional reconstruction system based on a single RGBD image provided by the embodiment of the invention, a depth camera is used to photograph the human body and acquire an RGBD image, and three-dimensional reconstruction of the human body is realized from this image; the required input is very easy to obtain, and the human body three-dimensional model can be acquired conveniently and quickly.
Furthermore, the terms "first", "second" and "first" are used for descriptive purposes only and are not to be construed as indicating or implying relative importance or implicitly indicating the number of technical features indicated. Thus, a feature defined as "first" or "second" may explicitly or implicitly include at least one such feature. In the description of the present invention, "a plurality" means at least two, e.g., two, three, etc., unless specifically limited otherwise.
In the description herein, references to the description of the term "one embodiment," "some embodiments," "an example," "a specific example," or "some examples," etc., mean that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the invention. In this specification, the schematic representations of the terms used above are not necessarily intended to refer to the same embodiment or example. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples. Furthermore, various embodiments or examples and features of different embodiments or examples described in this specification can be combined and combined by one skilled in the art without contradiction.
Although embodiments of the present invention have been shown and described above, it is understood that the above embodiments are exemplary and should not be construed as limiting the present invention, and that variations, modifications, substitutions and alterations can be made to the above embodiments by those of ordinary skill in the art within the scope of the present invention.

Claims (10)

1. A human body three-dimensional reconstruction method based on a single RGBD image is characterized by comprising the following steps:
s1, shooting a single RGBD image by using a depth camera;
s2, rendering the single RGBD image through a high-quality human body three-dimensional model, and collecting space points to obtain training data;
s3, building a neural network, and training the neural network by using the training data; and
S4, inputting a single RGBD test image into the trained neural network for testing to obtain a human body three-dimensional model.
2. The method for human body three-dimensional reconstruction based on a single RGBD image according to claim 1, wherein in step S2, the spatial points of the rendered RGBD image are sampled by a combination of spatially uniform sampling and adaptive sampling.
3. The method for human body three-dimensional reconstruction based on a single RGBD image according to claim 1, wherein the neural network includes an image coding network and a fully connected mapping network, wherein the image coding network is a fully convolutional neural network that convolves and deconvolves the RGBD four-channel images in the training data to obtain a feature map; the fully connected mapping network is a multilayer perceptron that, through the interconnection of multiple layers of neurons, maps the vector formed by concatenating a feature vector sampled from the feature map with z(X) to an implicit function value; a loss function is constructed according to the feature map and the implicit function values, and the neural network is optimized through the loss function to obtain the trained neural network.
4. The method of claim 3, wherein the loss function is:
L = (1/n) · Σ_{i=1}^{n} ( f(F(x_i), z(X_i)) − gt(X_i) )^2
where n is the total number of samples, f(F(x_i), z(X_i)) is the network output, F(x_i) is the feature vector of x_i, the 2D projection point of X_i on the image, z(X_i) is the depth value of X_i relative to the depth camera, and gt(X_i) is the ground-truth value (0 or 1) corresponding to X_i.
5. The method for human body three-dimensional reconstruction based on single RGBD image according to claim 1, wherein said step S4 further comprises:
encoding the input single RGBD test image into a feature map by using an image coding network;
uniformly dividing the normalized space into an N×N×N voxel volume, wherein each cell is a voxel; inputting each voxel into the trained neural network, sampling the corresponding feature vector from the feature map according to the voxel's projection position, and inputting the feature vector and the voxel's depth value relative to the depth camera into the fully connected mapping network of the trained neural network to obtain the implicit function value corresponding to each voxel;
and extracting an isosurface from the volume of implicit function values by using the Marching Cubes isosurface extraction method to obtain the inferred human body three-dimensional model.
6. A human body three-dimensional reconstruction system based on a single RGBD image is characterized by comprising the following components:
a single depth camera for capturing a single RGBD image;
the rendering module is used for rendering single RGBD images by using a high-quality human body three-dimensional model and sampling spatial points to obtain training data;
the neural network training module is used for building a neural network and training the neural network by using the training data;
and the model generation module is used for inputting the single RGBD test image into the trained neural network for testing to obtain the human body three-dimensional model.
7. The system of claim 6, wherein the rendering module samples the spatial points of the rendered RGBD image by a combination of spatially uniform sampling and adaptive sampling.
8. The system of claim 6, wherein the neural network comprises an image coding network and a fully connected mapping network, wherein,
the image coding network is a fully convolutional neural network, and the RGBD four-channel images in the training data are convolved and deconvolved to obtain a feature map;
the fully connected mapping network is a multilayer perceptron that, through the interconnection of multiple layers of neurons, maps the vector formed by concatenating a feature vector sampled from the feature map with z(X) to an implicit function value; a loss function is constructed according to the feature map and the implicit function values, and the image coding network and the fully connected mapping network are optimized through the loss function to obtain the trained neural network.
9. The system of claim 8, wherein the loss function is:
L = (1/n) · Σ_{i=1}^{n} ( f(F(x_i), z(X_i)) − gt(X_i) )^2
wherein n is the total number of samples, f(F(x_i), z(X_i)) is the network output, F(x_i) is the feature vector of x_i, the 2D projection point of X_i on the image, z(X_i) is the depth value of X_i relative to the depth camera, and gt(X_i) is the ground-truth value (0 or 1) corresponding to X_i.
10. The system of claim 6, wherein the model generation module is further configured to:
encoding the input single RGBD test image into a feature map by using the image coding network;
uniformly dividing the normalized space into an N×N×N voxel volume, wherein each cell is a voxel; inputting each voxel into the trained neural network, sampling the corresponding feature vector from the feature map according to the voxel's projection position, and inputting the feature vector and the voxel's depth value relative to the depth camera into the fully connected mapping network of the trained neural network to obtain the implicit function value corresponding to each voxel;
and extracting an isosurface from the volume of implicit function values by using the Marching Cubes isosurface extraction method to obtain the inferred human body three-dimensional model.
CN202011080171.1A 2020-10-10 2020-10-10 Human body three-dimensional reconstruction method and system based on single RGBD image Active CN112330795B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011080171.1A CN112330795B (en) 2020-10-10 2020-10-10 Human body three-dimensional reconstruction method and system based on single RGBD image

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011080171.1A CN112330795B (en) 2020-10-10 2020-10-10 Human body three-dimensional reconstruction method and system based on single RGBD image

Publications (2)

Publication Number Publication Date
CN112330795A true CN112330795A (en) 2021-02-05
CN112330795B CN112330795B (en) 2022-10-28

Family

ID=74314821

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011080171.1A Active CN112330795B (en) 2020-10-10 2020-10-10 Human body three-dimensional reconstruction method and system based on single RGBD image

Country Status (1)

Country Link
CN (1) CN112330795B (en)

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112927348A (en) * 2021-02-20 2021-06-08 北京未澜科技有限公司 High-resolution human body three-dimensional reconstruction method based on multi-viewpoint RGBD camera
CN112950788A (en) * 2021-03-22 2021-06-11 江南大学 Human body reconstruction and garment customization data acquisition method based on single image
CN113112592A (en) * 2021-04-19 2021-07-13 浙江大学 Drivable implicit three-dimensional human body representation method
CN113506335A (en) * 2021-06-01 2021-10-15 清华大学 Real-time human body holographic reconstruction method and device based on multiple RGBD cameras
CN115082636A (en) * 2022-07-05 2022-09-20 聚好看科技股份有限公司 Single image three-dimensional reconstruction method and equipment based on hybrid Gaussian network
CN116704128A (en) * 2023-06-15 2023-09-05 北京元跃科技有限公司 Method and system for generating 3D model by single drawing based on deep learning

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20190213481A1 (en) * 2016-09-12 2019-07-11 Niantic, Inc. Predicting depth from image data using a statistical model
CN110335343A (en) * 2019-06-13 2019-10-15 清华大学 Based on RGBD single-view image human body three-dimensional method for reconstructing and device
CN110428493A (en) * 2019-07-12 2019-11-08 清华大学 Single image human body three-dimensional method for reconstructing and system based on grid deformation
CN111340944A (en) * 2020-02-26 2020-06-26 清华大学 Single-image human body three-dimensional reconstruction method based on implicit function and human body template
CN111476884A (en) * 2020-03-30 2020-07-31 清华大学 Real-time three-dimensional human body reconstruction method and system based on single-frame RGBD image

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20190213481A1 (en) * 2016-09-12 2019-07-11 Niantic, Inc. Predicting depth from image data using a statistical model
CN110335343A (en) * 2019-06-13 2019-10-15 清华大学 Based on RGBD single-view image human body three-dimensional method for reconstructing and device
CN110428493A (en) * 2019-07-12 2019-11-08 清华大学 Single image human body three-dimensional method for reconstructing and system based on grid deformation
CN111340944A (en) * 2020-02-26 2020-06-26 清华大学 Single-image human body three-dimensional reconstruction method based on implicit function and human body template
CN111476884A (en) * 2020-03-30 2020-07-31 清华大学 Real-time three-dimensional human body reconstruction method and system based on single-frame RGBD image

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
周泽浩 et al.: "Human body three-dimensional reconstruction system based on RGB-D camera data", 《工业控制计算机》 (Industrial Control Computer) *

Cited By (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112927348A (en) * 2021-02-20 2021-06-08 北京未澜科技有限公司 High-resolution human body three-dimensional reconstruction method based on multi-viewpoint RGBD camera
CN112950788A (en) * 2021-03-22 2021-06-11 江南大学 Human body reconstruction and garment customization data acquisition method based on single image
CN112950788B (en) * 2021-03-22 2022-07-19 江南大学 Human body reconstruction and garment customization data acquisition method based on single image
CN113112592A (en) * 2021-04-19 2021-07-13 浙江大学 Drivable implicit three-dimensional human body representation method
CN113506335A (en) * 2021-06-01 2021-10-15 清华大学 Real-time human body holographic reconstruction method and device based on multiple RGBD cameras
CN113506335B (en) * 2021-06-01 2022-12-13 清华大学 Real-time human body holographic reconstruction method and device based on multiple RGBD cameras
CN115082636A (en) * 2022-07-05 2022-09-20 聚好看科技股份有限公司 Single image three-dimensional reconstruction method and equipment based on hybrid Gaussian network
CN115082636B (en) * 2022-07-05 2024-05-17 聚好看科技股份有限公司 Single image three-dimensional reconstruction method and device based on mixed Gaussian network
CN116704128A (en) * 2023-06-15 2023-09-05 北京元跃科技有限公司 Method and system for generating 3D model by single drawing based on deep learning
CN116704128B (en) * 2023-06-15 2023-12-12 北京元跃科技有限公司 Method and system for generating 3D model by single drawing based on deep learning

Also Published As

Publication number Publication date
CN112330795B (en) 2022-10-28

Similar Documents

Publication Publication Date Title
CN112330795B (en) Human body three-dimensional reconstruction method and system based on single RGBD image
CN114004941B (en) Indoor scene three-dimensional reconstruction system and method based on nerve radiation field
CN109461180B (en) Three-dimensional scene reconstruction method based on deep learning
CN111462329B (en) Three-dimensional reconstruction method of unmanned aerial vehicle aerial image based on deep learning
CN108074218B (en) Image super-resolution method and device based on light field acquisition device
CN112308200B (en) Searching method and device for neural network
CN110570522B (en) Multi-view three-dimensional reconstruction method
CN106952341B (en) Underwater scene three-dimensional point cloud reconstruction method and system based on vision
CN111340944B (en) Single-image human body three-dimensional reconstruction method based on implicit function and human body template
CN111340866B (en) Depth image generation method, device and storage medium
CN109314752A (en) Effective determination of light stream between image
CN111819568A (en) Method and device for generating face rotation image
CN111753698A (en) Multi-mode three-dimensional point cloud segmentation system and method
CN112837419B (en) Point cloud model construction method, device, equipment and storage medium
CN114627223A (en) Free viewpoint video synthesis method and device, electronic equipment and storage medium
CN111612898B (en) Image processing method, image processing device, storage medium and electronic equipment
CN115035235A (en) Three-dimensional reconstruction method and device
CN116051747A (en) House three-dimensional model reconstruction method, device and medium based on missing point cloud data
CN106683163A (en) Imaging method and system used in video monitoring
CN116194951A (en) Method and apparatus for stereoscopic based 3D object detection and segmentation
CN108830890B (en) Method for estimating scene geometric information from single image by using generative countermeasure network
CN113313740B (en) Disparity map and surface normal vector joint learning method based on plane continuity
CN114092540A (en) Attention mechanism-based light field depth estimation method and computer readable medium
Neumann et al. Eyes from eyes: analysis of camera design using plenoptic video geometry
CN112541972B (en) Viewpoint image processing method and related equipment

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant