CN111476899A - Three-dimensional reconstruction method for dense texture coordinates of a human hand based on a single-viewpoint RGB camera


Info

Publication number
CN111476899A
CN111476899A (application CN202010211923.7A)
Authority
CN
China
Prior art keywords
hand
human hand
dimensional
human
image
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Withdrawn
Application number
CN202010211923.7A
Other languages
Chinese (zh)
Inventor
刘烨斌
李梦成
安亮
于涛
戴琼海
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tsinghua University
Original Assignee
Tsinghua University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tsinghua University filed Critical Tsinghua University
Priority to CN202010211923.7A
Publication of CN111476899A
Legal status: Withdrawn

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T17/00 - Three dimensional [3D] modelling, e.g. data description of 3D objects
    • G06T17/20 - Finite element generation, e.g. wire-frame surface description, tessellation
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/04 - Architecture, e.g. interconnection topology
    • G06N3/045 - Combinations of networks
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/08 - Learning methods
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T15/00 - 3D [Three Dimensional] image rendering
    • G06T15/10 - Geometric effects
    • G06T15/20 - Perspective computation
    • G06T15/205 - Image-based rendering
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 - Image analysis
    • G06T7/40 - Analysis of texture
    • G06T7/49 - Analysis of texture based on structural texture description, e.g. using primitives or placement rules

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • Computing Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • General Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Biophysics (AREA)
  • Molecular Biology (AREA)
  • Biomedical Technology (AREA)
  • Computational Linguistics (AREA)
  • Artificial Intelligence (AREA)
  • Mathematical Physics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Computer Graphics (AREA)
  • Geometry (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a three-dimensional reconstruction method for dense texture coordinates of a human hand based on a single-viewpoint RGB camera, comprising the following steps: a single color (RGB) camera captures a picture of a human hand; the hand region is detected and processed in real time; the pose and shape of the hand are obtained by a deep learning algorithm; and the hand is expressed in the form of dense texture coordinates, yielding a three-dimensional model of the hand surface. This texture-coordinate-mapped expression of three-dimensional points better matches the characteristics of a neural network.

Description

Three-dimensional reconstruction method for dense texture coordinates of human hand based on single-viewpoint RGB camera
Technical Field
The invention relates to the technical field of dynamic capture and three-dimensional reconstruction within computer vision, and in particular to dynamic detection and real-time three-dimensional reconstruction of a hand from a single viewpoint.
Background
In the field of three-dimensional dynamic reconstruction, three-dimensional reconstruction of the human hand is an active and difficult problem. Accurate dynamic hand reconstruction is very challenging because hand motion is complex and flexible and is often accompanied by self-occlusion. Since the hand is the main channel of physical contact between the human body and the outside world, hand reconstruction has wide application in fields such as virtual reality and human-computer interaction.
In daily life, people interact with the outside world constantly, most frequently through physical interaction between the hands and the environment. Reconstructing this hand-environment interaction well would greatly help in understanding the deeper meaning of human interaction with the world, and would benefit artificial intelligence, human-computer interaction, VR games, and other industrial fields.
In recent years, with the rapid development of machine learning and neural networks, several artificial-intelligence-based hand detection algorithms have been proposed. However, the output of a conventional convolutional neural network is a two-dimensional picture or a one-dimensional array, which does not intuitively express the three-dimensional shape of the hand. A three-dimensional point expression method based on texture coordinate mapping, which better matches the characteristics of a neural network, is therefore urgently needed.
Disclosure of Invention
The present invention is directed to solving, at least to some extent, one of the technical problems in the related art.
Therefore, one purpose of the invention is to provide a three-dimensional reconstruction method for dense texture coordinates of a human hand based on a single-viewpoint RGB camera, which better matches the characteristics of a neural network.
In order to achieve the above purpose, the embodiment of the present invention provides a three-dimensional reconstruction method for dense texture coordinates of a human hand based on a single viewpoint RGB camera, which includes the following steps: acquiring a hand image, and establishing a three-dimensional hand model according to the acquired hand image; identifying the three-dimensional model of the hand by a deep learning algorithm to obtain dense texture coordinates of the hand; and reconstructing a human hand three-dimensional model according to the human hand dense texture coordinates.
According to the three-dimensional reconstruction method for dense texture coordinates of a human hand based on a single-viewpoint RGB camera of the embodiment of the invention, a parameterized expression template of the three-dimensional hand model is established and its three-dimensional texture is unwrapped; a neural network is trained on large-scale data to recognize the texture coordinates of the hand; a hand picture is acquired and the region where the hand is located is identified; the texture of the hand is recognized by the deep learning algorithm; and the three-dimensional hand model is reconstructed from the texture coordinates, so that the three-dimensional shape of the hand is intuitively expressed and better conforms to the characteristics of a neural network.
In addition, the three-dimensional reconstruction method for dense texture coordinates of a human hand based on the single-viewpoint RGB camera according to the above embodiment of the present invention may further have the following additional technical features:
Further, in an embodiment of the present invention, the acquiring of a hand image and the establishing of a three-dimensional hand model from the acquired hand image further includes: acquiring a hand image through a color camera; unwrapping the three-dimensional texture of the hand image to obtain a one-to-one correspondence between some pixels of the hand image and points on the hand surface; storing, through this correspondence, the three-dimensional coordinates of the corresponding hand-surface points; and recovering the three-dimensional hand model of the hand image from the three-dimensional coordinates.
Further, in one embodiment of the invention, the three-dimensional hand model is a parameterized expression template, and the pose of the hand is changed, and the hand driven, by changing the parameters.
Further, in an embodiment of the present invention, the recognizing of the three-dimensional hand model by a deep learning algorithm to obtain dense texture coordinates of the hand further includes: identifying the hand position in the hand image with a target detection algorithm, and extracting a sub-image in which the hand occupies most of the area; detecting the hand sub-images and, by left-right flipping, uniformly converting all of them into left hands or into right hands; rendering a synthetic hand data set using the three-dimensional hand model and its corresponding hand texture map, and taking the synthetic data set as a pre-training data set; pre-training the hand detection neural network on the pre-training data set to obtain estimated values of the hand parameters and of the texture coordinates; and inputting the hand sub-image into the trained hand neural network to obtain the dense texture coordinates of the hand.
Further, in one embodiment of the present invention, the hand detection neural network includes an encoder, a parameter decoder, and a texture coordinate decoder.
Further, in an embodiment of the present invention, the hand sub-image is encoded into a hidden variable by the encoder; the hidden variable is input into the parameter decoder and the texture coordinate decoder respectively; and the estimated value of the hand parameters and the estimated value of the texture coordinates are output.
Additional aspects and advantages of the invention will be set forth in part in the description which follows and, in part, will be obvious from the description, or may be learned by practice of the invention.
Drawings
The foregoing and/or additional aspects and advantages of the present invention will become apparent and readily appreciated from the following description of the embodiments, taken in conjunction with the accompanying drawings of which:
fig. 1 is a flowchart of a three-dimensional reconstruction method of dense texture coordinates of a human hand based on a single-viewpoint RGB camera according to an embodiment of the present invention.
Detailed Description
Reference will now be made in detail to embodiments of the present invention, examples of which are illustrated in the accompanying drawings, wherein like or similar reference numerals refer to the same or similar elements or elements having the same or similar function throughout. The embodiments described below with reference to the drawings are illustrative and intended to be illustrative of the invention and are not to be construed as limiting the invention.
The three-dimensional reconstruction method of dense texture coordinates of a human hand based on a single-viewpoint RGB camera according to the embodiment of the invention is described below with reference to the attached drawings.
Fig. 1 is a flowchart of a three-dimensional reconstruction method of dense texture coordinates of a human hand based on a single-viewpoint RGB camera according to an embodiment of the present invention.
As shown in fig. 1, the three-dimensional reconstruction method of dense texture coordinates of a human hand based on a single-viewpoint RGB camera includes the following steps:
in step S1, a hand image is acquired, and a three-dimensional model of the hand is created from the acquired hand image.
Further, in an embodiment of the present invention, the step S1 further includes:
acquiring a hand image through a color camera;
unwrapping the three-dimensional texture of the hand image to obtain a one-to-one correspondence between some pixels of the hand image and points on the hand surface;
storing, through this correspondence, the three-dimensional coordinates of the corresponding hand-surface points;
and recovering the three-dimensional hand model of the hand image from the three-dimensional coordinates, wherein the model is a parameterized expression template whose pose can be changed, and the hand driven, by changing the parameters.
For example, a hand is stored as a 3-channel color picture: some pixels of the picture correspond one-to-one to points on the hand surface (the correspondence can be obtained by texture unwrapping or similar means), the three channels of the picture store the three-dimensional coordinates of those surface points, and the three-dimensional hand model can be recovered from the picture.
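The storage scheme described above can be sketched numerically. This is an illustrative reconstruction only, not the patent's implementation; the function name, map resolution, and explicit validity mask are assumptions.

```python
import numpy as np

# Assumed layout: a "position map" stores, at each valid texture coordinate
# (u, v), the 3-D point (x, y, z) of the hand surface in its three channels;
# a boolean mask marks which texels actually correspond to surface points.
def position_map_to_point_cloud(pos_map, mask):
    """pos_map: (H, W, 3) float array of xyz; mask: (H, W) bool of valid texels."""
    return pos_map[mask]  # (N, 3) array of recovered surface points

# Toy example: a 4x4 map in which only two texels carry surface points.
pos_map = np.zeros((4, 4, 3))
pos_map[1, 2] = [0.1, 0.2, 0.3]
pos_map[3, 0] = [0.4, 0.5, 0.6]
mask = np.zeros((4, 4), dtype=bool)
mask[1, 2] = mask[3, 0] = True
cloud = position_map_to_point_cloud(pos_map, mask)  # two 3-D points, row-major order
```

Storing geometry as an image in this way is what lets an ordinary image-to-image convolutional network predict it, which is the point the surrounding text makes about matching the characteristics of a neural network.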
In step S2, the three-dimensional human hand model is recognized by a deep learning algorithm to obtain dense texture coordinates of the human hand.
Further, step S2 further includes: identifying the hand position in the hand image with a target detection algorithm, and extracting a sub-image in which the hand occupies most of the area;
detecting the hand sub-images and, by left-right flipping, uniformly converting all of them into left hands or into right hands;
rendering a synthetic hand data set using the three-dimensional hand model and its corresponding hand texture map, and taking the synthetic data set as a pre-training data set;
pre-training the hand detection neural network on the pre-training data set to obtain estimated values of the hand parameters and of the texture coordinates;
and inputting the hand sub-image into the trained hand neural network to obtain the dense texture coordinates of the hand.
The hand detection neural network includes an encoder, a parameter decoder, and a texture coordinate decoder: the encoder encodes the hand sub-image into a hidden variable, the hidden variable is input into the parameter decoder and the texture coordinate decoder respectively, and the estimated value of the hand parameters and the estimated value of the texture coordinates are output.
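A minimal numerical sketch of this encoder/two-decoder layout follows. All layer sizes, the single-matrix "layers", and the flattened input are assumptions for illustration; the patent does not specify dimensions, and a real system would use convolutional layers and trained weights.

```python
import numpy as np

rng = np.random.default_rng(0)

def linear(in_dim, out_dim):
    # Random untrained weight matrix standing in for a learned layer.
    return rng.standard_normal((in_dim, out_dim)) * 0.01

W_enc = linear(64 * 64 * 3, 128)   # encoder: flattened 64x64 crop -> hidden variable
W_par = linear(128, 10)            # parameter decoder: hidden variable -> hand parameters
W_tex = linear(128, 64 * 64 * 2)   # texture decoder: hidden variable -> per-pixel (u, v)

def forward(crop):
    z = np.tanh(crop.reshape(-1) @ W_enc)      # shared hidden variable
    params = z @ W_par                         # pose/shape parameter estimate
    uv_map = (z @ W_tex).reshape(64, 64, 2)    # dense texture-coordinate estimate
    return params, uv_map

crop = rng.standard_normal((64, 64, 3))
params, uv_map = forward(crop)
```

The design point is that both decoders read the same hidden variable, so the parameter branch and the dense texture-coordinate branch stay consistent with one shared encoding of the input crop.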
In step S3, a three-dimensional model of the human hand is reconstructed from the dense texture coordinates of the human hand.
The three-dimensional reconstruction method of dense texture coordinates of a human hand based on a single-viewpoint RGB camera is described in detail below with reference to specific examples.
Step 1, collecting a single-viewpoint RGB picture containing a human hand with a common color camera, such as a mobile-phone camera, a digital camera, or a single-lens reflex camera.
Step 2, using a parameterized hand model in which the pose and shape of the hand can be changed by changing parameters.
Step 3, establishing, by texture unwrapping of the hand model, the correspondence between the picture and the texture coordinates.
Step 4, recognizing the position of the hand in the color picture containing the hand with a target detection algorithm, and extracting a sub-image in which the hand occupies most of the area.
Step 5, detecting the sub-images and, by flipping the images left-right, uniformly converting all sub-images into left hands or into right hands.
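The left/right normalization of step 5 is a simple horizontal mirror. A sketch, assuming an (H, W, C) image array and an `is_left` flag supplied by the detector:

```python
import numpy as np

def normalize_hand(crop, is_left):
    # Mirror left-hand crops along the width axis so the network
    # only ever sees one handedness (here: right hands).
    return crop[:, ::-1] if is_left else crop

img = np.arange(12).reshape(2, 3, 2)          # toy (H, W, C) image
flipped = normalize_hand(img, is_left=True)   # mirrored copy
```

Flipping twice returns the original image, so the predicted texture coordinates for a flipped crop can be mirrored back the same way afterwards.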
Step 6, rendering a synthetic hand data set using the parameterized hand model and its corresponding hand texture map, and pasting a common landscape picture as the background, thereby obtaining a pre-training data set.
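The background pasting of step 6 amounts to alpha compositing the rendered hand over a landscape photo. A sketch, assuming the renderer outputs an RGBA image whose alpha channel marks hand pixels (the patent does not specify the renderer's output format):

```python
import numpy as np

def paste_on_background(hand_rgba, background_rgb):
    # Standard "over" compositing: hand pixels replace the background
    # in proportion to their alpha coverage.
    alpha = hand_rgba[..., 3:4]                # (H, W, 1), values in [0, 1]
    return alpha * hand_rgba[..., :3] + (1 - alpha) * background_rgb

hand = np.zeros((8, 8, 4))
hand[2:6, 2:6] = [1.0, 0.8, 0.7, 1.0]         # opaque rendered hand patch
bg = np.full((8, 8, 3), 0.5)                  # stand-in landscape picture
composite = paste_on_background(hand, bg)
```

Varying the backgrounds across the synthetic set is what keeps the pre-trained detector from overfitting to the renderer's empty backdrop.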
Step 7, pre-training the hand detection neural network with the pre-training data set. The network consists of three parts: an encoder, a parameter decoder, and a texture coordinate decoder. The network input is a hand sub-image, which the encoder encodes into a hidden variable; the hidden variable is then input into the parameter decoder and the texture coordinate decoder respectively to obtain the estimate of the hand parameters and the estimate of the texture coordinates.
Step 8, further training the pre-trained hand detection neural network on a really captured hand data set, or on a rendered synthetic hand data set, to improve the network's generalization to real data. At the same time, an error function is constructed between the parameter estimate and the texture coordinate estimate, so that the texture coordinate output better conforms to the shape of a hand.
Step 9, inputting the hand sub-images obtained in step 5 into the hand detection neural network obtained in step 8 to obtain the texture coordinate map of the hand.
Step 10, restoring the hand expressed in texture coordinates into a three-dimensional model, thereby reconstructing the three-dimensional hand.
In summary, according to the three-dimensional reconstruction method for dense texture coordinates of a human hand based on a single-viewpoint RGB camera provided by the embodiment of the present invention, the texture unwrapping map of the hand model is first obtained. The hand model can be unwrapped manually with various existing three-dimensional modeling tools; because of the smoothness of the neural network, an unwrapping with few seams is generally adopted, such as unfolding the hand into its front and back sides.
Secondly, virtual data are obtained by pre-rendering in modeling software and mixed with public real data for pre-training of the neural network; the encoder, texture decoder, and parameter decoder of the network are first trained separately and then trained jointly.
A hand picture is then captured by a camera; a target detection algorithm yields the hand region and a flag indicating whether it is a left or a right hand; the picture is cropped to a sub-image dominated by the hand; and according to the flag all hand pictures are uniformly converted into left hands or right hands, a left-hand picture being converted into a right-hand picture by left-right mirror flipping. The resulting set of hand sub-images is input to the network, and the texture coordinate map of each sub-image's hand is obtained through the encoder and texture decoder. The texture coordinate map is converted into a three-dimensional point cloud and connected into patches, reconstructing the three-dimensional hand model, so that the three-dimensional shape of the hand is intuitively expressed and better matches the characteristics of a neural network.
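The final point-cloud-to-patch step can be illustrated with a grid triangulation of the texture coordinate map. The scheme below (two triangles per fully-valid 2x2 block of texels) is an assumption chosen for illustration, not the patent's stated method:

```python
import numpy as np

def position_map_to_mesh(pos_map, mask):
    # Valid texels become mesh vertices; each fully-valid 2x2 block of
    # texels is split into two triangles, yielding a surface mesh.
    H, W, _ = pos_map.shape
    index = -np.ones((H, W), dtype=int)
    index[mask] = np.arange(mask.sum())        # texel -> vertex id
    verts = pos_map[mask]
    faces = []
    for i in range(H - 1):
        for j in range(W - 1):
            if mask[i, j] and mask[i, j + 1] and mask[i + 1, j] and mask[i + 1, j + 1]:
                a, b = index[i, j], index[i, j + 1]
                c, d = index[i + 1, j], index[i + 1, j + 1]
                faces.append((a, b, c))
                faces.append((b, d, c))
    return verts, np.array(faces)

pos_map = np.random.default_rng(1).random((3, 3, 3))   # toy 3x3 position map
mask = np.ones((3, 3), dtype=bool)
verts, faces = position_map_to_mesh(pos_map, mask)     # 9 vertices, 8 triangles
```

Because the mesh connectivity comes straight from the regular texel grid, no separate surface-reconstruction step (such as Poisson reconstruction) is needed.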
Furthermore, the terms "first" and "second" are used for descriptive purposes only and are not to be construed as indicating or implying relative importance or implicitly indicating the number of technical features indicated. Thus, a feature defined as "first" or "second" may explicitly or implicitly include at least one such feature. In the description of the present invention, "a plurality" means at least two, e.g., two, three, etc., unless specifically limited otherwise.
In the description herein, references to the description of the term "one embodiment," "some embodiments," "an example," "a specific example," or "some examples," etc., mean that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the invention. In this specification, the schematic representations of the terms used above are not necessarily intended to refer to the same embodiment or example. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples. Furthermore, various embodiments or examples and features of different embodiments or examples described in this specification can be combined and combined by one skilled in the art without contradiction.
Although embodiments of the present invention have been shown and described above, it is understood that the above embodiments are exemplary and should not be construed as limiting the present invention, and that variations, modifications, substitutions and alterations can be made to the above embodiments by those of ordinary skill in the art within the scope of the present invention.

Claims (6)

1. A three-dimensional reconstruction method for dense texture coordinates of a human hand based on a single-viewpoint RGB camera is characterized by comprising the following steps:
acquiring a hand image, and establishing a three-dimensional hand model according to the acquired hand image;
identifying the three-dimensional model of the hand by a deep learning algorithm to obtain dense texture coordinates of the hand; and
and reconstructing a human hand three-dimensional model according to the human hand dense texture coordinates.
2. The method for three-dimensional reconstruction of dense texture coordinates of a human hand based on a single-viewpoint RGB camera as claimed in claim 1, wherein the acquiring of the human hand image and the establishment of the three-dimensional model of the human hand according to the acquired human hand image further comprises:
acquiring a hand image through a color camera;
expanding three-dimensional textures of the hand image to obtain one-to-one correspondence between partial pixels of the hand image and surface points of the hand;
respectively storing three-dimensional coordinates corresponding to the surface points of the human hand through the corresponding relation;
and recovering the hand three-dimensional model of the hand image according to the three-dimensional coordinates.
3. The single-viewpoint RGB camera-based human hand dense texture coordinate three-dimensional reconstruction method as claimed in claim 2, wherein the human hand three-dimensional model is a parameterized expression template, and the posture and driving of the human hand are changed by changing parameters.
4. The human hand dense texture coordinate three-dimensional reconstruction method based on the single-viewpoint RGB camera as claimed in claim 1, wherein the recognizing the human hand three-dimensional model through a deep learning algorithm to obtain the human hand dense texture coordinate further comprises:
identifying the hand position from the hand image by using a target detection algorithm, and identifying and extracting a hand image of which the hand occupies most parts;
detecting the hand images, and uniformly converting all the hand images into left hands or right hands by left-right flipping;
rendering a synthetic data set of the human hand by using the human hand three-dimensional model and a human hand texture map corresponding to the human hand three-dimensional model, and taking the synthetic data set as a pre-training data set;
pre-training the human hand detection neural network by using the pre-training data set to obtain an estimated value of human hand parameters and an estimated value of texture coordinates;
and inputting the hand subgraph into a trained hand neural network to obtain the dense texture coordinates of the hand.
5. The single-viewpoint RGB camera-based human hand dense texture coordinate three-dimensional reconstruction method according to claim 4, wherein the human hand detection neural network comprises an encoder, a parameter decoder and a texture coordinate decoder.
6. The human hand dense texture coordinate three-dimensional reconstruction method based on the single-viewpoint RGB camera as claimed in claim 5, wherein the human hand subgraph is encoded into hidden variables by the encoder, the hidden variables are respectively input into a parameter decoder and a texture coordinate decoder, and the estimated values of the human hand parameters and the texture coordinates are output.
CN202010211923.7A 2020-03-24 2020-03-24 Three-dimensional reconstruction method for dense texture coordinates of human hand based on single-viewpoint RGB camera Withdrawn CN111476899A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010211923.7A CN111476899A (en) 2020-03-24 2020-03-24 Three-dimensional reconstruction method for dense texture coordinates of human hand based on single-viewpoint RGB camera

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010211923.7A CN111476899A (en) 2020-03-24 2020-03-24 Three-dimensional reconstruction method for dense texture coordinates of human hand based on single-viewpoint RGB camera

Publications (1)

Publication Number Publication Date
CN111476899A true CN111476899A (en) 2020-07-31

Family

ID=71748361

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010211923.7A Withdrawn CN111476899A (en) 2020-03-24 2020-03-24 Three-dimensional reconstruction method for dense texture coordinates of human hand based on single-viewpoint RGB camera

Country Status (1)

Country Link
CN (1) CN111476899A (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114078152A (en) * 2020-08-20 2022-02-22 北京瓦特曼科技有限公司 Robot carbon block cleaning method based on three-dimensional reconstruction
CN114078152B (en) * 2020-08-20 2023-05-02 北京瓦特曼科技有限公司 Robot carbon block cleaning method based on three-dimensional reconstruction
CN117152397A (en) * 2023-10-26 2023-12-01 慧医谷中医药科技(天津)股份有限公司 Three-dimensional face imaging method and system based on thermal imaging projection
CN117152397B (en) * 2023-10-26 2024-01-26 慧医谷中医药科技(天津)股份有限公司 Three-dimensional face imaging method and system based on thermal imaging projection

Similar Documents

Publication Publication Date Title
CN111243093B (en) Three-dimensional face grid generation method, device, equipment and storage medium
CN109636831B (en) Method for estimating three-dimensional human body posture and hand information
Jafarian et al. Learning high fidelity depths of dressed humans by watching social media dance videos
CN110335343B (en) Human body three-dimensional reconstruction method and device based on RGBD single-view-angle image
CN110148217A (en) A kind of real-time three-dimensional method for reconstructing, device and equipment
CN112258387A (en) Image conversion system and method for generating cartoon portrait based on face photo
CN112784621B (en) Image display method and device
US11928778B2 (en) Method for human body model reconstruction and reconstruction system
CN101154289A (en) Method for tracing three-dimensional human body movement based on multi-camera
CN113628327A (en) Head three-dimensional reconstruction method and equipment
CN110751730B (en) Dressing human body shape estimation method based on deep neural network
CN110197156B (en) Single-image human hand action and shape reconstruction method and device based on deep learning
CN113421328B (en) Three-dimensional human body virtual reconstruction method and device
CN111382618B (en) Illumination detection method, device, equipment and storage medium for face image
CN113593001A (en) Target object three-dimensional reconstruction method and device, computer equipment and storage medium
CN111476899A (en) Three-dimensional reconstruction method for dense texture coordinates of human hand based on single-viewpoint RGB camera
WO2022060229A1 (en) Systems and methods for generating a skull surface for computer animation
Wang et al. Digital twin: Acquiring high-fidelity 3D avatar from a single image
CN111531546A (en) Robot pose estimation method, device, equipment and storage medium
CN117218246A (en) Training method and device for image generation model, electronic equipment and storage medium
CN107341476A (en) A kind of unsupervised manikin construction method based on system-computed principle
CN113763536A (en) Three-dimensional reconstruction method based on RGB image
CN116797713A (en) Three-dimensional reconstruction method and terminal equipment
CN115880766A (en) Method and device for training posture migration and posture migration models and storage medium
Jian et al. Realistic face animation generation from videos

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
WW01 Invention patent application withdrawn after publication
WW01 Invention patent application withdrawn after publication

Application publication date: 20200731