CN111476899A - Three-dimensional reconstruction method for dense texture coordinates of a human hand based on a single-viewpoint RGB camera


Info

Publication number
CN111476899A
CN111476899A (application CN202010211923.7A)
Authority
CN
China
Prior art keywords
hand
human hand
dimensional
human
image
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Withdrawn
Application number
CN202010211923.7A
Other languages
Chinese (zh)
Inventor
刘烨斌
李梦成
安亮
于涛
戴琼海
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tsinghua University
Original Assignee
Tsinghua University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tsinghua University filed Critical Tsinghua University
Priority to CN202010211923.7A
Publication of CN111476899A
Legal status: Withdrawn

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T17/00 - Three dimensional [3D] modelling, e.g. data description of 3D objects
    • G06T17/20 - Finite element generation, e.g. wire-frame surface description, tessellation
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/04 - Architecture, e.g. interconnection topology
    • G06N3/045 - Combinations of networks
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/08 - Learning methods
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T15/00 - 3D [Three Dimensional] image rendering
    • G06T15/10 - Geometric effects
    • G06T15/20 - Perspective computation
    • G06T15/205 - Image-based rendering
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 - Image analysis
    • G06T7/40 - Analysis of texture
    • G06T7/49 - Analysis of texture based on structural texture description, e.g. using primitives or placement rules

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • Computing Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • General Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Biophysics (AREA)
  • Molecular Biology (AREA)
  • Biomedical Technology (AREA)
  • Computational Linguistics (AREA)
  • Artificial Intelligence (AREA)
  • Mathematical Physics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Computer Graphics (AREA)
  • Geometry (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a three-dimensional reconstruction method for dense texture coordinates of a human hand based on a single-viewpoint RGB camera, comprising the following steps: a single color (RGB) camera captures a picture of a human hand; the hand region is detected and processed in real time; the pose and shape of the hand are obtained by a deep learning algorithm; and the hand is expressed in the form of dense texture coordinates, yielding a three-dimensional model of the hand surface. This texture-coordinate-mapped expression of three-dimensional points better matches the characteristics of a neural network.

Description

Three-dimensional reconstruction method for dense texture coordinates of human hand based on single-viewpoint RGB camera
Technical Field
The invention relates to the technical field of dynamic capture and three-dimensional reconstruction within computer vision, and in particular to dynamic detection and real-time three-dimensional reconstruction of a hand from a single viewpoint.
Background
In the field of three-dimensional dynamic reconstruction, three-dimensional reconstruction of the human hand is an active and difficult problem. Accurate dynamic hand reconstruction is very challenging because hand motion is complex and flexible and is often accompanied by self-occlusion. Since the hand is the main channel of physical contact between the human body and the outside world, hand reconstruction has wide application in fields such as virtual reality and human-computer interaction.
In daily life, people interact with the outside world constantly, most frequently through physical interaction between the hands and the environment. Reconstructing this hand-environment interaction well would greatly help in understanding the deeper meaning of human interaction with the world, and would benefit artificial intelligence, human-computer interaction, VR games, and other industrial fields.
In recent years, with the rapid development of machine learning and neural networks, several artificial-intelligence-based hand detection algorithms have been proposed. However, the output of a conventional convolutional neural network is a two-dimensional picture or a one-dimensional array, which does not intuitively express the three-dimensional shape of the hand. A three-dimensional point expression method based on texture coordinate mapping, which better matches the characteristics of a neural network, is therefore urgently needed.
Disclosure of Invention
The present invention is directed to solving, at least to some extent, one of the technical problems in the related art.
Therefore, one purpose of the invention is to provide a three-dimensional reconstruction method for dense texture coordinates of a human hand based on a single-viewpoint RGB camera, which better matches the characteristics of a neural network.
In order to achieve the above purpose, the embodiment of the present invention provides a three-dimensional reconstruction method for dense texture coordinates of a human hand based on a single viewpoint RGB camera, which includes the following steps: acquiring a hand image, and establishing a three-dimensional hand model according to the acquired hand image; identifying the three-dimensional model of the hand by a deep learning algorithm to obtain dense texture coordinates of the hand; and reconstructing a human hand three-dimensional model according to the human hand dense texture coordinates.
According to the three-dimensional reconstruction method for dense texture coordinates of a human hand based on a single-viewpoint RGB camera of the embodiment of the invention, a parameterized expression template of the three-dimensional hand model is established and its three-dimensional texture is unwrapped; a neural network is trained on large-scale data to recognize the texture coordinates of the hand; a hand picture is acquired and the region where the hand is located is identified; the texture of the hand is recognized by the deep learning algorithm; and the three-dimensional hand model is reconstructed from the texture coordinates, so that the three-dimensional shape of the hand is intuitively expressed and better conforms to the characteristics of a neural network.
In addition, the three-dimensional reconstruction method for dense texture coordinates of a human hand based on the single-viewpoint RGB camera according to the above embodiment of the present invention may further have the following additional technical features:
Further, in an embodiment of the present invention, the acquiring of a hand image and the establishing of a three-dimensional hand model from the acquired hand image further includes: acquiring a hand image through a color camera; unwrapping the three-dimensional texture of the hand image to obtain a one-to-one correspondence between some pixels of the hand image and points on the hand surface; storing, through this correspondence, the three-dimensional coordinates of the corresponding hand-surface points; and recovering the three-dimensional hand model of the hand image from the three-dimensional coordinates.
Further, in one embodiment of the invention, the three-dimensional hand model is a parameterized expression template, and the pose of the hand is changed, and the hand driven, by changing the parameters.
Further, in an embodiment of the present invention, the recognizing of the three-dimensional hand model by a deep learning algorithm to obtain dense texture coordinates of the hand further includes: identifying the hand position in the hand image with a target detection algorithm, and extracting a sub-image in which the hand occupies most of the area; detecting the hand sub-images and, by left-right flipping, uniformly converting all of them into left hands or into right hands; rendering a synthetic hand data set using the three-dimensional hand model and its corresponding hand texture map, and taking the synthetic data set as a pre-training data set; pre-training the hand detection neural network on the pre-training data set to obtain estimated values of the hand parameters and of the texture coordinates; and inputting the hand sub-image into the trained hand neural network to obtain the dense texture coordinates of the hand.
Further, in one embodiment of the present invention, the hand detection neural network includes an encoder, a parameter decoder, and a texture coordinate decoder.
Further, in an embodiment of the present invention, the hand sub-image is encoded into a hidden variable by the encoder; the hidden variable is input into the parameter decoder and the texture coordinate decoder respectively; and the estimated value of the hand parameters and the estimated value of the texture coordinates are output.
Additional aspects and advantages of the invention will be set forth in part in the description which follows and, in part, will be obvious from the description, or may be learned by practice of the invention.
Drawings
The foregoing and/or additional aspects and advantages of the present invention will become apparent and readily appreciated from the following description of the embodiments, taken in conjunction with the accompanying drawings of which:
fig. 1 is a flowchart of a three-dimensional reconstruction method of dense texture coordinates of a human hand based on a single-viewpoint RGB camera according to an embodiment of the present invention.
Detailed Description
Reference will now be made in detail to embodiments of the present invention, examples of which are illustrated in the accompanying drawings, wherein like or similar reference numerals refer to the same or similar elements or elements having the same or similar function throughout. The embodiments described below with reference to the drawings are illustrative and intended to be illustrative of the invention and are not to be construed as limiting the invention.
The three-dimensional reconstruction method of dense texture coordinates of a human hand based on a single-viewpoint RGB camera according to the embodiment of the invention is described below with reference to the attached drawings.
Fig. 1 is a flowchart of a three-dimensional reconstruction method of dense texture coordinates of a human hand based on a single-viewpoint RGB camera according to an embodiment of the present invention.
As shown in fig. 1, the three-dimensional reconstruction method of dense texture coordinates of a human hand based on a single-viewpoint RGB camera includes the following steps:
in step S1, a hand image is acquired, and a three-dimensional model of the hand is created from the acquired hand image.
Further, in an embodiment of the present invention, the step S1 further includes:
acquiring a hand image through a color camera;
unwrapping the three-dimensional texture of the hand image to obtain a one-to-one correspondence between some pixels of the hand image and points on the hand surface;
storing, through this correspondence, the three-dimensional coordinates of the corresponding hand-surface points;
and recovering the three-dimensional hand model of the hand image from the three-dimensional coordinates, wherein the model is a parameterized expression template whose pose can be changed, and the hand driven, by changing the parameters.
For example, a hand is stored as a 3-channel color picture: some pixels of the picture correspond one-to-one to points on the hand surface (the correspondence can be obtained by texture unwrapping or similar means), the three channels of the picture store the three-dimensional coordinates of those surface points, and the three-dimensional hand model can be recovered from the picture.
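The storage scheme described above can be sketched numerically. This is an illustrative reconstruction only, not the patent's implementation; the function name, map resolution, and explicit validity mask are assumptions.

```python
import numpy as np

# Assumed layout: a "position map" stores, at each valid texture coordinate
# (u, v), the 3-D point (x, y, z) of the hand surface in its three channels;
# a boolean mask marks which texels actually correspond to surface points.
def position_map_to_point_cloud(pos_map, mask):
    """pos_map: (H, W, 3) float array of xyz; mask: (H, W) bool of valid texels."""
    return pos_map[mask]  # (N, 3) array of recovered surface points

# Toy example: a 4x4 map in which only two texels carry surface points.
pos_map = np.zeros((4, 4, 3))
pos_map[1, 2] = [0.1, 0.2, 0.3]
pos_map[3, 0] = [0.4, 0.5, 0.6]
mask = np.zeros((4, 4), dtype=bool)
mask[1, 2] = mask[3, 0] = True
cloud = position_map_to_point_cloud(pos_map, mask)  # two 3-D points, row-major order
```

Storing geometry as an image in this way is what lets an ordinary image-to-image convolutional network predict it, which is the point the surrounding text makes about matching the characteristics of a neural network.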
In step S2, the three-dimensional human hand model is recognized by a deep learning algorithm to obtain dense texture coordinates of the human hand.
Further, step S2 further includes: identifying the hand position in the hand image with a target detection algorithm, and extracting a sub-image in which the hand occupies most of the area;
detecting the hand sub-images and, by left-right flipping, uniformly converting all of them into left hands or into right hands;
rendering a synthetic hand data set using the three-dimensional hand model and its corresponding hand texture map, and taking the synthetic data set as a pre-training data set;
pre-training the hand detection neural network on the pre-training data set to obtain estimated values of the hand parameters and of the texture coordinates;
and inputting the hand sub-image into the trained hand neural network to obtain the dense texture coordinates of the hand.
The hand detection neural network includes an encoder, a parameter decoder, and a texture coordinate decoder: the encoder encodes the hand sub-image into a hidden variable, the hidden variable is input into the parameter decoder and the texture coordinate decoder respectively, and the estimated value of the hand parameters and the estimated value of the texture coordinates are output.
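A minimal numerical sketch of this encoder/two-decoder layout follows. All layer sizes, the single-matrix "layers", and the flattened input are assumptions for illustration; the patent does not specify dimensions, and a real system would use convolutional layers and trained weights.

```python
import numpy as np

rng = np.random.default_rng(0)

def linear(in_dim, out_dim):
    # Random untrained weight matrix standing in for a learned layer.
    return rng.standard_normal((in_dim, out_dim)) * 0.01

W_enc = linear(64 * 64 * 3, 128)   # encoder: flattened 64x64 crop -> hidden variable
W_par = linear(128, 10)            # parameter decoder: hidden variable -> hand parameters
W_tex = linear(128, 64 * 64 * 2)   # texture decoder: hidden variable -> per-pixel (u, v)

def forward(crop):
    z = np.tanh(crop.reshape(-1) @ W_enc)      # shared hidden variable
    params = z @ W_par                         # pose/shape parameter estimate
    uv_map = (z @ W_tex).reshape(64, 64, 2)    # dense texture-coordinate estimate
    return params, uv_map

crop = rng.standard_normal((64, 64, 3))
params, uv_map = forward(crop)
```

The design point is that both decoders read the same hidden variable, so the parameter branch and the dense texture-coordinate branch stay consistent with one shared encoding of the input crop.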
In step S3, a three-dimensional model of the human hand is reconstructed from the dense texture coordinates of the human hand.
The three-dimensional reconstruction method of dense texture coordinates of a human hand based on a single-viewpoint RGB camera is described in detail below with reference to specific examples.
Step 1, collecting a single-viewpoint RGB picture containing a human hand with a common color camera, such as a mobile-phone camera, a digital camera, or a single-lens reflex camera.
Step 2, using a parameterized hand model in which the pose and shape of the hand can be changed by changing parameters.
Step 3, establishing, by texture unwrapping of the hand model, the correspondence between the picture and the texture coordinates.
Step 4, recognizing the position of the hand in the color picture containing the hand with a target detection algorithm, and extracting a sub-image in which the hand occupies most of the area.
Step 5, detecting the sub-images and, by flipping the images left-right, uniformly converting all sub-images into left hands or into right hands.
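The left/right normalization of step 5 is a simple horizontal mirror. A sketch, assuming an (H, W, C) image array and an `is_left` flag supplied by the detector:

```python
import numpy as np

def normalize_hand(crop, is_left):
    # Mirror left-hand crops along the width axis so the network
    # only ever sees one handedness (here: right hands).
    return crop[:, ::-1] if is_left else crop

img = np.arange(12).reshape(2, 3, 2)          # toy (H, W, C) image
flipped = normalize_hand(img, is_left=True)   # mirrored copy
```

Flipping twice returns the original image, so the predicted texture coordinates for a flipped crop can be mirrored back the same way afterwards.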
Step 6, rendering a synthetic hand data set using the parameterized hand model and its corresponding hand texture map, and pasting a common landscape picture as the background, thereby obtaining a pre-training data set.
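The background pasting of step 6 amounts to alpha compositing the rendered hand over a landscape photo. A sketch, assuming the renderer outputs an RGBA image whose alpha channel marks hand pixels (the patent does not specify the renderer's output format):

```python
import numpy as np

def paste_on_background(hand_rgba, background_rgb):
    # Standard "over" compositing: hand pixels replace the background
    # in proportion to their alpha coverage.
    alpha = hand_rgba[..., 3:4]                # (H, W, 1), values in [0, 1]
    return alpha * hand_rgba[..., :3] + (1 - alpha) * background_rgb

hand = np.zeros((8, 8, 4))
hand[2:6, 2:6] = [1.0, 0.8, 0.7, 1.0]         # opaque rendered hand patch
bg = np.full((8, 8, 3), 0.5)                  # stand-in landscape picture
composite = paste_on_background(hand, bg)
```

Varying the backgrounds across the synthetic set is what keeps the pre-trained detector from overfitting to the renderer's empty backdrop.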
Step 7, pre-training the hand detection neural network with the pre-training data set. The network consists of three parts: an encoder, a parameter decoder, and a texture coordinate decoder. The network input is a hand sub-image, which the encoder encodes into a hidden variable; the hidden variable is then input into the parameter decoder and the texture coordinate decoder respectively to obtain the estimate of the hand parameters and the estimate of the texture coordinates.
Step 8, further training the pre-trained hand detection neural network on a really captured hand data set, or on a rendered synthetic hand data set, to improve the network's generalization to real data. At the same time, an error function is constructed between the parameter estimate and the texture coordinate estimate, so that the texture coordinate output better conforms to the shape of a hand.
Step 9, inputting the hand sub-images obtained in step 5 into the hand detection neural network obtained in step 8 to obtain the texture coordinate map of the hand.
Step 10, restoring the hand expressed in texture coordinates into a three-dimensional model, thereby reconstructing the three-dimensional hand.
In summary, according to the three-dimensional reconstruction method for dense texture coordinates of a human hand based on a single-viewpoint RGB camera provided by the embodiment of the present invention, the texture unwrapping map of the hand model is first obtained. The hand model can be unwrapped manually with various existing three-dimensional modeling tools; because of the smoothness of the neural network, an unwrapping with few seams is generally adopted, such as unfolding the hand into its front and back sides.
Secondly, virtual data are obtained by pre-rendering in modeling software and mixed with public real data for pre-training of the neural network; the encoder, texture decoder, and parameter decoder of the network are first trained separately and then trained jointly.
A hand picture is then captured by a camera; a target detection algorithm yields the hand region and a flag indicating whether it is a left or a right hand; the picture is cropped to a sub-image dominated by the hand; and according to the flag all hand pictures are uniformly converted into left hands or right hands, a left-hand picture being converted into a right-hand picture by left-right mirror flipping. The resulting set of hand sub-images is input to the network, and the texture coordinate map of each sub-image's hand is obtained through the encoder and texture decoder. The texture coordinate map is converted into a three-dimensional point cloud and connected into patches, reconstructing the three-dimensional hand model, so that the three-dimensional shape of the hand is intuitively expressed and better matches the characteristics of a neural network.
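The final point-cloud-to-patch step can be illustrated with a grid triangulation of the texture coordinate map. The scheme below (two triangles per fully-valid 2x2 block of texels) is an assumption chosen for illustration, not the patent's stated method:

```python
import numpy as np

def position_map_to_mesh(pos_map, mask):
    # Valid texels become mesh vertices; each fully-valid 2x2 block of
    # texels is split into two triangles, yielding a surface mesh.
    H, W, _ = pos_map.shape
    index = -np.ones((H, W), dtype=int)
    index[mask] = np.arange(mask.sum())        # texel -> vertex id
    verts = pos_map[mask]
    faces = []
    for i in range(H - 1):
        for j in range(W - 1):
            if mask[i, j] and mask[i, j + 1] and mask[i + 1, j] and mask[i + 1, j + 1]:
                a, b = index[i, j], index[i, j + 1]
                c, d = index[i + 1, j], index[i + 1, j + 1]
                faces.append((a, b, c))
                faces.append((b, d, c))
    return verts, np.array(faces)

pos_map = np.random.default_rng(1).random((3, 3, 3))   # toy 3x3 position map
mask = np.ones((3, 3), dtype=bool)
verts, faces = position_map_to_mesh(pos_map, mask)     # 9 vertices, 8 triangles
```

Because the mesh connectivity comes straight from the regular texel grid, no separate surface-reconstruction step (such as Poisson reconstruction) is needed.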
Furthermore, the terms "first" and "second" are used for descriptive purposes only and are not to be construed as indicating or implying relative importance or implicitly indicating the number of technical features indicated. Thus, a feature defined as "first" or "second" may explicitly or implicitly include at least one such feature. In the description of the present invention, "a plurality" means at least two, e.g., two, three, etc., unless specifically limited otherwise.
In the description herein, references to the description of the term "one embodiment," "some embodiments," "an example," "a specific example," or "some examples," etc., mean that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the invention. In this specification, the schematic representations of the terms used above are not necessarily intended to refer to the same embodiment or example. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples. Furthermore, various embodiments or examples and features of different embodiments or examples described in this specification can be combined and combined by one skilled in the art without contradiction.
Although embodiments of the present invention have been shown and described above, it is understood that the above embodiments are exemplary and should not be construed as limiting the present invention, and that variations, modifications, substitutions and alterations can be made to the above embodiments by those of ordinary skill in the art within the scope of the present invention.

Claims (6)

1. A three-dimensional reconstruction method for dense texture coordinates of a human hand based on a single-viewpoint RGB camera is characterized by comprising the following steps:
acquiring a hand image, and establishing a three-dimensional hand model according to the acquired hand image;
identifying the three-dimensional model of the hand by a deep learning algorithm to obtain dense texture coordinates of the hand; and
and reconstructing a human hand three-dimensional model according to the human hand dense texture coordinates.
2. The method for three-dimensional reconstruction of dense texture coordinates of a human hand based on a single-viewpoint RGB camera as claimed in claim 1, wherein the acquiring of the human hand image and the establishment of the three-dimensional model of the human hand according to the acquired human hand image further comprises:
acquiring a hand image through a color camera;
expanding three-dimensional textures of the hand image to obtain one-to-one correspondence between partial pixels of the hand image and surface points of the hand;
respectively storing three-dimensional coordinates corresponding to the surface points of the human hand through the corresponding relation;
and recovering the hand three-dimensional model of the hand image according to the three-dimensional coordinates.
3. The single-viewpoint RGB camera-based human hand dense texture coordinate three-dimensional reconstruction method as claimed in claim 2, wherein the human hand three-dimensional model is a parameterized expression template, and the posture and driving of the human hand are changed by changing parameters.
4. The human hand dense texture coordinate three-dimensional reconstruction method based on the single-viewpoint RGB camera as claimed in claim 1, wherein the recognizing the human hand three-dimensional model through a deep learning algorithm to obtain the human hand dense texture coordinate further comprises:
identifying the hand position from the hand image by using a target detection algorithm, and identifying and extracting a hand image of which the hand occupies most parts;
detecting the hand images, and uniformly converting all the hand images into left hands or right hands by left-right flipping;
rendering a synthetic data set of the human hand by using the human hand three-dimensional model and a human hand texture map corresponding to the human hand three-dimensional model, and taking the synthetic data set as a pre-training data set;
pre-training the human hand detection neural network by using the pre-training data set to obtain an estimated value of human hand parameters and an estimated value of texture coordinates;
and inputting the hand subgraph into a trained hand neural network to obtain the dense texture coordinates of the hand.
5. The single-viewpoint RGB camera-based human hand dense texture coordinate three-dimensional reconstruction method according to claim 4, wherein the human hand detection neural network comprises an encoder, a parameter decoder and a texture coordinate decoder.
6. The human hand dense texture coordinate three-dimensional reconstruction method based on the single-viewpoint RGB camera as claimed in claim 5, wherein the human hand subgraph is encoded into hidden variables by the encoder, the hidden variables are respectively input into a parameter decoder and a texture coordinate decoder, and the estimated values of the human hand parameters and the texture coordinates are output.
CN202010211923.7A 2020-03-24 2020-03-24 Three-dimensional reconstruction method for dense texture coordinates of human hand based on single-viewpoint RGB camera Withdrawn CN111476899A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010211923.7A CN111476899A (en) 2020-03-24 2020-03-24 Three-dimensional reconstruction method for dense texture coordinates of human hand based on single-viewpoint RGB camera

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010211923.7A CN111476899A (en) 2020-03-24 2020-03-24 Three-dimensional reconstruction method for dense texture coordinates of human hand based on single-viewpoint RGB camera

Publications (1)

Publication Number Publication Date
CN111476899A true CN111476899A (en) 2020-07-31

Family

ID=71748361

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010211923.7A Withdrawn CN111476899A (en) 2020-03-24 2020-03-24 Three-dimensional reconstruction method for dense texture coordinates of human hand based on single-viewpoint RGB camera

Country Status (1)

Country Link
CN (1) CN111476899A (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114078152A (en) * 2020-08-20 2022-02-22 北京瓦特曼科技有限公司 Robot carbon block cleaning method based on three-dimensional reconstruction
CN114078152B (en) * 2020-08-20 2023-05-02 北京瓦特曼科技有限公司 Robot carbon block cleaning method based on three-dimensional reconstruction
CN117152397A (en) * 2023-10-26 2023-12-01 慧医谷中医药科技(天津)股份有限公司 Three-dimensional face imaging method and system based on thermal imaging projection
CN117152397B (en) * 2023-10-26 2024-01-26 慧医谷中医药科技(天津)股份有限公司 Three-dimensional face imaging method and system based on thermal imaging projection

Similar Documents

Publication Publication Date Title
CN111243093B (en) Three-dimensional face grid generation method, device, equipment and storage medium
CN109636831B (en) Method for estimating three-dimensional human body posture and hand information
Jafarian et al. Learning high fidelity depths of dressed humans by watching social media dance videos
CN110335343B (en) Human body three-dimensional reconstruction method and device based on RGBD single-view-angle image
CN110148217A (en) A kind of real-time three-dimensional method for reconstructing, device and equipment
CN112258387A (en) Image conversion system and method for generating cartoon portrait based on face photo
CN112784621B (en) Image display method and device
US11928778B2 (en) Method for human body model reconstruction and reconstruction system
CN101154289A (en) Method for tracing three-dimensional human body movement based on multi-camera
CN113628327A (en) Head three-dimensional reconstruction method and equipment
CN110751730B (en) Dressing human body shape estimation method based on deep neural network
CN110197156B (en) Single-image human hand action and shape reconstruction method and device based on deep learning
CN113421328B (en) Three-dimensional human body virtual reconstruction method and device
CN111382618B (en) Illumination detection method, device, equipment and storage medium for face image
CN113593001A (en) Target object three-dimensional reconstruction method and device, computer equipment and storage medium
CN111476899A (en) Three-dimensional reconstruction method for dense texture coordinates of human hand based on single-viewpoint RGB camera
WO2022060229A1 (en) Systems and methods for generating a skull surface for computer animation
Wang et al. Digital twin: Acquiring high-fidelity 3D avatar from a single image
CN111531546A (en) Robot pose estimation method, device, equipment and storage medium
CN117218246A (en) Training method and device for image generation model, electronic equipment and storage medium
CN107341476A (en) A kind of unsupervised manikin construction method based on system-computed principle
CN113763536A (en) Three-dimensional reconstruction method based on RGB image
CN116797713A (en) Three-dimensional reconstruction method and terminal equipment
CN115880766A (en) Method and device for training posture migration and posture migration models and storage medium
Jian et al. Realistic face animation generation from videos

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
WW01 Invention patent application withdrawn after publication
WW01 Invention patent application withdrawn after publication

Application publication date: 20200731