CN111489379B - Method and system for estimating hand posture by introducing kinematic constraint 3D network - Google Patents


Info

Publication number
CN111489379B
CN111489379B
Authority
CN
China
Prior art keywords
heatmap
length
hand
predicted
joint
Prior art date
Legal status
Active
Application number
CN202010597038.7A
Other languages
Chinese (zh)
Other versions
CN111489379A (en)
Inventor
张一帆
孙琳晖
冷聪
卢汉清
Current Assignee
Zhongke Nanjing Artificial Intelligence Innovation Research Institute
Institute of Automation of Chinese Academy of Science
Original Assignee
Nanjing Artificial Intelligence Chip Innovation Institute, Institute of Automation, Chinese Academy of Sciences
Institute of Automation of Chinese Academy of Science
Priority date
Filing date
Publication date
Application filed by Nanjing Artificial Intelligence Chip Innovation Institute, Institute of Automation, Chinese Academy of Sciences and Institute of Automation of Chinese Academy of Science
Priority to CN202010597038.7A
Publication of CN111489379A
Application granted
Publication of CN111489379B
Legal status: Active
Anticipated expiration

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/20Analysis of motion
    • G06T7/246Analysis of motion using feature-based methods, e.g. the tracking of corners or segments
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/20Movements or behaviour, e.g. gesture recognition
    • G06V40/28Recognition of hand or arm movements, e.g. recognition of deaf sign language
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/10Image acquisition modality
    • G06T2207/10028Range image; Depth image; 3D point clouds
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20081Training; Learning
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20084Artificial neural networks [ANN]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/30Subject of image; Context of image processing
    • G06T2207/30196Human being; Person

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Theoretical Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Psychiatry (AREA)
  • Social Psychology (AREA)
  • Human Computer Interaction (AREA)
  • Image Analysis (AREA)

Abstract

The invention provides a method and a system for estimating hand postures with a 3D network that introduces kinematic constraints, comprising the following steps: step 1, converting the hand region located in the original depth map into a voxelized input; step 2, introducing a kinematically constrained 3D hand posture estimation network; and step 3, evaluating the accuracy of the predicted joint positions and the plausibility of the hand posture formed by the predicted joints. The network takes the voxelized hand region as input and predicts, through a 3D convolutional neural network, a 3D heatmap representing the probability distribution of each joint point; the position of the maximum of a heatmap is the position of the corresponding joint. The joint positions are obtained from the 3D heatmaps, and the bone lengths are then computed from the correspondence between joints. Because the coordinates of each joint point and the corresponding bone lengths are obtained by processing the 3D heatmaps, kinematic constraints can be added to the prediction by modifying the loss function.

Description

Method and system for estimating hand posture by introducing kinematic constraint 3D network
Technical Field
The invention relates to a method and a system for estimating hand postures with a 3D network that introduces kinematic constraints, and falls within the field of general image data processing or generation (G06T), in particular G06T 7/20, motion analysis.
Background
The task of hand posture estimation is divided into two steps: the first is to locate the hand region in a depth map, and the second is to predict the coordinates of each joint point from that region. Current high-performing methods locate the hand region in two stages. First, based on the property that in a depth map the depth value decreases as the hand gets closer to the camera, and assuming the hand is the object closest to the camera, a rough hand localization is obtained by setting a depth threshold. Second, the rough hand region is fed into the hand region localization network (Com-RefineNet) shown in fig. 1, which predicts the coordinates of the palm center; the final hand localization is then obtained from a preset hand size.
The hand is an articulated object, and the hand posture formed by connecting the joint points through bones must satisfy physical constraints. Therefore, in the hand posture estimation task, besides the accuracy of the predicted joint positions, it is necessary to consider whether the hand posture formed by the predicted joints is plausible.
Disclosure of Invention
Purpose of the invention: one object is to provide a method for hand posture estimation using a 3D network that introduces kinematic constraints, so as to solve the above problems in the prior art. A further object is to propose a system implementing the method.
The technical scheme is as follows: a method for hand pose estimation by a 3D network introducing kinematic constraints comprises the following steps:
step 1, converting a hand area positioned in an original depth map into voxelized input;
step 2, introducing a kinematic constraint 3D hand posture estimation network;
and 3, evaluating the accuracy of the predicted joint point positions and evaluating the reasonability of the hand postures formed by the predicted joint points.
In a further embodiment, the step 1 further comprises:
step 1-1, positioning a hand region from a depth map by using a hand region positioning network;
step 1-2, projecting the hand area positioned in the depth map to a 3D space;
step 1-3, discretizing data projected to a 3D space according to a preset voxel size;
step 1-4, setting the value V(i, j, k) of each voxel according to whether that position is covered by discrete points: the value is set to 1 when it is covered and to 0 when it is not.
In a further embodiment, the step 2 further comprises:
the network takes a voxelized hand area as input, and 3D heatmap representing joint point probability distribution is predicted through a 3D convolutional neural network; the coordinates of each joint point and the corresponding bone length are obtained by processing the 3D heatmap, so that the kinematic constraint can be added to the predicted result by modifying the loss function.
In a further embodiment, adding kinematic constraints to the prediction requires processing the 3D heatmaps: each 3D heatmap predicted by the network represents the probability distribution of a single joint point, and the position of the maximum of the heatmap is the position of that joint. The joint positions are obtained from the 3D heatmaps, and the bone lengths are then computed from the correspondence between joints. A Soft-argmax function is proposed to obtain the joint coordinates from a 3D heatmap in a differentiable way:

$$\hat{p} = \sum_{v}\operatorname{softmax}(X)_{v}\,v$$

where X is a 3D heatmap of size $D \times H \times W$; softmax(·) normalizes the heatmap into a probability distribution over voxel positions v; and $\hat{p}$ approximates the position of the maximum of the 3D heatmap.
In a further embodiment, after the coordinates of each joint point are obtained from the corresponding 3D heatmap, kinematic constraints are added to the prediction:
step 3-1, setting a standard range for each bone length according to the training data, namely the maximum and minimum length of the bone;
step 3-2, comparing the bone lengths obtained from the 3D heatmaps with the set standard ranges and penalizing any bone length above the maximum or below the minimum, thereby adding kinematic constraints to the prediction, wherein the loss function of the hand posture estimation network with added kinematic constraints is:
$$L = \sum_{n=1}^{N}\left\|H_{n}-H_{n}^{*}\right\|^{2} + \lambda_{1}\sum_{k=1}^{N-1}\max\left(0,\, l_{k}-l_{k}^{\max}\right) + \lambda_{2}\sum_{k=1}^{N-1}\max\left(0,\, l_{k}^{\min}-l_{k}\right)$$

where L is the overall loss function comprising three parts: the 3D heatmap constraint, the constraint on bone lengths exceeding the maximum length, and the constraint on bone lengths below the minimum length; N is the number of joint points and N − 1 the number of bones; $H_{n}$ and $H_{n}^{*}$ are the predicted and ground-truth 3D heatmaps; $l_{k}$ is the bone length computed from the predicted joint coordinates; $l_{k}^{\max}$ and $l_{k}^{\min}$ are the preset longest and shortest bone lengths; and $\lambda_{1}$, $\lambda_{2}$ are the weights of the components of the loss function.
A system for hand pose estimation with a kinematically constrained 3D network comprises a first module for converting the hand region located in the original depth map into a voxelized input; a second module for introducing a kinematically constrained 3D hand posture estimation network; and a third module for evaluating the accuracy of the predicted joint positions and the plausibility of the hand posture formed by the predicted joints.
In a further embodiment, the first module further locates a hand region in the depth map using a hand region localization network and projects it into 3D space; discretizes the projected data according to a preset voxel size; and sets the value V(i, j, k) of each voxel according to whether that position is covered by discrete points: the value is set to 1 when covered and to 0 when not.
The second module is further used for obtaining joint predictions with the kinematically constrained hand posture estimation network, taking the voxelized hand region as input; the network predicts, through a 3D convolutional neural network, 3D heatmaps representing the joint probability distributions; by processing the 3D heatmaps, the coordinates of each joint point and the corresponding bone lengths are obtained, so that kinematic constraints can be added to the prediction by modifying the loss function.
The second module is further used for processing the 3D heatmaps in order to add kinematic constraints to the prediction: each 3D heatmap predicted by the network represents the probability distribution of a single joint point, and the position of its maximum is the position of that joint. The joint positions are obtained from the 3D heatmaps, and the bone lengths are then computed from the correspondence between joints. A Soft-argmax function is proposed to obtain the joint coordinates from a 3D heatmap in a differentiable way:

$$\hat{p} = \sum_{v}\operatorname{softmax}(X)_{v}\,v$$

where X is a 3D heatmap of size $D \times H \times W$; softmax(·) normalizes the heatmap into a probability distribution over voxel positions v; and $\hat{p}$ approximates the position of the maximum of the 3D heatmap.
The third module further sets a standard range for each bone length according to the training data, namely the maximum and minimum length of the bone; compares the bone lengths obtained from the 3D heatmaps with the set standard ranges; and penalizes any bone length above the maximum or below the minimum, thereby adding kinematic constraints to the prediction, wherein the loss function of the hand posture estimation network with added kinematic constraints is:

$$L = \sum_{n=1}^{N}\left\|H_{n}-H_{n}^{*}\right\|^{2} + \lambda_{1}\sum_{k=1}^{N-1}\max\left(0,\, l_{k}-l_{k}^{\max}\right) + \lambda_{2}\sum_{k=1}^{N-1}\max\left(0,\, l_{k}^{\min}-l_{k}\right)$$

where L is the overall loss function comprising three parts: the 3D heatmap constraint, the constraint on bone lengths exceeding the maximum length, and the constraint on bone lengths below the minimum length; N is the number of joint points and N − 1 the number of bones; $H_{n}$ and $H_{n}^{*}$ are the predicted and ground-truth 3D heatmaps; $l_{k}$ is the bone length computed from the predicted joint coordinates; $l_{k}^{\max}$ and $l_{k}^{\min}$ are the preset longest and shortest bone lengths; and $\lambda_{1}$, $\lambda_{2}$ are the weights of the components of the loss function.
Advantageous effects: the invention provides a method and a system for estimating hand postures with a 3D network that introduces kinematic constraints. The hand posture estimation network takes the voxelized hand region as input and predicts, through a 3D convolutional neural network, 3D heatmaps representing the joint probability distributions; by processing the 3D heatmaps, the coordinates of each joint point and the corresponding bone lengths are obtained, so that kinematic constraints can be added to the prediction by modifying the loss function. The joint positions are obtained from the 3D heatmaps, and the bone lengths are then computed from the correspondence between joints; a Soft-argmax function is proposed to obtain the joint coordinates from the 3D heatmaps in a differentiable way. Through these operations, the invention can better judge the plausibility of the predicted bone lengths and of the predicted hand posture.
Drawings
FIG. 1 is a schematic diagram of a hand area location network Com-RefineNet according to the present invention.
FIG. 2 is a view of a hand region after voxelization in accordance with the present invention.
FIG. 3 is a schematic view of reasonable and unreasonable hand postures according to the present invention.
FIG. 4 is a schematic diagram of the overall prediction structure according to the present invention.
FIG. 5 is a schematic diagram of a hand pose estimation network incorporating kinematic constraints according to the present invention.
FIG. 6 is a schematic diagram of a distribution diagram of the joint points according to the present invention.
FIG. 7 is a diagram illustrating different hand gesture predictions according to the present invention.
Detailed Description
The task of hand posture estimation is divided into two steps: the first is to locate the hand region in a depth map, and the second is to predict the coordinates of each joint point from that region. Current high-performing methods locate the hand region in two stages. First, based on the property that in a depth map the depth value decreases as the hand gets closer to the camera, and assuming the hand is the object closest to the camera, a rough hand localization is obtained by setting a depth threshold. Second, the rough hand region is fed into the hand region localization network (Com-RefineNet) shown in fig. 1, which predicts the coordinates of the palm center; the final hand localization is then obtained from a preset hand size.
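The coarse localization stage described above can be sketched in a few lines of numpy. The threshold value, the bounding-box output, and the convention that zero depth means a missing measurement are illustrative assumptions, not details taken from the patent.

```python
import numpy as np

def coarse_hand_localization(depth_map, threshold_mm=600.0):
    """Coarse hand localization: assuming the hand is the object closest
    to the camera, keep only pixels whose depth is below a threshold.
    Pixels with value 0 are treated as missing measurements."""
    valid = depth_map > 0
    mask = valid & (depth_map < threshold_mm)
    ys, xs = np.nonzero(mask)
    if len(xs) == 0:
        return None  # no near-camera region found
    # Bounding box (x_min, y_min, x_max, y_max) of the thresholded region.
    return xs.min(), ys.min(), xs.max(), ys.max()

# Toy depth map: background at 1500 mm, a "hand" patch at 450 mm.
depth = np.full((8, 8), 1500.0)
depth[2:5, 3:6] = 450.0
print(coarse_hand_localization(depth))  # -> (3, 2, 5, 4)
```

In the full pipeline this rough box would then be refined by Com-RefineNet rather than used directly.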
The test index adopted for the Com-RefineNet network is the mean Euclidean distance between the predicted palm-center position and the ground-truth position.
The traditional convolutional-neural-network approach to joint prediction feeds the depth map into a 2D convolutional neural network as 2D data and directly regresses the joint positions. However, a depth map actually represents 2.5D data; treating it as 2D data and processing it with a 2D convolutional neural network cannot extract the key information in the depth map well. In addition, directly regressing the joint positions from the depth map is a highly nonlinear mapping, which increases the difficulty of training the network.
The applicant holds that, for the problems of traditional hand posture estimation methods, an effective approach is to adopt a voxelized input, extract features with a 3D convolutional neural network, and predict 3D heatmaps representing the joint position probability distributions, which allows joint prediction to reach higher precision. Converting the hand region in the original depth map into a voxelized input requires three steps: first, the hand region in the depth map is projected into 3D space; second, the data projected into 3D space are discretized according to the preset voxel size; third, the value V(i, j, k) of each voxel is set to 1 or 0 depending on whether the position is covered by a discrete point. The voxelized hand area is shown in fig. 2, where the blue dots indicate positions whose value is 1.
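The discretization and occupancy steps can be sketched as follows; the grid size, voxel size, and origin below are illustrative assumptions, and the projection from the depth map into 3D space is assumed to have been done already.

```python
import numpy as np

def voxelize(points, grid_size=32, voxel_size=10.0, origin=(0.0, 0.0, 0.0)):
    """Discretize 3D points (N x 3, in mm) into an occupancy grid:
    a voxel is set to 1 if at least one point falls inside it, else 0."""
    grid = np.zeros((grid_size,) * 3, dtype=np.float32)
    idx = np.floor((points - np.asarray(origin)) / voxel_size).astype(int)
    # Keep only indices that fall inside the grid bounds.
    ok = np.all((idx >= 0) & (idx < grid_size), axis=1)
    grid[tuple(idx[ok].T)] = 1.0
    return grid

pts = np.array([[5.0, 5.0, 5.0], [15.0, 25.0, 35.0], [-3.0, 0.0, 0.0]])
v = voxelize(pts)
print(int(v.sum()))  # -> 2 occupied voxels; the out-of-bounds point is dropped
```

The resulting binary volume is what the 3D convolutional network consumes as input.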
The hand is an articulated object, and the hand posture formed by connecting the joint points through bones must satisfy physical constraints. Therefore, in the hand posture estimation task, besides the accuracy of the predicted joint positions, it is necessary to consider whether the hand posture formed by the predicted joints is plausible. As shown in FIG. 3, the joint positions of the thumb in the right drawing are very close to those in the left drawing, but the degree of bending of the thumb in the right drawing is physically unreasonable. Therefore, a bone length constraint is added to the predicted hand posture to ensure its plausibility.
In this application, we propose a 3D network introducing kinematic constraints for accurate and plausible hand posture estimation. The prediction process of the whole network can be divided into two steps: the first locates the hand region in the depth map using the hand region localization network (Com-RefineNet) shown in fig. 1; the second obtains the joint predictions using the kinematically constrained hand posture estimation network (RVHE) with the voxelized hand region as input. The overall structure is shown in fig. 4.
The test index adopted for the RVHE network is the mean Euclidean distance between the predicted position of each joint point and its ground-truth position.
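The test index used for both networks can be written out directly; the array shapes below (J joints by 3 coordinates, in mm) are an assumed convention.

```python
import numpy as np

def mean_joint_error(pred, gt):
    """Mean Euclidean distance between predicted and ground-truth joint
    positions (both J x 3 arrays, in mm)."""
    return float(np.linalg.norm(pred - gt, axis=1).mean())

pred = np.array([[0.0, 0.0, 0.0], [3.0, 4.0, 0.0]])
gt = np.array([[0.0, 0.0, 0.0], [0.0, 0.0, 0.0]])
print(mean_joint_error(pred, gt))  # -> (0 + 5) / 2 = 2.5
```

For Com-RefineNet the same formula is applied with J = 1 (the palm center only).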
The structure of the hand posture estimation network with kinematic constraints is consistent with the structure in the original article; the overall flow is shown in fig. 5. The network takes the voxelized hand region as input and predicts, through a 3D convolutional neural network, 3D heatmaps representing the joint probability distributions. By processing the 3D heatmaps, the coordinates of each joint point and the corresponding bone lengths can be obtained, so that kinematic constraints can be added to the prediction by modifying the loss function.
Adding kinematic constraints to the prediction requires processing the 3D heatmaps: each 3D heatmap predicted by the network represents the probability distribution of a single joint point, and the position of the maximum of the heatmap is the position of that joint. The bone lengths can be computed by obtaining the joint positions from the 3D heatmaps and then using the correspondence between joints. However, obtaining the coordinates directly with the argmax function is non-differentiable, which would break the end-to-end back-propagation chain during training; we therefore propose a Soft-argmax function to obtain the joint coordinates from the 3D heatmap in a differentiable way.
The premise of the Soft-argmax function is that, for a sufficiently sharp distribution, the position of the maximum can be approximated by the expectation of the distribution. Since each 3D heatmap corresponds to one joint point, the distribution of each heatmap is close to a sharply peaked (leptokurtic) distribution, and passing the heatmap values through soft-max makes each distribution sharper still. The Soft-argmax function can thus be used to obtain the joint coordinates from each heatmap in a differentiable way. The Soft-argmax function is shown in equation (1):
$$\hat{p} = \sum_{v}\operatorname{softmax}(X)_{v}\,v \qquad (1)$$

where X is a 3D heatmap of size $D \times H \times W$; softmax(·) normalizes the heatmap into a probability distribution over voxel positions v; and $\hat{p}$ approximates the position of the maximum of the 3D heatmap.
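A minimal numpy sketch of the Soft-argmax operation: the heatmap is treated as a raw score volume, the soft-max is taken over all voxels, and the expected index is returned. The heatmap size and peak value below are illustrative.

```python
import numpy as np

def soft_argmax_3d(heatmap):
    """Differentiable approximation of argmax over a 3D heatmap:
    soft-max the heatmap into a probability volume, then take the
    expected (z, y, x) index under that distribution."""
    d, h, w = heatmap.shape
    p = np.exp(heatmap - heatmap.max())  # numerically stable soft-max
    p /= p.sum()
    zs, ys, xs = np.meshgrid(np.arange(d), np.arange(h), np.arange(w),
                             indexing="ij")
    return np.array([(p * zs).sum(), (p * ys).sum(), (p * xs).sum()])

# A sharp peak at index (2, 3, 1): the expectation lands almost exactly there.
hm = np.zeros((4, 5, 3))
hm[2, 3, 1] = 20.0  # large score -> near-one-hot after soft-max
print(np.round(soft_argmax_3d(hm), 3))
```

Because every operation is a sum or an exponential, gradients flow through the coordinate estimate, which is what allows the bone length terms below to be trained end to end.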
Through the operation of equation (1), we obtain the coordinates of the joint point corresponding to each 3D heatmap, and kinematic constraints can then be added to the prediction. First, a standard range is set for each bone length according to the training data, namely the maximum and minimum length of the bone; the bone lengths obtained from the 3D heatmaps are then compared with the set standard ranges, and any length above the maximum or below the minimum is penalized, adding kinematic constraints to the prediction. The loss function of the hand posture estimation network with added kinematic constraints is shown in equation (2).
$$L = \sum_{n=1}^{N}\left\|H_{n}-H_{n}^{*}\right\|^{2} + \lambda_{1}\sum_{k=1}^{N-1}\max\left(0,\, l_{k}-l_{k}^{\max}\right) + \lambda_{2}\sum_{k=1}^{N-1}\max\left(0,\, l_{k}^{\min}-l_{k}\right) \qquad (2)$$

where L is the overall loss function comprising three parts: the 3D heatmap constraint, the constraint on bone lengths exceeding the maximum length, and the constraint on bone lengths below the minimum length; N is the number of joint points and N − 1 the number of bones; $H_{n}$ and $H_{n}^{*}$ are the predicted and ground-truth 3D heatmaps; $l_{k}$ is the bone length computed from the predicted joint coordinates; $l_{k}^{\max}$ and $l_{k}^{\min}$ are the preset longest and shortest bone lengths; and $\lambda_{1}$, $\lambda_{2}$ are the weights of the components of the loss function.
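A minimal numpy sketch of the bone length penalty terms in the loss: the linear hinge form, the bone list, and the per-bone ranges below are illustrative assumptions, and the heatmap term and the weights λ₁, λ₂ are omitted for brevity.

```python
import numpy as np

def bone_length_penalty(joints, bones, l_min, l_max):
    """Kinematic-constraint penalty: for each bone (pair of joint
    indices), penalize the predicted length for exceeding the preset
    maximum or falling below the preset minimum."""
    lengths = np.array([np.linalg.norm(joints[a] - joints[b])
                        for a, b in bones])
    over = np.maximum(0.0, lengths - l_max)   # bone too long
    under = np.maximum(0.0, l_min - lengths)  # bone too short
    return float(over.sum() + under.sum())

joints = np.array([[0.0, 0.0, 0.0], [0.0, 0.0, 30.0], [0.0, 0.0, 100.0]])
bones = [(0, 1), (1, 2)]
l_min = np.array([20.0, 20.0])
l_max = np.array([50.0, 50.0])
# Bone 0 has length 30 (within range); bone 1 has length 70 (20 over max).
print(bone_length_penalty(joints, bones, l_min, l_max))  # -> 20.0
```

In training, the joint coordinates would come from the Soft-argmax of equation (1), so this penalty back-propagates into the heatmap predictions.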
Three relatively high-quality hand pose data sets exist: the NYU, ICVL, and MSRA data sets. The NYU data set is used for training and testing because it covers the most gestures, annotates the most joint points, and has the most accurate annotations. The network is trained to predict 14 joint positions in total: the palm center, two wrist joints, the carpometacarpal joint of the thumb, and the metacarpophalangeal joint and fingertip of each of the five fingers; the joint distribution is shown in fig. 6.
To test the performance of the network, a total of three experiments were performed in the NYU test set.
(1) Hand region correction
To test the correction capability of Com-RefineNet on the palm-center position, the rough hand localization (Com) is first used to obtain coarse palm-center coordinates and compute the localization error; the rough hand region is then fed to Com-RefineNet to obtain the corrected palm-center position, and the localization error is computed again. The two results are shown in Table 1, where Ground truth denotes the true palm-center position.
TABLE 1 errors of different positioning methods
As can be seen from Table 1, using Com-RefineNet can provide more accurate hand region positioning results for subsequent predicted networks.
(2) Ablation study
To further verify Com-RefineNet and the added physical constraints as improvements over V2V-PoseNet, an ablation experiment with five different prediction structures was performed. The first is the base structure, V2V-PoseNet, which uses coarse hand localization and no physical constraints. The second adds the bone length constraint to V2V-PoseNet on top of the first. The third replaces the coarse hand localization with Com-RefineNet on top of the first. The fourth both replaces the coarse localization with Com-RefineNet and adds the physical constraint; this is RVHE. The fifth crops the hand region at the true palm position and adds the physical constraint to the prediction network. The average prediction error over all joints for these five structures is reported in Table 2.
TABLE 2 prediction error for different configurations
In Table 2, V2V denotes the joint prediction network without physical constraints, Com the coarse hand localization, ComRN the use of Com-RefineNet, Cs the added physical constraint, and Gt the true hand region position. As Table 2 shows, the prediction error of the first structure reaches 20 mm. Comparing coarse localization with Com-RefineNet (the first versus the third structure and the second versus the fourth), the refined hand region reduces the error by 6 mm and 5 mm respectively; comparing the methods without and with physical constraints (the first versus the second structure and the third versus the fourth), adding the physical constraint reduces the error by 1 mm and 0.7 mm respectively. The fifth structure, using unbiased hand region localization, achieves the best prediction error of 9.23 mm. Accurate hand region localization is therefore crucial to the final prediction, and adding physical constraints on top of an accurately located hand region makes the prediction more plausible and accurate.
(3) Comprehensive experiment
Com-RefineNet and V2V-PoseNet with the added physical constraints are connected in series to form RVHE, which is compared with other advanced methods in the hand posture estimation field on the two test indices. The methods selected for comparison include DeepPrior++, Feedback, REN, and DeepModel.
The average prediction error for all joints was obtained by testing on the NYU data set (see table 3).
TABLE 3 positioning error for different methods
As Table 3 shows, compared with other advanced methods, the proposed method reaches a highly accurate level: the error is nearly 6 mm lower than DeepPrior++ and close to the accuracy of REN.
(4) Qualitative analysis
The output of RVHE is converted into joint positions and plotted on the depth map, with the predicted joints shown in red. The joints are connected by bones; green denotes the ground truth. Partial prediction results are shown in FIG. 7. It can be seen that RVHE provides accurate hand posture estimation results.

Claims (5)

1. A method for estimating hand postures by a 3D network introducing kinematic constraints is characterized by comprising the following steps:
step 1, converting a hand area positioned in an original depth map into voxelized input;
step 1-1, positioning a hand region from a depth map by using a hand region positioning network;
step 1-2, projecting the hand area positioned in the depth map to a 3D space;
step 1-3, discretizing data projected to a 3D space according to a preset voxel size;
step 1-4, setting the value V(i, j, k) of each voxel position according to whether it is covered by discrete points: the value is set to 1 when covered and to 0 when not;
step 2, introducing a kinematically constrained 3D hand posture estimation network; the network takes the voxelized hand region as input and predicts, through a 3D convolutional neural network, 3D heatmaps representing the joint probability distributions; by processing the 3D heatmaps, the coordinates of each joint point and the corresponding bone lengths are obtained, so that kinematic constraints can be added to the prediction by modifying the loss function; adding kinematic constraints to the prediction requires processing the 3D heatmaps: each 3D heatmap predicted by the network represents the probability distribution of a single joint point, and the position of the maximum of the heatmap is the position of that joint; the joint positions are obtained from the 3D heatmaps, and the bone lengths are then computed from the correspondence between joints; a Soft-argmax function is proposed to obtain the joint coordinates from the 3D heatmap in a differentiable way:
Ĵ = soft-argmax(X) = Σ_p softmax(X)(p) · p,
wherein X represents a 3D heatmap of size D × H × W; softmax(·) represents the soft-max function computed over all voxels of X; p ranges over the voxel coordinates; and Ĵ represents the position of the maximum value of the 3D heatmap, obtained as the expectation of the voxel coordinates under the softmax distribution;
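Read this way, the Soft-argmax is the softmax-weighted expectation of voxel coordinates, which is differentiable unlike a hard argmax. A minimal NumPy sketch (the function name and per-axis decomposition are illustrative, not the patent's notation):

```python
import numpy as np

def soft_argmax_3d(X):
    """Differentiable argmax of a 3D heatmap X of shape (D, H, W):
    the expectation of voxel coordinates under softmax(X)."""
    D, H, W = X.shape
    p = np.exp(X - X.max())      # numerically stable softmax over all voxels
    p = p / p.sum()
    zz, yy, xx = np.meshgrid(np.arange(D), np.arange(H), np.arange(W),
                             indexing="ij")
    # expected coordinate: sum_p softmax(X)(p) * p, per axis
    return np.array([(p * zz).sum(), (p * yy).sum(), (p * xx).sum()])
```

For a sharply peaked heatmap the result approaches the integer location of the maximum; for a flat heatmap it falls back to the grid center.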
step 3, evaluating the accuracy of the predicted joint point positions and the reasonableness of the hand posture formed by the predicted joint points; after the coordinates of each joint point are obtained from the 3D heatmaps, kinematic constraints are added to the prediction result:
step 3-1, setting a standard range, namely the maximum and minimum lengths, for each bone length according to the training data;
step 3-2, comparing each bone length obtained from the 3D heatmaps with the set standard range, and penalizing it when it exceeds the maximum length or falls below the minimum length, thereby adding kinematic constraints to the prediction result; the loss function of the hand posture estimation network with kinematic constraints is:
L = L_H + λ_max · L_max + λ_min · L_min,
L_H = Σ_{n=1..N} ‖Ĥ_n − H_n‖²,
L_max = Σ_{m=1..N−1} max(0, l̂_m − l_m^max),
L_min = Σ_{m=1..N−1} max(0, l_m^min − l̂_m),
wherein L is the overall loss function comprising three parts: L_H, L_max, and L_min respectively represent the 3D heatmap constraint, the constraint that a bone length exceeds the maximum length, and the constraint that a bone length falls below the minimum length; N represents the number of joint points and N−1 the number of bones; Ĥ_n and H_n respectively represent the predicted and ground-truth 3D heatmaps; l̂_m is the bone length calculated from the predicted joint coordinates; l_m^max and l_m^min represent the preset longest and shortest bone lengths; and λ_max, λ_min represent the weights of the individual components of the loss function.
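The bone-length penalties described in steps 3-1 and 3-2 are hinge terms on each predicted bone length. A minimal NumPy sketch (the function name, the `bones` pair list, and default λ weights are illustrative assumptions, not the patent's exact formulation):

```python
import numpy as np

def kinematic_loss(H_pred, H_gt, joints, bones, l_min, l_max,
                   lam_max=1.0, lam_min=1.0):
    """Heatmap regression loss plus hinge penalties on bone lengths.
    joints: (N, 3) predicted joint coordinates; bones: list of (i, j) index pairs;
    l_min, l_max: per-bone minimum/maximum lengths from the training data."""
    L_H = np.sum((H_pred - H_gt) ** 2)              # 3D heatmap constraint
    lengths = np.array([np.linalg.norm(joints[i] - joints[j])
                        for i, j in bones])
    L_max = np.maximum(0.0, lengths - l_max).sum()  # penalize bones too long
    L_min = np.maximum(0.0, l_min - lengths).sum()  # penalize bones too short
    return L_H + lam_max * L_max + lam_min * L_min
```

Bones whose predicted lengths lie inside [l_min, l_max] contribute nothing, so in-range poses are trained purely by the heatmap term.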
2. A system for implementing the method of claim 1, comprising the following modules:
a first module for converting hand regions located in an original depth map into voxelized input;
a second module for introducing a kinematically constrained 3D hand pose estimation network;
and a third module for evaluating the accuracy of the predicted joint positions and the reasonableness of the hand posture formed by the predicted joint points.
3. The system of claim 2, wherein: the first module further locates a hand region from the depth map using a hand region location network; projects the hand region located in the depth map to a 3D space; discretizes the data projected to the 3D space according to a preset voxel size; and, for each voxel position p, judges whether the voxel is covered by discrete points, the voxel value V(p) being set to 1 when covered and to 0 otherwise;
the second module is further used for obtaining a prediction result of the joint points by using the hand posture estimation network with kinematic constraints, taking the voxelized hand region as input; the network predicts, through a 3D convolutional neural network, 3D heatmaps representing the probability distribution of the joint points; the coordinates of each joint point and the corresponding bone lengths are obtained by processing the 3D heatmaps, so that kinematic constraints can be added to the prediction result by modifying the loss function.
4. The system of claim 2, wherein: the second module is further used for processing the 3D heatmaps to add kinematic constraints to the prediction result; each 3D heatmap predicted by the network represents the probability distribution of a single joint point, and the position of its maximum value is the position of that joint point; the joint positions are obtained from the 3D heatmaps, and the bone lengths are then calculated according to the correspondence between joint points; a Soft-argmax function is proposed to obtain the coordinates of the joint points from a 3D heatmap in a differentiable way:
Ĵ = soft-argmax(X) = Σ_p softmax(X)(p) · p,
wherein X represents a 3D heatmap of size D × H × W; softmax(·) represents the soft-max function computed over all voxels of X; p ranges over the voxel coordinates; and Ĵ represents the position of the maximum value of the 3D heatmap.
5. The system of claim 2, wherein: the third module further sets a standard range, namely the maximum and minimum lengths, for each bone length according to the training data; compares each bone length obtained from the 3D heatmaps with the set standard range, and penalizes it when it exceeds the maximum length or falls below the minimum length, thereby adding kinematic constraints to the prediction result; the loss function of the hand posture estimation network with kinematic constraints is:
L = L_H + λ_max · L_max + λ_min · L_min,
L_H = Σ_{n=1..N} ‖Ĥ_n − H_n‖²,
L_max = Σ_{m=1..N−1} max(0, l̂_m − l_m^max),
L_min = Σ_{m=1..N−1} max(0, l_m^min − l̂_m),
wherein L is the overall loss function comprising three parts: L_H, L_max, and L_min respectively represent the 3D heatmap constraint, the constraint that a bone length exceeds the maximum length, and the constraint that a bone length falls below the minimum length; N represents the number of joint points and N−1 the number of bones; Ĥ_n and H_n respectively represent the predicted and ground-truth 3D heatmaps; l̂_m is the bone length calculated from the predicted joint coordinates; l_m^max and l_m^min represent the preset longest and shortest bone lengths; and λ_max, λ_min represent the weights of the individual components of the loss function.
CN202010597038.7A 2020-06-28 2020-06-28 Method and system for estimating hand posture by introducing kinematic constraint 3D network Active CN111489379B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010597038.7A CN111489379B (en) 2020-06-28 2020-06-28 Method and system for estimating hand posture by introducing kinematic constraint 3D network


Publications (2)

Publication Number Publication Date
CN111489379A CN111489379A (en) 2020-08-04
CN111489379B true CN111489379B (en) 2020-10-02





Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
CP03 Change of name, title or address

Address after: 211100 floor 3, building 3, Qilin artificial intelligence Industrial Park, 266 Chuangyan Road, Nanjing, Jiangsu

Patentee after: Zhongke Nanjing artificial intelligence Innovation Research Institute

Patentee after: INSTITUTE OF AUTOMATION, CHINESE ACADEMY OF SCIENCES

Address before: 211100 floor 3, building 3, No. 266, Chuangyan Road, Jiangning District, Nanjing City, Jiangsu Province

Patentee before: NANJING ARTIFICIAL INTELLIGENCE CHIP INNOVATION INSTITUTE, INSTITUTE OF AUTOMATION, CHINESE ACADEMY OF SCIENCES

Patentee before: INSTITUTE OF AUTOMATION, CHINESE ACADEMY OF SCIENCES