CN110135340A - Point-cloud-based 3D hand pose estimation method - Google Patents

Point-cloud-based 3D hand pose estimation method

Info

Publication number
CN110135340A
CN110135340A CN201910402435.1A
Authority
CN
China
Prior art keywords
hand
point
hand pose
point cloud
network
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201910402435.1A
Other languages
Chinese (zh)
Inventor
邹露
黄章进
张智森
温泉
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
University of Science and Technology of China USTC
Original Assignee
University of Science and Technology of China USTC
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by University of Science and Technology of China USTC filed Critical University of Science and Technology of China USTC
Priority to CN201910402435.1A priority Critical patent/CN110135340A/en
Publication of CN110135340A publication Critical patent/CN110135340A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/107Static hand or arm
    • G06V40/11Hand-related biometrics; Hand pose recognition

Landscapes

  • Engineering & Computer Science (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Theoretical Computer Science (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a high-precision 3D hand pose estimation method, comprising: using an end-to-end deep neural network architecture that directly processes the hand point cloud data captured by a depth sensor, thereby avoiding the space waste and computational redundancy caused by converting depth data into voxels or multi-view images; an innovatively designed spatial transformation algorithm that makes the method invariant to rotation and translation of the input point cloud data; and, to further reduce the global error introduced by errors in fingertip location estimation, an innovatively designed fingertip location refinement algorithm. The invention can therefore capture complex hand pose variations and estimate an accurate low-dimensional hand pose representation.

Description

Point-cloud-based 3D hand pose estimation method
Technical field
The present invention relates to the fields of computer vision and pose estimation, and in particular to a point-cloud-based 3D hand pose estimation method.
Background technique
In recent years, with the wide availability of depth sensors, depth-camera-based 3D hand pose estimation has made significant progress. Meanwhile, benefiting from the great success of deep neural networks in computer vision tasks, convolutional neural networks (CNNs) have achieved impressive results on hand pose estimation from depth images. However, CNNs usually take 2D images as input and therefore cannot directly exploit the 3D information contained in depth images.
Ge et al. proposed encoding the depth image as 3D point cloud data and performing 3D pose estimation on it with a 3D convolutional neural network (Ge L, Liang H, Yuan J, et al. 3D convolutional neural networks for efficient and robust hand pose estimation from single depth images [C]// Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2017: 1991-2000). However, the memory footprint and parameter count of a 3D CNN grow exponentially with the scale of the point cloud data, so both the accuracy and the real-time performance of such methods are unsatisfactory. At the same time, because 3D point clouds are sparse, the representation usually contains a large amount of empty space (regions with no data points), wasting memory. Although Ge et al. convert the sparse 3D point cloud into a dense one before processing it, this not only adds unnecessary computation but also changes the spatial distribution of the original point cloud data, degrading accuracy.
In addition, Ge et al. also proposed a multi-view-CNN-based 3D hand pose regression method (Ge L, Liang H, Yuan J, et al. Robust 3D hand pose estimation in single depth images: from single-view CNN to multi-view CNNs [C]// Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2016: 3593-3601). This method requires complicated data preprocessing: the 3D point cloud must first be projected onto 2D images from different viewpoints, and only then can a convolutional neural network perform pose regression.
Summary of the invention
The object of the present invention is to provide a point-cloud-based 3D hand pose estimation method with low computational cost and high accuracy.
The object of the present invention is achieved through the following technical solution:
A point-cloud-based 3D hand pose estimation method, comprising:
converting a depth image of a hand into a 3D point cloud, and downsampling it;
training a spatial transformation network to normalize the downsampled 3D point cloud;
feeding the normalized 3D point cloud, together with its surface normals, into a trained hand pose regression network to predict the hand joint positions, and applying the inverse spatial transformation to obtain a preliminary hand pose prediction;
refining the preliminary hand pose prediction with a fingertip correction network to obtain the final hand pose.
As can be seen from the above technical solution provided by the invention, taking point cloud data directly as input avoids the spatial redundancy and computational overhead incurred by first converting the point cloud into voxels; the spatial transformation network makes the method invariant to rotation and translation of the input point cloud data; and the fingertip correction network refines the fingertip positions, making the prediction more accurate.
Detailed description of the invention
To describe the technical solution of the embodiments of the present invention more clearly, the drawings required in the description of the embodiments are briefly introduced below. Obviously, the drawings described below are only some embodiments of the invention; those of ordinary skill in the art can derive other drawings from them without creative effort.
Fig. 1 is a flowchart of a point-cloud-based 3D hand pose estimation method provided by an embodiment of the present invention;
Fig. 2 shows the test results, provided by an embodiment of the present invention, on the ICVL, MSRA and NYU datasets.
Specific embodiment
The technical solution in the embodiments of the present invention is described clearly and completely below with reference to the drawings. Obviously, the described embodiments are only some, not all, of the embodiments of the invention. All other embodiments obtained by those of ordinary skill in the art, without creative effort, based on the embodiments of the invention, fall within the protection scope of the present invention.
An embodiment of the present invention provides a point-cloud-based 3D hand pose estimation method, as shown in Fig. 1, which mainly comprises the following steps:
Step 1: convert the depth image of the hand into a 3D point cloud, and downsample it.
In the embodiment of the present invention, the depth image may be an image acquired by a depth camera, a depth sensor, or another relevant device.
After the depth image is converted into a 3D point cloud, it is downsampled to N points; the downsampled 3D point cloud can be expressed as $\{p_i\}_{i=1}^{N}$, where $p_i$ denotes the i-th point, corresponding to one 3D coordinate. The value of N can be set as required, for example N = 1024.
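The depth-to-point-cloud conversion and the downsampling of Step 1 can be sketched as follows. This is a minimal NumPy illustration, not the patent's implementation: the pinhole intrinsics fx, fy, cx, cy and the random-sampling strategy are assumptions, since the patent does not specify them.

```python
import numpy as np

def depth_to_point_cloud(depth, fx, fy, cx, cy, n_points=1024):
    """Back-project a depth image into a 3D point cloud and randomly
    downsample it to a fixed size of n_points points (N = 1024 in the text)."""
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    valid = depth > 0                       # keep only pixels with a depth reading
    z = depth[valid]
    x = (u[valid] - cx) * z / fx            # pinhole camera back-projection
    y = (v[valid] - cy) * z / fy
    points = np.stack([x, y, z], axis=1)    # shape (M, 3)
    # Sample with replacement if fewer than n_points pixels are valid.
    idx = np.random.choice(len(points), n_points, replace=len(points) < n_points)
    return points[idx]
```

Any other fixed-size sampling (e.g. farthest-point sampling) would serve the same purpose of producing the N-point cloud the later networks expect.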
Step 2: train a spatial transformation network to normalize the downsampled 3D point cloud.
In the embodiment of the present invention, the network structure disclosed in the paper (Jaderberg M, Simonyan K, Zisserman A. Spatial transformer networks [C]// Advances in Neural Information Processing Systems. 2015: 2017-2025) can be used for reference. The spatial transformation network normalizes the point cloud through an affine transformation, whose principle can be expressed as:

$$\begin{pmatrix} x_i^t \\ y_i^t \\ z_i^t \end{pmatrix} = A_\theta \begin{pmatrix} x_i^s \\ y_i^s \\ z_i^s \\ 1 \end{pmatrix}, \qquad A_\theta = \begin{pmatrix} a_1 & a_2 & a_3 & a_{10} \\ a_4 & a_5 & a_6 & a_{11} \\ a_7 & a_8 & a_9 & a_{12} \end{pmatrix}$$

In the formula above, $(x_i^s, y_i^s, z_i^s)$ is the 3D coordinate before the transformation and $(x_i^t, y_i^t, z_i^t)$ the 3D coordinate after it; the input point is written in homogeneous form. $A_\theta$ is the affine transformation matrix, corresponding to the prediction of the spatial transformation network, and $A_\theta^{-1}$ denotes the inverse of the affine transformation. The parameters $a_1$ to $a_9$ correspond to the 3D rotation and $a_{10}$ to $a_{12}$ to the 3D translation. Multiplying each point of the 3D point cloud by the affine transformation matrix $A_\theta$ normalizes the point cloud data:

$$\hat{p}_i = A_\theta \, p_i, \qquad i = 1, \dots, N$$

In the formula above, $\hat{p}_i$ is the normalization result of point $p_i$.
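Applying the predicted affine matrix to the whole cloud reduces to one matrix product in homogeneous coordinates; a minimal sketch (here `A_theta` is an arbitrary 3×4 matrix standing in for the network's prediction):

```python
import numpy as np

def normalize_point_cloud(points, A_theta):
    """Normalize an (N, 3) point cloud with a predicted 3x4 affine matrix
    A_theta, applied in homogeneous coordinates as in the formula above."""
    n = points.shape[0]
    homo = np.hstack([points, np.ones((n, 1))])  # (N, 4) homogeneous coords
    return homo @ A_theta.T                      # (N, 3) normalized points
```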
Step 3: to make full use of the depth image information collected by the depth sensor, feed the normalized 3D point cloud, together with its surface normals, into the trained hand pose regression network to predict the hand joint positions, and apply the inverse spatial transformation to obtain the preliminary hand pose prediction.
In the embodiment of the present invention, the hand pose regression network is a PointNet++ network, used to perform layer-by-layer feature abstraction and pose regression on the 3D point cloud data.
Here, PointNet++ refers to a PointNet network with three set-abstraction levels; for PointNet++ see the paper (Qi C R, Yi L, Su H, et al. PointNet++: Deep hierarchical feature learning on point sets in a metric space [C]// Advances in Neural Information Processing Systems. 2017: 5099-5108), and for PointNet see the paper (Qi C R, Su H, Mo K, et al. PointNet: Deep learning on point sets for 3D classification and segmentation [C]// Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2017: 652-660).
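The regression network takes per-point surface normals as an extra input, but the patent does not state how the normals are obtained. One common choice, shown here as an illustrative sketch only, is PCA over each point's k nearest neighbours: the eigenvector of the local covariance with the smallest eigenvalue approximates the surface normal.

```python
import numpy as np

def estimate_normals(points, k=30):
    """Estimate a unit surface normal per point via PCA over its k nearest
    neighbours (an assumed preprocessing step; the patent leaves it open)."""
    # Pairwise distances; fine for small clouds, use a KD-tree for large N.
    dists = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=2)
    knn = np.argsort(dists, axis=1)[:, :k]       # indices of k nearest points
    normals = np.empty_like(points)
    for i, idx in enumerate(knn):
        neigh = points[idx] - points[idx].mean(axis=0)
        cov = neigh.T @ neigh                    # local 3x3 covariance
        eigvals, eigvecs = np.linalg.eigh(cov)   # eigenvalues in ascending order
        normals[i] = eigvecs[:, 0]               # smallest-eigenvalue direction
    return normals
```

Note that PCA normals are only defined up to sign; a consistent orientation (e.g. towards the camera) would have to be enforced separately.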
In the training stage, suppose there are T normalized training samples $\{P_t\}_{t=1}^{T}$; each training sample $P_t$ contains the point coordinates $P_t^{xyz}$ and the surface normals $P_t^{nor}$, and $P_t^{GT}$ denotes the ground-truth hand joint coordinates. The optimization objective of the training stage is then defined as:

$$\omega^* = \arg\min_{\omega} \sum_{t=1}^{T} \left\| \mathcal{F}(P_t;\, \omega) - P_t^{GT} \right\|_2^2$$

where $\omega$ denotes the network parameters to be optimized, $\omega^*$ the optimized network parameters, and $\mathcal{F}$ the hand pose regression network PointNet++, whose output is a 3*M matrix, i.e. the 3D coordinates of the M normalized hand joints.
Since the output of the hand pose regression network has passed through the normalization of the spatial transformation network, the coordinate values must be transformed back to the original space in the training stage. This inverse transformation uses the inverse of the affine transformation matrix obtained in Step 2.
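The inverse spatial transformation and the squared-error objective above can be sketched as follows; extending the 3×4 matrix $A_\theta$ to a 4×4 homogeneous matrix before inverting it is an implementation assumption, not something the patent prescribes.

```python
import numpy as np

def inverse_transform(joints_norm, A_theta):
    """Map predicted joints (M, 3) from the normalized space back to the
    original space using the inverse of the homogeneous form of A_theta."""
    A4 = np.vstack([A_theta, [0.0, 0.0, 0.0, 1.0]])  # 4x4 homogeneous form
    A_inv = np.linalg.inv(A4)
    homo = np.hstack([joints_norm, np.ones((joints_norm.shape[0], 1))])
    return (homo @ A_inv.T)[:, :3]

def pose_loss(pred_joints, gt_joints):
    """Squared-error term for one sample, matching the objective above."""
    return np.sum((pred_joints - gt_joints) ** 2)
```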
Step 4: refine the preliminary hand pose prediction with the fingertip correction network to obtain the final hand pose.
Step 3 above yields only a preliminary prediction. To improve the precision of the hand pose estimate, a PointNet-based fingertip correction network is designed; it takes as input the K points around the hand joint positions predicted by the hand pose regression network (i.e. the result of Step 3), which can be found by a k-nearest-neighbour search, and uses them to refine the fingertip coordinates.
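Gathering the K input points for the fingertip correction network reduces to a nearest-neighbour query around each predicted fingertip; a minimal sketch (K = 64 is an illustrative value, the patent leaves K unspecified):

```python
import numpy as np

def fingertip_neighbourhood(points, fingertip, k=64):
    """Return the k points of the (N, 3) cloud nearest to a predicted
    fingertip position; these form the correction network's input."""
    dists = np.linalg.norm(points - fingertip, axis=1)
    idx = np.argsort(dists)[:k]
    return points[idx]
```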
The embodiment of the present invention achieves the following beneficial effects:
1) Point cloud data is used directly as the network input, so no complicated data preprocessing is required, and the space waste and computational redundancy of first converting the data into voxels or multi-view images are effectively avoided.
2) The depth image information collected by the depth sensor is fully exploited to train an end-to-end hand pose regression network.
3) Spatial transformation is used to normalize the hand point cloud, making the method invariant to rotation and translation of the input point cloud data.
4) The fingertip correction network refines the fingertip positions, making the prediction more accurate.
To verify the performance of the above scheme of the embodiment of the present invention, tests were carried out on the ICVL, MSRA and NYU datasets. The experimental results are shown in Fig. 2; the proposed method achieves good performance on all datasets.
Through the above description of the embodiments, those skilled in the art can clearly understand that the above embodiments can be implemented in software, or in software plus a necessary general-purpose hardware platform. Based on this understanding, the technical solution of the above embodiments can be embodied in the form of a software product, which can be stored in a non-volatile storage medium (such as a CD-ROM, a USB flash drive, or a removable hard disk) and includes instructions causing a computing device (such as a personal computer, a server, or a network device) to execute the methods described in the embodiments of the present invention.
The foregoing is only a preferred embodiment of the present invention, but the protection scope of the present invention is not limited thereto. Any changes or substitutions that can be easily conceived by anyone skilled in the art within the technical scope of the present disclosure shall be covered by the protection scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.

Claims (4)

1. A point-cloud-based 3D hand pose estimation method, characterized by comprising:
converting a depth image of a hand into a 3D point cloud, and downsampling it;
training a spatial transformation network to normalize the downsampled 3D point cloud;
feeding the normalized 3D point cloud, together with its surface normals, into a trained hand pose regression network to predict the hand joint positions, and applying the inverse spatial transformation to obtain a preliminary hand pose prediction;
refining the preliminary hand pose prediction with a fingertip correction network to obtain the final hand pose.
2. The point-cloud-based 3D hand pose estimation method according to claim 1, characterized in that the trained spatial transformation network normalizes the point cloud through an affine transformation, with the formula:

$$\hat{p}_i = A_\theta \, p_i, \qquad i = 1, \dots, N$$

where $p_i$ denotes the i-th point, N is the number of points in the downsampled 3D point cloud, $\hat{p}_i$ is the normalization result of point $p_i$, and $A_\theta$ is the prediction of the spatial transformation network.
3. The point-cloud-based 3D hand pose estimation method according to claim 2, characterized in that the hand pose regression network is a PointNet++ network, used for layer-by-layer feature abstraction and pose regression;
in the training stage, suppose there are T normalized training samples $\{P_t\}_{t=1}^{T}$; each training sample $P_t$ contains the point coordinates $P_t^{xyz}$ and the surface normals $P_t^{nor}$, and $P_t^{GT}$ denotes the ground-truth hand joint coordinates; the optimization objective of the training stage is then:

$$\omega^* = \arg\min_{\omega} \sum_{t=1}^{T} \left\| \mathcal{F}(P_t;\, \omega) - P_t^{GT} \right\|_2^2$$

where $\omega$ denotes the network parameters to be optimized, $\omega^*$ the optimized network parameters, and $\mathcal{F}$ the hand pose regression network PointNet++, whose output is a 3*M matrix, i.e. the 3D coordinates of the M normalized hand joints;
the predicted hand joint position coordinates are then transformed back to the original space through the inverse spatial transformation.
4. The point-cloud-based 3D hand pose estimation method according to claim 3, characterized in that the fingertip correction network takes as input the K points around the hand joint positions and refines the fingertip positions to obtain the final hand joint coordinates.
CN201910402435.1A 2019-05-15 2019-05-15 Point-cloud-based 3D hand pose estimation method Pending CN110135340A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910402435.1A CN110135340A (en) 2019-05-15 2019-05-15 Point-cloud-based 3D hand pose estimation method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910402435.1A CN110135340A (en) 2019-05-15 2019-05-15 Point-cloud-based 3D hand pose estimation method

Publications (1)

Publication Number Publication Date
CN110135340A true CN110135340A (en) 2019-08-16

Family

ID=67574135

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910402435.1A Pending CN110135340A (en) 2019-05-15 2019-05-15 Point-cloud-based 3D hand pose estimation method

Country Status (1)

Country Link
CN (1) CN110135340A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110705487A (en) * 2019-10-08 2020-01-17 清华大学深圳国际研究生院 Palm print acquisition equipment and method and image acquisition device thereof

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2015053896A1 (en) * 2013-10-11 2015-04-16 Intel Corporation 3d object tracking
CN107066935A (en) * 2017-01-25 2017-08-18 网易(杭州)网络有限公司 Hand gestures method of estimation and device based on deep learning
CN108491752A (en) * 2018-01-16 2018-09-04 北京航空航天大学 A kind of hand gestures method of estimation based on hand Segmentation convolutional network
CN109086683A (en) * 2018-07-11 2018-12-25 清华大学 A kind of manpower posture homing method and system based on cloud semantically enhancement

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2015053896A1 (en) * 2013-10-11 2015-04-16 Intel Corporation 3d object tracking
CN107066935A (en) * 2017-01-25 2017-08-18 网易(杭州)网络有限公司 Hand gestures method of estimation and device based on deep learning
CN108491752A (en) * 2018-01-16 2018-09-04 北京航空航天大学 A kind of hand gestures method of estimation based on hand Segmentation convolutional network
CN109086683A (en) * 2018-07-11 2018-12-25 清华大学 A kind of manpower posture homing method and system based on cloud semantically enhancement

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
JADERBERG M 等: ""Spatial transformer networks"", 《NEURAL INFORMATION PROCESSING SYSTEMS》 *
LIUHAO GE 等: ""Hand PointNet: 3D Hand Pose Estimation using Point Sets"", 《2018 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION》 *

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110705487A (en) * 2019-10-08 2020-01-17 清华大学深圳国际研究生院 Palm print acquisition equipment and method and image acquisition device thereof
CN110705487B (en) * 2019-10-08 2022-07-29 清华大学深圳国际研究生院 Palm print acquisition equipment and method and image acquisition device thereof

Similar Documents

Publication Publication Date Title
CN108764048B (en) Face key point detection method and device
WO2021103648A1 (en) Hand key point detection method, gesture recognition method, and related devices
CN109919984A (en) A point cloud automatic registration method based on local feature descriptors
CN111951384B (en) Three-dimensional face reconstruction method and system based on single face picture
US20240046557A1 (en) Method, device, and non-transitory computer-readable storage medium for reconstructing a three-dimensional model
CN112819971B (en) Method, device, equipment and medium for generating virtual image
CN112328715B (en) Visual positioning method, training method of related model, related device and equipment
CN111951381B (en) Three-dimensional face reconstruction system based on single face picture
CN111831844A (en) Image retrieval method, image retrieval device, image retrieval apparatus, and medium
WO2021051526A1 (en) Multi-view 3d human pose estimation method and related apparatus
CN105654483A (en) Three-dimensional point cloud full-automatic registration method
Xu et al. GraspCNN: Real-time grasp detection using a new oriented diameter circle representation
CN111797692B (en) Depth image gesture estimation method based on semi-supervised learning
CN111709268B (en) Human hand posture estimation method and device based on human hand structure guidance in depth image
CN110838122A (en) Point cloud segmentation method and device and computer storage medium
CN117218246A (en) Training method and device for image generation model, electronic equipment and storage medium
CN116958420A (en) High-precision modeling method for three-dimensional face of digital human teacher
Qin et al. PointSkelCNN: Deep Learning‐Based 3D Human Skeleton Extraction from Point Clouds
CN110135340A (en) Point-cloud-based 3D hand pose estimation method
CN113920267B (en) Three-dimensional scene model construction method, device, equipment and storage medium
CN112991445B (en) Model training method, gesture prediction method, device, equipment and storage medium
CN113487713B (en) Point cloud feature extraction method and device and electronic equipment
CN114820899A (en) Attitude estimation method and device based on multi-view rendering
CN110942007B (en) Method and device for determining hand skeleton parameters, electronic equipment and storage medium
Pei et al. Loop closure in 2d lidar and rgb-d slam

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20190816