CN110135340A - 3D hand pose estimation method based on point cloud - Google Patents
3D hand pose estimation method based on point cloud
- Publication number
- CN110135340A (application CN201910402435.1A)
- Authority
- CN
- China
- Prior art keywords
- hand
- point
- hand pose
- point cloud
- network
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data
- G06V40/10—Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
- G06V40/107—Static hand or arm
- G06V40/11—Hand-related biometrics; Hand pose recognition
Landscapes
- Engineering & Computer Science (AREA)
- Human Computer Interaction (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Multimedia (AREA)
- Theoretical Computer Science (AREA)
- Image Analysis (AREA)
Abstract
The invention discloses a high-precision 3D hand pose estimation method, comprising: an end-to-end deep neural network structure that directly processes the hand point cloud data captured by a depth sensor device, avoiding the space waste and computational redundancy caused by converting depth data into voxels or multi-view images; a spatial transform algorithm is devised so that the method is invariant to rotation and translation of the input point cloud data; and, to further reduce the global error caused by errors in fingertip position estimation, a fingertip position refinement algorithm is devised. The present invention can therefore capture complex hand pose variations and estimate an accurate low-dimensional hand pose representation.
Description
Technical field
The present invention relates to the technical fields of computer vision and pose estimation, and in particular to a 3D hand pose estimation method based on point cloud.
Background technique
In recent years, with the wide availability of depth sensors, 3D hand pose estimation based on depth cameras has made remarkable progress. Meanwhile, benefiting from the great success of deep neural networks in computer vision tasks, convolutional neural networks (CNNs) have achieved impressive results on hand pose estimation from depth images. However, CNNs usually take 2D images as network input and cannot directly exploit the 3D information contained in depth images.
Ge et al. proposed encoding the depth image as volumetric 3D data and performing 3D pose estimation with a 3D convolutional neural network (Ge L, Liang H, Yuan J, et al. 3D convolutional neural networks for efficient and robust hand pose estimation from single depth images[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2017: 1991-2000). However, the memory footprint and parameter count of a 3D CNN grow exponentially with the scale of the point cloud data, so the accuracy and real-time performance of such methods are unsatisfactory. At the same time, because 3D point clouds are sparse, the volume usually contains a large amount of empty space (i.e., regions with no data points), causing considerable space waste. Although Ge et al. convert the sparse 3D point cloud into a dense 3D representation before processing it, this not only adds unnecessary computation but also changes the spatial distribution of the original point cloud, leading to suboptimal accuracy.
In addition, Ge et al. also proposed a 3D hand pose regression method based on multi-view convolutional neural networks (Ge L, Liang H, Yuan J, et al. Robust 3D hand pose estimation in single depth images: from single-view CNN to multi-view CNNs[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2016: 3593-3601). This method requires complex data preprocessing: the 3D point cloud must first be projected into 2D images from different viewpoints, after which a convolutional neural network performs the pose regression.
Summary of the invention
The object of the present invention is to provide a 3D hand pose estimation method based on point cloud, with low computational cost and highly accurate results.
The purpose of the present invention is achieved through the following technical solutions:
A 3D hand pose estimation method based on point cloud, comprising:
converting the depth image of a hand into a 3D point cloud and downsampling it;
training a spatial transform network to normalize the downsampled 3D point cloud;
feeding the normalized 3D point cloud and its surface normals into a trained hand pose regression network to predict the hand joint positions, and applying the inverse spatial transform to obtain a preliminary hand pose prediction;
refining the preliminary hand pose prediction with a fingertip refinement network to obtain the final hand pose.
It can be seen from the technical solution provided by the present invention that taking the point cloud data directly as input avoids the spatial redundancy and computational overhead of converting the point cloud into voxels; the spatial transform network makes the method invariant to rotation and translation of the input point cloud; and the fingertip refinement network corrects the fingertip positions, making the prediction more accurate.
Detailed description of the invention
In order to explain the technical solutions of the embodiments of the present invention more clearly, the drawings needed in the description of the embodiments are briefly introduced below. Obviously, the drawings described below are only some embodiments of the present invention; those of ordinary skill in the art can obtain other drawings from them without creative effort.
Fig. 1 is a flowchart of a 3D hand pose estimation method based on point cloud provided by an embodiment of the present invention;
Fig. 2 shows the test results of an embodiment of the present invention on the ICVL, MSRA, and NYU datasets.
Specific embodiment
The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the drawings. Obviously, the described embodiments are only some of the embodiments of the present invention, not all of them. All other embodiments obtained by those of ordinary skill in the art based on the embodiments of the present invention without creative effort fall within the protection scope of the present invention.
An embodiment of the present invention provides a 3D hand pose estimation method based on point cloud. As shown in Fig. 1, it mainly includes the following steps:
Step 1: convert the depth image of the hand into a 3D point cloud and downsample it.
In the embodiment of the present invention, the depth image may be captured by a depth camera, a depth sensor, or other relevant device.
After the depth image is converted into a 3D point cloud, it is downsampled to N points. The downsampled 3D point cloud can be expressed as P = {p_i | i = 1, ..., N}, where p_i denotes the i-th point, corresponding to a 3D coordinate. The value of N can be set according to the actual situation, for example, N = 1024.
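Step 1 can be sketched as follows. This is only a minimal illustration, not the patent's implementation: the pinhole camera intrinsics (fx, fy, cx, cy) and the random-subsampling strategy are assumptions, since the patent does not specify how the depth image is back-projected or how the N = 1024 points are selected.

```python
import numpy as np

def depth_to_point_cloud(depth, fx, fy, cx, cy, num_points=1024, seed=0):
    """Back-project a depth image to a 3D point cloud with a pinhole camera
    model, then randomly downsample it to `num_points` points."""
    v, u = np.nonzero(depth > 0)                 # pixel coordinates with valid depth
    z = depth[v, u]
    x = (u - cx) * z / fx                        # pinhole back-projection
    y = (v - cy) * z / fy
    cloud = np.stack([x, y, z], axis=1)          # shape (M, 3)
    rng = np.random.default_rng(seed)
    idx = rng.choice(len(cloud), size=num_points,
                     replace=len(cloud) < num_points)
    return cloud[idx]                            # shape (num_points, 3)
```

In practice, the hand region would first be segmented from the depth image; here every nonzero depth pixel is treated as part of the hand for simplicity.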
Step 2: train a spatial transform network to normalize the downsampled 3D point cloud.
In the embodiment of the present invention, the network structure disclosed in the paper (Jaderberg M, Simonyan K, Zisserman A. Spatial transformer networks[C]//Advances in Neural Information Processing Systems. 2015: 2017-2025) can be used for reference. The spatial transform network realizes the normalization of the point cloud through an affine transformation (affine transformation); the principle can be expressed as:
(x', y', z', 1)^T = A_θ (x, y, z, 1)^T
where (x, y, z) is the 3D coordinate before the transform and (x', y', z') the 3D coordinate after it, both written here in homogeneous form. A_θ is the affine transformation matrix, corresponding to the prediction result of the spatial transform network, and A_θ^{-1} is the inverse of the affine transformation matrix:
A_θ = [a1 a2 a3 a10; a4 a5 a6 a11; a7 a8 a9 a12; 0 0 0 1]
where the parameters a1~a9 correspond to the 3D rotation transformation and the parameters a10~a12 to the 3D translation transformation.
Multiplying each point in the 3D point cloud by the affine transformation matrix A_θ realizes the normalization of the point cloud:
p̃_i = A_θ p_i, i = 1, ..., N
where p̃_i is the normalization result of point p_i.
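The normalization p̃_i = A_θ p_i and its inverse can be sketched in a few lines. The 4×4 homogeneous matrix layout follows the description above; treating A_θ as a given input is an assumption, since in the method it is predicted by the spatial transform network.

```python
import numpy as np

def normalize_cloud(points, A):
    """Apply the 4x4 affine matrix A_theta (rotation a1..a9, translation
    a10..a12, last row [0, 0, 0, 1]) to an (N, 3) point cloud: p~ = A @ p
    in homogeneous coordinates."""
    homo = np.hstack([points, np.ones((len(points), 1))])  # (N, 4)
    return (homo @ A.T)[:, :3]

def denormalize_cloud(points, A):
    """Inverse spatial transform: map normalized points back to the
    original space with A_theta^{-1}."""
    return normalize_cloud(points, np.linalg.inv(A))
```

Because A_θ is affine and invertible, a normalized cloud mapped back with `denormalize_cloud` recovers the original coordinates, which is exactly the inverse-transform operation used after pose regression in step 3.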
Step 3: to make full use of the depth image information collected by the depth sensor, feed the normalized 3D point cloud and its surface normals into the trained hand pose regression network to predict the hand joint positions, then apply the inverse spatial transform to obtain the preliminary hand pose prediction.
In the embodiment of the present invention, the hand pose regression network is a PointNet++ network, which performs hierarchical feature extraction and pose regression on the 3D point cloud data.
Here, PointNet++ refers to a PointNet network with three set-abstraction levels; for PointNet++, see the paper (Qi C R, Yi L, Su H, et al. PointNet++: Deep hierarchical feature learning on point sets in a metric space[C]//Advances in Neural Information Processing Systems. 2017: 5099-5108). For PointNet, see the paper (Qi C R, Su H, Mo K, et al. PointNet: Deep learning on point sets for 3D classification and segmentation[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2017: 652-660).
In the training stage, suppose there are T normalized training samples {P_t | t = 1, ..., T}. Each training sample P_t contains, for every point, its xyz coordinate and surface normal, and P_t^GT denotes the ground-truth hand joint coordinates. The optimization objective of the training stage is then defined as:
ω* = argmin_ω Σ_{t=1}^{T} || F(P_t; ω) − P_t^GT ||²
where ω denotes the network parameters to be optimized, ω* the optimized network parameters, and F the hand pose regression network PointNet++, whose output is a 3×M matrix, i.e., the normalized 3D coordinates of the M hand joints.
Since the output of the hand pose regression network has passed through the normalization of the spatial transform network, the predicted coordinate values must be inverse-transformed back into the original space. This inverse transform uses the inverse A_θ^{-1} of the affine transformation matrix obtained in step 2.
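The optimization objective above can be written out as a plain sum-of-squared-errors loss. This is only an illustrative sketch: `predict_fn` is a hypothetical stand-in for the PointNet++ regressor F, whose actual architecture the patent delegates to the cited papers.

```python
import numpy as np

def pose_loss(predict_fn, samples, gt_joints):
    """Training objective of the regression stage:
    w* = argmin_w sum_t || F(P_t; w) - P_t^GT ||^2.
    Each prediction and ground truth is a 3 x M array of normalized
    hand joint coordinates."""
    return sum(
        np.sum((predict_fn(P) - gt) ** 2)
        for P, gt in zip(samples, gt_joints)
    )
```

In an actual training loop this scalar would be minimized over ω by stochastic gradient descent; only the objective itself is shown here.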
Step 4: refine the preliminary hand pose prediction with the fingertip refinement network to obtain the final hand pose.
The result of step 3 is only a preliminary prediction. To improve the precision of the hand pose estimate, a fingertip refinement network based on PointNet is designed. It takes as input the K points around each hand joint position predicted by the hand pose regression network (i.e., the result of step 3), found by k-nearest-neighbor search, and corrects the fingertip coordinates accordingly.
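The neighborhood extraction feeding the fingertip refinement network can be sketched with a brute-force k-nearest-neighbor search. The search procedure is an assumption: the patent only states that the K points around each predicted joint are obtained by k-nearest-neighbor search, without fixing an implementation.

```python
import numpy as np

def knn_neighborhood(cloud, joint, k):
    """Gather the K points of `cloud` (shape (N, 3)) nearest to a predicted
    joint position (shape (3,)). These local neighborhoods are the input of
    the fingertip refinement network."""
    dists = np.linalg.norm(cloud - joint, axis=1)  # Euclidean distance to joint
    idx = np.argsort(dists)[:k]                    # indices of the K closest points
    return cloud[idx]                              # shape (k, 3)
```

For large clouds a KD-tree (e.g., `scipy.spatial.cKDTree`) would replace the O(N) scan, but the brute-force version keeps the sketch self-contained.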
The embodiment of the present invention obtains the following beneficial effects:
1) The point cloud data is taken directly as network input; no complex data preprocessing is needed, and the space waste and computational redundancy caused by converting the data into voxels or multi-view images are effectively avoided.
2) The depth image information collected by the depth sensor is fully exploited to train an end-to-end hand pose regression network.
3) Using the idea of spatial transformation, the hand point cloud is normalized, so the method is invariant to rotation and translation of the input point cloud data.
4) The fingertip refinement network corrects the fingertip positions, making the prediction more accurate.
To verify the performance of the above scheme of the embodiment of the present invention, tests were conducted on the ICVL, MSRA, and NYU datasets. The experimental results are shown in Fig. 2; the proposed method achieves good performance on all datasets.
Through the above description of the embodiments, those skilled in the art can clearly understand that the above embodiments can be implemented by software, or by software plus a necessary general hardware platform. Based on this understanding, the technical solution of the above embodiments can be embodied in the form of a software product, which can be stored in a non-volatile storage medium (such as a CD-ROM, USB flash drive, or removable hard disk) and includes instructions that cause a computing device (such as a personal computer, server, or network device) to execute the methods described in the embodiments of the present invention.
The foregoing is only a preferred embodiment of the present invention, but the protection scope of the present invention is not limited thereto. Any change or substitution that can easily be conceived by a person skilled in the art within the technical scope of the present disclosure shall be covered by the protection scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.
Claims (4)
1. A 3D hand pose estimation method based on point cloud, characterized by comprising:
converting the depth image of a hand into a 3D point cloud and downsampling it;
training a spatial transform network to normalize the downsampled 3D point cloud;
feeding the normalized 3D point cloud and its surface normals into a trained hand pose regression network to predict the hand joint positions, and applying the inverse spatial transform to obtain a preliminary hand pose prediction;
refining the preliminary hand pose prediction with a fingertip refinement network to obtain the final hand pose.
2. The 3D hand pose estimation method based on point cloud according to claim 1, characterized in that the trained spatial transform network realizes the normalization of the point cloud through an affine transformation, with the formula:
p̃_i = A_θ p_i, i = 1, ..., N
where p_i denotes the i-th point, N is the number of points in the downsampled 3D point cloud, p̃_i is the normalization result of point p_i, and A_θ is the affine transformation matrix predicted by the spatial transform network.
3. The 3D hand pose estimation method based on point cloud according to claim 2, characterized in that the hand pose regression network is a PointNet++ network, used for hierarchical feature extraction and pose regression;
in the training stage, suppose there are T normalized training samples {P_t | t = 1, ..., T}; each training sample P_t contains the xyz coordinates and surface normals of its points, and P_t^GT denotes the ground-truth hand joint coordinates; the optimization objective of the training stage is:
ω* = argmin_ω Σ_{t=1}^{T} || F(P_t; ω) − P_t^GT ||²
where ω denotes the network parameters to be optimized, ω* the optimized network parameters, and F the hand pose regression network PointNet++, whose output is a 3×M matrix, i.e., the normalized 3D coordinates of the M hand joints;
the predicted hand joint position coordinates are then inverse-transformed back into the original space by the inverse spatial transform operation.
4. The 3D hand pose estimation method based on point cloud according to claim 3, characterized in that the fingertip refinement network takes the K points around the predicted hand joint positions as input to correct the fingertip positions, obtaining the final hand joint coordinates.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910402435.1A CN110135340A (en) | 2019-05-15 | 2019-05-15 | 3D hand pose estimation method based on point cloud |
Publications (1)
Publication Number | Publication Date |
---|---|
CN110135340A true CN110135340A (en) | 2019-08-16 |
Family
ID=67574135
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2015053896A1 (en) * | 2013-10-11 | 2015-04-16 | Intel Corporation | 3d object tracking |
CN107066935A (en) * | 2017-01-25 | 2017-08-18 | 网易(杭州)网络有限公司 | Hand gestures method of estimation and device based on deep learning |
CN108491752A (en) * | 2018-01-16 | 2018-09-04 | 北京航空航天大学 | A kind of hand gestures method of estimation based on hand Segmentation convolutional network |
CN109086683A (en) * | 2018-07-11 | 2018-12-25 | 清华大学 | A kind of manpower posture homing method and system based on cloud semantically enhancement |
Non-Patent Citations (2)
Title |
---|
Jaderberg M et al.: "Spatial transformer networks", Neural Information Processing Systems * |
Liuhao Ge et al.: "Hand PointNet: 3D Hand Pose Estimation using Point Sets", 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition * |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110705487A (en) * | 2019-10-08 | 2020-01-17 | 清华大学深圳国际研究生院 | Palm print acquisition equipment and method and image acquisition device thereof |
CN110705487B (en) * | 2019-10-08 | 2022-07-29 | 清华大学深圳国际研究生院 | Palm print acquisition equipment and method and image acquisition device thereof |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN108764048B (en) | Face key point detection method and device | |
WO2021103648A1 (en) | Hand key point detection method, gesture recognition method, and related devices | |
CN109919984A (en) | A kind of point cloud autoegistration method based on local feature description's | |
CN111951384B (en) | Three-dimensional face reconstruction method and system based on single face picture | |
US20240046557A1 (en) | Method, device, and non-transitory computer-readable storage medium for reconstructing a three-dimensional model | |
CN112819971B (en) | Method, device, equipment and medium for generating virtual image | |
CN112328715B (en) | Visual positioning method, training method of related model, related device and equipment | |
CN111951381B (en) | Three-dimensional face reconstruction system based on single face picture | |
CN111831844A (en) | Image retrieval method, image retrieval device, image retrieval apparatus, and medium | |
WO2021051526A1 (en) | Multi-view 3d human pose estimation method and related apparatus | |
CN105654483A (en) | Three-dimensional point cloud full-automatic registration method | |
Xu et al. | GraspCNN: Real-time grasp detection using a new oriented diameter circle representation | |
CN111797692B (en) | Depth image gesture estimation method based on semi-supervised learning | |
CN111709268B (en) | Human hand posture estimation method and device based on human hand structure guidance in depth image | |
CN110838122A (en) | Point cloud segmentation method and device and computer storage medium | |
CN117218246A (en) | Training method and device for image generation model, electronic equipment and storage medium | |
CN116958420A (en) | High-precision modeling method for three-dimensional face of digital human teacher | |
Qin et al. | PointSkelCNN: Deep Learning‐Based 3D Human Skeleton Extraction from Point Clouds | |
CN110135340A (en) | 3D hand pose estimation method based on point cloud | |
CN113920267B (en) | Three-dimensional scene model construction method, device, equipment and storage medium | |
CN112991445B (en) | Model training method, gesture prediction method, device, equipment and storage medium | |
CN113487713B (en) | Point cloud feature extraction method and device and electronic equipment | |
CN114820899A (en) | Attitude estimation method and device based on multi-view rendering | |
CN110942007B (en) | Method and device for determining hand skeleton parameters, electronic equipment and storage medium | |
Pei et al. | Loop closure in 2d lidar and rgb-d slam |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| PB01 | Publication | |
| SE01 | Entry into force of request for substantive examination | |
| RJ01 | Rejection of invention patent application after publication | Application publication date: 20190816 |