CN111539288A - Real-time detection method for gestures of both hands - Google Patents

Real-time detection method for gestures of both hands

Info

Publication number
CN111539288A
Authority
CN
China
Prior art keywords
hand
joint point
real
joint
time detection
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202010301111.1A
Other languages
Chinese (zh)
Other versions
CN111539288B (en)
Inventor
高成英
李文盛
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
National Sun Yat Sen University
Original Assignee
National Sun Yat Sen University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by National Sun Yat Sen University filed Critical National Sun Yat Sen University
Priority to CN202010301111.1A priority Critical patent/CN111539288B/en
Publication of CN111539288A publication Critical patent/CN111539288A/en
Application granted granted Critical
Publication of CN111539288B publication Critical patent/CN111539288B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/107Static hand or arm
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/048Activation functions
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/20Image preprocessing
    • G06V10/26Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion
    • G06V10/267Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion by performing operations on regions, e.g. growing, shrinking or watersheds
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/40Extraction of image or video features
    • G06V10/46Descriptors for shape, contour or point-related descriptors, e.g. scale invariant feature transform [SIFT] or bags of words [BoW]; Salient regional features
    • G06V10/462Salient features, e.g. scale invariant feature transforms [SIFT]
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02TCLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00Road transport of goods or passengers
    • Y02T10/10Internal combustion engine [ICE] based vehicles
    • Y02T10/40Engine management systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • General Engineering & Computer Science (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Biomedical Technology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Multimedia (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Human Computer Interaction (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a real-time detection method for two-hand gestures. By reconstructing the two-hand gestures from 2d joint point positions and 3d joint point positions, the skeleton models of both hands can be reconstructed, and even two-hand gestures with complex interaction can be clearly constructed, which solves the problem that the prior art cannot detect two-hand gestures with complex interaction. At the same time, fitting the 2d joint point positions and the 3d joint point positions reduces the computational difficulty of reconstructing the two-hand skeleton models and increases the reconstruction speed, thereby ensuring real-time detection of the two-hand gestures and solving the problem that real-time performance is difficult to achieve in the prior art.

Description

Real-time detection method for gestures of both hands
Technical Field
The invention relates to the technical field of gesture detection, in particular to a real-time detection method for gestures of two hands.
Background
The hand plays a very critical role in daily human life, and hand gestures carry a large amount of non-verbal communication information, so tracking and reconstruction of hand gestures have become increasingly important. Prediction of 3D hand gestures is a long-standing research direction in computer vision, with a wide range of applications in virtual/augmented reality (VR/AR), human-computer interaction, and human motion tracking and control, all of which require real-time and accurate detection of hand gestures.
However, conventional methods for detecting hand gestures have the following disadvantages: 1. only two hands with simple gestures can be detected, and two-hand gestures with complex interaction cannot be detected; 2. reconstructing a mesh of the hand gesture requires a large amount of computation and substantial hardware resources, making real-time performance difficult to achieve.
Disclosure of Invention
The invention aims to provide a real-time detection method for the posture of two hands, which solves the problems that the posture of two hands with complex interaction cannot be detected and the real-time performance is difficult to realize in the prior art.
The invention is realized by the following technical scheme:
a real-time detection method of double-hand posture is based on a monocular camera and specifically comprises the following steps:
step S1, capturing single-frame images of both hands by a monocular camera, inputting the single-frame images into an image segmentation network for segmentation, and segmenting into segmentation results of three categories including a left hand, a right hand and a background;
step S2, extracting a left-hand heat map comprising the position of the left-hand 2d joint point and a right-hand heat map comprising the position of the right-hand 2d joint point according to the segmentation result;
step S3, calculating the position of a left-hand 3d joint point and the position of a right-hand 3d joint point according to a left-hand heat map comprising the positions of the left-hand 2d joint points and a right-hand heat map comprising the positions of the right-hand 2d joint points;
and step S4, fitting the positions of the left-hand 2d joint points and the positions of the left-hand 3d joint points with the left-hand skeleton model, and fitting the positions of the right-hand 2d joint points and the positions of the right-hand 3d joint points with the right-hand skeleton model to obtain parameters of the left-hand skeleton model and the right-hand skeleton model, so as to obtain the postures of the two hands.
As a further alternative to the real-time detection method of the two-hand posture, the step S1 includes the steps of:
step S11, extracting image features according to the input double-hand single-frame image;
step S12, performing up-sampling operation on the image features to obtain probability graphs of three categories including a left hand, a right hand and a background;
and step S13, obtaining segmentation results of three categories including the left hand, the right hand and the background according to the probability graph including the left hand, the right hand and the background.
As a further alternative to the real-time detection method of two-handed gestures, the image segmentation network comprises a first convolutional layer, a second convolutional layer, and a transposed convolutional layer.
As a further alternative to the real-time detection method of the two-hand posture, the step S11 includes the steps of:
step S111, inputting the two-hand single-frame image into a first convolution layer for down-sampling processing;
in step S112, the downsampled image is input to the second convolution layer and image feature extraction is performed.
As a further alternative to the real-time detection method of the two-hand posture, the step S2 includes the steps of:
step S21, overlapping the segmentation results of the three categories including the left hand, the right hand and the background with the original single-frame image, inputting the result into a two-dimensional joint point extraction network after overlapping, and performing down-sampling processing to obtain posture characteristics;
step S22, performing upsampling processing on the posture features to obtain a left-hand heat map including the position of the left-hand 2d joint point and a right-hand heat map including the position of the right-hand 2d joint point.
As a further alternative to the real-time detection method of two-hand gestures, the two-dimensional joint point extraction network includes a network of Hourglass structures and a third convolutional layer.
As a further alternative to the real-time detection method of the two-hand posture, the step S3 includes the steps of:
step S31, extracting the confidence coefficient of the left-hand 2d joint point and the confidence coefficient of the right-hand 2d joint point according to the left-hand heat map and the right-hand heat map;
step S32, the left hand 2d joint point position and the confidence coefficient of the left hand 2d joint point, and the right hand 2d joint point position and the confidence coefficient of the right hand 2d joint point are input into a three-dimensional joint point extraction network, and the left hand 3d joint point position and the right hand 3d joint point position are obtained.
As a further alternative to the real-time detection method of two-hand gestures, the three-dimensional joint point extraction network comprises a first fully-connected layer, a dual linear module, and a second fully-connected layer.
As a further alternative to the real-time detection method of two-hand gestures, the dual linear module comprises a first dual linear module and a second dual linear module, the first dual linear module and the second dual linear module each comprising two fully-connected layers.
As a further alternative of the real-time detection method of the two-hand posture, the fitting in step S4 is performed by a minimized energy equation including a 2d joint point constraint term, a 3d joint point constraint term, a joint angle constraint term, and a time constraint term.
The invention has the beneficial effects that:
By using the method, the two-hand gestures are reconstructed from the 2d joint point positions and the 3d joint point positions, so that the skeleton models of both hands can be reconstructed and even two-hand gestures with complex interaction can be clearly constructed, which solves the problem that the prior art cannot detect two-hand gestures with complex interaction. At the same time, fitting the 2d joint point positions and the 3d joint point positions reduces the computational difficulty of reconstructing the two-hand skeleton models and increases the reconstruction speed, thereby ensuring real-time detection of the two-hand gestures and solving the problem that real-time performance is difficult to achieve in the prior art.
Drawings
FIG. 1 is a schematic flow chart of a real-time detection method of two-hand gestures according to the present invention;
FIG. 2 is a schematic diagram illustrating the components of an image segmentation network in the real-time detection method for two-hand gestures according to the present invention;
FIG. 3 is a schematic diagram illustrating a two-dimensional joint extraction network in the real-time detection method of two-hand gestures according to the present invention;
FIG. 4 is a schematic diagram illustrating a three-dimensional joint extraction network in a real-time detection method for two-hand gestures according to the present invention;
Description of reference numerals: 1. first convolutional layer; 2. second convolutional layer; 3. transposed convolutional layer; 4. network of the Hourglass structure; 5. third convolutional layer; 6. first fully-connected layer; 7. first dual linear module; 8. second dual linear module; 9. second fully-connected layer.
Detailed Description
The invention will be described in detail with reference to the drawings and specific embodiments, which are illustrative of the invention and are not to be construed as limiting the invention.
As shown in fig. 1 to 4, a real-time detection method for a two-hand gesture, which is based on a monocular camera, specifically includes the following steps:
step S1, capturing single-frame images of both hands by a monocular camera, inputting the single-frame images into an image segmentation network for segmentation, and segmenting into segmentation results of three categories including a left hand, a right hand and a background;
step S2, extracting a left-hand heat map comprising the position of the left-hand 2d joint point and a right-hand heat map comprising the position of the right-hand 2d joint point according to the segmentation result;
step S3, calculating the position of a left-hand 3d joint point and the position of a right-hand 3d joint point according to a left-hand heat map comprising the positions of the left-hand 2d joint points and a right-hand heat map comprising the positions of the right-hand 2d joint points;
and step S4, fitting the positions of the left-hand 2d joint points and the positions of the left-hand 3d joint points with the left-hand skeleton model, and fitting the positions of the right-hand 2d joint points and the positions of the right-hand 3d joint points with the right-hand skeleton model to obtain parameters of the left-hand skeleton model and the right-hand skeleton model, so as to obtain the postures of the two hands.
In this embodiment, the two-hand gestures are reconstructed from the 2d joint point positions and the 3d joint point positions, so that the skeleton models of both hands can be reconstructed and even two-hand gestures with complex interaction can be clearly constructed, which solves the problem that the prior art cannot detect two-hand gestures with complex interaction. At the same time, fitting the 2d joint point positions and the 3d joint point positions reduces the computational difficulty of reconstructing the two-hand skeleton models and increases the reconstruction speed, thereby ensuring real-time detection of the two-hand gestures and solving the problem that real-time performance is difficult to achieve in the prior art.
It should be noted that in the skeleton model of the two hands, each hand includes 21 2d joint points and 21 3d joint points, with the joint point at the wrist serving as the root joint point and four joint points on each finger. The skeleton of each hand has 26 degrees of freedom: 6 degrees of freedom at the wrist root joint point and 4 degrees of freedom in each finger.
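For illustration only, the skeleton parameterization described above (per hand: 6 degrees of freedom at the wrist root plus 4 per finger, 26 in total) could be organized as in the following sketch; the field names and the NumPy representation are assumptions, not part of the patent.

```python
import numpy as np
from dataclasses import dataclass, field

@dataclass
class HandSkeletonParams:
    """Parameters Θ = {t, R, θ} of one hand: 6 root DOF + 5 fingers x 4 DOF = 26 DOF."""
    t: np.ndarray = field(default_factory=lambda: np.zeros(3))       # global wrist position (3 DOF)
    R: np.ndarray = field(default_factory=lambda: np.zeros(3))       # global wrist rotation, axis-angle (3 DOF)
    theta: np.ndarray = field(default_factory=lambda: np.zeros(20))  # finger joint angles, 4 per finger (20 DOF)

    def as_vector(self) -> np.ndarray:
        """Flatten to a 26-dimensional parameter vector [t, R, theta]."""
        return np.concatenate([self.t, self.R, self.theta])

# Both hands: Θ_H = {Θ_L, Θ_R}, 52 parameters in total
left, right = HandSkeletonParams(), HandSkeletonParams()
theta_H = np.concatenate([left.as_vector(), right.as_vector()])
assert theta_H.shape == (52,)
```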
Preferably, the step S1 includes the steps of:
step S11, extracting image features according to the input double-hand single-frame image;
step S12, performing up-sampling operation on the image features to obtain probability graphs of three categories including a left hand, a right hand and a background;
and step S13, obtaining segmentation results of three categories including the left hand, the right hand and the background according to the probability graph including the left hand, the right hand and the background.
In this embodiment, the captured single-frame image is input into the image segmentation network to obtain a segmentation map of the three categories of left hand, right hand and background. Specifically, the image segmentation network first extracts image features through downsampling and then restores them to the original resolution through upsampling; during upsampling, the features from the downsampling stage with the same resolution are added and used as the input of the next upsampling step, which ensures that features of the original image are not lost.
Preferably, the image segmentation network comprises a first convolutional layer 1, a second convolutional layer 2 and a transposed convolutional layer 3.
In this embodiment, the first convolutional layer 1 is an encoder, the transposed convolutional layer 3 is a decoder, and the resolution of the image is reduced by the encoder and restored by the decoder.
Preferably, the step S11 includes the steps of:
step S111, inputting the two-hand single-frame image into a first convolution layer for down-sampling processing;
in step S112, the downsampled image is input to the second convolution layer and image feature extraction is performed.
In this embodiment, the first convolutional layer 1 consists of five convolutional layers with a kernel size of 3 and a stride of 2, each of which halves the resolution of its input, so that after five successive downsamplings the resolution is reduced to one thirty-second of the original image. The second convolutional layer 2 has a kernel size of 3 and a stride of 1 and extracts image features. The transposed convolutional layer 3 consists of five transposed convolutional layers with a kernel size of 3 and a stride of 2, each of which doubles the resolution of the input features.
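As a minimal illustrative sketch (not the patented implementation) of such an encoder-decoder segmentation network in PyTorch: five stride-2 convolutions bring the resolution down to 1/32, a stride-1 convolution extracts features, and five stride-2 transposed convolutions restore the resolution, with same-resolution encoder features added during upsampling. The channel widths, activation functions and padding choices are assumptions not specified in the patent.

```python
import torch
import torch.nn as nn

class HandSegNet(nn.Module):
    """Encoder-decoder segmentation network: H x W x 3 image in, per-pixel probabilities
    for the three classes (background, left hand, right hand) out."""
    def __init__(self, ch=(16, 32, 64, 128, 256)):
        super().__init__()
        ins = (3,) + ch[:-1]
        # Encoder: five convolutions, kernel 3, stride 2 -> resolution 1/32 of the input
        self.down = nn.ModuleList([
            nn.Sequential(nn.Conv2d(i, o, 3, stride=2, padding=1), nn.ReLU())
            for i, o in zip(ins, ch)
        ])
        # Feature extraction: kernel 3, stride 1
        self.mid = nn.Sequential(nn.Conv2d(ch[-1], ch[-1], 3, stride=1, padding=1), nn.ReLU())
        # Decoder: five transposed convolutions, kernel 3, stride 2, each doubling the resolution
        outs = ch[::-1][1:] + (3,)
        self.up = nn.ModuleList([
            nn.ConvTranspose2d(i, o, 3, stride=2, padding=1, output_padding=1)
            for i, o in zip(ch[::-1], outs)
        ])

    def forward(self, x):                       # x: (N, 3, H, W), H and W divisible by 32
        skips = []
        for layer in self.down:
            x = layer(x)
            skips.append(x)
        x = self.mid(x)
        for layer, skip in zip(self.up[:-1], reversed(skips[:-1])):
            x = torch.relu(layer(x)) + skip     # add same-resolution encoder features (skip connection)
        x = self.up[-1](x)
        return torch.softmax(x, dim=1)          # H x W x 3 probability map

probs = HandSegNet()(torch.randn(1, 3, 256, 256))   # -> (1, 3, 256, 256)
```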
Preferably, the step S2 includes the steps of:
step S21, overlapping the segmentation results of the three categories including the left hand, the right hand and the background with the original single-frame image, inputting the result into a two-dimensional joint point extraction network after overlapping, and performing down-sampling processing to obtain posture characteristics;
step S22, performing upsampling processing on the posture features to obtain a left-hand heat map including the position of the left-hand 2d joint point and a right-hand heat map including the position of the right-hand 2d joint point.
In this embodiment, the original single-frame image and the segmentation result are superimposed and input into the two-dimensional joint point extraction network. The network first performs downsampling to extract posture features and then performs upsampling to obtain 42 probability maps of size H × W, where H is the height of the original image and W is its width. Each probability map represents the position of one joint point: the position of the point with the maximum value in a probability map is the position of the corresponding two-dimensional joint point. The corresponding 42 joint points can thus be extracted from the probability maps, of which 21 belong to the left hand and 21 to the right hand.
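For illustration, a small sketch of how the 2d joint point positions and their confidences could be read off from such heat maps (the maximum value of each map gives the confidence, its location gives the joint position), assuming a PyTorch tensor of 42 heat maps; the function name is hypothetical.

```python
import torch

def joints_from_heatmaps(heatmaps: torch.Tensor):
    """heatmaps: (42, H, W) -> positions (42, 2) as (x, y) pixel coordinates and confidences (42,)."""
    n, h, w = heatmaps.shape
    flat = heatmaps.reshape(n, -1)
    conf, idx = flat.max(dim=1)                       # maximum value of each map = confidence c_i
    ys = torch.div(idx, w, rounding_mode='floor')     # row of the maximum
    xs = idx % w                                      # column of the maximum
    positions = torch.stack([xs, ys], dim=1).float()
    return positions, conf

# positions[:21], conf[:21] -> left hand; positions[21:], conf[21:] -> right hand
positions, conf = joints_from_heatmaps(torch.rand(42, 256, 256))
```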
Preferably, the two-dimensional joint point extraction network includes a network 4 of a Hourglass structure and a third convolution layer 5.
Preferably, the step S3 includes the steps of:
step S31, extracting the confidence coefficient of the left-hand 2d joint point and the confidence coefficient of the right-hand 2d joint point according to the left-hand heat map and the right-hand heat map;
step S32, the left hand 2d joint point position and the confidence coefficient of the left hand 2d joint point, and the right hand 2d joint point position and the confidence coefficient of the right hand 2d joint point are input into a three-dimensional joint point extraction network, and the left hand 3d joint point position and the right hand 3d joint point position are obtained.
In this embodiment, the position of the point with the largest value in each heat map is the position of the 2d joint point, and that value is the confidence of the 2d joint point prediction; therefore the left-hand 2d joint point positions and their confidences, as well as the right-hand 2d joint point positions and their confidences, can all be extracted from the heat maps.
Preferably, the three-dimensional joint point extraction network comprises a first fully-connected layer 6, a dyadic linear module and a second fully-connected layer 9.
Preferably, the dual linear module includes a first dual linear module 7 and a second dual linear module 8, and the first dual linear module 7 and the second dual linear module 8 respectively include two fully connected layers.
Preferably, the fitting in step S4 is performed by a minimization energy equation including a 2d joint point constraint term, a 3d joint point constraint term, a joint angle constraint term, and a time constraint term.
Example (b):
step S1, shooting single-frame images of both hands by a monocular camera, and inputting the shot single-frame images into an image segmentation network to obtain three types of segmentation maps, each of which is: left hand, right hand, and background; specifically, the image segmentation network firstly extracts image features through downsampling, then restores the image features to original pixels through upsampling, and adds the features which are the same as the pixels during the downsampling during the upsampling to be used as the input of the next upsampling, so that the features in the original image can be ensured not to be lost, the output of the network is a probability graph of H W3, wherein H is the height of the original image, W is the width of the original image, in the result of H W, values of three channels corresponding to each point are used as the probabilities of three categories, and the result of H W1 is extracted from the probability graph, in the result, the value of a background part is 0, the value of a left hand part is 1, and the value of a right hand part is 2; it should be noted that, the segmentation result corresponds to the original image one to one, the position where the point with the median value of 1 in the segmentation result is located corresponds to the original image, that is, the pixel point where the left hand is located, and the point with the median value of 2 in the segmentation result corresponds to the pixel point where the right hand is located in the original image; when training the image segmentation network, calculating the cross entropy of the predicted value and the true value by using the following loss function:
Figure BDA0002454017040000081
wherein M represents three categories, 3, S in the present inventioniAnd
Figure BDA0002454017040000082
respectively representing the real value and the predicted value of the ith class segmentation result.
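Purely as an illustrative sketch in PyTorch, assuming the segmentation network outputs an H × W × 3 probability map and the ground truth is a label map with 0 for background, 1 for left hand and 2 for right hand, this cross entropy could be computed per pixel as follows; the helper name and the use of nll_loss are assumptions.

```python
import torch
import torch.nn.functional as F

def segmentation_loss(probs: torch.Tensor, target: torch.Tensor) -> torch.Tensor:
    """probs: (N, 3, H, W) per-pixel class probabilities; target: (N, H, W) labels in {0, 1, 2}.
    Mean per-pixel cross entropy between the predicted and the true segmentation."""
    return F.nll_loss(torch.log(probs.clamp_min(1e-8)), target)

loss = segmentation_loss(torch.softmax(torch.randn(1, 3, 256, 256), dim=1),
                         torch.randint(0, 3, (1, 256, 256)))
```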
Step S2, superposing the original single-frame image and the segmentation result to obtain an H × W × 4 feature, which is input into the two-dimensional joint point extraction network; the network first downsamples to extract features and then upsamples to obtain 42 probability maps of size H × W, where each probability map represents the position of one joint point and the position of the point with the maximum value in a probability map is the position of the corresponding two-dimensional joint point; the corresponding 42 joint points can be extracted from the probability maps, of which 21 belong to the left hand and 21 to the right hand.
Step S3, the position of each joint point is represented by a heat map, and the position of the point with the maximum value is extracted from the heat map, that is, the predicted 2d position of the joint point; the maximum value c_i ∈ [0, 1] of the i-th heat map is the confidence of the prediction of the i-th joint point. A batch normalization operation and a sigmoid activation operation are performed after each layer, and the following loss function is adopted in training this step:

L_2d = Σ_{i=1..N} ‖ u_i − û_i ‖²

where N is the number of 2d joint points (42 in the invention), and u_i and û_i respectively represent the true value and the predicted value of the i-th key point;
The left-hand 2d joint point positions are combined with the confidences of the left-hand 2d joint points, the right-hand 2d joint point positions are combined with the confidences of the right-hand 2d joint points, and the combined result is input into the three-dimensional joint point extraction network to obtain the left-hand 3d joint point positions and the right-hand 3d joint point positions; specifically, the input vector is first expanded to 1024 dimensions by a fully-connected layer, then passed through the two dual linear modules, and finally converted to 42 × 3 by a fully-connected layer, giving the global positions of the 42 left-hand and right-hand joint points;
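A minimal sketch of such a lifting network under stated assumptions: the input is assumed to be the 42 2d joint positions together with their confidences (42 × 3 = 126 values), the first fully-connected layer expands it to 1024 dimensions as described, each "dual linear module" is modeled as two fully-connected layers, each followed by batch normalization and a sigmoid activation (an assumption based on the per-layer operations mentioned in step S3), and a final fully-connected layer outputs the 42 × 3 global joint positions. Only the 1024-dimensional width and the overall layout come from the text; the rest of the wiring is assumed.

```python
import torch
import torch.nn as nn

class DualLinearModule(nn.Module):
    """Two fully-connected layers, each followed by batch normalization and a sigmoid activation."""
    def __init__(self, dim=1024):
        super().__init__()
        self.block = nn.Sequential(
            nn.Linear(dim, dim), nn.BatchNorm1d(dim), nn.Sigmoid(),
            nn.Linear(dim, dim), nn.BatchNorm1d(dim), nn.Sigmoid(),
        )

    def forward(self, x):
        return self.block(x)

class Lift2Dto3D(nn.Module):
    """2d joint positions + confidences (42 x 3 = 126 values) -> 42 x 3 global 3d joint positions."""
    def __init__(self, n_joints=42, dim=1024):
        super().__init__()
        self.n_joints = n_joints
        self.fc_in = nn.Linear(n_joints * 3, dim)   # expand the input vector to 1024 dimensions
        self.dual1 = DualLinearModule(dim)          # first dual linear module
        self.dual2 = DualLinearModule(dim)          # second dual linear module
        self.fc_out = nn.Linear(dim, n_joints * 3)  # convert to 42 x 3 joint positions

    def forward(self, x):                           # x: (batch, 126)
        x = self.fc_in(x)
        x = self.dual2(self.dual1(x))
        return self.fc_out(x).view(-1, self.n_joints, 3)

# joints_2d: (batch, 42, 2), conf: (batch, 42) from the heat maps
joints_2d, conf = torch.rand(4, 42, 2), torch.rand(4, 42)
x = torch.cat([joints_2d, conf.unsqueeze(-1)], dim=-1).flatten(1)
pred_3d = Lift2Dto3D()(x)                           # -> (4, 42, 3)
```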
It should be noted that the following loss function is used for training:

L_3d = Σ_{i=1..N} ‖ J_i − Ĵ_i ‖²

where J_i is the true value of the joint point position, Ĵ_i is the predicted value of the joint point position, and N is the number of joint points.
Step S4, a kinematic skeleton model is fitted to the predicted 2d/3d joint points. The skeleton model of each hand has 26 degrees of freedom: t ∈ R^3 and R ∈ SO(3) respectively represent the global position and rotation of the root joint point, and θ ∈ R^20 represents the finger joint angles. Θ = {t, R, θ} is taken as the parameter of the skeleton model, and the global positions of the hand joint points are obtained through the transformation M(Θ) ∈ R^(21×3). The parameters of the left-hand and right-hand skeletons are denoted Θ_L and Θ_R, and Θ_H = {Θ_L, Θ_R} denotes the skeleton parameters of both hands. The skeleton model is fitted to the 3d joint points by minimizing the following, where J_i represents the global position of the i-th 3d joint point:

E_3D = Σ_i ‖ M(Θ_H)_i − J_i ‖²
In addition, the 2d joint points are used as an additional constraint so that the predicted result better fits the hand features in the original image. The skeleton is fitted to the 2d joint points by minimizing the following formula, where u_i denotes the position of the i-th 2d joint point and Π projects a 3d joint point onto the 2d image plane:

E_2D = Σ_i ‖ Π(M(Θ_H)_i) − u_i ‖²
in order to keep the posture of the hand skeleton model normal, it is necessary to ensure that the hand joints do not have large-angle bending, and therefore, limitation needs to be added to the joint angles. Here we only constrain the parameters predicted from the first frame, let
Figure BDA0002454017040000104
And
Figure BDA0002454017040000105
the upper limit and the lower limit of the ith joint angle are respectively, and the joint angle is monitored by the following formula:
Figure BDA0002454017040000106
In order to avoid excessively large changes in the reconstructed hand posture between adjacent frames, the rate of change of the parameters predicted for two adjacent frames is constrained, as shown in the following formula:

E_T = ‖ Θ_H^t − Θ_H^(t−1) ‖²
The skeleton fitting process is constrained by the above four formulas, and Θ_H is obtained by minimizing the following energy equation, where ω denotes the weight of each term; when predicting the parameters of the first frame ω_3 is non-zero, and in subsequent predictions ω_3 is 0:

E = ω_1·E_3D + ω_2·E_2D + ω_3·E_θ + ω_4·E_T
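For illustration only, a sketch of how this energy could be minimized over the 52 two-hand skeleton parameters by gradient descent with PyTorch autograd, assuming a user-supplied forward kinematics M(Θ_H) -> 42 × 3 joint positions and a projection Π onto the image plane; the Adam optimizer, step count, learning rate and weights are placeholder choices, not values from the patent.

```python
import torch

def fit_skeleton(theta_H, J3d, u2d, Pi, M, theta_min, theta_max, theta_prev=None,
                 w=(1.0, 1.0, 1.0, 1.0), first_frame=True, steps=100, lr=1e-2):
    """Minimize E = w1*E_3D + w2*E_2D + w3*E_theta + w4*E_T over the two-hand skeleton parameters.
    theta_H: (52,) initial parameters; J3d: (42, 3) predicted 3d joints; u2d: (42, 2) predicted 2d joints;
    M: forward kinematics Θ_H -> (42, 3); Pi: projection (42, 3) -> (42, 2);
    theta_min/theta_max: joint-angle limits broadcastable to (2, 20)."""
    theta_H = theta_H.detach().clone().requires_grad_(True)
    opt = torch.optim.Adam([theta_H], lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        joints = M(theta_H)                                  # global 3d joint positions
        E_3D = ((joints - J3d) ** 2).sum()                   # 3d joint point constraint
        E_2D = ((Pi(joints) - u2d) ** 2).sum()               # 2d joint point constraint
        angles = theta_H.view(2, 26)[:, 6:]                  # assumes per-hand layout [t(3), R(3), θ(20)]
        E_ang = (torch.clamp(angles - theta_max, min=0) ** 2
                 + torch.clamp(theta_min - angles, min=0) ** 2).sum()  # joint angle constraint
        E_T = ((theta_H - theta_prev) ** 2).sum() if theta_prev is not None else 0.0  # time constraint
        w3 = w[2] if first_frame else 0.0                    # angle term only for the first frame
        E = w[0] * E_3D + w[1] * E_2D + w3 * E_ang + w[3] * E_T
        E.backward()
        opt.step()
    return theta_H.detach()
```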
During training, the segmentation, 2d joint point prediction and 3d joint point prediction tasks are first pre-trained separately, and then the prediction of the 2d/3d joint points is trained end to end.
The technical solutions provided by the embodiments of the present invention are described in detail above, and the principles and embodiments of the present invention are explained herein by using specific examples, and the descriptions of the embodiments are only used to help understanding the principles of the embodiments of the present invention; meanwhile, for a person skilled in the art, according to the embodiments of the present invention, there may be variations in the specific implementation manners and application ranges, and in summary, the content of the present description should not be construed as a limitation to the present invention.

Claims (10)

1. A real-time detection method of bimanual gestures, said method based on a monocular camera, characterized in that: the method specifically comprises the following steps:
step S1, capturing a double-hand single-frame image through a monocular camera, inputting the single-frame image into an image segmentation network for segmentation, and segmenting into segmentation results of three categories including a left hand, a right hand and a background;
step S2, extracting a left-hand heat map comprising the position of the left-hand 2d joint point and a right-hand heat map comprising the position of the right-hand 2d joint point according to the segmentation result;
step S3, calculating the position of a left-hand 3d joint point and the position of a right-hand 3d joint point according to a left-hand heat map comprising the positions of the left-hand 2d joint points and a right-hand heat map comprising the positions of the right-hand 2d joint points;
and step S4, fitting the positions of the left-hand 2d joint points and the positions of the left-hand 3d joint points with the left-hand skeleton model, and fitting the positions of the right-hand 2d joint points and the positions of the right-hand 3d joint points with the right-hand skeleton model to obtain parameters of the left-hand skeleton model and the right-hand skeleton model, so as to obtain the postures of the two hands.
2. A method of real-time detection of bimanual gestures as claimed in claim 1, further comprising: the step S1 includes the steps of:
step S11, extracting image features according to the input double-hand single-frame image;
step S12, performing up-sampling operation on the image features to obtain probability graphs of three categories including a left hand, a right hand and a background;
and step S13, obtaining segmentation results of three categories including the left hand, the right hand and the background according to the probability graph including the left hand, the right hand and the background.
3. A method of real-time detection of bimanual gestures as claimed in claim 2, further comprising: the image segmentation network includes a first convolutional layer, a second convolutional layer, and a transposed convolutional layer.
4. A method of real-time detection of bimanual gestures as claimed in claim 3, further comprising: the step S11 includes the steps of:
step S111, inputting the two-hand single-frame image into a first convolution layer for down-sampling processing;
in step S112, the downsampled image is input to the second convolution layer and image feature extraction is performed.
5. A method for real-time detection of bimanual gestures according to claim 1 or 4, further comprising: the step S2 includes the steps of:
step S21, overlapping the segmentation results of the three categories including the left hand, the right hand and the background with the original single-frame image, inputting the result into a two-dimensional joint point extraction network after overlapping, and performing down-sampling processing to obtain posture characteristics;
step S22, performing upsampling processing on the posture features to obtain a left-hand heat map including the position of the left-hand 2d joint point and a right-hand heat map including the position of the right-hand 2d joint point.
6. A method of real-time detection of bimanual gestures according to claim 5, further comprising: the two-dimensional joint point extraction network comprises a network of a Hourglass structure and a third convolution layer.
7. The method of claim 6, wherein the real-time detection of the two-hand gesture comprises: the step S3 includes the steps of:
step S31, extracting the confidence coefficient of the left-hand 2d joint point and the confidence coefficient of the right-hand 2d joint point according to the left-hand heat map and the right-hand heat map;
step S32, the left hand 2d joint point position and the confidence coefficient of the left hand 2d joint point, and the right hand 2d joint point position and the confidence coefficient of the right hand 2d joint point are input into a three-dimensional joint point extraction network, and the left hand 3d joint point position and the right hand 3d joint point position are obtained.
8. A method of real-time detection of bimanual gestures according to claim 7, further comprising: the three-dimensional joint point extraction network comprises a first fully-connected layer, a dual linear module and a second fully-connected layer.
9. A method of real-time detection of bimanual gestures as claimed in claim 8, further comprising: the dual linear module comprises a first dual linear module and a second dual linear module, wherein the first dual linear module and the second dual linear module respectively comprise two full connection layers.
10. A method of real-time detection of bimanual gestures according to claim 9, further comprising: the fitting in step S4 is a fitting by a minimized energy equation including a 2d joint point constraint term, a 3d joint point constraint term, a joint angle constraint term, and a time constraint term.
CN202010301111.1A 2020-04-16 2020-04-16 Real-time detection method for gestures of both hands Active CN111539288B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010301111.1A CN111539288B (en) 2020-04-16 2020-04-16 Real-time detection method for gestures of both hands

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010301111.1A CN111539288B (en) 2020-04-16 2020-04-16 Real-time detection method for gestures of both hands

Publications (2)

Publication Number Publication Date
CN111539288A true CN111539288A (en) 2020-08-14
CN111539288B CN111539288B (en) 2023-04-07

Family

ID=71976803

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010301111.1A Active CN111539288B (en) 2020-04-16 2020-04-16 Real-time detection method for gestures of both hands

Country Status (1)

Country Link
CN (1) CN111539288B (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112233222A (en) * 2020-09-29 2021-01-15 深圳市易尚展示股份有限公司 Human body parametric three-dimensional model deformation method based on neural network joint point estimation
CN113158774A (en) * 2021-03-05 2021-07-23 北京华捷艾米科技有限公司 Hand segmentation method, device, storage medium and equipment

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107992858A (en) * 2017-12-25 2018-05-04 深圳市唯特视科技有限公司 A kind of real-time three-dimensional gesture method of estimation based on single RGB frame
CN109635630A (en) * 2018-10-23 2019-04-16 百度在线网络技术(北京)有限公司 Hand joint point detecting method, device and storage medium
CN109800676A (en) * 2018-12-29 2019-05-24 上海易维视科技股份有限公司 Gesture identification method and system based on depth information
CN110287844A (en) * 2019-06-19 2019-09-27 北京工业大学 Traffic police's gesture identification method based on convolution posture machine and long memory network in short-term
CN110741385A (en) * 2019-06-26 2020-01-31 Oppo广东移动通信有限公司 Gesture recognition method and device and location tracking method and device
CN110837778A (en) * 2019-10-12 2020-02-25 南京信息工程大学 Traffic police command gesture recognition method based on skeleton joint point sequence

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107992858A (en) * 2017-12-25 2018-05-04 深圳市唯特视科技有限公司 A kind of real-time three-dimensional gesture method of estimation based on single RGB frame
CN109635630A (en) * 2018-10-23 2019-04-16 百度在线网络技术(北京)有限公司 Hand joint point detecting method, device and storage medium
CN109800676A (en) * 2018-12-29 2019-05-24 上海易维视科技股份有限公司 Gesture identification method and system based on depth information
CN110287844A (en) * 2019-06-19 2019-09-27 北京工业大学 Traffic police's gesture identification method based on convolution posture machine and long memory network in short-term
CN110741385A (en) * 2019-06-26 2020-01-31 Oppo广东移动通信有限公司 Gesture recognition method and device and location tracking method and device
CN110837778A (en) * 2019-10-12 2020-02-25 南京信息工程大学 Traffic police command gesture recognition method based on skeleton joint point sequence

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112233222A (en) * 2020-09-29 2021-01-15 深圳市易尚展示股份有限公司 Human body parametric three-dimensional model deformation method based on neural network joint point estimation
CN113158774A (en) * 2021-03-05 2021-07-23 北京华捷艾米科技有限公司 Hand segmentation method, device, storage medium and equipment
CN113158774B (en) * 2021-03-05 2023-12-29 北京华捷艾米科技有限公司 Hand segmentation method, device, storage medium and equipment

Also Published As

Publication number Publication date
CN111539288B (en) 2023-04-07

Similar Documents

Publication Publication Date Title
CN109377530B (en) Binocular depth estimation method based on depth neural network
CN111311729B (en) Natural scene three-dimensional human body posture reconstruction method based on bidirectional projection network
CN111160164B (en) Action Recognition Method Based on Human Skeleton and Image Fusion
CN108537754B (en) Face image restoration system based on deformation guide picture
CN113160375B (en) Three-dimensional reconstruction and camera pose estimation method based on multi-task learning algorithm
CN112330729A (en) Image depth prediction method and device, terminal device and readable storage medium
CN112950471A (en) Video super-resolution processing method and device, super-resolution reconstruction model and medium
CN112329525A (en) Gesture recognition method and device based on space-time diagram convolutional neural network
CN111539288B (en) Real-time detection method for gestures of both hands
CN112837215B (en) Image shape transformation method based on generation countermeasure network
CN111950477A (en) Single-image three-dimensional face reconstruction method based on video surveillance
CN113628348A (en) Method and equipment for determining viewpoint path in three-dimensional scene
CN113221726A (en) Hand posture estimation method and system based on visual and inertial information fusion
CN113283525A (en) Image matching method based on deep learning
CN111951195A (en) Image enhancement method and device
CN113807361A (en) Neural network, target detection method, neural network training method and related products
CN113554039A (en) Method and system for generating optical flow graph of dynamic image based on multi-attention machine system
CN115187638A (en) Unsupervised monocular depth estimation method based on optical flow mask
CN112712019A (en) Three-dimensional human body posture estimation method based on graph convolution network
CN116524121A (en) Monocular video three-dimensional human body reconstruction method, system, equipment and medium
CN115035456A (en) Video denoising method and device, electronic equipment and readable storage medium
CN117612204A (en) Construction method and system of three-dimensional hand gesture estimator
Al Ismaeil et al. Real-time enhancement of dynamic depth videos with non-rigid deformations
CN115909496A (en) Two-dimensional hand posture estimation method and system based on multi-scale feature fusion network
CN115565039A (en) Monocular input dynamic scene new view synthesis method based on self-attention mechanism

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant