CN111539288A - Real-time detection method for gestures of both hands - Google Patents
- Publication number
- CN111539288A (application number CN202010301111.1A)
- Authority
- CN
- China
- Prior art keywords
- hand
- joint point
- real
- joint
- time detection
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data
- G06V40/10—Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
- G06V40/107—Static hand or arm
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/048—Activation functions
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/20—Image preprocessing
- G06V10/26—Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion
- G06V10/267—Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion by performing operations on regions, e.g. growing, shrinking or watersheds
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/40—Extraction of image or video features
- G06V10/46—Descriptors for shape, contour or point-related descriptors, e.g. scale invariant feature transform [SIFT] or bags of words [BoW]; Salient regional features
- G06V10/462—Salient features, e.g. scale invariant feature transforms [SIFT]
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02T—CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
- Y02T10/00—Road transport of goods or passengers
- Y02T10/10—Internal combustion engine [ICE] based vehicles
- Y02T10/40—Engine management systems
Abstract
The invention discloses a real-time detection method for two-hand postures. By reconstructing the two-hand postures from 2d joint point positions and 3d joint point positions, the skeleton models of both hands can be reconstructed, and even two-hand postures with complex interaction can be clearly constructed, solving the problem that the prior art cannot detect two-hand postures with complex interaction. At the same time, fitting the 2d joint point positions and the 3d joint point positions reduces the computational difficulty of reconstructing the two-hand skeleton models and improves the reconstruction speed, thereby ensuring real-time detection of the two-hand postures and solving the problem that real-time performance is difficult to achieve in the prior art.
Description
Technical Field
The invention relates to the technical field of gesture detection, in particular to a real-time detection method for gestures of two hands.
Background
The hand plays a critical role in daily human life, and hand gestures carry a large amount of non-verbal communication information, so tracking and reconstruction of hand gestures have become increasingly important. Prediction of 3D hand gestures is a long-standing research direction in computer vision, with numerous applications in virtual/augmented reality (VR/AR), human-computer interaction, human motion tracking and control, and other fields, all of which require real-time and accurate detection of hand gestures.
However, conventional methods for detecting hand posture have the following disadvantages: 1. only two hands with simple gestures can be detected, and two-hand gestures with complex interaction cannot be detected; 2. reconstructing a mesh of the hand posture requires a large amount of computation and substantial hardware resources, making real-time performance difficult to achieve.
Disclosure of Invention
The invention aims to provide a real-time detection method for the posture of two hands, which solves the problems that the posture of two hands with complex interaction cannot be detected and the real-time performance is difficult to realize in the prior art.
The invention is realized by the following technical scheme:
a real-time detection method of double-hand posture is based on a monocular camera and specifically comprises the following steps:
step S1, capturing single-frame images of both hands by a monocular camera, inputting the single-frame images into an image segmentation network for segmentation, and segmenting into segmentation results of three categories including a left hand, a right hand and a background;
step S2, extracting a left-hand heat map comprising the position of the left-hand 2d joint point and a right-hand heat map comprising the position of the right-hand 2d joint point according to the segmentation result;
step S3, calculating the position of a left-hand 3d joint point and the position of a right-hand 3d joint point according to a left-hand heat map comprising the positions of the left-hand 2d joint points and a right-hand heat map comprising the positions of the right-hand 2d joint points;
and step S4, fitting the positions of the left-hand 2d joint points and the positions of the left-hand 3d joint points with the left-hand skeleton model, and fitting the positions of the right-hand 2d joint points and the positions of the right-hand 3d joint points with the right-hand skeleton model to obtain parameters of the left-hand skeleton model and the right-hand skeleton model, so as to obtain the postures of the two hands.
As a further alternative to the real-time detection method of the two-hand posture, the step S1 includes the steps of:
step S11, extracting image features according to the input double-hand single-frame image;
step S12, performing up-sampling operation on the image features to obtain probability graphs of three categories including a left hand, a right hand and a background;
and step S13, obtaining segmentation results of three categories including the left hand, the right hand and the background according to the probability graph including the left hand, the right hand and the background.
As a further alternative to the real-time detection method of two-handed gestures, the image segmentation network comprises a first convolutional layer, a second convolutional layer, and a transposed convolutional layer.
As a further alternative to the real-time detection method of the two-hand posture, the step S11 includes the steps of:
step S111, inputting the two-hand single-frame image into a first convolution layer for down-sampling processing;
in step S112, the downsampled image is input to the second convolution layer and image feature extraction is performed.
As a further alternative to the real-time detection method of the two-hand posture, the step S2 includes the steps of:
step S21, overlapping the segmentation results of the three categories including the left hand, the right hand and the background with the original single-frame image, inputting the result into a two-dimensional joint point extraction network after overlapping, and performing down-sampling processing to obtain posture characteristics;
step S22, performing upsampling processing on the posture features to obtain a left-hand heat map including the position of the left-hand 2d joint point and a right-hand heat map including the position of the right-hand 2d joint point.
As a further alternative to the real-time detection method of two-hand gestures, the two-dimensional joint point extraction network includes a network of Hourglass structures and a third convolutional layer.
As a further alternative to the real-time detection method of the two-hand posture, the step S3 includes the steps of:
step S31, extracting the confidence coefficient of the left-hand 2d joint point and the confidence coefficient of the right-hand 2d joint point according to the left-hand heat map and the right-hand heat map;
step S32, the left hand 2d joint point position and the confidence coefficient of the left hand 2d joint point, and the right hand 2d joint point position and the confidence coefficient of the right hand 2d joint point are input into a three-dimensional joint point extraction network, and the left hand 3d joint point position and the right hand 3d joint point position are obtained.
As a further alternative to the real-time detection method of two-hand gestures, the three-dimensional joint point extraction network comprises a first fully-connected layer, a dual linear module, and a second fully-connected layer.
As a further alternative to the real-time detection method of two-hand gestures, the dual linear module comprises a first dual linear module and a second dual linear module, the first dual linear module and the second dual linear module each comprising two fully-connected layers.
As a further alternative of the real-time detection method of the two-hand posture, the fitting in step S4 is performed by a minimized energy equation including a 2d joint point constraint term, a 3d joint point constraint term, a joint angle constraint term, and a time constraint term.
The invention has the beneficial effects that:
By using the method, two-hand posture reconstruction is performed with the 2d joint point positions and the 3d joint point positions, so the skeleton models of both hands can be reconstructed and even two-hand postures with complex interaction can be clearly constructed, solving the problem that the prior art cannot detect two-hand postures with complex interaction. At the same time, fitting the 2d joint point positions and the 3d joint point positions reduces the computational difficulty of reconstructing the two-hand skeleton models and improves the reconstruction speed, ensuring real-time detection of the two-hand postures and solving the problem that real-time performance is difficult to achieve in the prior art.
Drawings
FIG. 1 is a schematic flow chart of a real-time detection method of two-hand gestures according to the present invention;
FIG. 2 is a schematic diagram illustrating the components of an image segmentation network in the real-time detection method for two-hand gestures according to the present invention;
FIG. 3 is a schematic diagram illustrating a two-dimensional joint extraction network in the real-time detection method of two-hand gestures according to the present invention;
FIG. 4 is a schematic diagram illustrating a three-dimensional joint extraction network in a real-time detection method for two-hand gestures according to the present invention;
description of reference numerals: 1. a first convolutional layer; 2. a second convolutional layer; 3. a transposed convolutional layer; 4. a network of the Hourglass structure; 5. a third convolutional layer; 6. a first fully-connected layer; 7. a first dual linear module; 8. a second dual linear module; 9. a second fully-connected layer.
Detailed Description
The invention will be described in detail with reference to the drawings and specific embodiments, which are illustrative of the invention and are not to be construed as limiting the invention.
As shown in fig. 1 to 4, a real-time detection method for a two-hand gesture, which is based on a monocular camera, specifically includes the following steps:
step S1, capturing single-frame images of both hands by a monocular camera, inputting the single-frame images into an image segmentation network for segmentation, and segmenting into segmentation results of three categories including a left hand, a right hand and a background;
step S2, extracting a left-hand heat map comprising the position of the left-hand 2d joint point and a right-hand heat map comprising the position of the right-hand 2d joint point according to the segmentation result;
step S3, calculating the position of a left-hand 3d joint point and the position of a right-hand 3d joint point according to a left-hand heat map comprising the positions of the left-hand 2d joint points and a right-hand heat map comprising the positions of the right-hand 2d joint points;
and step S4, fitting the positions of the left-hand 2d joint points and the positions of the left-hand 3d joint points with the left-hand skeleton model, and fitting the positions of the right-hand 2d joint points and the positions of the right-hand 3d joint points with the right-hand skeleton model to obtain parameters of the left-hand skeleton model and the right-hand skeleton model, so as to obtain the postures of the two hands.
In this embodiment, two-hand posture reconstruction is performed using the 2d joint point positions and the 3d joint point positions, so the skeleton models of both hands can be reconstructed and even complex interacting two-hand postures can be clearly constructed, solving the problem that the prior art cannot detect complex interacting two-hand postures. At the same time, fitting the 2d joint point positions and the 3d joint point positions reduces the computational difficulty of reconstructing the two-hand skeleton models and increases the reconstruction speed, thereby ensuring real-time detection of the two-hand postures and solving the problem that real-time performance is difficult to achieve in the prior art.
It should be noted that, in the skeleton model of the two hands, each hand includes 21 2d joint points and 21 3d joint points. The joint point at the wrist serves as the root joint point, and each finger has four joint points. The skeleton of each hand has 26 degrees of freedom: 6 at the wrist root joint point and 4 in each finger.
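As a quick consistency check of the joint and degree-of-freedom counts above, the skeleton layout can be sketched as follows (the constant names are illustrative, not from the patent text):

```python
# A minimal sketch of the hand-skeleton parameterisation described above.
NUM_JOINTS  = 21   # 1 wrist (root) joint + 4 joints on each of 5 fingers
ROOT_DOF    = 6    # 3 global translation + 3 global rotation at the wrist
FINGER_DOF  = 4    # articulation angles per finger
NUM_FINGERS = 5

def hand_dof() -> int:
    """Total degrees of freedom of one hand skeleton."""
    return ROOT_DOF + NUM_FINGERS * FINGER_DOF

assert NUM_JOINTS == 1 + NUM_FINGERS * 4   # 21 joints per hand
assert hand_dof() == 26                    # 26 DOF, as stated in the description
```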
Preferably, the step S1 includes the steps of:
step S11, extracting image features according to the input double-hand single-frame image;
step S12, performing up-sampling operation on the image features to obtain probability graphs of three categories including a left hand, a right hand and a background;
and step S13, obtaining segmentation results of three categories including the left hand, the right hand and the background according to the probability graph including the left hand, the right hand and the background.
In this embodiment, a captured single-frame image is input into the image segmentation network to obtain a segmentation image comprising three categories: left hand, right hand and background. Specifically, the image segmentation network first extracts image features through downsampling and then restores them to the original resolution through upsampling; during each upsampling step, the same-resolution features from the downsampling stage are added as input to the next upsampling step, ensuring that features of the original image are not lost.
Preferably, the image segmentation network comprises a first convolutional layer 1, a second convolutional layer 2 and a transposed convolutional layer 3.
In this embodiment, the first convolutional layer 1 is an encoder, the transposed convolutional layer 3 is a decoder, and the resolution of the image is reduced by the encoder and restored by the decoder.
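The encoder/decoder behaviour described above can be illustrated with a minimal numpy sketch in which stride-2 subsampling stands in for the encoder convolutions and nearest-neighbour repetition stands in for the transposed convolutions; the skip additions mirror the description in the preceding paragraphs. This is a structural sketch only, not the patent's network:

```python
import numpy as np

def downsample(x):   # stand-in for a stride-2 convolution (encoder step)
    return x[::2, ::2]

def upsample(x):     # stand-in for a transposed convolution (decoder step)
    return np.repeat(np.repeat(x, 2, axis=0), 2, axis=1)

def segment(img, depth=5):
    """Encoder halves the resolution `depth` times; the decoder doubles it
    back, adding the same-resolution encoder feature at each step (the skip
    connection that preserves original-image detail)."""
    feats = []
    x = img
    for _ in range(depth):
        feats.append(x)
        x = downsample(x)
    for skip in reversed(feats):
        x = upsample(x) + skip   # skip addition, as described in the text
    return x

img = np.random.rand(64, 64)
out = segment(img)
assert out.shape == img.shape    # decoder restores the input resolution
```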
Preferably, the step S11 includes the steps of:
step S111, inputting the two-hand single-frame image into a first convolution layer for down-sampling processing;
in step S112, the downsampled image is input to the second convolution layer and image feature extraction is performed.
In this embodiment, the first convolutional layer 1 comprises five convolutional layers with a convolution kernel size of 3 and a stride of 2, each of which reduces the resolution of its input to half; after five successive reductions, the resolution drops to one thirty-second of the original image. The second convolutional layer 2 has a kernel size of 3 and a stride of 1 and extracts image features. The transposed convolutional layer 3 comprises five convolutional layers with a kernel size of 3 and a stride of 2, each of which doubles the resolution of the input features.
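A small sketch of the resolution arithmetic above, assuming the usual convolution output-size formula with padding 1 (the patent does not state the padding, so this is an illustrative assumption):

```python
def conv_out_size(n, kernel=3, stride=2, pad=1):
    """Spatial size after one convolution (standard floor formula)."""
    return (n + 2 * pad - kernel) // stride + 1

n = 256                       # illustrative input resolution
for _ in range(5):            # five successive stride-2 convolutions
    n = conv_out_size(n)
assert n == 256 // 32         # resolution drops to 1/32 of the original
```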
Preferably, the step S2 includes the steps of:
step S21, overlapping the segmentation results of the three categories including the left hand, the right hand and the background with the original single-frame image, inputting the result into a two-dimensional joint point extraction network after overlapping, and performing down-sampling processing to obtain posture characteristics;
step S22, performing upsampling processing on the posture features to obtain a left-hand heat map including the position of the left-hand 2d joint point and a right-hand heat map including the position of the right-hand 2d joint point.
In this embodiment, the original single-frame image and the segmentation result are superimposed and input into the two-dimensional joint point extraction network. The network first performs downsampling to extract posture features and then performs upsampling to obtain 42 H × W probability maps, where H is the height and W is the width of the original image. Each probability map represents the position of one joint point: the position of the maximum value in a map is the position of the corresponding two-dimensional joint point. The 42 corresponding joint points can thus be extracted from the probability maps, with 21 joint points on the left hand and 21 on the right hand.
Preferably, the two-dimensional joint point extraction network includes a network 4 of a Hourglass structure and a third convolution layer 5.
Preferably, the step S3 includes the steps of:
step S31, extracting the confidence coefficient of the left-hand 2d joint point and the confidence coefficient of the right-hand 2d joint point according to the left-hand heat map and the right-hand heat map;
step S32, the left hand 2d joint point position and the confidence coefficient of the left hand 2d joint point, and the right hand 2d joint point position and the confidence coefficient of the right hand 2d joint point are input into a three-dimensional joint point extraction network, and the left hand 3d joint point position and the right hand 3d joint point position are obtained.
In this embodiment, the position of the point with the largest value in each heat map is the position of the 2d joint point, and this value is the confidence of the 2d joint point prediction, so the confidence of the left-hand 2d joint point position and the left-hand 2d joint point, and the confidence of the right-hand 2d joint point position and the right-hand 2d joint point can be extracted through the heat maps.
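The heat-map read-out described here — argmax position as the 2d joint position, map maximum as the confidence — can be sketched in numpy as follows (array shapes follow the 42-joint description above):

```python
import numpy as np

def joints_from_heatmaps(heatmaps):
    """heatmaps: (42, H, W) array, one map per joint (21 left + 21 right).
    Returns per-joint (x, y) pixel positions and confidences (map maxima)."""
    n, h, w = heatmaps.shape
    flat = heatmaps.reshape(n, -1)
    idx = flat.argmax(axis=1)          # location of each map's peak
    conf = flat.max(axis=1)            # peak value = prediction confidence
    ys, xs = np.unravel_index(idx, (h, w))
    return np.stack([xs, ys], axis=1), conf

hm = np.zeros((42, 8, 8))
hm[0, 3, 5] = 0.9                      # put joint 0's peak at (x=5, y=3)
pos, conf = joints_from_heatmaps(hm)
assert tuple(pos[0]) == (5, 3) and conf[0] == 0.9
```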
Preferably, the three-dimensional joint point extraction network comprises a first fully-connected layer 6, a dual linear module and a second fully-connected layer 9.
Preferably, the dual linear module includes a first dual linear module 7 and a second dual linear module 8, and the first dual linear module 7 and the second dual linear module 8 respectively include two fully connected layers.
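A shape-level sketch of such a lifting network — a first fully-connected layer to 1024 dimensions, two linear modules of two fully-connected layers each with a residual addition, and a final fully-connected layer to 42 × 3 — using random numpy weights. The residual form and any dimensions beyond those stated in the text are assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)

def fc(x, w, b):
    return np.maximum(x @ w + b, 0.0)    # fully-connected layer + ReLU

# Illustrative dimensions: 42 joints x (u, v, confidence) in, 42 x 3 out.
d_in, d_hid, d_out = 42 * 3, 1024, 42 * 3
W1, b1 = rng.standard_normal((d_in, d_hid)) * 0.01, np.zeros(d_hid)
Wm, bm = rng.standard_normal((d_hid, d_hid)) * 0.01, np.zeros(d_hid)
W2, b2 = rng.standard_normal((d_hid, d_out)) * 0.01, np.zeros(d_out)

def lift_to_3d(x):
    h = fc(x, W1, b1)                    # first fully-connected layer
    for _ in range(2):                   # two linear modules, residual add
        h = h + fc(fc(h, Wm, bm), Wm, bm)
    return (h @ W2 + b2).reshape(42, 3)  # second fully-connected layer

joints_2d = rng.random(d_in)             # 2d positions + confidences
assert lift_to_3d(joints_2d).shape == (42, 3)
```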
Preferably, the fitting in step S4 is performed by a minimization energy equation including a 2d joint point constraint term, a 3d joint point constraint term, a joint angle constraint term, and a time constraint term.
Example (b):
Step S1, shooting a single-frame image of both hands with a monocular camera and inputting it into the image segmentation network to obtain segmentation maps of three categories: left hand, right hand and background. Specifically, the image segmentation network first extracts image features through downsampling and then restores them to the original resolution through upsampling, adding the same-resolution features from the downsampling stage during upsampling as input to the next upsampling step, so that features of the original image are not lost. The output of the network is an H × W × 3 probability map, where H is the height and W is the width of the original image; for each point, the values of the three channels are the probabilities of the three categories. An H × W × 1 result is extracted from the probability map, in which the background part has the value 0, the left-hand part the value 1, and the right-hand part the value 2. It should be noted that the segmentation result corresponds to the original image point by point: a point with value 1 in the segmentation result corresponds to a pixel of the left hand in the original image, and a point with value 2 corresponds to a pixel of the right hand. When training the image segmentation network, the cross entropy of the predicted value and the true value is calculated using the following loss function:
L_seg = −∑_{i=1}^{M} S_i · log(Ŝ_i)

where M is the number of categories (3 in the invention), and S_i and Ŝ_i respectively denote the true value and the predicted value of the i-th class segmentation result.
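A minimal numeric check of the pixel-wise cross-entropy described above (averaging over pixels is an assumption; the patent does not state the normalization):

```python
import numpy as np

def seg_cross_entropy(probs, labels, eps=1e-12):
    """probs: (H, W, 3) predicted class probabilities (background/left/right);
    labels: (H, W) integer ground-truth classes. Mean pixel-wise cross-entropy."""
    h, w, _ = probs.shape
    p_true = probs[np.arange(h)[:, None], np.arange(w)[None, :], labels]
    return -np.log(p_true + eps).mean()

probs = np.full((2, 2, 3), 1 / 3)        # uniform prediction over 3 classes
labels = np.array([[0, 1], [2, 0]])
loss = seg_cross_entropy(probs, labels)
assert abs(loss - np.log(3)) < 1e-9      # uniform prediction gives log(3)
```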
Step S2, the original single-frame image and the segmentation result are superimposed to obtain an H × W × 4 feature, which is input into the two-dimensional joint point extraction network. The network first downsamples to extract features and then upsamples to obtain 42 H × W probability maps; each probability map represents the position of one joint point, and the position of the maximum value in a map is the position of the corresponding two-dimensional joint point. The 42 corresponding joint points can thus be extracted from the probability maps, with 21 joint points on the left hand and 21 on the right hand.
In step S3, the position of each joint point is represented by a heat map; the position of the point with the maximum value extracted from the heat map is the predicted 2d position of the joint point, and the maximum value c_i ∈ [0, 1] in the i-th heat map is the confidence of the prediction of the i-th joint point. A batch normalization operation and a sigmoid activation operation are performed after each layer, and the following loss function is adopted in training this step:

L_2d = ∑_{i=1}^{N} ‖u_i − û_i‖²

where N is the number of 2d joint points (42 in the invention), and u_i and û_i respectively denote the true value and the predicted value of the i-th key point;
the position of the left-hand 2d joint points is combined with their confidences, the position of the right-hand 2d joint points is combined with their confidences, and the combined result is input into the three-dimensional joint point extraction network to obtain the positions of the left-hand and right-hand 3d joint points. Specifically, the input vector is first expanded to 1024 dimensions through a fully-connected layer, then passed through two dual linear modules, and finally converted to 42 × 3 through a fully-connected layer, yielding the global positions of the 42 left- and right-hand joint points;
It should be noted that the following loss function is used for training:

L_3d = ∑_{i=1}^{N} ‖J_i − Ĵ_i‖²

where J_i is the true value of the joint point position, Ĵ_i is the predicted value of the joint point position, and N is the number of joint points.
Step S4, a movable skeleton model is fitted to the predicted 2d/3d joint points. The skeleton model of each hand has 26 degrees of freedom: t ∈ R³ and R ∈ SO(3) respectively represent the global position and rotation angle of the root joint point, and θ ∈ R²⁰ represents the finger joint angles. Θ = {t, R, θ} is taken as the parameters of the skeleton model, and the transform M(Θ) ∈ R^{21×3} gives the global positions of the hand joint points. The parameters of the left- and right-hand skeletons are denoted Θ_L and Θ_R, and Θ_H = {Θ_L, Θ_R} denotes the skeleton parameters of both hands. The skeleton model is fitted to the 3d joint points by minimizing the following, where J_i represents the global position of the i-th 3d joint point:

E_3D = ∑_i ‖M(Θ)_i − J_i‖²
in addition, the 2d joint points are used as additional constraints to make the predicted result more fit to the features of the hand in the original image. Fitting the skeleton to the 2d joint point by minimizing the following formula, where uiDenotes the position of the ith 2d joint, pi is used to project the 3d joint onto the 2d plane:
in order to keep the posture of the hand skeleton model normal, it is necessary to ensure that the hand joints do not have large-angle bending, and therefore, limitation needs to be added to the joint angles. Here we only constrain the parameters predicted from the first frame, letAndthe upper limit and the lower limit of the ith joint angle are respectively, and the joint angle is monitored by the following formula:
in order to avoid the excessive change of the hand posture amplitude obtained by reconstruction between adjacent frames, the change rate of the parameters obtained by prediction of two adjacent frames needs to be constrained, as shown in the following formula:
The skeleton fitting process is constrained by the above four formulas, and Θ_H is obtained by minimizing the following energy equation, where each ω_i is the weight of the corresponding term; when predicting the parameters of the first frame, ω₃ is non-zero, and in subsequent predictions ω₃ is 0:

E = ω₁E_3D + ω₂E_2D + ω₃E_a + ω₄E_t
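The weighting scheme — with the angle term active only on the first frame — can be sketched as follows; the weight values are illustrative placeholders, not the patent's:

```python
def total_energy(e3d, e2d, e_ang, e_time, first_frame, w=(1.0, 1.0, 1.0, 1.0)):
    """Weighted sum of the four constraint terms. The joint-angle term is
    applied only when fitting the first frame, as described above."""
    w1, w2, w3, w4 = w
    if not first_frame:
        w3 = 0.0                 # angle constraint switched off after frame 1
    return w1 * e3d + w2 * e2d + w3 * e_ang + w4 * e_time

# The angle term contributes on the first frame and is dropped afterwards.
assert total_energy(1.0, 1.0, 5.0, 1.0, first_frame=True) == 8.0
assert total_energy(1.0, 1.0, 5.0, 1.0, first_frame=False) == 3.0
```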
during training, the left-hand segmentation, the 2d joint point prediction and the 3d joint point prediction tasks are pre-trained respectively, and then the prediction of the 2d/3d joint points is trained end to end.
The technical solutions provided by the embodiments of the present invention are described in detail above, and the principles and embodiments of the present invention are explained herein by using specific examples, and the descriptions of the embodiments are only used to help understanding the principles of the embodiments of the present invention; meanwhile, for a person skilled in the art, according to the embodiments of the present invention, there may be variations in the specific implementation manners and application ranges, and in summary, the content of the present description should not be construed as a limitation to the present invention.
Claims (10)
1. A real-time detection method of bimanual gestures, said method based on a monocular camera, characterized in that: the method specifically comprises the following steps:
step S1, capturing a double-hand single-frame image through a monocular camera, inputting the single-frame image into an image segmentation network for segmentation, and segmenting into segmentation results of three categories including a left hand, a right hand and a background;
step S2, extracting a left-hand heat map comprising the position of the left-hand 2d joint point and a right-hand heat map comprising the position of the right-hand 2d joint point according to the segmentation result;
step S3, calculating the position of a left-hand 3d joint point and the position of a right-hand 3d joint point according to a left-hand heat map comprising the positions of the left-hand 2d joint points and a right-hand heat map comprising the positions of the right-hand 2d joint points;
and step S4, fitting the positions of the left-hand 2d joint points and the positions of the left-hand 3d joint points with the left-hand skeleton model, and fitting the positions of the right-hand 2d joint points and the positions of the right-hand 3d joint points with the right-hand skeleton model to obtain parameters of the left-hand skeleton model and the right-hand skeleton model, so as to obtain the postures of the two hands.
2. The method for real-time detection of bimanual gestures according to claim 1, wherein the step S1 includes the steps of:
step S11, extracting image features according to the input double-hand single-frame image;
step S12, performing up-sampling operation on the image features to obtain probability graphs of three categories including a left hand, a right hand and a background;
and step S13, obtaining segmentation results of three categories including the left hand, the right hand and the background according to the probability graph including the left hand, the right hand and the background.
3. The method for real-time detection of bimanual gestures according to claim 2, wherein the image segmentation network includes a first convolutional layer, a second convolutional layer, and a transposed convolutional layer.
4. The method for real-time detection of bimanual gestures according to claim 3, wherein the step S11 includes the steps of:
step S111, inputting the two-hand single-frame image into a first convolution layer for down-sampling processing;
in step S112, the downsampled image is input to the second convolution layer and image feature extraction is performed.
5. The real-time detection method for two-hand gestures according to claim 1 or 4, wherein step S2 comprises the following steps:
step S21, superimposing the segmentation results of the three classes (left hand, right hand, and background) on the original single-frame image, inputting the superimposed result into a two-dimensional joint point extraction network, and down-sampling it to obtain pose features;
and step S22, up-sampling the pose features to obtain a left-hand heat map containing the left-hand 2d joint point positions and a right-hand heat map containing the right-hand 2d joint point positions.
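As an illustrative sketch of how a 2d joint point position, and its confidence as used in step S31, can be read off a heat map: taking the peak location as the position and the peak value as the confidence is an assumption for this example, not a claimed mechanism:

```python
import numpy as np

def joint_from_heatmap(heatmap: np.ndarray):
    """Return the (row, col) of the heat-map peak and the peak value.

    The peak location serves as the 2d joint point position and the
    peak value as its confidence (an assumed reading of steps S22/S31).
    """
    idx = np.unravel_index(np.argmax(heatmap), heatmap.shape)
    return idx, float(heatmap[idx])

# Synthetic 64x64 heat map with a single peak for one joint.
hm = np.zeros((64, 64))
hm[20, 33] = 0.95
pos, conf = joint_from_heatmap(hm)
```

One such peak is extracted per joint and per hand, giving the 2d positions and confidences that feed the three-dimensional joint point extraction network.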
6. The real-time detection method for two-hand gestures according to claim 5, wherein the two-dimensional joint point extraction network comprises a network with an Hourglass structure and a third convolutional layer.
7. The real-time detection method for two-hand gestures according to claim 6, wherein step S3 comprises the following steps:
step S31, extracting the confidence of each left-hand 2d joint point and each right-hand 2d joint point from the left-hand heat map and the right-hand heat map;
and step S32, inputting the left-hand 2d joint point positions and their confidences, together with the right-hand 2d joint point positions and their confidences, into a three-dimensional joint point extraction network to obtain the left-hand 3d joint point positions and the right-hand 3d joint point positions.
8. The real-time detection method for two-hand gestures according to claim 7, wherein the three-dimensional joint point extraction network comprises a first fully-connected layer, a dual linear module, and a second fully-connected layer.
9. The real-time detection method for two-hand gestures according to claim 8, wherein the dual linear module comprises a first dual linear module and a second dual linear module, each of which comprises two fully-connected layers.
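Claim 9 specifies only that each dual linear module contains two fully-connected layers. A common realization of such a module in 2d-to-3d joint lifting is a residual block of two linear layers; the skip connection and the ReLU activation in this sketch are assumptions, not claimed features:

```python
import numpy as np

def relu(x: np.ndarray) -> np.ndarray:
    return np.maximum(x, 0.0)

def dual_linear_module(x, w1, b1, w2, b2):
    """One module of two fully-connected layers.

    The residual (skip) connection and ReLU are assumptions; the claim
    only specifies two fully-connected layers per module.
    """
    h = relu(x @ w1 + b1)        # first fully-connected layer
    return x + (h @ w2 + b2)     # second fully-connected layer + skip

# Toy usage with an assumed feature width of 8; two such modules
# would be stacked between the first and second fully-connected layers.
rng = np.random.default_rng(0)
d = 8
x = rng.standard_normal(d)
w1, w2 = rng.standard_normal((d, d)), rng.standard_normal((d, d))
b1, b2 = np.zeros(d), np.zeros(d)
y = dual_linear_module(x, w1, b1, w2, b2)
```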
10. The real-time detection method for two-hand gestures according to claim 9, wherein the fitting in step S4 is performed by minimizing an energy function comprising a 2d joint point constraint term, a 3d joint point constraint term, a joint angle constraint term, and a temporal constraint term.
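A toy sketch of an energy function with the four constraint terms named in claim 10. The squared-error form of each term, the angle-range penalty, and the weights are all illustrative assumptions; only the presence of the four terms comes from the claim:

```python
import numpy as np

def energy(p2d, p2d_obs, p3d, p3d_obs, angles, angle_lo, angle_hi,
           pose, pose_prev, w=(1.0, 1.0, 0.1, 0.1)):
    """Weighted sum of 2d, 3d, joint-angle, and temporal terms.

    All array arguments are per-hand model quantities ("_obs" marks the
    detected joint points); the forms and weights are assumptions.
    """
    e2d = np.sum((p2d - p2d_obs) ** 2)        # 2d joint point constraint
    e3d = np.sum((p3d - p3d_obs) ** 2)        # 3d joint point constraint
    # Joint angle constraint: penalize angles outside [angle_lo, angle_hi].
    eang = np.sum(np.maximum(angle_lo - angles, 0.0) ** 2
                  + np.maximum(angles - angle_hi, 0.0) ** 2)
    # Temporal constraint: penalize change from the previous frame's pose.
    etime = np.sum((pose - pose_prev) ** 2)
    return w[0] * e2d + w[1] * e3d + w[2] * eang + w[3] * etime
```

Minimizing this energy over the skeleton model parameters (e.g. with a standard nonlinear least-squares solver) yields the fitted hand postures; the energy is zero when the model exactly matches the observations, keeps its joint angles in range, and does not move between frames.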
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010301111.1A CN111539288B (en) | 2020-04-16 | 2020-04-16 | Real-time detection method for gestures of both hands |
Publications (2)
Publication Number | Publication Date |
---|---|
CN111539288A true CN111539288A (en) | 2020-08-14 |
CN111539288B CN111539288B (en) | 2023-04-07 |
Family
ID=71976803
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202010301111.1A Active CN111539288B (en) | 2020-04-16 | 2020-04-16 | Real-time detection method for gestures of both hands |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN111539288B (en) |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107992858A (en) * | 2017-12-25 | 2018-05-04 | 深圳市唯特视科技有限公司 | A kind of real-time three-dimensional gesture method of estimation based on single RGB frame |
CN109635630A (en) * | 2018-10-23 | 2019-04-16 | 百度在线网络技术(北京)有限公司 | Hand joint point detecting method, device and storage medium |
CN109800676A (en) * | 2018-12-29 | 2019-05-24 | 上海易维视科技股份有限公司 | Gesture identification method and system based on depth information |
CN110287844A (en) * | 2019-06-19 | 2019-09-27 | 北京工业大学 | Traffic police's gesture identification method based on convolution posture machine and long memory network in short-term |
CN110741385A (en) * | 2019-06-26 | 2020-01-31 | Oppo广东移动通信有限公司 | Gesture recognition method and device and location tracking method and device |
CN110837778A (en) * | 2019-10-12 | 2020-02-25 | 南京信息工程大学 | Traffic police command gesture recognition method based on skeleton joint point sequence |
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112233222A (en) * | 2020-09-29 | 2021-01-15 | 深圳市易尚展示股份有限公司 | Human body parametric three-dimensional model deformation method based on neural network joint point estimation |
CN113158774A (en) * | 2021-03-05 | 2021-07-23 | 北京华捷艾米科技有限公司 | Hand segmentation method, device, storage medium and equipment |
CN113158774B (en) * | 2021-03-05 | 2023-12-29 | 北京华捷艾米科技有限公司 | Hand segmentation method, device, storage medium and equipment |
Similar Documents
Publication | Title |
---|---|
CN109377530B (en) | Binocular depth estimation method based on depth neural network | |
CN111311729B (en) | Natural scene three-dimensional human body posture reconstruction method based on bidirectional projection network | |
CN111160164B (en) | Action Recognition Method Based on Human Skeleton and Image Fusion | |
CN108537754B (en) | Face image restoration system based on deformation guide picture | |
CN113160375B (en) | Three-dimensional reconstruction and camera pose estimation method based on multi-task learning algorithm | |
CN112330729A (en) | Image depth prediction method and device, terminal device and readable storage medium | |
CN112950471A (en) | Video super-resolution processing method and device, super-resolution reconstruction model and medium | |
CN112329525A (en) | Gesture recognition method and device based on space-time diagram convolutional neural network | |
CN111539288B (en) | Real-time detection method for gestures of both hands | |
CN112837215B (en) | Image shape transformation method based on generation countermeasure network | |
CN111950477A (en) | Single-image three-dimensional face reconstruction method based on video surveillance | |
CN113628348A (en) | Method and equipment for determining viewpoint path in three-dimensional scene | |
CN113221726A (en) | Hand posture estimation method and system based on visual and inertial information fusion | |
CN113283525A (en) | Image matching method based on deep learning | |
CN111951195A (en) | Image enhancement method and device | |
CN113807361A (en) | Neural network, target detection method, neural network training method and related products | |
CN113554039A (en) | Method and system for generating optical flow graph of dynamic image based on multi-attention machine system | |
CN115187638A (en) | Unsupervised monocular depth estimation method based on optical flow mask | |
CN112712019A (en) | Three-dimensional human body posture estimation method based on graph convolution network | |
CN116524121A (en) | Monocular video three-dimensional human body reconstruction method, system, equipment and medium | |
CN115035456A (en) | Video denoising method and device, electronic equipment and readable storage medium | |
CN117612204A (en) | Construction method and system of three-dimensional hand gesture estimator | |
Al Ismaeil et al. | Real-time enhancement of dynamic depth videos with non-rigid deformations | |
CN115909496A (en) | Two-dimensional hand posture estimation method and system based on multi-scale feature fusion network | |
CN115565039A (en) | Monocular input dynamic scene new view synthesis method based on self-attention mechanism |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||