CN110569817B - System and method for realizing gesture recognition based on vision - Google Patents

System and method for realizing gesture recognition based on vision

Info

Publication number
CN110569817B
CN110569817B (application number CN201910865437.4A)
Authority
CN
China
Prior art keywords
finger
hand
coordinates
value
state
Prior art date
Legal status
Active
Application number
CN201910865437.4A
Other languages
Chinese (zh)
Other versions
CN110569817A (en)
Inventor
王敬宇
孙海峰
王晶
戚琦
黄伟亭
任鹏飞
穆正阳
Current Assignee
Beijing University of Posts and Telecommunications
Original Assignee
Beijing University of Posts and Telecommunications
Priority date
Filing date
Publication date
Application filed by Beijing University of Posts and Telecommunications filed Critical Beijing University of Posts and Telecommunications
Priority to CN201910865437.4A priority Critical patent/CN110569817B/en
Publication of CN110569817A publication Critical patent/CN110569817A/en
Application granted granted Critical
Publication of CN110569817B publication Critical patent/CN110569817B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/107 Static hand or arm
    • G06V40/113 Recognition of static hand signs
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/107 Static hand or arm
    • G06V40/117 Biometrics derived from hands

Abstract

The system for realizing gesture recognition based on vision comprises the following modules: a hand detection module, a hand posture estimation module and a gesture recognition module. The method for realizing gesture recognition based on vision comprises the following operation steps: (1) inputting the aligned RGB picture into the hand detection module to obtain the bounding box of the hand; (2) the hand posture estimation module intercepts the corresponding hand part in the depth map to obtain the 3D coordinates of the key joint points of the hand; (3) inputting the 3D coordinates of the key joint points of the hand into the gesture recognition module to obtain a digital gesture code; (4) according to the digital gesture code, similarity measurement is carried out on gestures, so that gesture recognition is realized. The system and the method have good accuracy, real-time performance and robustness.

Description

System and method for realizing gesture recognition based on vision
Technical Field
The invention relates to a system and a method for realizing gesture recognition based on vision, which belong to the technical field of information, in particular to the technical field of computer vision.
Background
With the dramatic improvement of computer computing power and the vigorous development of deep learning, artificial intelligence has shown strong vitality and development prospects in recent years. The emergence of technologies such as face recognition and voice recognition has changed the mode of human-computer interaction, but people are still exploring more accurate and efficient interaction modes that better fit human habits. With the evolution of user interfaces, and especially the rapid development of virtual reality and augmented reality technology, realizing non-contact remote human-computer interaction through gestures is regarded as the most representative and innovative interaction mode of the next generation. In the smart home field, introducing gesture control makes it more convenient to control intelligent household appliances or robots; in the fields of virtual reality and augmented reality, a stronger sense of reality can be obtained through the more natural expression of gesture operation; in the fields of games and education, user experience can be greatly enhanced; and so on.
Early gesture recognition relied on hand-worn devices for information acquisition, such as data gloves and optical marker equipment, which directly detect the spatial positions of the main joints of the hand. The detection effect is good, but the equipment is expensive, so its cost-performance ratio in common application fields is low. Wearing additional equipment can guarantee the accuracy and stability of gesture recognition, but it obscures the natural way in which gestures are expressed and imposes an additional burden on the user.
In recent years, advances in depth imaging and deep learning have led to significant breakthroughs in hand pose estimation based on depth data. First, due to the widespread use of commercial depth cameras (e.g., Microsoft Kinect and Intel RealSense), hand pose estimation technology has almost entirely shifted to using only depth input, where depth information resolves the ambiguity inherent in monocular RGB input. Second, deep learning has fundamentally changed how tasks in the vision field are solved; in particular, the convolutional neural network (CNN) has become one of the most advanced learning frameworks in the image recognition field. The key to its success is the ability to learn the complex appearance of real-world objects from large amounts of labeled data, particularly common visual features that can be used in tasks such as object detection, semantic segmentation, human pose estimation, and many others.
However, most current hand posture estimation solutions based on depth data do not have a dedicated hand detection module. Instead, the point closest to the camera is found on the depth map by traditional image processing, and the area near that point is taken as the hand. Determining the position of the hand in this way gives poor resistance to interference and low robustness, limits the position of the hand to a certain extent, is not suitable for situations in which the front of the hand is occluded, and often yields inaccurate localization when the hand moves over a large range.
Hand detection and 3D hand posture estimation based on depth data, followed by gesture recognition, have therefore become technical problems to be solved urgently in the fields of computer vision, human-computer interaction, virtual technology, and the like.
Disclosure of Invention
In view of this, the present invention is directed to a system and a method for implementing gesture recognition based on computer vision, which achieve the purpose that a user directly interacts with a computer or a virtual scene without the help of a third-party tool, and thus obtain an immersive user experience.
In order to achieve the above object, the present invention provides a system for implementing gesture recognition based on vision, which includes the following modules:
the hand detection module: the function of the module is to obtain the bounding box of the hand from the input aligned RGB picture; the module is modified based on an SSD network and consists of three parts: a basic network sub-module, an additional layer sub-module and a prediction layer sub-module;
the basic network sub-module has the main functions of completing feature extraction, generating a feature map with a larger resolution, and generating a default bounding box with a set size and a set aspect ratio by using the feature map; the submodule is modified based on a VGG16 network, and specifically comprises the following steps: the submodule uses all convolutional layers of the VGG16 network, and two fully-connected layers of the VGG16 network are replaced by two common convolutional layers;
the additional layer submodule consists of a series of convolution layers, two convolution layers form a group, and the submodule has the main function of generating a feature map with smaller resolution and generating a default bounding box with set size and set aspect ratio by using the feature map;
the prediction layer submodule consists of convolution layers, and the submodule has the main function of performing two convolution filtering processes on each feature map, and predicting the position offset of a default boundary box on the feature map and the category confidence of the default boundary box respectively, namely the probability that the default boundary box contains a hand.
The convolution layer for predicting the position offset of the default bounding box consists of 4 × q convolution kernels with the size of 3 × 3 × p, wherein the parameter q is the number of default bounding boxes generated at each point of the feature map, and the parameter p is the number of channels of the feature map;
the convolutional layer predicting the confidence of the default bounding box class consists of c × q convolutional kernels of size 3 × 3 × p, where the parameter c is the total number of classes predicted.
A hand pose estimation module: the function of the module is: utilizing the bounding box of the hand obtained by the hand detection module to carry out data preprocessing on the depth map corresponding to the aligned RGB picture, intercepting the corresponding hand part in the depth map, and inputting it into a hand posture estimation network to obtain the 3D coordinates of the key joint points of the hand; the 3D coordinates of the key joint points are the positions of the joint points in the image coordinate system and can be converted into the camera coordinate system; after the camera is calibrated, the camera coordinate system is the world coordinate system; in order to improve robustness, when the hand detection module does not detect a hand, the hand posture estimation module intercepts points within a certain depth threshold of the depth map as the hand part; the hand pose estimation module directly uses the Resnet18 network to predict the 3D coordinates of the key joint points of the hand in the image coordinate system, i.e., (u, v, d);
a gesture recognition module: based on the output of the hand posture estimation module, namely the 3D coordinates of the key joint points of the hand in the image coordinate system, the module identifies the state of each single finger and the relationships between fingers and outputs a digital gesture code; according to the digital gesture code, similarity measurement is carried out on gestures, so that gesture recognition is realized; the recognition precision mainly depends on the precision of the hand posture estimation module, and the module is strongly robust to the angle between the palm and the camera, the size of the palm, and the like;
the specific content of how the hand detection module generates the feature maps and generates the default bounding boxes with the set sizes and set aspect ratios is as follows:
the hand detection module extracts a plurality of feature maps with different resolutions from each convolution layer, and generates q default bounding boxes with a set size and a set aspect ratio at each point of the feature maps by using the feature maps.
A lower-level feature map has a higher resolution, and the default bounding boxes generated on it are smaller and responsible for detecting small objects; a higher-level feature map has a lower resolution, and the default bounding boxes generated on it are larger and responsible for detecting large objects; combining default bounding boxes of various sizes improves the robustness of the system to the sizes of the detected objects;
the specific content of how the hand posture estimation module intercepts points within a certain depth threshold of the depth map as the hand part when the hand detection module does not detect the hand is as follows:
because the hand detection module may have incomplete hand detection or no hand detection, directly intercepting the depth value in the bounding box of the hand may cause the depth value of the hand to be seriously lost, so that the depth map needs to be preprocessed according to the bounding box of the hand, and a reasonable hand area is intercepted;
the specific method comprises the following steps:
when hand detection is incomplete, the coordinates (u_o, v_o) of the center point of the bounding box of the hand are calculated, and the average depth value d_o of the points in the depth map region corresponding to the bounding box of the hand is calculated, forming the point coordinates (u_o, v_o, d_o) in the image coordinate system; the point (u_o, v_o, d_o) is then converted from the image coordinate system into the camera coordinate system and used as the center of a cubic bounding box of fixed size; the hand posture estimation module intercepts the hand region with this cubic bounding box, keeps the points inside the box at their original values and sets the points outside the box as background points, and then converts the points inside the box back into the image coordinate system as the hand region for hand posture estimation; the size of the cubic bounding box can be set as required so as to suit hands of different shapes;
when the hand is not detected, sampling some points closest to the camera as the area of the hand to carry out hand posture estimation;
the specific content of how the gesture recognition module recognizes the state of each single finger and the interphalangeal relationships according to the 3D coordinates of the key joint points of the hand in the image coordinate system is as follows:
the state of a single finger is determined by the variance and relative values of the x, y, z coordinates of the key joint points on the finger, i.e.:
when the finger state is upward, the variance of the x and z coordinates of the key joint points on the finger is small, and the variance of the y coordinates is large;
when the finger state is bent, the variance of the x coordinates of the key joint points on the finger is small, and the variance of the y and z coordinates is large;
when the finger state is forward (for the four fingers other than the thumb), the variance of the x and y coordinates of the key joint points on the finger is small, and the variance of the z coordinates is large;
when the finger state is sideways (thumb only), the variance of the x and y coordinates of the key joint points on the finger is large, and the variance of the z coordinates is small;
when the finger state is semi-closed, the variances of the y and z coordinates of the key joint points on the finger are large and close to each other, and the variance of the x coordinates is small;
when the finger state is closed, the y coordinates of the key joint points on the finger, from the fingertip to the palm, are not monotonically increasing.
The interphalangeal relationship is determined by the states of the two fingers and the relative coordinates of their corresponding joint points, namely:
when the interphalangeal relationship of the two fingers is combined, the states of both fingers are upward and the difference of the x coordinate values of the corresponding joint points of the two fingers is small;
when the interphalangeal relationship of the two fingers is separated, the difference of the x coordinate values of the corresponding joint points of the two fingers is large;
when the interphalangeal relationship of the two fingers is crossed, the differences of the x coordinates of the corresponding joint points of the two fingers have both positive and negative values;
when the interphalangeal relationship of the two fingers is a loop, the x coordinates of the joint points at the fingertip and near the palm are close, and the x coordinates of the other corresponding joint points are farther apart.
The content of the digital gesture code is:
the digital gesture code is a vector consisting of 12 numbers, specifically: (f_1, f_2, f_3, f_4, f_5, f_12, f_13, f_14, f_15, f_23, f_34, f_45)^T, wherein the element f_i represents the state of a single finger, with subscript i ∈ {1, 2, 3, 4, 5}; specifically, f_1 represents the state of the thumb, f_2 the state of the index finger, f_3 the state of the middle finger, f_4 the state of the ring finger, and f_5 the state of the little finger; the element f_ij represents the relationship between two fingers, with subscripts i ∈ {1, 2, 3, 4, 5} and j ∈ {2, 3, 4, 5}; specifically, f_12 represents the interphalangeal relationship between the thumb and the index finger, f_13 between the thumb and the middle finger, f_14 between the thumb and the ring finger, f_15 between the thumb and the little finger, f_23 between the index finger and the middle finger, f_34 between the middle finger and the ring finger, and f_45 between the ring finger and the little finger;
the specific values are as follows:
f_i = 1 indicates that the finger state is upward; f_i = 2 indicates that the finger state is bent; f_i = 3 indicates that the finger state is forward; f_i = 4 indicates that the finger state is sideways; f_i = 5 indicates that the finger state is semi-closed; f_i = 6 indicates that the finger state is closed; f_i = 0 indicates undefined;
f_ij = 1 indicates that the interphalangeal relationship is separated; f_ij = 2 indicates that the interphalangeal relationship is combined; f_ij = 3 indicates that the interphalangeal relationship is crossed; f_ij = 4 indicates that the interphalangeal relationship is a loop; f_ij = 0 indicates undefined;
the invention also provides a method for realizing gesture recognition based on vision, which comprises the following operation steps:
(1) inputting the aligned RGB pictures into a hand detection module to obtain a hand boundary frame;
(2) the hand posture estimation module carries out data preprocessing on the depth map corresponding to the aligned RGB picture by using the bounding box of the hand, and intercepts the corresponding hand part in the depth map to obtain the 3D coordinates of the key joint points of the hand; in order to improve robustness, when the hand detection module does not detect the hand, points within a certain depth threshold of the depth map are intercepted as the hand part;
(3) inputting the 3D coordinates of the key joint points of the hand part into a gesture recognition module to obtain digital gesture codes;
(4) according to the digital gesture codes, carrying out similarity measurement on the gestures, thereby realizing gesture recognition.
The specific content of step (4) is as follows: similarity measurement is carried out on two gestures by calculating the L1-norm distance d between their two digital gesture codes, as shown in the following formula:
d = Σ_i |x_i − y_i|
in the above formula, x = (x_1, x_2, …, x_n)^T and y = (y_1, y_2, …, y_n)^T are the digital gesture codes of the two gestures, respectively; a smaller d indicates greater similarity between the two gestures, and when d is smaller than a set threshold, the two gestures are judged to be the same, thereby realizing gesture recognition.
In step (1), step (2) and step (3), the RGB picture that is the input of the system, the depth map corresponding to the RGB picture, and the 3D coordinates of the key joint points of the hand that are the output of the system all belong to the same coordinate system, i.e., the image coordinate system, which improves the recognition accuracy and maintains the stability of the accuracy; the 3D coordinates of the key joint points of the hand are the positions of the joint points in the image coordinate system and can be converted into the camera coordinate system; after the camera is calibrated, the camera coordinate system is the world coordinate system;
the invention has the beneficial effects that: by adding the real-time hand detection module, the problem that the hand detection is lost in a hand posture estimation scheme based on a depth map and the hand posture estimation scheme is difficult to apply to a complex scene is solved, particularly, the position of a hand can be accurately determined under the condition that the front of the hand is shielded, and the system has good accuracy, real-time performance and robustness; the invention also provides an implementation scheme of gesture coding, improves the practicability of gesture recognition, and provides a new idea for the landing application of gesture control.
Drawings
FIG. 1 is a block diagram of a system for visually recognizing gestures according to the present invention;
FIG. 2 is a network diagram of a hand detection module of the system for visually recognizing gestures according to the present invention;
FIG. 3 is a schematic diagram of 14 key joint points of a hand in an embodiment of the present invention;
FIG. 4 is a single finger state diagram;
FIG. 5 is a schematic diagram of the interphalangeal relationships;
FIG. 6 is a flow chart of a method for implementing gesture recognition based on vision according to the present invention;
FIG. 7 is a graph showing the results of an experiment according to an embodiment of the present invention;
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is further described in detail with reference to the accompanying drawings.
Referring to fig. 1, a system for visually recognizing gestures according to the present invention is described, the system comprising the following modules:
the hand detection module: the function of this module is to derive the hand bounding box from the input aligned RGB picture (color picture in RGB color space); the module is modified based on the SSD network (for SSD networks see the documents Liu Wei, Anguelov Dragomir, Erhan Dumitru, et al. SSD: Single Shot MultiBox Detector. in: ECCV.2016: 21-37.). Referring to fig. 2, the module consists of three parts: a basic network sub-module, an additional layer sub-module and a prediction layer sub-module;
the basic network sub-module has the main functions of completing feature extraction, generating feature maps with larger resolution, and generating default bounding boxes with set sizes and set aspect ratios from those feature maps; the submodule is modified based on a VGG16 network, and specifically: the submodule uses all the convolutional layers of the VGG16 network and replaces the two fully-connected layers of the VGG16 network with two common convolutional layers (i.e. the Conv6 layer and the Conv7 layer in fig. 2); for the VGG16 network, please refer to the document K. Simonyan and A. Zisserman, "Very deep convolutional networks for large-scale image recognition," arXiv preprint arXiv:1409.1556, 2014.
The additional layer submodule consists of a series of convolution layers, two convolution layers form a group, and the submodule has the main function of generating a feature map with smaller resolution and generating a default bounding box with set size and set aspect ratio by using the feature map;
the prediction layer submodule consists of convolution layers, and the submodule has the main function of performing two convolution filtering processes on each feature map, and predicting the position offset of a default boundary box on the feature map and the category confidence of the default boundary box respectively, namely the probability that the default boundary box contains a hand.
The convolutional layer for predicting the position offset of the default bounding box is composed of 4 × q convolutional kernels with the size of 3 × 3 × p, wherein the parameter q is the number of default bounding boxes (4 or 6 in the embodiment) generated at each point of the feature map, and the parameter p is the number of channels of the feature map;
the convolutional layer predicting the confidence of default bounding box classes consists of c × q convolutional kernels of size 3 × 3 × p, where the parameter c is the total number of classes predicted (2 in the embodiment, i.e., hand and background classes).
A hand pose estimation module: the function of the module is: utilizing the bounding box of the hand obtained by the hand detection module to carry out data preprocessing on the depth map corresponding to the aligned RGB picture, and intercepting the corresponding hand part in the depth map to obtain the 3D coordinates of the key joint points of the hand; the 3D coordinates of the key joint points are the positions of the joint points in the image coordinate system and can be converted into the camera coordinate system; after the camera is calibrated, the camera coordinate system is the world coordinate system; in order to improve robustness, when the hand detection module does not detect a hand, the hand posture estimation module intercepts points within a certain depth threshold of the depth map as the hand part; the hand pose estimation module directly uses the Resnet18 network to predict the 3D coordinates of the key joint points of the hand in the image coordinate system, i.e., (u, v, d); for the Resnet18 network, please refer to the reference: He Kaiming, Zhang Xiangyu, Ren Shaoqing, et al. Deep residual learning for image recognition. In: CVPR. 2016: 770-778.
Referring to fig. 3, in the embodiment there are 14 key joint points of the hand, as shown in fig. 3.
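To make the regression target of the hand pose estimation module concrete, the following is a minimal sketch, assuming a PyTorch/torchvision setup, of how a Resnet18 backbone could be adapted to output the 14 × 3 joint coordinates (u, v, d) from a cropped depth patch; the input size, single-channel first layer and other hyperparameters are illustrative assumptions, not the patent's actual implementation.

```python
# Hypothetical sketch: ResNet18 adapted to regress 14 hand joints as (u, v, d).
import torch
import torch.nn as nn
from torchvision.models import resnet18

NUM_JOINTS = 14

model = resnet18(weights=None)
# The depth crop is a single-channel image, so the first convolution is replaced (assumption).
model.conv1 = nn.Conv2d(1, 64, kernel_size=7, stride=2, padding=3, bias=False)
# The classification head is replaced by a 14*3 regression head.
model.fc = nn.Linear(model.fc.in_features, NUM_JOINTS * 3)

depth_crop = torch.randn(1, 1, 128, 128)             # preprocessed hand depth patch (assumed size)
joints = model(depth_crop).view(-1, NUM_JOINTS, 3)   # (batch, 14, 3) -> (u, v, d) per joint
```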
A gesture recognition module: based on the output of the hand posture estimation module, namely the 3D coordinates of the key joint points of the hand in the image coordinate system, the module identifies the state of each single finger and the relationships between fingers and outputs a digital gesture code; according to the digital gesture code, similarity measurement is carried out on gestures, so that gesture recognition is realized; the recognition precision mainly depends on the precision of the hand posture estimation module, and the module is strongly robust to the angle between the palm and the camera, the size of the palm, and the like;
the specific content of how the hand detection module generates the feature maps and generates the default bounding boxes with the set sizes and set aspect ratios is as follows:
the hand detection module extracts a plurality of feature maps with different resolutions from each convolution layer, and generates q default bounding boxes with a set size and a set aspect ratio at each point of the feature maps by using the feature maps.
The resolution of the feature map of a lower level is higher, and the generated default bounding box is smaller and is responsible for detecting small objects; the resolution of the feature map of a higher level is smaller, the generated default bounding box is larger, and the default bounding box is responsible for detecting large objects and combining the default bounding boxes of various sizes so as to improve the robustness of the system to the sizes of the detected objects;
referring to fig. 2, in the embodiment six feature maps are extracted, i.e., Conv4_3, Conv7, Conv8_2, Conv9_2, Conv10_2 and Conv11_2. For a network input image with a resolution of 300 × 300, the resolutions of the six feature maps are 38 × 38, 19 × 19, 10 × 10, 5 × 5, 3 × 3 and 1 × 1 respectively. On each feature map cell, q default bounding boxes with different aspect ratios are generated; for the six feature maps, q is 4, 6, 6, 6, 4 and 4 respectively, and the aspect ratios are taken from α ∈ {1, 2, 1/2} (when q = 4) or α ∈ {1, 2, 3, 1/2, 1/3} (when q = 6). Each feature map then has its corresponding convolution kernels in the prediction layer sub-module; after the convolution operation is applied to the feature map, the position offset and the category confidence of each default bounding box are obtained.
At each point of the feature map, 2 square default bounding boxes (corresponding to the aspect ratio α = 1) and either 2 (corresponding to α ∈ {2, 1/2}) or 4 (corresponding to α ∈ {2, 3, 1/2, 1/3}) rectangular default bounding boxes are generated. The width of a default bounding box is s_k·√α and its height is s_k/√α, where s_k = s_min + (s_max − s_min)/(m − 1) × (k − 1), m is the total number of the feature maps (here m = 6), s_min = 0.2 and s_max = 0.9, i.e. s_k = 0.2 + 0.14 × (k − 1). When α = 1, one additional square bounding box with side length s'_k = √(s_k · s_{k+1}) is generated.
For a feature map with a resolution of n × n, q default bounding boxes are generated at each point, so one feature map generates n × n × q default bounding boxes; in this embodiment, the 6 feature maps generate 38 × 38 × 4 + 19 × 19 × 6 + 10 × 10 × 6 + 5 × 5 × 6 + 3 × 3 × 4 + 1 × 1 × 4 = 8732 default bounding boxes in total.
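The bookkeeping above can be checked with a short script; the following sketch (plain Python, not part of the patent) computes the scales s_k and the total number of default bounding boxes for the embodiment, with the handling of the last level's extra square scale stated as an assumption.

```python
# Sketch reproducing the default-box scales and counts of the embodiment.
import math

s_min, s_max, m = 0.2, 0.9, 6
feature_map_sizes = [38, 19, 10, 5, 3, 1]   # Conv4_3, Conv7, Conv8_2, Conv9_2, Conv10_2, Conv11_2
boxes_per_cell    = [4, 6, 6, 6, 4, 4]      # q for each feature map

def scale(k):
    # s_k = s_min + (s_max - s_min) / (m - 1) * (k - 1), k = 1..m
    return s_min + (s_max - s_min) / (m - 1) * (k - 1)

for k, (n, q) in enumerate(zip(feature_map_sizes, boxes_per_cell), start=1):
    s_k = scale(k)
    # width = s_k * sqrt(alpha), height = s_k / sqrt(alpha); alpha = 1 adds sqrt(s_k * s_{k+1})
    extra = math.sqrt(s_k * scale(k + 1)) if k < m else math.sqrt(s_k * 1.0)  # last level: assumption
    print(f"level {k}: s_k={s_k:.2f}, extra square scale={extra:.2f}, boxes={n * n * q}")

total = sum(n * n * q for n, q in zip(feature_map_sizes, boxes_per_cell))
print("total default boxes:", total)  # 8732
```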
The specific content of how the hand posture estimation module intercepts points within a certain depth threshold of the depth map as the hand part when the hand detection module does not detect the hand is as follows:
because the hand detection module may have a situation that the hand detection is incomplete (e.g., a part of the hand is missing, but the bounding box of the hand is approximately correct) or the hand cannot be detected, directly intercepting the depth value in the bounding box of the hand may cause the depth value of the hand to be seriously missing, so that the depth map needs to be preprocessed according to the bounding box of the hand, and a reasonable area of the hand is intercepted;
the specific method comprises the following steps:
when hand detection is incomplete, the coordinates (u_o, v_o) of the center point of the bounding box of the hand are calculated, and the average depth value d_o of the points in the depth map region corresponding to the bounding box of the hand is calculated, forming the point coordinates (u_o, v_o, d_o) in the image coordinate system; the point (u_o, v_o, d_o) is then converted from the image coordinate system into the camera coordinate system and used as the center of a cubic bounding box of fixed size; the hand posture estimation module intercepts the hand region with this cubic bounding box, keeps the points inside the box at their original values and sets the points outside the box as background points, and then converts the points inside the box back into the image coordinate system as the hand region for hand posture estimation; the size of the cubic bounding box can be set as required so as to suit hands of different shapes;
when the hand is not detected, sampling some points closest to the camera as the area of the hand to carry out hand posture estimation;
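A minimal sketch of the depth-map preprocessing described above is given below; it assumes a pinhole camera model with known intrinsics (fx, fy, cx, cy) and uses NumPy. The function names, cube size and fallback threshold are illustrative assumptions rather than values fixed by the patent.

```python
# Hypothetical sketch of cropping the hand region from the depth map with a fixed-size cube.
import numpy as np

def crop_hand_region(depth, bbox, fx, fy, cx, cy, cube_mm=250.0):
    """depth: depth map in mm; bbox: (u_min, v_min, u_max, v_max) from the hand detector."""
    u_min, v_min, u_max, v_max = bbox
    u_o, v_o = (u_min + u_max) / 2.0, (v_min + v_max) / 2.0
    patch = depth[int(v_min):int(v_max), int(u_min):int(u_max)]
    d_o = float(np.mean(patch[patch > 0]))          # average depth inside the bounding box

    # Center (u_o, v_o, d_o) converted to camera coordinates (pinhole model).
    x_o = (u_o - cx) * d_o / fx
    y_o = (v_o - cy) * d_o / fy
    half = cube_mm / 2.0

    # Keep points that fall inside the cube around the center; others become background.
    v_idx, u_idx = np.indices(depth.shape)
    x = (u_idx - cx) * depth / fx
    y = (v_idx - cy) * depth / fy
    inside = (np.abs(x - x_o) < half) & (np.abs(y - y_o) < half) \
             & (np.abs(depth - d_o) < half) & (depth > 0)
    return np.where(inside, depth, 0.0)

def fallback_nearest_points(depth, threshold_mm=150.0):
    """Fallback when no hand is detected: keep points within a depth band closest to the camera."""
    valid = depth[depth > 0]
    if valid.size == 0:
        return np.zeros_like(depth)
    d_near = valid.min()
    return np.where((depth > 0) & (depth < d_near + threshold_mm), depth, 0.0)
```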
the specific content of how the gesture recognition module recognizes the state of each single finger and the interphalangeal relationships according to the 3D coordinates of the key joint points of the hand in the image coordinate system is as follows:
referring to fig. 4, fig. 4 shows a single finger state, which sequentially from left to right: upward, forward, sideways, curved, semi-closed, and closed.
The state of a single finger is determined by the variance and relative values of the x, y, z coordinates of the key joint points on the finger, i.e.:
when the finger state is upward, the variance of the x and z coordinates of the key joint points on the finger is small, and the variance of the y coordinates is large;
when the finger state is bent, the variance of the x coordinates of the key joint points on the finger is small, and the variance of the y and z coordinates is large;
when the finger state is forward (for the four fingers other than the thumb), the variance of the x and y coordinates of the key joint points on the finger is small, and the variance of the z coordinates is large;
when the finger state is sideways (thumb only), the variance of the x and y coordinates of the key joint points on the finger is large, and the variance of the z coordinates is small;
when the finger state is semi-closed, the variances of the y and z coordinates of the key joint points on the finger are large and close to each other, and the variance of the x coordinates is small;
when the finger state is closed, the y coordinates of the key joint points on the finger, from the fingertip to the palm, are not monotonically increasing.
Referring to fig. 5, the interphalangeal relationships are shown, from left to right: combined, separated, crossed and loop.
The interphalangeal relationship is determined by the states of the two fingers and the relative coordinates of their corresponding joint points, namely:
when the interphalangeal relationship of the two fingers is combined, the states of both fingers are upward and the difference of the x coordinate values of the corresponding joint points of the two fingers is small;
when the interphalangeal relationship of the two fingers is separated, the difference of the x coordinate values of the corresponding joint points of the two fingers is large;
when the interphalangeal relationship of the two fingers is crossed, the differences of the x coordinates of the corresponding joint points of the two fingers have both positive and negative values;
when the interphalangeal relationship of the two fingers is a loop, the x coordinates of the joint points at the fingertip and near the palm are close, and the x coordinates of the other corresponding joint points are farther apart.
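The variance-based rules above can be expressed as a small rule-based classifier. The following is a minimal sketch that assumes each finger is given as an ordered array of its key joint 3D coordinates (from fingertip to palm) and uses illustrative thresholds that the patent does not specify; it shows the single-finger decision plus one example interphalangeal check.

```python
# Hypothetical sketch of the rule-based single-finger state decision described above.
import numpy as np

def finger_state(joints, is_thumb=False, close=1e-4, large=1e-2):
    """joints: (k, 3) array of (x, y, z) for one finger, ordered fingertip -> palm.
    The thresholds `close`/`large` are illustrative assumptions, not values from the patent."""
    var_x, var_y, var_z = np.var(joints, axis=0)
    y = joints[:, 1]
    if var_x < large and var_z < large and var_y > large:
        return "upward"                       # f_i = 1
    if var_x < large and var_y > large and var_z > large:
        return "bent"                         # f_i = 2
    if not is_thumb and var_x < large and var_y < large and var_z > large:
        return "forward"                      # f_i = 3
    if is_thumb and var_x > large and var_y > large and var_z < large:
        return "sideways"                     # f_i = 4
    if var_y > large and var_z > large and abs(var_y - var_z) < close and var_x < large:
        return "semi-closed"                  # f_i = 5
    if np.any(np.diff(y) < 0):                # y not monotonically increasing fingertip -> palm
        return "closed"                       # f_i = 6
    return "undefined"                        # f_i = 0

def fingers_combined(joints_a, joints_b, tol=10.0):
    """Illustrative check of the 'combined' relation: both fingers upward and the x coordinates
    of corresponding joints differ only slightly (tol is an assumed pixel tolerance)."""
    dx = np.abs(joints_a[:, 0] - joints_b[:, 0])
    return (finger_state(joints_a) == "upward"
            and finger_state(joints_b) == "upward"
            and np.all(dx < tol))
```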
Referring to Table 1, the digital gesture code is:
TABLE 1 (reproduced as an image in the original publication; its contents are restated below)
The digital gesture code is a vector consisting of 12 numbers, specifically: (f_1, f_2, f_3, f_4, f_5, f_12, f_13, f_14, f_15, f_23, f_34, f_45)^T, wherein the element f_i represents the state of a single finger, with subscript i ∈ {1, 2, 3, 4, 5}; specifically, f_1 represents the state of the thumb, f_2 the state of the index finger, f_3 the state of the middle finger, f_4 the state of the ring finger, and f_5 the state of the little finger; the element f_ij represents the relationship between two fingers, with subscripts i ∈ {1, 2, 3, 4, 5} and j ∈ {2, 3, 4, 5}; specifically, f_12 represents the interphalangeal relationship between the thumb and the index finger, f_13 between the thumb and the middle finger, f_14 between the thumb and the ring finger, f_15 between the thumb and the little finger, f_23 between the index finger and the middle finger, f_34 between the middle finger and the ring finger, and f_45 between the ring finger and the little finger;
the specific values are as follows:
f_i = 1 indicates that the finger state is upward; f_i = 2 indicates that the finger state is bent; f_i = 3 indicates that the finger state is forward; f_i = 4 indicates that the finger state is sideways; f_i = 5 indicates that the finger state is semi-closed; f_i = 6 indicates that the finger state is closed; f_i = 0 indicates undefined;
f_ij = 1 indicates that the interphalangeal relationship is separated; f_ij = 2 indicates that the interphalangeal relationship is combined; f_ij = 3 indicates that the interphalangeal relationship is crossed; f_ij = 4 indicates that the interphalangeal relationship is a loop; f_ij = 0 indicates undefined;
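To make the encoding concrete, the following sketch assembles the 12-element digital gesture code as a plain list following the element order and value conventions restated above; the example gesture (a "V"/victory sign) and the helper names are illustrative assumptions, not encodings taken from the patent.

```python
# Hypothetical sketch of assembling a digital gesture code.
# Order: (f1, f2, f3, f4, f5, f12, f13, f14, f15, f23, f34, f45)
FINGER_STATE = {"undefined": 0, "upward": 1, "bent": 2, "forward": 3,
                "sideways": 4, "semi-closed": 5, "closed": 6}
FINGER_RELATION = {"undefined": 0, "separated": 1, "combined": 2, "crossed": 3, "loop": 4}

def make_gesture_code(states, relations):
    """states: 5 single-finger states (thumb .. little finger);
    relations: 7 interphalangeal relations in the order (12, 13, 14, 15, 23, 34, 45)."""
    return [FINGER_STATE[s] for s in states] + [FINGER_RELATION[r] for r in relations]

# Example (assumed): a "V" gesture - index and middle finger upward and separated, others closed.
v_sign = make_gesture_code(
    ["closed", "upward", "upward", "closed", "closed"],
    ["separated", "separated", "undefined", "undefined", "separated", "separated", "undefined"],
)
print(v_sign)  # e.g. [6, 1, 1, 6, 6, 1, 1, 0, 0, 1, 1, 0]
```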
referring to fig. 6, a method for visually recognizing gestures according to the present invention is described, the method comprising the following steps:
(1) inputting the aligned RGB pictures into a hand detection module to obtain a hand boundary frame;
(2) the hand posture estimation module carries out data preprocessing on the depth map corresponding to the aligned RGB picture by using the bounding box of the hand, and intercepts the corresponding hand part in the depth map to obtain the 3D coordinates of the key joint points of the hand; in order to improve robustness, when the hand detection module does not detect the hand, points within a certain depth threshold of the depth map are intercepted as the hand part;
(3) inputting the 3D coordinates of the key joint points of the hand part into a gesture recognition module to obtain digital gesture codes;
(4) according to the digital gesture codes, carrying out similarity measurement on the gestures, thereby realizing gesture recognition.
The specific content of step (4) is as follows: similarity measurement is carried out on two gestures by calculating the L1-norm distance d between their two digital gesture codes, as shown in the following formula:
d = Σ_i |x_i − y_i|
in the above formula, x = (x_1, x_2, …, x_n)^T and y = (y_1, y_2, …, y_n)^T are the digital gesture codes of the two gestures, respectively; a smaller d indicates greater similarity between the two gestures, and when d is smaller than a set threshold, the two gestures are judged to be the same, thereby realizing gesture recognition.
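A sketch of this similarity measurement is given below, assuming the gesture codes are plain integer sequences; the threshold value is an illustrative assumption, since the patent only requires it to be "a set threshold".

```python
# Hypothetical sketch of gesture matching by L1-norm distance between two gesture codes.
def l1_distance(x, y):
    # d = sum_i |x_i - y_i|
    return sum(abs(a - b) for a, b in zip(x, y))

def same_gesture(x, y, threshold=2):
    # threshold is an assumed value; only "smaller than a set threshold" is specified above
    return l1_distance(x, y) < threshold
```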
In step (1), step (2) and step (3), the RGB picture that is the input of the system, the depth map corresponding to the RGB picture, and the 3D coordinates of the key joint points of the hand that are the output of the system all belong to the same coordinate system, i.e., the image coordinate system, which improves the recognition accuracy and maintains the stability of the accuracy; the 3D coordinates of the key joint points of the hand are the positions of the joint points in the image coordinate system and can be converted into the camera coordinate system; after the camera is calibrated, the camera coordinate system is the world coordinate system;
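For completeness, the image-to-camera conversion mentioned above can be written as the usual pinhole back-projection; the following sketch assumes known camera intrinsics (fx, fy, cx, cy) and is not taken from the patent.

```python
# Hypothetical sketch: converting a joint (u, v, d) from image coordinates to camera coordinates.
def image_to_camera(u, v, d, fx, fy, cx, cy):
    """(u, v) pixel coordinates, d depth; returns (X, Y, Z) in the camera coordinate system,
    which equals the world coordinate system once the camera is calibrated (as stated above)."""
    X = (u - cx) * d / fx
    Y = (v - cy) * d / fy
    Z = d
    return X, Y, Z
```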
the inventor carries out a large number of experiments on the proposed system and method of the present invention using EgoHands (see http:// vision. sic. indiana. edu/projects/EgoHands /) and NYU dataset (see https:// jonathantompson. githu. io/NYU _ Handhan _ Pose _ Dataset. htm), respectively, and finally carries out system tests on the real dataset collected by the Intel Realsense D415 camera, the experimental results are shown in FIG. 7, the first column is a bounding box output by the Hand detection module, the second column is a depth map of the Hand region obtained by the Hand Pose module after data preprocessing, the third column is positions of 14 key joint points of the Hand output by the Hand Pose module, the fourth column is corresponding digital gesture codes, experiments prove that the system and method of the present invention have high accuracy and strong robustness when the actual scene is applied, and the average processing time of Nvisla is 030.100. GPU, about 30 frames/second, substantially meeting the real-time requirements.

Claims (6)

1. The system for realizing gesture recognition based on vision is characterized in that: the system comprises the following modules:
the hand detection module: the function of the module is to obtain the bounding box of the hand from the input aligned RGB picture; the module is modified based on an SSD network and consists of three parts: a basic network sub-module, an additional layer sub-module and a prediction layer sub-module;
the basic network sub-module has the main functions of completing feature extraction, generating a feature map with a larger resolution, and generating a default bounding box with a set size and a set aspect ratio by using the feature map; the submodule is modified based on a VGG16 network, and specifically comprises the following steps: the submodule uses all convolutional layers of the VGG16 network, and two fully-connected layers of the VGG16 network are replaced by two common convolutional layers;
the additional layer submodule consists of a series of convolution layers, two convolution layers form a group, and the submodule has the main function of generating a feature map with smaller resolution and generating a default bounding box with set size and set aspect ratio by using the feature map;
the prediction layer submodule consists of convolution layers and has the main function of performing two convolution filtering treatments on each feature map and respectively predicting the position offset of a default boundary box on the feature map and the category confidence coefficient of the default boundary box, namely the probability that the default boundary box contains a hand;
the convolution layer for predicting the position offset of the default bounding box consists of 4 × q convolution kernels with the size of 3 × 3 × p, wherein the parameter q is the number of default bounding boxes generated at each point of the feature map, and the parameter p is the number of channels of the feature map;
the convolution layer predicting the category confidence of the default bounding box consists of c × q convolution kernels with the size of 3 × 3 × p, wherein the parameter c is the total number of the predicted categories;
a hand pose estimation module: the function of the module is: utilizing the bounding box of the hand obtained by the hand detection module to carry out data preprocessing on the depth map corresponding to the aligned RGB picture, and intercepting the corresponding hand part in the depth map to obtain the 3D coordinates of the key joint points of the hand; the 3D coordinates of the key joint points are the positions of the joint points in the image coordinate system and can be converted into the camera coordinate system; after the camera is calibrated, the camera coordinate system is the world coordinate system; in order to improve robustness, when the hand detection module does not detect a hand, the hand posture estimation module intercepts points within a certain depth threshold of the depth map as the hand part; the hand pose estimation module directly uses the Resnet18 network to predict the 3D coordinates of the key joint points of the hand in the image coordinate system, i.e., (u, v, d);
a gesture recognition module: based on the output of the hand posture estimation module, namely the 3D coordinates of the key joint points of the hand in the image coordinate system, the module identifies the state of each single finger and the relationships between fingers and outputs a digital gesture code; according to the digital gesture code, similarity measurement is carried out on gestures, so that gesture recognition is realized; the recognition precision mainly depends on the precision of the hand posture estimation module, and the module is strongly robust to the angle between the palm and the camera, the size of the palm, and the like;
the content of the digital gesture code is:
the digital gesture code is a vector consisting of 12 numbers, specifically: (f_1, f_2, f_3, f_4, f_5, f_12, f_13, f_14, f_15, f_23, f_34, f_45)^T, wherein the element f_i represents the state of a single finger, with subscript i ∈ {1, 2, 3, 4, 5}; specifically, f_1 represents the state of the thumb, f_2 the state of the index finger, f_3 the state of the middle finger, f_4 the state of the ring finger, and f_5 the state of the little finger; the element f_ij represents the relationship between two fingers, with subscripts i ∈ {1, 2, 3, 4, 5} and j ∈ {2, 3, 4, 5}; specifically, f_12 represents the interphalangeal relationship between the thumb and the index finger, f_13 between the thumb and the middle finger, f_14 between the thumb and the ring finger, f_15 between the thumb and the little finger, f_23 between the index finger and the middle finger, f_34 between the middle finger and the ring finger, and f_45 between the ring finger and the little finger;
the specific values are as follows:
f_i = 1 indicates that the finger state is upward; f_i = 2 indicates that the finger state is bent; f_i = 3 indicates that the finger state is forward; f_i = 4 indicates that the finger state is sideways; f_i = 5 indicates that the finger state is semi-closed; f_i = 6 indicates that the finger state is closed; f_i = 0 indicates undefined;
f_ij = 1 indicates that the interphalangeal relationship is separated; f_ij = 2 indicates that the interphalangeal relationship is combined; f_ij = 3 indicates that the interphalangeal relationship is crossed; f_ij = 4 indicates that the interphalangeal relationship is a loop; f_ij = 0 indicates undefined.
2. The system for realizing gesture recognition based on vision according to claim 1, wherein: the specific content of how the hand detection module generates the feature maps and generates the default bounding boxes with the set sizes and set aspect ratios is as follows:
the hand detection module extracts a plurality of feature maps with different resolutions from each convolution layer, and generates q default bounding boxes with set size and set aspect ratio at each point of the feature maps by using the feature maps;
a lower-level feature map has a higher resolution, and the default bounding boxes generated on it are smaller and responsible for detecting small objects; a higher-level feature map has a lower resolution, and the default bounding boxes generated on it are larger and responsible for detecting large objects; combining default bounding boxes of various sizes improves the robustness of the system to the sizes of the detected objects.
3. The system for realizing gesture recognition based on vision according to claim 1, wherein: the specific content of how the hand posture estimation module intercepts points within a certain depth threshold of the depth map as the hand part when the hand detection module does not detect the hand is as follows:
because the hand detection module may have incomplete hand detection or no hand detection, directly intercepting the depth value in the bounding box of the hand may cause the depth value of the hand to be seriously lost, so that the depth map needs to be preprocessed according to the bounding box of the hand, and a reasonable hand area is intercepted;
the specific method comprises the following steps:
when hand detection is incomplete, the coordinates (u_o, v_o) of the center point of the bounding box of the hand are calculated, and the average depth value d_o of the points in the depth map region corresponding to the bounding box of the hand is calculated, forming the point coordinates (u_o, v_o, d_o) in the image coordinate system; the point (u_o, v_o, d_o) is then converted from the image coordinate system into the camera coordinate system and used as the center of a cubic bounding box of fixed size; the hand posture estimation module intercepts the hand region with this cubic bounding box, keeps the points inside the box at their original values and sets the points outside the box as background points, and then converts the points inside the box back into the image coordinate system as the hand region for hand posture estimation; the size of the cubic bounding box can be set as required so as to suit hands of different shapes;
when the hands are not detected, some points closest to the camera are sampled to be used as the areas of the hands for hand posture estimation.
4. The system for realizing gesture recognition based on vision according to claim 1, wherein: the specific content of how the gesture recognition module recognizes the state of each single finger and the interphalangeal relationships according to the 3D coordinates of the key joint points of the hand in the image coordinate system is as follows:
the state of a single finger is determined by the variances and relative values of the x, y and z coordinates of the key joint points on the finger, i.e.:
when the finger state is upward, the variance of the x and z coordinates of the key joint points on the finger is small, and the variance of the y coordinates is large;
when the finger state is bent, the variance of the x coordinates of the key joint points on the finger is small, and the variance of the y and z coordinates is large;
when the finger state is forward, which applies to the four fingers other than the thumb, the variance of the x and y coordinates of the key joint points on the finger is small, and the variance of the z coordinates is large;
when the finger state is sideways, which applies only to the thumb, the variance of the x and y coordinates of the key joint points on the finger is large, and the variance of the z coordinates is small;
when the finger state is semi-closed, the variances of the y and z coordinates of the key joint points on the finger are large and close to each other, and the variance of the x coordinates is small;
when the finger state is closed, the y coordinates of the key joint points on the finger, from the fingertip to the palm, are not monotonically increasing;
the interphalangeal relationship is determined by the states of the two fingers and the relative coordinates of their corresponding joint points, namely:
when the interphalangeal relationship of the two fingers is combined, the states of both fingers are upward and the difference of the x coordinate values of the corresponding joint points of the two fingers is small;
when the interphalangeal relationship of the two fingers is separated, the difference of the x coordinate values of the corresponding joint points of the two fingers is large;
when the interphalangeal relationship of the two fingers is crossed, the differences of the x coordinates of the corresponding joint points of the two fingers have both positive and negative values;
when the interphalangeal relationship of the two fingers is a loop, the x coordinates of the joint points at the fingertip and near the palm are close, and the x coordinates of the other corresponding joint points are farther apart.
5. The method for realizing gesture recognition based on vision, characterized in that the method comprises the following operation steps:
(1) inputting the aligned RGB pictures into a hand detection module to obtain a hand boundary frame;
(2) the hand posture estimation module carries out data preprocessing on the depth map corresponding to the aligned RGB picture by using the bounding box of the hand, and intercepts the corresponding hand part in the depth map to obtain the 3D coordinates of the key joint points of the hand; in order to improve robustness, when the hand detection module does not detect the hand, points within a certain depth threshold of the depth map are intercepted as the hand part;
(3) inputting the 3D coordinates of the key joint points of the hand part into a gesture recognition module to obtain digital gesture codes;
(4) according to the digital gesture codes, similarity measurement is carried out on the gestures, so that gesture recognition is realized;
the specific content is as follows: similarity measurement is carried out on the two gestures by calculating L1 paradigm distance d of the two digital gesture codes, and the calculation method is shown as the following formula:
d=∑i|xi-yi|
in the above formula, x ═ x1,x2,...,xn)T,y=(y1,y2,...,yn)TDigital gesture codes that are two gestures, respectively; the smaller d represents the greater similarity of the two gestures, and when the d is smaller than a set threshold value, the two gestures are judged to be the same, so that gesture recognition is realized;
the digital gesture code is a digital vector consisting of 12 numbers, and the digital gesture code specifically comprises the following steps: (f)1,f2,f3,f4,f5,f12,f13,f14,f15,f23,f34,f45)TWherein the element fiThe index i belongs to {1, 2, 3, 4, 5}, and specifically: f. of1Representing the state of a single finger of the thumb, f2Representing the state of a single index finger, f3Representing the state of a single finger of the middle finger, f4Representing the state of a single ring finger, f5Representing the state of a single little finger; element fijThe relation between the fingers is shown, the subscript i belongs to {1, 2, 3, 4, 5}, the subscript j belongs to {2, 3, 4, 5}, and specifically: f. of12Indicating the interphalangeal relationship between the thumb and index finger, f13Indicating the interphalangeal relationship between the thumb and middle finger, f14Indicating the interphalangeal relationship between the thumb and ring finger, f15Representing the interphalangeal relationship between the thumb and the little finger, f23Indicating the interphalangeal relationship between the index finger and the middle finger, f34Indicating the interphalangeal relationship between the middle finger and ring finger, f45Representing the interphalangeal relationship between the ring finger and the little finger;
the specific values are as follows:
f_i = 1 indicates that the finger state is upward; f_i = 2 indicates that the finger state is bent; f_i = 3 indicates that the finger state is forward; f_i = 4 indicates that the finger state is sideways; f_i = 5 indicates that the finger state is semi-closed; f_i = 6 indicates that the finger state is closed; f_i = 0 indicates undefined;
f_ij = 1 indicates that the interphalangeal relationship is separated; f_ij = 2 indicates that the interphalangeal relationship is combined; f_ij = 3 indicates that the interphalangeal relationship is crossed; f_ij = 4 indicates that the interphalangeal relationship is a loop; f_ij = 0 indicates undefined.
6. The method for realizing gesture recognition based on vision according to claim 5, wherein: in step (1), step (2) and step (3), the RGB picture that is the input of the system, the depth map corresponding to the RGB picture, and the 3D coordinates of the key joint points of the hand that are the output of the system all belong to the same coordinate system, i.e., the image coordinate system, which improves the recognition accuracy and maintains the stability of the accuracy; the 3D coordinates of the key joint points of the hand are the positions of the joint points in the image coordinate system and can be converted into the camera coordinate system; after the camera is calibrated, the camera coordinate system is the world coordinate system.
CN201910865437.4A 2019-09-12 2019-09-12 System and method for realizing gesture recognition based on vision Active CN110569817B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910865437.4A CN110569817B (en) 2019-09-12 2019-09-12 System and method for realizing gesture recognition based on vision

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910865437.4A CN110569817B (en) 2019-09-12 2019-09-12 System and method for realizing gesture recognition based on vision

Publications (2)

Publication Number Publication Date
CN110569817A CN110569817A (en) 2019-12-13
CN110569817B true CN110569817B (en) 2021-11-02

Family

ID=68779780

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910865437.4A Active CN110569817B (en) 2019-09-12 2019-09-12 System and method for realizing gesture recognition based on vision

Country Status (1)

Country Link
CN (1) CN110569817B (en)

Families Citing this family (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111368714A (en) * 2020-03-02 2020-07-03 北京华捷艾米科技有限公司 Gesture recognition method and device
CN111414837A (en) * 2020-03-16 2020-07-14 苏州交驰人工智能研究院有限公司 Gesture recognition method and device, computer equipment and storage medium
CN111523435A (en) * 2020-04-20 2020-08-11 安徽中科首脑智能医疗研究院有限公司 Finger detection method, system and storage medium based on target detection SSD
CN112089595A (en) * 2020-05-22 2020-12-18 未来穿戴技术有限公司 Login method of neck massager, neck massager and storage medium
CN111898524A (en) * 2020-07-29 2020-11-06 江苏艾什顿科技有限公司 5G edge computing gateway and application thereof
CN113312973B (en) * 2021-04-25 2023-06-02 北京信息科技大学 Gesture recognition key point feature extraction method and system
CN114967927B (en) * 2022-05-30 2024-04-16 桂林电子科技大学 Intelligent gesture interaction method based on image processing
CN115576431B (en) * 2022-11-18 2023-02-28 北京蔚领时代科技有限公司 VR gesture coding and recognizing method and device
CN116880687B (en) * 2023-06-07 2024-03-19 黑龙江科技大学 Suspension touch method based on monocular multi-algorithm

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102982557A (en) * 2012-11-06 2013-03-20 桂林电子科技大学 Method for processing space hand signal gesture command based on depth camera
CN107423698A (en) * 2017-07-14 2017-12-01 华中科技大学 A kind of gesture method of estimation based on convolutional neural networks in parallel
CN109919077A (en) * 2019-03-04 2019-06-21 网易(杭州)网络有限公司 Gesture recognition method, device, medium and calculating equipment
CN110188598A (en) * 2019-04-13 2019-08-30 大连理工大学 A kind of real-time hand Attitude estimation method based on MobileNet-v2

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
RU2013148582A (en) * 2013-10-30 2015-05-10 ЭлЭсАй Корпорейшн IMAGE PROCESSING PROCESSOR CONTAINING A GESTURE RECOGNITION SYSTEM WITH A COMPUTER-EFFECTIVE FIXED HAND POSITION RECOGNITION
RU2014108870A (en) * 2014-03-06 2015-09-20 ЭлЭсАй Корпорейшн IMAGE PROCESSOR CONTAINING A GESTURE RECOGNITION SYSTEM WITH A FIXED BRUSH POSITION RECOGNITION BASED ON THE FIRST AND SECOND SET OF SIGNS
EP3203412A1 (en) * 2016-02-05 2017-08-09 Delphi Technologies, Inc. System and method for detecting hand gestures in a 3d space

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102982557A (en) * 2012-11-06 2013-03-20 桂林电子科技大学 Method for processing space hand signal gesture command based on depth camera
CN107423698A (en) * 2017-07-14 2017-12-01 华中科技大学 A kind of gesture method of estimation based on convolutional neural networks in parallel
CN109919077A (en) * 2019-03-04 2019-06-21 网易(杭州)网络有限公司 Gesture recognition method, device, medium and calculating equipment
CN110188598A (en) * 2019-04-13 2019-08-30 大连理工大学 A kind of real-time hand Attitude estimation method based on MobileNet-v2

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Personalized Sketch-Based Image Retrieval by Convolutional Neural Network and Deep Transfer Learning; QI QI et al.; IEEE; 2019-02-12; pp. 16537-16549 *

Also Published As

Publication number Publication date
CN110569817A (en) 2019-12-13

Similar Documents

Publication Publication Date Title
CN110569817B (en) System and method for realizing gesture recognition based on vision
Shriram et al. Deep learning-based real-time AI virtual mouse system using computer vision to avoid COVID-19 spread
Sarkar et al. Hand gesture recognition systems: a survey
Ma et al. Kinect sensor-based long-distance hand gesture recognition and fingertip detection with depth information
Shibly et al. Design and development of hand gesture based virtual mouse
CN110210426B (en) Method for estimating hand posture from single color image based on attention mechanism
Hongyong et al. Finger tracking and gesture recognition with kinect
Liu et al. Hand gesture recognition based on single-shot multibox detector deep learning
Joseph et al. Visual gesture recognition for text writing in air
CN113378770A (en) Gesture recognition method, device, equipment, storage medium and program product
Tsagaris et al. Colour space comparison for skin detection in finger gesture recognition
Shin et al. Hand region extraction and gesture recognition using entropy analysis
AlSaedi et al. A new hand gestures recognition system
Abdallah et al. An overview of gesture recognition
Harshitha et al. HCI using hand gesture recognition for digital sand model
Khan et al. Computer vision based mouse control using object detection and marker motion tracking
Alam et al. Affine transformation of virtual 3D object using 2D localization of fingertips
CN104123008A (en) Man-machine interaction method and system based on static gestures
CN114581535A (en) Method, device, storage medium and equipment for marking key points of user bones in image
Ghosh et al. Real-time 3d markerless multiple hand detection and tracking for human computer interaction applications
Birdal et al. Region based hand gesture recognition
Mou et al. Attention based dual branches fingertip detection network and virtual key system
Le et al. Remote mouse control using fingertip tracking technique
Mishra et al. Anchors Based Method for Fingertips Position Estimation from a Monocular RGB Image using Deep Neural Network
Man et al. Thumbstick: a novel virtual hand gesture interface

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant