CN116863541B - Dynamic gesture recognition method and device, related equipment and handwriting recognition method

Publication number
CN116863541B
Authority
CN
China
Prior art keywords
hand
finger
gesture
dynamic
motion
Prior art date
Legal status
Active
Application number
CN202311119993.XA
Other languages
Chinese (zh)
Other versions
CN116863541A (en)
Inventor
王杨
Current Assignee
Xinyuan Technology Shanghai Co ltd
VeriSilicon Microelectronics Shanghai Co Ltd
Original Assignee
Xinyuan Technology Shanghai Co ltd
VeriSilicon Microelectronics Shanghai Co Ltd
Priority date
Filing date
Publication date
Application filed by Xinyuan Technology Shanghai Co ltd and VeriSilicon Microelectronics Shanghai Co Ltd
Priority to CN202311119993.XA
Publication of CN116863541A
Application granted
Publication of CN116863541B

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/20 Movements or behaviour, e.g. gesture recognition
    • G06V40/28 Recognition of hand or arm movements, e.g. recognition of deaf sign language
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/0464 Convolutional networks [CNN, ConvNet]
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/40 Extraction of image or video features
    • G06V10/46 Descriptors for shape, contour or point-related descriptors, e.g. scale invariant feature transform [SIFT] or bags of words [BoW]; Salient regional features
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks

Abstract

The application provides a dynamic gesture recognition method and device, related equipment, and a handwriting recognition method, so as to improve the accuracy of dynamic gesture recognition. The dynamic gesture recognition method comprises the following steps: for each frame of hand image in continuous multi-frame hand images, determining the three-dimensional coordinates in space of each hand key point in the hand image; based on the three-dimensional coordinates corresponding to each hand image, respectively determining the static gesture of each hand image and the motion parameters of the hand in each hand image, wherein the motion parameters characterize the motion of the hand across the continuous multi-frame hand images; and determining a dynamic gesture based on the static gesture and the motion parameters of the hand for each of the continuous multi-frame hand images. By this dynamic gesture recognition method, the accuracy of dynamic gesture recognition can be improved.

Description

Dynamic gesture recognition method and device, related equipment and handwriting recognition method
Technical Field
The application relates to the field of image processing, in particular to a dynamic gesture recognition method and device, related equipment and a handwriting recognition method.
Background
A video, or a succession of multiple image frames containing the hand, may be used to recognize dynamic gestures.
However, dynamic gestures are more complex than static gestures, and existing approaches recognize them with relatively low accuracy. For example, when a neural network is used for dynamic gesture recognition, the network is usually trained on images of specific gestures, so gestures outside that set cannot be recognized and similar gestures are difficult to distinguish, resulting in low recognition accuracy. For another example, when a dynamic gesture is recognized from an RGB (Red Green Blue, a color mode) image, the parts of the hand may not be accurately distinguished and the angles presented by the joints cannot be calculated, so the accuracy of recognizing the dynamic gesture is low.
Disclosure of Invention
In view of the foregoing, the present application provides a dynamic gesture recognition method and device, related equipment, and a handwriting recognition method, so as to improve the accuracy of dynamic gesture recognition.
In a first aspect, an embodiment of the present application provides a dynamic gesture recognition method, including: for each frame of hand image in continuous multi-frame hand images, determining the three-dimensional coordinates in space of each hand key point in the hand image; based on the three-dimensional coordinates corresponding to each hand image, respectively determining the static gesture of each hand image and the motion parameters of the hand in each hand image, wherein the motion parameters characterize the motion of the hand across the continuous multi-frame hand images; and determining a dynamic gesture based on the static gesture and the motion parameters of the hand for each of the continuous multi-frame hand images.
In the embodiment of the application, when a dynamic gesture is recognized, the hand key points are modeled three-dimensionally in space to obtain their three-dimensional coordinates. These coordinates accurately represent the positional relationship of the key points, so each part of the hand can be accurately distinguished, as can the position and posture of each finger in different gestures. This reduces the possibility that similar gestures are recognized as the same gesture, ensures that the recognized static gestures and motion parameters have high accuracy, and thereby improves the accuracy of recognizing dynamic gestures from the static gestures and motion parameters.
In an embodiment, the determining the three-dimensional coordinates of each hand key point in the hand image in space includes: acquiring two hand images of the same frame acquired by a binocular camera; matching the two hand images through a preset matching model to obtain a binocular disparity map; determining a depth map based on a preset depth calculation formula and the binocular disparity map; extracting a hand region from the depth map based on a preset extraction model; and identifying three-dimensional coordinates of the hand key points in the hand area based on a preset key point detection algorithm.
Compared with a single image, the binocular image acquired by the binocular camera can more accurately represent the spatial position relationship of objects in the image. In the embodiment of the application, two hand images of the same frame are acquired by using the binocular camera, so that the information of the hand extracted from the hand images in the three-dimensional space can be more accurate, the determined three-dimensional coordinates of the hand key points can be more accurate, and the accuracy of dynamic gesture recognition is improved.
In an embodiment, before the matching of the hand images by the preset matching model, the method further includes: performing epipolar rectification on the two hand images of the same frame.
The same frame acquired by the binocular camera comprises two images, one from the left camera and one from the right camera, and the binocular disparity map is obtained by matching the pixel points of the two images. In the embodiment of the application, epipolar rectification is performed on the two hand images so that corresponding pixel-point pairs in the left and right images lie in the same row. Compared with unrectified binocular images, the pixel points in rectified images are easier to match, and all pixel points need not be traversed, which reduces the computation required for matching, reduces matching errors, improves matching accuracy, and thereby improves the accuracy of dynamic gesture recognition.
In an embodiment, the hand region includes a plurality of hands, and the preset extraction model is a YOLO (You Only Look Once, a real-time, end-to-end deep learning target detection) model; after the hand region is extracted from the depth map based on the preset extraction model, the method further comprises: identifying the different hands and the type of each hand in the hand region based on the YOLO model; the type of hand is left hand or right hand.
The YOLO model can both detect hands and distinguish left hands from right hands. In the embodiment of the application, the hands in the hand images can be recognized and distinguished using the YOLO model, which reduces the possibility of confusing the gestures of different hands and improves the accuracy of recognizing dynamic gestures. In addition, recognizing the left hand and the right hand in the hand image makes it convenient to distinguish gestures made by the left hand, the right hand, or both hands, which expands the recognition range of dynamic gestures.
In an embodiment, after the identifying the different hands in the hand region, the method further comprises: and carrying out target tracking on different hands in the hand areas of the hand images of any two adjacent frames so as to match the same hands in the hand images of the two adjacent frames.
A dynamic gesture refers to the action of the same hand across continuous multi-frame hand images, and when the hand images include multiple hands, different hands may be confused during dynamic gesture recognition. In the embodiment of the application, target tracking is performed on the different hands so that the same hand can be matched across two adjacent frames of hand images, which reduces the possibility of confusing different hands and improves the accuracy of dynamic gesture recognition.
In an embodiment, the determining a dynamic gesture based on the motion parameter and the static gesture includes: matching the static gesture and the motion parameter of the left hand, the static gesture and the motion parameter of the right hand with a preset double-hand dynamic gesture library to determine double-hand dynamic gestures; the preset two-hand dynamic gesture library comprises corresponding relations between the two-hand dynamic gestures, the static gestures and the motion parameters of the left hand and the static gestures and the motion parameters of the right hand.
In the embodiment provided by the application, two-hand dynamic gestures can be recognized from the static gesture and motion parameters of the left hand, the static gesture and motion parameters of the right hand, and the preset two-hand dynamic gesture library, which expands the recognition range of dynamic gestures.
In an embodiment, the determining, based on the three-dimensional coordinates corresponding to each of the hand images, the static gesture of each of the hand images and the motion parameter of the hand in each of the hand images includes: for each frame of the hand image, calculating the finger curvature of each finger based on the three-dimensional coordinates of each hand key point in the hand image; determining each finger state based on a preset corresponding relation between the finger curvature and the finger state and the finger curvature; and matching the finger states with a preset static gesture library, and determining a static gesture corresponding to the hand image, wherein the static gesture library comprises the corresponding relation between the finger states and the static gesture.
In the embodiment of the application, the finger bending degree of each finger in the hand image is calculated from the three-dimensional coordinates. Since the three-dimensional coordinates accurately reflect the spatial position of each key point of a finger, the computed bending degree has high accuracy. Meanwhile, determining static gestures from the bending degree of each finger reduces the coupling between fingers, so that the bending degrees characterize the fingers' actions independently of one another. Richer gesture types combining different fingers can therefore be recognized and similar gestures can be distinguished, which satisfies the requirement of recognizing more gesture types, reduces the possibility that similar dynamic gestures are recognized as the same gesture, and improves the accuracy of recognizing different dynamic gestures.
In an embodiment, the calculating of the finger bending degree of each finger based on the three-dimensional coordinates of each hand key point in the hand image includes: for the same finger, determining a plurality of vectors, each formed by two adjacent hand key points along the path from the palm center to the fingertip; calculating the included angles between the vectors; and calculating the sum of the included angles within the finger, the included angle sum being the bending degree of the finger.
In the embodiment of the application, the bending degree of a finger is expressed as the sum of the included angles between the vectors formed by each two adjacent hand key points along the path from the palm center to the fingertip. Compared with using coordinates to check whether the key points of a finger lie on the same straight line, this angle sum accurately reflects the actual degree to which the finger is bent, so the determined bending degree has higher accuracy, which improves the accuracy of determining the finger state and, in turn, the accuracy of dynamic gesture recognition.
In an embodiment, the preset correspondence includes a correspondence between the thumb bending degree and the thumb state, and determining each finger state based on the preset correspondence between finger bending degree and finger state and on the finger bending degree includes: if the finger corresponding to the included angle sum is the thumb, comparing the included angle sum with a first threshold; if the included angle sum is smaller than or equal to the first threshold, determining that the thumb is in a bent state; and if the included angle sum is larger than the first threshold, determining that the thumb is in a straightened state.
The structure of the thumb differs from that of the other four fingers, as do the types of action it can perform. Compared with treating the thumb the same as the other four fingers, setting separate recognition conditions for the thumb in the embodiment of the application allows its posture to be distinguished from that of the other fingers and recognized more accurately, avoiding confusion between the thumb and the other four fingers and thereby improving gesture recognition accuracy.
In an embodiment, the preset correspondence includes correspondences between the bending degrees and the bending states of the four fingers other than the thumb, and determining each finger state based on the preset correspondence between finger bending degree and finger state and on the finger bending degree includes: if the finger corresponding to the included angle sum is any one of the other four fingers, comparing the included angle sum of that finger with a second threshold, a third threshold and a fourth threshold in sequence to determine its bending state, wherein the second threshold is less than the third threshold and the third threshold is less than the fourth threshold; if the included angle sum is smaller than or equal to the second threshold, determining that the finger is in a straightened state; if the included angle sum is between the second threshold and the third threshold, the finger is in a bent state; if the included angle sum is between the third threshold and the fourth threshold, the finger is in a grasping state; and if the included angle sum is larger than or equal to the fourth threshold, determining that the finger is in a holding state.
In the embodiment of the application, each of the four fingers other than the thumb can be recognized as being in a straightened, bent, grasping or holding state. Compared with existing methods that only recognize whether a finger is bent, this provides richer finger-action recognition, so different finger combinations can form more types of gestures, enabling richer dynamic gesture recognition, distinguishing some similar gestures, and improving the accuracy of dynamic gesture recognition.
In an embodiment, the motion parameter includes at least one of an angle of a hand in space, a motion direction, a motion speed, and a motion trajectory.
In the embodiment of the application, dynamic gestures can be recognized with the aid of one or more motion parameters such as the angle of the hand in space, the motion direction, the motion speed and the motion trajectory. Compared with using a single motion trajectory, more types of dynamic gestures can be recognized, so that various dynamic gestures can be distinguished more finely.
In one embodiment, the hand keypoints comprise a palm center keypoint and a finger keypoint, and the motion parameters comprise angles of hands in space; the determining, based on the three-dimensional coordinates corresponding to each hand image, a static gesture of each hand image and a motion parameter of a hand in each hand image respectively includes: an angle of the hand in space is determined based on the palm center keypoint and a finger keypoint characterizing a root of a middle finger.
In the embodiment of the application, the relative position between the palm center key point and the finger key point representing the root of the middle finger generally does not change, so the angle of the hand in space determined from these two key points has high accuracy, which improves the accuracy of the determined dynamic gesture.
In an embodiment, the hand key points include palm key points, and the movement parameters include at least one of movement direction, movement speed and movement track; the determining, based on the three-dimensional coordinates corresponding to each hand image, a static gesture of each hand image and a motion parameter of a hand in each hand image respectively includes: and determining at least one of the motion direction, the motion speed and the motion track based on the palm center key points of the hand images of the continuous multiple frames.
In the embodiment of the application, the palm center does not move independently of the hand as a whole, so compared with finger key points, the motion direction, motion speed and motion trajectory determined from the palm center key point effectively represent the motion of the whole hand. The determined motion parameters therefore have higher accuracy, which further improves the accuracy of dynamic gesture recognition.
In one embodiment, the motion parameters include the angle of the hand in space, the motion direction and the motion speed; the dynamic gestures include instantaneous dynamic gestures, which characterize the motion behavior of the hand in a single frame of hand image; the determining a dynamic gesture based on the motion parameters and the static gesture includes: matching the motion direction, the motion speed, the angle of the hand in space and the static gesture corresponding to the hand image against a preset instantaneous dynamic gesture library, and determining the instantaneous dynamic gesture corresponding to the hand image; the instantaneous dynamic gesture library comprises correspondences between instantaneous dynamic gestures and the static gestures and motion parameters.
In the embodiment of the application, a dynamic gesture is split into a plurality of instantaneous dynamic gestures. Compared with recognizing the dynamic gesture as a whole, recognizing each frame of hand image separately allows the details in each frame to be distinguished and the corresponding instantaneous dynamic gesture to be determined, which enables finer discrimination of dynamic gestures and a more accurate recognition result.
In an embodiment, the dynamic gestures further include continuous dynamic gestures, where a continuous dynamic gesture characterizes the motion behavior of the hand across the continuous multi-frame hand images; after determining the instantaneous dynamic gesture corresponding to each hand image, the method further includes: if the instantaneous dynamic gestures corresponding to a preset number of consecutive frames of hand images are the same, adding that instantaneous dynamic gesture to a gesture queue; and matching the successive types of instantaneous dynamic gestures in the gesture queue against a preset continuous dynamic gesture library to determine the continuous dynamic gesture, wherein the continuous dynamic gesture library comprises correspondences between continuous dynamic gestures and instantaneous dynamic gestures.
In the embodiment of the application, a complex continuous dynamic gesture is split into a plurality of simple instantaneous dynamic gestures. On one hand, this reduces the difficulty of recognizing complex dynamic gestures; on the other hand, since the different parts of a complex motion are distinguished, details within each dynamic gesture can be discerned, more accurate dynamic gesture types can be determined, and similar gestures can be told apart. The accuracy of recognizing dynamic gestures can thus be improved.
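As a minimal illustration of this queue-based matching (the gesture names, the example library, and the stability window below are hypothetical, not taken from the patent), the split into instantaneous gestures could be realized like this:

```python
from collections import deque

# Hypothetical continuous-gesture library: each continuous dynamic gesture maps
# to the ordered sequence of instantaneous dynamic gestures that composes it.
CONTINUOUS_GESTURES = {
    ("palm_still", "palm_move_left"): "swipe_left",
    ("fist", "palm_open"): "release",
}

STABLE_FRAMES = 5                       # preset number of identical consecutive frames
gesture_queue = deque(maxlen=8)         # recent instantaneous dynamic gestures
recent = deque(maxlen=STABLE_FRAMES)    # sliding window of per-frame results

def on_frame(instant_gesture):
    """Feed one per-frame instantaneous gesture; return a continuous gesture if matched."""
    recent.append(instant_gesture)
    # Enqueue only when the same instantaneous gesture persists for STABLE_FRAMES frames.
    if len(recent) == STABLE_FRAMES and len(set(recent)) == 1:
        if not gesture_queue or gesture_queue[-1] != instant_gesture:
            gesture_queue.append(instant_gesture)
    # Match the tail of the queue against the continuous-gesture library.
    for pattern, name in CONTINUOUS_GESTURES.items():
        if tuple(gesture_queue)[-len(pattern):] == pattern:
            gesture_queue.clear()
            return name
    return None
```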
In a second aspect, an embodiment of the present application provides a dynamic gesture recognition apparatus, including: the three-dimensional module is used for determining three-dimensional coordinates of each hand key point in the hand image in space for each frame of hand image in the continuous multi-frame hand images; the first recognition module is used for respectively determining static gestures of each hand image and motion parameters of hands in each hand image based on the three-dimensional coordinates corresponding to each hand image; the motion parameter characterizes a motion trend of the hand; and the second recognition module is used for determining a dynamic gesture based on the motion parameters of the hands of each hand image of a plurality of continuous frames and the static gesture.
In a third aspect, an embodiment of the present application provides a dynamic gesture recognition system, including: the image acquisition equipment is used for acquiring hand images; a processing device in communication with the image acquisition device for receiving the hand image and performing the dynamic gesture recognition method of any of the first aspects.
In an embodiment, the image acquisition device comprises a binocular camera.
In a fourth aspect, an embodiment of the present application provides a handwriting recognition method, including: identifying a dynamic gesture of a user based on the dynamic gesture recognition method as described in any one of the first aspects; if the dynamic gesture is a first preset gesture, recording a movement track of the target finger based on a preset layer; and if the dynamic gesture is a second preset gesture, identifying the moving track recorded on the preset layer to obtain an identification result.
In the embodiment of the application, the dynamic gesture of the user is identified by the dynamic gesture identification method provided in the first aspect, so that the accuracy of dynamic gesture identification can be effectively improved, and the accuracy of handwriting content identification can be further improved.
In an embodiment, after the recording the movement track of the target finger based on the preset layer, the method further includes: and if the dynamic gesture is a third preset gesture, clearing the moving track recorded by the preset layer.
In this embodiment, a third preset gesture for clearing the preset layer is provided, so that the user can clear the layer after a handwriting error, which reduces recognition of erroneous handwriting content and improves the accuracy of handwriting recognition.
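A schematic sketch of this handwriting flow (the gesture names, the layer representation, and the recognizer interface are placeholders, not prescribed by the patent):

```python
class HandwritingLayer:
    """Records a fingertip trajectory and hands it to a recognizer on demand."""

    def __init__(self, recognizer):
        self.points = []            # the "preset layer": recorded fingertip positions
        self.recognizer = recognizer

    def on_gesture(self, gesture, fingertip=None):
        """Dispatch one recognized dynamic gesture; returns a result when recognizing."""
        if gesture == "write":      # first preset gesture: record the moving track
            self.points.append(fingertip)
        elif gesture == "finish":   # second preset gesture: recognize the track
            result = self.recognizer(self.points)
            self.points = []
            return result
        elif gesture == "erase":    # third preset gesture: clear the layer
            self.points = []
        return None
```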
In a fifth aspect, an embodiment of the present application provides an electronic device, including a memory and a processor, the memory storing computer readable instructions that are executed by the processor to perform the method according to the first or fourth aspect.
In a sixth aspect, embodiments of the present application provide a computer readable storage medium having a computer program stored therein, which when run on a computer causes the computer to perform the method according to the first or fourth aspect.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings needed in the embodiments are briefly described below. It should be understood that the following drawings illustrate only some embodiments of the present application and should not be regarded as limiting its scope; a person skilled in the art can obtain other related drawings from these drawings without inventive effort.
FIG. 1 is a flow chart of a dynamic gesture recognition method according to an embodiment of the present application;
FIG. 2 is a schematic diagram of a hand key point according to an embodiment of the present application;
FIG. 3 is a schematic view of finger key points and bending provided by an embodiment of the present application;
FIG. 4 is a schematic diagram illustrating a dynamic gesture recognition process according to an embodiment of the present application;
FIG. 5 is a schematic diagram of a dynamic gesture recognition apparatus according to an embodiment of the present application;
FIG. 6 is a schematic diagram of a dynamic gesture recognition system according to an embodiment of the present application;
FIG. 7 is a flowchart of a handwriting recognition method according to an embodiment of the present application;
FIG. 8 is a diagram of a handwritten numeral according to an embodiment of the application;
fig. 9 is a schematic diagram of an electronic device according to an embodiment of the application.
Icon: a dynamic gesture recognition device 100; a three-dimensional module 110; a first identification module 120; a second identification module 130; a dynamic gesture recognition system 200; an image acquisition device 210; a processing device 220; an electronic device 300; a processor 310; a memory 320.
Detailed Description
The present application will be described in further detail with reference to the drawings and examples, in order to make the objects, technical solutions and advantages of the present application more apparent. It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the scope of the application.
Referring to fig. 1, fig. 1 is a flowchart of a dynamic gesture recognition method according to an embodiment of the application. The dynamic gesture recognition method comprises the following steps:
s110, for each of the continuous multi-frame hand images, determining three-dimensional coordinates of each hand key point in the hand image in space.
Dynamic gestures may characterize a hand's motion over time, including but not limited to the direction, speed, trajectory and orientation of the hand's motion, and the posture of the hand while in motion. However, a single frame of hand image can only represent the posture of the hand in that image and cannot determine the action the hand is actually performing. For example, in a single frame the hand may hold up three fingers in a "3" posture while actually performing a dynamic "countdown" motion, with "3" being only one step of the countdown. Thus, in this embodiment, continuous multi-frame hand images may be selected to recognize a dynamic gesture. Since a video essentially consists of consecutive frames, a video may be treated as a representation of continuous multi-frame hand images for dynamic gesture recognition.
In addition, "continuous" refers to temporal order: the sequence of images used for dynamic gesture recognition is consistent with the order of the actual actions, but the frames need not be strictly adjacent. For example, if the collected hand images include 10 frames, the images selected for dynamic gesture recognition may be a subset, such as the 1st, 3rd, 6th, 8th and 10th frames.
The hand images may be acquired in real time by an image acquisition device, or may be images selected by a user for dynamic gesture recognition; the acquisition manner of the hand images is not limited.
The hand key points are used to represent characteristic positions of a person's hand. As shown in fig. 2, there are 21 hand key points: the palm center key point W0, and the finger key points T1 to T5, D1 to D5, P1 to P5 and M1 to M5 corresponding to the joints of the five fingers. The finger key points follow the prior art and are not described here.
In the embodiment of the application, after the hand image is acquired, the hands in the hand image can be identified, so as to perform three-dimensional modeling on the hands, construct a three-dimensional model of each hand, and represent each key point by three-dimensional coordinates.
In an optional embodiment of the present application, for any one frame of hand image, determining three-dimensional coordinates of each hand key point in the hand image in space includes: acquiring two hand images of the same frame acquired by a binocular camera; matching the two hand images through a preset matching model to obtain a binocular disparity map; determining a depth map based on a preset depth calculation formula and a binocular disparity map; extracting a hand region from the depth map based on a preset extraction model; based on a preset key point detection algorithm, three-dimensional coordinates of hand key points in the hand region are identified.
The binocular camera comprises two cameras with a preset positional relationship, which acquire two images of the same area, i.e. a binocular image. Owing to the positional difference between the two cameras, the binocular image can be combined with the positional relationship of the cameras and the acquisition parameters of the binocular camera to reconstruct the three-dimensional space represented in the image. A single image lacks such a positional relationship between cameras, so the three-dimensional space determined from a single image is more limited than that determined from a binocular image and cannot richly represent the actual spatial information in the image.
In this embodiment, a binocular camera may be used to collect hand images, acquiring two hand images belonging to the same frame to form a binocular image of the hand. When dynamic gesture recognition is carried out with the two hand images acquired by the binocular camera, more complete spatial information can be obtained from them, so the three-dimensional coordinates of the hand key points can be more accurate, improving the accuracy of dynamic gesture recognition. The relevant details of binocular cameras and binocular images may refer to the prior art and are not expanded here.
In some embodiments, after obtaining the binocular images acquired by the binocular camera, epipolar rectification may be performed on the two hand images, and the rectified images may then be matched to obtain the binocular disparity map.
The implementation of epipolar rectification can refer to the prior art and is not further developed here. In this embodiment, epipolar rectification processes the pixel points of the hand images collected by the left and right cameras of the binocular camera so that all matched pixel points in the two images lie in the same row. Therefore, when the two images are matched with the preset matching model, the computation required by the model is reduced and matching efficiency is improved. Meanwhile, the pixel points in the rectified binocular image are easier to match and all pixel points need not be traversed, which reduces matching errors, improves matching accuracy, and improves the accuracy of dynamic gesture recognition.
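For illustration, epipolar rectification of a calibrated stereo pair can be done with OpenCV roughly as follows (a sketch assuming the intrinsic matrices K1, K2, distortion vectors d1, d2, and the rotation R and translation t between the two cameras are already known from calibration; the patent does not prescribe a specific library):

```python
import cv2

def rectify_pair(img_l, img_r, K1, d1, K2, d2, R, t):
    """Epipolar rectification: afterwards, corresponding points lie on the same row."""
    size = (img_l.shape[1], img_l.shape[0])
    R1, R2, P1, P2, Q, _, _ = cv2.stereoRectify(K1, d1, K2, d2, size, R, t)
    map_lx, map_ly = cv2.initUndistortRectifyMap(K1, d1, R1, P1, size, cv2.CV_32FC1)
    map_rx, map_ry = cv2.initUndistortRectifyMap(K2, d2, R2, P2, size, cv2.CV_32FC1)
    rect_l = cv2.remap(img_l, map_lx, map_ly, cv2.INTER_LINEAR)
    rect_r = cv2.remap(img_r, map_rx, map_ry, cv2.INTER_LINEAR)
    return rect_l, rect_r, Q    # Q can reproject disparities to 3D points
```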
In the embodiment of the application, after two hand images of the same frame acquired by the binocular camera are obtained, a preset matching model can be used for matching the two hand images to obtain a binocular disparity map. The matching model of the binocular disparity map may be various binocular vision matching models, such as HitNet (Hierarchical Iterative Tile Refinement Network, hierarchical iterative tile optimization network for real-time stereo matching), which are not developed here. The above models are only examples and are not limiting of the application.
In this embodiment, the binocular disparity map is associated with parameters of the images acquired by the binocular camera, for example the focal length, the distance between the optical centers of the two cameras, and the disparity; the depth map corresponding to the binocular disparity map can then be determined from the binocular disparity map and a preset depth calculation formula.
Illustratively, the depth calculation formula may be:

Z = (f × b) / d

where f is the normalized focal length, b is the distance between the optical centers of the two cameras of the binocular camera, d is the disparity, and Z is the pixel depth. These parameters can be obtained directly from the binocular disparity map or the binocular camera and are not further described here.
The depth map includes depths of all pixels in the binocular disparity map. The above depth calculation formula is merely an example, and in some other embodiments, other parameters or other calculation manners may be used to obtain the depth map, which will not be described herein.
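As a minimal sketch of this computation (assuming a disparity map in pixels and calibration values f and b; the function name is our own):

```python
import numpy as np

def disparity_to_depth(disparity, f, b, eps=1e-6):
    """Per-pixel depth Z = f * b / d; near-zero (invalid) disparities map to depth 0."""
    disparity = np.asarray(disparity, dtype=np.float32)
    depth = np.zeros_like(disparity)
    valid = disparity > eps
    depth[valid] = f * b / disparity[valid]
    return depth
```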
After the depth map is acquired, the hand region containing the hand may be extracted from it. The hand region is the region of the hand image that contains a hand. It can be appreciated that the hand image may contain content other than hands, such as the background, other body parts and various ornaments; this content contributes nothing to gesture recognition and instead harms recognition efficiency. Therefore, in this embodiment, the hands in the hand image may first be recognized and the region containing them extracted or cropped out, so that only the hand region is processed subsequently, which reduces the data to be processed and improves the efficiency of determining the three-dimensional coordinates.
In some scenarios, the hand image may include multiple hands, e.g., two hands of one user, hands of different users, etc. In an embodiment of the present application, if the hand image includes multiple hands, different hands may be distinguished.
In some embodiments of the application, the preset extraction model is a YOLO model. The YOLO model can detect hands in an image and identify the type of each hand, so the different hands in the hand image can be distinguished as left or right. Using the YOLO model directly reduces the implementation difficulty of the extraction model.
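As an illustration only (the patent names the YOLO family but not a specific version or API; the weights file and class names below are hypothetical), hand detection and left/right classification might look like this with the Ultralytics YOLO package:

```python
from ultralytics import YOLO  # assumes the Ultralytics package; any YOLO variant works

# Hypothetical weights fine-tuned for two classes: "left_hand" and "right_hand".
model = YOLO("hand_lr_yolov8n.pt")

def detect_hands(image):
    """Return a list of (bounding_box, hand_type) pairs found in one hand image."""
    result = model(image)[0]
    hands = []
    for box in result.boxes:
        x1, y1, x2, y2 = box.xyxy[0].tolist()
        hand_type = result.names[int(box.cls)]   # "left_hand" or "right_hand"
        hands.append(((x1, y1, x2, y2), hand_type))
    return hands
```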
Since the hand image may contain multiple hands of the same type, for example both the left hand of user A and the left hand of user B, hands may be confused across frames during dynamic gesture recognition. For example, if the left hand of user A in the first frame is paired with the left hand of user B in the second frame, the recognized dynamic gesture is erroneous.
Thus, in some embodiments of the present application, after identifying the different hands in the hand region, or before cropping the hand region, target tracking may be performed on the different hands in the hand regions of any two adjacent frames of hand images, so as to match the same hand across the two adjacent frames.
The target tracking may be implemented by various target tracking algorithms or models, for example, may be a SORT (Simple Online and Realtime Tracking, an online real-time multi-target tracking algorithm based on kalman filtering and hungarian algorithm) algorithm, a target tracking model based on Siamese (twin) network, and the like. The foregoing is by way of example only and is not intended as limiting the application.
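For illustration, a simplified IoU-based association in the spirit of SORT might look as follows (a sketch only: full SORT also includes Kalman-filter motion prediction; the box format and threshold are assumptions):

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def iou(a, b):
    """Intersection-over-union of two (x1, y1, x2, y2) boxes."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def match_hands(prev_boxes, curr_boxes, min_iou=0.3):
    """Associate the same hands across two adjacent frames by maximizing total IoU."""
    if not prev_boxes or not curr_boxes:
        return []
    cost = np.array([[1.0 - iou(p, c) for c in curr_boxes] for p in prev_boxes])
    rows, cols = linear_sum_assignment(cost)   # Hungarian algorithm
    return [(r, c) for r, c in zip(rows, cols) if 1.0 - cost[r, c] >= min_iou]
```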
After the hand region is extracted, the three-dimensional coordinates of the hand key points in the hand region can be identified with a preset key point detection algorithm, for example the A2J (Anchor-to-Joint, a fast three-dimensional hand key point detection) algorithm.
It can be understood that the above manner of determining the three-dimensional coordinates of the key points of the hand is only one embodiment provided by the present application, and in actual use, the manner, model, formula or algorithm used in each step may be selected according to the requirement, and part of the steps may be combined or simplified, and will not be further developed herein.
S120, based on the three-dimensional coordinates corresponding to the hand images, respectively determining the static gesture of the hand images and the motion parameters of the hands in the hand images.
In an embodiment of the application, the hand comprises two parts, the palm and the fingers. A dynamic gesture includes the process by which the palm and/or the fingers change while the hand performs an action, whereas a static gesture is the posture presented by the palm and/or fingers in a single frame of image, e.g. all five fingers open, or only the thumb upright. Thus, in embodiments of the present application, the dynamic gesture may be determined with the help of the static gestures of the hand.
Further, since changes of the hand are mainly embodied in the motion of the whole hand or of its parts, such as finger movements or the palm moving together with the fingers, motion parameters can be used to characterize the motion of the hand across the continuous multi-frame hand images. There is no required order between determining the motion parameters of the hand and determining the static gesture.
Thus, by means of the static gesture and the motion parameters, the motion of the hand in the continuous multi-frame gesture image, namely the dynamic gesture, can be simulated.
In an embodiment, determining the static gesture of the hand in each frame of hand image includes: calculating the finger bending degree of each finger based on the three-dimensional coordinates of each hand key point in the hand image; determining each finger state based on the finger bending degree and a preset correspondence between finger bending degree and finger state; and matching the finger states against a preset static gesture library to determine the static gesture corresponding to the hand image.
In this embodiment, the static gesture library includes a corresponding relationship between a finger state combination of each finger and a static gesture. Specifically, the static gesture library comprises the correspondence between the combination of the respective finger states of the thumb, the index finger, the middle finger, the ring finger and the little finger and the static gesture. For example, if the thumb, ring finger and little finger are in a curved state and the index finger and middle finger are in a straightened state, the corresponding static gesture is "V" or "Y", etc. The actual static gestures may be defined according to requirements, such as different combinations of finger states of different fingers, so as to define different static gestures.
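A minimal sketch of such a lookup (the state names and example entries are illustrative; only the "V" example follows the combination described above):

```python
# Finger order: (thumb, index, middle, ring, little).
STATIC_GESTURES = {
    ("bent", "straight", "straight", "bent", "bent"): "V",
    ("straight", "bent", "bent", "bent", "bent"): "thumb_up",
}

def lookup_static_gesture(finger_states):
    """Match a combination of finger states against the static gesture library."""
    return STATIC_GESTURES.get(tuple(finger_states), "unknown")
```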
In addition, the static gesture may also have a specific meaning, for example, the static gesture in the above example may represent other meanings such as "YES" or "YES", and the specific meaning may be defined according to the actual situation, and is not expanded herein.
A hand usually presents different gestures through its fingers, and a finger's posture is usually reflected in its degree of bending. Therefore, in this embodiment, the bending degree of each finger may be determined first; the finger state of each finger, which characterizes the posture the finger presents, is then derived from the bending degree; finally, the combination of finger states is compared with the preset gesture library to determine the static gesture.
By distinguishing different fingers and combining their finger states, more gesture types can be composed, which expands the range of recognizable gestures, improves the accuracy of dynamic gesture recognition, and reduces the possibility that similar gestures are recognized as the same gesture type.
In addition, when new gesture types need to be added, the states of the fingers can simply be defined without retraining a model, which simplifies extending the gesture recognition range and reduces the cost of doing so.
In one embodiment, for the same finger: a plurality of vectors, each formed by two adjacent hand key points along the path from the palm center to the fingertip, are determined; the included angles between the vectors are calculated; and the sum of the included angles within the finger is calculated, the included angle sum being the bending degree of the finger.
Taking fig. 2 as an example, for the index finger, the vectors formed by each two adjacent hand key points along the path from the palm center to the fingertip are W0M2, M2P2, P2D2 and D2T2. Similarly, for the middle finger the vectors are W0M3, M3P3, P3D3 and D3T3; for the ring finger, W0M4, M4P4, P4D4 and D4T4; and for the little finger, W0M5, M5P5, P5D5 and D5T5. For the thumb, the vectors are W0P1, M1P1, P1D1 and D1T1.
In embodiments of the present application, the degree of bending of a finger may be characterized by the cumulative sum of the angles between adjacent knuckles of the finger. For example, when the hand closes from open to a fist, the index finger goes from a straightened state to pressing against the palm, and during this process the sum of the angles between adjacent knuckles increases gradually. In this embodiment, each knuckle may be characterized by one of the vectors above, and the bending degree of the finger calculated from the vectors formed by adjacent hand key points along the path from the palm center to the fingertip.
As shown in fig. 3, W0 is the palm center key point, and M1, M2, P1, P2, D1, D2, T1 and T2 are finger key points on the fingers.
The angle a is the included angle between vectors W0M2 and M2P2, the angle b is the included angle between vectors M2P2 and P2D2, and the angle c is the included angle between vectors P2D2 and D2T2. Adding a, b and c gives the included angle sum, which represents the bending degree of the finger.
In addition, since the structure of the thumb differs from that of the other four fingers, in this embodiment the included angle between vectors W0P1 and M1P1, the included angle between M1P1 and P1D1, and the included angle between P1D1 and D1T1 may be calculated, and the three included angles added to obtain the included angle sum, i.e. the bending degree of the thumb.
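The angle-sum computation can be sketched as follows (assuming key points are given as 3D numpy arrays ordered along the palm-to-fingertip path as in fig. 2; function names are our own):

```python
import numpy as np

def angle_between(u, v):
    """Included angle in degrees between two 3D vectors."""
    cos = np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v) + 1e-9)
    return np.degrees(np.arccos(np.clip(cos, -1.0, 1.0)))

def finger_bend(points):
    """Sum of included angles between consecutive vectors along the key point path.

    points: ordered 3D key points from palm center to fingertip, e.g.
    [W0, M2, P2, D2, T2] for the index finger; a straight finger gives a sum near 0.
    """
    pts = [np.asarray(p, dtype=float) for p in points]
    vectors = [pts[i + 1] - pts[i] for i in range(len(pts) - 1)]
    return sum(angle_between(vectors[i], vectors[i + 1])
               for i in range(len(vectors) - 1))
```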
After the bending degree of each finger is determined, the state of the finger can be determined from the preset correspondence. The preset correspondence may relate finger states to included angle ranges: several angle ranges may be set, each corresponding to one finger state, and when the included angle sum of a finger falls into a range, the finger state corresponding to that range is taken as the state of that finger.
Because the thumb and the other four fingers have different structures, in the embodiment of the application, the preset corresponding relation can distinguish the thumb from the other four fingers.
In an embodiment, the preset correspondence includes the correspondence between the thumb's bending degree and its state: if the finger corresponding to the included angle sum is the thumb, the included angle sum is compared with a first threshold; if the included angle sum is smaller than or equal to the first threshold, the thumb is determined to be in a bent state; if the sum is greater than the first threshold, the thumb is in a straightened state.
In this embodiment, the first threshold is compared with the included angle sum to determine whether the thumb is bent or straightened.
In an embodiment, the preset correspondence includes the correspondences between the bending degrees and bending states of the four fingers other than the thumb: if the finger corresponding to the included angle sum is any one of the four fingers, the included angle sum of that finger is compared with a second threshold, a third threshold and a fourth threshold in sequence to determine the finger's bending state; if the included angle sum is smaller than or equal to the second threshold, the finger is determined to be in a straightened state; if it is between the second and third thresholds, the finger is in a bent state; if it is between the third and fourth thresholds, the finger is in a grasping state; and if it is greater than or equal to the fourth threshold, the finger is determined to be in a holding state.
In this embodiment, the second threshold is smaller than the third threshold, and the third threshold is smaller than the fourth threshold. Straightened, bent, grasping and holding represent four states of increasing bending degree. They are not strictly defined actions, but approximate states that the finger enters as its bending increases; in fact, a finger may bend slightly in a natural way, and as long as the bending degree does not reach the second threshold, the finger is still considered to be straightened.
The first threshold, the second threshold, the third threshold and the fourth threshold may be set according to finger gestures corresponding to different curvatures, and are not developed here.
Setting three thresholds for the other four fingers enables four finger states to be distinguished. Compared with judging only whether a finger is bent, more finger states provide richer recognition types for static and dynamic gesture recognition, allow similar gestures to be distinguished, and meet the requirements of recognizing more gesture types.
Furthermore, it will be appreciated that for any finger, once the angle of the fingertip is fixed, the angles and directions of the remaining joints are generally determined as well: when the fingertip reaches a certain angle, the bending of each joint on the finger is determined, so the angles of the other joints need not be considered separately. In general, when a user performs a gesture, they mainly control the fingertip, and the remaining joints follow the fingertip's posture. Therefore, in the embodiment of the application, only the included angle sum along the path from the palm center to the fingertip needs to be determined and compared with the thresholds; the angle sums from the palm center to each intermediate joint need not be calculated.
The use of the first, second, third and fourth thresholds is only one embodiment provided by the present application. In some other embodiments, more thresholds may be set based on finer divisions of finger posture to recognize more finger states. Conversely, in scenarios that do not require very complex gestures, the number of thresholds may be reduced, reducing the number of finger states to recognize and the complexity of dynamic gesture recognition.
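Putting the thresholds together, a sketch of the state mapping (the numeric thresholds are placeholders, since the patent leaves them to be tuned; note that, per the embodiments above, the thumb convention is inverted relative to the other fingers):

```python
# Placeholder thresholds in degrees; concrete values are left to tuning.
FIRST_T = 30.0                                     # thumb threshold
SECOND_T, THIRD_T, FOURTH_T = 30.0, 90.0, 150.0    # other four fingers

def finger_state(angle_sum: float, is_thumb: bool) -> str:
    """Map an included-angle sum to a finger state.

    Per the embodiments above, a thumb angle sum at or below the first
    threshold means "bent" and above it means "straight".
    """
    if is_thumb:
        return "bent" if angle_sum <= FIRST_T else "straight"
    if angle_sum <= SECOND_T:
        return "straight"
    if angle_sum < THIRD_T:
        return "bent"
    if angle_sum < FOURTH_T:
        return "grasp"
    return "hold"
```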
In some embodiments, other factors may also be considered when judging the static gesture. For example, the distance between fingertips may be calculated from the three-dimensional coordinates of the fingertip key points, so that gestures with fingers together or apart can be determined in combination with the fingertip distances. The foregoing is merely exemplary, and other manners are possible.
In addition, the foregoing is merely exemplary and not limiting; some embodiments of the present application may identify static gestures in other ways, for example by determining the positional relationship of the fingers from the three-dimensional coordinates of the hand key points and comparing it with preset gesture positional relationships.
A dynamic gesture is related to the motion of the hand, and in embodiments of the present application, each motion-related parameter may be determined from the three-dimensional coordinates of the key points. For example, the motion trajectory of the finger to which a finger key point belongs may be calculated from the three-dimensional coordinates of that key point across consecutive hand images, or the change of a finger may be determined from the coordinate changes of its key points. These are examples rather than limitations: for the same hand, the dynamics of each finger can be distinguished through the changes of the three-dimensional coordinates of its finger key points, which is not expanded here.
In the embodiment of the present application, the dynamic gesture may also be a change of a hand in space, for example, waving a hand, beating, punching a fist, etc., so in this embodiment, when judging the dynamic gesture, the dynamic gesture may also be judged by combining at least one motion parameter of an angle, a motion direction, a motion speed, a motion track, etc. of the hand in space.
In one embodiment, the determining the angle of the hand in space includes: the angle of the hand in space is determined based on the palm center key point and the finger key point characterizing the base of the middle finger.
In this embodiment, the positional relationship between the palm center key point and the finger key point representing the root of the middle finger generally does not change, and it can generally indicate the direction of the whole palm. The orientation of the hand is the orientation of the side of the hand from which the fingers extend, as distinguished from the orientation of the palm; for example, when the hand makes a fist, the direction the fist faces is the orientation of the palm.
In this embodiment, the angle of the hand in space can be determined by the vector between the palm center key point and the finger key point characterizing the root of the middle finger. Taking fig. 2 as an example, the vector W0M3 may be constructed and placed into the three-dimensional coordinate system, and the direction in which W0M3 points is determined as the angle of the hand in space.
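As an illustration, the hand's angle in space can be derived from the direction of W0M3 (a sketch; reducing the direction vector to azimuth and elevation angles is our choice, not prescribed by the patent):

```python
import numpy as np

def hand_direction(w0, m3):
    """Unit vector from the palm center key point W0 to the middle finger root M3."""
    v = np.asarray(m3, dtype=float) - np.asarray(w0, dtype=float)
    return v / (np.linalg.norm(v) + 1e-9)

def hand_angles(w0, m3):
    """Reduce the W0M3 direction to azimuth and elevation angles in degrees."""
    x, y, z = hand_direction(w0, m3)
    azimuth = np.degrees(np.arctan2(y, x))
    elevation = np.degrees(np.arcsin(np.clip(z, -1.0, 1.0)))
    return azimuth, elevation
```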
In some embodiments, the hand type may also be determined and the orientation of the palm of the hand may be determined based on the hand type.
It will be appreciated that in the embodiments of the present application, various directions and angles of the hand may be determined in different manners to meet the recognition requirements of different gestures, which are merely examples, and are not limiting of the present application.
In one embodiment, at least one of the motion direction, the motion velocity, and the motion trajectory may be determined based on the respective palm center key points of the successive multi-frame hand images.
The motion of the palm typically characterizes the motion of the hand, e.g., the palm moves as the hand is swung, and thus, in this embodiment, at least one of the direction of motion, the velocity of motion, and the trajectory of motion may be determined based on the respective palm center keypoints of successive multi-frame hand images.
For example, the vector between the three-dimensional coordinates of the palm center key points in two frames of hand images can be placed into the coordinate system to determine the motion direction between the two positions; or the three-dimensional coordinates of the palm center key points can be recorded over multiple frames and connected into a motion trajectory. Since the position of the hand in space is known once the three-dimensional coordinates are determined, the motion distance can be determined from two frames of hand images, the motion time from the acquisition interval between the two frames, and the motion speed from the distance and the time. These implementations are merely examples; the motion direction, motion speed and motion trajectory may be computed in other ways, which are not developed here.
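A sketch of the palm-center motion parameters (assuming at least two frames, palm-center coordinates in consistent spatial units, and a known frame interval dt; the function name is our own):

```python
import numpy as np

def motion_parameters(palm_points, dt):
    """Direction, average speed and trajectory from successive palm-center key points.

    palm_points: list of 3D palm-center coordinates, one per frame (at least two).
    dt: acquisition interval between consecutive frames, in seconds.
    """
    pts = np.asarray(palm_points, dtype=float)
    trajectory = pts                               # the track is the point sequence itself
    step = pts[-1] - pts[0]
    direction = step / (np.linalg.norm(step) + 1e-9)
    distance = np.sum(np.linalg.norm(np.diff(pts, axis=0), axis=1))
    speed = distance / (dt * (len(pts) - 1))       # average speed over the window
    return direction, speed, trajectory
```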
It should be noted that the motion parameters are defined over multiple hand images, that is, the multiple frames of hand images correspond to the same motion parameters. For example, a motion speed calculated from 10 frames of hand images is the motion speed shared by all 10 frames. If the motion changes within the multi-frame images, the frames can be split into several groups and the motion parameters of each group calculated separately.
Motion parameters calculated from finger key points may deviate from the actual motion of the hand: when the palm moves, the fingers may move relative to it at the same time, so a speed determined from the three-dimensional coordinates of finger key points may be larger or smaller than the true hand speed. The palm center key point is therefore selected to calculate the movement speed, movement direction and movement track of the hand, which gives higher reliability.
S130, determining a dynamic gesture based on the motion parameters of the respective hands of the continuous multi-frame hand images and the static gesture.
In this embodiment, after the static gestures and the motion parameters are acquired, the dynamic gesture may be determined according to a preset correspondence among static gestures, motion parameters and dynamic gestures. This correspondence may take the form of a pre-constructed dynamic gesture library, against which the recognized static gestures and motion parameters are compared to determine the corresponding dynamic gesture.
In addition, when dynamic gestures are recognized, the types of motion parameters relevant to different dynamic gestures may differ. For example, a leftward-slide gesture depends little on motion speed and motion track and is mainly characterized by its motion direction, so when its motion parameters are defined in the dynamic gesture library, the motion track and motion speed may be left unconstrained or constrained only to a broad range. Corresponding static gestures and motion parameters can thus be configured in the dynamic gesture library for each dynamic gesture type as required, meeting the needs of recognizing different dynamic gestures.
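A hedged sketch of how such a library entry might leave irrelevant parameters unconstrained (all rule names and sample entries below are illustrative assumptions, not the application's actual library):

```python
from dataclasses import dataclass
from typing import Optional, Tuple

@dataclass
class GestureRule:
    name: str
    static_gesture: str                       # required static gesture, e.g. "palm"
    direction: Optional[str] = None           # None means "don't care"
    speed_range: Optional[Tuple[float, float]] = None

    def matches(self, static_gesture: str, direction: str, speed: float) -> bool:
        if static_gesture != self.static_gesture:
            return False
        if self.direction is not None and direction != self.direction:
            return False
        if self.speed_range is not None and not (self.speed_range[0] <= speed <= self.speed_range[1]):
            return False
        return True

# hypothetical library: a leftward slide only constrains the direction
LIBRARY = [GestureRule("slide_left", static_gesture="palm", direction="left")]

def lookup(static_gesture: str, direction: str, speed: float):
    return next((r.name for r in LIBRARY if r.matches(static_gesture, direction, speed)), None)
```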
In some embodiments, the dynamic gesture library may include a preset two-hand dynamic gesture library, which contains the correspondence between each two-hand dynamic gesture and the static gesture and motion parameters of the left hand together with the static gesture and motion parameters of the right hand.

Therefore, in the embodiments of the present application, the static gesture and motion parameters of the left hand and the static gesture and motion parameters of the right hand can be matched against the preset two-hand dynamic gesture library to determine a two-hand dynamic gesture.
Recognizing the gestures of both of the user's hands supports richer gesture types and allows dynamic gestures to be understood more accurately. For example, suppose the user straightens the left index finger with the other fingers clenched, straightens the right hand, and places the right palm on the left index finger while moving it up and down, forming a motion that represents "pause". Recognizing the left hand or the right hand alone cannot capture this meaning, whereas matching against the preset two-hand dynamic gesture library recognizes the two-hand dynamic gesture as "pause", reducing misrecognition and improving the accuracy of gesture recognition.
In the embodiments of the present application, when a dynamic gesture is recognized, the hand key points are modeled three-dimensionally in space to obtain their three-dimensional coordinates. These coordinates accurately represent the positional relationship of the key points, so each part of the hand, and the position and posture of each finger in different gestures, can be accurately distinguished. This reduces the possibility that similar gestures are confused with one another, gives the recognized static gestures and motion parameters higher accuracy, and thereby improves the accuracy of recognizing dynamic gestures from the static gestures and motion parameters.
During dynamic gesture recognition, the hand in a single frame image may be in the middle of a certain dynamic gesture, yet that dynamic gesture cannot be recognized from the single frame alone. The embodiments of the present application therefore also provide a method for recognizing an instantaneous dynamic gesture, which characterizes the motion behavior of the hand in a single frame of hand image.
In the embodiments of the present application, the motion direction, the motion speed, the angle of the hand in space and the static gesture corresponding to a hand image can be matched against a preset instantaneous dynamic gesture library to determine the instantaneous dynamic gesture corresponding to that hand image; the instantaneous dynamic gesture library includes the correspondence among instantaneous dynamic gestures, static gestures and motion parameters.
In this embodiment, the instantaneous dynamic gesture library defines the static gesture and motion parameters corresponding to each instantaneous dynamic gesture. For example, for a pounding action, the static gesture may be defined as a fist, the motion direction as upward or downward, and the motion track as moving up and down; if the static gesture and motion parameters of a certain frame of hand image conform to this correspondence, the instantaneous dynamic gesture of the hand in that frame can be determined to be the pounding action.
Therefore, for each frame of hand image, a corresponding instantaneous dynamic gesture can be determined. When recognizing a dynamic gesture, the dynamic gesture can then be determined as a combination of multiple instantaneous dynamic gestures that jointly form a gesture of longer duration. Compared with directly determining one overall dynamic gesture from multiple frames, splitting the overall gesture into instantaneous dynamic gestures can distinguish the continuous change process within one large action and the detailed actions inside it, so that more complex dynamic gestures can be recognized and the accuracy of dynamic gesture recognition is improved.
Further, some complex dynamic gestures may be split into multiple simple dynamic gestures, for example a side-to-side hand wave. Therefore, in some embodiments of the present application, when recognizing a dynamic gesture, a continuous dynamic gesture of longer duration may be split into multiple instantaneous dynamic gestures of shorter duration.
In embodiments of the present application, a gesture queue may be provided in which successive instantaneous dynamic gestures are recorded. If the instantaneous dynamic gestures corresponding to a preset number of consecutive frames of hand images are the same, that instantaneous dynamic gesture is added to the gesture queue; the sequence of instantaneous dynamic gestures in the queue is then matched against a preset continuous dynamic gesture library to determine the continuous dynamic gesture.
It should be noted that adjacent entries in the gesture queue are of different types: each newly added instantaneous dynamic gesture differs from the one added before it. For example, a side-to-side wave may be recorded in the gesture queue as four instantaneous dynamic gestures: swing left, swing right, swing left, swing right.
In this embodiment, the preset frame number may be set as required, for example 3 frames, 4 frames or more. Requiring a preset number of consecutive frames reduces errors in the instantaneous dynamic gesture recognition of individual hand images, thereby reducing the possibility of anomalies in continuous dynamic gesture recognition and improving recognition accuracy.
The continuous dynamic gesture library may define the correspondence between different arrangements of instantaneous dynamic gestures in the gesture queue and continuous dynamic gestures; when the arrangement of instantaneous dynamic gestures recorded in the gesture queue matches a continuous dynamic gesture in the continuous dynamic gesture library, that continuous dynamic gesture is taken as the recognition result, as in the sketch below.
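A minimal sketch of such a gesture queue and library lookup, assuming a 3-frame confirmation threshold and a hypothetical side-to-side wave entry (all names below are illustrative):

```python
from collections import deque

class GestureQueue:
    """An instantaneous dynamic gesture is enqueued only after it has been
    seen for min_frames consecutive frames and differs from the entry
    appended before it."""
    def __init__(self, min_frames: int = 3, maxlen: int = 8):
        self.min_frames = min_frames
        self.queue = deque(maxlen=maxlen)
        self._candidate = None
        self._count = 0

    def push(self, transient: str):
        # count how many consecutive frames yielded the same transient gesture
        if transient == self._candidate:
            self._count += 1
        else:
            self._candidate, self._count = transient, 1
        # enqueue once confirmed, and only if it differs from the previous entry
        if self._count == self.min_frames and (not self.queue or self.queue[-1] != transient):
            self.queue.append(transient)

# hypothetical continuous dynamic gesture library
CONTINUOUS_LIBRARY = {
    ("wave_left", "wave_right", "wave_left", "wave_right"): "wave_side_to_side",
}

def match_continuous(q: GestureQueue):
    return CONTINUOUS_LIBRARY.get(tuple(q.queue))
```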
Splitting a complex continuous dynamic gesture into a combination of instantaneous dynamic gestures makes it possible to distinguish the detailed gestures within a dynamic gesture and to separate various similar dynamic gestures, enriching the types of recognizable dynamic gestures and thereby improving the accuracy of dynamic gesture recognition.
For ease of understanding, the static gesture, instantaneous dynamic gesture and continuous dynamic gesture described above are distinguished here. A static gesture is the posture the hand assumes in one frame of image, for example five fingers open, a V sign, or OK; it cannot express movement or bending changes of the hand. An instantaneous dynamic gesture represents the motion of the hand within one frame of hand image: for example, in images captured during a hand wave, the hand in each frame shows the static gesture of five fingers open, yet it is actually in the middle of a waving motion, so the instantaneous dynamic gesture corresponding to each frame is a wave. A continuous dynamic gesture represents the action formed jointly by continuous multi-frame hand images: a side-to-side wave, for example, decomposes into left swings and right swings, each hand image corresponding to either a left swing or a right swing; together the continuous frames represent the side-to-side wave, which is the continuous dynamic gesture corresponding to those frames.
To facilitate an understanding of the application, an embodiment of the application is provided herein for illustration.
As shown in fig. 4, first, the hand in the continuous multi-frame hand images or video containing the dynamic gesture to be recognized is detected, and the hand key points are determined.
Then, the curvature of the thumb is determined from the thumb's finger key points, and the curvature of the index, middle, ring and little fingers is calculated from their finger key points together with the palm center key point; the state of each finger is determined from its curvature. The finger states are matched with a preset static gesture library to determine the static gesture of the hand. For example, a hand with all five fingers straightened corresponds to the static gesture "palm" (see the hypothetical library sketch below).
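A hypothetical static gesture library might map finger-state tuples to gesture names; the entries below are illustrative assumptions, not the application's actual library:

```python
# hypothetical static gesture library:
# (thumb, index, middle, ring, little) finger states -> gesture name
STATIC_LIBRARY = {
    ("straight", "straight", "straight", "straight", "straight"): "palm",
    ("holding", "straight", "straight", "holding", "holding"): "victory",
    ("holding", "straight", "holding", "holding", "holding"): "point",
}

def match_static(finger_states: tuple) -> str:
    return STATIC_LIBRARY.get(finger_states, "unknown")
```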
Meanwhile, the palm center key point can be used to determine motion parameters such as the motion speed, motion direction and motion track of the hand, and the palm center key point together with the finger key point M3 at the root of the middle finger (as shown in fig. 2) can be used to determine the angle of the hand in space. For example, over 15 frames of hand images the angle of the hand in space is upward throughout, the calculated motion speed is 1 m/s, and the motion track is left, right, left, right: the motion direction of the first 4 frames is leftward, frames 5 to 8 rightward, frames 9 to 12 leftward, and frames 13 to 15 rightward.
Next, combining the static gesture and the motion parameters, each frame of hand image is compared with the preset instantaneous dynamic gesture library to determine its instantaneous dynamic gesture: the first 4 frames correspond to a left swing, frames 5 to 8 to a right swing, frames 9 to 12 to a left swing, and frames 13 to 15 to a right swing.
Since each swing lasts at least the preset 3 frames, the four instantaneous dynamic gestures (left swing, right swing, left swing, right swing) are added to the gesture queue in sequence.
Finally, the combination of instantaneous dynamic gestures in the gesture queue is compared with the preset continuous dynamic gesture library to determine the corresponding continuous dynamic gesture; here the gesture queue is recognized as a side-to-side wave.
Based on the same inventive concept, an embodiment of the present application further provides a dynamic gesture recognition device, referring to fig. 5, and fig. 5 is a schematic diagram of the dynamic gesture recognition device according to an embodiment of the present application. The dynamic gesture recognition apparatus includes: a three-dimensional module 110, a first recognition module 120, and a second recognition module 130.
The three-dimensional module 110 is configured to determine, for each of the plurality of successive hand images, three-dimensional coordinates of each hand key point in the hand image in space.
The first recognition module 120 is configured to determine a static gesture of each hand image and motion parameters of the hand in each hand image based on the three-dimensional coordinates corresponding to each hand image; the motion parameters characterize the motion condition of the hand in the continuous multi-frame hand images.
The second recognition module 130 is configured to determine a dynamic gesture based on the motion parameters and the static gesture of the respective hands of the continuous multi-frame hand images.
In one embodiment, the three-dimensional module 110 is further configured to acquire two hand images of the same frame acquired by the binocular camera; matching the two hand images through a preset matching model to obtain a binocular disparity map; determining a depth map based on a preset depth calculation formula and a binocular disparity map; extracting a hand region from the depth map based on a preset extraction model; based on a preset key point detection algorithm, three-dimensional coordinates of hand key points in the hand region are identified.
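The application does not spell out the "preset depth calculation formula" here; the standard pinhole stereo relation Z = f·B/d is a reasonable assumption, as in this sketch (function and parameter names are illustrative):

```python
import numpy as np

def disparity_to_depth(disparity: np.ndarray, focal_px: float, baseline_m: float) -> np.ndarray:
    """Assumed 'preset depth calculation formula': Z = f * B / d,
    with disparity d in pixels, focal length f in pixels, baseline B in metres."""
    depth = np.zeros_like(disparity, dtype=np.float32)
    valid = disparity > 0               # zero disparity carries no depth information
    depth[valid] = focal_px * baseline_m / disparity[valid]
    return depth
```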
In one embodiment, the three-dimensional module 110 is further configured to perform epipolar rectification on the two hand images.
In an embodiment, the hand region includes a plurality of hands, the preset extraction model used by the three-dimensional module 110 is a YOLO model, and the three-dimensional module 110 is further configured to identify the different hands in the hand region and the type of each hand based on the YOLO model; the type of a hand is left hand or right hand.
In an embodiment, the three-dimensional module 110 is further configured to perform object tracking on different hands in the hand regions of each of any two adjacent frames of hand images, so as to match the same hands in the two adjacent frames of hand images.
In an embodiment, the second recognition module 130 is further configured to match the static gesture and the motion parameter of the left hand, the static gesture and the motion parameter of the right hand with a preset two-hand dynamic gesture library to determine a two-hand dynamic gesture; the preset two-hand dynamic gesture library comprises corresponding relations between the two-hand dynamic gestures and static gestures and motion parameters of the left hand and between the two-hand dynamic gestures and motion parameters of the right hand.
In an embodiment, the first recognition module 120 is further configured to: for each frame of the hand image, calculate the finger curvature of each finger based on the three-dimensional coordinates of each hand key point in the hand image; determine each finger state based on the finger curvature and a preset correspondence between finger curvature and finger state; and match the finger states with a preset static gesture library to determine the static gesture corresponding to the hand image, where the static gesture library includes the correspondence between finger states and static gestures.
In one embodiment, the first recognition module 120 is further configured to: for the same finger, determine a plurality of vectors, each formed by two adjacent hand key points on the path from the palm center to the fingertip of the finger; calculate the included angles between adjacent vectors; and calculate the sum of the included angles within the finger, the included angle sum being the curvature of the finger. A minimal sketch of this computation follows.
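A minimal sketch of the included-angle-sum computation, assuming the key points are supplied in palm-to-fingertip order (the function name is an assumption):

```python
import numpy as np

def finger_curvature(points: np.ndarray) -> float:
    """points: (K, 3) key points along one finger, ordered from the palm
    center to the fingertip. Returns the sum of included angles (degrees)
    between consecutive vectors, i.e. the finger curvature."""
    vectors = np.diff(points, axis=0)            # vectors between adjacent key points
    total = 0.0
    for a, b in zip(vectors[:-1], vectors[1:]):
        cos = np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-9)
        total += np.degrees(np.arccos(np.clip(cos, -1.0, 1.0)))
    return total
```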
In an embodiment, the preset correspondence includes the correspondence between the thumb curvature and the thumb bending state, and the first recognition module 120 is further configured to: if the finger to which the included angle sum corresponds is the thumb, compare the included angle sum with a first threshold; if the included angle sum is smaller than the first threshold, determine that the thumb is in a bent state; and if the included angle sum is larger than the first threshold, determine that the thumb is in a straightened state.
In an embodiment, the preset correspondence includes the correspondence between the curvature and the bending states of the four fingers other than the thumb, and the first recognition module 120 is further configured to: if the finger to which the included angle sum corresponds is any one of those four fingers, compare the finger's included angle sum with a second threshold, a third threshold and a fourth threshold in sequence to determine the bending state of the finger, where the second threshold is smaller than the third threshold and the third threshold is smaller than the fourth threshold; if the included angle sum is smaller than the second threshold, the finger is in a straightened state; if the included angle sum is between the second threshold and the third threshold, the finger is in a bent state; if the included angle sum is between the third threshold and the fourth threshold, the finger is in a grabbing state; and if the included angle sum is larger than the fourth threshold, the finger is in a holding state. A threshold-classification sketch follows.
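A hedged sketch of the threshold comparison; the numeric threshold values below are placeholders, since the application only fixes their ordering (second < third < fourth):

```python
def finger_state(angle_sum: float, is_thumb: bool,
                 t1: float = 30.0, t2: float = 25.0,
                 t3: float = 60.0, t4: float = 120.0) -> str:
    """Map a finger's included angle sum (degrees) to a state.
    Per the described correspondence, a thumb with a small angle sum is
    bent; for the other four fingers, larger sums mean stronger bending."""
    if is_thumb:
        return "bent" if angle_sum <= t1 else "straight"
    if angle_sum <= t2:
        return "straight"
    if angle_sum <= t3:
        return "bent"
    if angle_sum <= t4:
        return "grabbing"
    return "holding"
```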
In one embodiment, the motion parameter includes at least one of an angle of the hand in space, a direction of motion, a speed of motion, and a trajectory of motion.
In one embodiment, the hand keypoints include a palm keypoint and a finger keypoint, and the first identification module 120 is further configured to determine the angle of the hand in space based on the palm keypoint and the finger keypoint characterizing the root of the middle finger.
In one embodiment, the first identifying module 120 is further configured to determine at least one of a motion direction, a motion velocity, and a motion trajectory based on the palm center key points of the successive multi-frame hand images.
In an embodiment, the dynamic gestures include transient dynamic gestures, the transient dynamic gestures are used for representing motion behaviors of the hand in a single frame of hand image, and the second recognition module 130 is further used for matching a motion direction, a motion speed, angles of the hand in space and static gestures corresponding to the hand image in a preset transient dynamic gesture library to determine the transient dynamic gestures corresponding to the hand image; the instantaneous dynamic gesture library comprises the corresponding relation among instantaneous dynamic gestures, static gestures and motion parameters.
In an embodiment, the dynamic gestures further include a continuous dynamic gesture, which characterizes the motion behavior of the hand in the continuous multi-frame hand images. The second recognition module 130 is further configured to add an instantaneous dynamic gesture to the gesture queue if the instantaneous dynamic gestures corresponding to a preset number of consecutive frames of hand images are the same, and to match the sequence of instantaneous dynamic gestures in the gesture queue with a preset continuous dynamic gesture library to determine the continuous dynamic gesture, where the continuous dynamic gesture library includes the correspondence between continuous dynamic gestures and multiple instantaneous dynamic gestures.
It can be appreciated that the functions implemented by the dynamic gesture recognition apparatus are similar to those of the dynamic gesture recognition method described above, and specific implementation functions may refer to the dynamic gesture recognition method, which is not described herein.
Based on the same inventive concept, an embodiment of the present application further provides a dynamic gesture recognition system, referring to fig. 6, fig. 6 is a schematic diagram of the dynamic gesture recognition system provided by the embodiment of the present application, where the dynamic gesture recognition system 200 includes: an image acquisition device 210 and a processing device 220.
An image acquisition device 210 for acquiring hand images.
In this embodiment, the image capturing device 210 includes various devices with image capturing functions, such as a camera, a video camera, a mobile phone with a camera, a computer, etc., and the specific implementation of the image capturing device may refer to the prior art and will not be further described herein.
In some embodiments of the present application, the image capture device 210 may include a binocular camera to capture binocular images including hands.
The processing device 220 is in communication with the image acquisition device.
In this embodiment, the processing device 220 may be in communication connection with the image capturing device through various communication modes such as bluetooth, serial port, wireless network, etc. so as to receive the hand image captured by the image capturing device.
In this embodiment, the processing device 220 may perform dynamic gesture recognition based on the hand image acquired by the image acquisition device, so as to obtain a recognition result of the dynamic gesture. The processing device may implement the dynamic gesture recognition method to recognize the dynamic gesture.
Based on the same inventive concept, the embodiment of the application also provides a handwriting recognition method, please refer to fig. 7, and fig. 7 is a flowchart of the handwriting recognition method provided by the embodiment of the application. The handwriting recognition method comprises the following steps:
s210, identifying dynamic gestures of a user.
In this embodiment, the dynamic gesture may be identified based on the foregoing dynamic gesture identification method, which is not described herein.
S220, if the dynamic gesture is a first preset gesture, recording a movement track of the target finger based on a preset layer.
In this embodiment, the first preset gesture is a preset gesture indicating that the user wants to write. After the first preset gesture is recognized, a layer can be constructed, and the movement track of the finger is recorded on that layer. The movement track can be calculated from the three-dimensional coordinates of the fingertip key point of the target finger, which is not expanded here; a minimal sketch of such a layer is given below.
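A minimal sketch of such a layer, assuming the fingertip has already been projected to pixel coordinates (the class name, canvas size and stroke width are illustrative assumptions):

```python
import cv2
import numpy as np

class TrajectoryLayer:
    """Sketch of the 'preset layer': a blank canvas onto which the
    projected fingertip position is drawn while the write gesture holds."""
    def __init__(self, height: int = 480, width: int = 640):
        self.canvas = np.zeros((height, width), dtype=np.uint8)
        self.last = None

    def record(self, x: int, y: int):
        # connect successive fingertip positions into a stroke
        if self.last is not None:
            cv2.line(self.canvas, self.last, (int(x), int(y)), color=255, thickness=3)
        self.last = (int(x), int(y))

    def clear(self):
        # e.g. on the third preset gesture: wipe the layer for the next input
        self.canvas[:] = 0
        self.last = None
```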
For example, the first preset gesture may be the action of straightening the index finger and making a fist with the other four fingers, and after the dynamic gesture is recognized, the movement track of the tip of the index finger is recorded.
S230, if the dynamic gesture is a second preset gesture, identifying a moving track recorded on a preset layer to obtain an identification result.
In this embodiment, the second preset gesture represents stopping handwriting. After the second preset gesture is recognized, recording of the movement track can be stopped, and the recorded movement track is input into a pre-constructed handwriting recognition model to recognize the handwritten content. The handwriting recognition model may follow the prior art and is not expanded here.
In some embodiments, before the movement track is input into the handwriting recognition model, the image can be cropped to the minimum circumscribed rectangle containing the track, and adjacent line segments can be connected by a morphological closing operation to fill holes. This reduces the computation required for recognition, avoids recognizing irrelevant content, and improves recognition efficiency and accuracy, as in the sketch below.
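A sketch of this preprocessing using OpenCV; the kernel size is an assumption, and an axis-aligned bounding rectangle stands in for the minimum circumscribed rectangle:

```python
import cv2
import numpy as np

def crop_trajectory(layer: np.ndarray) -> np.ndarray:
    """layer: binary canvas holding the recorded track. Close small gaps
    between adjacent segments, then crop to the bounding rectangle of the
    strokes before passing the patch to the handwriting recognizer."""
    kernel = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (5, 5))
    closed = cv2.morphologyEx(layer, cv2.MORPH_CLOSE, kernel)
    pts = cv2.findNonZero(closed)
    if pts is None:                      # nothing was written
        return closed
    x, y, w, h = cv2.boundingRect(pts)
    return closed[y:y + h, x:x + w]
```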
For example, as shown in fig. 8, when the second preset gesture is recognized after the digit 6 has been handwritten, the digit can be selected by a rectangle, and the cropped image is passed to the preset handwriting recognition model to obtain the recognition result "6".
In an embodiment, after S210, if the identified dynamic gesture is a third preset gesture, the movement track recorded in the preset layer is cleared.
In this embodiment, the third preset gesture is a gesture for clearing the record of the layer, and when the handwriting is wrong or the handwriting is completed and the next handwriting content needs to be identified, the record of the layer can be cleared through the third preset gesture to re-record the handwriting content.
The first preset gesture, the second preset gesture and the third preset gesture in the above embodiment may be set to be different types of dynamic gestures according to requirements, and specific types of gestures are not limited herein.
The dynamic gesture recognition method or handwriting recognition method described above may be implemented in the form of computer readable instructions that may be executed on an electronic device as shown in fig. 9.
Referring to fig. 9, an embodiment of the present application further provides an electronic device 300, which may be used as an execution body of the dynamic gesture recognition method or the handwriting recognition method, including: a processor 310 and a memory 320 communicatively coupled to the processor 310.
The memory 320 stores instructions executable by the processor 310, and the instructions are executed by the processor 310 to enable the processor 310 to perform the dynamic gesture recognition method or the handwriting recognition method in the foregoing embodiments.
The processor 310 and the memory 320 may be connected by a communication bus. Or by some communication module, such as: wireless communication module, bluetooth communication module, 4G/5G communication module, etc.
The processor 310 may be an integrated circuit chip with signal processing capability. The processor 310 may be a general-purpose processor, including a CPU (Central Processing Unit), an NP (Network Processor), etc.; it may also be a digital signal processor, an application-specific integrated circuit, a field-programmable gate array or other programmable logic device, or a discrete hardware component. It may implement or perform the methods, steps and logic blocks disclosed in the embodiments of the application. A general-purpose processor may be a microprocessor, or the processor may be any conventional processor.
Memory 320 may include, but is not limited to, RAM (Random Access Memory), ROM (Read-Only Memory), PROM (Programmable Read-Only Memory), EPROM (Erasable Programmable Read-Only Memory), EEPROM (Electrically Erasable Programmable Read-Only Memory), and the like.
It will be appreciated that the electronic device 300 may also include other common modules as required, which are not described in detail in the embodiments of the present application.
Based on the same inventive concept, the embodiments of the present application also provide a computer-readable storage medium having stored thereon a computer program which, when executed, performs the method provided in the above embodiments.
The storage media may be any available media that can be accessed by a computer or a data storage device such as a server, data center, or the like that contains an integration of one or more available media. The usable medium may be a magnetic medium (e.g., a floppy Disk, a hard Disk, a magnetic tape), an optical medium (e.g., a DVD), or a semiconductor medium (e.g., an SSD (Solid State Disk)), or the like.
In the embodiments provided in the present application, it should be understood that the disclosed method and apparatus may be implemented in other manners. The device embodiments described above are merely illustrative. The functional modules in the embodiments of the present application may be integrated together to form a single part, or the functional modules may exist alone, or two or more modules may be integrated to form a single part.
The functions, if implemented in the form of software functional modules and sold or used as a stand-alone product, may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present application may be embodied essentially or in a part contributing to the prior art or in a part of the technical solution, in the form of a software product stored in a storage medium, comprising several instructions for causing a computer device (which may be a personal computer, a server, a network device, etc.) to perform all or part of the steps of the method according to the embodiments of the present application. And the aforementioned storage medium includes: a U-disk, a removable hard disk, a ROM (Read-Only Memory), a RAM (Random Access Memory), a magnetic disk, an optical disk, or other various media capable of storing program codes.
The foregoing is merely illustrative of the present application, and the present application is not limited thereto, and any person skilled in the art will readily recognize that variations or substitutions are within the scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.
It should be noted that, in this document, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of other like elements in a process, method, article, or apparatus that comprises the element.

Claims (21)

1. A method of dynamic gesture recognition, comprising:
for each frame of hand images in continuous multi-frame hand images, determining three-dimensional coordinates of each hand key point in the hand images in space; the three-dimensional coordinates are determined based on binocular disparity maps corresponding to the hand images;
based on the three-dimensional coordinates corresponding to the hand images, respectively determining static gestures of the hand images and motion parameters of hands in the hand images; the motion parameters represent the motion condition of hands in continuous multi-frame hand images;
Determining a dynamic gesture based on the static gesture and the motion parameters of the hand of each of the successive frames of the hand image;
the dynamic gestures include transient dynamic gestures and persistent dynamic gestures; the continuous dynamic gesture is determined based on the instantaneous dynamic gesture corresponding to the hand image with the continuous preset frame number; the instantaneous dynamic gestures are used for representing the motion behaviors of the hand in a single frame of the hand image; the continuous dynamic gesture characterizes the motion behavior of the hand in the continuous multi-frame hand image;
the motion parameters include: angle of hand in space, direction of movement and speed of movement;
the transient dynamic gesture is determined by: matching the motion direction, the motion speed, the angle of the hand in space and the static gesture corresponding to the hand image in a preset instantaneous dynamic gesture library, and determining the instantaneous dynamic gesture corresponding to the hand image; the transient dynamic gesture library comprises the corresponding relation among the transient dynamic gesture, the static gesture and the motion parameter.
2. The method of claim 1, wherein determining three-dimensional coordinates of each hand keypoint in the hand image in space comprises:
Acquiring two hand images of the same frame acquired by a binocular camera;
matching the two hand images through a preset matching model to obtain the binocular parallax image;
determining a depth map based on a preset depth calculation formula and the binocular disparity map;
extracting a hand region from the depth map based on a preset extraction model;
and identifying three-dimensional coordinates of the hand key points in the hand area based on a preset key point detection algorithm.
3. The dynamic gesture recognition method of claim 2, wherein before the matching of the two hand images by the preset matching model, the method further comprises: performing epipolar rectification on the two hand images.
4. The dynamic gesture recognition method of claim 2, wherein the hand region includes a plurality of hands, and the preset extraction model is a YOLO model; after the hand region is extracted from the depth map based on the preset extraction model, the method further comprises:
identifying different hands and types of hands in the hand region based on the YOLO model; the type of hand includes left hand or right hand.
5. The method of dynamic gesture recognition of claim 4, wherein after the recognition of the different hands in the hand region, the method further comprises:
And carrying out target tracking on different hands in the hand areas of the hand images of any two adjacent frames so as to match the same hands in the hand images of the two adjacent frames.
6. The method of claim 5, wherein the determining a dynamic gesture based on the motion parameter and the static gesture comprises:
matching the static gesture and the motion parameter of the left hand, the static gesture and the motion parameter of the right hand with a preset double-hand dynamic gesture library to determine double-hand dynamic gestures; the preset two-hand dynamic gesture library comprises corresponding relations between the two-hand dynamic gestures, the static gestures and the motion parameters of the left hand and the static gestures and the motion parameters of the right hand.
7. The method according to claim 1, wherein the determining the static gesture of each hand image and the motion parameter of the hand in each hand image based on the three-dimensional coordinates corresponding to each hand image includes:
for each frame of the hand image, calculating the finger curvature of each finger based on the three-dimensional coordinates of each hand key point in the hand image;
Determining each finger state based on a preset corresponding relation between the finger curvature and the finger state and the finger curvature;
and matching the finger states with a preset static gesture library, and determining a static gesture corresponding to the hand image, wherein the static gesture library comprises the corresponding relation between the finger states and the static gesture.
8. The method according to claim 7, wherein calculating the degree of curvature of each finger based on the three-dimensional coordinates of each hand key point in the hand image comprises:
for the same finger, determining a plurality of vectors, each formed by two adjacent hand key points on the path from the palm center to the fingertip of the finger;
calculating an included angle between the vectors;
and calculating the sum of included angles among vectors in the finger, wherein the sum of included angles is the curvature of the finger.
9. The dynamic gesture recognition method of claim 8, wherein the preset correspondence includes the correspondence between the thumb curvature and the thumb bending state, and the determining each finger state based on the preset correspondence between finger curvature and finger state and the finger curvature comprises:

if the finger to which the included angle sum corresponds is the thumb, comparing the included angle sum with a first threshold;
if the included angle sum is smaller than or equal to the first threshold value, determining that the thumb is in a bending state;
and if the included angle sum is larger than the first threshold value, the thumb is in a straightening state.
10. The dynamic gesture recognition method of claim 8, wherein the preset correspondence includes the correspondence between the curvature and the bending states of the four fingers other than the thumb, and the determining each finger state based on the preset correspondence between finger curvature and finger state and the finger curvature comprises:

if the finger to which the included angle sum corresponds is any one of the four fingers other than the thumb, comparing the included angle sum of the finger with a second threshold, a third threshold and a fourth threshold in sequence, and determining the bending state of the finger;
wherein the second threshold is less than the third threshold, the third threshold is less than the fourth threshold;
if the included angle sum is smaller than or equal to the second threshold value, determining that the finger is in a straightening state;
if the included angle is between the second threshold value and the third threshold value, the finger is in a bending state;
If the included angle is between the third threshold value and the fourth threshold value, the finger is in a grabbing state;
and if the included angle sum is larger than or equal to the fourth threshold value, determining that the finger is in a holding state.
11. The dynamic gesture recognition method of any one of claims 1 to 10, wherein the motion parameters include at least one of an angle of a hand in space, a motion direction, a motion speed, and a motion trajectory.
12. The method of claim 11, wherein the hand keypoints comprise a palm center keypoint and a finger keypoint, and the motion parameter comprises a hand angle in space;
the determining, based on the three-dimensional coordinates corresponding to each hand image, a static gesture of each hand image and a motion parameter of a hand in each hand image respectively includes:
an angle of the hand in space is determined based on the palm center keypoint and a finger keypoint characterizing a root of a middle finger.
13. The method of claim 11, wherein the hand keypoints comprise palm center keypoints, and the motion parameters comprise at least one of a direction of motion, a speed of motion, and a motion trajectory;
The determining, based on the three-dimensional coordinates corresponding to each hand image, a static gesture of each hand image and a motion parameter of a hand in each hand image respectively includes:
and determining at least one of the motion direction, the motion speed and the motion track based on the palm center key points of the hand images of the continuous multiple frames.
14. The dynamic gesture recognition method of claim 11, wherein after the instantaneous dynamic gesture corresponding to the hand image is determined, the method further comprises:

if the instantaneous dynamic gestures corresponding to a continuous preset number of frames of the hand images are the same, adding the instantaneous dynamic gesture to a gesture queue;
matching the continuous multiple types of the instantaneous dynamic gestures in the gesture queue with a preset continuous dynamic gesture library, and determining the continuous dynamic gestures, wherein the continuous dynamic gesture library comprises corresponding relations between the continuous dynamic gestures and the instantaneous dynamic gestures.
15. A dynamic gesture recognition apparatus, comprising:
the three-dimensional module is used for determining three-dimensional coordinates of each hand key point in the hand image in space for each frame of hand image in the continuous multi-frame hand images; the three-dimensional coordinates are determined based on binocular disparity maps corresponding to the hand images;
The first recognition module is used for respectively determining static gestures of each hand image and motion parameters of hands in each hand image based on the three-dimensional coordinates corresponding to each hand image; the motion parameters represent the motion condition of hands in continuous multi-frame hand images;
a second recognition module, configured to determine a dynamic gesture based on the static gesture and the motion parameters of the hand of each of the continuous multiple frames of hand images; the dynamic gestures include transient dynamic gestures and persistent dynamic gestures; the continuous dynamic gesture is determined based on the instantaneous dynamic gesture corresponding to the hand image with the continuous preset frame number; the instantaneous dynamic gestures are used for representing the motion behaviors of the hand in a single frame of the hand image; the continuous dynamic gesture characterizes the motion behavior of the hand in the continuous multi-frame hand image;
the motion parameters include: angle of hand in space, direction of movement and speed of movement;
the second recognition module is further configured to determine the transient dynamic gesture by: matching the motion direction, the motion speed, the angle of the hand in space and the static gesture corresponding to the hand image in a preset instantaneous dynamic gesture library, and determining the instantaneous dynamic gesture corresponding to the hand image; the transient dynamic gesture library comprises the corresponding relation among the transient dynamic gesture, the static gesture and the motion parameter.
16. A dynamic gesture recognition system, comprising:
the image acquisition equipment is used for acquiring hand images;
processing device, in communication with the image acquisition device, for receiving the hand image and performing the dynamic gesture recognition method according to any one of claims 1-14.
17. The dynamic gesture recognition system of claim 16, wherein the image capture device comprises a binocular camera.
18. A handwriting recognition method, comprising:
identifying a dynamic gesture of a user based on the dynamic gesture recognition method of any one of claims 1-14;
if the dynamic gesture is a first preset gesture, recording a movement track of the target finger based on a preset layer;
and if the dynamic gesture is a second preset gesture, identifying the moving track recorded on the preset layer to obtain an identification result.
19. The handwriting recognition method according to claim 18, wherein after the moving track of the target finger is recorded based on the preset layer, the method further comprises:
and if the dynamic gesture is a third preset gesture, clearing the moving track recorded by the preset layer.
20. A computer readable storage medium, wherein a computer program is stored in the readable storage medium, which when run on a computer causes the computer to perform the dynamic gesture recognition method according to any one of claims 1-14 or the handwriting recognition method according to any one of claims 18-19.
21. An electronic device comprising a memory and a processor, the memory having stored therein computer readable instructions that, when executed by the processor, cause the processor to perform the dynamic gesture recognition method of any one of claims 1-14 or the handwriting recognition method of any one of claims 18-19.