CN113705378A - Sample data generation method and device and electronic equipment - Google Patents

Info

Publication number
CN113705378A
CN113705378A
Authority
CN
China
Prior art keywords
hand
gesture
posture information
sample data
information
Prior art date
Legal status
Pending
Application number
CN202110919681.1A
Other languages
Chinese (zh)
Inventor
周凡 (Zhou Fan)
Current Assignee
Guangzhou Huya Technology Co Ltd
Original Assignee
Guangzhou Huya Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Guangzhou Huya Technology Co Ltd filed Critical Guangzhou Huya Technology Co Ltd
Priority to CN202110919681.1A priority Critical patent/CN113705378A/en
Publication of CN113705378A publication Critical patent/CN113705378A/en

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00: Pattern recognition
    • G06F18/20: Analysing
    • G06F18/24: Classification techniques
    • G06F18/241: Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00: Machine learning

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • General Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • General Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • Physics & Mathematics (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Medical Informatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • User Interface Of Digital Computer (AREA)

Abstract

Embodiments of this specification provide a sample data generation method and apparatus, and an electronic device. When constructing sample data for training a deep learning model, a hand kinematics model can be built; the hand kinematics model describes the influence of the hand posture and the hand skeleton lengths on the positions of the hand key points. Multiple sets of sample data are then generated from the hand posture information of standard gestures predefined by the user and the hand kinematics model. In this way, a large amount of sample data can be obtained quickly and conveniently, the user does not need to label hand key points manually, the difficulty of acquiring sample data is reduced, the constructed sample data is more diverse and more accurate, and the prediction accuracy of the trained deep learning model can be improved.

Description

Sample data generation method and device and electronic equipment
Technical Field
The present specification relates to the field of artificial intelligence technologies, and in particular, to a method and an apparatus for generating sample data, and an electronic device.
Background
Gesture recognition has a wide range of applications in many fields; for example, smart devices can be controlled by gestures. At present, when recognizing a user gesture, the 2D or 3D position information of the hand key points may first be extracted from an RGB image or a depth image containing the hand, and this position information is then input into a pre-trained deep learning model to predict the hand posture in the image. Training such a deep learning model requires labeling the hand key points and hand postures in the images, which is tedious, and the amount of sample data that can be constructed this way is limited, which affects the prediction accuracy of the trained deep learning model.
Disclosure of Invention
Based on this, the present specification provides a method and an apparatus for generating sample data, and an electronic device.
According to a first aspect of the embodiments herein, there is provided a method of generating sample data for training a deep learning model, the deep learning model being used to predict hand posture information based on hand key point position information, the method comprising:
acquiring hand posture information of a standard gesture, wherein the hand posture information of the standard gesture is preset by a user;
generating multiple sets of sample data based on the hand posture information of the standard gesture and a preset hand kinematics model, wherein each set of sample data comprises hand key point position information and the hand posture information corresponding to that position information;
wherein the hand kinematics model describes the influence of the hand posture and the hand skeleton lengths on the positions of the hand key points.
According to a second aspect of the embodiments herein, there is provided an apparatus for generating sample data for training a deep learning model, the deep learning model being used to predict hand posture information based on hand key point position information, the apparatus comprising:
an acquisition module, configured to acquire hand posture information of a standard gesture, the hand posture information of the standard gesture being preset by a user;
a sample data generation module, configured to generate multiple sets of sample data based on the hand posture information of the standard gesture and a preset hand kinematics model, wherein each set of sample data comprises hand key point position information and the hand posture information corresponding to that position information;
wherein the hand kinematics model describes the influence of the hand posture and the hand skeleton lengths on the positions of the hand key points.
According to a third aspect of embodiments of the present specification, there is provided an electronic device, the electronic device including a processor, a memory, and a computer program stored in the memory and executable by the processor, the processor implementing the method mentioned in the first aspect when executing the computer program.
By applying the scheme of the embodiments of this specification, when constructing sample data for training a deep learning model, a hand kinematics model can be built that describes the influence of the hand posture and the hand skeleton lengths on the positions of the hand key points. Multiple sets of sample data are then generated from the hand posture information of standard gestures predefined by the user and the hand kinematics model; each set of sample data comprises hand key point position information, used as the sample input to the deep learning model, and the corresponding hand posture information, used as the label of that sample input. In this way, a large amount of sample data can be obtained quickly and conveniently, the user does not need to label hand key points manually, the difficulty of acquiring sample data is reduced, the constructed sample data is more diverse and more accurate, and the prediction accuracy of the trained deep learning model can be improved.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the specification.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the present specification and together with the description, serve to explain the principles of the specification.
Fig. 1 is a flowchart of a sample data generation method according to an embodiment of the present specification.
FIG. 2 is a schematic view of a hand model in one embodiment of the present description.
FIG. 3 is a diagram of a standard gesture, in one embodiment of the present description.
FIG. 4(a) is a schematic diagram of an interactive interface for setting standard gestures by a user according to an embodiment of the present disclosure.
FIG. 4(b) is a diagram illustrating a designated gesture, according to an embodiment of the present disclosure.
Fig. 5(a) and 5(b) are schematic diagrams of uniform sampling of wrist joint rotation angle in a spherical surface according to an embodiment of the present disclosure.
Fig. 6 is a schematic flowchart of generating sample data according to an embodiment of the present specification.
Fig. 7 is a block diagram of a logical structure of a sample generation apparatus according to an embodiment of the present disclosure.
Fig. 8 is a block diagram of a logical structure of an electronic device according to an embodiment of the present disclosure.
Detailed Description
Reference will now be made in detail to the exemplary embodiments, examples of which are illustrated in the accompanying drawings. When the following description refers to the accompanying drawings, like numbers in different drawings represent the same or similar elements unless otherwise indicated. The embodiments described in the following exemplary embodiments do not represent all embodiments consistent with the present specification. Rather, they are merely examples of apparatus and methods consistent with certain aspects of the specification, as detailed in the appended claims.
The terminology used in the description herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the description. As used in this specification and the appended claims, the singular forms "a", "an", and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It should also be understood that the term "and/or" as used herein refers to and encompasses any and all possible combinations of one or more of the associated listed items.
It should be understood that although the terms first, second, third, etc. may be used herein to describe various information, these information should not be limited to these terms. These terms are only used to distinguish one type of information from another. For example, the first information may also be referred to as second information, and similarly, the second information may also be referred to as first information, without departing from the scope of the present specification. The word "if" as used herein may be interpreted as "at … …" or "when … …" or "in response to a determination", depending on the context.
Gesture recognition has a wide range of applications in many fields; for example, smart devices can be controlled by different gestures. Accurately recognizing user gestures is a prerequisite for gesture control. At present, two approaches are mainly used for recognizing user gestures. One is to directly input an RGB image or a depth image containing the hand into a pre-trained deep learning model, which directly outputs a prediction of the hand posture in the image; however, this approach offers no basis for judging whether the prediction is correct at the model inference stage, so its interpretability is poor.
The other approach is to extract the 2D or 3D position information of the hand key points from an RGB image or depth image containing the hand, input this position information into a deep learning model, and have the model output a prediction of the hand posture in the image. In this approach, the hand key point position information can be combined with the reprojection error to verify the predicted hand posture at the inference stage, ensuring that the output posture is reasonable. However, when predicting hand postures this way, acquiring a sample data set for training the deep learning model is difficult: an RGB image or depth image containing a hand must first be collected, the hand key points must then be labeled in the image, and the hand posture in the image must be determined. This makes the sample data set hard to expand and limits its scale, so it covers few sample types and cannot cover the whole sample space; as a result, the hand predictions of a deep learning model trained on such a data set are often inaccurate. Moreover, sample data for the specific gestures people care about is often scarce, so the recognition accuracy for those gestures tends to be low.
Based on this, the present application provides a method for generating sample data, which may be used to train a deep learning model, which may be used to predict hand pose information based on position information of a hand keypoint, and thus, each set of sample data used to train the deep learning model may include, as an input sample of the model, hand keypoint position information and may include, as a tag of the input sample, hand pose information corresponding to the hand keypoint position information.
According to the embodiments of the present application, a large amount of sample data can be generated based on the hand posture information of standard gestures predefined by the user and a pre-built hand kinematics model, without the user manually labeling hand key points. This reduces the difficulty of acquiring sample data, allows a large amount of sample data to be obtained quickly and conveniently, makes the constructed sample data more diverse and more accurate, and can improve the prediction accuracy of the trained deep learning model.
The sample data generation method provided by the embodiment of the application can be executed by the terminal equipment with the function of generating sample data, and the function of generating sample data can exist in the terminal equipment in the form of an application program, a toolkit, a component or a service.
Specifically, as shown in fig. 1, the method for generating sample data provided in the embodiment of the present application includes the following steps:
S102, acquiring hand posture information of a standard gesture, wherein the hand posture information of the standard gesture is preset by a user;
the hand posture information in the embodiment of the application can be the angle value of each joint corner of the hand. Generally, as shown in fig. 2, the structure of the hand can be described by using 21 hand key points (i.e. skeleton points), and the left, middle and right panels in fig. 2 are respectively a hand skeleton distribution diagram, a 21 hand key point distribution diagram, and a hand joint corner diagram. The hand joint corners include wrist joint corners and 15 finger joint corners. As can be seen from fig. 2, the hand posture is determined by the hand joint rotation angle, that is, when the wrist joint rotation angle and the 15 finger joint rotation angle in the hand are determined, the corresponding hand posture can also be determined.
A standard gesture in the embodiments of the present application may be a manually defined gesture with specific semantics, such as the "OK" gesture, the "finger heart" ("bixin") gesture, or the "scissors" gesture; fig. 3 is a schematic diagram of different standard gestures. Because gesture recognition usually targets gestures with specific semantics (i.e., standard gestures), one or more standard gestures predefined by the user may be acquired when generating sample data, and a large amount of sample data may then be generated based on them, improving the prediction accuracy of the trained deep learning model on these standard gestures. The hand posture information of each standard gesture (i.e., the angle value of each hand joint rotation angle) can be preset by the user. For example, for the standard gesture "OK", the user may preset the angle value of each hand joint rotation angle in that gesture as the hand posture information of the standard gesture.
S202, generating a plurality of groups of sample data based on the hand posture information of the standard gesture and a preset hand kinematics model, wherein each group of sample data comprises hand key point position information and hand posture information corresponding to the hand key point position information; the hand kinematics model is used for describing the influence of hand postures and hand skeleton lengths on the positions of the hand key points.
The hand kinematics model in the embodiments of the present application may be obtained by kinematic modeling of the hand's topology. As shown in fig. 2, the structure of the hand can be described by 21 hand key points (skeleton points), where the bone length between every two skeleton points is denoted Li, representing the length of the i-th bone; the bone lengths may be taken from empirical values. Hand posture information typically comprises the angle values of the hand joint rotation angles, including the wrist joint rotation angle and 15 finger joint rotation angles. The wrist joint rotation angle, denoted R, can rotate arbitrarily in three-dimensional space, and a 6D representation may be used for it to ensure the continuity of the wrist rotation in space. Here "6D" means that the rotation of the wrist joint is described by 6 parameters, where the vector formed by the first three parameters has modulus 1, the vector formed by the last three parameters has modulus 1, and the two vectors are orthogonal.
The finger joints can be represented by Euler angles, denoted Theta, because they have significant motion constraints (for example, the fingers cannot bend backward); the rotation of each finger joint in space can be represented by three angles, e.g., rotations about the X, Y, and Z axes. Lower and upper bounds li and ui may be set on the Euler angles, representing the constraints on the i-th finger joint rotation angle.
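As a concrete illustration of the 6D representation just described, the sketch below (our own Python example with an assumed function name, not part of the patent) recovers a full rotation matrix from the 6 parameters by Gram-Schmidt orthonormalization, which is the standard way such a representation is decoded:

```python
import numpy as np

def rot6d_to_matrix(r6: np.ndarray) -> np.ndarray:
    """Convert a 6D rotation representation [a1 | a2] into a 3x3 rotation
    matrix via Gram-Schmidt: normalize a1, orthogonalize a2 against it,
    and complete the right-handed frame with a cross product."""
    a1, a2 = r6[:3], r6[3:]
    b1 = a1 / np.linalg.norm(a1)            # first column: unit vector
    a2p = a2 - np.dot(b1, a2) * b1          # remove the component along b1
    b2 = a2p / np.linalg.norm(a2p)          # second column: unit, orthogonal to b1
    b3 = np.cross(b1, b2)                   # third column completes the frame
    return np.stack([b1, b2, b3], axis=1)
```

When the 6 parameters already satisfy the constraints stated above (both halves of modulus 1 and mutually orthogonal), the Gram-Schmidt step leaves them unchanged.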
Obviously, the positions of the hand key points are determined by the hand joint rotation angles and the hand skeleton lengths: once the current joint rotation angles and skeleton lengths are fixed, the positions of the hand key points are fixed as well. The hand kinematics model therefore describes the effect of the hand posture and the hand skeleton lengths on the positions of the hand key points.
The hand kinematics model can be represented by the following equation (1):

P = Φ({Li}, R, Theta)    (1)

where P denotes the position information of the hand key points, which may be three-dimensional or two-dimensional; Φ is a function describing the influence of the bone lengths, the wrist joint rotation angle, and the finger joint rotation angles on the hand key point positions; {Li} denotes the set of all bone lengths; R denotes the wrist joint rotation angle; and Theta denotes the finger joint rotation angles.
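To make equation (1) concrete, here is a minimal forward-kinematics sketch for a single finger chain. This is our own simplification, not the patent's implementation: the axis conventions (bones along local +Y, flexion about X) and function names are illustrative assumptions. Each joint's rotation is accumulated along the chain, and the next key point is obtained by advancing one bone length in the rotated direction.

```python
import numpy as np

def rot_x(theta):
    """3x3 rotation about the X axis by theta radians (finger flexion)."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[1.0, 0.0, 0.0],
                     [0.0,   c,  -s],
                     [0.0,   s,   c]])

def finger_keypoints(wrist_R, bone_lengths, flexion_angles):
    """P = Phi({Li}, R, Theta) for one finger chain: accumulate joint
    rotations and translate along each (rotated) bone to obtain the
    key point positions, starting from the wrist at the origin."""
    R = np.asarray(wrist_R, dtype=float)
    p = np.zeros(3)
    pts = [p.copy()]
    for L, theta in zip(bone_lengths, flexion_angles):
        R = R @ rot_x(theta)                  # accumulated orientation at this joint
        p = p + R @ np.array([0.0, L, 0.0])   # advance one bone along local +Y
        pts.append(p.copy())
    return np.array(pts)
```

With all joint angles at zero the finger lies straight along +Y, and bending the first joint by 90 degrees swings the fingertip into the +Z direction, which matches the intuition behind equation (1).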
Based on the hand kinematics model, once the hand posture information and the hand skeleton lengths are determined, the corresponding hand key point position information can be computed with the model, so multiple sets of sample data can be generated.
Therefore, after the hand posture information of the standard gesture is determined, multiple sets of sample data can be generated by using the hand posture information of the standard gesture as a reference and using a preset hand kinematics model, wherein each set of sample data includes the position information of the hand key point and the hand posture information corresponding to the position information of the hand key point. And then, the position information of the hand key points in the sample data can be used as a sample input of the deep learning model, and the hand posture information can be used as a label of the sample input, so as to train the deep learning model.
The position information of the hand key points may be two-dimensional position information or three-dimensional position information. For example, in some embodiments, the hand keypoint location information may be pixel coordinates of the hand keypoints in the image, and the deep learning model may predict hand pose information based on the pixel coordinates of the hand keypoints. In some embodiments, the hand keypoint location information may also be three-dimensional coordinates in three-dimensional space, and the deep learning model may predict hand pose information based on the three-dimensional coordinates of the hand keypoints.
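Where two-dimensional key point positions are needed, the three-dimensional positions produced by the kinematics model can be projected to pixel coordinates. The sketch below uses a standard pinhole camera model; the intrinsic parameters are illustrative placeholders, not values from the patent:

```python
import numpy as np

def project_to_pixels(P, fx=600.0, fy=600.0, cx=320.0, cy=240.0):
    """Pinhole projection of Nx3 camera-frame key points to Nx2 pixel
    coordinates: u = fx*X/Z + cx, v = fy*Y/Z + cy. The intrinsics
    (fx, fy, cx, cy) are assumed example values."""
    P = np.asarray(P, dtype=float)
    u = fx * P[:, 0] / P[:, 2] + cx
    v = fy * P[:, 1] / P[:, 2] + cy
    return np.stack([u, v], axis=1)
```

The same projection is what makes the reprojection-error check mentioned earlier possible: a predicted 3D posture can be re-projected and compared against the observed 2D key points.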
According to the embodiments of the present application, the hand kinematics model is built in advance, and a large amount of sample data is generated automatically from the user-defined hand posture information of standard gestures and the hand kinematics model to train the deep learning model. The user does not need to label hand key points manually, the difficulty of acquiring sample data is reduced, sample data can be obtained quickly and conveniently, the constructed sample data is more diverse and more accurate, and the prediction accuracy of the trained deep learning model can be improved.
To make it easy for the user to set the hand posture information of a standard gesture, a user interaction interface can be provided through which the user inputs the hand posture information. A standard gesture is generally related only to the finger joint rotation angles; the wrist joint rotation angle does not affect the gesture. For example, an "OK" gesture remains an "OK" gesture even if the wrist is rotated. Therefore, when setting the hand posture information of a standard gesture, the user only needs to set the angle values of the 15 finger joint rotation angles. In some embodiments, to help the user judge whether the currently entered finger joint rotation angles are appropriate, a gesture model corresponding to the user's input may be displayed in the interactive interface, so that the user can make this judgment based on the model.
When setting a standard gesture, the user may directly input a specific angle value for each finger joint rotation angle. For example, a recommended value for each finger joint rotation angle of a certain standard gesture may be loaded in the interactive interface, and the user may adjust the angle values based on these recommendations to set the hand posture information. Alternatively, the user may input angle adjustment instructions through the interactive interface: for example, an angle adjustment control may be provided for each joint rotation angle, and clicking the control increases or decreases the angle value by a fixed step. After an angle adjustment instruction is received, the gesture model corresponding to it is displayed in the interactive interface. Once the user determines from the displayed gesture model that the current gesture has been adjusted to the desired standard gesture (for example, an "OK" gesture), the user may input an instruction to end the adjustment, and the hand posture information corresponding to the currently displayed gesture model is saved as the hand posture information of the standard gesture.
Fig. 4(a) is a schematic view of a user interaction interface for defining a standard gesture in an embodiment of the present application. When setting the finger joint rotation angles of a standard gesture, the user may first select the finger to operate on, for example the thumb or the index finger, and then select the joint to operate on, since each finger comprises several joints. The corresponding rotation axis (X, Y, or Z) may then be selected and the rotation angle about that axis adjusted. While the rotation angle is being adjusted, the interactive interface displays the adjusted gesture model so that the user can determine from it whether the adjustment of the standard gesture is complete. After the adjustment is completed, the user can save the adjusted gesture as a standard gesture in a gesture library; standard gestures can also be loaded from the gesture library for viewing.
In some embodiments, when multiple sets of sample data are generated based on the hand posture information of the standard gesture and the preset hand kinematics model, the hand posture information of the standard gesture may be changed to obtain multiple sets of new hand posture information, and then, for each set of new hand posture information, the hand key point position information corresponding to the set of new hand posture information may be determined based on the hand kinematics model, so as to obtain multiple sets of sample data.
Based on the hand kinematics model, after parameters such as finger joint corners, wrist joint corners, hand skeleton lengths and the like are changed, corresponding hand key point position information is also changed, so that multiple groups of new hand posture information can be obtained by changing the parameters in the hand posture information of the standard gesture, and multiple groups of sample data can be obtained by determining the hand key point position information corresponding to the new hand posture information based on the hand kinematics model. By the method, a large amount of sample data can be generated in the sample space, and the diversity and comprehensiveness of the sample data are ensured.
The same standard gesture can correspond to multiple sets of finger joint rotation angles. Taking the "OK" gesture as an example, even if the finger joint rotation angles are changed, the gesture will still be recognized as "OK" as long as the fingers keep the "OK" shape. Therefore, for each standard gesture, the angle value of each finger joint rotation angle can be varied continuously within a small range so that the changed gesture remains highly similar to the standard gesture, i.e., it can still be recognized as that standard gesture. This enriches the sample data corresponding to each standard gesture, makes it more diverse, and improves the prediction accuracy for the standard gesture.
Therefore, in some embodiments, when changing the hand posture information of the standard gesture to obtain multiple sets of new hand posture information, multiple first angle variations may be randomly generated within a first angle range for each finger joint rotation angle of the standard gesture, and the angle value of each finger joint rotation angle may then be changed based on these first angle variations to obtain multiple sets of new hand posture information. The similarity between the new hand posture information obtained from the first angle variations and the posture information of the standard gesture is greater than a first preset threshold. The first angle range can be a preset range chosen so that the changed finger joint rotation angles stay within the finger joint constraints and the shape of the resulting gesture does not change much, i.e., it remains highly similar to the predefined standard gesture.
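The perturbation step just described can be sketched as follows (our own illustration; the function name, noise distribution, and the 45 = 15 joints x 3 axes layout are assumptions, and the patent does not specify the exact sampling scheme):

```python
import numpy as np

def perturb_standard_gesture(angles, lower, upper, delta, n, seed=0):
    """Generate n variants of a standard gesture: add uniform noise in
    [-delta, +delta] to every finger joint angle, then clip each angle to
    its per-joint constraint range [lower, upper]."""
    rng = np.random.default_rng(seed)
    noise = rng.uniform(-delta, delta, size=(n,) + np.shape(angles))
    return np.clip(np.asarray(angles) + noise, lower, upper)
```

A small `delta` keeps the variants close to the standard gesture (the first angle range); a larger `delta` would produce the low-similarity, non-semantic gestures described below for the second angle range.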
Besides the standard gestures, there are also non-standard gestures without specific semantics. Therefore, when generating sample data, some gestures without specific semantics can be generated automatically and randomly within the joint rotation angle constraints, to fill the sample space between the standard gestures and prevent the trained deep learning model from converging only near the standard gestures. In some embodiments, when generating a gesture without specific semantics, a designated gesture may be preset; the designated gesture may be a standard gesture, or a gesture in which every finger joint rotation angle is 0, as shown in fig. 4(b). Multiple second angle variations may be randomly generated within a second angle range for each finger joint rotation angle of the designated gesture, and the angle values are then changed based on these second angle variations to obtain multiple sets of new hand posture information. The similarity between this new hand posture information and the posture information of the standard gesture is smaller than a second preset threshold, which is smaller than the first preset threshold. When the designated gesture is a standard gesture, the second angle range may be a preset range different from the first angle range, with larger angle values, chosen so that the changed finger joint rotation angles stay within the finger joint constraints while the shape of the resulting gesture changes greatly, i.e., its similarity to the predefined standard gesture is low.
Of course, hand bone lengths also differ somewhat between people. To ensure the diversity and comprehensiveness of the samples, in some embodiments, when changing the hand posture information of the standard gesture to obtain multiple sets of new hand posture information, multiple scaling factors can be randomly generated within a preset range, and the hand bone lengths of the standard gesture are then scaled by these factors to obtain multiple sets of new hand posture information.
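The bone-length variation can be sketched as below (our own illustration; the scaling range [0.8, 1.2] and the choice of one uniform factor per sampled hand, rather than per bone, are assumptions not stated in the patent):

```python
import numpy as np

def scale_bone_lengths(bone_lengths, n, low=0.8, high=1.2, seed=1):
    """Simulate hands of different sizes: draw one scale factor per sample
    from [low, high] and apply it to all bone lengths, preserving the
    proportions between bones within each sampled hand."""
    rng = np.random.default_rng(seed)
    scales = rng.uniform(low, high, size=(n, 1))
    return np.asarray(bone_lengths)[None, :] * scales
```

Each scaled set of bone lengths can then be fed through the kinematics model of equation (1) to produce key point positions for a differently sized hand.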
Unlike the finger joints, the rotation of the wrist joint in space is unconstrained, so the distribution of wrist joint rotation angles over the rotation space must be dense and uniform to cover as much of the sample space as possible. Therefore, in some embodiments, when changing the hand posture information of the standard gesture to obtain multiple sets of new hand posture information, the wrist joint rotation angle of the standard gesture can be sampled uniformly in the rotation space to obtain multiple wrist joint rotation angles and thereby generate multiple sets of new hand posture information.
In the 6D representation, the first three of the 6 parameters form a vector of modulus 1, the last three form a vector of modulus 1, and the two vectors are orthogonal. In some embodiments, when the wrist joint rotation angles of the standard gesture are uniformly sampled in the rotation space to obtain multiple sets of wrist joint rotation angles, uniform sampling may first be performed on a sphere of radius 1 to obtain a plurality of first sampling points, the spatial coordinates of each first sampling point serving as the first three parameters of the 6D representation of a wrist joint rotation angle. Then, for each first sampling point, the line connecting it to the center of the sphere is determined, a circle of radius 1 perpendicular to this line is constructed, and uniform sampling is performed on that circle (for example, one second sampling point every predetermined angle) to obtain a plurality of second sampling points. The spatial coordinates of each second sampling point serve as the last three parameters of the 6D representation, yielding the 6D representations of multiple sets of wrist joint rotation angles. In some embodiments, the spatial coordinates of a first sampling point may be taken with the sphere center as the origin, and the spatial coordinates of a second sampling point with the circle center as the origin.
The wrist joint rotation angle can be written in the 6D representation, for example as [x, y, z, a, b, c]. By the definition of the 6D representation, [x, y, z] and [a, b, c] are taken from the first two columns of the rotation matrix corresponding to the 6D representation, so the vector [x, y, z] and the vector [a, b, c] are orthogonal to each other and each has modulus 1. Therefore, to ensure a uniform distribution of R, the points [x, y, z], denoted X, lie on the sphere of radius 1, and the uniform distribution of the points X on the sphere must be considered first. A Fibonacci lattice is introduced to guarantee that the points X are uniform and dense, as shown in fig. 5(a). The spatial coordinates of the points X are computed by formula (2):

zp = (2p − 1)/P − 1, xp = √(1 − zp²)·cos(2πp/φ), yp = √(1 − zp²)·sin(2πp/φ)  (2)

where P represents the total number of sampling points and the constant φ = (1 + √5)/2 is the golden ratio. [xp, yp, zp] are the spatial coordinates of the p-th sampling point Xp, shown as the points outside circle a in fig. 5(b); the vector xyz = [xp, yp, zp] is drawn as a straight line in fig. 5(b). Denote the vector abc = [a, b, c]. Since xyz is orthogonal to abc, the point [a, b, c] lies on a circle of radius 1 perpendicular to the vector xyz, such as a point on circle a in fig. 5(b). If the number of sampling points on this circle is Q, the point coordinates [aq, bq, cq] can be obtained by sampling uniformly in angle, so the final 6D representation is [xp, yp, zp, aq, bq, cq].
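A minimal sketch of this two-stage sampling, assuming the standard Fibonacci-lattice form of formula (2): first sampling points on the unit sphere give [x, y, z], then Q uniformly spaced points on the perpendicular unit circle give [a, b, c]. The way the perpendicular plane's basis is constructed here is an implementation choice, not specified by the source.

```python
import numpy as np

PHI = (1 + 5 ** 0.5) / 2  # golden ratio

def fibonacci_sphere(P):
    """First sampling points: P near-uniform points on the unit sphere
    via the Fibonacci lattice, used as [x, y, z] of the 6D rotation."""
    p = np.arange(1, P + 1)
    z = (2 * p - 1) / P - 1
    r = np.sqrt(1 - z ** 2)
    ang = 2 * np.pi * p / PHI
    return np.stack([r * np.cos(ang), r * np.sin(ang), z], axis=1)

def sample_6d(P, Q):
    """For each first point X, sample Q points uniformly on the unit
    circle perpendicular to X, giving the [a, b, c] half of the 6D."""
    out = []
    for x in fibonacci_sphere(P):
        # Build an orthonormal basis (u, v) of the plane normal to x.
        helper = np.array([1.0, 0.0, 0.0])
        if abs(x[0]) > 0.9:
            helper = np.array([0.0, 1.0, 0.0])
        u = np.cross(x, helper)
        u /= np.linalg.norm(u)
        v = np.cross(x, u)
        for q in range(Q):
            t = 2 * np.pi * q / Q          # uniform sampling in angle
            abc = np.cos(t) * u + np.sin(t) * v
            out.append(np.concatenate([x, abc]))
    return np.array(out)

samples = sample_6d(P=100, Q=10)           # P*Q candidate 6D rotations
```

By construction every sample satisfies the 6D constraints: both halves have modulus 1 and are mutually orthogonal.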
In the above embodiments, different changing manners may be adopted when the hand posture information of the standard gesture is changed to generate multiple sets of new hand posture information; for example, the finger joint rotation angles, the wrist joint rotation angle, or the hand skeleton lengths may be changed. For each changing manner, the user may preset a corresponding sampling number as required. Taking the finger joint rotation angles as an example, a plurality of hand poses similar to the standard gesture may be generated via a plurality of first angle variations, or a plurality of nonstandard hand poses that differ greatly from the standard gesture may be generated via a plurality of second angle variations; the user may preset the numbers of first and second angle variations for each case. Likewise, the number of sample data generated from different standard gestures may differ.
Therefore, in some embodiments, before changing the hand posture information of the standard gesture to obtain a plurality of sets of new hand posture information, the configuration information input by the user through the interactive interface may be obtained, where the configuration information is used to indicate the sampling number corresponding to each change mode when the hand posture information of the standard gesture is changed in different change modes, and then the hand posture information of the standard gesture is changed based on the sampling number corresponding to each change mode to obtain a plurality of sets of new hand posture information.
After the deep learning model is trained with the generated sample data, if the trained model's prediction accuracy for a certain specific gesture is low, more samples can be generated for that gesture, for example by changing the finger joint rotation angles, wrist joint rotation angle, or bone lengths of the standard gesture at a finer granularity; the additional sample data are then used to train the deep learning model, improving its prediction accuracy for that specific gesture. Therefore, in some embodiments, when the trained deep learning model's prediction accuracy for a target gesture is lower than a preset accuracy, the hand posture information of the target gesture is changed to obtain multiple sets of new hand posture information, and the key point position information corresponding to the new hand posture information is determined based on the new hand posture information and the hand kinematics model, generating multiple sets of sample data for training the deep learning model. In this way, the composition of the sample data can be adjusted in a targeted manner, so that the trained model achieves higher prediction accuracy on all kinds of gestures. The target gesture may be a standard gesture or a nonstandard gesture.
The input of the deep learning model can be the three-dimensional position information of the hand key points (such as three-dimensional coordinates in 3D space) or their two-dimensional position information in an image (such as pixel coordinates). In some embodiments, if the input of the deep learning model is the pixel coordinates of the hand key points in the image, the hand key point position information in the sample data may be those pixel coordinates. If the key point positions determined from the kinematics model are three-dimensional positions in 3D space, then when generating multiple sets of sample data from the hand posture information of the standard gesture and the preset hand kinematics model, the three-dimensional positions of multiple sets of hand key points and the corresponding hand posture information can first be generated, after which the three-dimensional positions of the hand key points are orthographically projected to obtain their pixel coordinates in the image.
In order to further explain the sample data generating method provided in the embodiments of the present application, the following is explained with reference to a specific embodiment.
When training a deep learning model that predicts the hand pose from the position information of hand key points, constructing sample data conventionally requires annotating the hand key points in images and labeling the hand pose in each image. Sample expansion is therefore difficult, sample diversity cannot be guaranteed, and the prediction accuracy of the trained deep learning model is low.
In order to obtain a large amount of high-precision sample data quickly, the embodiment can generate a large amount of sample data for training the deep learning model based on the hand kinematics model and the standard gestures predefined by the user, so that the diversity and comprehensiveness of the sample data can be ensured, and the prediction precision of the trained model can be improved.
Referring to fig. 6, the process of constructing sample data specifically includes the following steps:
(1) Constructing a hand kinematics model:
Kinematic modeling is performed on the topological relation of the hand. As shown in fig. 2, the hand structure can be described by 21 skeleton points (key points). The skeleton length between two skeleton points is denoted Li, the length of the i-th bone; the bone lengths are taken from statistics in the relevant literature. The hand joint angles comprise the wrist joint rotation angle R (global rotation) and 15 finger joint rotation angles Theta (local rotation). The wrist joint rotation angle R adopts the 6D representation, i.e., 6 parameters describe the spatial rotation, which guarantees the continuity of the wrist rotation in space. The finger joints are represented by Euler angles because they have obvious motion constraints (for example, fingers cannot bend outwards), with upper and lower bounds ui and li, where li and ui denote the constraints of the i-th finger joint angle. Denoting the forward-kinematics function of the hand as FK, the 3D key point coordinates of the hand are obtained as P3D = FK({Li}, R, Theta), where {Li} is the set of all bone lengths.
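The forward-kinematics idea can be sketched as below. This is a deliberately simplified 2D single-finger chain, not the patent's full 21-point 3D model with a 6D wrist rotation; the function name and shapes are assumptions for illustration only.

```python
import numpy as np

def fk_finger(lengths, thetas):
    """Minimal 2D forward-kinematics sketch for one finger chain: each
    joint rotates by theta_i (radians) and the bone L_i extends from
    the previous joint. The full model FK({Li}, R, Theta) extends this
    to 3D with a global 6D wrist rotation R."""
    pts = [np.zeros(2)]
    angle = 0.0
    for L, th in zip(lengths, thetas):
        angle += th                      # accumulate joint rotations
        step = L * np.array([np.cos(angle), np.sin(angle)])
        pts.append(pts[-1] + step)       # next key point position
    return np.array(pts)                 # joint positions = key points

# Straight finger: all joint angles zero, so the tip lies at the sum
# of the bone lengths along the x-axis.
pts = fk_finger([1.0, 0.8, 0.5], [0.0, 0.0, 0.0])
```

The key property used by the sample generator is exactly this: key point positions are a deterministic function of bone lengths and joint angles, so varying {Li}, R, Theta yields labeled samples for free.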
After the kinematic model of the hand is established, it can be seen that three types of parameters { Li }, R and Theta need to be changed to ensure the diversity of 3D key points.
(2) The user defines N classes of standard gestures and the sampling number for each variation mode.
Considering that gesture posture estimation generally needs to identify gestures containing specific semantics, some standard gestures can be predefined by a user, and then sample data is generated based on the standard gestures, so that the prediction accuracy of the trained model on the standard gestures can be improved.
Standard gestures may be generated by the user manually adjusting the Theta parameters (the R parameter is left unchanged for now). When defining a standard gesture, an interactive interface can be provided on which the user clicks controls to increase or decrease each finger joint rotation angle Theta; the interface simultaneously displays the gesture model after the adjustment, so the user can easily check whether the desired standard gesture has been reached. All standard gestures containing semantics can be predefined exhaustively to ensure sample diversity; with this interactive manual adjustment, defining each standard gesture takes less than about one minute.
Because the user can change three types of parameters of { Li }, R and Theta to generate sample data, the user can define the sampling number corresponding to each change mode according to the requirement, for example, the user can set the sampling number corresponding to each change mode through an interactive interface to ensure that the sample meets the actual requirement.
(3) Changing the finger joint rotation angles of a standard gesture to generate new hand gestures
After the standard gestures are predefined, some variations can be superimposed on the finger joint angles corresponding to the standard gestures to further improve gesture diversity. Because each finger joint rotation angle has a constraint range, the variation must stay within that range.
Because the finger joint rotation angles of one standard gesture can take many sets of values — for example, slightly changing the finger joint rotation angles does not change the meaning of the represented gesture — the finger joint rotation angles can be varied within a small angle range, diversifying the samples of each standard gesture. Specifically, a variation delta is introduced for each finger joint angle, generated randomly from a uniform distribution between −15° and 15° (delta is in degrees). The superposed finger joint angle is Theta + delta and must still satisfy the upper and lower bounds li and ui, giving the 3D key points P3D = FK({Li}, R, Theta + delta). Assuming there are N classes of standard gestures and each produces M delta variations, the number of samples obtained so far is N × M; typically N is on the order of 10 and M on the order of 100.
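A compact sketch of this first-variation step, under assumed joint bounds and a placeholder standard gesture; only the ±15° uniform delta and the clip to [li, ui] come from the text.

```python
import numpy as np

def perturb_standard_gesture(theta, li, ui, m, max_delta=15.0, seed=0):
    """Generate M variants of one standard gesture: add a uniform delta
    in [-15, 15] degrees to every finger joint angle, keeping each
    result inside the joint constraints [li, ui]."""
    rng = np.random.default_rng(seed)
    delta = rng.uniform(-max_delta, max_delta, size=(m, theta.size))
    return np.clip(theta + delta, li, ui)

theta = np.full(15, 20.0)                 # one standard gesture (degrees)
li = np.full(15, -10.0)                   # assumed lower bounds
ui = np.full(15, 90.0)                    # assumed upper bounds
variants = perturb_standard_gesture(theta, li, ui, m=100)
```

Each row of `variants` would be fed to FK({Li}, R, Theta + delta) to produce one labeled sample, so M variants per gesture yields the N × M count described above.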
Besides the standard gestures, some gestures without specific semantics also need to be generated automatically and randomly within the constraint range to fill the sample space between the standard gestures, so that the trained model does not converge only near the standard gestures. The number of such semantics-free samples, denoted O, is larger, on the order of 10000; therefore N × M + O samples are obtained at this stage.
In addition, if the prediction accuracy of the trained model for some standard gestures is low, the sample data of the standard gestures can be increased appropriately.
(4) Changing the wrist joint rotation angle to generate hand gestures
The wrist joint rotation angle R is uniformly sampled in the rotation space. Unlike the finger joints, the rotation of the wrist is unconstrained in the rotation space, so the distribution of R must be dense and uniform to cover the whole sample space as far as possible. The wrist joint rotation angle can be written in the 6D representation as [x, y, z, a, b, c]; by the definition of the 6D representation, [x, y, z] and [a, b, c] are taken from the first two columns of the corresponding rotation matrix, so the two vectors are orthogonal to each other and each has modulus 1. Therefore, to ensure a uniform distribution of R, the uniform distribution of the points X = [x, y, z] on the unit sphere must be considered first. A Fibonacci lattice is introduced here, which guarantees that the points X are uniform and dense; the sampling effect is shown in fig. 5(a). The spatial coordinates of the points X (with the sphere center as the origin) are computed by formula (2):

zp = (2p − 1)/P − 1, xp = √(1 − zp²)·cos(2πp/φ), yp = √(1 − zp²)·sin(2πp/φ)  (2)

where P represents the total number of sampling points and the constant φ = (1 + √5)/2 is the golden ratio. [xp, yp, zp] are the spatial coordinates of the p-th sampling point Xp, shown as the points outside circle a in fig. 5(b); the vector xyz = [xp, yp, zp] is drawn as a straight line in fig. 5(b). Denote the vector abc = [a, b, c]. Since xyz is orthogonal to abc, the point [a, b, c] lies on a circle of radius 1 perpendicular to the vector xyz, such as a point on circle a in fig. 5(b). If the number of sampling points on this circle is Q, the coordinates [aq, bq, cq] (with the circle center as the origin) can be obtained by sampling uniformly in angle, so the final 6D representation is [xp, yp, zp, aq, bq, cq]. The 3D key points are P3D = FK({Li}, [xp, yp, zp, aq, bq, cq], Theta + delta), so the total number of samples collected is P × Q × (N × M + O); typically P and Q are on the order of 100 and 10, respectively.
(5) Changing bone lengths to generate hand gestures
The bone lengths Li are randomly modified within a certain range. Because the bone lengths of hand joints differ between people, a scaling factor is introduced for each bone: let αi denote the scaling factor of the i-th bone, generated randomly from a uniform distribution in the range 0.75–1.25. The 3D key points are then P3D = FK({Li × αi}, [xp, yp, zp, aq, bq, cq], Theta + delta).
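The bone-scaling step can be sketched as follows; only the 0.75–1.25 uniform range comes from the text, while the bone count and function name are illustrative assumptions.

```python
import numpy as np

def scale_bones(lengths, k, lo=0.75, hi=1.25, seed=0):
    """Draw K sets of per-bone scaling factors alpha_i uniformly in
    [0.75, 1.25] and apply them to the standard bone lengths Li."""
    rng = np.random.default_rng(seed)
    alphas = rng.uniform(lo, hi, size=(k, lengths.size))
    return lengths * alphas              # shape (K, num_bones)

base = np.ones(20)                       # assumed: 20 bones between 21 key points
scaled = scale_bones(base, k=5)          # 5 candidate hand-size variants
```

Each row of `scaled` would replace {Li} in the FK call, simulating hands of different sizes without any manual annotation.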
(6) After a large number of new hand poses are generated, the hand key point position information corresponding to each hand pose is generated based on the kinematic model. A large-scale data set uniformly covering the sample space is thus obtained; in this data set, the 3D key point position information of the hand serves as the input signal of a sample, and the bone lengths {Li × αi}, the wrist rotation R, and the finger joint rotation angles Theta serve as the label of the sample.
In addition, if the input of the model is the 2D position information of the hand key points, the 2D key points can be obtained by a projection transformation with orthographic projection parameters s and t, where s denotes scaling and t denotes translation, and stored as the input signal of the sample; it is only necessary that the projected 2D key points lie within the image.
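A minimal sketch of this projection step, assuming the simplest reading of s and t (a uniform scale and a 2D pixel offset applied after dropping the depth axis); the shapes, image size, and in-image check are illustrative assumptions.

```python
import numpy as np

def project_2d(p3d, s, t, width, height):
    """Orthographic projection of 3D key points to pixel coordinates:
    drop the depth axis, scale by s, translate by t, then verify that
    all projected points land inside the image. Parameter names follow
    the text (s = scaling, t = translation)."""
    p2d = s * p3d[:, :2] + t             # (21, 2) pixel coordinates
    inside = ((p2d >= 0) & (p2d < [width, height])).all()
    return p2d, bool(inside)

# 21 synthetic 3D key points in a unit cube around the origin.
p3d = np.random.default_rng(0).uniform(-1, 1, size=(21, 3))
p2d, ok = project_2d(p3d, s=100.0, t=np.array([128.0, 128.0]),
                     width=256, height=256)
```

In practice s and t would be chosen (or rejected and resampled) so that `ok` holds, satisfying the requirement that the projected key points lie within the image.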
Corresponding to the above method, an embodiment of the present application further provides a sample data generating device, as shown in fig. 7. The sample data is used to train a deep learning model for predicting hand posture information based on hand key point position information. The device comprises:
an obtaining module 71, configured to obtain hand posture information of a standard gesture, where the hand posture information of the standard gesture is preset by a user;
a sample data generating module 72, configured to generate multiple sets of sample data based on the hand posture information of the standard gesture and a preset hand kinematics model, where each set of sample data includes hand key point position information and hand posture information corresponding to the hand key point position information;
the hand kinematics model is used for describing the hand gesture and the influence of the hand skeleton length on the positions of the hand key points.
In some embodiments, the hand pose information for the standard gesture is determined based on:
presenting a gesture model corresponding to an angle adjusting instruction on an interactive interface in response to the angle adjusting instruction input by a user through the interactive interface, wherein the angle adjusting instruction is used for adjusting the angle value of each finger joint corner of a standard gesture;
and responding to an angle adjustment finishing instruction input by a user, and taking the hand posture information of the gesture model presented by the interactive interface at the current moment as the hand posture information of the standard gesture.
In some embodiments, the sample data generating module is configured to, when generating multiple sets of sample data based on the hand posture information of the standard gesture and a preset hand kinematics model, specifically:
changing the hand posture information of the standard gesture to obtain a plurality of groups of new hand posture information;
and determining the position information of the key points of the hand corresponding to the new hand posture information based on the hand kinematic model aiming at each group of new hand posture information to obtain the multiple groups of sample data.
In some implementations, the sample data generation module is configured to change the hand posture information of the standard gesture, and when obtaining multiple sets of new hand posture information, specifically configured to:
randomly generating a plurality of first angle variable quantities in a first angle range aiming at each finger joint corner of the standard gesture;
and changing the angle value of the finger joint corner based on the plurality of first angle variation quantities to obtain a plurality of groups of new hand posture information, wherein the similarity between the obtained plurality of groups of new hand posture information and the posture information of the standard gesture is greater than a first preset threshold value.
In some embodiments, the sample data generating module is configured to change the hand posture information of the standard gesture, and when obtaining multiple sets of new hand posture information, specifically configured to:
randomly generating a plurality of scaling factors in a preset range;
and scaling the hand skeleton length of the standard gesture by using the scaling factors to obtain a plurality of groups of new hand posture information.
In some embodiments, the sample data generating module is configured to change the hand posture information of the standard gesture, and when obtaining multiple sets of new hand posture information, specifically configured to:
and uniformly sampling the wrist joint corners of the standard gestures in a rotation space to obtain multiple groups of wrist joint corners so as to generate multiple groups of new hand posture information.
In some embodiments, the sample data generating module is configured to, when uniformly sampling the wrist joint corners of the standard gesture in the rotation space to obtain multiple groups of wrist joint corners, specifically:
uniformly sampling on a spherical surface with the radius of 1 to obtain a plurality of first sampling points, and taking the space coordinate of each first sampling point as the first three parameters in 6D representation of the wrist joint corner;
determining a connecting line between the first sampling point and the spherical center of the spherical surface aiming at each first sampling point;
and uniformly sampling on a circle which is vertical to the connecting line and has the radius of 1 to obtain a plurality of second sampling points, and taking the space coordinate of each second sampling point as the last three parameters expressed by the wrist joint rotation angle 6D to obtain the 6D expression of a plurality of groups of wrist joint rotation angles.
In some embodiments, the apparatus is further configured to:
randomly generating a plurality of second angle variations within a second angle range for each finger joint rotation angle of a designated gesture, wherein the designated gesture comprises the standard gesture and/or a gesture with each finger joint rotation angle being 0 degrees;
changing the angle value of the finger joint rotation angle of the designated gesture based on the plurality of second angle variation quantities to obtain a plurality of groups of new hand posture information, wherein the similarity between the obtained plurality of groups of new hand posture information and the posture information of the standard gesture is smaller than a second preset threshold value;
and generating a plurality of groups of sample data based on the obtained plurality of groups of new hand posture information and the hand kinematics model.
In some embodiments, the apparatus is further configured to:
acquiring configuration information input by a user through an interactive interface, wherein the configuration information is used for indicating the sampling quantity corresponding to each change mode when the hand posture information of the standard gesture is changed in different change modes;
and changing the hand posture information of the standard gesture based on the sampling number corresponding to each change mode to obtain the multiple groups of new hand posture information.
In some embodiments, after training the deep learning model with the generated sample data, the apparatus is further configured to:
under the condition that the prediction precision of the trained deep learning model for the target gesture is lower than the preset precision, changing gesture posture information of the target gesture to obtain multiple groups of new gesture posture information, and generating multiple groups of sample data based on the obtained new gesture posture information and the hand kinematics model for training the deep learning model.
In some embodiments, the hand key point position information in the multiple sets of sample data includes pixel coordinates of a hand key point in an image, and the sample data generation module is configured to, when generating multiple sets of sample data based on the hand pose information of the standard gesture and a preset hand kinematics model, specifically:
generating three-dimensional position information of a plurality of groups of hand key points in a three-dimensional space and hand posture information corresponding to the three-dimensional position information of the hand key points based on the hand posture information of the standard gesture and a preset hand kinematics model;
and carrying out orthogonal projection on the three-dimensional position information of the hand key points to obtain pixel coordinates of the hand key points in the image.
Further, an electronic device is provided in an embodiment of the present application, as shown in fig. 8, the electronic device includes a processor 81, a memory 82, and a computer program stored in the memory 82 and executable by the processor 81, where the processor 81 implements the method according to any one of the foregoing embodiments when executing the computer program.
Accordingly, an embodiment of the present application further provides a computer storage medium, where a program is stored in the computer storage medium, and when the program is executed by a processor, the method in any of the above embodiments is implemented.
Embodiments of the present description may take the form of a computer program product embodied on one or more storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having program code embodied therein. Computer-usable storage media include permanent and non-permanent, removable and non-removable media, and information storage may be implemented by any method or technology. The information may be computer readable instructions, data structures, modules of a program, or other data. Examples of the storage medium of the computer include, but are not limited to: phase change memory (PRAM), Static Random Access Memory (SRAM), Dynamic Random Access Memory (DRAM), other types of Random Access Memory (RAM), Read Only Memory (ROM), Electrically Erasable Programmable Read Only Memory (EEPROM), flash memory or other memory technologies, compact disc read only memory (CD-ROM), Digital Versatile Discs (DVD) or other optical storage, magnetic tape storage or other magnetic storage devices, or any other non-transmission medium, may be used to store information that may be accessed by a computing device.
For the device embodiments, since they substantially correspond to the method embodiments, reference may be made to the partial description of the method embodiments for relevant points. The above-described embodiments of the apparatus are merely illustrative, and the units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the present embodiment. One of ordinary skill in the art can understand and implement it without inventive effort.
Other embodiments of the disclosure will be apparent to those skilled in the art from consideration of the specification and practice of the disclosure disclosed herein. This disclosure is intended to cover any variations, uses, or adaptations of the disclosure following, in general, the principles of the disclosure and including such departures from the present disclosure as come within known or customary practice within the art to which the disclosure pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the disclosure being indicated by the following claims.
It will be understood that the present disclosure is not limited to the precise arrangements described above and shown in the drawings and that various modifications and changes may be made without departing from the scope thereof. The scope of the present disclosure is limited only by the appended claims.
The above description is only exemplary of the present disclosure and should not be taken as limiting the disclosure, as any modification, equivalent replacement, or improvement made within the spirit and principle of the present disclosure should be included in the scope of the present disclosure.

Claims (13)

1. A method for generating sample data for training a deep learning model for predicting hand pose information based on hand keypoint location information, the method comprising:
acquiring hand posture information of a standard gesture, wherein the hand posture information of the standard gesture is preset by a user;
generating a plurality of groups of sample data based on the hand posture information of the standard gesture and a preset hand kinematics model, wherein each group of sample data comprises hand key point position information and hand posture information corresponding to the hand key point position information;
the hand kinematics model is used for describing the hand gesture and the influence of the hand skeleton length on the positions of the hand key points.
2. The method of claim 1, wherein the hand pose information for the standard gesture is determined based on:
presenting a gesture model corresponding to an angle adjusting instruction on an interactive interface in response to the angle adjusting instruction input by a user through the interactive interface, wherein the angle adjusting instruction is used for adjusting the angle value of each finger joint corner of a standard gesture;
and responding to an angle adjustment finishing instruction input by a user, and taking the hand posture information of the gesture model presented by the interactive interface at the current moment as the hand posture information of the standard gesture.
3. The method of claim 1, wherein generating multiple sets of sample data based on hand pose information of the standard gesture and a preset hand kinematics model comprises:
changing the hand posture information of the standard gesture to obtain a plurality of groups of new hand posture information;
and determining the position information of the key points of the hand corresponding to the new hand posture information based on the hand kinematic model aiming at each group of new hand posture information to obtain the multiple groups of sample data.
4. The method of claim 3, wherein changing the hand pose information of the standard gesture results in new sets of hand pose information, comprising:
randomly generating a plurality of first angle variable quantities in a first angle range aiming at each finger joint corner of the standard gesture;
and changing the angle value of the finger joint corner based on the plurality of first angle variation quantities to obtain a plurality of groups of new hand posture information, wherein the similarity between the obtained plurality of groups of new hand posture information and the posture information of the standard gesture is greater than a first preset threshold value.
5. The method of claim 3 or 4, wherein changing the hand pose information of the standard gesture to obtain multiple sets of new hand pose information comprises:
randomly generating a plurality of scaling factors within a preset range;
and scaling the hand bone lengths of the standard gesture by the plurality of scaling factors to obtain multiple sets of new hand pose information.
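The bone-length variation of claim 5 might be sketched as follows; whether one factor scales the whole hand or each bone is scaled separately is left open by the claim, and this sketch assumes a single factor per sample:

```python
import random

def scale_bone_lengths(bone_lengths, factor_range, n_samples, rnd=None):
    """Scale a standard gesture's bone lengths by random factors.

    One scaling factor per sample is drawn uniformly from the preset
    range and applied to every bone length, simulating hands of
    different sizes with the same pose.
    """
    rnd = rnd or random.Random()
    lo, hi = factor_range
    samples = []
    for _ in range(n_samples):
        factor = rnd.uniform(lo, hi)
        samples.append([factor * length for length in bone_lengths])
    return samples
```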
6. The method of claim 3 or 4, wherein changing the hand pose information of the standard gesture to obtain multiple sets of new hand pose information comprises:
uniformly sampling the wrist joint rotation angle of the standard gesture in rotation space to obtain multiple sets of wrist joint rotation angles, so as to generate multiple sets of new hand pose information.
7. The method of claim 6, wherein uniformly sampling the wrist joint rotation angle of the standard gesture in rotation space to obtain multiple sets of wrist joint rotation angles comprises:
uniformly sampling on a sphere of radius 1 to obtain a plurality of first sampling points, and taking the spatial coordinates of each first sampling point as the first three parameters of a 6D representation of the wrist joint rotation angle;
for each first sampling point, determining the line connecting the first sampling point and the center of the sphere;
and uniformly sampling on a circle of radius 1 perpendicular to that line to obtain a plurality of second sampling points, and taking the spatial coordinates of each second sampling point as the last three parameters of the 6D representation of the wrist joint rotation angle, so as to obtain 6D representations of multiple sets of wrist joint rotation angles.
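The sphere-then-circle procedure of claim 7 produces a pair of orthonormal 3-vectors, i.e. one 6D rotation representation per draw. A sketch under that reading (the helper names and the z/phi sphere parametrisation are implementation choices, not from the patent):

```python
import math
import random

def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def _normalize(v):
    n = math.sqrt(sum(x * x for x in v))
    return tuple(x / n for x in v)

def sample_wrist_rotation_6d(rnd=None):
    """Sample one 6D wrist-rotation representation.

    First point: uniform on the unit sphere (first three parameters).
    Second point: uniform on the unit circle perpendicular to the line
    joining the first point and the sphere centre (last three parameters).
    """
    rnd = rnd or random.Random()
    # Uniform point on the unit sphere: z uniform in [-1, 1], phi uniform.
    z = rnd.uniform(-1.0, 1.0)
    phi = rnd.uniform(0.0, 2.0 * math.pi)
    r = math.sqrt(max(0.0, 1.0 - z * z))
    first = (r * math.cos(phi), r * math.sin(phi), z)
    # Orthonormal basis of the plane perpendicular to `first`.
    helper = (1.0, 0.0, 0.0) if abs(first[0]) < 0.9 else (0.0, 1.0, 0.0)
    u = _normalize(_cross(first, helper))
    v = _cross(first, u)
    # Uniform point on the perpendicular unit circle.
    theta = rnd.uniform(0.0, 2.0 * math.pi)
    second = tuple(math.cos(theta) * ui + math.sin(theta) * vi
                   for ui, vi in zip(u, v))
    return first + second
```

By construction both halves are unit vectors and mutually perpendicular, which is exactly what a valid 6D rotation representation requires.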
8. The method of claim 1, further comprising:
for each finger joint angle of a designated gesture, randomly generating a plurality of second angle variations within a second angle range, wherein the designated gesture comprises the standard gesture and/or a gesture in which each finger joint angle is 0 degrees;
changing the angle value of the finger joint angle of the designated gesture based on the plurality of second angle variations to obtain multiple sets of new hand pose information, wherein the similarity between each obtained set of new hand pose information and the pose information of the standard gesture is less than a second preset threshold;
and generating multiple sets of sample data based on the obtained multiple sets of new hand pose information and the hand kinematics model.
9. The method of claim 1, further comprising:
acquiring configuration information input by a user through an interactive interface, wherein the configuration information indicates, for each of the different change modes used to change the hand pose information of the standard gesture, the number of samples corresponding to that change mode;
and changing the hand pose information of the standard gesture based on the number of samples corresponding to each change mode to obtain the multiple sets of new hand pose information.
10. The method of claim 1, wherein after training the deep learning model with the generated sample data, the method further comprises:
when the prediction accuracy of the trained deep learning model for a target gesture is lower than a preset accuracy, changing the hand pose information of the target gesture to obtain multiple sets of new hand pose information, and generating multiple sets of sample data based on the obtained new hand pose information and the hand kinematics model for further training the deep learning model.
11. The method of claim 1, wherein the hand key point position information in the multiple sets of sample data comprises pixel coordinates of hand key points in an image, and wherein generating multiple sets of sample data based on the hand pose information of the standard gesture and a preset hand kinematics model comprises:
generating three-dimensional position information of multiple sets of hand key points in three-dimensional space, and hand pose information corresponding to the three-dimensional position information of the hand key points, based on the hand pose information of the standard gesture and the preset hand kinematics model;
and orthographically projecting the three-dimensional position information of the hand key points to obtain the pixel coordinates of the hand key points in the image.
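The projection step of claim 11 simply discards depth; the scale and image-centre offsets below are hypothetical camera parameters, not values from the patent:

```python
def orthographic_project(points_3d, scale=100.0, cx=128.0, cy=128.0):
    """Orthographically project 3D key points to pixel coordinates.

    Unlike perspective projection, the z (depth) coordinate is dropped
    rather than divided by, and x/y are mapped affinely into the image.
    """
    return [(scale * x + cx, scale * y + cy) for (x, y, _z) in points_3d]
```

Note that under orthographic projection two key points differing only in depth land on the same pixel, which is why the sample labels pair the pixel coordinates with the full hand pose information.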
12. An apparatus for generating sample data for training a deep learning model, the deep learning model being used for predicting hand pose information based on hand key point position information, the apparatus comprising:
an acquisition module, configured to acquire hand pose information of a standard gesture, the hand pose information of the standard gesture being preset by a user;
and a sample data generation module, configured to generate multiple sets of sample data based on the hand pose information of the standard gesture and a preset hand kinematics model, wherein each set of sample data comprises hand key point position information and hand pose information corresponding to the hand key point position information;
wherein the hand kinematics model is used for describing the influence of the hand pose and the hand bone lengths on the positions of the hand key points.
13. An electronic device, comprising a processor, a memory, and a computer program stored in the memory and executable by the processor, wherein the processor, when executing the computer program, implements the method of any one of claims 1-11.
CN202110919681.1A 2021-08-11 2021-08-11 Sample data generation method and device and electronic equipment Pending CN113705378A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110919681.1A CN113705378A (en) 2021-08-11 2021-08-11 Sample data generation method and device and electronic equipment


Publications (1)

Publication Number Publication Date
CN113705378A true CN113705378A (en) 2021-11-26

Family

ID=78652331

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110919681.1A Pending CN113705378A (en) 2021-08-11 2021-08-11 Sample data generation method and device and electronic equipment

Country Status (1)

Country Link
CN (1) CN113705378A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115082529A (en) * 2022-06-30 2022-09-20 华东师范大学 System and method for collecting and analyzing multi-dimensional information of gross tissue

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108431735A (en) * 2015-12-31 2018-08-21 微软技术许可有限责任公司 Posture vision composer tool


Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
FRANZISKA MUELLER et al.: "Real-time Hand Tracking under Occlusion from an Egocentric RGB-D Sensor", Proceedings of the IEEE International Conference on Computer Vision, 25 December 2017 (2017-12-25), pages 1154-1163 *
ZHENYU WU et al.: "MM-Hand: 3D-Aware Multi-Modal Guided Hand Generative Network for 3D Hand Pose Synthesis", MM '20: Proceedings of the 28th ACM International Conference on Multimedia, 12 October 2020 (2020-10-12), pages 2508-2516 *


Similar Documents

Publication Publication Date Title
JP7337104B2 (en) Model animation multi-plane interaction method, apparatus, device and storage medium by augmented reality
Masry et al. A freehand sketching interface for progressive construction of 3D objects
Saxena et al. Computer aided engineering design
US20080117168A1 (en) Method and apparatus for controlling application using motion of image pickup unit
CN111079565B (en) Construction method and identification method of view two-dimensional attitude template and positioning grabbing system
WO2018205493A1 (en) Graphic drawing method, apparatus and device
Laielli et al. Labelar: a spatial guidance interface for fast computer vision image collection
US9013485B2 (en) Systems and methods for synthesizing high fidelity stroke data for lower dimension input strokes
Melatti et al. Virtual reality mediated instruction and learning
Jiang et al. A new constraint-based virtual environment for haptic assembly training
CN114782646A (en) House model modeling method and device, electronic equipment and readable storage medium
CN113705378A (en) Sample data generation method and device and electronic equipment
CN114708382A (en) Three-dimensional modeling method, device, storage medium and equipment based on augmented reality
US20210097773A1 (en) Method, apparatus, electronic device, and storage medium for displaying 3d shape in expanded manner
Wang et al. Single-View Scene Point Cloud Human Grasp Generation
CN108829248B (en) Moving target selection method and system based on user performance model correction
Liu et al. COMTIS: Customizable touchless interaction system for large screen visualization
Somavarapu et al. Toward Immersive Spacecraft Trajectory Design: Mapping Arbitrary Drawings to Natural CR3BP Periodic Orbits
Bolodurina et al. Development of prototype of visualization module for virtual reality using modern digital technologies
Lee et al. Real-time camera tracking using a particle filter and multiple feature trackers
CN116185205B (en) Non-contact gesture interaction method and device
Méndez et al. Comparative study of point cloud registration techniques between ICP and others
Misztal Deformable simplicial complexes
Lakshmi et al. A survey on performance evaluation of object detection techniques in digital image processing
JP5413188B2 (en) 3D image processing apparatus, 3D image processing method, and medium on which 3D image processing program is recorded

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination