CN116301303A

CN116301303A - Hand gesture acquisition method and device, electronic equipment and readable storage medium

Info

Publication number: CN116301303A
Application number: CN202111572301.8A
Authority: CN
Inventors: 余海桃; 孙飞; 吴涛
Original assignee: Beijing Zitiao Network Technology Co Ltd
Current assignee: Beijing Zitiao Network Technology Co Ltd
Priority date: 2021-12-21
Filing date: 2021-12-21
Publication date: 2023-06-23

Abstract

The present disclosure relates to a hand gesture acquisition method and device, an electronic device, and a readable storage medium. The method includes: acquiring first positions of hand joint points in a plurality of reference images and second positions of the hand joint points in a plurality of images to be processed, respectively, where the plurality of reference images comprise images obtained by mapping the hand joint points based on a plurality of view angles when the hand is in a specific three-dimensional pose, and the plurality of images to be processed include images of the hand obtained at the plurality of view angles; performing least-squares optimization according to the first positions and the second positions corresponding to the hand joint points, and adjusting the second positions of the hand joint points in the plurality of images to be processed; and obtaining the target three-dimensional pose of the hand based on the adjusted second positions of the hand joint points in the plurality of images to be processed. By optimizing the pose information through multi-view observation, the method matches the states observed from the multiple view angles as closely as possible and can improve the accuracy of the obtained three-dimensional pose of the hand.

Description

Hand gesture acquisition method and device, electronic equipment and readable storage medium

Technical Field

The disclosure relates to the technical field of image processing, and in particular relates to a hand gesture acquisition method, a hand gesture acquisition device, electronic equipment and a readable storage medium.

Background

The three-dimensional pose of a hand is obtained by accurately identifying the positions of the joint points of the human hand in an image. Hand pose recognition is commonly used in applications such as human-computer interaction, augmented reality, and virtual reality, and is one of the current mainstream interaction modes.

Currently, the three-dimensional pose of the hand is obtained by analyzing the positions of the joint points in the image, converting them into positions in a specific three-dimensional coordinate system, and deriving the three-dimensional pose of the hand from those positions. The accuracy of the hand pose is extremely important because it directly affects the interaction experience. However, with this approach the positions of the joint points obtained from the image may be inaccurate, for example because some joint points are occluded or the image is unclear, which reduces the accuracy of the obtained three-dimensional pose of the hand. How to acquire the three-dimensional pose of the hand with higher accuracy is therefore an urgent problem to be solved.

Disclosure of Invention

In order to solve the technical problems, the present disclosure provides a method, an apparatus, an electronic device, and a readable storage medium for acquiring hand gestures.

In a first aspect, the present disclosure provides a hand gesture acquisition method, including:

acquiring first positions of hand joint points in a plurality of reference images respectively; the plurality of reference images comprise images obtained by mapping the hand joint points based on a plurality of view angles when the hand is in a specific three-dimensional pose;

acquiring second positions of the hand joint points in a plurality of images to be processed respectively; the images to be processed comprise images of the hand obtained at the plurality of view angles, and the reference images correspond one-to-one to the images to be processed at the same view angles;

performing least-squares optimization (LM) according to the first positions of the hand joint points in the plurality of reference images and the second positions of the hand joint points in the plurality of images to be processed, and acquiring adjusted second positions corresponding to the hand joint points;

and mapping the adjusted second positions corresponding to the hand joint points to a pre-established three-dimensional coordinate system, and obtaining the target three-dimensional pose of the hand.

As a possible implementation, the performing least-squares optimization (LM) according to the first positions of the hand joint points in the plurality of reference images and the second positions of the hand joint points in the plurality of images to be processed, and acquiring the adjusted second positions corresponding to the hand joint points, includes:

acquiring a residual term according to the first positions of the hand joint points in the plurality of reference images and the second positions of the hand joint points in the plurality of images to be processed;

and determining the positions of the hand joint points at which the residual term takes its minimum value as the adjusted second positions corresponding to the hand joint points.

As a possible implementation, the hand joint points include a plurality of joint points; the acquiring the first positions of the hand joint points in the plurality of reference images respectively includes:

acquiring the position of each joint point in a first coordinate system when the hand is in the specific three-dimensional pose; the first coordinate system is a three-dimensional coordinate system established according to the specific three-dimensional pose;

for each reference image, acquiring the position of the joint point in a second coordinate system according to the position of the joint point in the first coordinate system and the conversion relation between the first coordinate system and the second coordinate system; the second coordinate system is an image coordinate system of the reference image corresponding to the view angle.

As a possible implementation, the acquiring the position of each joint point in the first coordinate system when the hand is in the specific three-dimensional pose includes:

acquiring the position of each joint point in a first coordinate system when the hand is in the specific three-dimensional pose; the first coordinate system is a world coordinate system established according to the plurality of view angles;

for each reference image, acquiring the position of the joint point in a second coordinate system according to the position of the joint point in the first coordinate system and the conversion relation between the first coordinate system and the second coordinate system; the second coordinate system is a camera coordinate system of the view angle corresponding to the reference image;

and for each view angle, acquiring a first position of the joint point in a reference image corresponding to the view angle according to the position of the joint point in a second coordinate system corresponding to the view angle and corresponding camera parameters.

obtaining model data corresponding to the plurality of joint points in the hand model of the specific three-dimensional pose;

for each joint point, acquiring the position of the joint point in the first coordinate system according to the model data corresponding to the joint point and the model data corresponding to the associated joint point; the associated joint point is the parent node corresponding to the joint point.

As a possible implementation manner, the obtaining, according to the model data corresponding to the joint point and the model data corresponding to the associated joint point, the position of the joint point in the first coordinate system includes:

acquiring the position of the joint point in a third coordinate system according to the displacement data of the joint point relative to the associated joint point, the first rotation matrix and the position of the associated joint point in the third coordinate system, wherein the third coordinate system is a three-dimensional coordinate system established according to the hand model;

acquiring the positions of the joint points in the first coordinate system according to the conversion relation between the third coordinate system and the first coordinate system;

the model data of the joint points comprise rotation data of the joint points and displacement data of the joint points relative to the associated joint points; the first rotation matrix is a rotation matrix of the joint point relative to the associated joint point, and the first rotation matrix is obtained according to the rotation data of the joint point and the corresponding first rotation matrix of the associated joint point.

As a possible implementation manner, the obtaining the position of the joint point in the third coordinate system according to the displacement data of the joint point relative to the associated joint point, the first rotation matrix and the position of the associated joint point in the third coordinate system includes:

multiplying displacement data of the joint point relative to the associated joint point by the first rotation matrix, and adding the multiplication result to the position of the associated joint point in the third coordinate system to obtain the position of the joint point in the third coordinate system.
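The position update described above can be sketched in a few lines. This is a minimal illustration, assuming 3x3 rotation matrices stored as nested lists and column-vector multiplication (R applied to the displacement); the function names `mat_vec` and `child_position` and the sample values are hypothetical and not taken from the disclosure.

```python
from typing import List, Sequence

Vec3 = List[float]


def mat_vec(rotation: Sequence[Sequence[float]], vector: Sequence[float]) -> Vec3:
    """Multiply a 3x3 matrix by a 3-vector (column-vector convention, assumed)."""
    return [sum(rotation[r][c] * vector[c] for c in range(3)) for r in range(3)]


def child_position(parent_pos: Sequence[float],
                   rotation: Sequence[Sequence[float]],
                   displacement: Sequence[float]) -> Vec3:
    """Return parent_pos + rotation * displacement in the model (third) coordinate system."""
    rotated = mat_vec(rotation, displacement)
    return [parent_pos[k] + rotated[k] for k in range(3)]


# Hypothetical values: a 90-degree rotation about the z-axis and a
# displacement of 2 units along x relative to the associated (parent) joint point.
R_Z90 = [[0.0, -1.0, 0.0],
         [1.0, 0.0, 0.0],
         [0.0, 0.0, 1.0]]
joint_pos = child_position([1.0, 2.0, 3.0], R_Z90, [2.0, 0.0, 0.0])
```

Applying the same step from the wrist outward, one joint point after another, would give the positions of all joint points in the model coordinate system.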

As a possible implementation manner, the obtaining the position of the joint point in the second coordinate system according to the position of the joint point in the first coordinate system and the conversion relation between the first coordinate system and the second coordinate system includes:

acquiring the position of the joint point in a camera coordinate system according to the product of the position of the joint point in the first coordinate system and the second rotation matrix; the camera coordinate system is a camera coordinate system of a view angle corresponding to the second coordinate system; the second rotation matrix is a transformation matrix between the first coordinate system and the second coordinate system.

As a possible implementation, the specific three-dimensional pose is predicted from one or more previously obtained target three-dimensional poses of the hand.

In a second aspect, the present disclosure provides a hand gesture acquisition device comprising:

the acquisition module is used for acquiring first positions of hand joint points in a plurality of reference images respectively; the plurality of reference images comprise images obtained by mapping the hand joint points based on a plurality of view angles when the hand is in a specific three-dimensional pose;

the acquisition module is further used for acquiring second positions of the hand joint points in a plurality of images to be processed respectively; the images to be processed comprise images of the hand obtained at the plurality of view angles, and the reference images correspond one-to-one to the images to be processed at the same view angles;

the position adjustment module is used for performing least-squares optimization (LM) according to the first positions of the hand joint points in the plurality of reference images and the second positions of the hand joint points in the plurality of images to be processed, and obtaining adjusted second positions corresponding to the hand joint points;

the three-dimensional gesture recognition module is used for mapping the adjusted second positions corresponding to the hand joint points to a pre-established three-dimensional coordinate system to obtain the target three-dimensional pose of the hand.

In a third aspect, the present disclosure provides an electronic device comprising: a memory and a processor;

the memory is configured to store computer program instructions;

the processor is configured to execute the computer program instructions, causing the electronic device to implement the hand gesture acquisition method of any one of the first aspects.

In a fourth aspect, the present disclosure provides a readable storage medium comprising computer program instructions; when at least one processor of an electronic device executes the computer program instructions, the electronic device is caused to implement the hand gesture acquisition method of any one of the first aspects.

In a fifth aspect, the present disclosure provides a computer program product which, when executed by a computer, causes the computer to implement the hand gesture acquisition method of any one of the first aspects.

The disclosure provides a hand gesture acquisition method and device, an electronic device, and a readable storage medium. The method includes acquiring first positions of hand joint points in a plurality of reference images and second positions of the hand joint points in a plurality of images to be processed, respectively; the plurality of reference images comprise images obtained by mapping the hand joint points based on a plurality of view angles when the hand is in a specific three-dimensional pose; the plurality of images to be processed comprise images of the hand obtained at the plurality of view angles, and the images to be processed correspond one-to-one to the reference images; performing least-squares optimization according to the first positions and the second positions corresponding to the hand joint points, and adjusting the second positions of the hand joint points in the plurality of images to be processed; and obtaining the target three-dimensional pose of the hand based on the adjusted second positions of the hand joint points in the plurality of images to be processed. The method provided by the disclosure can improve the accuracy of the obtained three-dimensional pose of the hand.

Drawings

The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the disclosure and together with the description, serve to explain the principles of the disclosure.

In order to more clearly illustrate the embodiments of the present disclosure or the solutions in the prior art, the drawings that are required for the description of the embodiments or the prior art will be briefly described below, and it will be obvious to those skilled in the art that other drawings can be obtained from these drawings without inventive effort.

Fig. 1 is a schematic diagram of an application scenario of a hand gesture acquisition method provided in an embodiment of the present disclosure;

Fig. 2 is a flow chart of a hand gesture acquisition method according to an embodiment of the present disclosure;

Fig. 3 is a flow chart of a hand gesture acquisition method according to another embodiment of the present disclosure;

Fig. 4 is a distribution diagram of the joint points of a hand according to an embodiment of the present disclosure;

Fig. 5 is a schematic structural diagram of a hand gesture acquisition device according to an embodiment of the present disclosure;

Fig. 6 is a schematic structural diagram of an electronic device according to an embodiment of the present disclosure.

Detailed Description

In order that the above objects, features and advantages of the present disclosure may be more clearly understood, a further description of aspects of the present disclosure will be provided below. It should be noted that, without conflict, the embodiments of the present disclosure and features in the embodiments may be combined with each other.

In the following description, numerous specific details are set forth in order to provide a thorough understanding of the present disclosure, but the present disclosure may be practiced otherwise than as described herein; it will be apparent that the embodiments in the specification are only some, but not all, embodiments of the disclosure.

Illustratively, the hand gesture acquisition method provided by the present disclosure may be performed by the hand gesture acquisition device provided by the present disclosure, and the device may be implemented in any form of software and/or hardware. Illustratively, the hand gesture acquisition device may be, but is not limited to, a tablet computer, a mobile phone (e.g., a folding-screen or large-screen mobile phone), a wearable device, a vehicle-mounted device, an augmented reality (AR)/virtual reality (VR) device, a notebook computer, an ultra-mobile personal computer (UMPC), a netbook, a personal digital assistant (PDA), a smart television, a smart screen, a high-definition television, a 4K television, a smart speaker, a smart projector, or another Internet of Things (IoT) device; the present disclosure does not impose any limitation on the specific type of electronic device.

Fig. 1 is a schematic diagram of an application scenario of a hand gesture acquisition method according to an embodiment of the present disclosure. Referring to Fig. 1, the scene 100 provided in this embodiment includes cameras 101 to 104 and a hand gesture acquisition device 105.

The cameras 101 to 104 may be used to collect images within the corresponding view angle ranges, and send the collected images to the hand gesture obtaining device 105.

The present disclosure does not limit parameters of the cameras 101 to 104 such as resolution, model, or the storage format of captured images. It will be appreciated that the better the parameters and performance of a camera, the higher the quality of the captured images may be, which provides more effective image information for acquiring the three-dimensional pose of the hand.

The hand gesture acquisition device 105 is a device connected to the cameras 101 to 104. It is capable of receiving the images sent by the cameras 101 to 104 and of acquiring a three-dimensional pose of a hand by executing the hand gesture acquisition method provided by the present disclosure.

With continued reference to Fig. 1, assume that the scene shown in Fig. 1 is a VR scene. The cameras 101 to 104 may be located at the four vertices of a square area (i.e., the dashed area shown in Fig. 1); the cameras may also be laid out in other ways, which is not limited in this disclosure, and Fig. 1 is only an example. When a user is in the dashed area for a VR experience, the cameras 101 to 104 may capture images of the user (including the user's hand) from 4 different view angles and send the captured images to the hand gesture acquisition device 105. The hand gesture acquisition device 105 may obtain a three-dimensional pose of the hand with higher accuracy by executing the method provided by the present disclosure. A control unit (not shown in Fig. 1) in the VR scene may then generate a corresponding control instruction based on the three-dimensional pose of the hand acquired by the hand gesture acquisition device 105 and respond to the control instruction, thereby implementing interaction.

It should be noted that, in the embodiment shown in fig. 1, the number of cameras is 4, and in practical application, the number of cameras may be more or less, for example, the number of cameras may be 3, 5, 6, etc., which is not limited in this disclosure.

The hand gesture acquisition method provided by the present disclosure is described in detail below by several embodiments. The following embodiments will be described by taking an example of a method for acquiring a hand gesture performed by an electronic device.

Fig. 2 is a flowchart of a hand gesture obtaining method according to an embodiment of the present disclosure. Referring to fig. 2, the method provided in this embodiment includes:

S201, acquiring first positions of hand joint points in a plurality of reference images respectively; the plurality of reference images comprise images obtained by mapping the hand joint points based on a plurality of view angles when the hand is in a specific three-dimensional pose.

Illustratively, in connection with the embodiment shown in Fig. 1, the plurality of reference images may include 4 images obtained by mapping each joint point of the hand based on the 4 view angles shown in Fig. 1 when the hand is in the specific three-dimensional pose.

The specific three-dimensional pose may be predicted from one or more previously obtained target three-dimensional poses of the hand. Hand movements of a user are generally continuous, so in an actual application scenario the camera at each view angle can capture a sequence of consecutive images of the user's hand in a preset manner, for example periodically. The 1st target three-dimensional pose of the hand is then obtained by joint estimation from the 1st images captured by the cameras at the respective view angles, and this 1st target three-dimensional pose can be used as the specific three-dimensional pose when acquiring the 2nd target three-dimensional pose of the hand. As another example, a specific three-dimensional pose can be predicted from the 1st and 2nd target three-dimensional poses of the hand and used for acquiring the 3rd target three-dimensional pose, and so on.
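The disclosure does not say how the prediction is made. As one hedged illustration, the sketch below reuses the latest pose when only one previous pose exists and otherwise extrapolates each joint point linearly from the two most recent poses (a constant-velocity assumption). The function name `predict_reference_pose` and the representation of a pose as a list of (x, y, z) joint positions are assumptions made for this example only.

```python
from typing import List, Sequence, Tuple

Point3 = Tuple[float, float, float]


def predict_reference_pose(previous: Sequence[Sequence[Point3]]) -> List[Point3]:
    """Predict the specific 3D pose from earlier target poses (oldest first).

    Assumed strategy, not taken from the disclosure:
    - one previous pose: reuse it unchanged;
    - two or more: extrapolate linearly, p_next = 2 * p_newer - p_older.
    """
    if not previous:
        raise ValueError("at least one previous pose is required")
    if len(previous) == 1:
        return [(float(x), float(y), float(z)) for x, y, z in previous[-1]]
    older, newer = previous[-2], previous[-1]
    if len(older) != len(newer):
        raise ValueError("poses must contain the same number of joint points")
    return [
        (2.0 * n[0] - o[0], 2.0 * n[1] - o[1], 2.0 * n[2] - o[2])
        for n, o in zip(newer, older)
    ]


# Hypothetical two-joint poses from frames 1 and 2.
pose_1 = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
pose_2 = [(1.0, 0.0, 0.0), (2.0, 1.0, 0.0)]
predicted_pose_3 = predict_reference_pose([pose_1, pose_2])
```

Any other motion model could take the place of the linear extrapolation; the only requirement stated in the text is that the prediction is based on earlier target poses.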

The present disclosure does not limit the manner in which the electronic device obtains the first positions of the hand joint points in the plurality of reference images.

In one possible implementation, the electronic device may obtain a hand model in the specific three-dimensional pose and obtain model data of each joint point of the hand according to a required target scale. Assuming that the three-dimensional coordinate system used to unify all view angles is the first coordinate system, the joint points of the hand are mapped into the first coordinate system according to their model data, which yields the positions of the joint points in the first coordinate system. Assuming that the image coordinate system corresponding to a reference image is the second coordinate system, it can be understood that the reference images corresponding to different view angles have different image coordinate systems. For each reference image, the positions of the joint points in the first coordinate system can be mapped to the corresponding second coordinate system according to the conversion relation between the two, so that the first positions of the joint points of the hand in the reference image are obtained.

It should be noted that, the conversion relationship between the first coordinate system and the corresponding second coordinate system is a position conversion relationship, and may be represented by a conversion matrix.
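One common way to realize such a conversion is a rigid transform into the camera frame followed by a pinhole projection. The sketch below assumes that model: the translation vector `t`, the intrinsic parameters `fx`, `fy`, `cx`, `cy`, and the function names are illustrative assumptions and are not specified by the disclosure.

```python
from typing import List, Sequence, Tuple


def world_to_camera(rotation: Sequence[Sequence[float]],
                    t: Sequence[float],
                    p_world: Sequence[float]) -> List[float]:
    """Map a point from the first (world) coordinate system to a camera frame: R * p + t."""
    return [
        sum(rotation[r][c] * p_world[c] for c in range(3)) + t[r]
        for r in range(3)
    ]


def camera_to_pixel(p_cam: Sequence[float],
                    fx: float, fy: float, cx: float, cy: float) -> Tuple[float, float]:
    """Project a camera-frame point to image coordinates with an assumed pinhole model."""
    x, y, z = p_cam
    if z <= 0.0:
        raise ValueError("point must lie in front of the camera (z > 0)")
    return (fx * x / z + cx, fy * y / z + cy)


# Hypothetical view: camera axes aligned with the world, 2 units away along z.
IDENTITY = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
p_cam = world_to_camera(IDENTITY, [0.0, 0.0, 2.0], [0.5, -0.25, 0.0])
u, v = camera_to_pixel(p_cam, fx=500.0, fy=500.0, cx=320.0, cy=240.0)
```

Running the two functions once per view angle, each with that view's own rotation, translation and intrinsics, would produce one first position per joint point per reference image.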

The positions of the joint points of the hand in the first coordinate system can be represented by the coordinate values of the joint points in the first coordinate system. How to calculate the positions of the hand joint points in the first coordinate system based on the corresponding model data is described in detail below through the embodiments shown in Fig. 3 and Fig. 4.

S202, acquiring second positions of hand joint points in a plurality of images to be processed respectively, wherein the images to be processed correspond to the reference images one by one.

The electronic equipment can acquire a plurality of images to be processed, respectively detect joint points of the images to be processed, and acquire second positions of hand joint points in the images to be processed respectively, wherein the second positions of the hand joint points in the images to be processed can be represented by coordinate values of pixel points corresponding to the hand joint points in an image coordinate system corresponding to the images to be processed.

For example, the electronic device may analyze the images to be processed using a pre-trained machine learning model and obtain the second positions of the hand joint points in the respective images to be processed. The present disclosure does not limit the network structure, type, etc. of the machine learning model; for example, the machine learning model may be a deep neural network model, a convolutional neural network model, a model obtained by training with a random forest algorithm, a decision tree model, etc.

S203, performing least-squares optimization according to the first positions of the hand joint points in the plurality of reference images and the second positions of the hand joint points in the plurality of images to be processed, and acquiring adjusted second positions corresponding to the hand joint points.

The electronic device can compute a residual function for each joint point according to the one-to-one correspondence between the reference images and the images to be processed, the first position of the joint point in each reference image, and the second position of the joint point in each image to be processed, and can obtain the residual term as the sum of squares of the residual functions of all joint points. The second positions of the hand joint points in the plurality of images to be processed at which the residual term is minimal are the adjusted second positions corresponding to the hand joint points.

To make this step clearer, the following is illustrated by the formula:

Assume that the first position of hand joint point $i$ in the $j$-th reference image is $(u_j(i), v_j(i))$ and that its second position in the $j$-th image to be processed is $(u'_j(i), v'_j(i))$, where $j$ indexes the view angle (which can also be understood as the camera corresponding to that view angle) and $i$ is the serial number of the hand joint point. On this basis, the residual term $E$ may satisfy formula (1):

$$E = \sum_{j} \sum_{i} \left[ \left(u_j(i) - u'_j(i)\right)^2 + \left(v_j(i) - v'_j(i)\right)^2 \right] \tag{1}$$

where $\theta$ denotes the variable being optimized.

The $\theta$ at which the residual term $E$ is minimal gives the adjusted second positions corresponding to the joint points of the hand. Here $u_j(i) - u'_j(i)$ is the first residual function corresponding to joint point $i$, and $v_j(i) - v'_j(i)$ is the second residual function corresponding to joint point $i$.
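As a minimal sketch of how the residual term could be evaluated, the function below sums the squared differences between the first and second positions over all view angles and joint points. The name `residual_term`, the nested-list layout (`positions[j][i]` is the (u, v) pair of joint point i in view j) and the sample coordinates are assumptions for illustration. The sketch only evaluates E; the minimization itself would be handled by a least-squares solver.

```python
from typing import Sequence, Tuple

Pixel = Tuple[float, float]


def residual_term(first: Sequence[Sequence[Pixel]],
                  second: Sequence[Sequence[Pixel]]) -> float:
    """Sum of squared residuals over all views j and joint points i.

    first[j][i]  -> (u, v) of joint point i in the j-th reference image
    second[j][i] -> (u, v) of joint point i in the j-th image to be processed
    """
    if len(first) != len(second):
        raise ValueError("reference images and images to be processed must pair one-to-one")
    total = 0.0
    for ref_view, obs_view in zip(first, second):
        if len(ref_view) != len(obs_view):
            raise ValueError("each view must list the same joint points")
        for (u_ref, v_ref), (u_obs, v_obs) in zip(ref_view, obs_view):
            total += (u_ref - u_obs) ** 2 + (v_ref - v_obs) ** 2
    return total


# Hypothetical pixel coordinates for two views and two joint points.
first_positions = [[(10.0, 20.0), (30.0, 40.0)], [(5.0, 5.0), (6.0, 6.0)]]
second_positions = [[(11.0, 20.0), (30.0, 42.0)], [(5.0, 4.0), (8.0, 6.0)]]
E = residual_term(first_positions, second_positions)
```

With these sample values the four joint observations contribute 1, 4, 1 and 4, so E evaluates to 10.0; identical inputs give 0.0.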

S204, mapping the adjusted second position corresponding to the hand joint point to a pre-established three-dimensional coordinate system, and obtaining the target three-dimensional posture of the hand.

Specifically, the adjusted second position corresponding to the hand joint point is mapped to a pre-established three-dimensional coordinate system, and the target three-dimensional gesture of the hand can be identified and obtained based on the connection relation among a plurality of joint points of the hand.

In this embodiment, first positions of hand joint points in a plurality of reference images and second positions of the hand joint points in a plurality of images to be processed are obtained; the plurality of reference images comprise images obtained by mapping the hand joint points based on a plurality of view angles when the hand is in a specific three-dimensional pose; the images to be processed are images of the hand obtained at the plurality of view angles, and the images to be processed correspond one-to-one to the reference images; least-squares optimization is performed according to the first positions and the second positions corresponding to the hand joint points, and the second positions of the hand joint points in the plurality of images to be processed are adjusted; and the three-dimensional pose of the hand is obtained based on the adjusted second positions of the hand joint points in the plurality of images to be processed. In the method provided by this embodiment, the pose information is optimized through multi-view observation, so that the states observed from the multiple view angles are matched as closely as possible and the obtained three-dimensional pose of the hand is more accurate.

Before describing the embodiment shown in fig. 3, a description is first given of several coordinate systems involved:

1. The first coordinate system is a three-dimensional coordinate system established to unify the multiple view angles.

2. The second coordinate system, i.e. the camera coordinate system, is a three-dimensional coordinate system tied to a view angle; it can be understood as the camera coordinate system of the view angle corresponding to a reference image, and can be established with the optical center of the corresponding view angle as the origin and the optical axis as the Z-axis.

3. The third coordinate system, i.e. the model coordinate system, is a three-dimensional coordinate system established based on the hand model. It can be a left-hand three-dimensional coordinate system or a right-hand three-dimensional coordinate system, determined according to whether the hand is a left hand or a right hand.

Fig. 3 is a flowchart of a hand gesture obtaining method according to an embodiment of the present disclosure. Referring to fig. 3, the method provided in this embodiment includes:

It should be noted that step S201 in the embodiment shown in Fig. 2 may be implemented by steps S301 to S303 in the present embodiment.

S301, acquiring model data corresponding to the plurality of joint points in the hand model of the specific three-dimensional pose.

The hand includes a plurality of joint points. Fig. 4 is a schematic distribution diagram of the joint points of a human hand according to an embodiment of the disclosure. Referring to Fig. 4, taking the right hand as an example, each black solid circle in Fig. 4 represents one joint point of the hand. The 20 joint points of the hand are: wrist joint point 0; for the thumb, joint point 1 at the root, joint point 2 at the middle and joint point 3 at the tip; for the index finger, joint point 4 at the root, joint points 5 and 6 at the middle and joint point 7 at the tip; for the middle finger, joint point 8 at the root, joint points 9 and 10 at the middle and joint point 11 at the tip; for the ring finger, joint point 12 at the root, joint points 13 and 14 at the middle and joint point 15 at the tip; and for the little finger (pinky), joint point 16 at the root, joint points 17 and 18 at the middle and joint point 19 at the tip.
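The numbering above can be captured in a small lookup structure. The sketch below is illustrative only: it assumes that each finger-root joint point has the wrist as its parent and that every other joint point has the preceding joint point of the same finger as its parent, which is a common kinematic-chain layout but is not spelled out in this passage. The names `FINGERS`, `build_parents` and `chain_to_wrist` are hypothetical.

```python
from typing import Dict, List, Optional

WRIST = 0

# Joint point indices per finger, ordered from root to tip, as listed for Fig. 4.
FINGERS: Dict[str, List[int]] = {
    "thumb": [1, 2, 3],
    "index": [4, 5, 6, 7],
    "middle": [8, 9, 10, 11],
    "ring": [12, 13, 14, 15],
    "pinky": [16, 17, 18, 19],
}


def build_parents() -> Dict[int, Optional[int]]:
    """Return joint -> parent joint (assumed topology: wrist -> root -> ... -> tip)."""
    parents: Dict[int, Optional[int]] = {WRIST: None}
    for chain in FINGERS.values():
        previous = WRIST
        for joint in chain:
            parents[joint] = previous
            previous = joint
    return parents


def chain_to_wrist(joint: int, parents: Dict[int, Optional[int]]) -> List[int]:
    """List the joint points visited when walking from `joint` up to the wrist."""
    path = [joint]
    while parents[path[-1]] is not None:
        path.append(parents[path[-1]])
    return path


PARENTS = build_parents()
index_tip_chain = chain_to_wrist(7, PARENTS)
```

A table like this is what a per-joint position computation would traverse when it needs the associated (parent) joint point of each joint point.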

It should be noted that, fig. 4 shows a distribution diagram of a right-hand joint point of a human body, and a division manner of a left-hand joint point of the human body is similar to that of a right-hand joint point, and may refer to the division manner of the right-hand joint point, which is not described herein for brevity.

With continued reference to the embodiment shown in fig. 4, the model data corresponding to joint points 0 to 19 differ in their degrees of freedom depending on the type of joint point.

The model data corresponding to the wrist joint point 0 is 6 degrees of freedom data (i.e., 6dof data). This model data may include the position data position (x, y, z) and the rotation data rotation (x, y, z, w) of the wrist joint point 0, where the rotation data is represented as a quaternion.

It should be noted that the 6dof data corresponding to the wrist joint point 0 represents the state of the entire hand. The position data position (x, y, z) may therefore represent the position of the wrist joint point 0 in the model coordinate system. In addition, the wrist joint point 0 serves as an important reference joint point of the hand and determines the position of the hand in the world coordinate system, so the position data of the wrist joint point 0 may also represent its position in the world coordinate system. The rotation data rotation (x, y, z, w) describes the rotation of the wrist joint point 0, where x, y, z encode the rotation axis and w encodes the rotation angle. Similarly, the rotation data of the wrist joint point 0 may represent the rotation state of the hand in the model coordinate system or in the world coordinate system.

For the joint points at the finger roots, namely joint point 1 at the root of the thumb, joint point 4 at the root of the index finger, joint point 8 at the root of the middle finger, joint point 12 at the root of the ring finger, and joint point 16 at the root of the little finger, the model data is 2 degrees of freedom data (i.e., 2dof data) and includes the rotation data of the finger-root joint point. For the joint points at the middle of the fingers, namely joint point 2 of the thumb, joint points 5 and 6 of the index finger, joint points 9 and 10 of the middle finger, joint points 13 and 14 of the ring finger, and joint points 17 and 18 of the little finger, the model data is 1 degree of freedom data (i.e., 1dof data) and includes the rotation data of the finger-middle joint point. The rotation data corresponding to the finger-root joint points and the finger-middle joint points can be expressed as a quaternion, i.e., in the form rotation (x, y, z, w).
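Because the rotation data is stored as a quaternion while the position formulas further on use rotation matrices, a conversion between the two is needed at some point. The following sketch uses the standard unit-quaternion to rotation-matrix identity with the (x, y, z, w) component order used above; the function names are assumptions, and the disclosure does not state how the conversion is carried out.

```python
import math


def quat_to_matrix(x, y, z, w):
    """Convert a quaternion in (x, y, z, w) order to a 3x3 rotation matrix.

    The quaternion is normalized first, so small numeric drift is tolerated.
    """
    norm = math.sqrt(x * x + y * y + z * z + w * w)
    if norm == 0.0:
        raise ValueError("zero-length quaternion")
    x, y, z, w = x / norm, y / norm, z / norm, w / norm
    return [
        [1 - 2 * (y * y + z * z), 2 * (x * y - z * w), 2 * (x * z + y * w)],
        [2 * (x * y + z * w), 1 - 2 * (x * x + z * z), 2 * (y * z - x * w)],
        [2 * (x * z - y * w), 2 * (y * z + x * w), 1 - 2 * (x * x + y * y)],
    ]


def rotate(matrix, vector):
    """Apply a 3x3 matrix to a 3-vector."""
    return [sum(matrix[r][c] * vector[c] for c in range(3)) for r in range(3)]


# A 90 degree rotation about the z axis: axis scaled by sin(45 deg), w = cos(45 deg).
half = math.sqrt(0.5)
rotated = rotate(quat_to_matrix(0.0, 0.0, half, half), [1.0, 0.0, 0.0])
```

In this example the x axis is carried onto the y axis, which is the expected result of a quarter turn about z.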

It should be noted that analysis of the joint points of the hand shows that a finger-root joint point can swing left and right and can also bend inward, so the model data of a finger-root joint point is 2dof data; a finger-middle joint point can only bend inward and cannot swing left and right, so the model data of a finger-middle joint point is 1dof data.

Further, the rotation data of the finger-root joint points and of the finger-middle joint points each represent the rotation of a child joint point relative to its parent joint point, and the rotation data of these joint points represent the rotation of the joint points in the first coordinate system.

Based on the embodiment shown in fig. 4, joint points 3, 7, 11, 15 and 19 are all fingertip joint points (tips). The pose of a fingertip joint point is controlled by its associated parent node, and a fingertip joint point has no child node. The model data corresponding to a fingertip joint point may therefore include displacement data but no rotation data, i.e., these joint points have no dof data. The positions of the fingertip joint points in the first coordinate system still need to be calculated, however, so that they can participate in the subsequent least squares optimization.

In summary, for the joint points of the hand shown in fig. 4, a total of 26 dof data are distributed among 15 joint points.
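The per-joint values described above can be tabulated as follows. This is a minimal sketch that only encodes the values stated for the wrist, finger-root, finger-middle and fingertip joint points; the names `ROOT_JOINTS`, `TIP_JOINTS`, and `joint_dof` are assumptions introduced for illustration.

```python
# Joint index sets follow the numbering of fig. 4; identifiers are hypothetical.
ROOT_JOINTS = {1, 4, 8, 12, 16}
TIP_JOINTS = {3, 7, 11, 15, 19}


def joint_dof(index):
    """Return the number of degrees of freedom stated for joint `index`."""
    if not 0 <= index <= 19:
        raise ValueError("joint index must be in 0..19")
    if index == 0:
        return 6  # wrist: position (x, y, z) plus rotation
    if index in ROOT_JOINTS:
        return 2  # finger root: swing left/right and bend inward
    if index in TIP_JOINTS:
        return 0  # fingertip: displacement only, no dof data
    return 1      # finger middle: bend inward only


# Joint points that carry dof data (everything except the five fingertips).
joints_with_dof = [i for i in range(20) if joint_dof(i) > 0]
```

The list `joints_with_dof` has 15 entries, which agrees with the 15 joint points mentioned above.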

S302, for each joint point, according to the model data corresponding to the joint point and the model data corresponding to the associated joint point, acquiring the position of the joint point in a first coordinate system when the joint point is in a specific three-dimensional posture.

Parent-child relationships among the joint points of the hand can be defined according to how the joint points are connected. Illustratively, the wrist joint point 0 is the parent node of the joint point 4 at the root of the index finger, the joint point 4 is the parent node of the joint point 5 in the middle of the index finger, the joint point 5 is the parent node of the joint point 6 in the middle of the index finger, and the joint point 6 is the parent node of the joint point 7 at the tip of the index finger. The parent-child relationships between the wrist joint point 0 and the joint points of each of the other fingers are similar to those of the index finger in the preceding example and are not described here again for brevity.

Since the joint points of the hand have parent-child relationships, the positions of the joint points in the first coordinate system can be obtained based on these relationships.
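The parent-child relationships can be sketched as a simple lookup. The sketch assumes, as suggested by the index-finger example, that every finger root has the wrist as its parent and that every other joint point has the preceding index as its parent; `parent_of` and `chain_to_wrist` are hypothetical names.

```python
# Finger-root joint indices from fig. 4; each is assumed to hang off the wrist (joint 0).
ROOT_JOINTS = (1, 4, 8, 12, 16)


def parent_of(index):
    """Return the parent joint index, or None for the wrist."""
    if not 0 <= index <= 19:
        raise ValueError("joint index must be in 0..19")
    if index == 0:
        return None
    if index in ROOT_JOINTS:
        return 0
    # Within a finger, joints are numbered consecutively from root to tip.
    return index - 1


def chain_to_wrist(index):
    """Return the list of joints from `index` up to the wrist, inclusive."""
    chain = [index]
    while parent_of(chain[-1]) is not None:
        chain.append(parent_of(chain[-1]))
    return chain
```

For the index fingertip, `chain_to_wrist(7)` walks through joint points 7, 6, 5, 4 and 0, which is the chain described in the example above.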

As a possible implementation manner, the positions of the joint points of the hand in a third coordinate system can be obtained based on the model data corresponding to the joint points and the parent-child relationships between them. The third coordinate system is a three-dimensional coordinate system established according to the hand model, and may therefore also be called a model coordinate system. The positions of the joint points in the third coordinate system are then mapped to the first coordinate system based on the conversion relation between the third coordinate system and the first coordinate system, so as to obtain the positions of the joint points of the hand in the first coordinate system.

The obtaining of the positions of the joints of the hand in the third coordinate system can be achieved through the following formula (2):

$$p_{\text{child}} = R_{\text{parent}}\,T_{\text{child}} + p_{\text{parent}} \tag{2}$$

where $p_{\text{child}}$ represents the position of the joint point corresponding to the child node in the third coordinate system; $p_{\text{parent}}$ represents the position of the joint point corresponding to the parent node in the third coordinate system; $T_{\text{child}}$ represents the displacement of the joint point corresponding to the child node relative to the joint point corresponding to the parent node, and is given by the displacement data included in the model data of the joint point; and $R_{\text{parent}}$ is the first rotation matrix, which represents the rotation of the joint point corresponding to the child node relative to the joint point corresponding to the parent node. $R_{\text{parent}}$ can be obtained from the rotation data of the joint point corresponding to the child node and the first rotation matrix corresponding to the parent node.
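A single application of formula (2) can be sketched as follows. The function name `child_position` and the numeric values are assumptions for illustration; matrices are plain nested lists so that the sketch has no dependencies.

```python
def child_position(r_parent, t_child, p_parent):
    """Formula (2): p_child = R_parent * T_child + p_parent.

    r_parent: 3x3 rotation matrix (list of rows)
    t_child:  displacement of the child relative to the parent (3-vector)
    p_parent: position of the parent in the third coordinate system (3-vector)
    """
    rotated = [sum(r_parent[r][c] * t_child[c] for c in range(3)) for r in range(3)]
    return [rotated[k] + p_parent[k] for k in range(3)]


# Hypothetical values: a quarter turn about the z axis and an offset along x.
quarter_turn_z = [[0.0, -1.0, 0.0], [1.0, 0.0, 0.0], [0.0, 0.0, 1.0]]
position = child_position(quarter_turn_z, [2.0, 0.0, 0.0], [1.0, 1.0, 1.0])
```

Here the offset along x is rotated onto the y axis before being added to the parent position, so `position` is `[1.0, 3.0, 1.0]`.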

Next, taking the index finger as an example, the following describes how to obtain the positions of the joint points of the index finger according to the model data of each joint point and the model data of its associated joint point.

1. Joint 4 of index finger root

The position of the joint point 4 in the third coordinate system can be obtained by formula (3):

$$p(4) = R(0)\,T(4) + p(0) \tag{3}$$

Wherein p (4) represents the position of the joint point 4 of the root of the index finger in the third coordinate system.

p (0) represents position information of the wrist joint point 0, that is, position (x, y, z) in the 6dof data corresponding to the wrist joint point.

R (0) represents the rotation matrix of the joint point 4 at the root of the index finger relative to the wrist joint point 0, that is, it corresponds to the first rotation matrix; T (4) represents the displacement of the joint point 4 at the root of the index finger relative to the wrist joint point 0. When a hand model of a specific three-dimensional posture is acquired, T (4) can be obtained from the model data and can be a constant.

2. Joint point 5 in the middle of the index finger

The position of the joint point 5 in the third coordinate system can be obtained by formula (4):

$$p(5) = R(4)\,T(5) + p(4) \tag{4}$$

where p (5) represents the position of the joint point 5 in the middle of the index finger in the third coordinate system.

p (4) represents position information of the joint point 4 of the root of the index finger in the third coordinate system.

R (4) represents the rotation matrix of the joint point 5 in the middle of the index finger relative to the joint point 4 at the root of the index finger, that is, it corresponds to the first rotation matrix; R (4) is the rotation matrix corresponding to the rotation data (2dof data) of the joint point 4 at the root of the index finger.

T (5) represents the displacement of the joint point 5 in the middle of the index finger relative to the joint point 4 in the root of the index finger, wherein T (5) can be obtained from model data and T (5) can be constant when a hand model of a specific three-dimensional pose is acquired.

3. Joint point 6 in the middle of index finger

The position of the joint point 6 in the third coordinate system can be obtained by formula (5):

$$p(6) = R(5)\,T(6) + p(5) \tag{5}$$

where p (6) represents the position of the joint point 6 in the middle of the index finger in the third coordinate system.

p (5) represents the position information of the joint point 5 in the middle of the index finger in the third coordinate system.

R (5) represents the rotation matrix of the joint point 6 in the middle of the index finger relative to the joint point 5 in the middle of the index finger, that is, it corresponds to the first rotation matrix; R (5) is the rotation matrix corresponding to the rotation data (1dof data) of the joint point 5 in the middle of the index finger.

T (6) represents the displacement of the joint point 6 in the middle of the index finger relative to the joint point 5 in the middle of the index finger, wherein T (6) can be obtained from model data and T (6) can be constant when a hand model of a specific three-dimensional pose is acquired.

4. Joint 7 of index finger tip

The position of the joint point 7 in the third coordinate system can be obtained by formula (6):

$$p(7) = R(6)\,T(7) + p(6) \tag{6}$$

where p (7) represents the position of the joint point 7 at the tip of the index finger in the third coordinate system.

p (6) represents the positional information of the joint point 6 in the middle of the index finger in the third coordinate system.

R (6) represents the rotation matrix of the joint point 7 at the tip of the index finger relative to the joint point 6 in the middle of the index finger, that is, it corresponds to the first rotation matrix; R (6) is the rotation matrix corresponding to the rotation data (1dof data) of the joint point 6 in the middle of the index finger.

T (7) represents the displacement of the joint point 7 of the tip of the index finger relative to the joint point 6 in the middle of the index finger, wherein T (7) may be obtained from model data and T (7) may be constant when a hand model of a specific three-dimensional pose is acquired.

In this way, the positions of all the joint points of the index finger in the third coordinate system can be obtained. The positions of the joint points of the other fingers in the third coordinate system are obtained in a similar manner, which is not described here again for brevity.
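As a worked illustration, and assuming every step along the index finger follows the pattern of formula (2) with the symbols R (k), T (k) and p (k) defined above, substituting each parent position into the next step expresses the fingertip position directly in terms of the wrist position:

```latex
\begin{aligned}
p(7) &= R(6)\,T(7) + p(6) \\
     &= R(6)\,T(7) + R(5)\,T(6) + p(5) \\
     &= R(6)\,T(7) + R(5)\,T(6) + R(4)\,T(5) + p(4) \\
     &= R(6)\,T(7) + R(5)\,T(6) + R(4)\,T(5) + R(0)\,T(4) + p(0)
\end{aligned}
```

Each term is one displacement rotated by the first rotation matrix of its parent, so a change in the rotation data of any joint point on the chain moves every joint point below it.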

It should be noted that the thumb has one fewer joint point than the index finger, but the calculation process is similar: the parent-child relationships between its joint points are substituted into formula (2).

Completing the above calculation for every joint point yields the position of each joint point of the hand in the third coordinate system when the hand is in the specific three-dimensional posture.
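The whole-hand calculation can be sketched as one pass over the joint points in index order. The sketch makes two assumptions that are not spelled out in the text: every finger root has the wrist as parent, and the first rotation matrix of a joint point is its parent's first rotation matrix multiplied by its own local rotation. All names and numeric values are hypothetical.

```python
import math

ROOT_JOINTS = (1, 4, 8, 12, 16)
IDENTITY = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]


def parent_of(index):
    """Parent joint index (assumed: finger roots hang off the wrist)."""
    return 0 if index in ROOT_JOINTS else index - 1


def mat_mul(a, b):
    return [[sum(a[r][k] * b[k][c] for k in range(3)) for c in range(3)] for r in range(3)]


def mat_vec(m, v):
    return [sum(m[r][c] * v[c] for c in range(3)) for r in range(3)]


def rot_x(angle):
    """Rotation about the x axis, used here as a stand-in for an inward bend."""
    c, s = math.cos(angle), math.sin(angle)
    return [[1.0, 0.0, 0.0], [0.0, c, -s], [0.0, s, c]]


def hand_positions(p_wrist, r_wrist, local_rot, offsets):
    """Positions of joints 0..19 in the model (third) coordinate system.

    local_rot: dict joint index -> 3x3 local rotation (missing entries = identity)
    offsets:   dict joint index -> displacement T(i) relative to the parent
    """
    pos = {0: [float(v) for v in p_wrist]}
    rot = {0: r_wrist}
    # A parent always has a smaller index than its child, so one ascending pass suffices.
    for i in range(1, 20):
        parent = parent_of(i)
        rotated = mat_vec(rot[parent], offsets[i])              # R_parent * T_child
        pos[i] = [rotated[k] + pos[parent][k] for k in range(3)]  # + p_parent
        # Assumed accumulation rule for the first rotation matrix of joint i.
        rot[i] = mat_mul(rot[parent], local_rot.get(i, IDENTITY))
    return pos


# Hypothetical model: unit-length bones along y, index root bent by 90 degrees.
offsets = {i: [0.0, 1.0, 0.0] for i in range(1, 20)}
flat = hand_positions([0, 0, 0], IDENTITY, {}, offsets)
bent = hand_positions([0, 0, 0], IDENTITY, {4: rot_x(math.pi / 2)}, offsets)
```

With the flat pose the index fingertip lies four units along y; bending the index root swings joint points 5 to 7 onto the z direction while the root itself stays in place.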

Then, according to the conversion relation between the third coordinate system and the first coordinate system, for example, when the conversion relation is represented by the conversion matrix, the position of the hand joint point in the first coordinate system can be obtained for each hand joint point according to the product of the position of the hand joint point in the third coordinate system and the conversion matrix.

Let the transformation matrix between the third coordinate system and the first coordinate system be $M_1$. If the position of the hand joint point in the third coordinate system is $p(i)$, the position of the hand joint point in the first coordinate system is $p'(i) = M_1\,p(i)$.

In some cases, the first coordinate system and the third coordinate system are aligned. In that case no conversion between the two coordinate systems is needed, and the position of the hand joint point in the third coordinate system is also its position in the first coordinate system.

S303, for each reference image, according to the positions of a plurality of nodes of the hand in a first coordinate system and the conversion relation between the first coordinate system and a corresponding second coordinate system, acquiring the positions of the plurality of nodes of the hand in the second coordinate system respectively, wherein the second coordinate system is a camera coordinate system of a view angle corresponding to the reference image.

The purpose of this step is to realize multi-view projection, that is, to map the positions of the joint points of the hand in the first coordinate system into the images corresponding to the respective view angles. This can be realized by a second rotation matrix, which represents the conversion relationship between the first coordinate system and the second coordinate system.

Illustratively, assume that the transformation matrix between the world coordinate system and the camera coordinate system corresponding to the j-th view angle is $K_j$. If the position of the hand joint point in the world coordinate system is $p'(i)$, the position of the hand joint point in the camera coordinate system corresponding to the j-th view angle is $p_j(i) = K_j\,p'(i)$.

It should be noted that, for the reference image of each view angle, the corresponding second rotation matrix is the conversion matrix $K_j$.
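The per-view conversion can be sketched as follows. The sketch assumes that each $K_j$ is stored as a 4x4 homogeneous matrix combining rotation and translation; the text only calls it a conversion matrix, so this representation, the function names, and the example matrices are assumptions.

```python
def transform_point(matrix4, point):
    """Apply a 4x4 homogeneous transform to a 3D point and return a 3D point."""
    homogeneous = [point[0], point[1], point[2], 1.0]
    return [sum(matrix4[r][c] * homogeneous[c] for c in range(4)) for r in range(3)]


def to_camera_views(point_world, view_matrices):
    """Return the position of one joint point in every camera coordinate system."""
    return [transform_point(k_j, point_world) for k_j in view_matrices]


# Two hypothetical views: one aligned with the world frame, one shifted 5 units along z.
aligned = [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0], [0.0, 0.0, 1.0, 0.0], [0.0, 0.0, 0.0, 1.0]]
shifted = [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0], [0.0, 0.0, 1.0, 5.0], [0.0, 0.0, 0.0, 1.0]]
views = to_camera_views([1.0, 2.0, 3.0], [aligned, shifted])
```

The same world position yields a different camera-space position for each view, which is what allows the later optimization to compare the hand against several images at once.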

S304, for each reference image, acquiring the first position of each joint point of the hand in the reference image according to the camera parameters corresponding to the visual angle and the positions of each joint point of the hand in the second coordinate system corresponding to the visual angle.

In this step, the camera parameters corresponding to the view angle include the camera intrinsic parameters, such as the focal length parameters of the camera and the center offset of the camera.

Assume that the camera intrinsic parameters corresponding to the j-th view angle are $(f_{xj}, f_{yj}, cu_j, cv_j)$, where $f_{xj}$ and $f_{yj}$ represent the focal length parameters of the camera corresponding to the j-th view angle, and $cu_j$ and $cv_j$ represent the center offset of that camera. The first position of each joint point of the hand in the reference image corresponding to the j-th view angle can then be obtained through the following formula (7).

Equation (7) is as follows:

For each view angle (i.e., each reference image), the first position of each joint point of the hand in that reference image can be obtained by performing the calculation according to the above formula (7).
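Formula (7) is not reproduced in this text. As an illustration only, the sketch below assumes the standard pinhole projection, in which the camera-space coordinates are divided by depth, scaled by the focal length parameters, and shifted by the center offset. The function name `project` and the numeric values are hypothetical, and the actual formula (7) may differ in convention.

```python
def project(point_cam, fx, fy, cu, cv):
    """Project a camera-space point to pixel coordinates (assumed pinhole model).

    point_cam: (x, y, z) in the camera coordinate system of one view
    fx, fy:    focal length parameters of that view
    cu, cv:    center offset of that view
    """
    x, y, z = point_cam
    if z <= 0:
        raise ValueError("point must lie in front of the camera (z > 0)")
    return (fx * x / z + cu, fy * y / z + cv)


# Hypothetical intrinsics for one view.
pixel = project((1.0, 2.0, 4.0), fx=100.0, fy=100.0, cu=320.0, cv=240.0)
```

With these values the point lands at pixel `(345.0, 290.0)`. Repeating the call for every joint point and every view gives the first positions used in the optimization.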

S305, acquiring second positions of hand joint points in a plurality of images to be processed respectively, wherein the images to be processed are in one-to-one correspondence with the reference images.

S306, for each hand joint point, performing least squares optimization according to the first positions of the hand joint point in the plurality of reference images and the second positions of the hand joint point in the plurality of images to be processed, and acquiring the adjusted second position corresponding to the hand joint point.

S307, mapping the adjusted second position corresponding to the hand joint point to a pre-established three-dimensional coordinate system, and obtaining the target three-dimensional posture of the hand.

Steps S305 to S307 in this embodiment are similar to steps S202 to S204 in the embodiment shown in fig. 2, respectively, and reference may be made to the foregoing detailed description in the embodiment shown in fig. 2, and for brevity, the description is omitted here.

It should be noted that, in the embodiment shown in fig. 4, the dof data of the hand is distributed among 15 joint points and amounts to 26 dof data in total. Therefore, with reference to formula (1) in step S203 of the embodiment shown in fig. 2, the θ at which the optimized residual term E is minimal may be these 26 dof data, corresponding to the distribution of dof data among the joint points in the embodiment shown in fig. 4.
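Formula (1) is defined in the embodiment of fig. 2 and is not reproduced in this section. As a rough illustration, the sketch below assumes that the residual term E is the sum, over all view angles and joint points, of the squared distance between the first position and the second position; the function name and data layout are assumptions.

```python
def residual_term(first_positions, second_positions):
    """Sum of squared 2D distances between projected and observed joint positions.

    first_positions[j][i]  = (u, v) of joint i projected into reference image j
    second_positions[j][i] = (u, v) of joint i detected in image to be processed j
    """
    total = 0.0
    for view_first, view_second in zip(first_positions, second_positions):
        for (u1, v1), (u2, v2) in zip(view_first, view_second):
            total += (u1 - u2) ** 2 + (v1 - v2) ** 2
    return total


# Two hypothetical views with two joint points each.
first = [[(0.0, 0.0), (1.0, 1.0)], [(2.0, 2.0), (3.0, 3.0)]]
second = [[(0.0, 1.0), (1.0, 1.0)], [(2.0, 2.0), (0.0, 3.0)]]
energy = residual_term(first, second)
```

Because the first positions depend on the dof data θ through the chain of formulas above, E is a function of θ, and the optimization searches for the θ that makes it smallest.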

Based on the 26 dof data after adjustment, the positions of the hand joint points in a preset three-dimensional coordinate system can be obtained by combining the child-parent relations among the hand joint points, and then the target three-dimensional gesture of the hand can be obtained.

In addition, as can be seen from the foregoing, the first coordinate system is a three-dimensional coordinate system established to unify the views of the multiple view angles, and thus the preset three-dimensional coordinate system in S307 may be the foregoing first coordinate system.

According to the method provided by this embodiment, the first positions of the hand joint points in the plurality of reference images and the second positions of the hand joint points in the plurality of images to be processed are obtained. The plurality of reference images comprise images obtained by mapping the hand joint points based on a plurality of view angles when the hand is in a specific three-dimensional posture; the plurality of images to be processed comprise images of the hand obtained at the plurality of view angles, and the images to be processed correspond to the reference images one by one. Least squares optimization is performed according to the first positions and the second positions corresponding to the hand joint points, and the second positions of the hand joint points in the plurality of images to be processed are adjusted. The target three-dimensional posture of the hand is then obtained based on the adjusted second positions of the hand joint points in the plurality of images to be processed. In this method, the dof data is optimized through multi-view observation, so that the states of the multiple views can be matched to the greatest extent and the accuracy of the obtained three-dimensional posture of the hand can be improved. In addition, because the hand posture is driven by the dof data, three-dimensional hand postures that violate the motion rules of the hand can be avoided, and the rendering complexity can be reduced.

Illustratively, the present disclosure provides a hand gesture acquisition device.

Fig. 5 is a schematic structural diagram of a hand gesture acquiring device according to an embodiment of the present disclosure.

Referring to fig. 5, the hand gesture acquiring apparatus 500 provided in this embodiment includes:

an acquiring module 501, configured to acquire first positions of hand joint points in a plurality of reference images, respectively; when the hand is in a specific three-dimensional posture, the plurality of reference images comprise images obtained by mapping the hand joint points respectively based on a plurality of view angles.

The acquiring module 501 is further configured to acquire second positions of the hand node in the plurality of images to be processed, respectively; the plurality of images to be processed comprise images obtained at the plurality of view angles for the hand, and reference images at the same view angle correspond to the images to be processed one by one.

A position adjustment module 502, configured to perform least squares optimization (LM) according to the first positions of the hand joint point in the plurality of reference images and the second positions of the hand joint point in the plurality of images to be processed, respectively, to obtain the adjusted second position corresponding to the hand joint point.

The three-dimensional gesture recognition module 503 is configured to map the adjusted second position corresponding to the hand joint point to a pre-established three-dimensional coordinate system, and obtain a target three-dimensional gesture of the hand.

As a possible implementation manner, the position adjustment module 502 is specifically configured to obtain a residual term according to the first positions of the hand joint point in the plurality of reference images and the second positions of the hand joint point in the plurality of images to be processed, respectively; the position of the hand joint point obtained when the residual term is determined to take its minimum value is the adjusted second position corresponding to the hand joint point.
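How a residual term can be driven toward its minimum is sketched below on a deliberately small toy problem. The sketch assumes that "LM" refers to a damped Gauss-Newton update in the Levenberg-Marquardt style, which this section does not state explicitly. It fits a single bend angle of a unit-length bone observed by two toy views, one seeing the x coordinate of the tip and the other the y coordinate. All names and values are hypothetical, and the real method optimizes many dof data at once.

```python
import math


def residuals(theta, observed):
    """Differences between the modeled tip (sin, cos) and the two observations."""
    return [math.sin(theta) - observed[0], math.cos(theta) - observed[1]]


def jacobian(theta):
    """Derivative of each residual with respect to theta."""
    return [math.cos(theta), -math.sin(theta)]


def cost(theta, observed):
    return sum(r * r for r in residuals(theta, observed))


def fit_angle(observed, theta=0.0, damping=1e-3, iterations=50):
    """Damped Gauss-Newton iteration for one parameter."""
    for _ in range(iterations):
        r = residuals(theta, observed)
        j = jacobian(theta)
        jtj = sum(a * a for a in j)
        jtr = sum(a * b for a, b in zip(j, r))
        step = -jtr / (jtj + damping)
        candidate = theta + step
        if cost(candidate, observed) < cost(theta, observed):
            theta = candidate      # accept the step and trust the model more
            damping *= 0.5
        else:
            damping *= 10.0        # reject the step and damp more strongly
    return theta


# Observations generated from a hypothetical true angle of 0.5 rad.
true_angle = 0.5
estimate = fit_angle((math.sin(true_angle), math.cos(true_angle)))
```

Starting from an angle of zero, the estimate moves to the angle whose projections best match both views, which mirrors how the adjusted positions are taken at the minimum of the residual term.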

As one possible implementation, the hand joint points include a plurality of joint points. The acquiring module 501 is specifically configured to: acquire the position of each of the joint points in a first coordinate system when the hand is in the specific three-dimensional posture, the first coordinate system being a world coordinate system established according to the plurality of view angles; for each reference image, acquire the position of the joint point in a second coordinate system according to the position of the joint point in the first coordinate system and the conversion relation between the first coordinate system and the second coordinate system, the second coordinate system being the camera coordinate system of the view angle corresponding to the reference image; and, for each view angle, acquire the first position of the joint point in the reference image corresponding to the view angle according to the position of the joint point in the second coordinate system corresponding to the view angle and the corresponding camera parameters.

As a possible implementation manner, the obtaining module 501 is specifically configured to obtain model data corresponding to each of the plurality of nodes in the hand model with the specific three-dimensional gesture; for each node, acquiring the position of the node in the first coordinate system according to the model data corresponding to the node and the model data corresponding to the associated node; the associated joint point is a father node corresponding to the joint point.

As a possible implementation manner, the acquiring module 501 is specifically configured to obtain the position of the joint point in a third coordinate system according to the displacement data of the joint point relative to the associated joint point, a first rotation matrix, and the position of the associated joint point in the third coordinate system, where the third coordinate system is a three-dimensional coordinate system established according to the hand model; and to acquire the position of the joint point in the first coordinate system according to the conversion relation between the third coordinate system and the first coordinate system.

As a possible implementation manner, the obtaining module 501 is specifically configured to multiply the displacement data of the joint point relative to the associated joint point by the first rotation matrix, and add a result obtained by the multiplication to the position of the associated joint point in the third coordinate system, to obtain the position of the joint point in the third coordinate system.

As a possible implementation manner, the acquiring module 501 is specifically configured to obtain the position of the joint point in the second coordinate system, i.e., the camera coordinate system of the corresponding view angle, according to the product of the position of the joint point in the first coordinate system and the second rotation matrix; the second rotation matrix is the transformation matrix between the first coordinate system and the second coordinate system.

The hand gesture obtaining device provided in this embodiment may be used to implement the technical solution of any of the foregoing method embodiments, and its implementation principle and technical effects are similar, and reference may be made to the detailed description of the foregoing method embodiments, which is omitted herein for brevity.

Fig. 6 is a schematic structural diagram of an electronic device according to an embodiment of the disclosure. Referring to fig. 6, an electronic device 600 provided in this embodiment includes: a memory 601 and a processor 602.

The memory 601 may be a separate physical unit and may be connected to the processor 602 through a bus 603. Alternatively, the memory 601 and the processor 602 may be integrated together and implemented in hardware, or the like.

The memory 601 is used for storing program instructions, which the processor 602 invokes to execute the technical solutions of any of the above method embodiments.

Alternatively, when some or all of the methods of the above embodiments are implemented in software, the electronic device 600 may include only the processor 602. The memory 601 for storing programs is located outside the electronic device 600, and the processor 602 is connected to the memory through a circuit/wire for reading and executing the programs stored in the memory.

The processor 602 may be a central processing unit (CPU), a network processor (NP), or a combination of a CPU and an NP.

The processor 602 may further include a hardware chip. The hardware chip may be an application-specific integrated circuit (ASIC), a programmable logic device (PLD), or a combination thereof. The PLD may be a complex programmable logic device (CPLD), a field-programmable gate array (FPGA), generic array logic (GAL), or any combination thereof.

The memory 601 may include a volatile memory, such as a random-access memory (RAM); the memory may also include a non-volatile memory, such as a flash memory, a hard disk drive (HDD) or a solid-state drive (SSD); the memory may also comprise a combination of the above types of memories.

The present disclosure also provides a readable storage medium comprising: computer program instructions; the computer program instructions, when executed by at least one processor of an electronic device, implement the hand gesture acquisition method shown in any of the method embodiments described above.

The present disclosure also provides a computer program product which, when executed by a computer, causes the computer to implement the hand gesture acquisition method shown in any of the above method embodiments.

It should be noted that in this document, relational terms such as "first" and "second" are used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Moreover, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a …" does not exclude the presence of other like elements in a process, method, article, or apparatus that comprises the element.

The foregoing is merely a specific embodiment of the disclosure to enable one skilled in the art to understand or practice the disclosure. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the disclosure. Thus, the present disclosure is not intended to be limited to the embodiments shown and described herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims

1. A hand gesture acquisition method, comprising:

2. The method according to claim 1, wherein the performing least squares optimization (LM) according to the first positions of the hand joint point in the plurality of reference images and the second positions of the hand joint point in the plurality of images to be processed, respectively, to obtain the optimized position corresponding to the hand joint point includes:

3. The method of claim 1 or 2, wherein the hand joint comprises a plurality of joints; the acquiring the first positions of the hand joint points in the plurality of reference images respectively includes:

4. A method according to claim 3, wherein said obtaining the position of each of said nodes in the first coordinate system when said particular three-dimensional pose is obtained comprises:

obtaining model data corresponding to the joint points in the hand model of the specific three-dimensional gesture;

5. The method of claim 4, wherein the obtaining the position of the joint point in the first coordinate system according to the model data corresponding to the joint point and the model data corresponding to the associated joint point comprises:

6. The method of claim 5, wherein the obtaining the position of the joint point in a third coordinate system based on the displacement data of the joint point relative to the associated joint point, the first rotation matrix, and the position of the associated joint point in the third coordinate system comprises:

7. A method according to claim 3, wherein the obtaining the position of the joint point in the second coordinate system according to the position of the joint point in the first coordinate system and the conversion relation between the first coordinate system and the second coordinate system comprises:

8. The method of claim 1, wherein the particular three-dimensional pose is predicted from a target three-dimensional pose of a previous hand or hands.

9. A hand gesture acquisition device, comprising:

10. An electronic device, comprising: a memory and a processor;

the memory is configured to store computer program instructions;

the processor is configured to execute the computer program instructions to cause the electronic device to implement the hand gesture acquisition method of any one of claims 1 to 8.

11. A readable storage medium, comprising: computer program instructions;

at least one processor of an electronic device executes the computer program instructions to cause the electronic device to implement the hand gesture acquisition method of any one of claims 1 to 8.