WO2021103648A1 - Hand key point detection method, gesture recognition method, and related apparatus - Google Patents

Hand key point detection method, gesture recognition method, and related apparatus

Info

Publication number
WO2021103648A1
WO2021103648A1 (application PCT/CN2020/107960)
Authority
WO
WIPO (PCT)
Prior art keywords
hand
vector
key points
heat map
key
Prior art date
Application number
PCT/CN2020/107960
Other languages
English (en)
French (fr)
Inventor
项伟 (XIANG, Wei)
王毅峰 (WANG, Yifeng)
Original Assignee
百果园技术(新加坡)有限公司 (Baiguoyuan Technology (Singapore) Co., Ltd.)
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Application filed by 百果园技术(新加坡)有限公司
Priority to EP20893388.7A (published as EP4068150A4)
Priority to US17/780,694 (published as US20230252670A1)
Publication of WO2021103648A1

Classifications

    • G — PHYSICS
    • G06 — COMPUTING; CALCULATING OR COUNTING
    • G06T — IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 7/00 — Image analysis
    • G06T 7/70 — Determining position or orientation of objects or cameras
    • G06T 7/73 — Determining position or orientation of objects or cameras using feature-based methods
    • G06V — IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 40/00 — Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V 40/10 — Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V 40/107 — Static hand or arm
    • G06V 40/113 — Recognition of static hand signs
    • G06V 40/11 — Hand-related biometrics; Hand pose recognition
    • G06V 40/20 — Movements or behaviour, e.g. gesture recognition
    • G06F — ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 — Pattern recognition
    • G06F 18/20 — Analysing
    • G06F 18/21 — Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F 18/214 — Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G06T 2207/00 — Indexing scheme for image analysis or image enhancement
    • G06T 2207/20 — Special algorithmic details
    • G06T 2207/20076 — Probabilistic image processing
    • G06T 2207/30 — Subject of image; Context of image processing
    • G06T 2207/30196 — Human being; Person

Definitions

  • The present disclosure relates to the field of computer vision technology, and in particular to a hand key point detection method, a hand key point detection device, a gesture recognition method, a gesture recognition device, equipment, and a storage medium.
  • In the field of computer vision, gesture recognition is widely used in scenarios such as human-computer interaction and sign language recognition, and it relies on the detection of hand key points. With the popularity of mobile terminals and the mobile Internet, gesture recognition is also widely used on mobile terminals.
  • Hand key points refer to multiple joint points in the hand.
  • The most commonly used method for hand key point detection in related technologies is to output the three-dimensional coordinates of the hand key points through a deep convolutional neural network: a network containing multiple convolutional layers and fully connected layers extracts the image features of a two-dimensional hand image, and the three-dimensional coordinates of the hand key points are then regressed through the fully connected layers. Such a network is complex and requires a large amount of calculation, while the computing power of a mobile terminal is limited.
  • The present disclosure provides a hand key point detection method, a hand key point detection device, a gesture recognition method, a gesture recognition device, equipment, and a storage medium, so as to solve the problem in related technologies that, when hand key point detection is applied to a mobile terminal, the long computation time and poor real-time performance limit the application of gesture recognition on mobile terminals.
  • A method for detecting key points of the hand is provided, including: acquiring a hand image to be detected; inputting the hand image into a pre-trained heat map model to obtain a heat map of the hand key points, the heat map including the two-dimensional coordinates of the hand key points; inputting the heat map and the hand image into a pre-trained three-dimensional information prediction model to obtain hand structured connection information; and determining the three-dimensional coordinates of the hand key points in the world coordinate system according to the hand structured connection information and the two-dimensional coordinates in the heat map.
  • A gesture recognition method is provided, including: acquiring a hand image to be recognized; detecting key points in the hand image; and recognizing the gesture expressed by the hand in the hand image based on the key points, wherein the detecting of the key points in the hand image is performed according to the hand key point detection method described in the present disclosure.
  • A device for detecting key points of the hand is provided, which includes:
  • a hand image acquisition module, configured to acquire the hand image to be detected;
  • a heat map acquisition module, configured to input the hand image into a pre-trained heat map model to obtain a heat map of the hand key points, the heat map including the two-dimensional coordinates of the hand key points;
  • a hand structured connection information acquisition module, configured to input the heat map and the hand image into a pre-trained three-dimensional information prediction model to obtain hand structured connection information; and
  • a three-dimensional coordinate calculation module, configured to determine the three-dimensional coordinates of the hand key points in the world coordinate system according to the hand structured connection information and the two-dimensional coordinates in the heat map.
  • A gesture recognition device is provided, including:
  • a hand image acquisition module, configured to acquire the hand image to be recognized;
  • a key point detection module, configured to detect key points in the hand image; and
  • a gesture recognition module, configured to recognize the gesture expressed by the hand in the hand image based on the key points;
  • wherein the key point detection module is configured to detect the key points in the hand image according to the hand key point detection device of the present disclosure.
  • A device is also provided, and the device includes:
  • one or more processors; and
  • a storage device, configured to store one or more programs;
  • wherein, when the one or more programs are executed by the one or more processors, the one or more processors implement the hand key point detection method and/or the gesture recognition method described in the present disclosure.
  • a computer-readable storage medium is also provided, on which a computer program is stored, and when the computer program is executed by a processor, it implements the hand key point detection method and/or gesture recognition method described in the present disclosure.
  • FIG. 1 is a flowchart of a method for detecting key points of a hand according to Embodiment 1 of the present invention
  • Fig. 2 is a schematic diagram of key points of the hand according to an embodiment of the present invention.
  • FIG. 3 is a flowchart of a method for detecting hand key points according to Embodiment 2 of the present invention.
  • FIG. 4 is a schematic diagram of the hand coordinate system and the world coordinate system in the embodiment of the present invention.
  • FIG. 5 is a flowchart of a gesture recognition method provided by Embodiment 3 of the present invention.
  • FIG. 6 is a schematic diagram of key points of the hand detected during gesture recognition in an embodiment of the present invention.
  • FIG. 7 is a schematic diagram of gestures expressed by the key points of the hand in FIG. 6;
  • FIG. 8 is a structural block diagram of a hand key point detection device provided by the fourth embodiment of the present invention.
  • FIG. 9 is a structural block diagram of a gesture recognition device provided by Embodiment 5 of the present invention.
  • FIG. 10 is a structural block diagram of a device according to Embodiment 6 of the present invention.
  • FIG. 1 is a flowchart of a method for detecting key points of a hand according to the first embodiment of the present invention.
  • the embodiment of the present invention is applicable to the case of detecting key points of a hand.
  • the method can be executed by a device for detecting key points of a hand.
  • the device can be implemented by software and/or hardware, and integrated in a device that executes the method.
  • the hand key point detection method of the embodiment of the present invention may include the following steps:
  • the hand image to be detected may be an image for which the three-dimensional coordinates of key points of the hand need to be detected.
  • The hand image may be stored in formats such as bitmap (BMP), Joint Photographic Experts Group (JPG/JPEG), Portable Network Graphics (PNG), or Tag Image File Format (TIF), and contains the physiological characteristics of the hand; in addition, the hand image may be a color image.
  • hand images can be acquired in the scene of a gesture recognition application.
  • The scene of the gesture recognition application can be human-computer interaction controlled by gestures (e.g., Virtual Reality (VR) applications), sign language recognition (e.g., live sign language teaching), and other scenes.
  • Hand images can be collected directly by an image acquisition device, or an image can first be collected and the hand image then detected and extracted from it. The embodiments of the present invention do not limit the scenes and methods for obtaining hand images.
  • the heat map may be an image of the area to which the key point of the hand belongs in a special highlight form.
  • The value associated with a position on the heat map is the probability that a hand key point is at that position; the greater the probability, the closer the position is to the center of the Gaussian kernel on the heat map. Therefore, the center of the Gaussian kernel is the position with the highest probability, that is, the position of the hand key point.
  • the heat map model can be pre-trained.
  • the heat map model can output the heat map of the key points of the hand.
  • the heat map model can be obtained from one or more neural networks.
  • A deep convolutional neural network can be used to train the heat map model. For example, for a hand image with known two-dimensional coordinates of the hand key points, first use those two-dimensional coordinates to generate a Gaussian kernel, which serves as the ground-truth Gaussian kernel of the heat map. Then input the hand image into the deep convolutional neural network to output a heat map, compute a loss between the Gaussian kernel in the output heat map and the previously generated Gaussian kernel, adjust the parameters of the network accordingly, and keep iterating until the loss is smaller than a preset value or a preset number of iterations is reached. The final deep convolutional neural network is the heat map model; after a hand image is input into it, the heat maps of the multiple hand key points can be obtained.
  • the center of the Gaussian kernel in the heat map is the location of the key points of the hand, that is, the coordinates of the center of the Gaussian kernel are the two-dimensional coordinates of the key points of the hand.
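  • The ground-truth Gaussian kernel described above can be sketched as follows; this is a minimal illustration, in which the `sigma` value and array sizes are assumptions for demonstration rather than values taken from the patent:

```python
import numpy as np

def gaussian_heatmap(size, center, sigma=2.0):
    """Render a 2D Gaussian kernel centered on a key point.

    size:   (H, W) of the heat map (64x64 in the embodiment below).
    center: (x, y) two-dimensional coordinates of the hand key point.
    sigma:  spread of the kernel (an illustrative choice).
    """
    h, w = size
    xs = np.arange(w)[None, :]   # column coordinates
    ys = np.arange(h)[:, None]   # row coordinates
    cx, cy = center
    # Probability peaks at the key point and decays with distance,
    # so the kernel center marks the key point position.
    return np.exp(-((xs - cx) ** 2 + (ys - cy) ** 2) / (2.0 * sigma ** 2))

heat = gaussian_heatmap((64, 64), (20, 30))
```

The pixel with the largest value in `heat` is exactly the key point location (here column 20, row 30), which is why the kernel center gives the two-dimensional coordinates.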
  • Figure 2 shows a schematic diagram of the key points of the hand.
  • the key points of the hand can include the key point O of the wrist and multiple key points (key points MCP, PIP, DIP and TIP) on each finger, as shown in Figure 2.
  • the key point of the wrist is point O
  • each finger includes four key points of MCP, PIP, DIP and TIP.
  • the key points of the wrist and the key points of multiple fingers constitute the key points of the hand.
  • the structured connection information of the hand may include the Euler angle of the hand and the joint bending angle formed by multiple key points of the hand.
  • The Euler angle of the hand may be the representation of the hand coordinate system relative to the world coordinate system, expressing the position and posture of the hand in three-dimensional space. The joint bending angles formed by the hand key points can include the angle a formed between the lines connecting the key points MCP of two adjacent fingers to the wrist key point O, as well as the angle b at the key point MCP, the angle c at the key point PIP, and the angle d at the key point DIP.
  • The Euler angles of the hand obtained by manual annotation, the hand joint angles obtained by sensors, and the heat maps of the multiple hand key points predicted by the heat map model can be used as training data to train a three-dimensional information prediction model, and the three-dimensional information prediction model may be one of a variety of neural networks.
  • the hand structured connection information including the Euler angle of the hand and the joint bending angle formed by multiple key points of the hand can be obtained.
  • S104 Determine the three-dimensional coordinates of the key points of the hand in the world coordinate system according to the structured connection information of the hands and the two-dimensional coordinates in the heat map.
  • After the hand structured connection information is obtained through the three-dimensional information prediction model, it can be used to determine the direction vector, in the hand coordinate system, of the vector formed by two adjacent hand key points; this direction vector is then converted into a direction vector in the world coordinate system through the Euler angles. Meanwhile, the two-dimensional coordinates of each hand key point are obtained from the heat map, so that the length of the vector formed by two hand key points can be calculated; once the length and direction vector of a vector are known, the vector itself is determined. The three-dimensional coordinates of the wrist key point in the world coordinate system can be obtained through the imaging principle of the hand image, and then, according to the principle of vector addition, the three-dimensional coordinates of every hand key point in the world coordinate system can be obtained.
  • In the embodiments of the present invention, the heat map containing the two-dimensional coordinates of the hand key points is first obtained through the heat map model, the hand structured connection information is then obtained through the three-dimensional information prediction model, and the three-dimensional coordinates of the hand key points are calculated from the two-dimensional coordinates and the hand structured connection information. Since two models are used to predict the two-dimensional coordinates and the hand structured connection information, each model has a simple structure and a small amount of calculation, which is suitable for mobile terminals with limited computing capability. Because of the simple model structure, the small amount of calculation, and the short detection time, real-time detection of hand key points on the mobile terminal is realized, which facilitates applying gesture recognition to mobile terminals.
  • FIG. 3 is a flowchart of a method for detecting key points of a hand according to Embodiment 2 of the present invention.
  • The embodiment of the present invention is described on the basis of the foregoing Embodiment 1. The hand key point detection method of the embodiment of the present invention may include the following steps:
  • the original image may be an image including a hand, for example, it may be an image including the entire human body or an image including an arm and a palm, and the original image may be an image collected by an image acquisition device.
  • the original image can be an image collected by a camera during a live broadcast, or an image extracted from multimedia video data.
  • the hand can be the part from the wrist to the end of the finger.
  • the hand can be detected from the original image by the hand detection algorithm.
  • the hand in the original image can be detected by the semantic segmentation network.
  • The hand may also be detected from the original image in other ways; the embodiment of the present invention does not limit the way of detecting the hand from the original image.
  • An image of a preset size including the hand is intercepted as the hand image to be detected: after the hand is detected, the image area containing it is used as the hand image. The image area can be an area of a preset size, for example a square, and the square area can be scaled to a size of 64×64, giving a three-dimensional tensor of 64×64×3, where 64×64 is the size of the hand image and 3 is the number of RGB channels of the two-dimensional image.
  • By intercepting an image of a preset size containing the hand from the original image as the hand image to be detected, the background included in the hand image is reduced, subsequent model processing focuses more on the characteristics of the hand itself, the amount of data to be processed is reduced, and the efficiency of hand key point detection can be improved.
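  • A minimal sketch of this cropping and scaling step, assuming a hand detector has already returned a square region (the box values and the nearest-neighbour resampling are illustrative assumptions, not details from the patent):

```python
import numpy as np

def crop_hand(image, box, out_size=64):
    """Crop a square region containing the hand and scale it to out_size.

    image: H x W x 3 RGB array (the original image).
    box:   (x0, y0, side) square region found by a hand detector
           (the detector itself is outside this sketch).
    """
    x0, y0, side = box
    patch = image[y0:y0 + side, x0:x0 + side]
    # Nearest-neighbour resampling keeps the sketch dependency-free;
    # a real pipeline would typically use bilinear interpolation.
    idx = np.arange(out_size) * side // out_size
    return patch[idx][:, idx]

img = np.zeros((480, 640, 3), dtype=np.uint8)   # placeholder original image
hand = crop_hand(img, (100, 200, 128))
# hand is the 64x64x3 tensor that is fed to the heat map model
```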
  • the heat map model of the embodiment of the present invention can be pre-trained.
  • the heat map model can output the heat map of the key points of the hand.
  • the heat map model can be obtained from one or more neural networks, for example, it can be trained using a deep convolutional neural network.
  • the heat map model After inputting a hand image, the heat map model can obtain the heat map of multiple key points of the hand.
  • the center of the Gaussian kernel in the heat map is the position of the key points of the hand.
  • the coordinates of the center are the two-dimensional coordinates of the key points of the hand.
  • After scaling, the hand image is a 64×64×3 three-dimensional tensor. The heat map model is essentially a deep neural network that extracts image features and finally outputs the heat maps of all hand key points.
  • the heat map model outputs 20 heat maps.
  • The size of each heat map is the same as that of the hand image, that is, the size of the heat map is also 64×64.
  • the three-dimensional information prediction model can be trained by knowing the Euler angles of the hands, the angles of the hand joints, and the heat maps of multiple key points of the hands predicted by the heat map model.
  • After the heat maps of the hand key points and the hand image are input, the model can output the Euler angles of the hand and the bending angles of the joints formed by the hand key points.
  • The heat map of the hand key points is the same size as the hand image.
  • The hand image is a 64×64×3 three-dimensional tensor.
  • The heat maps can be expressed as a 64×64×20 three-dimensional tensor.
  • The above two tensors are concatenated to form a 64×64×23 tensor, which is input into the trained three-dimensional information prediction model to obtain the joint bending angles formed by the multiple hand key points and the Euler angles of the hand.
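  • The channel-wise concatenation described above can be sketched as follows (the array contents are placeholders; only the shapes matter here):

```python
import numpy as np

# Hand image: 64x64x3; heat maps of the 20 hand key points: 64x64x20.
hand_image = np.zeros((64, 64, 3), dtype=np.float32)
heat_maps = np.zeros((64, 64, 20), dtype=np.float32)

# Connecting the two tensors along the channel axis gives the
# 64x64x23 input of the three-dimensional information prediction model.
model_input = np.concatenate([hand_image, heat_maps], axis=-1)
```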
  • S306 Calculate the first direction vector of the vector formed by the two key points of the hand in the hand coordinate system of the hand according to the bending angle of the joint.
  • a vector is a quantity with a magnitude and a direction.
  • any two hand key points can form a vector.
  • The magnitude of the vector is the distance between the two hand key points, and its direction is the direction of the line connecting them. For example, the vector B shown in Figure 2 is the vector formed from the wrist key point O to the key point MCP of the proximal phalanx of the little finger. Based on this, calculating the first direction vector, in the hand coordinate system, of the vector formed by two hand key points from the predicted joint bending angles can include the following steps:
  • The hand model is established as follows. Due to the limitation of the hand skeleton, the joints of each finger can only perform bending and stretching movements, so the key points MCP, PIP, DIP, and TIP of each finger are always coplanar. To simplify the problem, it is further assumed that the wrist key point O is coplanar with the key points of each finger, and that the wrist key point O and the key points MCP of all fingers are coplanar in three-dimensional space, this plane being parallel to the plane of the palm.
  • the direction from the wrist key point O to the middle finger key point MCP is the positive direction of the y-axis to establish the y-axis. It can be seen that the y-axis is located on the plane where the palm is located.
  • the direction of the thumb side is the positive direction of the x-axis to establish the x-axis; the direction perpendicular to the xy plane and the back of the hand is the positive direction of the z-axis to establish the z-axis.
  • the first direction vector of the vector C formed from the key point O of the wrist to the key point MCP of the middle finger can be obtained as (0, 1, 0).
  • The direction of a vector is independent of its length, and the direction vector of one vector can be obtained by rotating a vector whose direction vector is known by a certain angle. The vector formed from the wrist key point to the key point MCP of the proximal phalanx of each finger can therefore be obtained from the first direction vector of the vector from the wrist key point to the key point MCP of the proximal phalanx of the middle finger and the predicted joint bending angles. For example, the first direction vector of the vector D can be obtained from the vector C and the included angle α: rotating the direction vector of the vector C by the angle α yields the first direction vector of the vector D, that is, (sinα, cosα, 0).
  • The first direction vectors of the vectors formed from the wrist key point O to the key points MCP of the proximal phalanges of the other fingers can be obtained by rotating the adjacent vector by a certain angle, namely one of the joint bending angles predicted by the three-dimensional information prediction model, such as α and a in Figure 2. After the first direction vector from the wrist key point O to a finger's proximal phalanx key point MCP is known, the first direction vectors of the vectors between the two key points connected by each phalanx of that finger can be calculated from it together with the multiple joint bending angles predicted by the three-dimensional information prediction model.
  • For example, the first direction vector of the vector B from the wrist key point O to the key point MCP of the proximal phalanx of the little finger has been calculated in the aforementioned S3062, and the bending angles b, c, and d of the little finger joints MCP, PIP, and DIP can be obtained through the three-dimensional information prediction model. The angle between the vector E (formed by the key points MCP and PIP) and the vector B is b; the angle between the vector F (formed by the key points PIP and DIP) and the vector B is the sum of the angles b and c; and the angle between the vector G (formed by the key points DIP and TIP) and the vector B is the sum of the angles b, c, and d. After the angle between each vector and the vector B from the wrist key point O to the proximal phalanx key point MCP is known, rotating the vector B by the corresponding angle yields the first direction vector of each vector.
  • The above takes the little finger as an example to illustrate how the first direction vectors of the vectors formed by multiple key points on a finger are calculated; the calculation for the other fingers is the same and will not be repeated here.
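  • Assuming the flexion axis is the x axis of the hand coordinate system (the patent does not state the rotation axis explicitly), the rotation of the vector B by the accumulated angles b, b+c, and b+c+d can be sketched with Rodrigues' rotation formula; the angle values below are illustrative:

```python
import numpy as np

def rotate(v, axis, angle):
    """Rodrigues' formula: rotate vector v by angle (radians) about axis."""
    axis = axis / np.linalg.norm(axis)
    return (v * np.cos(angle)
            + np.cross(axis, v) * np.sin(angle)
            + axis * np.dot(axis, v) * (1.0 - np.cos(angle)))

# First direction vector of the vector B (wrist O -> little finger MCP),
# taken here as the y axis for simplicity.
B = np.array([0.0, 1.0, 0.0])
bend_axis = np.array([1.0, 0.0, 0.0])      # assumed flexion axis
b, c, d = np.radians([20.0, 30.0, 25.0])   # example joint bending angles

# Angles of E, F, G relative to B accumulate along the finger.
E = rotate(B, bend_axis, b)            # angle b to B
F = rotate(B, bend_axis, b + c)        # angle b + c to B
G = rotate(B, bend_axis, b + c + d)    # angle b + c + d to B
```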
  • The first direction vector of each vector is a direction vector in the hand coordinate system. Since the hand has a certain pose in space, the first direction vector of each vector needs to be converted into a direction vector in the world coordinate system, that is, the second direction vector. The Euler angles can be used to compute the Euler rotation matrix, and the product of the first direction vector and the Euler rotation matrix gives the second direction vector of that vector in the world coordinate system.
  • Fig. 4 is a schematic diagram of the hand coordinate system and the world coordinate system in the embodiment of the present invention.
  • the Euler angle can be represented by three angles ⁇ , ⁇ , ⁇ , and the coordinate system xyz is the hand coordinate system
  • XYZ is the world coordinate system
  • the angle between the x axis and the N axis is ⁇
  • the angle between the z axis and the Z axis is ⁇
  • the angle between the N axis and the X axis is ⁇
  • The N axis is the line of nodes, that is, the intersection of the xy plane and the XY plane.
  • When the hand coordinate system coincides with the world coordinate system, the state of the hand is the initial state.
  • the initial state of the hand after Euler angle rotation can get the current state of the hand in the three-dimensional space, that is, the pose of the hand in the world coordinate system.
  • the hand coordinate system rotates simultaneously with the rotation of the hand.
  • the coordinates of the key points of the hand in the hand coordinate system do not change, while the coordinates of the key points of the hand change in the world coordinate system.
  • The rotation process of the hand can be as follows: first rotate by the angle γ around the z axis, then by the angle β around the N axis, and finally by the angle α around the Z axis, to obtain the current state of the hand in the world coordinate system.
  • the Euler rotation matrix expresses the conversion relationship of the vector from the hand coordinate system to the world coordinate system, and the Euler rotation matrix is as follows:
  • the second direction vector of the vector in the world coordinate system can be obtained after the first direction vector is multiplied by the Euler rotation matrix.
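  • Assuming the Z-X-Z convention suggested by the description (γ about the z axis, β about the node line N, α about the Z axis — the patent does not spell out the matrix), the Euler rotation matrix and the conversion of a first direction vector can be sketched as follows; the angle values are illustrative:

```python
import numpy as np

def rz(a):
    """Rotation matrix about the z axis by angle a (radians)."""
    return np.array([[np.cos(a), -np.sin(a), 0.0],
                     [np.sin(a),  np.cos(a), 0.0],
                     [0.0,        0.0,       1.0]])

def rx(a):
    """Rotation matrix about the x axis by angle a (radians)."""
    return np.array([[1.0, 0.0,        0.0],
                     [0.0, np.cos(a), -np.sin(a)],
                     [0.0, np.sin(a),  np.cos(a)]])

def euler_rotation(alpha, beta, gamma):
    """Assumed Z-X-Z Euler rotation: gamma about z, beta about the
    node line N, alpha about Z (composed right-to-left)."""
    return rz(alpha) @ rx(beta) @ rz(gamma)

alpha, beta, gamma = np.radians([30.0, 45.0, 60.0])
R = euler_rotation(alpha, beta, gamma)

first_direction = np.array([0.0, 1.0, 0.0])   # a vector in the hand frame
second_direction = R @ first_direction        # the same vector in the world frame
```

Because R is a rotation matrix, multiplying by it changes only the direction of the vector, never its length, which is why lengths can be measured separately from the heat map.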
  • S308 Calculate the vector length of the vector by using the two-dimensional coordinates in the heat map.
  • The vector is composed of two hand key points. For each vector, the two hand key points constituting it can be determined, their two-dimensional coordinates can be determined based on their heat maps, and the two-dimensional coordinates of the two hand key points can then be used to calculate the length of the vector.
  • the heat map of each key point of the hand expresses the distribution of the position of the key point of the hand in the heat map, and each pixel on the heat map can be associated with a probability value. It expresses the probability that the key point of the hand is at each pixel.
  • Determining the two-dimensional coordinates of the two hand key points based on their heat maps includes: for each hand key point, determining, in the heat map of that key point, the pixel with the largest probability value; taking the coordinates of that pixel in the heat map as local two-dimensional coordinates; and converting the local two-dimensional coordinates into coordinates in the hand image to obtain the two-dimensional coordinates of the key point. Since the heat map is proportional to the hand image, multiplying the coordinates of the key point in the heat map by the scale factor gives its coordinates in the hand image, that is, the two-dimensional coordinates.
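  • The argmax-and-rescale step above can be sketched as follows (the scale factor is 1.0 here because, in the embodiment, the heat map and the hand image are both 64×64; the peak value is a fake placeholder):

```python
import numpy as np

def keypoint_from_heatmap(heat, scale=1.0):
    """Take the pixel with the largest probability value as the key point,
    then map it into hand-image coordinates with the scale factor."""
    row, col = np.unravel_index(np.argmax(heat), heat.shape)
    # Local (heat map) coordinates -> hand image coordinates.
    return col * scale, row * scale

heat = np.zeros((64, 64))
heat[30, 20] = 1.0                  # fake peak for illustration
x, y = keypoint_from_heatmap(heat)  # two-dimensional coordinates of the key point
```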
  • A vector is the product of its length and its direction vector: D = m·A, where m is the length of the vector and A is its direction vector.
  • the two-dimensional coordinates of all key points of the hand can be obtained according to the heat map.
  • Since a vector is the expression of a vector length and a direction vector, the product of the vector length and the second direction vector is calculated as the vector: D = m·A, where m is the length of the vector and A is the second direction vector.
  • S310 Use the vectors to calculate the three-dimensional coordinates, in the world coordinate system, of the two hand key points constituting each vector.
  • The three-dimensional coordinates of the wrist key point in the world coordinate system can be obtained, and together with the vectors they are used to calculate the three-dimensional coordinates, in the world coordinate system, of the two key points forming each vector. The three-dimensional coordinates of the wrist key point in the world coordinate system can be obtained from the hand image according to the imaging principle that near objects appear large and far objects appear small.
  • Suppose the coordinates of the wrist key point in the world coordinate system are O(X0, Y0, Z0),
  • and the vector from the wrist key point O to the key point MCP of a finger is D(X, Y, Z); then, of the two hand key points forming a vector, given the three-dimensional coordinates of one key point and the vector, the three-dimensional coordinates of the other key point can be obtained by vector addition.
  • The three-dimensional coordinates of each hand key point can be calculated sequentially according to the physiological connection order of the hand key points and the wrist key point. For example, for the little finger in FIG. 2, after the three-dimensional coordinates of the wrist key point O are obtained, since the vector B from the wrist key point O to the little-finger key point MCP has already been obtained, the three-dimensional coordinates of the little-finger key point MCP can be obtained by summing the three-dimensional coordinates of the wrist key point O and the vector B.
  • Since the vector E from the little-finger key point MCP to the little-finger key point PIP has also been obtained, the three-dimensional coordinates of the little-finger key point PIP can be calculated from the three-dimensional coordinates of the key point MCP and the vector E, and so on, until the three-dimensional coordinates of the little-finger key point TIP are calculated.
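The chain of vector additions described above, from the wrist out to the fingertip, can be sketched as follows (a minimal illustration; the example coordinates are made up, not taken from the patent):

```python
# Accumulate 3D joint positions along one finger by vector addition.
# Starting from the wrist position O, each bone vector (wrist->MCP,
# MCP->PIP, PIP->DIP, DIP->TIP) is added to the previous joint position.

def chain_positions(wrist, bone_vectors):
    """wrist: (X0, Y0, Z0); bone_vectors: list of (X, Y, Z), one per bone."""
    positions = [wrist]
    for v in bone_vectors:
        prev = positions[-1]
        positions.append((prev[0] + v[0], prev[1] + v[1], prev[2] + v[2]))
    return positions  # [O, MCP, PIP, DIP, TIP] when four bones are given
```

Each returned position depends only on the wrist coordinates and the predicted bone vectors, which is why obtaining the wrist's world coordinates plus one vector per phalanx fixes the whole finger.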
  • The embodiment of the present invention detects the hand in the acquired original image and crops out the hand image to be detected, then obtains the heat maps of the hand key points through the heat map model and the hand structured connection information, containing the joint bending angles and Euler angles, through the three-dimensional information prediction model.
  • The joint bending angles are used to calculate the first direction vectors, in the hand coordinate system, of the vectors formed by the hand key points, and the Euler angles are used to convert the first direction vectors into second direction vectors in the world coordinate system.
  • The two-dimensional coordinates of the hand key points are obtained from the heat maps to calculate the vector length of each vector; each vector is then determined from its vector length and second direction vector, after which the three-dimensional coordinates, in the world coordinate system, of the hand key points forming the vectors can be calculated.
  • Two models successively predict the two-dimensional coordinates and the hand structured connection information to calculate the three-dimensional coordinates of the hand key points. Compared with directly regressing the three-dimensional coordinates of the hand key points through a deep neural network, each model has a simple structure and a small amount of calculation.
  • FIG. 5 is a flowchart of a gesture recognition method provided by Embodiment 3 of the present invention.
  • The embodiment of the present invention is applicable to recognizing gestures from hand images.
  • The method can be executed by a gesture recognition device, which can be implemented by software and/or hardware and integrated in the device that executes the method.
  • the gesture recognition method of the embodiment of the present invention may include the following steps:
  • S501 Acquire a hand image to be recognized.
  • The hand image to be recognized may be an image from which a gesture needs to be recognized.
  • The hand image may be an image acquired in a gesture recognition application scenario.
  • Such scenarios include gesture-controlled human-computer interaction (e.g., VR control) and sign language recognition (e.g., sign language teaching).
  • In these scenarios, hand images can be collected by an image acquisition device, or images can be recognized to obtain hand images from them.
  • The embodiment of the present invention imposes no restrictions on the scenario or method of acquiring the hand image.
  • The hand image to be recognized can be input into the pre-trained heat map model to obtain heat maps of the hand key points.
  • The heat maps contain the two-dimensional coordinates of the hand key points; the heat maps and the hand image are input into the pre-trained three-dimensional information prediction model to obtain the hand structured connection information, and the three-dimensional coordinates of the hand key points in the world coordinate system are determined according to the hand structured connection information and the two-dimensional coordinates in the heat maps.
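The two-stage flow described above can be sketched as follows. `heatmap_model`, `info_model`, and `lift_to_3d` are placeholder callables standing in for the pre-trained networks and the geometric lifting step; their concrete architectures are not specified by this sketch:

```python
# Sketch of the two-stage pipeline: the heat map model yields per-keypoint
# heat maps (hence 2D coordinates), the 3D information prediction model
# yields the hand structured connection information, and a geometric step
# lifts both to world-frame 3D coordinates.

def detect_keypoints_3d(hand_image, heatmap_model, info_model, lift_to_3d):
    heatmaps = heatmap_model(hand_image)        # one heat map per key point
    info = info_model(heatmaps, hand_image)     # joint angles + Euler angles
    return lift_to_3d(heatmaps, info)           # world-frame 3D coordinates
```

The point of the split is that neither model regresses 3D coordinates directly; each stage stays small enough for a mobile terminal.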
  • Detecting the hand key points means determining the three-dimensional coordinates of multiple key points of the hand in space.
  • The key points can be detected by the hand key point detection method provided in the first or second embodiment of the present invention.
  • For details on obtaining the three-dimensional coordinates of the hand key points in three-dimensional space from the image, please refer to the first or second embodiment; the details are not repeated here.
  • S503: Recognize the gesture expressed by the hand in the hand image based on the key points.
  • A gesture is formed when the multiple key points of the fingers are located at different positions; different gestures can express different meanings, and gesture recognition identifies the gesture expressed by the three-dimensional coordinates of the multiple finger key points.
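One simple way to map keypoint positions to a gesture can be sketched as follows. This is an illustrative heuristic (counting extended fingers), not the recognition method of the patent, and the joint names follow the MCP/PIP/TIP convention used above:

```python
import math

def dist(p, q):
    """Euclidean distance between two 3D points."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def finger_extended(wrist, pip, tip):
    # Heuristic: a finger counts as extended when its fingertip (TIP) is
    # farther from the wrist than its PIP joint is.
    return dist(wrist, tip) > dist(wrist, pip)

def count_extended(wrist, fingers):
    """fingers: list of (pip, tip) 3D point pairs, one pair per finger."""
    return sum(finger_extended(wrist, pip, tip) for pip, tip in fingers)
```

A downstream classifier could then map the extended-finger pattern (or the full skeleton) to a gesture label.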
  • Fig. 6 is a schematic diagram of the key points of the hand detected during gesture recognition in an embodiment of the present invention.
  • The hand may include 21 key points; after the three-dimensional coordinates of the 21 key points are obtained, the gesture expressed by the hand in the hand image can be recognized based on them.
  • For example, a hand skeleton image is obtained by connecting the multiple hand key points, and a gesture can be obtained by recognizing the hand skeleton image.
  • Fig. 7 is a schematic diagram of the gesture expressed by the hand key points detected in Fig. 6.
  • After acquiring the image to be recognized, the gesture recognition method of the embodiment of the present invention
  • detects the hand key points by the key point detection method of the embodiment of the present invention, and recognizes, based on the key points, the gesture expressed by the hand in the hand image.
  • Hand key point detection uses two models to successively predict the two-dimensional coordinates and the hand structured connection information to calculate the three-dimensional coordinates of the hand key points, without directly regressing the three-dimensional coordinates of the hand key points through a deep neural network.
  • Each model has a simple structure and a small amount of data calculation, making it suitable for mobile terminals with limited computing power; since the detection time for the hand key points is short, real-time detection of hand key points on mobile terminals is realized, which benefits the application of gesture recognition on mobile terminals.
  • FIG. 8 is a structural block diagram of a hand key point detection device according to the fourth embodiment of the present invention.
  • The hand key point detection device in the embodiment of the present invention may include the following modules: a hand image acquisition module 801, configured to acquire a hand image to be detected; a heat map acquisition module 802, configured to input the hand image into a pre-trained heat map model to obtain heat maps of hand key points, the heat maps containing the two-dimensional coordinates of the hand key points; a hand structured connection information acquisition module 803, configured to input the heat maps and the hand image into a pre-trained three-dimensional information prediction model to obtain hand structured connection information; and a three-dimensional coordinate calculation module 804, configured to determine the three-dimensional coordinates of the hand key points in the world coordinate system according to the hand structured connection information and the two-dimensional coordinates in the heat maps.
  • the hand key point detection device provided in the embodiment of the present invention can execute the hand key point detection method provided in the embodiment of the present invention, and has the functional modules and effects corresponding to the execution method.
  • FIG. 9 is a structural block diagram of a gesture recognition device provided by Embodiment 5 of the present invention.
  • The gesture recognition device in this embodiment of the present invention may include the following modules: a hand image acquisition module 901, configured to acquire a hand image to be recognized; a key point detection module 902, configured to detect key points in the hand image; and a gesture recognition module 903, configured to recognize, based on the key points, gestures expressed by the hand in the hand image; the key points are detected by the hand key point detection device described in the fourth embodiment.
  • the gesture recognition device provided by the embodiment of the present invention can execute the gesture recognition method provided by the embodiment of the present invention, and has functional modules and effects corresponding to the execution method.
  • the device may include: a processor 1000, a memory 1001, a display screen 1002 with a touch function, an input device 1003, an output device 1004, and a communication device 1005.
  • the number of processors 1000 in the device may be one or more.
  • One processor 1000 is taken as an example in FIG. 10.
  • the number of memories 1001 in the device may be one or more.
  • One memory 1001 is taken as an example in FIG. 10.
  • the processor 1000, the memory 1001, the display screen 1002, the input device 1003, the output device 1004, and the communication device 1005 of the device may be connected by a bus or other means. In FIG. 10, the connection by a bus is taken as an example.
  • As a computer-readable storage medium, the memory 1001 can be configured to store software programs, computer-executable programs, and modules, such as the program instructions/modules corresponding to the hand key point detection method described in Embodiments 1 and 2 of the present invention (for example, the hand image acquisition module 801, the heat map acquisition module 802, the hand structured connection information acquisition module 803, and the three-dimensional coordinate calculation module 804 in the above hand key point detection device),
  • or the program instructions/modules corresponding to the gesture recognition method described in Embodiment 3 of the present invention (for example, the hand image acquisition module 901, the key point detection module 902, and the gesture recognition module 903 in the above gesture recognition device).
  • the processor 1000 executes multiple functional applications and data processing of the device by running software programs, instructions, and modules stored in the memory 1001, that is, realizing the aforementioned hand key point detection method and/or gesture recognition method.
  • When the processor 1000 executes one or more programs stored in the memory 1001, it implements the steps of the hand key point detection method and/or gesture recognition method provided in the embodiments of the present invention.
  • the embodiment of the present invention also provides a computer-readable storage medium.
  • When the instructions in the storage medium are executed by the processor of a device, the device can execute the hand key point detection method and/or gesture recognition method described in the above method embodiments.
  • the present disclosure can be implemented by software and necessary general-purpose hardware, or can be implemented by hardware.
  • The present disclosure can be embodied in the form of a software product.
  • The computer software product can be stored in a computer-readable storage medium, such as a floppy disk, read-only memory (ROM), random access memory (RAM), flash memory (FLASH), hard disk, or optical disk, and includes multiple instructions to make a computer device (which can be a robot, a personal computer, a server, or a network device) execute the hand key point detection method and/or gesture recognition method described in any embodiment of the present disclosure.
  • In the above devices, the included units and modules are only divided according to functional logic, but the division is not limited to the above as long as the corresponding functions can be realized;
  • in addition, the names of the multiple functional units are only used to distinguish them from each other and are not used to limit the protection scope of the present disclosure.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Human Computer Interaction (AREA)
  • Multimedia (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Data Mining & Analysis (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Psychiatry (AREA)
  • Social Psychology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • General Engineering & Computer Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • User Interface Of Digital Computer (AREA)
  • Image Analysis (AREA)

Abstract

A hand key point detection method, a gesture recognition method, and related devices. The hand key point detection method includes: acquiring a hand image to be detected (S101); inputting the hand image into a pre-trained heat map model to obtain heat maps of hand key points, the heat maps containing the two-dimensional coordinates of the hand key points (S102); inputting the heat maps and the hand image into a pre-trained three-dimensional information prediction model to obtain hand structured connection information (S103); and determining the three-dimensional coordinates of the hand key points in the world coordinate system according to the hand structured connection information and the two-dimensional coordinates in the heat maps (S104).

Description

Hand key point detection method, gesture recognition method, and related devices
This application claims priority to Chinese patent application No. 201911198688.8, filed with the China National Intellectual Property Administration on November 29, 2019, the entire contents of which are incorporated herein by reference.
Technical Field
The present disclosure relates to the technical field of computer vision, for example to a hand key point detection method, a hand key point detection device, a gesture recognition method, a gesture recognition device, a device, and a storage medium.
Background
In the field of computer vision, gesture recognition is widely used in scenarios such as human-computer interaction and sign language recognition, and gesture recognition relies on hand key point detection; with the popularity of mobile terminals and the mobile Internet, gesture recognition is also widely applied on mobile terminals.
Hand key points are the multiple joint points of the hand. In the related art, the most common hand key point detection method uses a deep convolutional neural network to output the three-dimensional coordinates of the hand key points. For example, after a deep convolutional neural network containing multiple convolutional layers and fully connected layers extracts image features from a two-dimensional hand image, the fully connected layers regress the three-dimensional coordinates of the hand key points. Such a deep convolutional neural network is complex and computationally expensive. Limited by the computing power of mobile terminals, this approach of directly regressing the three-dimensional coordinates of hand key points takes a long time when applied on a mobile terminal, making it difficult to detect hand key points in real time and restricting the application of gesture recognition on mobile terminals.
Summary
The present disclosure provides a hand key point detection method, a hand key point detection device, a gesture recognition method, a gesture recognition device, a device, and a storage medium, to solve the problems in the related art that hand key point detection applied on mobile terminals takes a long time, lacks real-time performance, and restricts the application of gesture recognition on mobile terminals.
A hand key point detection method is provided, including:
acquiring a hand image to be detected;
inputting the hand image into a pre-trained heat map model to obtain heat maps of hand key points, the heat maps containing the two-dimensional coordinates of the hand key points;
inputting the heat maps and the hand image into a pre-trained three-dimensional information prediction model to obtain hand structured connection information; and
determining the three-dimensional coordinates of the hand key points in the world coordinate system according to the hand structured connection information and the two-dimensional coordinates in the heat maps.
A gesture recognition method is also provided, including:
acquiring a hand image to be recognized;
detecting key points in the hand image; and
recognizing, based on the key points, the gesture expressed by the hand in the hand image;
wherein detecting the key points in the hand image includes: detecting the key points in the hand image according to the hand key point detection method of the present disclosure.
A hand key point detection device is also provided, including:
a hand image acquisition module, configured to acquire a hand image to be detected;
a heat map acquisition module, configured to input the hand image into a pre-trained heat map model to obtain heat maps of hand key points, the heat maps containing the two-dimensional coordinates of the hand key points;
a hand structured connection information acquisition module, configured to input the heat maps and the hand image into a pre-trained three-dimensional information prediction model to obtain hand structured connection information; and
a three-dimensional coordinate calculation module, configured to determine the three-dimensional coordinates of the hand key points in the world coordinate system according to the hand structured connection information and the two-dimensional coordinates in the heat maps.
A gesture recognition device is also provided, including:
a hand image acquisition module, configured to acquire a hand image to be recognized;
a key point detection module, configured to detect key points in the hand image; and
a gesture recognition module, configured to recognize, based on the key points, the gesture expressed by the hand in the hand image;
wherein the key point detection module is configured to detect the key points in the hand image according to the hand key point detection device of the present disclosure.
A device is also provided, the device including:
one or more processors; and
a storage apparatus, configured to store one or more programs,
wherein, when the one or more programs are executed by the one or more processors, the one or more processors implement the hand key point detection method and/or gesture recognition method of the present disclosure.
A computer-readable storage medium is also provided, storing a computer program which, when executed by a processor, implements the hand key point detection method and/or gesture recognition method of the present disclosure.
Brief Description of the Drawings
Fig. 1 is a flowchart of a hand key point detection method provided by Embodiment 1 of the present invention;
Fig. 2 is a schematic diagram of hand key points in an embodiment of the present invention;
Fig. 3 is a flowchart of a hand key point detection method provided by Embodiment 2 of the present invention;
Fig. 4 is a schematic diagram of the hand coordinate system and the world coordinate system in an embodiment of the present invention;
Fig. 5 is a flowchart of a gesture recognition method provided by Embodiment 3 of the present invention;
Fig. 6 is a schematic diagram of the hand key points detected during gesture recognition in an embodiment of the present invention;
Fig. 7 is a schematic diagram of the gesture expressed by the hand key points in Fig. 6;
Fig. 8 is a structural block diagram of a hand key point detection device provided by Embodiment 4 of the present invention;
Fig. 9 is a structural block diagram of a gesture recognition device provided by Embodiment 5 of the present invention;
Fig. 10 is a structural block diagram of a device provided by Embodiment 6 of the present invention.
Detailed Description
The present disclosure is described below with reference to the drawings and embodiments. For ease of description, the drawings show only some rather than all of the structures related to the present disclosure.
Embodiment 1
Fig. 1 is a flowchart of a hand key point detection method provided by Embodiment 1 of the present invention. This embodiment is applicable to detecting hand key points. The method can be executed by a hand key point detection device, which can be implemented by software and/or hardware and integrated in the device that executes the method. As shown in Fig. 1, the hand key point detection method of this embodiment may include the following steps:
S101: Acquire a hand image to be detected.
In this embodiment, the hand image to be detected may be an image from which the three-dimensional coordinates of the hand key points need to be detected. The hand image may be stored in a format such as bitmap (bmp), Joint Photographic Experts Group (jpg), Portable Network Graphics (png), or Tag Image File Format (tif), contains the physiological features of a hand, and may be a color image.
In practical applications, the hand image can be acquired in a gesture recognition application scenario, such as gesture-controlled human-computer interaction (a virtual reality (VR) application) or sign language recognition (live-streamed sign language teaching). In such scenarios, hand images can be collected by an image acquisition device, or images can be detected to obtain hand images from them; this embodiment imposes no restrictions on the scenario or method of acquiring the hand image.
S102: Input the hand image into a pre-trained heat map model to obtain heat maps of the hand key points, the heat maps containing the two-dimensional coordinates of the hand key points.
In this embodiment, a heat map may be an image that highlights the region to which a hand key point belongs. The value associated with a position on the heat map is the probability of the hand key point being at that position; the higher the probability, the closer the position is to the center of the Gaussian kernel on the heat map, so the center of the Gaussian kernel is the most probable position, i.e., the position of the hand key point.
The heat map model can be trained in advance to output heat maps of hand key points, and can be obtained from one or more kinds of neural networks. For example, a deep convolutional neural network can be used to train the heat map model: for a hand image with known two-dimensional coordinates of the hand key points, a Gaussian kernel is first generated from those two-dimensional coordinates, which serves as the Gaussian kernel of the target heat map. During training, the hand image is input into the deep convolutional neural network to output a heat map; a loss is computed between the Gaussian kernel of the output heat map and the previously generated Gaussian kernel, the parameters of the network are adjusted, and the network is iterated until the loss is below a preset value or a preset number of iterations is reached. The resulting deep convolutional neural network is the heat map model. After a hand image is input into the heat map model, heat maps of the multiple hand key points can be obtained; the center of the Gaussian kernel in each heat map is the position of the corresponding hand key point, i.e., the coordinates of the kernel center are the two-dimensional coordinates of the hand key point.
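The Gaussian-kernel training target described above can be sketched as follows (the grid size and the spread σ are illustrative choices, not values fixed by the patent):

```python
import math

def gaussian_heatmap(size, center, sigma=2.0):
    """Target heat map: a Gaussian kernel centred on a keypoint's 2D coords.

    size:   side length of the square heat map (e.g. 64)
    center: (cx, cy) two-dimensional coordinates of the key point
    """
    cx, cy = center
    return [[math.exp(-((x - cx) ** 2 + (y - cy) ** 2) / (2 * sigma ** 2))
             for x in range(size)]
            for y in range(size)]
```

During training, the network's output heat map is compared against such a target; at inference time the kernel's peak marks the key point.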
S103: Input the heat maps and the hand image into a pre-trained three-dimensional information prediction model to obtain hand structured connection information.
Fig. 2 is a schematic diagram of the hand key points. The hand key points may include a wrist key point O and multiple key points on each finger (key points MCP, PIP, DIP, and TIP). As shown in Fig. 2, the wrist key point is point O, each finger includes the four key points MCP, PIP, DIP, and TIP, and the wrist key point and the key points of the multiple fingers constitute the hand key points of the hand.
In this embodiment, the hand structured connection information may include the Euler angles of the hand and the joint bending angles formed by the multiple hand key points. The Euler angles of the hand may be a representation of the hand coordinate system relative to the world coordinate system, expressing the pose of the hand in three-dimensional space. The joint bending angles formed by the hand key points may include the angle a between the lines connecting the MCP key points of two adjacent fingers to the wrist key point O, as well as the angle b at the key point MCP, the angle c at the key point PIP, and the angle d at the key point DIP.
In this embodiment, the Euler angles of the hand labeled manually, the angles at the hand joints obtained by sensors, and the heat maps of the multiple hand key points predicted by the heat map model can first be used as training data to train the three-dimensional information prediction model, which can be any of various neural networks. After the heat maps and the hand image are input into the trained three-dimensional information prediction model, hand structured connection information including the Euler angles of the hand and the joint bending angles formed by the multiple hand key points can be obtained.
S104: Determine the three-dimensional coordinates of the hand key points in the world coordinate system according to the hand structured connection information and the two-dimensional coordinates in the heat maps.
After obtaining the hand structured connection information through the three-dimensional information prediction model, this embodiment can use it to determine the direction vector, in the hand coordinate system, of the vector formed by two adjacent hand key points, and then convert this direction vector into a direction vector in the world coordinate system using the Euler angles. Meanwhile, the two-dimensional coordinates of each hand key point can be obtained from the heat maps, from which the vector length of the vector formed by two hand key points can be calculated; knowing the vector length and the direction vector determines the vector. At the same time, the three-dimensional coordinates of the wrist key point in the world coordinate system can be obtained through the imaging principle of the hand image, and, according to the principle of vector addition, the three-dimensional coordinates of every hand key point in the world coordinate system can be obtained.
After acquiring the hand image to be detected, this embodiment first obtains heat maps containing the two-dimensional coordinates of the hand key points through the heat map model, then obtains the hand structured connection information through the three-dimensional information prediction model, and finally calculates the three-dimensional coordinates of the hand key points from the two-dimensional coordinates and the hand structured connection information. Compared with directly regressing the three-dimensional coordinates of the hand key points through a deep neural network, two models successively predict the two-dimensional coordinates and the hand structured connection information to calculate the three-dimensional coordinates; each model has a simple structure and a small amount of calculation, making it suitable for mobile terminals with limited computing power, and the short detection time realizes real-time hand key point detection on mobile terminals, which benefits the application of gesture recognition on mobile terminals.
Embodiment 2
Fig. 3 is a flowchart of a hand key point detection method provided by Embodiment 2 of the present invention, described on the basis of Embodiment 1. As shown in Fig. 3, the hand key point detection method of this embodiment may include the following steps:
S301: Acquire an original image.
In this embodiment, the original image may be an image that includes a hand, for example an image containing the whole human body or an image containing an arm and a palm. The original image may be an image collected by an image acquisition device, for example an image captured by a camera during live streaming, or an image extracted from multimedia video data.
S302: Detect the hand in the original image.
The hand may be the part from the wrist to the fingertips. In this embodiment the hand can be detected in the original image by a hand detection algorithm, for example by a semantic segmentation network; the hand can also be detected in other ways, and this embodiment imposes no restrictions on the method of detecting the hand in the original image.
S303: Crop an image of a preset size containing the hand as the hand image to be detected.
In practical applications, hand key point detection generally assumes that the hand in the original image lies within a hand detector, through which the hand key points are detected. An image region containing the hand can first be cropped from the original image as the hand image; this region may be of a preset size, for example a square region scaled to a size of 64×64, so that each hand image is a 64×64×3 three-dimensional tensor, where 64×64 is the size of the hand image and 3 is the number of RGB channels of the two-dimensional image.
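The crop-and-scale step can be sketched as follows (nearest-neighbour scaling over nested pixel lists; real implementations would typically use an image library, and the box format here is an assumption for illustration):

```python
def crop_and_resize(image, box, out_size=64):
    """Crop a square region and scale it to out_size x out_size.

    image: nested lists of pixels, indexed image[y][x]
    box:   (x0, y0, side) - top-left corner and side of the square region
    """
    x0, y0, side = box
    scale = side / out_size
    # Nearest-neighbour sampling of the cropped square.
    return [[image[y0 + int(j * scale)][x0 + int(i * scale)]
             for i in range(out_size)]
            for j in range(out_size)]
```

Cropping before inference is what lets the downstream models work on a small, hand-centred 64×64 input instead of the full frame.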
In this embodiment, an image of a preset size containing the hand is cropped from the original image as the hand image to be detected, which reduces the background contained in the hand image, makes subsequent model processing focus more on the features of the hand itself, reduces the amount of data to be processed, and can improve the efficiency of hand key point detection.
S304: Input the hand image into the pre-trained heat map model to obtain a heat map for every hand key point, each heat map having the same size as the hand image.
The heat map model of this embodiment can be trained in advance to output heat maps of hand key points and can be obtained from one or more kinds of neural networks, for example a deep convolutional neural network. After a hand image is input into the heat map model, heat maps of the multiple hand key points can be obtained; the center of the Gaussian kernel in each heat map is the position of the corresponding hand key point, and the coordinates of the kernel center are its two-dimensional coordinates.
In this embodiment, the scaled hand image may be a 64×64×3 three-dimensional tensor. After this tensor is input, the heat map model, which is in fact a deep neural network, extracts image features and finally outputs the heat maps of all hand key points. As shown in Fig. 2, there are 20 hand key points in total, so the heat map model outputs 20 heat maps, each of the same size as the hand image, i.e., 64×64.
S305: Input the heat maps of all hand key points and the hand image into the pre-trained three-dimensional information prediction model to obtain the joint bending angles formed by the hand key points and the Euler angles of the hand.
In this embodiment, the three-dimensional information prediction model can be trained with the known Euler angles of the hand, the known angles at the hand joints, and the heat maps of the multiple hand key points predicted by the heat map model; after the heat maps and the hand image are input, the model can output the Euler angles of the hand and the angles of the joint bends formed by the multiple hand key points.
In this embodiment, the heat maps of the hand key points and the hand image have the same size: the hand image is a 64×64×3 tensor and there are 20 hand key points, so the heat maps of all hand key points can be represented as a 64×64×20 tensor. The two tensors are concatenated into a 64×64×23 tensor and input into the trained three-dimensional information prediction model to obtain the joint bending angles formed by the multiple hand key points and the Euler angles of the hand.
S306: Calculate, according to the joint bending angles, the first direction vector of the vector formed by two hand key points in the hand coordinate system of the hand.
A vector is a quantity with magnitude and direction. In this embodiment, any two hand key points can form a vector whose magnitude is the distance between the two key points and whose direction is along the line connecting them; the vector B shown in Fig. 2 is the vector formed from the wrist key point O to the proximal phalanx key point MCP of the little finger. On this basis, calculating, according to the predicted joint bending angles, the first direction vector of the vector formed by two hand key points in the hand coordinate system may include the following steps:
S3061: Determine, based on a pre-established hand model, the first direction vector of the vector from the wrist key point to the proximal phalanx key point of the middle finger.
The hand model is established as follows: assume that the wrist key point O and the MCP key points of all fingers are coplanar in three-dimensional space, and that the wrist key point O and the five key points MCP, PIP, DIP, and TIP of each finger are coplanar and parallel to the plane of the palm. Due to the constraints of the hand skeleton, the joints of each finger can only bend and extend, so the multiple finger key points other than the wrist key point O are always coplanar; to simplify the problem, the wrist key point O is assumed to be coplanar with the key points of each finger as well.
Based on the above hand model, the hand coordinate system is established as follows (see Fig. 2):
the y-axis is established with the direction from the wrist key point O to the middle finger's key point MCP (the proximal phalanx key point) as the positive y direction, so the y-axis lies in the plane of the palm; in this plane, the x-axis is established perpendicular to the y-axis with the thumb side as the positive x direction; and the z-axis is established perpendicular to the x-y plane with the back of the hand as the positive z direction.
According to the hand coordinate system established above, the first direction vector of the vector C from the wrist key point O to the middle finger's key point MCP is (0, 1, 0).
S3062: Using the joint bending angles and the first direction vector of the vector from the wrist key point to the middle finger's proximal phalanx key point, calculate the first direction vector of the vector from the wrist key point to the proximal phalanx key point of each finger.
The direction of a vector is independent of its length. Once the direction vector of one vector is known, the direction vector of another vector can be obtained by rotating the known direction vector through a certain angle. For the vectors from the wrist key point to the proximal phalanx key point MCP of each finger, the direction can be obtained from the first direction vector of the wrist-to-middle-finger vector and the predicted joint bending angles.
As shown in Fig. 2, for the vector D formed from the wrist key point O to the ring finger's proximal phalanx key point MCP, the first direction vector of D can be obtained from the vector C and the included angle θ: rotating the direction vector of C by the angle θ gives the first direction vector of D, namely (sin θ, cos θ, 0).
Similarly, the first direction vectors of the vectors formed from the wrist key point O to the proximal phalanx key points MCP of the other fingers can all be obtained by rotating an adjacent vector through an included angle, where the angle is an angle at a joint bend predicted by the three-dimensional information prediction model, such as θ and a in Fig. 2. After step S3062, the first direction vectors of the vectors from the wrist key point O to the key points MCP of the little finger, ring finger, middle finger, index finger, and thumb are obtained.
S3063: For each finger, using the joint bending angles and the first direction vector of the vector from the wrist key point to that finger's proximal phalanx key point, calculate the first direction vector of the vector between the two key points connected by each phalanx of the finger.
As shown in Fig. 2, for each finger, after obtaining the first direction vector of the vector from the wrist key point O to the finger's proximal phalanx key point MCP, the first direction vectors of the vectors between the two key points connected by each phalanx of the finger can be calculated using that first direction vector and the multiple joint bending angles predicted by the three-dimensional information prediction model.
As shown in Fig. 2, taking the little finger as an example: the first direction vector of the vector B from the wrist key point O to the little finger's proximal phalanx key point MCP has been calculated in S3062, and the three-dimensional information prediction model gives the joint bending angles b, c, and d at the little finger's key points MCP, PIP, and DIP. In the little finger, the angle between the vector E formed by the key points MCP and PIP and the vector B is b; the angle between the vector F formed by the key points PIP and DIP and the vector B is the sum of the angles b and c; and the angle between the vector G formed by the key points DIP and TIP and the vector B is the sum of the angles b, c, and d. Once the angle between each vector and the vector from the wrist key point O to the proximal phalanx key point MCP is known, the first direction vector of each vector can be obtained by rotating the vector B formed from the wrist key point O to the finger's proximal phalanx key point MCP.
The above took the little finger as an example to explain the calculation of the first direction vectors of the vectors formed by the multiple key points on the little finger; the calculation for the other fingers is the same and is not detailed again here.
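Rotating a known direction vector in the palm (x-y) plane by a predicted joint angle, as in steps S3062-S3063, can be sketched as follows (angles in radians; the sample values are illustrative):

```python
import math

def rotate_in_plane(v, angle):
    """Rotate direction vector v = (x, y, z) by `angle` about the z-axis."""
    x, y, z = v
    return (x * math.cos(angle) - y * math.sin(angle),
            x * math.sin(angle) + y * math.cos(angle),
            z)

# The wrist->middle-finger MCP direction in the hand frame is (0, 1, 0);
# rotating it by -theta (toward the ring finger) gives (sin theta, cos theta, 0),
# matching the first direction vector of the vector D in the text.
```

The same rotation, applied with the accumulated angles b, b+c, b+c+d, yields the per-phalanx direction vectors of step S3063.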
S307: Convert the first direction vector into a second direction vector in the world coordinate system using the Euler angles.
In practical applications, the first direction vector of each vector is a direction vector in the hand coordinate system. Since the hand has a certain pose in space, the first direction vector of each vector needs to be converted into a direction vector in the world coordinate system, i.e., the second direction vector: an Euler rotation matrix can be computed from the Euler angles, and the product of the first direction vector and the Euler rotation matrix gives the second direction vector of the first direction vector in the world coordinate system.
Fig. 4 is a schematic diagram of the hand coordinate system and the world coordinate system in an embodiment of the present invention. As shown in Fig. 4, the Euler angles can be represented by three angles α, β, and γ; the coordinate system xyz is the hand coordinate system and XYZ is the world coordinate system. The angle between the x-axis and the N-axis is α, the angle between the z-axis and the Z-axis is β, and the angle between the N-axis and the X-axis is γ, where the N-axis is the position of the x-axis after rotation about the z-axis.
Assume that the palm is parallel to the x-y plane and that the line L from the middle finger joint to the wrist key point is perpendicular to the x-axis and parallel to the y-axis; this is the initial state of the hand. Rotating the hand in the initial state by the Euler angles gives the state of the hand in the current three-dimensional space, i.e., the pose of the hand in the world coordinate system. As the hand rotates, the hand coordinate system rotates with it: the coordinates of the hand key points in the hand coordinate system remain unchanged, while their coordinates in the world coordinate system change. The rotation process can be as follows: first rotate by the angle α about the z-axis, then by the angle β about the N-axis, and finally by the angle γ about the Y-axis, to obtain the current state of the hand in the world coordinate system.
In this embodiment, the Euler rotation matrix expresses the transformation of a vector from the hand coordinate system to the world coordinate system. The published equation appears only as an image; for the angle conventions above it corresponds to the classical z-x′-z″ Euler rotation matrix:
$$R(\alpha,\beta,\gamma)=\begin{pmatrix}\cos\alpha\cos\gamma-\sin\alpha\cos\beta\sin\gamma & -\cos\alpha\sin\gamma-\sin\alpha\cos\beta\cos\gamma & \sin\alpha\sin\beta\\ \sin\alpha\cos\gamma+\cos\alpha\cos\beta\sin\gamma & -\sin\alpha\sin\gamma+\cos\alpha\cos\beta\cos\gamma & -\cos\alpha\sin\beta\\ \sin\beta\sin\gamma & \sin\beta\cos\gamma & \cos\beta\end{pmatrix}$$
For the first direction vector of each vector, multiplying it by the above Euler rotation matrix gives the second direction vector of that vector in the world coordinate system.
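Under the assumption that the rotation follows the classical z-x′-z″ Euler sequence described above (the exact convention of the published matrix is not recoverable from the image), the conversion can be illustrated as:

```python
import math

def euler_rotation_matrix(alpha, beta, gamma):
    """R = Rz(alpha) * Rx(beta) * Rz(gamma): z-x'-z'' Euler rotation."""
    ca, sa = math.cos(alpha), math.sin(alpha)
    cb, sb = math.cos(beta), math.sin(beta)
    cg, sg = math.cos(gamma), math.sin(gamma)
    return [[ca * cg - sa * cb * sg, -ca * sg - sa * cb * cg,  sa * sb],
            [sa * cg + ca * cb * sg, -sa * sg + ca * cb * cg, -ca * sb],
            [sb * sg,                 sb * cg,                  cb]]

def mat_vec(R, v):
    # Second direction vector = Euler rotation matrix applied to the first.
    return tuple(sum(R[i][j] * v[j] for j in range(3)) for i in range(3))
```

With all three angles zero the matrix is the identity, i.e., the hand frame coincides with the world frame and the first and second direction vectors agree.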
S308: Calculate the vector length of the vector using the two-dimensional coordinates in the heat maps.
A vector is formed by two hand key points. For each vector, the two hand key points forming it can be determined, the two-dimensional coordinates of the two key points can be determined from their respective heat maps, and the length of the vector can then be calculated from the two sets of two-dimensional coordinates.
In this embodiment, the heat map of each hand key point expresses the distribution of the key point's position over the heat map; every pixel of the heat map can be associated with a probability value expressing the probability of the hand key point being at that pixel. Therefore, determining the two-dimensional coordinates of the two hand key points from their heat maps includes: for each hand key point, determining the pixel with the largest probability value on its heat map; obtaining the coordinates of that pixel in the heat map as local two-dimensional coordinates; and converting the local two-dimensional coordinates into coordinates in the hand image to obtain the two-dimensional coordinates of the key point. That is, the coordinates of the key point in the heat map (the position of the pixel with the largest probability value) are determined first; since the heat map and the hand image are proportional, multiplying the heat map coordinates by the scale factor gives the key point's coordinates in the hand image, i.e., its two-dimensional coordinates.
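Reading off a key point's local coordinates from its heat map and scaling them back to the hand image, as just described, can be sketched as:

```python
def keypoint_2d(heatmap, image_size):
    """heatmap: square nested list of probabilities; returns image coords.

    Finds the pixel with the largest probability (local 2D coordinates),
    then scales by the heat-map-to-image ratio.
    """
    h = len(heatmap)
    best_x, best_y, best_p = 0, 0, float("-inf")
    for y, row in enumerate(heatmap):
        for x, p in enumerate(row):
            if p > best_p:
                best_x, best_y, best_p = x, y, p
    scale = image_size / h            # heat map is proportional to the image
    return best_x * scale, best_y * scale
```

In this embodiment the heat map and hand image are both 64×64, so the scale factor is 1; the scaling is shown for the general proportional case.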
As shown in Fig. 2, a vector is represented by a vector length and a direction. Suppose the vector from the wrist key point to the ring finger's key point MCP is D = (X, Y, Z), and the second direction vector of D is known to be A = (x, y, z), so that D = m × A, where m is the vector length. Projecting the second direction vector A onto the x-y plane gives the direction vector B = (x, y, 0).
Meanwhile, the two-dimensional coordinates of all hand key points can be obtained from the heat maps; these coordinates are also the projections of the key points onto the x-y plane. Since the projection of the vector D onto the x-y plane is C = (X, Y, 0), i.e., C = m × B, and the coordinates X and Y in the projection C are exactly the two-dimensional coordinates obtained from the heat maps, B and C are known, so the vector length m can be solved; and with the direction vector A known, the vector D can be obtained.
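Recovering the vector length m from the relation C = m × B above, and with it the full vector D, can be sketched as follows (the example numbers are illustrative):

```python
import math

def recover_vector(proj_2d, direction):
    """proj_2d:   (X, Y) projection of the vector, from the heat maps
    direction: second direction vector A = (x, y, z) in the world frame
    Returns (m, D): the vector length and the full 3D vector D = m * A.
    """
    x, y, z = direction
    m = math.hypot(*proj_2d) / math.hypot(x, y)   # |C| / |B| gives the length
    return m, (m * x, m * y, m * z)
```

For instance, a unit direction (3/13, 4/13, 12/13) whose x-y projection from the heat maps is (3, 4) yields m = 13 and D = (3, 4, 12).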
S309: Calculate the product of the vector length and the second direction vector to obtain the vector.
That is, a vector is represented by a vector length and a direction vector, and the product of the vector length and the second direction vector is the vector, e.g., the vector D = m × A, where m is the vector length and A is the direction vector of the vector.
S310: Use the vector to calculate the three-dimensional coordinates, in the world coordinate system, of the two hand key points forming the vector.
In this embodiment, the three-dimensional coordinates of the wrist key point among the hand key points in the world coordinate system can be obtained, and, together with the vectors, used to calculate the three-dimensional coordinates, in the world coordinate system, of the two hand key points forming each vector.
The three-dimensional coordinates of the wrist key point in the world coordinate system can be obtained from the hand image, i.e., derived from the hand image according to the perspective imaging principle that nearer objects appear larger and farther objects appear smaller.
Suppose the coordinates of the wrist key point in the world coordinate system are O(X0, Y0, Z0), and the vector from the wrist key point O to a finger's key point MCP is known to be D(X, Y, Z); then the three-dimensional coordinates of the key point MCP are (X0, Y0, Z0) + (X, Y, Z) = (X0+X, Y0+Y, Z0+Z). That is, of the two hand key points forming a vector, given the three-dimensional coordinates of one key point and the vector, the three-dimensional coordinates of the other key point can be obtained by vector addition.
In this embodiment, for each finger, the three-dimensional coordinates of each hand key point can be calculated sequentially according to the physiological connection order of the key points on the finger and the wrist key point. For example, for the little finger in Fig. 2, after the three-dimensional coordinates of the wrist key point O are obtained, since the vector B from O to the little finger's key point MCP has already been obtained, the three-dimensional coordinates of the key point MCP can be obtained by summing the three-dimensional coordinates of O and the vector B; since the vector E from the little finger's key point MCP to its key point PIP has also been obtained, the three-dimensional coordinates of PIP can be calculated from those of MCP and the vector E, and so on until the three-dimensional coordinates of the little finger's key point TIP are calculated.
In this embodiment, the hand is detected in the acquired original image and the hand image to be detected is cropped out; the heat maps of the hand key points and the hand structured connection information containing the joint bending angles and Euler angles are obtained through the heat map model and the three-dimensional information prediction model respectively; the first direction vectors, in the hand coordinate system, of the vectors formed by the hand key points are calculated from the joint bending angles and converted into second direction vectors in the world coordinate system through the Euler angles; the two-dimensional coordinates of the multiple hand key points are obtained from the heat maps to calculate the vector lengths, and each vector is determined from its vector length and second direction vector, after which the three-dimensional coordinates, in the world coordinate system, of the hand key points forming the vectors can be calculated. Two models successively predict the two-dimensional coordinates and the hand structured connection information to calculate the three-dimensional coordinates of the hand key points; compared with directly regressing the three-dimensional coordinates through a deep neural network, each model has a simple structure and a small amount of calculation, making it suitable for mobile terminals with limited computing power, and the short detection time realizes real-time hand key point detection on mobile terminals, which benefits the application of gesture recognition on mobile terminals.
Embodiment 3
Fig. 5 is a flowchart of a gesture recognition method provided by Embodiment 3 of the present invention. This embodiment is applicable to recognizing gestures from hand images. The method can be executed by a gesture recognition device, which can be implemented by software and/or hardware and integrated in the device that executes the method. As shown in Fig. 5, the gesture recognition method of this embodiment may include the following steps:
S501: Acquire a hand image to be recognized.
In this embodiment, the hand image to be recognized may be an image from which a gesture needs to be recognized, acquired in a gesture recognition application scenario. Optionally, such scenarios include gesture-controlled human-computer interaction (VR control) and sign language recognition (sign language teaching); in these scenarios, hand images can be collected by an image acquisition device, or images can be recognized to obtain hand images from them, and this embodiment imposes no restrictions on the scenario or method of acquiring the hand image.
S502: Detect the key points in the hand image.
The hand image to be recognized can be input into the pre-trained heat map model to obtain heat maps of the hand key points containing their two-dimensional coordinates; the heat maps and the hand image are input into the pre-trained three-dimensional information prediction model to obtain the hand structured connection information; and the three-dimensional coordinates of the hand key points in the world coordinate system are determined according to the hand structured connection information and the two-dimensional coordinates in the heat maps.
In this embodiment, detecting the hand key points means determining the three-dimensional coordinates of the multiple key points of the hand in space, which can be done by the hand key point detection method provided in Embodiment 1 or Embodiment 2 of the present invention; refer to Embodiment 1 or Embodiment 2 for details, which are not repeated here.
S503: Recognize the gesture expressed by the hand in the hand image based on the key points.
A gesture is formed when the multiple key points of the fingers are located at different positions; different gestures can express different meanings, and gesture recognition identifies the gesture expressed by the three-dimensional coordinates of the multiple finger key points.
Fig. 6 is a schematic diagram of the hand key points detected during gesture recognition in an embodiment of the present invention. As shown in Fig. 6, the hand may include 21 key points. After the three-dimensional coordinates of the 21 key points are obtained, the gesture expressed by the hand in the hand image can be recognized based on them. In one example of the present invention, the multiple hand key points can be connected according to the skeletal structure of the hand and the gesture expressed by the hand recognized from the three-dimensional coordinates of the key points; for example, a hand skeleton image is obtained by connecting the multiple hand key points, and a gesture can be obtained by recognizing the hand skeleton image. Fig. 7 is a schematic diagram of the gesture expressed by the hand key points detected in Fig. 6.
After acquiring the image to be recognized, the gesture recognition method of this embodiment detects the hand key points by the hand key point detection method of the embodiment of the present invention and recognizes, based on the key points, the gesture expressed by the hand in the hand image. Since hand key point detection uses two models to successively predict the two-dimensional coordinates and the hand structured connection information to calculate the three-dimensional coordinates of the hand key points, without directly regressing the three-dimensional coordinates through a deep neural network, each model has a simple structure and a small amount of data calculation, making it suitable for mobile terminals with limited computing power; the short detection time realizes real-time hand key point detection on mobile terminals, which benefits the application of gesture recognition on mobile terminals.
Embodiment 4
Fig. 8 is a structural block diagram of a hand key point detection device provided by Embodiment 4 of the present invention. The hand key point detection device of this embodiment may include the following modules: a hand image acquisition module 801, configured to acquire a hand image to be detected; a heat map acquisition module 802, configured to input the hand image into a pre-trained heat map model to obtain heat maps of hand key points, the heat maps containing the two-dimensional coordinates of the hand key points; a hand structured connection information acquisition module 803, configured to input the heat maps and the hand image into a pre-trained three-dimensional information prediction model to obtain hand structured connection information; and a three-dimensional coordinate calculation module 804, configured to determine the three-dimensional coordinates of the hand key points in the world coordinate system according to the hand structured connection information and the two-dimensional coordinates in the heat maps.
The hand key point detection device provided in this embodiment can execute the hand key point detection method provided in the embodiments of the present invention and has the functional modules and effects corresponding to the method.
Embodiment 5
Fig. 9 is a structural block diagram of a gesture recognition device provided by Embodiment 5 of the present invention. The gesture recognition device of this embodiment may include the following modules: a hand image acquisition module 901, configured to acquire a hand image to be recognized; a key point detection module 902, configured to detect the key points in the hand image; and a gesture recognition module 903, configured to recognize, based on the key points, the gesture expressed by the hand in the hand image; the key points are detected by the hand key point detection device described in Embodiment 4.
The gesture recognition device provided in this embodiment can execute the gesture recognition method provided in the embodiments of the present invention and has the functional modules and effects corresponding to the method.
Embodiment 6
Referring to Fig. 10, a schematic structural diagram of a device in one example of the present invention is shown. As shown in Fig. 10, the device may include a processor 1000, a memory 1001, a display screen 1002 with a touch function, an input apparatus 1003, an output apparatus 1004, and a communication apparatus 1005. The number of processors 1000 in the device may be one or more, and one processor 1000 is taken as an example in Fig. 10; the number of memories 1001 may be one or more, and one memory 1001 is taken as an example in Fig. 10. The processor 1000, memory 1001, display screen 1002, input apparatus 1003, output apparatus 1004, and communication apparatus 1005 of the device may be connected by a bus or in other ways; connection by a bus is taken as an example in Fig. 10.
As a computer-readable storage medium, the memory 1001 can be configured to store software programs, computer-executable programs, and modules, such as the program instructions/modules corresponding to the hand key point detection method described in Embodiments 1 and 2 of the present invention (for example, the hand image acquisition module 801, the heat map acquisition module 802, the hand structured connection information acquisition module 803, and the three-dimensional coordinate calculation module 804 in the above hand key point detection device), or the program instructions/modules corresponding to the gesture recognition method described in Embodiment 3 of the present invention (for example, the hand image acquisition module 901, the key point detection module 902, and the gesture recognition module 903 in the above gesture recognition device).
By running the software programs, instructions, and modules stored in the memory 1001, the processor 1000 executes the multiple functional applications and data processing of the device, i.e., implements the above hand key point detection method and/or gesture recognition method.
In the embodiment, when the processor 1000 executes one or more programs stored in the memory 1001, the steps of the hand key point detection method and/or gesture recognition method provided in the embodiments of the present invention are implemented.
The embodiment of the present invention also provides a computer-readable storage medium; when the instructions in the storage medium are executed by the processor of a device, the device can execute the hand key point detection method and/or gesture recognition method described in the above method embodiments.
As the device and storage medium embodiments are substantially similar to the method embodiments, their description is relatively simple; for relevant points, refer to the description of the method embodiments.
From the above description of the embodiments, those skilled in the art can clearly understand that the present disclosure can be implemented by software plus the necessary general-purpose hardware, or by hardware. The present disclosure can be embodied in the form of a software product; the computer software product can be stored in a computer-readable storage medium, such as a floppy disk, read-only memory (ROM), random access memory (RAM), flash memory (FLASH), hard disk, or optical disk, and includes multiple instructions to make a computer device (which can be a robot, a personal computer, a server, or a network device) execute the hand key point detection method and/or gesture recognition method described in any embodiment of the present disclosure.
In the above hand key point detection device and/or gesture recognition device, the included units and modules are only divided according to functional logic, but the division is not limited to the above as long as the corresponding functions can be realized; in addition, the names of the multiple functional units are only used to distinguish them from each other and are not used to limit the protection scope of the present disclosure.

Claims (15)

  1. A method for detecting hand key points, comprising:
    acquiring a hand image to be detected;
    inputting the hand image into a pre-trained heat map model to obtain heat maps of hand key points, the heat maps containing two-dimensional coordinates of the hand key points;
    inputting the heat maps and the hand image into a pre-trained three-dimensional information prediction model to obtain hand structured connection information; and
    determining three-dimensional coordinates of the hand key points in a world coordinate system according to the hand structured connection information and the two-dimensional coordinates in the heat maps.
  2. The detection method according to claim 1, wherein acquiring the hand image to be detected comprises:
    acquiring an original image;
    detecting a hand in the original image; and
    cropping an image of a preset size containing the hand as the hand image to be detected.
  3. The detection method according to claim 1, wherein inputting the hand image into the pre-trained heat map model to obtain the heat maps of the hand key points comprises:
    inputting the hand image into the pre-trained heat map model to obtain a heat map of every hand key point, wherein the size of the heat map of every hand key point is the same as the size of the hand image.
  4. The detection method according to claim 1, wherein the heat maps comprise a heat map of every hand key point, and inputting the heat maps and the hand image into the pre-trained three-dimensional information prediction model to obtain the hand structured connection information comprises:
    inputting the heat maps of all hand key points and the hand image into the pre-trained three-dimensional information prediction model to obtain joint bending angles formed by the hand key points and Euler angles of the hand.
  5. The detection method according to any one of claims 1-4, wherein the hand structured connection information comprises joint bending angles formed by the hand key points and Euler angles of the hand, and determining the three-dimensional coordinates of the hand key points in the world coordinate system according to the hand structured connection information and the two-dimensional coordinates in the heat maps comprises:
    calculating, according to the joint bending angles, a first direction vector of a vector formed by two hand key points in a hand coordinate system of the hand;
    converting the first direction vector into a second direction vector in the world coordinate system using the Euler angles;
    calculating a vector length of the vector using the two-dimensional coordinates in the heat maps;
    calculating a product of the vector length and the second direction vector to obtain the vector; and
    calculating, using the vector, the three-dimensional coordinates, in the world coordinate system, of the two hand key points forming the vector.
  6. The detection method according to claim 5, wherein calculating, according to the joint bending angles, the first direction vector of the vector formed between two key points in the hand coordinate system of the hand comprises:
    determining, based on a pre-established hand model, a first direction vector of a vector from a wrist key point to a proximal phalanx key point of a middle finger;
    calculating, using the joint bending angles and the first direction vector of the vector from the wrist key point to the proximal phalanx key point of the middle finger, a first direction vector of a vector from the wrist key point to a proximal phalanx key point of each finger; and
    calculating, using the joint bending angles and the first direction vector of the vector from the wrist key point to the proximal phalanx key point of each finger, a first direction vector of a vector between two key points connected by each phalanx of each finger.
  7. The detection method according to claim 5, wherein converting the first direction vector into the second direction vector in the world coordinate system using the Euler angles comprises:
    calculating an Euler rotation matrix using the Euler angles; and
    calculating a product of the first direction vector and the Euler rotation matrix to obtain the second direction vector of the first direction vector in the world coordinate system.
  8. The detection method according to claim 5, wherein calculating the vector length of the vector using the two-dimensional coordinates in the heat maps comprises:
    determining two hand key points forming the vector;
    determining two-dimensional coordinates of the two hand key points based on the heat maps of the two hand key points respectively; and
    calculating the length of the vector using the two-dimensional coordinates of the two hand key points.
  9. The detection method according to claim 8, wherein each pixel of the heat map is associated with a probability value expressing a probability of the hand key point at that pixel, and determining the two-dimensional coordinates of the two hand key points based on the heat maps of the two hand key points comprises:
    determining, on the heat map of each hand key point, a pixel with a largest probability value;
    obtaining coordinates of the pixel with the largest probability value in the heat map to obtain local two-dimensional coordinates; and
    converting the local two-dimensional coordinates into coordinates in the hand image to obtain the two-dimensional coordinates of each hand key point.
  10. The detection method according to claim 5, wherein calculating, using the vector, the three-dimensional coordinates of the two hand key points forming the vector in the world coordinate system comprises:
    acquiring three-dimensional coordinates, in the world coordinate system, of a wrist key point among the hand key points; and
    calculating, using the three-dimensional coordinates of the wrist key point in the world coordinate system and the vector, the three-dimensional coordinates, in the world coordinate system, of the two hand key points forming the vector.
  11. A gesture recognition method, comprising:
    acquiring a hand image to be recognized;
    detecting key points in the hand image; and
    recognizing, based on the key points, a gesture expressed by a hand in the hand image;
    wherein detecting the key points in the hand image comprises: detecting the key points in the hand image according to the hand key point detection method of any one of claims 1-10.
  12. A device for detecting hand key points, comprising:
    a hand image acquisition module, configured to acquire a hand image to be detected;
    a heat map acquisition module, configured to input the hand image into a pre-trained heat map model to obtain heat maps of hand key points, the heat maps containing two-dimensional coordinates of the hand key points;
    a hand structured connection information acquisition module, configured to input the heat maps and the hand image into a pre-trained three-dimensional information prediction model to obtain hand structured connection information; and
    a three-dimensional coordinate calculation module, configured to determine three-dimensional coordinates of the hand key points in a world coordinate system according to the hand structured connection information and the two-dimensional coordinates in the heat maps.
  13. A gesture recognition device, comprising:
    a hand image acquisition module, configured to acquire a hand image to be recognized;
    a key point detection module, configured to detect key points in the hand image; and
    a gesture recognition module, configured to recognize, based on the key points, a gesture expressed by a hand in the hand image;
    wherein the key point detection module is configured to detect the key points in the hand image according to the hand key point detection device of claim 12.
  14. A device, comprising:
    at least one processor; and
    a storage apparatus, configured to store at least one program,
    wherein, when the at least one program is executed by the at least one processor, the at least one processor implements at least one of the hand key point detection method of any one of claims 1-10 and the gesture recognition method of claim 11.
  15. A computer-readable storage medium storing a computer program which, when executed by a processor, implements at least one of the hand key point detection method of any one of claims 1-10 and the gesture recognition method of claim 11.
PCT/CN2020/107960 2019-11-29 2020-08-07 手部关键点检测方法、手势识别方法及相关装置 WO2021103648A1 (zh)

Priority Applications (2)

Application Number Priority Date Filing Date Title
EP20893388.7A EP4068150A4 (en) 2019-11-29 2020-08-07 HANDKEY POINT DETECTION METHODS, GESTURE RECOGNITION METHODS AND RELATED DEVICES
US17/780,694 US20230252670A1 (en) 2019-11-29 2020-08-07 Method for detecting hand key points, method for recognizing gesture, and related devices

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201911198688.8A CN110991319B (zh) 2019-11-29 2019-11-29 手部关键点检测方法、手势识别方法及相关装置
CN201911198688.8 2019-11-29

Publications (1)

Publication Number Publication Date
WO2021103648A1 true WO2021103648A1 (zh) 2021-06-03

Family

ID=70088256

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/107960 WO2021103648A1 (zh) 2019-11-29 2020-08-07 手部关键点检测方法、手势识别方法及相关装置

Country Status (4)

Country Link
US (1) US20230252670A1 (zh)
EP (1) EP4068150A4 (zh)
CN (1) CN110991319B (zh)
WO (1) WO2021103648A1 (zh)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113723187A (zh) * 2021-07-27 2021-11-30 武汉光庭信息技术股份有限公司 手势关键点的半自动标注方法及系统
CN115471874A (zh) * 2022-10-28 2022-12-13 山东新众通信息科技有限公司 基于监控视频的施工现场危险行为识别方法
CN116309591A (zh) * 2023-05-19 2023-06-23 杭州健培科技有限公司 一种医学影像3d关键点检测方法、模型训练方法及装置

Families Citing this family (27)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110991319B (zh) * 2019-11-29 2021-10-19 广州市百果园信息技术有限公司 手部关键点检测方法、手势识别方法及相关装置
CN111401318B (zh) * 2020-04-14 2022-10-04 支付宝(杭州)信息技术有限公司 动作识别方法及装置
CN111507354B (zh) * 2020-04-17 2023-12-12 北京百度网讯科技有限公司 信息抽取方法、装置、设备以及存储介质
CN113553884B (zh) * 2020-04-26 2023-04-18 武汉Tcl集团工业研究院有限公司 手势识别方法、终端设备及计算机可读存储介质
CN111753669A (zh) * 2020-05-29 2020-10-09 广州幻境科技有限公司 基于图卷积网络的手部数据识别方法、系统和存储介质
CN111695484A (zh) * 2020-06-08 2020-09-22 深兰人工智能芯片研究院(江苏)有限公司 一种用于手势姿态分类的方法
CN111783626B (zh) * 2020-06-29 2024-03-26 北京字节跳动网络技术有限公司 图像识别方法、装置、电子设备及存储介质
CN111832468A (zh) * 2020-07-09 2020-10-27 平安科技(深圳)有限公司 基于生物识别的手势识别方法、装置、计算机设备及介质
CN111882531B (zh) * 2020-07-15 2021-08-17 中国科学技术大学 髋关节超声图像自动分析方法
CN111985556A (zh) * 2020-08-19 2020-11-24 南京地平线机器人技术有限公司 关键点识别模型的生成方法和关键点识别方法
CN111998822B (zh) * 2020-10-29 2021-01-15 江西明天高科技股份有限公司 一种空间角度姿态计算方法
CN112270669B (zh) 2020-11-09 2024-03-01 北京百度网讯科技有限公司 人体3d关键点检测方法、模型训练方法及相关装置
CN112527113A (zh) * 2020-12-09 2021-03-19 北京地平线信息技术有限公司 手势识别及手势识别网络的训练方法和装置、介质和设备
CN112509123A (zh) * 2020-12-09 2021-03-16 北京达佳互联信息技术有限公司 三维重建方法、装置、电子设备及存储介质
US20220189195A1 (en) * 2020-12-15 2022-06-16 Digitrack Llc Methods and apparatus for automatic hand pose estimation using machine learning
KR20220086971A (ko) * 2020-12-17 2022-06-24 삼성전자주식회사 손 관절을 추적하는 방법 및 장치
CN112613409A (zh) * 2020-12-24 2021-04-06 咪咕动漫有限公司 手部关键点检测方法、装置、网络设备及存储介质
CN112784810A (zh) * 2021-02-08 2021-05-11 风变科技(深圳)有限公司 手势识别方法、装置、计算机设备和存储介质
CN112906594B (zh) * 2021-03-03 2022-06-03 杭州海康威视数字技术股份有限公司 一种布防区域生成方法、装置、设备及存储介质
CN113326751B (zh) * 2021-05-19 2024-02-13 中国科学院上海微系统与信息技术研究所 一种手部3d关键点的标注方法
CN113421182B (zh) * 2021-05-20 2023-11-28 北京达佳互联信息技术有限公司 三维重建方法、装置、电子设备及存储介质
CN113384291A (zh) * 2021-06-11 2021-09-14 北京华医共享医疗科技有限公司 一种医用超声检测方法及系统
CN113724393B (zh) * 2021-08-12 2024-03-19 北京达佳互联信息技术有限公司 三维重建方法、装置、设备及存储介质
CN114035687B (zh) * 2021-11-12 2023-07-25 郑州大学 一种基于虚拟现实的手势识别方法和系统
CN114066986B (zh) * 2022-01-11 2022-04-19 南昌虚拟现实研究院股份有限公司 三维坐标的确定方法、装置、电子设备及存储介质
CN116360603A (zh) * 2023-05-29 2023-06-30 中数元宇数字科技(上海)有限公司 基于时序信号匹配的交互方法、设备、介质及程序产品
CN117322872A (zh) * 2023-10-26 2024-01-02 北京软体机器人科技股份有限公司 运动捕捉方法及装置

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102298649A (zh) * 2011-10-09 2011-12-28 南京大学 Spatial trajectory retrieval method for human motion data
US20120068917A1 (en) * 2010-09-17 2012-03-22 Sony Corporation System and method for dynamic gesture recognition using geometric classification
CN109214282A (zh) * 2018-08-01 2019-01-15 中南民族大学 Neural-network-based three-dimensional gesture key point detection method and system
CN109858524A (zh) * 2019-01-04 2019-06-07 北京达佳互联信息技术有限公司 Gesture recognition method and apparatus, electronic device, and storage medium
CN110147767A (zh) * 2019-05-22 2019-08-20 深圳市凌云视迅科技有限责任公司 Three-dimensional hand pose prediction method based on two-dimensional images
CN110991319A (zh) * 2019-11-29 2020-04-10 广州市百果园信息技术有限公司 Hand key point detection method, gesture recognition method, and related apparatus

Family Cites Families (15)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6204852B1 (en) * 1998-12-09 2001-03-20 Lucent Technologies Inc. Video hand image three-dimensional computer interface
US10338387B2 (en) * 2014-12-15 2019-07-02 Autodesk, Inc. Skin-based approach to virtual modeling
CN104778746B (zh) * 2015-03-16 2017-06-16 浙江大学 Method for accurate three-dimensional modeling with natural gestures using a data glove
FR3061979B1 (fr) * 2017-01-17 2020-07-31 Exsens Method for creating a virtual three-dimensional representation of a person
US10620696B2 (en) * 2017-03-20 2020-04-14 Tactual Labs Co. Apparatus and method for sensing deformation
CN109871857A (zh) * 2017-12-05 2019-06-11 博世汽车部件(苏州)有限公司 Method and apparatus for recognizing gestures
US11544871B2 (en) * 2017-12-13 2023-01-03 Google Llc Hand skeleton learning, lifting, and denoising from 2D images
CN108399367B (zh) * 2018-01-31 2020-06-23 深圳市阿西莫夫科技有限公司 Hand motion recognition method and apparatus, computer device, and readable storage medium
CN108305321B (zh) * 2018-02-11 2022-09-30 牧星天佑(北京)科技文化发展有限公司 Method and apparatus for real-time reconstruction of a stereoscopic 3D hand skeleton model based on a binocular color imaging system
CN110163048B (zh) * 2018-07-10 2023-06-02 腾讯科技(深圳)有限公司 Hand key point recognition model training method, recognition method, and device
CN109271933B (зh) * 2018-09-17 2021-11-16 北京航空航天大学青岛研究院 Method for three-dimensional human pose estimation based on video streams
CN110020633B (zh) * 2019-04-12 2022-11-04 腾讯科技(深圳)有限公司 Pose recognition model training method, image recognition method, and apparatus
CN110188700B (zh) * 2019-05-31 2022-11-29 安徽大学 Human three-dimensional joint point prediction method based on grouped regression models
CN110443148B (zh) * 2019-07-10 2021-10-22 广州市讯码通讯科技有限公司 Action recognition method, system, and storage medium
CN110443154B (zh) * 2019-07-15 2022-06-03 北京达佳互联信息技术有限公司 Method and apparatus for locating three-dimensional coordinates of key points, electronic device, and storage medium


Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
See also references of EP4068150A4 *

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113723187A (zh) * 2021-07-27 2021-11-30 武汉光庭信息技术股份有限公司 Semi-automatic annotation method and system for gesture key points
CN115471874A (zh) * 2022-10-28 2022-12-13 山东新众通信息科技有限公司 Construction-site hazardous behavior recognition method based on surveillance video
CN115471874B (zh) * 2022-10-28 2023-02-07 山东新众通信息科技有限公司 Construction-site hazardous behavior recognition method based on surveillance video
CN116309591A (zh) * 2023-05-19 2023-06-23 杭州健培科技有限公司 Medical image 3D key point detection method, model training method, and apparatus
CN116309591B (zh) * 2023-05-19 2023-08-25 杭州健培科技有限公司 Medical image 3D key point detection method, model training method, and apparatus

Also Published As

Publication number Publication date
EP4068150A4 (en) 2022-12-21
US20230252670A1 (en) 2023-08-10
EP4068150A1 (en) 2022-10-05
CN110991319A (zh) 2020-04-10
CN110991319B (zh) 2021-10-19

Similar Documents

Publication Publication Date Title
WO2021103648A1 (zh) Hand key point detection method, gesture recognition method, and related apparatus
US11783496B2 (en) Scalable real-time hand tracking
US10043308B2 (en) Image processing method and apparatus for three-dimensional reconstruction
KR101865655B1 (ko) Apparatus and method for providing augmented reality interaction service
Erol et al. Vision-based hand pose estimation: A review
CN108509026B (zh) Remote maintenance support system and method based on an augmented-interaction approach
US20130335318A1 (en) Method and apparatus for doing hand and face gesture recognition using 3d sensors and hardware non-linear classifiers
CN111709268B (zh) Hand pose estimation method and apparatus guided by hand structure in depth images
CN110210426B (zh) Method for hand pose estimation from a single color image based on an attention mechanism
CN113034652A (zh) Virtual avatar driving method, apparatus, device, and storage medium
Zhang et al. A practical robotic grasping method by using 6-D pose estimation with protective correction
CN110569817A (zh) System and method for vision-based gesture recognition
Liang et al. Hough forest with optimized leaves for global hand pose estimation with arbitrary postures
CN114641799A (zh) Object detection device, method, and system
Xu et al. Robust hand gesture recognition based on RGB-D Data for natural human–computer interaction
CN115210763A (zh) System and method for object detection including pose and size estimation
CN114332927A (zh) Classroom hand-raising behavior detection method, system, computer device, and storage medium
Zhang et al. Digital twin-enabled grasp outcomes assessment for unknown objects using visual-tactile fusion perception
Kang et al. Yolo-6d+: single shot 6d pose estimation using privileged silhouette information
Pang et al. Basicnet: Lightweight 3d hand pose estimation network based on biomechanical structure information for dexterous manipulator teleoperation
Bhuyan et al. Hand gesture recognition and animation for local hand motions
Jeong et al. Hand gesture user interface for transforming objects in 3d virtual space
CN116686006A (zh) Three-dimensional scan registration based on deformable models
Wang Real-time hand-tracking as a user input device
Shin et al. Deep Learning-based Hand Pose Estimation from 2D Image

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application (Ref document number: 20893388; Country of ref document: EP; Kind code of ref document: A1)
NENP Non-entry into the national phase (Ref country code: DE)
ENP Entry into the national phase (Ref document number: 2020893388; Country of ref document: EP; Effective date: 20220629)