WO2021218126A1 - Gesture recognition method, terminal device, and computer-readable storage medium - Google Patents

Gesture recognition method, terminal device, and computer-readable storage medium (手势识别方法、终端设备及计算机可读存储介质)

Info

Publication number
WO2021218126A1
Authority
WO
WIPO (PCT)
Prior art keywords
gesture
attribute
skeleton data
node
global
Application number
PCT/CN2020/130575
Other languages
English (en)
French (fr)
Inventor
刘璐
胡振邦
刘阳兴
Original Assignee
武汉TCL集团工业研究院有限公司 (Wuhan TCL Group Industrial Research Institute Co., Ltd.)
Application filed by 武汉TCL集团工业研究院有限公司 (Wuhan TCL Group Industrial Research Institute Co., Ltd.)
Publication of WO2021218126A1

Classifications

    • G — PHYSICS
    • G06 — COMPUTING; CALCULATING OR COUNTING
    • G06F — ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 — Pattern recognition
    • G06F18/20 — Analysing
    • G06F18/21 — Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 — Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G06N — COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 — Computing arrangements based on biological models
    • G06N3/02 — Neural networks
    • G06N3/04 — Architecture, e.g. interconnection topology
    • G06N3/045 — Combinations of networks

Definitions

  • This application belongs to the technical field of gesture recognition, and in particular relates to a gesture recognition method, a terminal device, and a computer-readable storage medium.
  • Gesture recognition is an emerging human-computer interaction method. Because of its user-friendliness and natural interaction, it has been applied to many scenarios, such as sign language understanding, virtual reality, and robot control.
  • Existing gesture recognition methods input the gesture image into a convolutional neural network for feature extraction and recognize the type of gesture in the gesture image. Because the convolutional neural network must perform feature extraction on the entire gesture image, its gesture recognition speed is slow.
  • This application provides a gesture recognition method, a terminal device, and a computer-readable storage medium to improve the speed of gesture recognition.
  • In a first aspect, an embodiment of the present application provides a gesture recognition method, and the gesture recognition method includes:
  • acquiring gesture skeleton data corresponding to a target gesture;
  • determining hand attribute data corresponding to the target gesture according to the gesture skeleton data, wherein the hand attribute data is used to reflect the joint point characteristics and bone characteristics of the target gesture;
  • determining an initial global attribute corresponding to the target gesture according to the hand attribute data, wherein the initial global attribute is used to reflect the gesture characteristics of the target gesture;
  • determining the gesture type corresponding to the target gesture according to the initial global attribute.
  • In a second aspect, an embodiment of the present application provides a gesture recognition device, and the gesture recognition device includes:
  • the skeleton data acquisition module is used to acquire the gesture skeleton data corresponding to the target gesture
  • the attribute data determining module is configured to determine the hand attribute data corresponding to the target gesture according to the gesture skeleton data, wherein the hand attribute data is used to reflect the joint point characteristics and bone characteristics of the target gesture;
  • An initial attribute determining module configured to determine an initial global attribute corresponding to the target gesture according to the hand attribute data, wherein the initial global attribute is used to reflect the gesture characteristics of the target gesture;
  • the gesture type determining module is configured to determine the gesture type corresponding to the target gesture according to the initial global attribute.
  • In a third aspect, an embodiment of the present application provides a terminal device, including a memory, a processor, and a computer program stored in the memory and runnable on the processor. When the processor executes the computer program, the steps of the gesture recognition method described in the first aspect are implemented.
  • In a fourth aspect, an embodiment of the present application provides a computer-readable storage medium that stores a computer program which, when executed by a processor, implements the steps of the gesture recognition method described in the first aspect.
  • In a fifth aspect, the embodiments of the present application provide a computer program product that, when run on a terminal device, causes the terminal device to perform the steps of the gesture recognition method described in the first aspect.
  • By using the gesture skeleton data, this application can effectively extract the hand attribute data reflecting the gesture characteristics of the target gesture, extract the initial global attributes reflecting those gesture characteristics from the hand attribute data, and then identify the gesture type of the target gesture according to the initial global attributes. Compared with feature extraction on the entire gesture image, the gesture skeleton data involves a smaller amount of data.
  • Using the gesture skeleton data for gesture recognition therefore reduces the amount of computation in the gesture recognition process and improves the gesture recognition speed.
  • FIG. 1 is a schematic diagram of the implementation process of the gesture recognition method provided by Embodiment 1 of the present application;
  • Figure 2a is an example diagram of joint points in the gesture skeleton
  • Figure 2b is an example diagram of dynamic gestures
  • Fig. 3 is a schematic diagram of the implementation process of the gesture recognition method provided in the second embodiment of the present application.
  • Figure 4 is an example diagram of the gesture recognition process
  • FIG. 5a is an example diagram of a confusion matrix of gesture classification on a gesture skeleton data set including 14 gesture types
  • FIG. 5b is an example diagram of a confusion matrix of gesture classification on a gesture skeleton data set including 28 gesture types;
  • FIG. 6 is a schematic structural diagram of a gesture recognition device provided in Embodiment 3 of the present application.
  • FIG. 7 is a schematic structural diagram of a terminal device provided in Embodiment 4 of the present application.
  • In the gesture recognition method involved in the embodiments of the present application, after the terminal collects the gesture image, the terminal may analyze the gesture image and recognize the gesture type in it; alternatively, the terminal may send the collected gesture image to a server, and the server analyzes the gesture image and identifies the gesture type.
  • FIG. 1 is a schematic diagram of the implementation process of the gesture recognition method provided by Embodiment 1 of the present application.
  • the gesture recognition method is applied to a terminal device. As shown in the figure, the gesture recognition method may include the following steps:
  • Step 101 Obtain gesture skeleton data corresponding to the target gesture.
  • the gesture skeleton data corresponding to the target gesture can be acquired through the gesture skeleton detection device, and the gesture skeleton data corresponding to the target gesture can also be acquired from the gesture image, which is not limited here.
  • the gesture skeleton detection device is a device that can directly collect the gesture skeleton data corresponding to the target gesture
  • the gesture image is an image containing the target gesture.
  • the number of gesture skeleton data can be one group or at least two groups, which is not limited here.
  • the target gesture may refer to a gesture for which gesture recognition is to be performed.
  • Gestures refer to hand postures, that is, the various postures or actions that people present when using their hands.
  • Gesture skeleton data refers to the position information of the joint points in the gesture skeleton corresponding to the gesture skeleton data.
  • a coordinate system can be established for the gesture skeleton data.
  • the position information of the joint points can be the coordinates of the joint points in the coordinate system.
  • The above-mentioned coordinate system can be a two-dimensional coordinate system or a three-dimensional coordinate system, which is not limited here.
  • the joint points in the gesture skeleton are the connection points between the bones of the gesture skeleton.
  • the gesture skeleton usually contains 21 joint points.
  • Figure 2a shows an example of the joint points in the gesture skeleton; each marked point in Figure 2a is a joint point, and the position information of the joint points in the gesture skeleton of Figure 2a constitutes the gesture skeleton data.
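  • As a minimal illustration (not part of the patent text; numpy and the array layout are assumptions), one set of gesture skeleton data for a 21-joint-point skeleton can be held as an array of joint coordinates:

```python
import numpy as np

NUM_JOINTS = 21  # the gesture skeleton usually contains 21 joint points

# One set of gesture skeleton data: the coordinates of every joint point in a
# two-dimensional coordinate system (use a third column for 3D coordinates).
gesture_skeleton = np.zeros((NUM_JOINTS, 2), dtype=np.float32)
gesture_skeleton[0] = (0.0, 0.0)   # e.g. joint point 1 at the origin
gesture_skeleton[2] = (-0.2, 1.0)  # e.g. joint point 3 at (-0.2, 1)
```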
  • In a possible implementation, before acquiring the gesture skeleton data corresponding to the target gesture, the method further includes: acquiring one frame of gesture image or N consecutive frames of gesture images;
  • accordingly, the acquiring gesture skeleton data corresponding to the target gesture includes: acquiring the gesture skeleton data from the acquired gesture image or images.
  • Gestures are usually divided into static gestures and dynamic gestures.
  • The recognition of static gestures mainly considers the appearance characteristics of the gesture at a certain point in time; therefore, when recognizing static gestures, one frame of gesture image can be acquired through an image acquisition device and gesture recognition can be performed on that frame. The recognition of dynamic gestures mainly considers a series of actions over a period of time, composed of a series of static gestures; therefore, when recognizing dynamic gestures, N consecutive frames of gesture images can be acquired through the image acquisition device, and gesture recognition can be performed on those N frames.
  • The gesture image can be acquired through an image acquisition device, or obtained from a server or other equipment; the acquisition method is not limited in this embodiment.
  • the continuous N frames of gesture images may refer to N frames of gesture images collected by the image acquisition device at a preset time interval.
  • For example, the image acquisition device collects a gesture image every 0.05 seconds until N frames of gesture images are collected; the collection time interval between any two adjacent frames of the N frames is then 0.05 seconds.
  • a set of gesture skeleton data can be obtained from the one frame of gesture image; if N consecutive frames of gesture images are obtained, N sets of gesture skeleton data can be obtained from the N frames of gesture images.
  • the gesture images each correspond to a set of gesture skeleton data.
  • Step 102 Determine the hand attribute data corresponding to the target gesture according to the gesture skeleton data.
  • the hand attribute data is used to reflect the joint point characteristics and bone characteristics of the target gesture.
  • Specifically, all target bones can be identified among the bones of the gesture skeleton, where each target bone is a bone between two adjacent joint points. The length of each target bone and its rotation angles relative to the coordinate axes are taken as the attributes of that target bone, and the position information and movement speed of each joint point of the gesture skeleton are taken as the attributes of that joint point. The attributes of all target bones and all joint points in the gesture skeleton of the target gesture are determined as the hand attribute data corresponding to the target gesture.
  • The attributes of the joint points are the joint point characteristics, and the attributes of the target bones are the bone characteristics.
  • For example, suppose a two-dimensional coordinate system is established with joint point 1 as the origin, with the X-axis and Y-axis in centimeters. If the length of a target bone is 1.02 cm, its rotation angle to the X axis is 100°, and its rotation angle to the Y axis is 30°, then the attributes of the target bone can be (1.02, 100°, 30°). If the coordinates of joint point 3 are (-0.2, 1) and its motion speed is 0.02 m/s, then the attribute of joint point 3 can be (-0.2, 1, 0.02).
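  • The bone and joint attributes in the example above can be computed with a short sketch like the following (the helper names are illustrative, not taken from the patent; angles are measured against the positive axis directions):

```python
import numpy as np

def bone_attribute(p_a, p_b):
    """Attribute of a target bone between two adjacent joint points in 2D:
    (length, rotation angle to the X axis, rotation angle to the Y axis)."""
    d = np.asarray(p_b, dtype=float) - np.asarray(p_a, dtype=float)
    length = np.linalg.norm(d)
    angle_x = np.degrees(np.arccos(d[0] / length))  # angle to the X axis
    angle_y = np.degrees(np.arccos(d[1] / length))  # angle to the Y axis
    return (length, angle_x, angle_y)

def joint_attribute(position, speed):
    """Attribute of a joint point: position information plus movement speed."""
    return (*position, speed)

print(joint_attribute((-0.2, 1.0), 0.02))  # -> (-0.2, 1.0, 0.02), as for joint point 3
```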
  • Step 103 Determine an initial global attribute corresponding to the target gesture according to the hand attribute data.
  • The initial global attributes are used to reflect the gesture characteristics of the target gesture, where the gesture characteristics may refer to the gesture shape presented by the target gesture; for example, the gesture characteristic of the gesture shown in Figure 2a is open.
  • Specifically, the global attributes of the gesture skeleton data can be set in advance. The global attributes of the gesture skeleton data are used to fuse the attributes of all related nodes and the attributes of all target bones of the gesture skeleton data, obtaining the initial global attributes that can reflect the gesture characteristics of the target gesture.
  • That is, the global attribute of the gesture skeleton data is the attribute used to aggregate the attributes of all related nodes and all target bones of the gesture skeleton data. The user can set the global attributes of the gesture skeleton data according to actual needs (for example, set them to 0), which is not limited here.
  • Step 104 Determine the gesture type corresponding to the target gesture according to the initial global attribute.
  • Specifically, the initial global attributes corresponding to different gesture types can be set in advance. After the initial global attribute corresponding to the target gesture is determined, the similarity between the initial global attribute of the target gesture and the initial global attributes of the different gesture types is obtained, and the gesture type with the greatest similarity is taken as the gesture type corresponding to the target gesture. The gesture type is the posture presented by the target gesture, such as grabbing, opening, or shaking.
  • the trained classification model can be used to identify the gesture type corresponding to the target gesture.
  • Before use, the classification model needs to be trained. The training sample data can be obtained first, and the training sample data and labels (a label being the correct gesture type corresponding to the training sample data) are input into the classification model; the classification model is updated and learned so as to continuously reduce the value of the objective function (such as a loss function). When the value of the objective function is sufficiently small, training ends and the trained classification model is obtained.
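  • A minimal sketch of this training procedure, assuming PyTorch, a 128-dimensional initial global attribute, and 14 gesture types (all assumptions; the patent fixes none of these):

```python
import torch
import torch.nn as nn

# Stand-in classification model mapping an initial global attribute to a gesture type.
model = nn.Sequential(nn.Linear(128, 64), nn.ReLU(), nn.Linear(64, 14))
loss_fn = nn.CrossEntropyLoss()                      # objective function
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

def train_step(global_attrs, labels):
    """One update step; labels are the correct gesture types of the samples."""
    optimizer.zero_grad()
    loss = loss_fn(model(global_attrs), labels)      # value of the objective function
    loss.backward()
    optimizer.step()                                 # update to reduce the objective
    return loss.item()
```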
  • Gestures are usually divided into static gestures and dynamic gestures.
  • The recognition of static gestures mainly considers the appearance characteristics of the gesture at a certain point in time (i.e., the gesture characteristics), while the recognition of dynamic gestures mainly considers a series of gesture poses over a period of time.
  • If the target gesture is a static gesture, the number of initial global attributes is one, and determining the gesture type corresponding to the target gesture according to the initial global attribute includes:
  • inputting the initial global attribute to the trained classification model, and identifying the gesture type corresponding to the target gesture through the classification model.
  • That is, when the target gesture is a static gesture, the one initial global attribute that reflects the gesture characteristics of the target gesture can be directly input into the trained classification model for gesture recognition, obtaining the gesture type corresponding to the target gesture.
  • the gesture shown in Figure 2a is a static gesture.
  • If the target gesture is a dynamic gesture, the number of initial global attributes is at least two, and determining the gesture type corresponding to the target gesture according to the initial global attributes includes:
  • determining a target global attribute according to at least two of the initial global attributes, wherein the target global attribute is used to reflect the movement characteristics of the target gesture;
  • determining the gesture type corresponding to the target gesture according to the target global attribute.
  • That is, when the target gesture is a dynamic gesture, at least two initial global attributes are acquired and spliced together; the splicing result is the target global attribute that reflects the motion characteristics of the target gesture, and the gesture type corresponding to the target gesture is identified according to this target global attribute.
  • The recognition of dynamic gestures mainly considers a series of actions over a period of time, composed of a series of static gestures; Figure 2b is an example diagram of a dynamic gesture composed of 8 frames of static gestures.
  • For the dynamic gesture in Figure 2b, the number of initial global attributes is 8, denoted u1 through u8, where ut is the initial global attribute corresponding to the t-th set of gesture skeleton data (u1 for the first set, u2 for the second set, and so on up to u8 for the eighth set). The above 8 initial global attributes are spliced to obtain the target global attribute (u1, u2, ..., u8).
  • Specifically, the target global attributes corresponding to different gesture types can be set in advance. After the target global attribute corresponding to the target gesture is determined, its similarity to the target global attributes corresponding to the different gesture types is obtained, and the gesture type with the greatest similarity is taken as the gesture type corresponding to the target gesture. Taking four gesture types (grabbing, opening, shaking, and tapping) as an example, suppose the target global attribute of the target gesture is 80% similar to the target global attribute of grabbing, 10% similar to that of opening, 5% similar to that of shaking, and 5% similar to that of tapping. The similarity to the target global attribute of grabbing is the largest, so the target gesture is determined to be grabbing.
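  • A sketch of this comparison, with concatenation for the splicing and cosine similarity as an illustrative stand-in for the unspecified similarity measure:

```python
import torch
import torch.nn.functional as F

def target_global_attribute(initial_global_attrs):
    # Splice at least two initial global attributes into the target global attribute.
    return torch.cat(initial_global_attrs, dim=-1)

def classify_by_similarity(target_attr, prototypes):
    # prototypes: preset target global attribute per gesture type,
    # e.g. {"grabbing": ..., "opening": ..., "shaking": ..., "tapping": ...}.
    sims = {name: F.cosine_similarity(target_attr, proto, dim=0).item()
            for name, proto in prototypes.items()}
    return max(sims, key=sims.get)  # gesture type with the greatest similarity
```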
  • Alternatively, the target global attribute may be input to the trained classification model, and the gesture type corresponding to the target gesture is identified through the classification model.
  • the above-mentioned trained classification model is obtained by training based on a plurality of training samples, and each group of training samples includes a target global attribute and a gesture type corresponding to the target global attribute.
  • By using the gesture skeleton data, the embodiments of the application can effectively extract the hand attribute data reflecting the gesture characteristics of the target gesture, extract the initial global attributes reflecting those characteristics from the hand attribute data, and then identify the gesture type of the target gesture according to the initial global attributes. Compared with feature extraction on the entire gesture image, the gesture skeleton data involves a smaller amount of data.
  • Using the gesture skeleton data for gesture recognition reduces the amount of data calculation in the gesture recognition process and improves the speed of gesture recognition.
  • FIG. 3 is a schematic diagram of the implementation process of the gesture recognition method provided in the second embodiment of the present application.
  • the gesture recognition method is applied to a terminal device. As shown in the figure, the gesture recognition method may include the following steps:
  • Step 301 Obtain gesture skeleton data corresponding to the target gesture.
  • This step is the same as step 101; for details, please refer to the related description of step 101, which will not be repeated here.
  • Step 302 Acquire the first attribute of each node in the gesture skeleton data according to the position information of each node in the gesture skeleton data.
  • the gesture skeleton data includes position information of at least two nodes, and the above at least two nodes are joint points of the gesture skeleton corresponding to the gesture skeleton data.
  • a coordinate system is established for the gesture skeleton data.
  • the position information of each node in the gesture skeleton data is the coordinate of each node in the coordinate system.
  • The first attribute of each node refers to the node attributes determined according to the position information of that node, including but not limited to the node's position information and movement speed. It should be noted that each node has its own corresponding first attribute.
  • Optionally, the target gesture corresponds to N sets of gesture skeleton data, where N is an integer greater than 1; the sequence of the N sets of gesture skeleton data is determined according to the movement sequence of the target gesture, and the N sets include the first group of gesture skeleton data and N-1 groups of non-first gesture skeleton data;
  • the acquiring the first attribute of each node in the gesture skeleton data according to the position information of each node in the gesture skeleton data includes:
  • determining that the position information and the preset movement speed of each node in the first group of gesture skeleton data are the first attribute of that node.
  • When the target gesture corresponds to N sets of gesture skeleton data, the target gesture is a dynamic gesture corresponding to a gesture movement process, and the N sets of gesture skeleton data are sorted according to the sequence in which they are acquired during the gesture movement.
  • the dynamic gesture in Figure 2b includes 8 frames of static gestures, namely b1, b2, b3, b4, b5, b6, b7, and b8.
  • The above 8 frames of static gestures complete one dynamic gesture in the order b1, b2, b3, b4, b5, b6, b7, b8; the order of the eight groups of gesture skeleton data corresponding to this dynamic gesture is therefore b1, b2, b3, b4, b5, b6, b7, b8.
  • If a group of gesture skeleton data is the first group, there is no previous group, so the preset movement speed can be used as the movement speed of the nodes in the first group; if a group is not the first group, a previous group exists, so the movement speed of a node in a non-first group can be calculated from the node's position information in that group and its position information in the previous group.
  • The preset movement speed can be set by the user according to actual needs, for example, zero.
  • Optionally, acquiring the first attribute of each node in the gesture skeleton data according to the position information of each node further includes: for the j-th group of non-first gesture skeleton data, which is any one of the N-1 groups of non-first gesture skeleton data, with j a positive integer less than or equal to N-1, the first attribute of each node is determined as follows:
  • For the movement speed of the i-th node (any node) in the j-th group of non-first gesture skeleton data: according to the collection time interval Δt between the j-th group and the (j-1)-th group of gesture skeleton data, calculate the difference between the position information of the i-th node in the j-th group and its position information in the (j-1)-th group; the value obtained by dividing this difference by the collection time interval is the movement speed of the i-th node.
  • For example, if the position information of the i-th node in the j-th group of non-first gesture skeleton data is the two-dimensional coordinate (x_i^j, y_i^j) and its position information in the (j-1)-th group is (x_i^(j-1), y_i^(j-1)), then the speed of the i-th node in the X-axis direction is v_x = (x_i^j - x_i^(j-1)) / Δt and its speed in the Y-axis direction is v_y = (y_i^j - y_i^(j-1)) / Δt; that is, the movement speed of the i-th node is (v_x, v_y).
  • If the coordinate system of the gesture skeleton data is three-dimensional, the position information of the i-th node in the j-th group of non-first gesture skeleton data is the three-dimensional coordinate (x_i^j, y_i^j, z_i^j) and its position information in the (j-1)-th group is (x_i^(j-1), y_i^(j-1), z_i^(j-1)); the speed in the X-axis direction is v_x = (x_i^j - x_i^(j-1)) / Δt, in the Y-axis direction v_y = (y_i^j - y_i^(j-1)) / Δt, and in the Z-axis direction v_z = (z_i^j - z_i^(j-1)) / Δt; that is, the movement speed of the i-th node is (v_x, v_y, v_z).
  • In the above notation, the superscript j denotes the group number of the non-first gesture skeleton data (i.e., the j-th group of non-first gesture skeleton data), the subscript i denotes the node number (i.e., the i-th node), and x, y, and z denote the X, Y, and Z axes respectively.
  • When j = 1, the (j-1)-th group of non-first gesture skeleton data refers to the first group of gesture skeleton data. Δt is the collection time interval between two adjacent groups of gesture skeleton data.
  • For example, suppose 4 groups of gesture skeleton data are collected continuously; in order of collection time they can be called the first group of gesture skeleton data and the first, second, and third groups of non-first gesture skeleton data.
  • For the i-th node in the first group of gesture skeleton data, its position information in the first group together with the preset movement speed is determined as its first attribute.
  • For the i-th node in the first group of non-first gesture skeleton data, given its position information in that group and its position information in the first group of gesture skeleton data, its speeds along the X-axis, Y-axis, and Z-axis directions are calculated as above to obtain its movement speed; its position information in the first group of non-first gesture skeleton data together with this movement speed is determined as its first attribute. The first attributes of the nodes in the second and third groups of non-first gesture skeleton data are determined in the same way.
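  • The per-node first attributes across the N groups can be computed in a few vectorized lines; the (N, nodes, dims) array layout and numpy are assumptions for illustration:

```python
import numpy as np

def node_first_attributes(groups, dt, preset_speed=0.0):
    """groups: array of shape (N, num_nodes, dims) holding N sets of gesture
    skeleton data in movement order; dt: collection time interval between
    two adjacent groups. Returns (position, movement speed) per node."""
    velocities = np.full_like(groups, preset_speed)   # first group: preset speed
    velocities[1:] = (groups[1:] - groups[:-1]) / dt  # non-first groups: position difference / dt
    return np.concatenate([groups, velocities], axis=-1)
```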
  • Step 303 Acquire the first attribute of each edge in the gesture skeleton data according to the position information of each pair of target nodes in the gesture skeleton data.
  • each pair of target nodes refers to two adjacent nodes that meet a preset condition, and the two adjacent nodes are connected by an edge.
  • the above-mentioned preset conditions are related to the biological characteristics of the gesture skeleton, and may refer to nodes located at two end points of a bone in the gesture skeleton.
  • An edge between two adjacent nodes refers to a piece of bone between them. As shown in Figure 2a, node 1 and node 2 are a pair of target nodes, node 1 and node 3 are also a pair of target nodes, but node 2 and node 3 are not a pair of target nodes.
  • the acquiring the first attribute of each edge in the gesture skeleton data according to the position information of each pair of target nodes in the gesture skeleton data includes:
  • determining the length of each edge and the rotation angles of the edge in the gesture skeleton data as the first attribute of that edge.
  • The position information of each pair of target nodes refers to the position information of each target node in the pair; for example, for the pair consisting of node 1 and node 2, it refers to the position information of node 1 and the position information of node 2.
  • The rotation angles of each edge in the gesture skeleton data refer to the angles between the edge and each coordinate axis of the coordinate system.
  • Take the t-th group of gesture skeleton data (if there is one group of gesture skeleton data, that group is the t-th group; if there are N groups, the t-th group is any one of the N groups) and a three-dimensional coordinate system as an example. For any pair of target nodes in the gesture skeleton data, denote the two target nodes as the i-th node and the g-th node, with position information (x_i^t, y_i^t, z_i^t) and (x_g^t, y_g^t, z_g^t) respectively. Then the length of the edge (for example, the k-th edge) corresponding to this pair of target nodes is l_k = sqrt((x_i^t - x_g^t)^2 + (y_i^t - y_g^t)^2 + (z_i^t - z_g^t)^2), the angle between the edge and the X axis is arccos((x_g^t - x_i^t) / l_k), the angle between the edge and the Y axis is arccos((y_g^t - y_i^t) / l_k), and the angle between the edge and the Z axis is arccos((z_g^t - z_i^t) / l_k).
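  • A sketch of this edge attribute computation in a three-dimensional coordinate system (function name and degree units are illustrative choices):

```python
import numpy as np

def edge_first_attribute(p_i, p_g):
    """First attribute of the k-th edge between a pair of target nodes
    (the i-th and g-th nodes): (length, angle to X, angle to Y, angle to Z)."""
    d = np.asarray(p_g, dtype=float) - np.asarray(p_i, dtype=float)
    length = np.linalg.norm(d)                   # bone length
    angles = np.degrees(np.arccos(d / length))   # direction angles to the three axes
    return (length, *angles)
```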
  • Step 304 Obtain the first global attribute of the gesture skeleton data.
  • the first global attribute of the gesture skeleton data can be set in advance, for example, the first global attribute of the gesture skeleton data is set to zero.
  • the user can set the first global attribute of the gesture skeleton data according to actual needs, which is not limited here.
  • Step 305 Determine the first global attribute of the gesture skeleton data, the first attribute of each node, and the first attribute of each edge as the hand attribute data corresponding to the target gesture.
  • For example, if a group of gesture skeleton data includes 21 nodes and those nodes form 21 pairs of target nodes, then the group of gesture skeleton data includes 21 edges, and the first global attribute of the group, the first attribute of each of the 21 nodes, and the first attribute of each of the 21 edges can be determined as the hand attribute data corresponding to the target gesture.
  • Step 306 Determine an initial global attribute corresponding to the target gesture according to the hand attribute data.
  • This step is the same as step 103; for details, please refer to the related description of step 103, which will not be repeated here.
  • the determining the initial global attribute corresponding to the target gesture according to the hand attribute data includes:
  • inputting the first global attribute of the gesture skeleton data, the first attribute of each node, and the first attribute of each edge to the trained graph network for processing, with the graph network outputting the initial global attribute corresponding to the target gesture.
  • The graph network is a neural network used to manipulate and compute on graph data. The hand attribute data is input as graph data to the trained graph network for processing, and the initial global attribute reflecting the gesture characteristics of the target gesture can be obtained.
  • Before using the graph network to process the hand attribute data, the graph network needs to be trained; supervised training can be used. The loss function is defined as the cross-entropy loss, the backpropagation algorithm is used to calculate gradients, and an optimizer is used to train the graph network; the optimizer can be Stochastic Gradient Descent (SGD), Adam, Momentum, or another commonly used optimizer. Training is achieved by minimizing the loss function; after the graph network is trained to convergence, the model parameters are saved to obtain the trained graph network.
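  • A compact sketch of this supervised training loop, assuming PyTorch; the stand-in network, dimensions, and synthetic data are assumptions, not components defined by the patent:

```python
import torch
import torch.nn as nn

graph_net = nn.Sequential(nn.Linear(64, 32), nn.ReLU(), nn.Linear(32, 14))  # stand-in for the graph network
optimizer = torch.optim.Adam(graph_net.parameters(), lr=1e-3)  # SGD or Momentum also work
loss_fn = nn.CrossEntropyLoss()                                # cross-entropy loss

hand_attrs = torch.randn(256, 64)               # stand-in hand attribute data
gesture_labels = torch.randint(0, 14, (256,))   # correct gesture types

for epoch in range(100):                        # train by minimizing the loss
    optimizer.zero_grad()
    loss = loss_fn(graph_net(hand_attrs), gesture_labels)
    loss.backward()                             # backpropagation computes gradients
    optimizer.step()

torch.save(graph_net.state_dict(), "graph_net.pt")  # save the trained parameters
```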
  • the graph network includes a first graph network block, a second graph network block, and a third graph network block.
  • Inputting the first global attribute of the gesture skeleton data, the first attribute of each node, and the first attribute of each edge to the trained graph network for processing, with the graph network outputting the initial global attribute corresponding to the target gesture, includes:
  • inputting the first global attribute of the gesture skeleton data, the first attribute of each node, and the first attribute of each edge to the first graph network block; the first graph network block updates the first global attribute, the first attribute of each node, and the first attribute of each edge respectively, and outputs the second global attribute of the gesture skeleton data, the second attribute of each node, and the second attribute of each edge;
  • inputting the second global attribute of the gesture skeleton data, the second attribute of each node, and the second attribute of each edge to the second graph network block; the second graph network block updates and aggregates the second global attribute, the second attribute of each node, and the second attribute of each edge, and outputs the third global attribute of the gesture skeleton data;
  • inputting the third global attribute of the gesture skeleton data to the third graph network block; the third graph network block updates the third global attribute and outputs the initial global attribute corresponding to the target gesture.
  • the second global attribute of the gesture skeleton data is the attribute obtained after the first global attribute of the gesture skeleton data is updated using the first graph network block; the second attribute of each node in the gesture skeleton data is the use of the first graph network block The attribute obtained after updating the first attribute of each node; the second attribute of each edge in the gesture skeleton data is the attribute obtained after updating the first attribute of each edge using the first graph network block.
  • the third global attribute of the gesture skeleton data is the attribute obtained by using the second graph network block to update and aggregate the second global attribute of the gesture skeleton data, the second attribute of each node, and the second attribute of each edge.
  • The initial global attribute corresponding to the target gesture is the attribute obtained after using the third graph network block to update the third global attribute of the gesture skeleton data.
  • Optionally, the first graph network block includes a first attribute update layer, a first node update layer, and a first edge update layer, each of which is a fully connected layer or a convolutional layer. Inputting the first global attribute of the gesture skeleton data, the first attribute of each node, and the first attribute of each edge to the first graph network block, so that the first graph network block updates them respectively and outputs the second global attribute of the gesture skeleton data, the second attribute of each node, and the second attribute of each edge, includes:
  • The first attribute update layer, the first node update layer, and the first edge update layer in the first graph network block may each be a fully connected layer or a convolutional layer. If the first attribute update layer is a fully connected layer, updating the first global attribute of the gesture skeleton data may refer to multiplying the first global attribute by the weight matrix of the fully connected layer, the result being the second global attribute of the gesture skeleton data; if the first attribute update layer is a convolutional layer, updating the first global attribute may refer to performing a convolution operation on the first global attribute using a convolution kernel, the result again being the second global attribute.
  • Similarly, if the first node update layer is a fully connected layer, updating the first attribute of each node may refer to multiplying the first attribute of each node by the weight matrix of the fully connected layer, the result being the second attribute of each node; if the first node update layer is a convolutional layer, updating the first attribute of each node may refer to performing a convolution operation on it using a convolution kernel.
  • If the first edge update layer is a fully connected layer, updating the first attribute of each edge may refer to multiplying the first attribute of each edge by the weight matrix of the fully connected layer, the result being the second attribute of each edge; if the first edge update layer is a convolutional layer, updating the first attribute of each edge may refer to performing a convolution operation on it using a convolution kernel, the result being the second attribute of each edge.
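  • A minimal sketch of the first graph network block with fully connected update layers (the dimensions are assumptions):

```python
import torch
import torch.nn as nn

class FirstGraphNetworkBlock(nn.Module):
    """Updates the global, node, and edge attributes independently by
    multiplying each by the weight matrix of a fully connected layer."""
    def __init__(self, u_dim, v_dim, e_dim, out_dim=32):
        super().__init__()
        self.attr_update = nn.Linear(u_dim, out_dim)  # first attribute update layer
        self.node_update = nn.Linear(v_dim, out_dim)  # first node update layer
        self.edge_update = nn.Linear(e_dim, out_dim)  # first edge update layer

    def forward(self, u1, v1, e1):
        # Returns the second global attribute, second node attributes,
        # and second edge attributes of the gesture skeleton data.
        return self.attr_update(u1), self.node_update(v1), self.edge_update(e1)
```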
  • Optionally, the second graph network block includes a second attribute update layer, a second node update layer, a second edge update layer, a first aggregation layer, a second aggregation layer, and a third aggregation layer, where the second attribute update layer, the second node update layer, and the second edge update layer are fully connected layers or convolutional layers. Inputting the second global attribute of the gesture skeleton data, the second attribute of each node, and the second attribute of each edge to the second graph network block, so that the second graph network block updates and aggregates them and outputs the third global attribute of the gesture skeleton data, includes:
  • inputting the second attribute of each edge in the gesture skeleton data, the second attributes of the pair of target nodes corresponding to each edge, and the second global attribute of the gesture skeleton data to the second edge update layer; the second edge update layer updates the second attribute of each edge and outputs the third attribute of each edge in the gesture skeleton data;
  • inputting the second attribute of each node in the gesture skeleton data, the node attribute corresponding to the edges of that node, and the second global attribute of the gesture skeleton data to the second node update layer; the second node update layer updates the second attribute of each node and outputs the third attribute of each node in the gesture skeleton data;
  • inputting the third attributes of all nodes in the gesture skeleton data to the second aggregation layer; the second aggregation layer aggregates the third attributes of all nodes and outputs the global attribute corresponding to all nodes in the gesture skeleton data;
  • inputting the third attributes of all edges in the gesture skeleton data to the third aggregation layer; the third aggregation layer aggregates the third attributes of all edges and outputs the global attribute corresponding to all edges in the gesture skeleton data;
  • inputting the second global attribute of the gesture skeleton data, the global attribute corresponding to all nodes, and the global attribute corresponding to all edges to the second attribute update layer; the second attribute update layer updates the second global attribute of the gesture skeleton data and outputs the third global attribute of the gesture skeleton data.
  • The second attribute of each edge in the gesture skeleton data, the second attributes of the pair of target nodes corresponding to that edge, and the second global attribute of the gesture skeleton data are input to the second edge update layer. The update of the second attribute of each edge may specifically refer to splicing the second attribute of the edge, the second attributes of its pair of target nodes, and the second global attribute of the gesture skeleton data, and multiplying the splicing result by the weight matrix of the fully connected layer or the convolution kernel of the convolutional layer to obtain the third attribute of the edge.
  • For example, if the second attribute of the k-th edge is e_k, the second attributes of the two target nodes corresponding to the k-th edge are v_i and v_g, and the second global attribute of the gesture skeleton data is u, then e_k, v_i, v_g, and u are spliced, and the attribute obtained after splicing is (e_k, v_i, v_g, u). It should be noted that the splicing order of these attributes is not limited here; for example, the spliced attribute may also be (v_i, v_g, e_k, u) or (u, e_k, v_i, v_g).
  • The first aggregation layer is a summation layer. The third attribute of each edge to which a node belongs is input to the first aggregation layer, which sums the third attributes of the edges to which that node belongs; the result is the node attribute corresponding to the edges of that node.
  • For example, if the edges to which the i-th node belongs in the gesture skeleton data are the k-th edge and the n-th edge, the third attribute of the k-th edge is e_k and the third attribute of the n-th edge is e_n, then the first aggregation layer sums them, and e_k + e_n is the node attribute corresponding to the edges of the i-th node (where the n-th edge is the edge between the i-th node and the d-th node).
  • The second attribute of each node in the gesture skeleton data, the node attribute corresponding to the edges of that node, and the second global attribute of the gesture skeleton data are input to the second node update layer. The update of the second attribute of each node may specifically refer to splicing the second attribute of the node, the node attribute corresponding to its edges, and the second global attribute of the gesture skeleton data, and multiplying the splicing result by the weight matrix of the fully connected layer or the convolution kernel corresponding to the convolutional layer to obtain the third attribute of each node.
  • The above attribute splicing process can refer to the splicing process of the second attribute of each edge, the second attributes of the pair of target nodes corresponding to each edge, and the second global attribute of the gesture skeleton data, which will not be repeated here.
  • The second aggregation layer is a summation layer. The third attributes of all nodes in the gesture skeleton data are input to the second aggregation layer, which sums them; the result of the summation is the global attribute corresponding to all nodes.
  • The third aggregation layer is a summation layer. The third attributes of all edges in the gesture skeleton data are input to the third aggregation layer, which sums them; the result of the summation is the global attribute corresponding to all edges.
  • The second global attribute of the gesture skeleton data, the global attribute corresponding to all nodes, and the global attribute corresponding to all edges are input to the second attribute update layer, which updates the second global attribute of the gesture skeleton data. Specifically, the second global attribute, the global attribute corresponding to all nodes, and the global attribute corresponding to all edges are spliced, and the splicing result is multiplied by the weight matrix of the fully connected layer or the convolution kernel corresponding to the convolutional layer to obtain the third global attribute of the gesture skeleton data.
  • The above attribute splicing process can refer to the splicing process of the second attribute of each edge, the second attributes of the pair of target nodes corresponding to each edge, and the second global attribute of the gesture skeleton data, which will not be repeated here.
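  • Pulling the above steps together, a sketch of the second graph network block; the src/dst index tensors (the pair of target nodes of each edge) and the uniform attribute dimension are assumptions:

```python
import torch
import torch.nn as nn

class SecondGraphNetworkBlock(nn.Module):
    def __init__(self, dim):
        super().__init__()
        self.edge_update = nn.Linear(4 * dim, dim)  # second edge update layer
        self.node_update = nn.Linear(3 * dim, dim)  # second node update layer
        self.attr_update = nn.Linear(3 * dim, dim)  # second attribute update layer

    def forward(self, u, v, e, src, dst):
        # u: (1, dim) global attribute; v: (nodes, dim); e: (edges, dim);
        # src, dst: LongTensors giving the pair of target nodes of each edge.
        # Edge update: splice each edge with its pair of target nodes and u.
        e3 = self.edge_update(torch.cat([e, v[src], v[dst],
                                         u.expand(e.size(0), -1)], dim=-1))
        # First aggregation layer: sum the edges each node belongs to.
        agg = torch.zeros_like(v).index_add_(0, src, e3).index_add_(0, dst, e3)
        # Node update: splice each node with its aggregated edge attribute and u.
        v3 = self.node_update(torch.cat([v, agg,
                                         u.expand(v.size(0), -1)], dim=-1))
        # Second/third aggregation layers: sum all node / all edge attributes,
        # then update the global attribute to obtain the third global attribute.
        return self.attr_update(torch.cat([u, v3.sum(0, keepdim=True),
                                           e3.sum(0, keepdim=True)], dim=-1))
```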
  • Optionally, the third graph network block includes a third attribute update layer, which is a fully connected layer or a convolutional layer. Inputting the third global attribute of the gesture skeleton data to the third graph network block, so that the third graph network block updates the third global attribute and outputs the initial global attribute of the gesture skeleton data, includes:
  • inputting the third global attribute of the gesture skeleton data to the third attribute update layer; the third attribute update layer updates the third global attribute and outputs the initial global attribute corresponding to the target gesture.
  • Specifically, updating the third global attribute of the gesture skeleton data in the third attribute update layer may refer to multiplying the third global attribute by the weight matrix of the fully connected layer or the convolution kernel corresponding to the convolutional layer, obtaining the initial global attribute corresponding to the target gesture.
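  • The third graph network block thus reduces to a single update layer; a sketch with an assumed 32-dimensional global attribute:

```python
import torch
import torch.nn as nn

# Third attribute update layer: multiplying the third global attribute by the
# weight matrix of a fully connected layer yields the initial global attribute.
third_attr_update = nn.Linear(32, 32, bias=False)

u3 = torch.randn(1, 32)                      # third global attribute (stand-in)
initial_global_attr = third_attr_update(u3)  # initial global attribute of the target gesture
```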
  • Step 307 Determine the gesture type corresponding to the target gesture according to the initial global attribute.
  • This step is the same as step 104; for details, please refer to the related description of step 104, which will not be repeated here.
  • Optionally, before the initial global attributes are input to the classification model, they may first be input to an output layer, and the initial global attributes processed by the output layer are then input to the classification model for gesture recognition. The output layer may be a fully connected layer, which reduces the dimensionality of the initial global attributes and improves the robustness of the graph network.
  • Figure 4 shows an example of the gesture recognition process with T (T is an integer greater than 1) groups of gesture skeleton data. One group of gesture skeleton data corresponds to one graph network, so the T groups correspond to T graph networks, with different gesture skeleton data corresponding to different graph networks; each graph network includes the first graph network block, the second graph network block, and the third graph network block. Taking the first group of gesture skeleton data as an example, u1 is its first global attribute, V1 is the first attribute of any node, and E1 is the first attribute of any edge; the first graph network block contains the first attribute update layer, the first node update layer, and the first edge update layer, which output the second global attribute, the second attribute of any node, and the second attribute of any edge; the second graph network block then updates and aggregates these attributes, and the third graph network block outputs the initial global attribute.
  • The processed global attribute corresponding to the t-th group of gesture skeleton data (any group among the T groups) is obtained by multiplying the initial global attribute corresponding to the t-th group by the weight matrix of the output layer.
  • To verify its effectiveness, each group of gesture skeleton data in the gesture skeleton data set is input into the graph network of this embodiment and into existing neural networks for gesture recognition respectively, and the gesture recognition accuracy of each is calculated for comparison.
  • Table 1 compares the gesture recognition accuracy of the graph network of this embodiment with that of existing neural networks; Parallel GNN denotes the graph network of this embodiment. It can be seen from Table 1 that the gesture recognition accuracy of the graph network of this embodiment exceeds that of the existing neural networks.
  • Figure 5a shows an example of the confusion matrix of gesture classification on the gesture skeleton data set including 14 gesture types. It can be seen from Figure 5a that among the 14 gesture types, the recognition accuracy of 9 gesture types, such as clockwise rotation, counterclockwise rotation, sliding to the right, and shaking, reached 100%, and the recognition accuracy of 3 gesture types, such as tapping, opening, and sliding down, reached 90%, which demonstrates the effectiveness of the graph network of this embodiment in recognizing common gesture types.
  • Figure 5b is an example diagram of the confusion matrix of gesture classification on the gesture skeleton data set including 28 gesture types. It can be seen from Figure 5b that among the 28 gesture types, the graph network of this embodiment still recognizes 18 gesture types accurately.
  • the 28 gesture types in Figure 5b are a further refinement of the 14 gesture types in Figure 5a.
  • Each gesture type in Figure 5a is subdivided into two gesture types according to the execution mode of the gesture, with the suffixes 1 and 2 added to distinguish the execution modes: suffix 1 indicates that the gesture is completed with one finger, and suffix 2 indicates that the gesture is completed with the whole hand.
  • For example, grabbing 1 in Figure 5b represents a grabbing gesture done with one finger, and grabbing 2 represents a grabbing gesture done with the whole hand.
  • The embodiments of the application use the graph network to perform gesture recognition on the target gesture; the node attributes and edge attributes of the gesture skeleton data corresponding to the target gesture can be merged into the global attributes of the gesture skeleton data to obtain the target global attribute that reflects the motion characteristics of the gesture, and the gesture type corresponding to the target gesture can be recognized according to this target global attribute. Compared with existing neural networks, the accuracy of gesture recognition is improved.
  • FIG. 6 is a schematic structural diagram of a gesture recognition device provided in Embodiment 3 of the present application. For ease of description, only parts related to the embodiment of the present application are shown.
  • the gesture recognition device includes:
  • the skeleton data acquisition module 61 is used to acquire the gesture skeleton data corresponding to the target gesture
  • the attribute data determining module 62 is configured to determine the hand attribute data corresponding to the target gesture according to the gesture skeleton data, where the hand attribute data is used to reflect the joint point characteristics and bone characteristics of the target gesture;
  • the initial attribute determining module 63 is configured to determine the initial global attribute corresponding to the target gesture according to the hand attribute data, where the initial global attribute is used to reflect the gesture characteristics of the target gesture;
  • the gesture type determining module 64 is configured to determine the gesture type corresponding to the target gesture according to the initial global attribute.
  • the gesture skeleton data includes position information of at least two nodes, and the at least two nodes are joint points of the gesture skeleton corresponding to the gesture skeleton data, and the attribute data determining module 62 includes:
  • a node attribute obtaining unit configured to obtain the first attribute of each node in the gesture skeleton data according to the position information of each node in the gesture skeleton data;
  • the edge attribute acquiring unit is configured to acquire the first attribute of each edge in the gesture skeleton data according to the position information of each pair of target nodes in the gesture skeleton data, where each pair of target nodes refers to two adjacent nodes satisfying a preset condition, the two adjacent nodes being connected by an edge;
  • a global attribute acquiring unit configured to acquire the first global attribute of the gesture skeleton data
  • the attribute data determining unit is configured to determine that the first global attribute of the gesture skeleton data, the first attribute of each node, and the first attribute of each edge are the hand attribute data corresponding to the target gesture.
  • Optionally, the target gesture corresponds to N sets of gesture skeleton data, where N is an integer greater than 1; the sequence of the N sets of gesture skeleton data is determined according to the movement sequence of the target gesture, and the N sets include the first group of gesture skeleton data and N-1 groups of non-first gesture skeleton data;
  • the node attribute acquiring unit includes:
  • a determining subunit configured to determine that the position information and preset motion speed of each node in the first set of gesture skeleton data are the first attribute of the node
  • the node attribute acquiring unit further includes:
  • an obtaining subunit, configured to obtain the first attribute of each node contained in each of the N-1 groups of non-first gesture skeleton data according to the position information of each node contained in each of those groups;
  • where the j-th group of non-first gesture skeleton data is any one of the N-1 groups of non-first gesture skeleton data, j is a positive integer less than or equal to N-1, and the first attribute of each node in the j-th group of non-first gesture skeleton data is determined in the manner described in the second embodiment above.
  • the edge attribute obtaining unit is specifically configured to:
  • the length of each side and the rotation angle of the side in the gesture skeleton data are determined as the first attribute of the side.
  • Optionally, the initial attribute determining module 63 is specifically configured to:
  • input the first global attribute of the gesture skeleton data, the first attribute of each node, and the first attribute of each edge into the trained graph network for processing, the graph network outputting the initial global attribute corresponding to the target gesture.
  • Optionally, the graph network includes a first graph network block, a second graph network block, and a third graph network block, and the initial attribute determining module 63 includes:
  • a first update unit, configured to input the first global attribute of the gesture skeleton data, the first attribute of each node, and the first attribute of each edge into the first graph network block; the first graph network block respectively updates the first global attribute of the gesture skeleton data, the first attribute of each node, and the first attribute of each edge, and outputs the second global attribute of the gesture skeleton data, the second attribute of each node, and the second attribute of each edge;
  • a second update unit, configured to input the second global attribute of the gesture skeleton data, the second attribute of each node, and the second attribute of each edge into the second graph network block; the second graph network block updates and aggregates the second global attribute of the gesture skeleton data, the second attribute of each node, and the second attribute of each edge, and outputs the third global attribute of the gesture skeleton data;
  • a third update unit, configured to input the third global attribute of the gesture skeleton data into the third graph network block; the third graph network block updates the third global attribute and outputs the initial global attribute corresponding to the target gesture.
  • Optionally, the first graph network block includes a first attribute update layer, a first node update layer, and a first edge update layer, the first attribute update layer, the first node update layer, and the first edge update layer being fully connected layers or convolutional layers, and the first update unit is specifically configured to:
  • input the first global attribute of the gesture skeleton data into the first attribute update layer, update the first global attribute of the gesture skeleton data, and output the second global attribute of the gesture skeleton data;
  • input the first attribute of each node of the gesture skeleton data into the first node update layer, update the first attribute of each node of the gesture skeleton data, and output the second attribute of each node of the gesture skeleton data;
  • input the first attribute of each edge of the gesture skeleton data into the first edge update layer, update the first attribute of each edge of the gesture skeleton data, and output the second attribute of each edge of the gesture skeleton data.
  • Optionally, the second graph network block includes a second attribute update layer, a second node update layer, a second edge update layer, a first aggregation layer, a second aggregation layer, and a third aggregation layer, the second attribute update layer, the second node update layer, and the second edge update layer being fully connected layers or convolutional layers, and the second update unit is specifically configured to:
  • input the second attribute of each edge in the gesture skeleton data, the second attributes of each pair of target nodes corresponding to each edge, and the second global attribute of the gesture skeleton data into the second edge update layer; the second edge update layer updates the second attribute of each edge in the gesture skeleton data and outputs the third attribute of each edge in the gesture skeleton data;
  • input the third attributes of the edges to which each node in the gesture skeleton data belongs into the first aggregation layer; the first aggregation layer aggregates the third attributes of the edges to which each node belongs and outputs the node attribute corresponding to the edges of each node in the gesture skeleton data;
  • input the second attribute of each node in the gesture skeleton data, the node attribute corresponding to the edges of each node, and the second global attribute of the gesture skeleton data into the second node update layer; the second node update layer updates the second attribute of each node in the gesture skeleton data and outputs the third attribute of each node in the gesture skeleton data;
  • input the third attributes of all nodes in the gesture skeleton data into the second aggregation layer; the second aggregation layer aggregates the third attributes of all nodes in the gesture skeleton data and outputs the global attribute corresponding to all nodes in the gesture skeleton data;
  • input the third attributes of all edges in the gesture skeleton data into the third aggregation layer; the third aggregation layer aggregates the third attributes of all edges in the gesture skeleton data and outputs the global attribute corresponding to all edges in the gesture skeleton data;
  • input the second global attribute of the gesture skeleton data, the global attribute corresponding to all nodes, and the global attribute corresponding to all edges into the second attribute update layer; the second attribute update layer updates the second global attribute of the gesture skeleton data and outputs the third global attribute of the gesture skeleton data.
  • Optionally, the third graph network block includes a third attribute update layer, the third attribute update layer being a fully connected layer or a convolutional layer, and the third update unit is specifically configured to:
  • input the third global attribute of the gesture skeleton data into the third attribute update layer; the third attribute update layer updates the third global attribute and outputs the initial global attribute corresponding to the target gesture.
  • Optionally, the gesture recognition device further includes:
  • an image acquisition module, used to acquire one frame of gesture image or N consecutive frames of gesture images before acquiring the gesture skeleton data corresponding to the target gesture, where N is an integer greater than 1;
  • correspondingly, the skeleton data acquisition module 61 is specifically configured to:
  • acquire the gesture skeleton data corresponding to the target gesture according to the one frame of gesture image or the N consecutive frames of gesture images.
  • Optionally, when the number of initial global attributes is one, the gesture type determining module 64 is specifically configured to:
  • input the initial global attribute into the trained classification model, and recognize the gesture type corresponding to the target gesture through the classification model;
  • when the number of initial global attributes is at least two, the gesture type determining module 64 includes:
  • a first determining unit, configured to determine a target global attribute according to at least two of the initial global attributes, where the target global attribute is used to reflect the motion characteristics of the target gesture;
  • a second determining unit, configured to determine the gesture type corresponding to the target gesture according to the target global attribute.
  • Optionally, the second determining unit is specifically configured to:
  • input the target global attribute into the classification model, and recognize the gesture type corresponding to the target gesture through the classification model.
  • The gesture recognition device provided in this embodiment of the present application can be applied in the foregoing method Embodiment 1 and Embodiment 2; for details, refer to the descriptions of method Embodiment 1 and Embodiment 2 above, which are not repeated here.
  • FIG. 7 is a schematic structural diagram of a terminal device provided in Embodiment 4 of the present application. As shown in FIG. 7, the terminal device 7 of this embodiment includes: a processor 70, a memory 71, and a computer program 72 stored in the memory 71 and runnable on the processor 70. When the processor 70 executes the computer program 72, the steps in each of the foregoing gesture recognition method embodiments are implemented; alternatively, when the processor 70 executes the computer program 72, the functions of the modules/units in each of the foregoing device embodiments are realized.
  • The terminal device 7 may be a computing device such as a desktop computer, a notebook, a palmtop computer, or a cloud server. The terminal device may include, but is not limited to, the processor 70 and the memory 71. Those skilled in the art will understand that FIG. 7 is only an example of the terminal device 7 and does not constitute a limitation on it; the terminal device may include more or fewer components than shown, a combination of certain components, or different components. For example, it may also include input and output devices, network access devices, buses, and so on.
  • The memory 71 may be an internal storage unit of the terminal device 7, for example a hard disk or memory of the terminal device 7. The memory 71 may also be an external storage device of the terminal device 7, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) card, or a flash card equipped on the terminal device 7. Further, the memory 71 may include both an internal storage unit and an external storage device of the terminal device 7. The memory 71 is used to store the computer program and other programs and data required by the terminal device, and may also be used to temporarily store data that has been output or will be output.
  • In the embodiments provided in this application, it should be understood that the disclosed device/terminal device and method may be implemented in other ways. For example, the device/terminal device embodiments described above are merely illustrative: the division into modules or units is only a division by logical function, and there may be other divisions in actual implementation; for example, multiple units or components may be combined or integrated into another system, or some features may be omitted or not implemented. In addition, the mutual couplings or direct couplings or communication connections shown or discussed may be indirect couplings or communication connections through some interfaces, devices, or units, and may be electrical, mechanical, or in other forms.
  • The units described as separate components may or may not be physically separated, and components displayed as units may or may not be physical units; that is, they may be located in one place or distributed over multiple network units. Some or all of the units may be selected according to actual needs to achieve the objectives of the solutions of the embodiments.
  • In addition, the functional units in the various embodiments of the present application may be integrated into one processing unit, each unit may exist alone physically, or two or more units may be integrated into one unit. The above integrated unit may be implemented in the form of hardware or in the form of a software functional unit.
  • If the integrated module/unit is implemented in the form of a software functional unit and sold or used as an independent product, it can be stored in a computer-readable storage medium. Based on this understanding, the present application implements all or part of the processes in the methods of the above embodiments, which can also be completed by instructing the relevant hardware through a computer program; the computer program can be stored in a computer-readable storage medium and, when executed by a processor, can implement the steps of each of the foregoing method embodiments. The computer program includes computer program code, which may be in source code form, object code form, an executable file, or some intermediate form. The computer-readable medium may include: any entity or device capable of carrying the computer program code, a recording medium, a USB flash drive, a removable hard disk, a magnetic disk, an optical disk, a computer memory, a read-only memory (ROM), a random access memory (RAM), an electrical carrier signal, a telecommunications signal, a software distribution medium, and so on. It should be noted that the content contained in the computer-readable medium can be appropriately added or deleted according to the requirements of legislation and patent practice in the jurisdiction; for example, in some jurisdictions, according to legislation and patent practice, the computer-readable medium does not include electrical carrier signals and telecommunications signals.


Abstract

A gesture recognition method, a terminal device, and a computer-readable storage medium, applicable to the technical field of gesture recognition. The method includes: acquiring gesture skeleton data corresponding to a target gesture (101); determining, according to the gesture skeleton data, hand attribute data corresponding to the target gesture (102), where the hand attribute data is used to reflect the joint point characteristics and bone characteristics of the target gesture; determining, according to the hand attribute data, an initial global attribute corresponding to the target gesture (103), where the initial global attribute is used to reflect the gesture characteristics of the target gesture; and determining, according to the initial global attribute, the gesture type corresponding to the target gesture (104). The method can increase the speed of gesture recognition.

Description

Gesture recognition method, terminal device, and computer-readable storage medium

Technical Field
This application belongs to the technical field of gesture recognition, and in particular relates to a gesture recognition method, a terminal device, and a computer-readable storage medium.
Background
Gesture recognition is an emerging human-computer interaction method; because it is user-friendly and the interaction is natural, it has been applied in many scenarios such as sign language understanding, virtual reality, and robot control. Existing gesture recognition methods use a convolutional neural network: the gesture image is input into the convolutional neural network for feature extraction, and the gesture type in the image is recognized. Because the convolutional neural network must extract features from the entire gesture image, its gesture recognition speed is slow.
Technical Problem
This application provides a gesture recognition method, a terminal device, and a computer-readable storage medium to increase the speed of gesture recognition.
Technical Solution
In a first aspect, an embodiment of this application provides a gesture recognition method, the gesture recognition method including:
acquiring gesture skeleton data corresponding to a target gesture;
determining, according to the gesture skeleton data, hand attribute data corresponding to the target gesture, where the hand attribute data is used to reflect joint point characteristics and bone characteristics of the target gesture;
determining, according to the hand attribute data, an initial global attribute corresponding to the target gesture, where the initial global attribute is used to reflect gesture characteristics of the target gesture;
determining, according to the initial global attribute, a gesture type corresponding to the target gesture.
In a second aspect, an embodiment of this application provides a gesture recognition device, the gesture recognition device including:
a skeleton data acquisition module, used to acquire gesture skeleton data corresponding to a target gesture;
an attribute data determining module, used to determine, according to the gesture skeleton data, hand attribute data corresponding to the target gesture, where the hand attribute data is used to reflect joint point characteristics and bone characteristics of the target gesture;
an initial attribute determining module, used to determine, according to the hand attribute data, an initial global attribute corresponding to the target gesture, where the initial global attribute is used to reflect gesture characteristics of the target gesture;
a gesture type determining module, used to determine, according to the initial global attribute, a gesture type corresponding to the target gesture.
In a third aspect, an embodiment of this application provides a terminal device including a memory, a processor, and a computer program stored in the memory and runnable on the processor; when the processor executes the computer program, the steps of the gesture recognition method of the first aspect are implemented.
In a fourth aspect, an embodiment of this application provides a computer-readable storage medium storing a computer program; when the computer program is executed by a processor, the steps of the gesture recognition method of the first aspect are implemented.
In a fifth aspect, an embodiment of this application provides a computer program product which, when run on a terminal device, causes the terminal device to execute the steps of the gesture recognition method of the first aspect.
Beneficial Effects
As can be seen from the above, by using gesture skeleton data this application can effectively extract hand attribute data reflecting the posture characteristics of the target gesture, extract from the hand attribute data an initial global attribute reflecting the gesture characteristics of the target gesture, and then recognize the gesture type of the target gesture from the initial global attribute. Compared with extracting features from an entire gesture image, the amount of gesture skeleton data is small; using gesture skeleton data for gesture recognition reduces the amount of data computation in the recognition process and increases the speed of gesture recognition.
Description of the Drawings
To explain the technical solutions in the embodiments of this application more clearly, the drawings needed in the description of the embodiments or the prior art are briefly introduced below. Obviously, the drawings described below show only some embodiments of this application; those of ordinary skill in the art can obtain other drawings from them without creative effort.
FIG. 1 is a schematic flowchart of the gesture recognition method provided in Embodiment 1 of this application;
FIG. 2a is an example of the joint points in a gesture skeleton; FIG. 2b is an example of a dynamic gesture;
FIG. 3 is a schematic flowchart of the gesture recognition method provided in Embodiment 2 of this application;
FIG. 4 is an example of the gesture recognition process;
FIG. 5a is an example confusion matrix for gesture classification on a gesture skeleton dataset containing 14 gesture types; FIG. 5b is an example confusion matrix for gesture classification on a gesture skeleton dataset containing 28 gesture types;
FIG. 6 is a schematic structural diagram of the gesture recognition device provided in Embodiment 3 of this application;
FIG. 7 is a schematic structural diagram of the terminal device provided in Embodiment 4 of this application.
Embodiments of the Invention
In the following description, specific details such as particular system structures and techniques are set forth for the purpose of illustration rather than limitation, so as to provide a thorough understanding of the embodiments of this application. However, it should be clear to those skilled in the art that this application can also be implemented in other embodiments without these specific details. In other cases, detailed descriptions of well-known systems, devices, circuits, and methods are omitted so that unnecessary detail does not obscure the description of this application.
In the gesture recognition method of the embodiments of this application, after a terminal collects a gesture image, the terminal may analyze the gesture image and recognize the gesture type in it; alternatively, the terminal may send the collected gesture image to a server, which analyzes the image and recognizes the gesture type in it.
It should be understood that the sequence numbers of the steps in the embodiments do not imply an order of execution; the execution order of the processes should be determined by their functions and internal logic, and should not constitute any limitation on the implementation of the embodiments of this application.
To illustrate the technical solutions described in this application, specific embodiments are described below.
Referring to FIG. 1, which is a schematic flowchart of the gesture recognition method provided in Embodiment 1 of this application. The method is applied to a terminal device and, as shown in the figure, may include the following steps:
Step 101: acquire gesture skeleton data corresponding to a target gesture.
In this embodiment, the gesture skeleton data corresponding to the target gesture may be acquired by a gesture skeleton detection device, or extracted from a gesture image; neither is limited here. A gesture skeleton detection device is a device that can directly collect the gesture skeleton data corresponding to the target gesture; a gesture image is an image containing the target gesture. The number of sets of gesture skeleton data may be one or at least two, which is not limited here. The target gesture is the gesture to be recognized. A gesture is a hand posture, i.e., any of the various postures or movements a person makes with the hand. Gesture skeleton data refers to the position information of the joint points in the gesture skeleton corresponding to the data; a coordinate system can be established for the gesture skeleton data, and the position information of a joint point is its coordinates in that coordinate system, which may be two-dimensional or three-dimensional, which is not limited here. The joint points of a gesture skeleton are the connection points between its bones; a gesture skeleton typically contains 21 joint points. FIG. 2a shows an example of the joint points in a gesture skeleton: each symbol "·" is a joint point, and the position information of these joint points in the gesture skeleton constitutes the gesture skeleton data.
Optionally, before acquiring the gesture skeleton data corresponding to the target gesture, the method further includes:
acquiring one frame of gesture image or N consecutive frames of gesture images, where N is an integer greater than 1;
correspondingly, acquiring the gesture skeleton data corresponding to the target gesture includes:
acquiring the gesture skeleton data corresponding to the target gesture according to the one frame of gesture image or the N consecutive frames of gesture images.
Gestures are usually divided into static gestures and dynamic gestures. Recognition of a static gesture mainly considers the shape features of the gesture at a certain point in time, so when recognizing a static gesture, one frame of gesture image can be acquired by an image acquisition device and recognition performed on that frame. Recognition of a dynamic gesture mainly considers a series of actions over a period of time, composed of a series of static gestures, so when recognizing a dynamic gesture, N consecutive frames of gesture images can be acquired by the image acquisition device and recognition performed on them. Note that in this embodiment the gesture images may be obtained through an image acquisition device, or through a server or another device; the way the gesture images are obtained is not limited here.
N consecutive frames of gesture images may refer to N frames collected by the image acquisition device at a preset time interval; for example, the device captures one gesture image every 0.05 s until N frames have been collected, so the acquisition interval between any two adjacent frames is 0.05 s.
If one frame of gesture image is acquired, one set of gesture skeleton data can be extracted from it; if N consecutive frames are acquired, N sets of gesture skeleton data can be extracted, each frame corresponding to one set.
Step 102: determine, according to the gesture skeleton data, the hand attribute data corresponding to the target gesture.
The hand attribute data is used to reflect the joint point characteristics and bone characteristics of the target gesture.
In this embodiment, all target bones can be found among the bones of the gesture skeleton, each target bone being the bone between two adjacent joint points. The length of each target bone and its rotation angles relative to the coordinate axes can be taken as the attributes of that bone, and the position information and movement speed of each joint point of the gesture skeleton as the attributes of that joint point; the attributes of all target bones and all joint points of the gesture skeleton of the target gesture are then determined to be the hand attribute data corresponding to the target gesture. The joint point attributes are the joint point characteristics, and the target bone attributes are the bone characteristics.
For example, establish a two-dimensional coordinate system with joint point 1 as the origin and centimeters as the unit on the X and Y axes. Taking the target bone between joint points 1 and 3 in FIG. 2a, together with joint point 3, as an example: the length of the target bone is 1.02 cm, its rotation angle relative to the X axis is 100° and relative to the Y axis 30°, so the attribute of the target bone can be (1.02, 100°, 30°); joint point 3 has coordinates (-0.2, 1) and a movement speed of 0.02 m/s, so its attribute can be (-0.2, 1, 0.02).
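As a minimal sketch of this data layout (not part of the original disclosure; the array shapes, the function name, and the use of direction cosines for the rotation angles are assumptions made for illustration):

```python
import numpy as np

# Hypothetical sketch: assemble the hand attribute data for one gesture skeleton.
# `positions` holds the 21 joint coordinates, `speeds` the per-joint velocities,
# and `bones` lists the (i, g) joint index pairs that share a bone.
def build_hand_attributes(positions, speeds, bones):
    """positions: (21, 2) array, speeds: (21, 2) array, bones: list of (i, g)."""
    node_attrs = np.concatenate([positions, speeds], axis=1)  # per-joint attribute

    edge_attrs = []
    for i, g in bones:
        delta = positions[g] - positions[i]
        length = np.linalg.norm(delta)
        # Rotation angles of the bone relative to the X and Y axes.
        angles = np.arccos(delta / max(length, 1e-8))
        edge_attrs.append(np.concatenate([[length], angles]))
    edge_attrs = np.stack(edge_attrs)

    global_attr = np.zeros(1)  # first global attribute, preset (e.g. to zero)
    return global_attr, node_attrs, edge_attrs
```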
Step 103: determine, according to the hand attribute data, the initial global attribute corresponding to the target gesture.
The initial global attribute is used to reflect the gesture characteristics of the target gesture, i.e., the hand shape the target gesture presents; for example, the gesture characteristic of the gesture in FIG. 2a is "open".
In this embodiment, a global attribute can be preset for the gesture skeleton data corresponding to the target gesture. The global attribute of the gesture skeleton data is used to fuse the attributes of all joint points and all target bones of the skeleton, yielding an initial global attribute that can reflect the gesture characteristics of the target gesture. Here, the global attribute of the gesture skeleton data is a feature used to aggregate the attributes of all joint points and all target bones; the user may set it as needed (for example, to 0), which is not limited here.
Step 104: determine, according to the initial global attribute, the gesture type corresponding to the target gesture.
In one implementation, an initial global attribute can be preset for each gesture type. After the initial global attribute of the target gesture is determined, its similarity to the preset initial global attribute of each gesture type is computed, and the gesture type with the highest similarity is taken as the gesture type of the target gesture. The gesture type is the posture the target gesture presents, e.g., grab, open, or shake.
In another implementation, a trained classification model can be used to recognize the gesture type of the target gesture. Before the classification model is used for gesture recognition, it must be trained: for example, training sample data and labels (the correct gesture types of the samples) are input into the classification model, the model parameters are updated by learning, and the value of an objective function (e.g., a loss function) is continually reduced; when the value is small enough to meet the accuracy requirement, training ends and the trained classification model is obtained.
Gestures are usually divided into static gestures and dynamic gestures. Recognition of a static gesture mainly considers the shape features (i.e., gesture characteristics) of the gesture at a certain point in time, while recognition of a dynamic gesture mainly considers a series of actions over a period of time, composed of a series of static gestures.
Optionally, when the target gesture is a static gesture, the number of initial global attributes is one, and determining the gesture type corresponding to the target gesture according to the initial global attribute includes:
inputting the initial global attribute into a trained classification model, and recognizing the gesture type corresponding to the target gesture through the classification model.
When the target gesture is a static gesture, the single initial global attribute that reflects the gesture characteristics of the target gesture can be input directly into the trained classification model for gesture recognition, obtaining the gesture type of the target gesture. The gesture shown in FIG. 2a is a static gesture.
When the target gesture is a dynamic gesture, the number of initial global attributes is at least two, and determining the gesture type corresponding to the target gesture according to the initial global attribute includes:
determining a target global attribute according to at least two of the initial global attributes, where the target global attribute is used to reflect the motion characteristics of the target gesture;
determining, according to the target global attribute, the gesture type corresponding to the target gesture.
When the target gesture is a dynamic gesture, at least two initial global attributes are obtained and concatenated; the concatenation result is the target global attribute, which can reflect the motion characteristics of the target gesture, and the gesture type of the target gesture is recognized from this target global attribute. Recognition of a dynamic gesture mainly considers a series of actions over a period of time, composed of a series of static gestures; FIG. 2b shows an example of a dynamic gesture composed of 8 frames of static gestures. For example, with 8 initial global attributes, where $u'_t$ ($t = 1, \dots, 8$) denotes the initial global attribute corresponding to the $t$-th set of gesture skeleton data, concatenating the 8 initial global attributes gives the target global attribute $u' = [u'_1, u'_2, \dots, u'_8]$.
For dynamic gestures, in one implementation a target global attribute can be preset for each gesture type; after the target global attribute of the target gesture is determined, its similarity to the preset target global attribute of each gesture type is computed, and the gesture type with the highest similarity is taken as the gesture type of the target gesture. Taking the four gesture types grab, open, shake, and tap as an example: if the similarity of the target global attribute of the target gesture to the target global attribute of grab is 80%, to that of open 10%, to that of shake 5%, and to that of tap 5%, the similarity to grab is the largest, so the target gesture is determined to be a grab.
In another implementation, the target global attribute is input into a trained classification model, which recognizes the gesture type corresponding to the target gesture.
The trained classification model above is obtained by training on multiple training samples, each sample including one target global attribute and the gesture type corresponding to it.
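A minimal sketch of the similarity-matching variant described above (illustrative only; the preset attribute vectors are random placeholders, and cosine similarity is an assumption, since the text does not fix a similarity measure):

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical preset target global attributes, one per gesture type
# (random placeholders; in practice these would be prepared offline).
presets = {name: rng.normal(size=64) for name in ["grab", "open", "shake", "tap"]}

def classify_by_similarity(target_attr, presets):
    """Return the gesture type whose preset attribute is most similar."""
    def cosine(a, b):
        return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-8))
    return max(presets, key=lambda name: cosine(target_attr, presets[name]))

gesture_type = classify_by_similarity(rng.normal(size=64), presets)
```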
By using gesture skeleton data, this embodiment can effectively extract hand attribute data reflecting the posture characteristics of the target gesture, extract from the hand attribute data an initial global attribute reflecting the gesture characteristics of the target gesture, and then recognize the gesture type of the target gesture from the initial global attribute. Compared with extracting features from an entire gesture image, the amount of gesture skeleton data is small; using gesture skeleton data for gesture recognition reduces the amount of data computation in the recognition process and increases the speed of gesture recognition.
Referring to FIG. 3, which is a schematic flowchart of the gesture recognition method provided in Embodiment 2 of this application. The method is applied to a terminal device and, as shown in the figure, may include the following steps:
Step 301: acquire the gesture skeleton data corresponding to the target gesture.
This step is the same as step 101; see the description of step 101 for details, which is not repeated here.
Step 302: acquire the first attribute of each node in the gesture skeleton data according to the position information of each node in the gesture skeleton data.
The gesture skeleton data includes position information of at least two nodes, each of which is a joint point of the gesture skeleton corresponding to the gesture skeleton data. A coordinate system is established for the gesture skeleton data; the position information of each node is its coordinates in that coordinate system, and the first attribute of each node may refer to node attributes determined from its position information, including but not limited to the position information and the movement speed of the node. Note that every node has its own corresponding first attribute.
Optionally, the target gesture corresponds to N sets of gesture skeleton data, N being an integer greater than 1; the order of the N sets is determined according to the movement sequence of the target gesture, and the N sets include a first set of gesture skeleton data and N-1 non-first sets of gesture skeleton data.
For the first set of gesture skeleton data, acquiring the first attribute of each node in the gesture skeleton data according to its position information includes:
determining that the position information of each node in the first set of gesture skeleton data and a preset movement speed are the first attribute of that node.
When the target gesture corresponds to N sets of gesture skeleton data, the target gesture is a dynamic gesture corresponding to one gesture movement process, and the N sets are ordered by the time at which they were acquired during that process. For example, the dynamic gesture in FIG. 2b includes 8 frames of static gestures, b1 through b8, performed in the order b1, b2, b3, b4, b5, b6, b7, b8 to complete one dynamic gesture; the 8 sets of gesture skeleton data of this dynamic gesture are therefore ordered b1, b2, b3, b4, b5, b6, b7, b8.
When computing the movement speed of a node in one set of gesture skeleton data, the node's position in that set and its position in the previous set are needed. The first set of gesture skeleton data has no previous set, so the preset movement speed can be taken as the movement speed of its nodes; a non-first set does have a previous set, so the movement speed of its nodes can be computed from the node positions in that set and in the previous set.
The preset movement speed is a movement speed set in advance; the user may choose its value as needed, for example setting it to zero.
For the N-1 non-first sets of gesture skeleton data, acquiring the first attribute of each node in the gesture skeleton data according to its position information includes:
acquiring the first attribute of each node contained in each of the N-1 non-first sets according to the position information of each node contained in each of them;
where the j-th non-first set is any one of the N-1 non-first sets, j is a positive integer less than or equal to N-1, and the first attribute of each node in the j-th non-first set is determined by:
acquiring the movement speed of each node in the j-th non-first set according to the position information of that node in the j-th non-first set and in the (j-1)-th non-first set;
determining that the position information of each node in the j-th non-first set and the movement speed of that node are the first attribute of that node.
When acquiring the movement speed of the i-th node (i.e., any node) in the j-th non-first set of gesture skeleton data, the acquisition time interval between the j-th and (j-1)-th non-first sets is first obtained, the difference between the node's position in the j-th set and its position in the (j-1)-th set is computed, and dividing this difference by the acquisition time interval gives the movement speed of the i-th node. When the coordinate system of the gesture skeleton data is two-dimensional, the position of the i-th node in the j-th non-first set is the coordinate pair $(x_i^j, y_i^j)$ and in the (j-1)-th non-first set $(x_i^{j-1}, y_i^{j-1})$; the speed of the i-th node along the X axis is $v_{x,i}^j = (x_i^j - x_i^{j-1})/\Delta t$ and along the Y axis $v_{y,i}^j = (y_i^j - y_i^{j-1})/\Delta t$, so the movement speed of the i-th node is $v_i^j = (v_{x,i}^j, v_{y,i}^j)$. When the coordinate system is three-dimensional, the positions are $(x_i^j, y_i^j, z_i^j)$ and $(x_i^{j-1}, y_i^{j-1}, z_i^{j-1})$, the speed along the Z axis is $v_{z,i}^j = (z_i^j - z_i^{j-1})/\Delta t$, and the movement speed of the i-th node is $v_i^j = (v_{x,i}^j, v_{y,i}^j, v_{z,i}^j)$. Here $\Delta t$ is the acquisition time interval between two adjacent sets of gesture skeleton data; superscripts denote the set number of the non-first set (e.g., superscript j denotes the j-th non-first set), and subscripts denote the node number (subscript i denotes the i-th node), with the axis subscripts x, y, z of $v$ denoting the X, Y, and Z axes.
Note that when j = 1, the (j-1)-th non-first set (i.e., the zeroth non-first set) refers to the first set of gesture skeleton data.
Taking a three-dimensional coordinate system as an example, with $\Delta t$ the acquisition interval between adjacent sets, suppose 4 sets of gesture skeleton data are collected in succession; ordered by acquisition time they can be called the first set, the 1st non-first set, the 2nd non-first set, and the 3rd non-first set. For the i-th node of the first set (any node of that set), its position information $(x_i^0, y_i^0, z_i^0)$ and the preset movement speed are determined to be its first attribute, where superscript 0 denotes the first set. For the i-th node of the j-th non-first set (j = 1, 2, 3; any node of that set), its movement speed is computed as $v_i^j = \big((x_i^j - x_i^{j-1})/\Delta t,\ (y_i^j - y_i^{j-1})/\Delta t,\ (z_i^j - z_i^{j-1})/\Delta t\big)$, and its position information $(x_i^j, y_i^j, z_i^j)$ together with the movement speed $v_i^j$ are determined to be its first attribute.
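A minimal sketch of this first-attribute computation (illustrative only; the array shapes and the vectorized finite-difference form are assumptions):

```python
import numpy as np

def node_first_attributes(skeletons, dt, preset_speed=0.0):
    """skeletons: (N, 21, 3) array of N sets of 3-D joint positions.
    Returns (N, 21, 6): position concatenated with movement speed."""
    speeds = np.empty_like(skeletons)
    speeds[0] = preset_speed                            # first set: preset speed
    speeds[1:] = (skeletons[1:] - skeletons[:-1]) / dt  # difference over interval
    return np.concatenate([skeletons, speeds], axis=-1)
```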
Step 303: acquire the first attribute of each edge in the gesture skeleton data according to the position information of each pair of target nodes in the gesture skeleton data.
Each pair of target nodes refers to two adjacent nodes satisfying a preset condition, the two adjacent nodes being connected by one edge.
The preset condition above relates to the biological features of the gesture skeleton and may refer to nodes located at the two ends of one bone in the gesture skeleton. The edge between the two adjacent nodes above is the bone between them. As shown in FIG. 2a, node 1 and node 2 form a pair of target nodes, node 1 and node 3 also form a pair of target nodes, while node 2 and node 3 do not form a pair of target nodes.
Optionally, acquiring the first attribute of each edge in the gesture skeleton data according to the position information of each pair of target nodes includes:
acquiring the length of each edge in the gesture skeleton data and the rotation angle of that edge according to the position information of each pair of target nodes in the gesture skeleton data;
determining the length of each edge in the gesture skeleton data and the rotation angle of that edge to be the first attribute of that edge.
The position information of each pair of target nodes is the position information of each target node of the pair in the gesture skeleton data; for example, for the pair of target nodes formed by nodes 1 and 2 in FIG. 2a, the position information of the pair is the position information of node 1 and the position information of node 2. The rotation angle of each edge in the gesture skeleton data is the angle between that edge and each coordinate axis of the coordinate system.
Taking the t-th set of gesture skeleton data (if there is one set of gesture skeleton data, that set is the t-th set; if there are N sets, the t-th set is any one of the N sets) and a three-dimensional coordinate system as an example: for any pair of target nodes in the gesture skeleton data, denote the two target nodes of the pair as the i-th and g-th nodes, with position information $(x_i^t, y_i^t, z_i^t)$ and $(x_g^t, y_g^t, z_g^t)$. The length of the edge corresponding to this pair of target nodes (say the k-th edge) is
$L_k^t = \sqrt{(x_i^t - x_g^t)^2 + (y_i^t - y_g^t)^2 + (z_i^t - z_g^t)^2}$,
the angle between the edge and the X axis is $\alpha_k^t = \arccos\big((x_i^t - x_g^t)/L_k^t\big)$, the angle between the edge and the Y axis is $\beta_k^t = \arccos\big((y_i^t - y_g^t)/L_k^t\big)$, and the angle between the edge and the Z axis is $\gamma_k^t = \arccos\big((z_i^t - z_g^t)/L_k^t\big)$.
Step 304: acquire the first global attribute of the gesture skeleton data.
The first global attribute of the gesture skeleton data can be preset, for example set to zero. Optionally, the user may set the first global attribute of the gesture skeleton data as needed, which is not limited here.
Step 305: determine the first global attribute of the gesture skeleton data, the first attribute of each node, and the first attribute of each edge to be the hand attribute data corresponding to the target gesture.
As shown in FIG. 2a, one set of gesture skeleton data includes 21 nodes, among which there are 21 pairs of target nodes, so one set of gesture skeleton data includes 21 edges; the first global attribute of the set, the first attribute of each of the 21 nodes, and the first attribute of each of the 21 edges can be determined to be the hand attribute data corresponding to the target gesture.
Step 306: determine, according to the hand attribute data, the initial global attribute corresponding to the target gesture.
This step is the same as step 103; see the description of step 103 for details, which is not repeated here.
Optionally, determining the initial global attribute corresponding to the target gesture according to the hand attribute data includes:
inputting the first global attribute of the gesture skeleton data, the first attribute of each node, and the first attribute of each edge into a trained graph network for processing, the graph network outputting the initial global attribute corresponding to the target gesture.
A graph network is a neural network that operates and computes on graph data. In this embodiment the hand attribute data is input as graph data into the trained graph network for processing, yielding the initial global attribute of the target gesture that can reflect its gesture characteristics.
Before the graph network is used to process the hand attribute data, it must be trained. Supervised training can be used: the loss function is defined as the cross-entropy loss, gradients are computed with the backpropagation algorithm, and an optimizer trains the graph network, where the optimizer may be a commonly used one such as stochastic gradient descent (SGD), Adam, or Momentum; training is achieved by minimizing the loss function, and after the graph network is trained to convergence the model parameters are saved, yielding the trained graph network.
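A minimal training-loop sketch for this procedure (illustrative only; the stand-in model, feature sizes, and random data are assumptions; the original specifies only supervised training with a cross-entropy loss, backpropagation, and an optimizer such as SGD, Adam, or Momentum):

```python
import torch
import torch.nn as nn

# Hypothetical stand-in for the graph network: a small MLP over a flattened
# attribute vector so the loop runs end to end; the real model would be the
# three graph network blocks described below.
model = nn.Sequential(nn.Linear(128, 64), nn.ReLU(), nn.Linear(64, 14))
criterion = nn.CrossEntropyLoss()                      # cross-entropy loss
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

# Random placeholders standing in for (hand attribute data, gesture labels).
features = torch.randn(32, 128)
labels = torch.randint(0, 14, (32,))

for step in range(100):
    optimizer.zero_grad()
    loss = criterion(model(features), labels)
    loss.backward()                                    # backpropagation
    optimizer.step()

torch.save(model.state_dict(), "graph_net.pt")         # keep parameters at convergence
```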
Optionally, the graph network includes a first graph network block, a second graph network block, and a third graph network block, and inputting the first global attribute of the gesture skeleton data, the first attribute of each node, and the first attribute of each edge into the trained graph network for processing, the graph network outputting the initial global attribute corresponding to the target gesture, includes:
inputting the first global attribute of the gesture skeleton data, the first attribute of each node, and the first attribute of each edge into the first graph network block, which respectively updates them and outputs the second global attribute of the gesture skeleton data, the second attribute of each node, and the second attribute of each edge;
inputting the second global attribute of the gesture skeleton data, the second attribute of each node, and the second attribute of each edge into the second graph network block, which updates and aggregates them and outputs the third global attribute of the gesture skeleton data;
inputting the third global attribute of the gesture skeleton data into the third graph network block, which updates the third global attribute and outputs the initial global attribute corresponding to the target gesture.
Here, the second global attribute of the gesture skeleton data is the attribute obtained by updating the first global attribute with the first graph network block; the second attribute of each node is obtained by updating its first attribute with the first graph network block; and the second attribute of each edge is obtained by updating its first attribute with the first graph network block. The third global attribute of the gesture skeleton data is obtained by updating and aggregating the second global attribute, the second attributes of the nodes, and the second attributes of the edges with the second graph network block. The initial global attribute of the target gesture is obtained by updating the third global attribute with the third graph network block.
Optionally, the first graph network block includes a first attribute update layer, a first node update layer, and a first edge update layer, each of which is a fully connected layer or a convolutional layer. Inputting the first global attribute of the gesture skeleton data, the first attribute of each node, and the first attribute of each edge into the first graph network block, which respectively updates them and outputs the second global attribute, the second attribute of each node, and the second attribute of each edge, includes:
inputting the first global attribute of the gesture skeleton data into the first attribute update layer, updating it, and outputting the second global attribute of the gesture skeleton data;
inputting the first attribute of each node of the gesture skeleton data into the first node update layer, updating it, and outputting the second attribute of each node of the gesture skeleton data;
inputting the first attribute of each edge of the gesture skeleton data into the first edge update layer, updating it, and outputting the second attribute of each edge of the gesture skeleton data.
In this embodiment, the first attribute update layer, the first node update layer, and the first edge update layer of the first graph network block may be fully connected layers or convolutional layers. If the first attribute update layer is a fully connected layer, updating the first global attribute of the gesture skeleton data may mean multiplying the first global attribute by the weight matrix of the fully connected layer, the result being the second global attribute; if it is a convolutional layer, updating may mean convolving the first global attribute with the convolution kernel, the result being the second global attribute. The same holds for the first node update layer with respect to the first attribute of each node (yielding the second attribute of each node) and for the first edge update layer with respect to the first attribute of each edge (yielding the second attribute of each edge).
Optionally, the second graph network block includes a second attribute update layer, a second node update layer, a second edge update layer, a first aggregation layer, a second aggregation layer, and a third aggregation layer, the second attribute update layer, the second node update layer, and the second edge update layer being fully connected layers or convolutional layers. Inputting the second global attribute of the gesture skeleton data, the second attribute of each node, and the second attribute of each edge into the second graph network block, which updates and aggregates them and outputs the third global attribute of the gesture skeleton data, includes:
inputting the second attribute of each edge in the gesture skeleton data, the second attributes of the pair of target nodes corresponding to each edge, and the second global attribute of the gesture skeleton data into the second edge update layer, which updates the second attribute of each edge and outputs the third attribute of each edge in the gesture skeleton data;
inputting the third attributes of the edges to which each node in the gesture skeleton data belongs into the first aggregation layer, which aggregates them and outputs the node attribute corresponding to the edges of each node in the gesture skeleton data;
inputting the second attribute of each node in the gesture skeleton data, the node attribute corresponding to the edges of each node, and the second global attribute of the gesture skeleton data into the second node update layer, which updates the second attribute of each node and outputs the third attribute of each node in the gesture skeleton data;
inputting the third attributes of all nodes in the gesture skeleton data into the second aggregation layer, which aggregates them and outputs the global attribute corresponding to all nodes in the gesture skeleton data;
inputting the third attributes of all edges in the gesture skeleton data into the third aggregation layer, which aggregates them and outputs the global attribute corresponding to all edges in the gesture skeleton data;
inputting the second global attribute of the gesture skeleton data, the global attribute corresponding to all nodes, and the global attribute corresponding to all edges into the second attribute update layer, which updates the second global attribute and outputs the third global attribute of the gesture skeleton data.
Here, inputting the second attribute of each edge in the gesture skeleton data, the second attributes of the pair of target nodes corresponding to each edge, and the second global attribute of the gesture skeleton data into the second edge update layer, and having the second edge update layer update the second attribute of each edge, may specifically mean concatenating the second attribute of each edge, the second attributes of the pair of target nodes corresponding to the edge, and the second global attribute of the gesture skeleton data, and multiplying the concatenation result by the weight matrix of the fully connected layer or the convolution kernel of the convolutional layer to obtain the third attribute of each edge. Taking the k-th edge in the gesture skeleton data as an example of the attribute concatenation process: let the second attribute of the k-th edge be $e_k^2$, the second attributes of the two target nodes corresponding to the k-th edge be $v_i^2$ and $v_g^2$, and the second global attribute of the gesture skeleton data be $h_2$; concatenating the second attribute $e_k^2$ of the k-th edge, the second attributes $v_i^2$ and $v_g^2$ of its two target nodes, and the second global attribute $h_2$ gives the attribute $(e_k^2, v_i^2, v_g^2, h_2)$. Note that the concatenation order of the second attribute of each edge, the second attributes of the pair of target nodes corresponding to each edge, and the second global attribute of the gesture skeleton data is not limited here; for example, the concatenated attribute could also be $(v_i^2, v_g^2, e_k^2, h_2)$ or $(h_2, e_k^2, v_i^2, v_g^2)$.
The first aggregation layer is a summation layer. The third attributes of the edges to which each node belongs are input into the first aggregation layer, which sums them; the summation result is the node attribute corresponding to the edges of that node. For example, if the i-th node in the gesture skeleton data belongs to the k-th and n-th edges, whose third attributes are $e_k^3$ and $e_n^3$, the first aggregation layer sums $e_k^3$ and $e_n^3$, and the node attribute corresponding to the edges of the i-th node is $e_k^3 + e_n^3$, where the n-th edge corresponds to a pair of target nodes consisting of the i-th and d-th nodes.
Inputting the second attribute of each node in the gesture skeleton data, the node attribute corresponding to the edges of each node, and the second global attribute of the gesture skeleton data into the second node update layer, and having it update the second attribute of each node, may specifically mean concatenating the second attribute of each node, the node attribute corresponding to the edges of that node, and the second global attribute of the gesture skeleton data, and multiplying the concatenation result by the weight matrix of the fully connected layer or the convolution kernel of the convolutional layer to obtain the third attribute of each node. For the concatenation process, refer to the concatenation of the second attribute of each edge, the second attributes of the pair of target nodes corresponding to each edge, and the second global attribute described above, which is not repeated here.
The second aggregation layer is a summation layer: the third attributes of all nodes in the gesture skeleton data are input into it and summed, the summation result being the global attribute corresponding to all nodes.
The third aggregation layer is a summation layer: the third attributes of all edges in the gesture skeleton data are input into it and summed, the summation result being the global attribute corresponding to all edges.
Inputting the second global attribute of the gesture skeleton data, the global attribute corresponding to all nodes, and the global attribute corresponding to all edges into the second attribute update layer, and having it update the second global attribute, may specifically mean concatenating them and multiplying the concatenation result by the weight matrix of the fully connected layer or the convolution kernel of the convolutional layer to obtain the third global attribute of the gesture skeleton data. Again, for the concatenation process refer to the concatenation described above, which is not repeated here.
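A minimal sketch of one such update-and-aggregate pass (illustrative only; the use of simple linear layers, the feature width, and treating each bone as belonging to both of its endpoint joints are assumptions; only the update/aggregation structure follows the text):

```python
import torch
import torch.nn as nn

class SecondGraphBlock(nn.Module):
    """Edge update -> per-node edge aggregation -> node update ->
    node/edge aggregation -> global update, with summation aggregators."""

    def __init__(self, d):
        super().__init__()
        self.edge_update = nn.Linear(4 * d, d)    # [e, v_i, v_g, u] -> e'
        self.node_update = nn.Linear(3 * d, d)    # [v, agg_e, u]    -> v'
        self.global_update = nn.Linear(3 * d, d)  # [u, agg_v, agg_e] -> u'

    def forward(self, u, v, e, senders, receivers):
        # u: (d,) global, v: (num_nodes, d) nodes, e: (num_edges, d) edges.
        u_e = u.expand(e.size(0), -1)
        e3 = self.edge_update(torch.cat([e, v[senders], v[receivers], u_e], dim=-1))

        # First aggregation: sum third attributes of the edges each node belongs to
        # (a bone belongs to both of its endpoint joints).
        agg = torch.zeros_like(v)
        agg.index_add_(0, receivers, e3)
        agg.index_add_(0, senders, e3)
        u_v = u.expand(v.size(0), -1)
        v3 = self.node_update(torch.cat([v, agg, u_v], dim=-1))

        # Second and third aggregations: sum all node / all edge third attributes.
        return self.global_update(torch.cat([u, v3.sum(0), e3.sum(0)], dim=-1))

# Example usage with a 21-joint, 21-edge skeleton (indices are placeholders).
block = SecondGraphBlock(16)
u3 = block(torch.randn(16), torch.randn(21, 16), torch.randn(21, 16),
           torch.randint(0, 21, (21,)), torch.randint(0, 21, (21,)))
```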
Optionally, the third graph network block includes a third attribute update layer, which is a fully connected layer or a convolutional layer. Inputting the third global attribute of the gesture skeleton data into the third graph network block, which updates the third global attribute and outputs the initial global attribute of the gesture skeleton data, includes:
inputting the third global attribute of the gesture skeleton data into the third attribute update layer, which updates the third global attribute and outputs the initial global attribute corresponding to the target gesture.
Updating the third global attribute of the gesture skeleton data in the third attribute update layer may specifically mean multiplying the third global attribute by the weight matrix of the fully connected layer or the convolution kernel of the convolutional layer to obtain the initial global attribute corresponding to the target gesture.
Step 307: determine, according to the initial global attribute, the gesture type corresponding to the target gesture.
This step is the same as step 104; see the description of step 104 for details, which is not repeated here.
In this embodiment, before the initial global attribute is input into the classification model, it may first be input into an output layer, and the initial global attribute processed by the output layer is then input into the classification model for gesture recognition. The output layer may be a fully connected layer, so as to reduce the dimensionality of the initial global attribute and improve the robustness of the graph network.
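A minimal end-to-end sketch of this final stage (illustrative only; the dimensions and the linear classifier are assumptions): each per-set initial global attribute is reduced through the output layer, the results are concatenated into the target global attribute, and the concatenation is classified.

```python
import torch
import torch.nn as nn

T, d_in, d_out, num_classes = 8, 128, 32, 14

output_layer = nn.Linear(d_in, d_out)          # per-set dimensionality reduction
classifier = nn.Linear(T * d_out, num_classes)

# Hypothetical initial global attributes for T sets of skeleton data.
initial_globals = [torch.randn(d_in) for _ in range(T)]

processed = [output_layer(u) for u in initial_globals]
target_global = torch.cat(processed)           # concatenate -> target global attribute
gesture_type = classifier(target_global).argmax().item()
```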
FIG. 4 shows an example of the gesture recognition process with T (T an integer greater than 1) sets of gesture skeleton data. One set of gesture skeleton data corresponds to one graph network, so the T sets correspond to T graph networks, different sets corresponding to different graph networks, and each graph network includes a first graph network block, a second graph network block, and a third graph network block. Taking the first set of gesture skeleton data as an example: $u_1$ is its first global attribute, $V_1$ the first attribute of any of its nodes, and $E_1$ the first attribute of any of its edges. In the first graph network block, $\phi^u_1$ is the first attribute update layer, $\phi^v_1$ the first node update layer, and $\phi^e_1$ the first edge update layer; $u_1^2$ is the second global attribute of the first set, $v^2$ the second attribute of the node above, and $e^2$ the second attribute of the edge above. In the second graph network block, $\phi^u_2$ is the second attribute update layer, $\phi^v_2$ the second node update layer, $\phi^e_2$ the second edge update layer, $\rho^{e\to v}$ the first aggregation layer, $\rho^{v\to u}$ the second aggregation layer, and $\rho^{e\to u}$ the third aggregation layer; $u_1^3$ is the third global attribute of the first set. In the third graph network block, $\phi^u_3$ is the third attribute update layer and $u'_1$ the initial global attribute corresponding to the first set. The initial global attribute of the first set can first be input into the output layer ("output" in FIG. 4), which outputs the processed global attribute $\tilde{u}_1$. The T sets of gesture skeleton data correspond to T processed global attributes, and concatenating the T processed global attributes gives the target global attribute $u^* = [\tilde{u}_1, \tilde{u}_2, \dots, \tilde{u}_T]$. Inputting the target global attribute into the trained classification model yields the gesture type corresponding to the target gesture. Here, the processed global attribute of the t-th set of gesture skeleton data (any of the T sets) is $\tilde{u}_t = W_o\, u'_t$, where $u'_t$ is the initial global attribute corresponding to the t-th set and $W_o$ is the weight matrix of the output layer.
Taking a gesture skeleton dataset containing 14 gesture types and a gesture skeleton dataset containing 28 gesture types as examples, each set of gesture skeleton data in the dataset is input into the graph network of this embodiment and into existing neural networks for gesture recognition, and the gesture recognition accuracy of the graph network of this embodiment is computed and compared with that of the existing neural networks. Table 1 compares the gesture recognition accuracy of the graph network of this embodiment with that of existing neural networks, where Parallel GNN denotes the graph network of this embodiment; as Table 1 shows, the gesture recognition accuracy of the graph network of this embodiment exceeds that of the existing neural networks.
Table 1 (accuracy comparison; the table is reproduced as an image in the original and its figures are not recoverable here)
FIG. 5a shows the confusion matrix for gesture classification on the gesture skeleton dataset containing 14 gesture types. As FIG. 5a shows, among the 14 gesture types, 9 types such as clockwise rotation, counterclockwise rotation, swipe right, and shake reach 100% recognition accuracy, and 3 types such as tap, open, and swipe down reach 90%, demonstrating the effectiveness of the graph network of this embodiment for recognizing common gesture types. FIG. 5b shows the confusion matrix for gesture classification on the gesture skeleton dataset containing 28 gesture types. As FIG. 5b shows, among the 28 gesture types, the graph network of this embodiment still reaches 100% recognition accuracy on 18 gesture types and 80% on 5 types, showing that it can still recognize gestures fairly accurately even when the similarity between gesture types increases further. The 28 gesture types in FIG. 5b are a further refinement of the 14 types in FIG. 5a: each gesture type in FIG. 5a is split into two according to how the gesture is performed, distinguished by adding suffixes 1 and 2 after the gesture type, where suffix 1 means the gesture is performed with one finger and suffix 2 means it is performed with the whole hand. For example, "grab 1" in FIG. 5b is a grab gesture performed with one finger, and "grab 2" a grab gesture performed with the whole hand.
By using a graph network for gesture recognition of the target gesture, this embodiment can fuse the node attributes and edge attributes of the gesture skeleton data corresponding to the target gesture into the global attribute of the gesture skeleton data, obtaining a target global attribute that can reflect the motion characteristics of the gesture; the gesture type corresponding to the target gesture can then be recognized from this target global attribute. Compared with existing neural networks, this improves the accuracy of gesture recognition.
Those skilled in the art can clearly understand that, for convenience and brevity of description, the division into the functional units and modules above is used only as an example; in practical applications, the above functions may be allocated to different functional units or modules as needed, i.e., the internal structure of the device may be divided into different functional units or modules to complete all or part of the functions described above. The functional units and modules in the embodiments may be integrated into one processing unit, each unit may exist alone physically, or two or more units may be integrated into one unit; the integrated unit may be implemented in the form of hardware or in the form of a software functional unit. In addition, the specific names of the functional units and modules are only for ease of mutual distinction and are not used to limit the protection scope of this application. For the specific working processes of the units and modules in the above system, reference may be made to the corresponding processes in the foregoing method embodiments, which are not repeated here.
In the above embodiments, each embodiment is described with its own emphasis; for parts not detailed or recorded in one embodiment, reference may be made to the related descriptions of other embodiments.
Those of ordinary skill in the art will appreciate that the units and algorithm steps of the examples described in the embodiments disclosed herein can be implemented by electronic hardware or by a combination of computer software and electronic hardware. Whether these functions are executed in hardware or software depends on the specific application and design constraints of the technical solution. Skilled artisans may implement the described functions differently for each particular application, but such implementations should not be considered beyond the scope of this application.
The above embodiments are only intended to illustrate the technical solutions of this application, not to limit them. Although this application has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art should understand that the technical solutions recorded in the foregoing embodiments can still be modified, or some of their technical features can be equivalently replaced; such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of this application, and shall all be included within the protection scope of this application.

Claims (14)

  1. A gesture recognition method, characterized in that the gesture recognition method comprises:
    acquiring gesture skeleton data corresponding to a target gesture;
    determining, according to the gesture skeleton data, hand attribute data corresponding to the target gesture, wherein the hand attribute data is used to reflect joint point characteristics and bone characteristics of the target gesture;
    determining, according to the hand attribute data, an initial global attribute corresponding to the target gesture, wherein the initial global attribute is used to reflect gesture characteristics of the target gesture;
    determining, according to the initial global attribute, a gesture type corresponding to the target gesture.
  2. The gesture recognition method according to claim 1, characterized in that the gesture skeleton data comprises position information of at least two nodes, the at least two nodes each being joint points of the gesture skeleton corresponding to the gesture skeleton data, and determining, according to the gesture skeleton data, the hand attribute data corresponding to the target gesture comprises:
    acquiring the first attribute of each node in the gesture skeleton data according to the position information of each node in the gesture skeleton data;
    acquiring the first attribute of each edge in the gesture skeleton data according to the position information of each pair of target nodes in the gesture skeleton data, wherein each pair of target nodes refers to two adjacent nodes satisfying a preset condition, the two adjacent nodes being connected by one edge;
    acquiring the first global attribute of the gesture skeleton data;
    determining the first global attribute of the gesture skeleton data, the first attribute of each node, and the first attribute of each edge to be the hand attribute data corresponding to the target gesture.
  3. The gesture recognition method according to claim 2, characterized in that the target gesture corresponds to N sets of gesture skeleton data, N is an integer greater than 1, the order of the N sets of gesture skeleton data is determined according to the movement sequence of the target gesture, and the N sets of gesture skeleton data comprise a first set of gesture skeleton data and N-1 non-first sets of gesture skeleton data;
    for the first set of gesture skeleton data, acquiring the first attribute of each node in the gesture skeleton data according to the position information of each node in the gesture skeleton data comprises:
    determining that the position information of each node in the first set of gesture skeleton data and a preset movement speed are the first attribute of that node;
    for the N-1 non-first sets of gesture skeleton data, acquiring the first attribute of each node in the gesture skeleton data according to the position information of each node in the gesture skeleton data comprises:
    acquiring the first attribute of each node contained in each of the N-1 non-first sets of gesture skeleton data according to the position information of each node contained in each of the N-1 non-first sets of gesture skeleton data;
    wherein the j-th non-first set of gesture skeleton data is any one of the N-1 non-first sets of gesture skeleton data, j is a positive integer less than or equal to N-1, and the first attribute of each node in the j-th non-first set of gesture skeleton data is determined by:
    acquiring the movement speed of each node in the j-th non-first set of gesture skeleton data according to the position information of that node in the j-th non-first set and its position information in the (j-1)-th non-first set of gesture skeleton data;
    determining that the position information of each node in the j-th non-first set of gesture skeleton data and the movement speed of that node are the first attribute of that node.
  4. The gesture recognition method according to claim 2, characterized in that acquiring the first attribute of each edge in the gesture skeleton data according to the position information of each pair of target nodes in the gesture skeleton data comprises:
    acquiring the length of each edge in the gesture skeleton data and the rotation angle of that edge according to the position information of each pair of target nodes in the gesture skeleton data;
    determining the length of each edge in the gesture skeleton data and the rotation angle of that edge to be the first attribute of that edge.
  5. The gesture recognition method according to claim 2, characterized in that determining, according to the hand attribute data, the initial global attribute corresponding to the target gesture comprises:
    inputting the first global attribute of the gesture skeleton data, the first attribute of each node, and the first attribute of each edge into a trained graph network for processing, the graph network outputting the initial global attribute corresponding to the target gesture.
  6. The gesture recognition method according to claim 5, characterized in that the graph network comprises a first graph network block, a second graph network block, and a third graph network block, and inputting the first global attribute of the gesture skeleton data, the first attribute of each node, and the first attribute of each edge into the trained graph network for processing, the graph network outputting the initial global attribute corresponding to the target gesture, comprises:
    inputting the first global attribute of the gesture skeleton data, the first attribute of each node, and the first attribute of each edge into the first graph network block, the first graph network block respectively updating the first global attribute of the gesture skeleton data, the first attribute of each node, and the first attribute of each edge, and outputting the second global attribute of the gesture skeleton data, the second attribute of each node, and the second attribute of each edge;
    inputting the second global attribute of the gesture skeleton data, the second attribute of each node, and the second attribute of each edge into the second graph network block, the second graph network block updating and aggregating the second global attribute of the gesture skeleton data, the second attribute of each node, and the second attribute of each edge, and outputting the third global attribute of the gesture skeleton data;
    inputting the third global attribute of the gesture skeleton data into the third graph network block, the third graph network block updating the third global attribute and outputting the initial global attribute corresponding to the target gesture.
  7. The gesture recognition method according to claim 6, characterized in that the first graph network block comprises a first attribute update layer, a first node update layer, and a first edge update layer, the first attribute update layer, the first node update layer, and the first edge update layer being fully connected layers or convolutional layers, and inputting the first global attribute of the gesture skeleton data, the first attribute of each node, and the first attribute of each edge into the first graph network block, the first graph network block respectively updating them and outputting the second global attribute of the gesture skeleton data, the second attribute of each node, and the second attribute of each edge, comprises:
    inputting the first global attribute of the gesture skeleton data into the first attribute update layer, updating the first global attribute of the gesture skeleton data, and outputting the second global attribute of the gesture skeleton data;
    inputting the first attribute of each node of the gesture skeleton data into the first node update layer, updating the first attribute of each node of the gesture skeleton data, and outputting the second attribute of each node of the gesture skeleton data;
    inputting the first attribute of each edge of the gesture skeleton data into the first edge update layer, updating the first attribute of each edge of the gesture skeleton data, and outputting the second attribute of each edge of the gesture skeleton data.
  8. The gesture recognition method according to claim 6, characterized in that the second graph network block comprises a second attribute update layer, a second node update layer, a second edge update layer, a first aggregation layer, a second aggregation layer, and a third aggregation layer, the second attribute update layer, the second node update layer, and the second edge update layer being fully connected layers or convolutional layers, and inputting the second global attribute of the gesture skeleton data, the second attribute of each node, and the second attribute of each edge into the second graph network block, the second graph network block updating and aggregating them and outputting the third global attribute of the gesture skeleton data, comprises:
    inputting the second attribute of each edge in the gesture skeleton data, the second attributes of the pair of target nodes corresponding to each edge, and the second global attribute of the gesture skeleton data into the second edge update layer, the second edge update layer updating the second attribute of each edge in the gesture skeleton data and outputting the third attribute of each edge in the gesture skeleton data;
    inputting the third attributes of the edges to which each node in the gesture skeleton data belongs into the first aggregation layer, the first aggregation layer aggregating the third attributes of the edges to which each node belongs and outputting the node attribute corresponding to the edges of each node in the gesture skeleton data;
    inputting the second attribute of each node in the gesture skeleton data, the node attribute corresponding to the edges of each node, and the second global attribute of the gesture skeleton data into the second node update layer, the second node update layer updating the second attribute of each node in the gesture skeleton data and outputting the third attribute of each node in the gesture skeleton data;
    inputting the third attributes of all nodes in the gesture skeleton data into the second aggregation layer, the second aggregation layer aggregating the third attributes of all nodes in the gesture skeleton data and outputting the global attribute corresponding to all nodes in the gesture skeleton data;
    inputting the third attributes of all edges in the gesture skeleton data into the third aggregation layer, the third aggregation layer aggregating the third attributes of all edges in the gesture skeleton data and outputting the global attribute corresponding to all edges in the gesture skeleton data;
    inputting the second global attribute of the gesture skeleton data, the global attribute corresponding to all nodes, and the global attribute corresponding to all edges into the second attribute update layer, the second attribute update layer updating the second global attribute of the gesture skeleton data and outputting the third global attribute of the gesture skeleton data.
  9. The gesture recognition method according to claim 6, characterized in that the third graph network block comprises a third attribute update layer, the third attribute update layer being a fully connected layer or a convolutional layer, and inputting the third global attribute of the gesture skeleton data into the third graph network block, the third graph network block updating the third global attribute and outputting the initial global attribute of the gesture skeleton data, comprises:
    inputting the third global attribute of the gesture skeleton data into the third attribute update layer, the third attribute update layer updating the third global attribute and outputting the initial global attribute corresponding to the target gesture.
  10. The gesture recognition method according to claim 1, characterized in that, before acquiring the gesture skeleton data corresponding to the target gesture, the method further comprises:
    acquiring one frame of gesture image or N consecutive frames of gesture images, wherein N is an integer greater than 1;
    correspondingly, acquiring the gesture skeleton data corresponding to the target gesture comprises:
    acquiring the gesture skeleton data corresponding to the target gesture according to the one frame of gesture image or the N consecutive frames of gesture images.
  11. The gesture recognition method according to any one of claims 1 to 10, characterized in that, when the number of initial global attributes is one, determining, according to the initial global attribute, the gesture type corresponding to the target gesture comprises:
    inputting the initial global attribute into a trained classification model, and recognizing the gesture type corresponding to the target gesture through the classification model;
    when the number of initial global attributes is at least two, determining, according to the initial global attribute, the gesture type corresponding to the target gesture comprises:
    determining a target global attribute according to at least two of the initial global attributes, wherein the target global attribute is used to reflect motion characteristics of the target gesture;
    determining, according to the target global attribute, the gesture type corresponding to the target gesture.
  12. The gesture recognition method according to claim 11, characterized in that determining, according to the target global attribute, the gesture type corresponding to the target gesture comprises:
    inputting the target global attribute into the classification model, and recognizing the gesture type corresponding to the target gesture through the classification model.
  13. A terminal device, comprising a memory, a processor, and a computer program stored in the memory and runnable on the processor, characterized in that, when the processor executes the computer program, the steps of the gesture recognition method according to any one of claims 1 to 12 are implemented.
  14. A computer-readable storage medium storing a computer program, characterized in that, when the computer program is executed by a processor, the steps of the gesture recognition method according to any one of claims 1 to 12 are implemented.
PCT/CN2020/130575 2020-04-26 2020-11-20 Gesture recognition method, terminal device and computer-readable storage medium WO2021218126A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202010337876.0 2020-04-26
CN202010337876.0A CN113553884B (zh) 2020-04-26 Gesture recognition method, terminal device and computer-readable storage medium

Publications (1)

Publication Number Publication Date
WO2021218126A1 true WO2021218126A1 (zh) 2021-11-04

Family

ID=78129797

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/130575 WO2021218126A1 (zh) 2020-04-26 2020-11-20 Gesture recognition method, terminal device and computer-readable storage medium

Country Status (2)

Country Link
CN (1) CN113553884B (zh)
WO (1) WO2021218126A1 (zh)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
TWI798038B (zh) * 2022-03-30 2023-04-01 國立勤益科技大學 Human-machine interface device with hand gesture control

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109634415A (zh) * 2018-12-11 2019-04-16 哈尔滨拓博科技有限公司 Gesture recognition control method for controlling analog quantities
CN109753876A (zh) * 2018-12-03 2019-05-14 西北工业大学 Method for extracting and recognizing three-dimensional gestures and constructing a three-dimensional gesture interaction system
CN109902583A (zh) * 2019-01-28 2019-06-18 电子科技大学 Skeleton gesture recognition method based on a bidirectional independent recurrent neural network
CN110390305A (zh) * 2019-07-25 2019-10-29 广东工业大学 Gesture recognition method and device based on a graph convolutional neural network
US20200012946A1 (en) * 2018-07-06 2020-01-09 Tactual Labs Co. Delimitation in unsupervised classification of gestures
CN110895683A (zh) * 2019-10-15 2020-03-20 西安理工大学 Kinect-based single-viewpoint gesture and posture recognition method
CN110991319A (zh) * 2019-11-29 2020-04-10 广州市百果园信息技术有限公司 Hand keypoint detection method, gesture recognition method, and related devices

Family Cites Families (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102142055A (zh) * 2011-04-07 2011-08-03 上海大学 True three-dimensional design method based on augmented-reality interaction technology
KR20170024636A (ko) * 2015-08-25 2017-03-08 (의료)길의료재단 Exercise assistance system using motion recognition
CN106125925B (zh) * 2016-06-20 2019-05-14 华南理工大学 Intelligent grasping method based on gesture and voice control
CN106326881B (zh) * 2016-09-21 2024-02-02 济南超感智能科技有限公司 Gesture recognition method and device for realizing human-computer interaction
US10296102B1 (en) * 2018-01-31 2019-05-21 Piccolo Labs Inc. Gesture and motion recognition using skeleton tracking
CN108664877A (zh) * 2018-03-09 2018-10-16 北京理工大学 Dynamic gesture recognition method based on three-dimensional depth data
CN108549490A (zh) * 2018-05-03 2018-09-18 林潼 Gesture recognition interaction method based on a Leap Motion device
CN110163045A (zh) * 2018-06-07 2019-08-23 腾讯科技(深圳)有限公司 Gesture action recognition method, apparatus, and device
CN109993073B (zh) * 2019-03-14 2021-07-02 北京工业大学 Complex dynamic gesture recognition method based on Leap Motion
CN110837778B (zh) * 2019-10-12 2023-08-18 南京信息工程大学 Traffic police command gesture recognition method based on skeleton joint point sequences

Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20200012946A1 (en) * 2018-07-06 2020-01-09 Tactual Labs Co. Delimitation in unsupervised classification of gestures
CN109753876A (zh) * 2018-12-03 2019-05-14 西北工业大学 Method for extracting and recognizing three-dimensional gestures and constructing a three-dimensional gesture interaction system
CN109634415A (zh) * 2018-12-11 2019-04-16 哈尔滨拓博科技有限公司 Gesture recognition control method for controlling analog quantities
CN109902583A (zh) * 2019-01-28 2019-06-18 电子科技大学 Skeleton gesture recognition method based on a bidirectional independent recurrent neural network
CN110390305A (zh) * 2019-07-25 2019-10-29 广东工业大学 Gesture recognition method and device based on a graph convolutional neural network
CN110895683A (zh) * 2019-10-15 2020-03-20 西安理工大学 Kinect-based single-viewpoint gesture and posture recognition method
CN110991319A (zh) * 2019-11-29 2020-04-10 广州市百果园信息技术有限公司 Hand keypoint detection method, gesture recognition method, and related devices

Also Published As

Publication number Publication date
CN113553884B (zh) 2023-04-18
CN113553884A (zh) 2021-10-26


Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application (Ref document number: 20933922; Country of ref document: EP; Kind code of ref document: A1)
NENP Non-entry into the national phase (Ref country code: DE)
122 Ep: pct application non-entry in european phase (Ref document number: 20933922; Country of ref document: EP; Kind code of ref document: A1)