CN112131979A

CN112131979A - Continuous action identification method based on human skeleton information

Info

Publication number: CN112131979A
Application number: CN202010941604.1A
Authority: CN
Inventors: 黎张子康; 周小舟
Original assignee: Southeast University
Current assignee: Southeast University
Priority date: 2020-09-09
Filing date: 2020-09-09
Publication date: 2020-12-25

Abstract

The invention discloses a continuous action recognition method based on human skeleton information, comprising the following steps: (1) extracting the skeleton of a human body to obtain position information for a plurality of nodes of the human body; (2) distinguishing key nodes from irrelevant nodes in the action, and taking the distance information of the key nodes over the continuous action as the observation feature sequence for action recognition; (3) normalizing the observation feature sequence; (4) computing the probability of the observation sequence corresponding to the continuous action using a trained HMM model; and (5) taking a weighted average of the resulting probabilities according to the weight each node carries in each action, and outputting the final result. The method reduces the influence of irrelevant-node information on action recognition, applies a reasoned weighted average to the probabilities computed from the different node sequences, reduces the amount of computation, improves recognition accuracy, and improves the user experience.

Description

Continuous action identification method based on human skeleton information

Technical Field

The invention relates to the technical field of human body action recognition, in particular to a continuous action recognition method based on human body skeleton information.

Background

At present, human body action recognition is a popular research topic in computer vision and is applied in fields such as machine learning, image processing and computer vision.

Many devices can now acquire human skeleton data, for example RGB-D depth somatosensory cameras such as Kinect and RealSense, which mainly comprise a microphone array, an infrared emitter and an infrared receiver for depth imaging, and an RGB camera. The depth somatosensory camera separates the human body from the background using a segmentation strategy, then feeds the separated body image into body-part recognition models that were trained on a cluster system with terabytes of data, producing a human skeleton model with up to 32 joint points. Skeleton data are output at about 30 frames/s, and the skeletons of up to 6 people can be recognized.

The recognition method of the invention is based on human skeleton data. In existing body action recognition algorithms, researchers mostly use the directions or distances of many nodes, acquired simultaneously, as recognition features. This covers as much action information as possible in one sequence, but for a specific action it has certain problems: not every node changes much in position or direction during the action, the relative amplitude of change of each node differs from action to action, and recognizing directly on all the information increases the influence of irrelevant-node information on the recognition rate.

Disclosure of Invention

To solve these problems, the invention discloses a continuous action recognition method based on human skeleton information, which reduces the influence of irrelevant-node information on action recognition, normalizes the distance features, reduces the amount of computation, improves recognition stability, and improves the user experience.

The invention relates to a continuous action identification method based on human skeleton information, which comprises the following steps:

(1) extracting a skeleton of a human body to obtain position information of a plurality of nodes of the human body;

(2) distinguishing key nodes from irrelevant nodes in the action, and taking the distance information of the key nodes over the continuous action as the observation feature sequence for action recognition;

(3) normalizing the observed value characteristic sequence;

(4) performing probability calculation on the observation sequence corresponding to the continuous action by adopting an HMM model;

(5) taking a weighted average of the resulting probabilities according to the weight each node carries in each action, and outputting the final result.

The invention further improves that:

in the step (1), the plurality of nodes of the body include the neck, spine, chest, left shoulder, left elbow, left wrist, right shoulder, right elbow and right wrist (NECK, SPINE_NAVAL, SPINE_CHEST, SHOULDER_LEFT, ELBOW_LEFT, WRIST_LEFT, SHOULDER_RIGHT, ELBOW_RIGHT, WRIST_RIGHT).

The invention further improves that:

in the step (2), the spatial position of an acquired node may not change much, meaning that the node is not performing an obvious intentional action. Collection of the node's distance data sequence therefore starts only when the distance between two successively recorded spatial positions of the node is greater than the distance threshold, and collection of the node's ratio sequence stops when that distance falls below the distance threshold. For a node without obvious change, the sequence is empty and the output probability of the model is 0. In addition, before recognition starts, certain nodes can be manually excluded from data collection for certain actions, and their output probability is directly set to 0.

The invention further improves that:

in the step (3), the observation feature sequence for action recognition consists of six ratios, each dividing the distance from an upper-limb node to the neck node by the distance from the chest node to the spine node: O1 for the left shoulder node, O2 for the right shoulder node, O3 for the left elbow node, O4 for the right elbow node, O5 for the left wrist node, and O6 for the right wrist node. The data for a single static frame is O = {O1, O2, O3, O4, O5, O6}.
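The six ratios can be restated in symbols. The notation below is introduced here for illustration and is not taken from the source: p_k denotes the 3D position of node k, the distance is assumed to be Euclidean, and k_1, ..., k_6 stand for the left shoulder, right shoulder, left elbow, right elbow, left wrist and right wrist.

```latex
O_i = \frac{\lVert p_{k_i} - p_{\mathrm{neck}} \rVert}
           {\lVert p_{\mathrm{chest}} - p_{\mathrm{spine}} \rVert},
\qquad i = 1, \dots, 6,
\qquad O = \{O_1, O_2, O_3, O_4, O_5, O_6\}.
```

Under this reading, scaling every position by a factor s > 0 multiplies the numerator and the denominator by the same s, so each O_i is unchanged, which is the sense in which the ratio compensates for differences in body size.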

The invention further improves that:

in the step (4), when the length of the ratio sequence is greater than the length threshold λ, the forward-algorithm output probability of the node ratio sequence is calculated under each action model. The HMM models are trained iteratively with the Baum-Welch algorithm on a training database; each key node has one HMM model for each action, so n actions yield 6n output probabilities.
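For reference, the standard forward recursion that yields this output probability can be written as follows. The symbols are the usual textbook ones and are assumptions with respect to the source: M = (π, A, B) is one trained model with N hidden states, a_ij are transition probabilities, b_j(o) are emission probabilities, and o_1, ..., o_T is the observation sequence. M is used for the model to avoid a clash with the length threshold λ.

```latex
\alpha_1(i) = \pi_i \, b_i(o_1), \qquad 1 \le i \le N,
\\
\alpha_{t+1}(j) = \Bigl[ \sum_{i=1}^{N} \alpha_t(i) \, a_{ij} \Bigr] b_j(o_{t+1}),
\qquad 1 \le t \le T - 1,
\\
P(o_1, \dots, o_T \mid M) = \sum_{i=1}^{N} \alpha_T(i).
```

The source applies this computation only when T is greater than the length threshold λ.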

The invention further improves that:

in the step (5), the output probabilities of the key nodes under each action model are weighted and averaged according to the proportion each key node carries in that action, so that each action obtains one total probability. The total probabilities are compared to find the maximum. If the maximum is too small, the action made by the user does not belong to any existing action in the database; that is, when the maximum probability is smaller than the threshold corresponding to that action, the action is judged to be an undefined action. Otherwise, the action corresponding to the maximum probability is the final recognition result.
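One way to write this decision rule in symbols is given below. The notation is hypothetical and not from the source: p_{a,k} is the output probability of key node k under the model of action a, w_{a,k} ≥ 0 is the weight of node k in action a, and θ_a is the threshold stored for action a. Normalizing by the sum of the weights is the ordinary definition of a weighted average; the source does not spell the normalization out.

```latex
P_a = \frac{\sum_{k=1}^{6} w_{a,k} \, p_{a,k}}{\sum_{k=1}^{6} w_{a,k}},
\qquad
a^{*} = \arg\max_{a} P_a,
\qquad
\text{result} =
\begin{cases}
a^{*} & \text{if } P_{a^{*}} \ge \theta_{a^{*}}, \\
\text{undefined action} & \text{otherwise.}
\end{cases}
```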

Advantageous effects: compared with the prior art, the feature extraction method for action recognition extracts only the key node data related to the action, and the output probability of irrelevant nodes is 0, which greatly reduces the amount of computation, reduces the influence of irrelevant quantities on the recognition rate, and improves the user experience. The feature distances are normalized to account for differences in body build between people, which improves recognition stability. When the final output probability is calculated, the probabilities obtained from the key node sequences are weighted and averaged, which improves recognition accuracy.

Drawings

FIG. 1 is a Kinect depth camera coordinate system according to an embodiment of the present invention;

FIG. 2 shows a human joint position and connection mode obtained by Kinect according to an embodiment of the present invention;

FIG. 3 is a graph of node distance ratios according to the present invention;

FIG. 4 is a diagram of HMM model parameters according to the present invention;

FIG. 5 is a flow chart of a continuous action recognition method based on human skeleton information according to the present invention.

Detailed Description

The present invention will be further illustrated with reference to the accompanying drawings and specific embodiments, which are to be understood as merely illustrative of the invention and not as limiting the scope of the invention. It should be noted that the terms "front," "back," "left," "right," "upper" and "lower" used in the following description refer to directions in the drawings, and the terms "inner" and "outer" refer to directions toward and away from, respectively, the geometric center of a particular component.

Step 1, extracting a skeleton of a human body to obtain position information of a plurality of nodes of the human body;

As shown in FIG. 1 and FIG. 2, a Kinect device is used in this example to obtain the position information of the body skeleton nodes: neck, spine, chest, left shoulder, left elbow, left wrist, right shoulder, right elbow and right wrist (NECK, SPINE_NAVAL, SPINE_CHEST, SHOULDER_LEFT, ELBOW_LEFT, WRIST_LEFT, SHOULDER_RIGHT, ELBOW_RIGHT, WRIST_RIGHT).

Step 2, distinguishing key nodes from irrelevant nodes in the action, and taking the distance information of the key nodes over the continuous action as the observation feature sequence for action recognition;

The spatial position of an acquired node may not change much, meaning that the node is not performing an obvious intentional action. Collection of the node's distance data sequence therefore starts only when the distance between two successively recorded spatial positions of the node is greater than the distance threshold, and collection of the node's ratio sequence stops when that distance falls below the distance threshold. For the right wrist, training shows that a distance threshold of 10 best captures the start and the end of an action. For nodes without obvious changes, the sequence is empty and the model output probability is 0. In addition, before recognition starts, certain nodes can be manually excluded from data collection for certain actions, and their output probability is directly set to 0.
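The start and stop rule described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the function name `segment_motion` is hypothetical, positions are taken to be per-frame (x, y, z) tuples for a single node, the distance is assumed to be Euclidean, and a step exactly equal to the threshold is treated as "not moving" (the source does not say how equality is handled, nor does it give the unit of the threshold value 10).

```python
import math


def segment_motion(positions, distance_threshold=10.0):
    """Return inclusive (start, end) frame-index pairs where one node is moving.

    positions: list of (x, y, z) tuples, one per recorded frame.
    Collection starts when the distance between two successive frames
    exceeds distance_threshold and stops when it no longer does.
    """
    segments = []
    start = None  # index of the first frame of the segment being collected
    for i in range(1, len(positions)):
        step = math.dist(positions[i - 1], positions[i])
        if step > distance_threshold:
            if start is None:
                start = i - 1  # movement begins: start collecting
        elif start is not None:
            segments.append((start, i - 1))  # movement ended at frame i - 1
            start = None
    if start is not None:
        # The recording ended while the node was still moving.
        segments.append((start, len(positions) - 1))
    return segments
```

A node that never moves by more than the threshold yields an empty list, which corresponds to the empty sequence whose output probability is set to 0.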

Step 3, normalizing the observed value characteristic sequence;

As shown in FIG. 3, the observation feature sequence for action recognition consists of six ratios, each dividing the distance from an upper-limb node to the neck node by the distance from the chest node to the spine node: O1 for the left shoulder node, O2 for the right shoulder node, O3 for the left elbow node, O4 for the right elbow node, O5 for the left wrist node, and O6 for the right wrist node. The data for a single static frame is O = {O1, O2, O3, O4, O5, O6}.

During action recognition, bone lengths differ greatly between users, so the observation feature sequence used for gesture recognition must adapt to these differences while preserving as much of the original information as possible, so that recognition accuracy does not suffer from information loss.

In terms of adaptability to different users, the normalized distances between the neck node and the shoulder, elbow and wrist nodes adapt well to the upper limbs of different users. Bone node distances differ from person to person, but the ratios obtained by dividing by the spine-to-chest distance differ very little, which makes them well suited as the observation feature sequence for action recognition. As shown in FIG. 2, the neck-to-wrist distance varies from person to person and is not suitable as a recognition feature by itself, so it must be normalized: the neck-to-wrist distance is divided by the chest-to-spine distance to give the wrist ratio for a given frame. This ratio differs very little between different people performing the same action, so it can serve as an observation feature, and the ratios obtained over the whole action form the feature sequence of the wrist.
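The per-frame normalization can be sketched as below. This is an illustrative sketch only: the function name `frame_ratios`, the dictionary input format and the use of Euclidean distance are assumptions, while the joint names follow the identifiers listed in Step 1.

```python
import math

# Order matches O1..O6: left/right shoulder, left/right elbow, left/right wrist.
UPPER_LIMB_NODES = [
    "SHOULDER_LEFT", "SHOULDER_RIGHT",
    "ELBOW_LEFT", "ELBOW_RIGHT",
    "WRIST_LEFT", "WRIST_RIGHT",
]


def frame_ratios(joints):
    """Return [O1, ..., O6] for one frame.

    joints: dict mapping a joint name to its (x, y, z) position.
    Each ratio is (node-to-neck distance) / (chest-to-spine distance).
    """
    torso = math.dist(joints["SPINE_CHEST"], joints["SPINE_NAVAL"])
    if torso == 0:
        raise ValueError("chest and spine nodes coincide; cannot normalize")
    neck = joints["NECK"]
    return [math.dist(joints[name], neck) / torso for name in UPPER_LIMB_NODES]
```

Because both distances scale with body size, two skeletons that differ only by a uniform scale factor produce the same six ratios, which is the property the paragraph above relies on.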

Step 4, performing probability calculation on the observation sequence corresponding to the continuous action by adopting an HMM model;

For a feature sequence input to the model, the sequence length reflects the extent of the action's displacement. A displacement that is too short may be an accidentally triggered gesture, and its probability value is not meaningful. The forward-algorithm output probability of the node ratio sequence under each action model is therefore calculated only when the length of the ratio sequence is greater than a set threshold λ. Training found that input sequences are most informative when the length threshold λ is 70% of the average sequence length. For each action, each key node has one HMM model; each HMM model is trained iteratively with the Baum-Welch algorithm on a training database, and each model outputs one probability, so defining n actions yields 6n output probabilities.
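The length gate and the forward-algorithm evaluation can be sketched as follows. Several points are assumptions rather than statements from the source: the emission model is taken to be discrete (the continuous ratios would first have to be quantized into symbol indices, or a continuous emission model used instead), the names `forward_log_prob` and `node_probability` are hypothetical, and returning 0.0 for a sequence that is too short is one possible convention. Training with Baum-Welch is not shown; the model parameters are assumed to be already trained.

```python
import math


def forward_log_prob(obs, pi, A, B):
    """Scaled forward algorithm for a discrete HMM; returns log P(obs | model).

    obs: non-empty list of symbol indices.
    pi[i]: initial probability of state i.
    A[i][j]: transition probability from state i to state j.
    B[i][k]: probability that state i emits symbol k.
    """
    if not obs:
        raise ValueError("observation sequence must be non-empty")
    n = len(pi)
    log_prob = 0.0
    alpha = None
    for t, symbol in enumerate(obs):
        if t == 0:
            alpha = [pi[i] * B[i][symbol] for i in range(n)]
        else:
            alpha = [
                sum(alpha[i] * A[i][j] for i in range(n)) * B[j][symbol]
                for j in range(n)
            ]
        scale = sum(alpha)
        if scale == 0.0:
            return float("-inf")  # sequence is impossible under this model
        alpha = [a / scale for a in alpha]  # rescale to avoid underflow
        log_prob += math.log(scale)
    return log_prob


def node_probability(obs, model, length_threshold):
    """Return P(obs | model), or 0.0 if len(obs) is not above the threshold."""
    if len(obs) <= length_threshold:
        return 0.0  # assumed convention for sequences judged too short
    pi, A, B = model
    return math.exp(forward_log_prob(obs, pi, A, B))
```

With n actions and six key nodes, `node_probability` would be called once per (action, node) pair, giving the 6n output probabilities mentioned above. The rescaling step keeps long sequences from underflowing; for very long sequences it may be preferable to compare log probabilities directly rather than exponentiating.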

Step 5, taking a weighted average of the resulting probabilities according to the weight each node carries in each action, and outputting the final result.

In the training stage of an action, the forward-algorithm output values of 30 sets of training data are calculated in each training iteration and used as the basis for judging convergence. The forward-algorithm output values of the 30 sets of training data from the last training iteration are then extracted to define the probability threshold of the action, where h > 0 is an adjustment parameter, μ is the mean of these values and σ is their standard deviation. The threshold is stored as a model parameter in the corresponding action object.

threshold = μ − h·σ
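A minimal sketch of this threshold computation, taking the threshold as the mean minus h times the standard deviation of the forward-algorithm outputs, is given below. The function name `action_threshold` is hypothetical, and the use of the population standard deviation is an assumption, since the source does not say whether the population or the sample form is intended.

```python
import statistics


def action_threshold(forward_outputs, h=1.0):
    """Return mu - h * sigma over the forward-algorithm outputs of one action.

    forward_outputs: the values computed for the training sets in the last
    training iteration (30 sets in the source).
    h: adjustment parameter, must be positive.
    """
    if h <= 0:
        raise ValueError("h must be greater than 0")
    mu = statistics.mean(forward_outputs)
    sigma = statistics.pstdev(forward_outputs)  # population form (assumption)
    return mu - h * sigma
```

A larger h lowers the threshold and so accepts more borderline performances of the action; a smaller h makes the rejection of undefined actions stricter.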

After the output probabilities of the key node sequences are obtained, the key nodes with non-zero probability are weighted. Even for the same key node, for example the right-hand node, the amplitude of change differs between actions: for an action in which the node changes more, the node should be given a larger weight, and for an action in which it changes less, a smaller weight, so that the resulting total probability represents the action better.

After the output probabilities of the key nodes of each action model are weighted and averaged, each action obtains one total probability. The total probabilities are compared to find the maximum. If the maximum is too small, the action performed by the user does not belong to any existing action in the database; that is, when the maximum probability is smaller than the threshold corresponding to that action, the action is judged to be undefined. Otherwise, the action corresponding to the maximum probability is the final recognition result.
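The fusion and decision step can be sketched as follows. The function name `recognize`, the dictionary-based inputs and the choice to return `None` for an undefined action are illustrative assumptions; how the weights themselves are derived from the amplitude of change is not specified in the source and is not modeled here.

```python
def recognize(node_probs, weights, thresholds):
    """Fuse per-node probabilities and pick the recognized action.

    node_probs[action]: list of per-node output probabilities (0 for
        nodes that were not collected).
    weights[action]: list of non-negative per-node weights for that action.
    thresholds[action]: probability threshold stored for that action.
    Returns (action or None, totals), where None means "undefined action".
    """
    totals = {}
    for action, probs in node_probs.items():
        w = weights[action]
        weight_sum = sum(w)
        if weight_sum <= 0:
            raise ValueError(f"weights for {action!r} must sum to a positive value")
        totals[action] = sum(wi * pi for wi, pi in zip(w, probs)) / weight_sum
    best = max(totals, key=totals.get)
    if totals[best] < thresholds[best]:
        return None, totals  # maximum is below that action's own threshold
    return best, totals
```

Note that the comparison uses the threshold of the best-scoring action, matching the wording "the threshold corresponding to the action".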

The technical means of the invention are not limited to those disclosed in the above embodiments; they also include technical schemes formed by any combination of the above technical features.

Claims

1. A continuous action recognition method based on human skeleton information is characterized in that,

the method comprises the following steps:

(3) normalizing the observed value characteristic sequence;

2. The method for recognizing continuous motion based on human skeletal information as claimed in claim 1, wherein in the step (1): the multiple joints of the body comprise neck, spine, chest, left shoulder, left elbow, left wrist, right shoulder, right elbow and right wrist.

3. The method for continuous motion recognition based on human skeletal information of claim 1, wherein in step (2), collection of the distance data sequence of a node is started when the distance between two successively recorded spatial positions of the node is greater than a distance threshold, and collection of the ratio sequence of the node is stopped when the distance between two successively recorded spatial positions of the node is less than the distance threshold; before recognition is started, certain nodes are manually excluded from data collection for certain actions, and their output probability is directly set to 0.

4. The method for continuous motion recognition based on human skeletal information of claim 1, wherein in the step (3), the observed feature sequence of motion recognition is a ratio O1 of a distance from a left shoulder node to a neck node to a distance from a chest node to a spine node, a ratio O2 of a distance from a right shoulder node to a neck node to a distance from a chest node to a spine node, a ratio O3 of a distance from a left elbow node to a neck node to a distance from a chest node to a spine node, a ratio O4 of a distance from a right elbow node to a neck node to a distance from a chest node to a spine node, a ratio O5 of a distance from a left wrist node to a neck node to a distance from a chest node to a spine node, and a ratio O6 of a distance from a right wrist node to a neck node to a distance from a chest node to a spine node; the static frame data is O = {O1, O2, O3, O4, O5, O6}.

5. The method for recognizing continuous motion based on human skeletal information as claimed in claim 1, wherein in step (4), when the length of the ratio sequence is greater than the length threshold λ, the forward algorithm output probability value of the node ratio sequence under each motion model is calculated; the HMM models are iteratively trained using a Baum-Welch algorithm according to a training database, wherein each key node has one HMM model for each action, and n actions yield 6n output probabilities.

6. The method according to claim 1, wherein in the step (5), the probabilities of the nodes obtained under each motion model are weighted and averaged according to the different proportions of the nodes in different motions, so that each motion finally obtains one probability; the probabilities are compared to obtain a maximum value; if the maximum value is too small, the motion performed by the user does not belong to any existing motion in a database, that is, if the maximum probability is smaller than a threshold corresponding to the motion, the motion is determined to be an undefined motion; otherwise, the motion corresponding to the maximum probability is the final recognition result.