CN113033501A - Human body classification method and device based on joint quaternion - Google Patents
- Publication number
- CN113033501A (application number CN202110491778.7A)
- Authority
- CN
- China
- Prior art keywords
- human body
- knee
- joint
- quaternion
- hip
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data
- G06V40/20—Movements or behaviour, e.g. gesture recognition
- G06V40/23—Recognition of whole body movements, e.g. for sport training
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/20—Image preprocessing
- G06V10/26—Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion
- G06V10/267—Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion by performing operations on regions, e.g. growing, shrinking or watersheds
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V20/00—Scenes; Scene-specific elements
- G06V20/40—Scenes; Scene-specific elements in video content
Abstract
The invention discloses a human body classification method based on joint quaternions, in the technical field of human body classification, which addresses the low accuracy and poor robustness of existing classification methods. The method comprises: detecting and segmenting each moving human body in a motion video to obtain segmented frames and a three-dimensional skeleton of the moving body; computing joint quaternions of the left-leg knee joint, the right-leg knee joint and one elbow joint from the three-dimensional skeleton information; selecting five parameters from the joint quaternions to form a feature vector M; and classifying the motion of the human body in the video according to M to identify a walking or running result. The invention also discloses a human body classification device based on joint quaternions. By acquiring the joint quaternions of the moving body and selecting five parameters from them to form the feature vector M, walking and running are classified and identified with high accuracy, good robustness and fast recognition speed.
Description
Technical Field
The invention relates to the technical field of human body classification, in particular to a human body classification method and device based on joint quaternion.
Background
The visual system is the primary tool humans use to observe and perceive the outside world. With ever-increasing computing power, engineers expect computers to recognize, observe and interact with the world as humans do with their eyes and brain, which requires computers to possess nearly all the capabilities of the human visual processing system. As hardware processing power grows and computer vision technology develops rapidly, this expectation becomes increasingly realistic. The main research content of computer vision is how to solve human-centred problems, including human body detection and tracking, face recognition and human motion analysis.
Against a complicated, cluttered background, a human body is often occluded by obstacles and by other, non-measured human bodies, which makes extracting human motion features difficult. Human motion is itself complex, and for postures with limb occlusion, how to extract and track motion features accurately and in real time is a problem worth studying.
In behaviour-recognition training, samples are usually collected under limited experimental conditions, but when the same scheme is used for recognition in a real scene, environmental factors such as illumination and noise mean the training samples do not match the real environment well, so recognition accuracy and algorithm robustness drop. The speed of the algorithm is also affected by the complexity of human motion and of feature preprocessing. How to improve the accuracy, robustness and speed of moving-human behaviour recognition while improving system performance and reducing computational cost therefore remains a difficult research problem.
Disclosure of Invention
The invention aims to solve the above technical problems of the prior art by providing a human body classification method based on joint quaternions with high accuracy and good robustness.
The invention further aims to provide a human body classification device based on joint quaternions with high accuracy and good robustness.
In order to achieve the above object, the present invention provides a human body classification method based on joint quaternion, comprising:
detecting and segmenting each moving human body in the moving video to obtain each segmented frame and a three-dimensional human body skeleton of the moving human body;
establishing a standard body type library; filling the three-dimensional skeleton using the standard body type library to obtain a verification human body and a verification frame; comparing the verification body in the verification frame with the moving body in the segmented frame; if the region of the verification body coincides with the region of the moving body, retaining the segmented frame and the three-dimensional skeleton of the moving body, otherwise discarding them;
calculating according to the three-dimensional information of the three-dimensional human body skeleton to obtain joint quaternion of the left leg knee joint, the right leg knee joint and one elbow joint;
selecting five parameters from the joint quaternions to form a feature vector M = [LKga, RKga, Ea, LKa, RKa], wherein LKga and RKga respectively represent the average slope of the left-knee and right-knee rotation angles, Ea represents the average rotation of the elbow joint over a sampling period, and LKa and RKa respectively represent the average left-knee and right-knee rotation angles;
and carrying out motion classification on the motion human body in the motion video according to the feature vector M to identify a walking or running result.
As a further improvement, the joint quaternion of the knee joint is calculated as follows:
take the hip joint coordinates (x_hip, y_hip, z_hip), the knee joint coordinates (x_knee, y_knee, z_knee) and the ankle joint coordinates (x_ankle, y_ankle, z_ankle), and compute the vectors V1 = [x_knee − x_hip, y_knee − y_hip, z_knee − z_hip] and V2 = [x_ankle − x_knee, y_ankle − y_knee, z_ankle − z_knee];
normalize V1 and V2 to obtain V1_NEW and V2_NEW;
take the cross product of V1_NEW and V2_NEW to obtain the rotation axis vector axis = [axis(1), axis(2), axis(3)], where
axis(1) = (y_knee − y_hip)(z_ankle − z_knee) − (y_ankle − y_knee)(z_knee − z_hip)
axis(2) = (z_knee − z_hip)(x_ankle − x_knee) − (z_ankle − z_knee)(x_knee − x_hip)
axis(3) = (x_knee − x_hip)(y_ankle − y_knee) − (y_knee − y_hip)(x_ankle − x_knee);
normalize the rotation axis vector to obtain axis_NEW, and from it form the joint quaternion Q = W + Xi + Yj + Zk,
where W carries the rotation angle information in the object coordinate system (the rotation radian is obtained through the inverse cosine transformation), and X, Y, Z represent the rotation of the knee joint point in the X-Y, Y-Z and Z-X planes, respectively.
Further, the one elbow joint is the elbow joint closer to the image acquisition device.
Further, the detecting and segmenting each moving human body in the moving video comprises:
segmenting the motion video to obtain each video frame;
segmenting the human body area of each moving human body in each video frame to obtain each human body image;
establishing a three-dimensional human body skeleton of each human body image;
and calculating a key frame distance according to the three-dimensional human body skeletons of the two adjacent video frames, and segmenting the motion video according to the key frame distance to obtain segmented frames and corresponding three-dimensional human body skeletons.
Further, the method also comprises the steps of training joint quaternion of each segmented frame through a hidden Markov model, and identifying abnormal behaviors.
In order to achieve the second purpose, the invention provides a human body classification device based on joint quaternion, which comprises a video acquisition module, a segmentation module, a classification identification module and an output module;
the video acquisition module is used for acquiring a motion video and sending the motion video to the segmentation module;
the segmentation module is used for segmenting the motion video to obtain segmented frames and corresponding three-dimensional human body skeletons; and sending the data to the classification identification module and the output module;
the classification identification module identifies walking or running results according to the method and sends the walking or running results to the output module, and the output module is used for displaying and outputting the identification results and the three-dimensional human body skeleton of the segmented frame.
As a further improvement, the video acquisition module is a Kinect depth camera.
Advantageous effects
Compared with the prior art, the invention has the advantages that:
according to the invention, the walking or running result is classified and identified by acquiring the joint quaternion of the moving human body and selecting five parameters from the joint quaternion to form the characteristic vector M, so that the accuracy is high, the robustness is good, and the identification speed is high.
Drawings
FIG. 1 is a flow chart of the present invention;
FIG. 2 is a schematic view of the knee joint angle transformation according to the present invention;
FIG. 3 is a diagram of the selection of the joint point according to the present invention;
FIG. 4 is a schematic diagram of real-time display of a human skeleton according to the present invention;
FIG. 5 is a schematic representation of training sample behavior in accordance with the present invention;
FIG. 6 is a schematic representation of the behavior of a test sample in accordance with the present invention;
FIG. 7 is a schematic view of a multi-target human skeleton according to the present invention;
FIG. 8 is a schematic diagram of the human skeleton of normal/abnormal behavior in the present invention;
FIG. 9 is a flowchart illustrating an overall process of the video interframe distance segmentation model according to the present invention;
FIG. 10 is a diagram illustrating an interframe distance curve according to the present invention;
FIG. 11 is a diagram showing the results of the DBKF algorithm of the present invention.
Detailed Description
The invention will be further described with reference to specific embodiments shown in the drawings.
Referring to fig. 1 to 11, a method for classifying a human body based on a joint quaternion includes:
detecting and segmenting each moving human body in the moving video to obtain each segmented frame and a three-dimensional human body skeleton of the moving human body;
establishing a standard body type library built according to a national or regional healthy-adult standard (for example, reference values for height, weight and waistline); filling the three-dimensional skeleton using the standard body type library to obtain a verification human body and a verification frame; comparing the verification body in the verification frame with the moving body in the segmented frame: if the region of the verification body coincides with the region of the moving body, the segmented frame and the three-dimensional skeleton of the moving body are retained, otherwise they are discarded. This filters out segmented frames whose three-dimensional skeletons are inaccurate or carry large errors, improving classification precision;
calculating according to the three-dimensional information of the three-dimensional human body skeleton to obtain joint quaternion of the left leg knee joint, the right leg knee joint and one elbow joint; one of the elbow joints is an elbow joint close to the image acquisition equipment;
selecting five parameters from the joint quaternions to form the feature vector M = [LKga, RKga, Ea, LKa, RKa], wherein LKga and RKga respectively represent the average slope of the left-knee and right-knee rotation angles, Ea represents the average rotation of the elbow joint over a sampling period, and LKa and RKa respectively represent the average left-knee and right-knee rotation angles;
and carrying out motion classification on the motion human body in the motion video according to the characteristic vector M to identify a walking or running result.
Human walking gait is a typical periodic motion, with the knee angle shown in Figure 2. Observing the running and walking postures of people in video, the main difference between the two movements lies in the lower limbs, particularly the knee joints, so the knee joints are selected as the main feature points to distinguish the two motion postures. The selected joint points are shown in Fig. 3; assuming the camera shoots along the direction from the origin to the Y-axis, left elbow and knee rotation information is selected.
Comparison tests were run on gait-cycle data from normal, healthy adults of different age groups and sexes. The results show that the knee-joint rotation-angle characteristics during walking are consistent across ages and sexes, as shown in Table 1, which supports the choice of the knee joint as the main reference joint point in this patent.
TABLE 1
Taking the knee joint as an example, the process of acquiring the joint-point quaternion in each frame of the image and generating a parameterized time series of feature points is explained. A quaternion represents posture information intuitively, and its fixed length makes it well suited to linear interpolation between key frames. Using quaternions for human posture recognition, a posture database can be built incrementally and used, for example, for virtual-character posture simulation and control.
The joint quaternion of the knee joint is calculated as follows:
take the hip joint coordinates (x_hip, y_hip, z_hip), the knee joint coordinates (x_knee, y_knee, z_knee) and the ankle joint coordinates (x_ankle, y_ankle, z_ankle), and compute the vectors V1 = [x_knee − x_hip, y_knee − y_hip, z_knee − z_hip] and V2 = [x_ankle − x_knee, y_ankle − y_knee, z_ankle − z_knee];
normalize V1 and V2 to obtain V1_NEW and V2_NEW;
take the cross product of V1_NEW and V2_NEW to obtain the rotation axis vector axis = [axis(1), axis(2), axis(3)], where
axis(1) = (y_knee − y_hip)(z_ankle − z_knee) − (y_ankle − y_knee)(z_knee − z_hip)
axis(2) = (z_knee − z_hip)(x_ankle − x_knee) − (z_ankle − z_knee)(x_knee − x_hip)
axis(3) = (x_knee − x_hip)(y_ankle − y_knee) − (y_knee − y_hip)(x_ankle − x_knee);
normalize the rotation axis vector to obtain axis_NEW, and from it form the joint quaternion Q = W + Xi + Yj + Zk,
where W carries the rotation angle information in the object coordinate system (the rotation radian is obtained through the inverse cosine transformation), and X, Y, Z represent the rotation of the knee joint point in the X-Y, Y-Z and Z-X planes, respectively. The joint quaternion of the elbow joint is calculated similarly.
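The construction above (thigh and shank vectors, cross-product rotation axis, rotation angle by inverse cosine) can be sketched in Python. This is an illustrative reading of the patent's steps, assuming the standard axis-angle quaternion encoding W = cos(θ/2); the patent does not fully specify how W encodes the angle, so treat that convention as an assumption.

```python
import numpy as np

def knee_quaternion(hip, knee, ankle):
    """Joint quaternion Q = (W, X, Y, Z) for the knee, following the
    patent's construction: thigh vector V1 (hip->knee), shank vector
    V2 (knee->ankle), rotation axis = V1 x V2, rotation angle = angle
    between V1 and V2 (via arccos of their dot product)."""
    v1 = np.asarray(knee, dtype=float) - np.asarray(hip, dtype=float)
    v2 = np.asarray(ankle, dtype=float) - np.asarray(knee, dtype=float)
    v1 /= np.linalg.norm(v1)          # V1_NEW
    v2 /= np.linalg.norm(v2)          # V2_NEW
    axis = np.cross(v1, v2)           # rotation axis before normalization
    n = np.linalg.norm(axis)
    if n < 1e-12:                     # leg fully extended: identity rotation
        return np.array([1.0, 0.0, 0.0, 0.0])
    axis /= n                         # axis_NEW
    theta = np.arccos(np.clip(np.dot(v1, v2), -1.0, 1.0))  # rotation radian
    w = np.cos(theta / 2.0)           # W carries the rotation angle
    xyz = np.sin(theta / 2.0) * axis  # X, Y, Z components
    return np.concatenate(([w], xyz))
```

For a knee bent at a right angle with the rotation axis along Z, this yields (cos 45°, 0, 0, sin 45°); a fully extended leg yields the identity quaternion (1, 0, 0, 0).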
Fig. 4 shows a control, built with the WPF model in Visual Studio, that displays the human skeleton and the quaternions of the feature joints in real time, with the testers in walking and running postures respectively. The quaternions for each figure are
shown in Table 2, where L denotes the left side and R the right side.
TABLE 2
As can be seen from Table 2, the three feature points selected in each frame contribute 12 quaternion parameters. The Kinect sensor captures 30 frames/second, and the adult step frequency is about 95-125 steps/minute, so one walking cycle takes 0.96-1.263 seconds and the sensor acquires 28-38 consecutive motion frames per cycle. Using the whole parameterized time series would therefore give 336-456 parameters per walking cycle as the feature vector of a sample for support-vector-machine posture recognition. Experiments proved that this method can recognize postures, but at obviously high computational cost. To reduce the computational complexity, each sample instead uses only a five-parameter feature vector for posture discrimination, which also achieved good experimental results. The five parameters selected from the joint quaternions form the feature vector M = [LKga, RKga, Ea, LKa, RKa]. The first two, LKga and RKga, are the average slopes of the left-knee and right-knee rotation angles; the slope is used because a plain average, while an effective way to greatly reduce the parameter count, would discard time, an important reference variable for periodic postures such as walking and running. The third parameter, Ea, is the average rotation of the left or right elbow over the sampling period; its main role is to help distinguish jogging from fast walking, whose lower-limb postures are extremely similar. LKa and RKa are the average left-knee and right-knee rotation angles.
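A minimal sketch of assembling the five-parameter feature vector from per-frame rotation angles. Estimating the slope with `np.gradient` over the frame timestamps is our assumption; the patent does not spell out exactly how the slope average is computed.

```python
import numpy as np

def feature_vector(lk_angles, rk_angles, elbow_angles, timestamps):
    """Build M = [LKga, RKga, Ea, LKa, RKa] for one sampling period.
    LKga/RKga: mean absolute slope of the left/right knee angle curves
    (keeps the time information a plain average would discard);
    Ea: mean elbow rotation; LKa/RKa: mean knee rotation angles."""
    lk = np.asarray(lk_angles, dtype=float)
    rk = np.asarray(rk_angles, dtype=float)
    ea = np.asarray(elbow_angles, dtype=float)
    t = np.asarray(timestamps, dtype=float)
    lkga = np.mean(np.abs(np.gradient(lk, t)))  # left-knee slope average
    rkga = np.mean(np.abs(np.gradient(rk, t)))  # right-knee slope average
    return np.array([lkga, rkga, ea.mean(), lk.mean(), rk.mean()])
```

Constant angle curves give zero slopes and the plain per-joint means, which makes the role of each of the five entries easy to check.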
Since the rotation information of the selected feature points is hardly influenced by age or sex, the gait postures of adult females were selected as the training samples: 40 normal walks, 20 fast walks, 40 slow runs and 20 normal runs. In all 120 samples the testers faced the camera side-on, with the side direction random. Each sample lasts about 4 s and, since the walking rate is random, contains at least 1.5 gait cycles. The training samples are shown in Fig. 5 and the training joint-quaternion data in Table 3.
TABLE 3
The test samples, comprising 25 walks and 25 runs, came from subjects completely independent of those in the training set. The test samples are shown in Fig. 6.
In the invention, a Kinect for Windows V2 depth sensor is used as the experimental instrument, mounted perpendicular to the tester's walking direction. Video image processing and the acquisition and processing of depth images, skeletons and rotation information are done with a Visual Studio programming tool and the WPF module in a C# environment; the MATLAB platform is used for SVM model training and behaviour recognition.
A sample is randomly selected from the posture test sample database for recognition; the multi-target human skeleton under test is shown in Fig. 7. The results show that, exploiting the strong performance of the support vector machine on binary classification and the compact, accurate expression of human motion behaviour by joint quaternions, the two simple behaviours of walking and running are recognized in the experimental environment with an average accuracy above 98%.
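The binary walk/run SVM stage can be sketched as follows, using scikit-learn in place of the MATLAB toolchain described above. The class means and spreads of the synthetic M vectors are invented for illustration only; running shows larger knee-angle slopes and larger mean knee rotation than walking.

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
# Synthetic M = [LKga, RKga, Ea, LKa, RKa] samples (illustrative numbers):
walk = rng.normal([0.8, 0.8, 5.0, 25.0, 25.0], 0.3, size=(60, 5))
run = rng.normal([2.5, 2.5, 15.0, 55.0, 55.0], 0.3, size=(60, 5))
X = np.vstack([walk, run])
y = np.array([0] * 60 + [1] * 60)      # 0 = walking, 1 = running

clf = SVC(kernel="rbf").fit(X, y)      # binary walk/run classifier
probe = clf.predict([[2.4, 2.6, 14.0, 54.0, 56.0]])  # a run-like sample
```

In practice the feature vectors would come from the quaternion time series of the segmented frames rather than from synthetic distributions.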
Detecting and segmenting each moving body in the moving video comprises the following steps:
segmenting the motion video to obtain each video frame;
segmenting the human body area of each moving human body in each video frame to obtain each human body image;
establishing a three-dimensional human body skeleton of each human body image;
and calculating the distance of the key frame according to the three-dimensional human body frameworks of the two adjacent video frames, and segmenting the motion video according to the distance of the key frame to obtain segmented frames and corresponding three-dimensional human body frameworks.
The time distance is the absolute value of the knee-joint angle difference between the skeleton images of two adjacent video frames:
DIFF_KneeAngle(i) = |KneeAngle(i+1) − KneeAngle(i)|;
the space distance is the absolute value of the difference of the X-axis coordinate of the spine midpoint between the skeleton images of two adjacent frames:
DIFF_SpinZ(i) = |SpinZ(i+1) − SpinZ(i)|;
using a normalization method, the time distance and the space distance are each mapped into (0, 1), giving diffKneeAngle(i) and diffSpinZ(i);
the key-frame distance is computed as DBKF = diffKneeAngle(i) + diffSpinZ(i), and the inter-frame distance curve is plotted from it, as shown in Fig. 10, yielding the extreme values P of the curve;
a filter value F is set and the extremum amplification G = (P − F)² × 100 is computed; this greatly amplifies extrema far from the filter value F while filtering out extrema near F, reducing the number of frames for subsequent segmentation;
a segmentation threshold is set; when the absolute difference between two adjacent amplified extrema G on the inter-frame distance curve is greater than or equal to the segmentation threshold, that point is taken as a segmentation point of the continuous action, and the segmented-frame skeleton images are obtained from the segmentation points, as shown in Fig. 11.
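The DBKF steps above can be sketched as follows. The filter value F, the segmentation threshold, the min-max normalization, and the simple local-extremum detection are illustrative assumptions; the patent does not fix these details.

```python
import numpy as np

def dbkf_segment(knee_angle, spine_pos, F=0.5, seg_threshold=30.0):
    """Video inter-frame distance (DBKF) segmentation sketch.
    Time distance: |KneeAngle(i+1)-KneeAngle(i)|; space distance:
    |spine(i+1)-spine(i)| for the spine-midpoint coordinate. Both are
    normalized to (0, 1) and summed; curve extrema P are amplified by
    G = (P - F)^2 * 100, and a cut is placed where adjacent amplified
    extrema differ by at least seg_threshold."""
    ka = np.asarray(knee_angle, dtype=float)
    sp = np.asarray(spine_pos, dtype=float)
    d_time = np.abs(np.diff(ka))
    d_space = np.abs(np.diff(sp))
    norm = lambda v: (v - v.min()) / (v.max() - v.min() + 1e-12)
    dbkf = norm(d_time) + norm(d_space)       # key-frame distance curve
    # indices of local extrema of the curve (sign change of the slope)
    ext_idx = [i for i in range(1, len(dbkf) - 1)
               if (dbkf[i] - dbkf[i - 1]) * (dbkf[i + 1] - dbkf[i]) < 0]
    G = [(dbkf[i] - F) ** 2 * 100 for i in ext_idx]  # amplified extrema
    cuts = [ext_idx[k + 1] for k in range(len(G) - 1)
            if abs(G[k + 1] - G[k]) >= seg_threshold]
    return dbkf, cuts
```

Feeding in a periodic knee-angle curve and a steadily advancing spine coordinate produces a distance curve bounded by 2 (each normalized term lies in (0, 1)) whose cut points fall inside the frame range.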
The human body region is segmented in two ways. In the first, a wavelet-analysis method predicts and locates the face region in a video frame from the pre-detected face colour distribution, and the whole human body image is then segmented by region growing.
In the second, a human body image model is constructed first, and the human body image is then further segmented by region matching against this model combined with an image segmentation method.
The second way comprises: constructing a three-dimensional human body model in advance from representations of the head, trunk and limbs within a naive Bayes framework, using a number of experimental samples; segmenting the human body region in the image with this three-dimensional model; obtaining the body contour boundary by solving for optimized parameters; and realizing the human body segmentation with the GrabCut algorithm on the basis of the human body database.
The invention also comprises training the joint quaternions of each segmented frame through a hidden Markov model and identifying abnormal behaviour. Normal behaviour is defined as walking or running; the corresponding abnormal behaviour is unreasonable or unusual human motion such as jumping or squatting, which the hidden Markov model recognizes effectively, as shown in Fig. 8.
A human body classification device based on joint quaternion comprises a video acquisition module, a segmentation module, a classification identification module and an output module;
the video acquisition module is used for acquiring a motion video and sending the motion video to the segmentation module;
the segmentation module is used for segmenting the motion video to obtain segmented frames and corresponding three-dimensional human body skeletons; and sending the data to a classification identification module and an output module;
the classification identification module identifies walking or running results according to the method and sends the walking or running results to the output module, and the output module is used for displaying and outputting the identification results and the three-dimensional human body skeleton of the segmented frames.
The video acquisition module is a Kinect depth camera.
The Kinect depth camera has low requirements on ambient illumination, is insensitive to colour and texture, resolves ambiguity in the posture contour, and simplifies the background-removal preprocessing step. On the basis of rotation data in three-dimensional human space, the inter-frame distance of the depth image is defined and a moving-human-behaviour segmentation algorithm based on the distance between key frames (DBKF) is proposed: the inter-frame joint rotation difference of the human body is taken as the time distance and the inter-frame position difference of the spine key point as the space distance, achieving simple and convenient segmentation of the continuous motion of a single moving target with high accuracy and robustness.
According to the invention, the walking or running result is classified and identified by acquiring the joint quaternion of the moving human body and selecting five parameters from the joint quaternion to form the characteristic vector M, so that the accuracy is high, the robustness is good, and the identification speed is high.
The above is only a preferred embodiment of the present invention. It should be noted that those skilled in the art can make several variations and modifications without departing from the structure of the present invention, and these will not affect the effect of implementing the invention or the utility of the patent.
Claims (7)
1. A human body classification method based on joint quaternion is characterized by comprising the following steps:
detecting and segmenting each moving human body in the motion video to obtain each segmented frame and the three-dimensional human body skeleton of the moving human body;
establishing a standard body type library, filling the three-dimensional human body skeleton by using the standard body type library to obtain a verification human body and a verification frame, and comparing the verification human body in the verification frame with the moving human body in the segmented frame: if the region of the verification human body overlaps the region of the moving human body, keeping the segmented frame and the three-dimensional human body skeleton of the moving human body; otherwise, discarding the segmented frame and the three-dimensional human body skeleton of the moving human body;
calculating joint quaternions of the left knee joint, the right knee joint and one elbow joint from the three-dimensional information of the three-dimensional human body skeleton;
selecting five parameters from the joint quaternions to form a feature vector M = [LKga, RKga, Ea, LKa, RKa], wherein LKga and RKga respectively represent the average slope of the left knee rotation angle and the average slope of the right knee rotation angle, Ea represents the average rotation of the elbow joint over the sampling period, and LKa and RKa respectively represent the average left knee rotation angle and the average right knee rotation angle;
and carrying out motion classification on the moving human body in the motion video according to the feature vector M to identify the walking or running result.
2. The human body classification method based on the joint quaternion as claimed in claim 1, wherein the process of calculating the joint quaternion of the knee joint is as follows:
taking the hip joint coordinates (x_hip, y_hip, z_hip), the knee joint coordinates (x_knee, y_knee, z_knee) and the foot joint coordinates (x_ankle, y_ankle, z_ankle), and calculating the vector V1 = [x_knee - x_hip, y_knee - y_hip, z_knee - z_hip] and the vector V2 = [x_ankle - x_knee, y_ankle - y_knee, z_ankle - z_knee];
normalizing the vector V1 and the vector V2 to obtain V1_NEW and V2_NEW;
performing a cross product of the vector V1_NEW and the vector V2_NEW to obtain the rotation axis vector axis = [axis(1), axis(2), axis(3)], where
axis(1)=(yknee-yhip)(zankle-zknee)-(yankle-yknee)(zknee-zhip)
axis(2)=(zknee-zhip)(xankle-xknee)-(zankle-zknee)(xknee-xhip)
axis(3)=(xknee-xhip)(yankle-yknee)-(yknee-yhip)(xankle-xknee);
normalizing the rotation axis vector axis to obtain the normalized axis_NEW, so as to obtain the joint quaternion Q = W + Xi + Yj + Zk.
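The claimed construction can be sketched as follows. The text stops at the normalized rotation axis and does not reproduce the final expressions for W, X, Y and Z, so the conversion here uses the standard axis-angle-to-quaternion formula with the angle between the two normalized bone vectors; that last step is an assumption.

```python
import math

def knee_quaternion(hip, knee, ankle):
    """Joint quaternion Q = W + Xi + Yj + Zk for a knee, following the
    claimed steps: V1 = knee - hip and V2 = ankle - knee are normalized,
    and their cross product gives the rotation axis."""
    v1 = [k - h for h, k in zip(hip, knee)]
    v2 = [a - k for k, a in zip(knee, ankle)]
    n1 = math.sqrt(sum(c * c for c in v1))
    n2 = math.sqrt(sum(c * c for c in v2))
    v1 = [c / n1 for c in v1]
    v2 = [c / n2 for c in v2]
    # rotation axis = V1_NEW x V2_NEW
    axis = [v1[1] * v2[2] - v1[2] * v2[1],
            v1[2] * v2[0] - v1[0] * v2[2],
            v1[0] * v2[1] - v1[1] * v2[0]]
    norm = math.sqrt(sum(c * c for c in axis))
    if norm == 0.0:  # collinear thigh and shank: identity rotation
        return (1.0, 0.0, 0.0, 0.0)
    axis = [c / norm for c in axis]
    # rotation angle between the two normalized bone vectors
    theta = math.acos(max(-1.0, min(1.0, sum(a * b for a, b in zip(v1, v2)))))
    s = math.sin(theta / 2.0)
    return (math.cos(theta / 2.0), axis[0] * s, axis[1] * s, axis[2] * s)
```

A fully extended leg (collinear segments) yields the identity quaternion, while a bent knee yields a rotation about an axis perpendicular to the thigh and shank.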
3. The method of claim 1, wherein the one elbow joint is the elbow joint nearer to the image acquisition device.
4. The method of claim 1, wherein the detecting and segmenting each moving body in the moving video comprises:
segmenting the motion video to obtain each video frame;
segmenting the human body area of each moving human body in each video frame to obtain each human body image;
establishing a three-dimensional human body skeleton of each human body image;
and calculating a key frame distance according to the three-dimensional human body skeletons of the two adjacent video frames, and segmenting the motion video according to the key frame distance to obtain segmented frames and corresponding three-dimensional human body skeletons.
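Claim 4 only states that the motion video is segmented "according to the key frame distance"; a minimal sketch of one such rule, assuming a simple threshold on the distance between adjacent frames, is:

```python
def segment_by_keyframes(distances, thresh):
    """Split a motion sequence at frames whose key-frame distance to the
    previous frame exceeds a threshold (the cut criterion is an
    assumption). distances[i] is the distance between frames i and i+1;
    returns (start, end) index ranges covering all len(distances)+1 frames."""
    cuts = [i + 1 for i, d in enumerate(distances) if d > thresh]
    bounds = [0] + cuts + [len(distances) + 1]
    return [(bounds[i], bounds[i + 1]) for i in range(len(bounds) - 1)]
```

For example, a spike in the distance sequence between frames 1 and 2 splits the sequence into two segments at frame 2.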
5. The method of claim 1, further comprising training the joint quaternion of each segmented frame by a hidden Markov model and identifying abnormal behavior.
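Claim 5 trains a hidden Markov model on the joint quaternions of the segmented frames but does not specify the model structure or how abnormality is decided. The sketch below shows only the scoring side: the forward algorithm computes the log-likelihood of a quantized observation sequence, and sequences scoring far below those seen in training could be flagged as abnormal. All parameters here are illustrative.

```python
import math

def forward_loglik(obs, start_p, trans_p, emit_p):
    """Log-likelihood of a discrete observation sequence under an HMM,
    via the forward algorithm. obs holds symbol indices; start_p is the
    initial state distribution, trans_p the state transition matrix,
    emit_p the per-state emission probabilities."""
    alpha = [p * emit_p[s][obs[0]] for s, p in enumerate(start_p)]
    for o in obs[1:]:
        alpha = [emit_p[j][o] * sum(alpha[i] * trans_p[i][j]
                                    for i in range(len(alpha)))
                 for j in range(len(alpha))]
    return math.log(sum(alpha))
```

In a full pipeline the joint quaternions would first be quantized into a discrete codebook, and an abnormality threshold on the log-likelihood would be chosen from training data.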
6. A human body classification device based on joint quaternion is characterized by comprising a video acquisition module, a segmentation module, a classification identification module and an output module;
the video acquisition module is used for acquiring a motion video and sending the motion video to the segmentation module;
the segmentation module is used for segmenting the motion video to obtain segmented frames and corresponding three-dimensional human body skeletons; and sending the data to the classification identification module and the output module;
the classification recognition module recognizes walking or running results according to the method of claim 1 and sends the results to the output module, and the output module is used for displaying and outputting the recognition results and the three-dimensional human body skeleton of the segmented frames.
7. The device for classifying human bodies according to claim 6, wherein the video acquisition module is a Kinect depth camera.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110491778.7A CN113033501A (en) | 2021-05-06 | 2021-05-06 | Human body classification method and device based on joint quaternion |
Publications (1)
Publication Number | Publication Date |
---|---|
CN113033501A true CN113033501A (en) | 2021-06-25 |
Family
ID=76455016
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202110491778.7A Pending CN113033501A (en) | 2021-05-06 | 2021-05-06 | Human body classification method and device based on joint quaternion |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN113033501A (en) |
Patent Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20180047175A1 (en) * | 2016-08-12 | 2018-02-15 | Nanjing Huajie Imi Technology Co., Ltd | Method for implementing human skeleton tracking system based on depth data |
CN107122752A (en) * | 2017-05-05 | 2017-09-01 | 北京工业大学 | A kind of human action comparison method and device |
CN107341452A (en) * | 2017-06-20 | 2017-11-10 | 东北电力大学 | Human bodys' response method based on quaternary number space-time convolutional neural networks |
CN108829232A (en) * | 2018-04-26 | 2018-11-16 | 深圳市深晓科技有限公司 | The acquisition methods of skeleton artis three-dimensional coordinate based on deep learning |
CN111144217A (en) * | 2019-11-28 | 2020-05-12 | 重庆邮电大学 | Motion evaluation method based on human body three-dimensional joint point detection |
CN110992454A (en) * | 2019-11-29 | 2020-04-10 | 南京甄视智能科技有限公司 | Real-time motion capture and three-dimensional animation generation method and device based on deep learning |
Non-Patent Citations (3)
Title |
---|
XU HAIYANG; KONG JUN; JIANG MIN: "Human action recognition based on quaternion 3D skeleton representation", Laser & Optoelectronics Progress, no. 02 *
LI SHUNYI; HOU JIN; GAN LINGYUN: "Motion key frame extraction based on inter-frame distance", Computer Engineering, no. 02 *
Baidu Wenku: "Summary report on video-based human action recognition", pages 1 - 17, Retrieved from the Internet <URL:https://wenku.baidu.com/view/96ac36a7326c1eb91a37f111f18583d048640f94.html?_wkts> *
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113936253A (en) * | 2021-12-16 | 2022-01-14 | 深圳致星科技有限公司 | Material conveying operation cycle generation method, storage medium, electronic device and device |
CN116010816A (en) * | 2022-12-28 | 2023-04-25 | 南京大学 | LRF large-kernel attention convolution network activity identification method based on large receptive field |
CN116010816B (en) * | 2022-12-28 | 2023-09-08 | 南京大学 | LRF large-kernel attention convolution network activity identification method based on large receptive field |
US11989935B1 (en) | 2022-12-28 | 2024-05-21 | Nanjing University | Activity recognition method of LRF large-kernel attention convolution network based on large receptive field |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
Chaudhari et al. | Yog-guru: Real-time yoga pose correction system using deep learning methods | |
Gritai et al. | On the use of anthropometry in the invariant analysis of human actions | |
JP2012518236A (en) | Method and system for gesture recognition | |
Surer et al. | Methods and technologies for gait analysis | |
CN113033501A (en) | Human body classification method and device based on joint quaternion | |
Monir et al. | Rotation and scale invariant posture recognition using Microsoft Kinect skeletal tracking feature | |
CN117671738B (en) | Human body posture recognition system based on artificial intelligence | |
CN112329513A (en) | High frame rate 3D (three-dimensional) posture recognition method based on convolutional neural network | |
CN113229807A (en) | Human body rehabilitation evaluation device, method, electronic device and storage medium | |
KR20020017576A (en) | System and method for motion capture using camera image | |
CN110991292A (en) | Action identification comparison method and system, computer storage medium and electronic device | |
CN110910426A (en) | Action process and action trend identification method, storage medium and electronic device | |
CN116740618A (en) | Motion video action evaluation method, system, computer equipment and medium | |
Cheng et al. | Tracking human walking in dynamic scenes | |
Talaa et al. | Computer Vision-Based Approach for Automated Monitoring and Assessment of Gait Rehabilitation at Home. | |
Abd Shattar et al. | Experimental setup for markerless motion capture and landmarks detection using OpenPose during dynamic gait index measurement | |
CN113158942A (en) | Segmentation algorithm and device for detecting motion human behavior | |
Zhang et al. | Motion damage attitude acquisition based on three-dimensional image analysis | |
CN111222437A (en) | Human body posture estimation method based on multi-depth image feature fusion | |
Mukhtar et al. | RETRACTED: Gait Analysis of Pedestrians with the Aim of Detecting Disabled People | |
Liu et al. | Robust Motion Pose Capture Method of Aerobics Based on Visual Simulation Imaging | |
CN112667088B (en) | Gesture application identification method and system based on VR walking platform | |
Patil et al. | Early Detection of Hemiplegia by Analyzing the Gait Characteristics and Walking Patterns Using | |
Cheng et al. | Model-based recognition of human walking in dynamic scenes | |
Wadhwa et al. | Yoga Posture Analysis using Deep Learning |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||