CN113642379B - Human body posture prediction method and system based on attention mechanism fusion multi-flow diagram - Google Patents


Info

Publication number
CN113642379B
Authority
CN
China
Prior art keywords
human body
sequence
graph
output
neural network
Prior art date
Legal status
Active
Application number
CN202110539624.0A
Other languages
Chinese (zh)
Other versions
CN113642379A (en)
Inventor
袁丁
曹哲
魏晓东
尹继豪
张雪怡
Current Assignee
Beihang University
Original Assignee
Beihang University
Priority date
Filing date
Publication date
Application filed by Beihang University
Priority to CN202110539624.0A
Publication of CN113642379A
Application granted
Publication of CN113642379B
Legal status: Active
Anticipated expiration


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/044 Recurrent networks, e.g. Hopfield networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Image Analysis (AREA)

Abstract

The invention relates to a human body posture prediction method and system based on a multi-flow graph neural network fused by an attention mechanism. The method comprises the following steps. S1: acquire a training sequence of three-dimensional positions of key human joints, divide it into an input sequence and an output sequence according to preset input and output lengths, and construct graph data from the input sequence. S2: construct a multi-flow graph neural network model fused by an attention mechanism, and train it on the graph data to obtain a trained model. S3: acquire a sequence of three-dimensional positions of key human joints for prediction, construct graph data from it, and feed the graph data into the trained model to obtain predicted human body postures. The method constructs several graph models from the joint position data and the structural characteristics of human joints, thereby modeling the human motion system, and predicts human posture with higher accuracy.

Description

Human body posture prediction method and system based on attention mechanism fusion multi-flow diagram
Technical Field
The invention relates to the fields of computer vision and image processing, and in particular to a human body posture prediction method and system based on a multi-flow graph neural network fused by an attention mechanism.
Background
In recent years, low-cost consumer depth cameras have become widespread, making it possible to capture three-dimensional human motion postures cheaply and in real time. Human posture prediction has therefore become an active research problem at the intersection of computer graphics and computer vision, with broad application prospects in robotics, medicine and autonomous driving.
In robotics, exoskeleton technology is a major research focus with very important applications in aerospace. Exoskeletons are regarded as key supporting equipment for on-orbit construction and maintenance and for activities on the lunar and Martian surfaces, and they play an extremely important role in extravehicular missions and in handling emergency faults. A human posture prediction algorithm identifies and analyzes the laws of human motion to infer and predict an astronaut's movement intention, helping the exoskeleton achieve compliant force control. This reduces abrupt force changes in exoskeleton control and effectively improves the astronaut's adaptation to exoskeleton-assisted movement.
Motion capture technology appeared as early as fifty years ago. It is widely used in entertainment such as film and games, and in healthcare to help patients with movement disorders. However, realistic motion data that respects human body structure and motion balance still cannot be generated without real motion capture data. Meanwhile, mainstream motion capture still relies on inertial sensors or complex optical sensing systems, which are expensive and require professional venues, so the technology remains far from reaching ordinary households or benefiting patients who depend on assistive devices for daily exercise. Applying human posture prediction to motion generation can greatly improve the utilization of motion capture data and reduce the post-processing workload. As time-of-flight cameras become increasingly popular, posture prediction can also be applied more easily to human-computer interaction: for example, an algorithm can guide a subject to perform correct movements based on the subject's captured motion, assisting the rehabilitation of patients with limb cognitive deficits caused by neurodegenerative diseases.
Human posture prediction therefore has significant application value. However, human motion involves more than 200 degrees of freedom, and the degrees of freedom of the joints are strongly correlated. Traditional methods such as inverse kinematics and dynamics have great difficulty simulating human motion and cannot build an accurate human motion model: the laws of human motion are highly complex, and the mechanism by which the body maintains balance and coordination is especially hard to simulate. This makes human motion prediction very difficult for conventional statistical learning methods.
With advances in computer hardware and machine learning algorithms, deep-learning-based human motion prediction methods have been proposed in recent years. With a deep learning model, researchers need not explicitly model complex kinematic constraints and parameters or the characteristics of different behaviors; they only need to build a deep neural network and train it on a large amount of motion data. The network learns latent features of human motion and uses them to predict a person's next movement trajectory. Early deep learning models were mainly based on recurrent and convolutional neural networks, both of which have drawbacks. Recurrent neural networks emphasize the temporal order of the sequence and ignore spatial information. Convolutional neural networks flatten each frame of skeleton data into a one-dimensional vector and treat the sequence as a two-dimensional matrix; they focus on how each joint's position changes over time, neglect the correlations among joints, and cannot fully exploit the topology of the human body.
The graph neural network, a deep learning model that has attracted much attention in the past two years, is designed specifically to process graph data and has been applied in fields such as recommender systems, reasoning and proof, chemistry, traffic and brain-inspired intelligence. Recent work has proposed graph-neural-network-based methods that represent the three-dimensional human skeleton as a graph, which makes good use of prior structural information about the skeleton. However, existing methods simply build the adjacency matrix from the spatial adjacency of human joints and do not consider how human actions are generated, so they cannot fully represent the relationships between joints during motion. Human movement is accomplished by the cooperation of many interrelated joints along a kinematic chain, not just by a few adjacent joints. Ergonomic studies have shown that human kinematic chains typically contain joints with three degrees of freedom and joints with two degrees of freedom, and that the two types tend to alternate along the chain. Joints with different degrees of freedom bear forces differently during motion. Consequently, an adjacency matrix built only from the spatial adjacency of joints does not fully exploit the prior knowledge about the structure and kinematics of human joints.
How to fully exploit prior knowledge about the structure and kinematics of human joints has therefore become an urgent problem.
Disclosure of Invention
In order to solve the technical problems, the invention provides a human body posture prediction method and a human body posture prediction system based on a multi-flow-graph neural network fused by an attention mechanism.
The technical scheme of the invention is as follows: a human body posture prediction method based on a multi-flow graph neural network fused by an attention mechanism comprises the following steps:
step S1: acquiring a three-dimensional position data sequence of a key joint of a human body for training, and dividing the three-dimensional position data sequence into an input sequence and an output sequence according to the lengths of the preset input sequence and the preset output sequence; constructing graph data according to the input sequence;
step S2: constructing a multi-flow graph neural network model based on attention mechanism fusion; inputting the graph data into a multi-flow graph neural network model based on an attention mechanism for training, and obtaining a trained multi-flow graph neural network model based on the attention mechanism;
step S3: acquiring a three-dimensional position data sequence of key human joints for prediction, constructing graph data, and inputting the graph data into the trained multi-flow graph neural network model based on the attention mechanism to obtain a predicted value of the human body posture.
Compared with the prior art, the invention has the following advantages:
1. The method constructs multiple graph models from human joint position data and joint structural characteristics, thereby modeling the human motion system and predicting human posture with higher accuracy. A multi-flow graph neural network extracts joint motion features, and an attention model fuses global information.
2. The invention overcomes the large errors that existing human posture prediction methods exhibit for intense motion and long-term prediction, and achieves better experimental results. The network structure is simple and runs in real time, enabling real-time prediction of human posture, which benefits practical applications. Compared with other existing methods, the proposed method performs better for medium- and long-term prediction horizons of 500 to 1000 milliseconds and for intense motion.
Drawings
FIG. 1 is a flowchart of the human body posture prediction method based on a multi-flow graph neural network fused by an attention mechanism according to an embodiment of the invention;
FIG. 2 is a flowchart of step S1 of the method (acquiring a training sequence of three-dimensional positions of key human joints, dividing it into input and output sequences according to preset lengths, and constructing graph data from the input sequence);
FIG. 3 is a schematic structural diagram of the multi-flow graph neural network model fused by an attention mechanism according to an embodiment of the invention;
FIG. 4 is a flowchart of step S2 of the method (constructing the multi-flow graph neural network model fused by an attention mechanism and training it on the graph data to obtain a trained model);
FIG. 5 is a schematic structural diagram of the attention module in the multi-flow graph neural network fused by an attention mechanism according to an embodiment of the invention;
FIG. 6 is a flowchart of step S3 of the method (acquiring a sequence of three-dimensional positions of key human joints for prediction, constructing graph data, and inputting it into the trained model to obtain predicted human postures);
FIG. 7 is a block diagram of the human body posture prediction system based on a multi-flow graph neural network fused by an attention mechanism according to an embodiment of the invention.
Detailed Description
To address the inability of the prior art to fully exploit prior knowledge about the structure and kinematics of human joints, the invention provides a human body posture prediction method based on a multi-flow graph neural network fused by an attention mechanism, which models the human motion system and predicts human posture with higher accuracy.
The present invention will be further described in detail below with reference to the accompanying drawings by way of specific embodiments in order to make the objects, technical solutions and advantages of the present invention more apparent.
Example 1
As shown in FIG. 1, the human body posture prediction method based on a multi-flow graph neural network fused by an attention mechanism provided by the embodiment of the invention comprises the following steps:
step S1: acquiring a three-dimensional position data sequence of a key joint of a human body for training, and dividing the three-dimensional position data sequence into an input sequence and an output sequence according to the lengths of the preset input sequence and the preset output sequence; constructing graph data according to the input sequence;
step S2: constructing a multi-flow graph neural network model based on attention mechanism fusion; inputting the graph data into a multi-flow graph neural network model based on an attention mechanism for training, and obtaining a trained multi-flow graph neural network model based on the attention mechanism;
step S3: acquiring a three-dimensional position data sequence of key human joints for prediction, constructing graph data, and inputting the graph data into the trained multi-flow graph neural network model based on the attention mechanism to obtain a predicted value of the human body posture.
As shown in FIG. 2, step S1 (acquiring a training sequence of three-dimensional positions of key human joints, dividing it into input and output sequences according to the preset lengths, and constructing graph data from the input sequence) specifically comprises:
step S11: acquiring an $f \times n \times 3$ data sequence of three-dimensional positions of key human joints for training, where $f$ is the number of frames in the sequence and $n$ is the number of joints;
The training data sequence used in this embodiment may come from a public dataset (such as Human3.6M or CMU Mocap) or may be a human motion sequence captured with an RGB-D camera or other equipment. The captured key joints or key points usually include at least 20 locations, such as the head, neck, chest, waist, and the left and right shoulders, elbows, wrists, hands, hips, knees, ankles and feet. A three-dimensional position data sequence can be expressed as an $f \times n \times 3$ matrix, where $f$ is the number of frames and $n$ is the number of joints; the three-dimensional coordinates of a joint in the world coordinate system in a given frame are denoted $(X_W, Y_W, Z_W)$. For joints that are not captured because of occlusion or other reasons, the three-dimensional coordinates are set to $(0, 0, 0)$ so that the matrix size stays fixed, which simplifies subsequent processing.
Step S12: taking the first $t$ frames of the data sequence as the input sequence and the remaining $(f-t)$ frames as the output sequence, where the input sequence has size $t \times n \times 3$ and the output sequence has size $(f-t) \times n \times 3$;
The three-dimensional position data sequence is divided into input and output sequences according to the preset input and output lengths. For example, for a training sequence of size $f \times n \times 3$, the first $t$ frames are used as input, producing a $t \times n \times 3$ input sequence; the desired output is the following $(f-t)$ frames, producing an $(f-t) \times n \times 3$ output sequence.
Step S13: constructing a node matrix $V$ and adjacency matrices from the $t \times n \times 3$ input sequence, and using them to build a fully connected graph $G_{all} = (V, A_{all})$, a high-degree-of-freedom joint graph $G_{mobile} = (V, A_{mobile})$ and a low-degree-of-freedom joint graph $G_{stable} = (V, A_{stable})$, where $A_{all}$ is the fully connected adjacency matrix representing the connections among all joints, $A_{mobile}$ is the high-degree-of-freedom joint adjacency matrix representing the connections among high-degree-of-freedom joints, and $A_{stable}$ is the low-degree-of-freedom joint adjacency matrix representing the connections among low-degree-of-freedom joints.
The $t \times n \times 3$ input sequence obtained in step S12 is rearranged by joint into an $n \times (t \times 3)$ matrix $V$. Each joint is a node of the graph data, so the feature of each node is that joint's $t \times 3$ block of coordinates. The degree of freedom of a human joint is the number of independent directions in which it can move; for example, ball joints and condyloid joints have 3 degrees of freedom, whereas saddle joints and flexion joints have 2. These joints have different kinematic and dynamic characteristics. According to the degrees of freedom of the joints, three adjacency matrices are constructed: the fully connected adjacency matrix $A_{all}$, representing the connections among all joints; the high-degree-of-freedom joint adjacency matrix $A_{mobile}$, representing the connections among high-degree-of-freedom joints; and the low-degree-of-freedom joint adjacency matrix $A_{stable}$, representing the connections among low-degree-of-freedom joints. An adjacency matrix represents the connections between nodes in the graph data; this embodiment constructs the adjacency matrices as follows:
For the $i$-th of the $n$ joints, if its degree of freedom is 3, that is, the joint can move in yaw, pitch and roll, it is considered a high-degree-of-freedom joint and $hof_i = 1$; otherwise $hof_i = 0$, where hof stands for high degrees of freedom.
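The zero-filling convention above can be sketched in a few lines of plain Python. This is an illustration under stated assumptions, not the patent's implementation: the per-frame input format (a dictionary mapping a joint index to its $(X_W, Y_W, Z_W)$ coordinates) and the name `build_sequence` are hypothetical.

```python
from typing import Dict, List, Tuple

Point = Tuple[float, float, float]
MISSING: Point = (0.0, 0.0, 0.0)  # placeholder for joints that were not captured


def build_sequence(frames: List[Dict[int, Point]], n_joints: int) -> List[List[Point]]:
    """Return an f x n x 3 nested list; joints absent from a frame become (0, 0, 0).

    Hypothetical input format: each frame maps a joint index to (X_W, Y_W, Z_W).
    """
    return [[frame.get(j, MISSING) for j in range(n_joints)] for frame in frames]


# Two frames, two joints; joint 1 is occluded in the second frame.
frames = [{0: (0.1, 1.6, 0.0), 1: (0.1, 1.4, 0.0)}, {0: (0.1, 1.6, 0.1)}]
sequence = build_sequence(frames, n_joints=2)
print(sequence[1][1])  # (0.0, 0.0, 0.0)
```

Keeping a fixed placeholder rather than dropping the joint means every frame has exactly $n$ entries, which is what allows the data to be stored as a single $f \times n \times 3$ array.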
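A minimal sketch of the split in step S12, assuming the sequence is held as a list of $f$ frames; the helper name `split_sequence` and its range check are illustrative additions, not part of the described method.

```python
from typing import List, Sequence, Tuple


def split_sequence(sequence: Sequence, t: int) -> Tuple[List, List]:
    """Split an f-frame sequence into the first t frames (input) and the remaining f - t frames (output)."""
    f = len(sequence)
    if not 0 < t < f:
        raise ValueError("t must satisfy 0 < t < f")
    return list(sequence[:t]), list(sequence[t:])


# A toy sequence of f = 10 frames with n = 2 joints, all at the origin.
seq = [[(0.0, 0.0, 0.0)] * 2 for _ in range(10)]
inputs, outputs = split_sequence(seq, t=4)
print(len(inputs), len(outputs))  # 4 6
```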
Thus, for the $n \times n$ fully connected adjacency matrix $A_{all}$, each element $A_{all}(i,j)$ is defined by formula (1):
$$A_{all}(i,j) = 1 \tag{1}$$
For the $n \times n$ high-degree-of-freedom joint adjacency matrix $A_{mobile}$, each element $A_{mobile}(i,j)$ is defined by formula (2):
For the $n \times n$ low-degree-of-freedom joint adjacency matrix $A_{stable}$, each element $A_{stable}(i,j)$ is defined by formula (3):
The three graphs have the same nodes and the same node features; they differ only in their adjacency matrices, $A_{all}$, $A_{mobile}$ and $A_{stable}$ respectively. Denoting the $n \times (t \times 3)$ rearranged input sequence as the node matrix $V$, the three adjacency matrices above yield three graphs: the fully connected graph $G_{all} = (V, A_{all})$, the high-degree-of-freedom joint graph $G_{mobile} = (V, A_{mobile})$ and the low-degree-of-freedom joint graph $G_{stable} = (V, A_{stable})$.
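The graph construction of step S13 can be sketched as follows. The function names are hypothetical, the frame-by-frame ordering inside each row of $V$ is an assumption, and because formulas (2) and (3) are not reproduced in this text, the code assumes $A_{mobile}(i,j) = hof_i \cdot hof_j$ and $A_{stable}(i,j) = (1 - hof_i)(1 - hof_j)$. That matches the verbal description of the two matrices but may differ from the original formulas, for example in how self-loops are treated.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]
IntMatrix = List[List[int]]


def to_node_matrix(input_seq: Sequence[Sequence[Point]]) -> List[List[float]]:
    """Rearrange a t x n x 3 input sequence into the n x (t*3) node matrix V.

    Row j holds joint j's (x, y, z) for frame 0, then frame 1, and so on
    (this frame-major ordering within a row is an assumption).
    """
    t, n = len(input_seq), len(input_seq[0])
    return [[c for frame in range(t) for c in input_seq[frame][j]] for j in range(n)]


def build_adjacency(hof: Sequence[int]) -> Tuple[IntMatrix, IntMatrix, IntMatrix]:
    """Return (A_all, A_mobile, A_stable) for n joints, given hof_i in {0, 1}.

    A_all follows formula (1). The rules for A_mobile and A_stable are ASSUMED, since
    formulas (2) and (3) are not reproduced in the text: an entry is 1 only when both
    joints are high-DOF (A_mobile) or both are low-DOF (A_stable).
    """
    n = len(hof)
    a_all = [[1] * n for _ in range(n)]
    a_mobile = [[hof[i] * hof[j] for j in range(n)] for i in range(n)]
    a_stable = [[(1 - hof[i]) * (1 - hof[j]) for j in range(n)] for i in range(n)]
    return a_all, a_mobile, a_stable


# t = 2 frames, n = 3 joints; joints 0 and 2 are high-DOF, joint 1 is low-DOF.
seq = [[(1, 2, 3), (4, 5, 6), (7, 8, 9)], [(10, 11, 12), (13, 14, 15), (16, 17, 18)]]
V = to_node_matrix(seq)
a_all, a_mobile, a_stable = build_adjacency([1, 0, 1])
graphs = {"all": (V, a_all), "mobile": (V, a_mobile), "stable": (V, a_stable)}
print(V[0])      # [1, 2, 3, 10, 11, 12]
print(a_mobile)  # [[1, 0, 1], [0, 0, 0], [1, 0, 1]]
print(a_stable)  # [[0, 0, 0], [0, 1, 0], [0, 0, 0]]
```

All three graphs share the same `V`; only the adjacency matrix changes, which is what lets each stream of the network see a different joint topology over identical node features.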
As shown in FIG. 3, the multi-flow graph neural network fused by an attention mechanism in this embodiment adopts an encoder-decoder structure. The graph data constructed in step S1 is fed into the encoder for feature extraction, producing a hidden variable $H$; $H$ is then fed into the decoder, whose output is the predicted human posture. The average error between the predicted and ground-truth values is used as the loss function, and the model parameters are trained by gradient descent. The specific training steps are as follows:
As shown in FIG. 4, in one embodiment, step S2 (constructing the multi-flow graph neural network model fused by an attention mechanism and training it on the graph data to obtain a trained model) specifically comprises the following steps:
step S21: the multi-flow graph neural network fused by an attention mechanism adopts an encoder-decoder structure, in which the encoder comprises several coding modules. The graph data is fed into the encoder for feature extraction, and the output of the $i$-th coding module is given by formula (4):
$$Out_i = Att_i\big(GCN_{all,i}(G_{all}),\ GCN_{stable,i}(G_{stable}),\ GCN_{mobile,i}(G_{mobile})\big) \tag{4}$$
where the coding modules are connected in series, and the output of the $i$-th coding module, $Out_i = (G_{all,i}, G_{stable,i}, G_{mobile,i})$, consists of three graphs; $GCN(\cdot)$ denotes a single graph neural network layer, and $Att_i(\cdot)$ denotes the $i$-th attention module;
step S22: a single graph neural network layer with input $G = (V, A)$ and output $X$ is expressed by formula (5):
$$X = \mathrm{ReLU}(AVW + VU) \tag{5}$$
where $W$ and $U$ are trainable weight matrices of size $(t \times 3) \times D$, and $D$ is the desired feature dimension of the layer's output;
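Formula (5) can be illustrated with a small pure-Python sketch. Plain nested lists are used only to keep the example self-contained; a practical model would presumably use a tensor library with automatic differentiation so that $W$ and $U$ can be trained by gradient descent. Here `F` stands for the input feature width ($t \times 3$ for the first layer), and the toy weights are arbitrary.

```python
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain matrix product of row-major nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def gcn_layer(a: Matrix, v: Matrix, w: Matrix, u: Matrix) -> Matrix:
    """Formula (5): X = ReLU(A V W + V U).

    a: n x n adjacency, v: n x F node features, w and u: F x D trainable weights.
    """
    avw = matmul(matmul(a, v), w)  # aggregate features over the adjacency, then project
    vu = matmul(v, u)              # self term computed from each node's own features
    return [[max(0.0, x + y) for x, y in zip(row_avw, row_vu)]
            for row_avw, row_vu in zip(avw, vu)]


# Toy sizes: n = 2 nodes, F = 1 input feature, D = 2 output features.
X = gcn_layer(a=[[1, 1], [1, 1]], v=[[1.0], [2.0]], w=[[1.0, -1.0]], u=[[0.5, 0.0]])
print(X)  # [[3.5, 0.0], [4.0, 0.0]]
```

The separate $VU$ term keeps each node's own features in the output even when the adjacency matrix has no self-loops, which matters for the sparse $A_{mobile}$ and $A_{stable}$ graphs.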
step S23: the inputs of the attention module are $G_1 = (V_1, A_{all})$, $G_2 = (V_2, A_{stable})$ and $G_3 = (V_3, A_{mobile})$, and its output is $X' = (G_{out1}, G_{out2}, G_{out3})$. The structure of the attention module is shown in FIG. 5, and its computation is expressed by formulas (6) to (8):
$$V_{mid4} = GCN_2\big(\mathrm{CONCAT}(V_{mid1}, V_{mid2}, V_{mid3})\big) \tag{7}$$
where $\mathrm{CONCAT}(\cdot)$ denotes matrix concatenation along the feature dimension; $\mathrm{SoftMax}(\cdot)$ denotes the normalized exponential function; $W_{1,1}, W_{1,2}, \ldots, W_{3,2}$ denote trainable parameters of different linear layers; and $V_{mid1}$, $V_{mid2}$, $V_{mid3}$ and $V_{mid4}$ are intermediate variables.
The above describes one of the coding modules connected in series in the encoder. All modules share the same structure, but their parameters are independent of one another, so each module's parameters are trained separately.
Step S24: the hidden variable $H$ output by the encoder is given by formula (9):
$$H = \lambda_1 V_{allf} + \lambda_2 V_{stablef} + \lambda_3 V_{mobilef} \tag{9}$$
where the output of the last coding module is $Out_f = (G_{allf}, G_{stablef}, G_{mobilef})$, with $G_{allf} = (V_{allf}, A_{all})$, $G_{stablef} = (V_{stablef}, A_{stable})$ and $G_{mobilef} = (V_{mobilef}, A_{mobile})$; $\lambda_1$, $\lambda_2$ and $\lambda_3$ are configurable parameters.
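Formula (9) is an element-wise weighted sum of the three stream outputs. The sketch below assumes the three node matrices share the same $n \times d$ shape; the $\lambda$ values in the usage line are arbitrary examples, since the text only states that they are configurable.

```python
from typing import List, Tuple

Matrix = List[List[float]]


def fuse_streams(v_all: Matrix, v_stable: Matrix, v_mobile: Matrix,
                 lambdas: Tuple[float, float, float]) -> Matrix:
    """Formula (9): H = l1 * V_allf + l2 * V_stablef + l3 * V_mobilef, element by element."""
    l1, l2, l3 = lambdas
    return [[l1 * a + l2 * s + l3 * m for a, s, m in zip(ra, rs, rm)]
            for ra, rs, rm in zip(v_all, v_stable, v_mobile)]


# One node (n = 1) with d = 2 features per stream; example weights only.
H = fuse_streams([[1.0, 2.0]], [[3.0, 4.0]], [[5.0, 6.0]], lambdas=(0.5, 0.25, 0.25))
print(H)  # [[2.5, 3.5]]
```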
The decoder of the multi-flow graph neural network fused by an attention mechanism uses a graph gated recurrent network to predict recursively. Its input is the encoder's output hidden variable $H$, an $n \times d$ matrix, where $d$ is the feature dimension.
Step S25: the hidden variable $H$ is fed into the decoder, which predicts the human posture according to formula (10):
$$Out_{T+1} = Out_T + f_{pred}\big(GRU(Out_T, H_T)\big) \tag{10}$$
where $Out_T$ is the human posture at time $T$; $f_{pred}(\cdot)$ denotes a multilayer perceptron; $GRU(\cdot)$ denotes the graph gated recurrent network; $H_T$ is the hidden variable at time $T$; and $Out_{T+1}$ is the predicted human posture at time $T+1$.
At the initial time $T = 0$, $Out_0$ is initialized to the last frame of the input sequence, that is, the last $n \times 3$ matrix of the $t \times n \times 3$ input sequence, and $H_0$ is initialized to the hidden variable $H$.
The graph gated recurrent network computes formulas (11) to (14):
$$r_T = \sigma\big(r_{in}(Out_T) + r_{hid}(A_{all} H_T W_H)\big) \tag{11}$$
$$u_T = \sigma\big(u_{in}(Out_T) + u_{hid}(A_{all} H_T W_H)\big) \tag{12}$$
$$c_T = \tanh\big(c_{in}(Out_T) + r_T \odot c_{hid}(A_{all} H_T W_H)\big) \tag{13}$$
$$H_{T+1} = u_T \odot H_T + (1 - u_T) \odot c_T \tag{14}$$
where $r_{in}$, $r_{hid}$, $u_{in}$, $u_{hid}$, $c_{in}$ and $c_{hid}$ are trainable linear layers; $W_H$ is a trainable weight matrix; $A_{all}$ is the adjacency matrix of the fully connected graph; $\sigma(\cdot)$ is the sigmoid function; $\odot$ denotes element-wise multiplication; and $H_{T+1}$ is the output of the graph gated recurrent network, that is, $H_{T+1} = GRU(Out_T, H_T)$.
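One step of the graph gated recurrent network can be sketched as below. Assumptions: each trainable linear layer ($r_{in}$, $r_{hid}$, and so on) is modelled as a bias-free weight matrix, $Out_T$ is $n \times 3$, $H_T$ is $n \times d$, and $W_H$ is $d \times d$. These shapes are inferred from the surrounding definitions rather than stated in the text, and the helper names are hypothetical. The all-zero weights in the usage line are chosen only because the result is easy to check by hand.

```python
import math
from typing import Callable, Dict, List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain matrix product of row-major nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def elementwise(fn: Callable[..., float], *ms: Matrix) -> Matrix:
    """Apply fn to corresponding entries of equally shaped matrices."""
    return [[fn(*vals) for vals in zip(*rows)] for rows in zip(*ms)]


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def graph_gru_step(out_t: Matrix, h_t: Matrix, a_all: Matrix, p: Dict[str, Matrix]) -> Matrix:
    """One step of formulas (11)-(14); returns H_{T+1}.

    out_t: n x 3 posture, h_t: n x d hidden state, a_all: n x n adjacency.
    Linear layers are bias-free matrices (assumption): r_in, u_in, c_in are 3 x d;
    r_hid, u_hid, c_hid and W_H are d x d.
    """
    msg = matmul(matmul(a_all, h_t), p["W_H"])  # A_all H_T W_H, shared by all gates
    r = elementwise(lambda x, y: sigmoid(x + y), matmul(out_t, p["r_in"]), matmul(msg, p["r_hid"]))
    u = elementwise(lambda x, y: sigmoid(x + y), matmul(out_t, p["u_in"]), matmul(msg, p["u_hid"]))
    c = elementwise(lambda x, ri, y: math.tanh(x + ri * y),
                    matmul(out_t, p["c_in"]), r, matmul(msg, p["c_hid"]))
    return elementwise(lambda ui, hi, ci: ui * hi + (1.0 - ui) * ci, u, h_t, c)


def zeros(rows: int, cols: int) -> Matrix:
    return [[0.0] * cols for _ in range(rows)]


# With all-zero weights both gates equal 0.5 and c_T = 0, so H_{T+1} = 0.5 * H_T.
d = 2
params = {k: zeros(3, d) for k in ("r_in", "u_in", "c_in")}
params.update({k: zeros(d, d) for k in ("r_hid", "u_hid", "c_hid", "W_H")})
print(graph_gru_step([[0.1, 0.2, 0.3]], [[2.0, -4.0]], [[1.0]], params))  # [[1.0, -2.0]]
```

Compared with a standard GRU cell, the only structural change is that the hidden state is first propagated over the fully connected graph ($A_{all} H_T W_H$) before entering the gates.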
Because the graph gated recurrent network is recursive, once the hidden variable $H$ is given, the decoder takes its prediction from the previous time step as the next input and can recursively generate a predicted human posture sequence of any length; here it outputs $(f-t)$ frames, denoted Output.
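The recursive rollout of formula (10), with $Out_0$ and $H_0$ initialized as described for step S25, can be sketched as follows. The GRU step and $f_{pred}$ are passed in as functions, so any implementation of formulas (11) to (14) and of the multilayer perceptron could be plugged in; the function name `decode` and the toy stand-ins in the usage line are hypothetical and exist only to make the recursion visible.

```python
from typing import Callable, List

Matrix = List[List[float]]


def decode(last_frame: Matrix, h: Matrix, num_frames: int,
           gru: Callable[[Matrix, Matrix], Matrix],
           f_pred: Callable[[Matrix], Matrix]) -> List[Matrix]:
    """Roll out formula (10): Out_{T+1} = Out_T + f_pred(GRU(Out_T, H_T)).

    last_frame: last n x 3 frame of the input sequence (Out_0); h: encoder output H (H_0).
    gru(out_t, h_t) returns H_{T+1}; f_pred maps H_{T+1} to an n x 3 displacement.
    """
    out_t, h_t = last_frame, h
    predictions = []
    for _ in range(num_frames):
        h_t = gru(out_t, h_t)  # H_{T+1} = GRU(Out_T, H_T)
        delta = f_pred(h_t)    # predicted change in joint positions
        out_t = [[o + dlt for o, dlt in zip(ro, rd)] for ro, rd in zip(out_t, delta)]
        predictions.append(out_t)  # fed back as the input of the next step
    return predictions


# Toy stand-ins: the hidden state never changes and every step moves one joint +1 along X.
frames = decode(last_frame=[[0.0, 0.0, 0.0]], h=[[0.0]], num_frames=3,
                gru=lambda out_t, h_t: h_t,
                f_pred=lambda h_t: [[1.0, 0.0, 0.0]])
print(len(frames), frames[-1])  # 3 [[3.0, 0.0, 0.0]]
```

Predicting a displacement and adding it to $Out_T$, rather than predicting absolute positions, is what the residual form of formula (10) expresses; `num_frames` would be set to $f - t$ during training.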
Step S26: the parameters are trained by gradient descent with the loss function given by formula (15):
where Output, the output sequence of the decoder, contains $(f-t)$ frames and has size $(f-t) \times n \times 3$, and $Output_{gt}$ is the desired output.
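Formula (15) is not reproduced in this text; the description of step S2 only says that the average error between predicted and true values is used as the loss. The sketch below assumes that error is the mean Euclidean distance per joint and frame, which is a common choice but is not confirmed by the source, and the name `mean_joint_error` is hypothetical. In training, this value would be minimized by gradient descent.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float, float]


def mean_joint_error(output: Sequence[Sequence[Point]],
                     output_gt: Sequence[Sequence[Point]]) -> float:
    """Mean Euclidean distance between predicted and ground-truth joints.

    Averaged over all (f - t) frames and n joints. This is an ASSUMED form of the loss;
    formula (15) is not reproduced in the text.
    """
    distances = [math.dist(p, g)
                 for frame_pred, frame_gt in zip(output, output_gt)
                 for p, g in zip(frame_pred, frame_gt)]
    return sum(distances) / len(distances)


pred = [[(0.0, 0.0, 0.0), (3.0, 4.0, 0.0)]]
gt = [[(0.0, 0.0, 0.0), (0.0, 0.0, 0.0)]]
print(mean_joint_error(pred, gt))  # 2.5
```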
As shown in FIG. 6, in one embodiment, step S3 (acquiring a sequence of three-dimensional positions of key human joints for prediction, constructing graph data, and inputting it into the trained model to obtain predicted human postures) comprises the following steps:
step S31: acquiring a three-dimensional position data sequence of key human joints for prediction and constructing graph data, as in step S1;
In the same way as for the training sequence in step S1, a three-dimensional position data sequence of key human joints is acquired and expressed as a $t \times n \times 3$ matrix, which is rearranged by joint into an $n \times (t \times 3)$ matrix $V$. The three graphs constructed are the fully connected graph $G_{all} = (V, A_{all})$, the high-degree-of-freedom joint graph $G_{mobile} = (V, A_{mobile})$ and the low-degree-of-freedom joint graph $G_{stable} = (V, A_{stable})$.
Step S32: feeding the graph data into the multi-flow graph neural network model fused by an attention mechanism trained in step S2; the encoder extracts features to produce the hidden variable $H$, which is fed into the decoder to output the predicted human posture.
As shown in Table 1 below, experiments show that this embodiment achieves higher accuracy than the compared methods when predicting various actions on the public CMU Mocap dataset.
Table 1 Comparison of mean joint angle error between the method of the invention and other methods on the CMU Mocap dataset
* Subscripts indicate the accuracy ranking of the methods; prediction horizons are in milliseconds.
As shown in Table 2, experiments show that the method achieves the best average accuracy over all actions on the public Human3.6M dataset.
Table 2 Comparison of mean joint angle error between the method of the invention and other methods on the Human3.6M dataset
* Subscripts indicate the accuracy ranking of the methods; prediction horizons are in milliseconds.
The human body posture prediction method provided by the invention is based on a multi-flow graph neural network. It constructs multiple graph models from human joint position data and the structure of the body. This models the human motion system and predicts human posture with higher accuracy. A multi-flow graph neural network extracts joint motion features, and an attention model fuses global information. Existing human posture prediction methods show large errors on vigorous and long-term motion, and the invention overcomes this defect with better experimental results. The network structure is simple and runs in real time, so human posture can be predicted in real time, which benefits practical applications. Compared with other existing human posture prediction methods, the method performs better for medium- to long-term prediction horizons of 500 to 1000 milliseconds and for vigorous motion.
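The feature extraction and fusion summarized above correspond to formula (5), a single graph layer, and formula (9), the weighted fusion of the three streams, in claim 1. They can be sketched as follows. This is a simplified, hypothetical illustration. It uses a single encoding module with random rather than trained weights. It replaces the attention module $\mathrm{Att}_i$ with a pass-through, because formulas (6) and (8) are not reproduced in this text. It also assumes equal fusion weights $\lambda_1 = \lambda_2 = \lambda_3 = 1/3$ and toy adjacency matrices. The function names are illustrative.

```python
import numpy as np

def gcn_layer(A, V, W, U):
    """One graph layer, formula (5): X = ReLU(A V W + V U).

    A: (n, n) adjacency; V: (n, d_in) node features; W, U: (d_in, D).
    A V W aggregates neighbour features; V U keeps each node's own features.
    """
    return np.maximum(A @ V @ W + V @ U, 0.0)

def fuse(V_all, V_stable, V_mobile, lambdas=(1 / 3, 1 / 3, 1 / 3)):
    """Weighted sum of the three stream outputs, formula (9).
    Equal default weights are an assumption; the patent calls them configurable."""
    l1, l2, l3 = lambdas
    return l1 * V_all + l2 * V_stable + l3 * V_mobile

# Toy setting: n joints, t input frames, D output features (assumed values).
rng = np.random.default_rng(0)
n, t, D = 6, 10, 16
V = rng.standard_normal((n, t * 3))          # node matrix, n x (t*3)

# Toy adjacency matrices; the real joint groupings are not given in the text.
A_all = np.ones((n, n)) - np.eye(n)          # every joint linked to every other
A_stable = np.zeros((n, n))
A_stable[0, 1] = A_stable[1, 0] = 1.0        # hypothetical low-DoF pair
A_mobile = np.zeros((n, n))
for i, j in [(2, 3), (4, 5)]:                # hypothetical high-DoF pairs
    A_mobile[i, j] = A_mobile[j, i] = 1.0

# One graph stream per adjacency matrix, each with its own untrained weights.
streams = {}
for name, A in [("all", A_all), ("stable", A_stable), ("mobile", A_mobile)]:
    W = rng.standard_normal((t * 3, D)) * 0.1
    U = rng.standard_normal((t * 3, D)) * 0.1
    streams[name] = gcn_layer(A, V, W, U)

# Attention module omitted: stream outputs go straight into formula (9).
H = fuse(streams["all"], streams["stable"], streams["mobile"])
```

Each stream sees the same node features but a different adjacency matrix. The three outputs therefore differ only in which joints exchange information, and the fusion step combines these views into one hidden variable.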
Embodiment 2
As shown in Fig. 7, an embodiment of the present invention provides a human body posture prediction system based on an attention-mechanism-fused multi-flow graph neural network, comprising the following modules:
a training data acquisition module 41, configured to acquire a three-dimensional position data sequence of key human joints for training, divide it into an input sequence and an output sequence according to the preset input and output sequence lengths, and construct graph data from the input sequence;
a model training module 42, configured to construct the attention-mechanism-fused multi-flow graph neural network model, input the graph data into the model for training, and obtain the trained model;
a human body posture prediction module 43, configured to acquire a three-dimensional position data sequence of key human joints for prediction, construct graph data, and input it into the trained attention-mechanism-fused multi-flow graph neural network model to obtain a predicted value of the human body posture.
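The decoding performed by the prediction module follows formula (10) in claim 1. It can be sketched as an autoregressive rollout. The sketch makes two assumptions: the encoder output $H$ (one D-dimensional row per joint) serves as the initial hidden state, and the last observed frame serves as the initial pose. The patent specifies a graph gated recurrent unit. This sketch swaps in a plain, bias-free GRU cell applied to each joint independently, with no neighbour aggregation, and uses a two-layer perceptron for $f_{\mathrm{pred}}$. All class and function names, layer sizes, and the weight initialization are hypothetical.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class GRUCell:
    """Plain, bias-free GRU cell applied row-wise (one row per joint).
    Stand-in for the patent's graph GRU: it does not mix neighbouring joints."""

    def __init__(self, d_in, d_hid, rng, scale=0.1):
        self.Wz, self.Wr, self.Wh = (rng.standard_normal((d_in, d_hid)) * scale for _ in range(3))
        self.Uz, self.Ur, self.Uh = (rng.standard_normal((d_hid, d_hid)) * scale for _ in range(3))

    def __call__(self, x, h):
        z = sigmoid(x @ self.Wz + h @ self.Uz)               # update gate
        r = sigmoid(x @ self.Wr + h @ self.Ur)               # reset gate
        h_tilde = np.tanh(x @ self.Wh + (r * h) @ self.Uh)   # candidate state
        return (1.0 - z) * h + z * h_tilde                   # new hidden state

class MLP:
    """Two-layer perceptron used as f_pred (layer sizes are assumptions)."""

    def __init__(self, d_in, d_hid, d_out, rng, scale=0.1):
        self.W1 = rng.standard_normal((d_in, d_hid)) * scale
        self.W2 = rng.standard_normal((d_hid, d_out)) * scale

    def __call__(self, x):
        return np.maximum(x @ self.W1, 0.0) @ self.W2

def rollout(last_pose, H, gru, f_pred, steps):
    """Formula (10): Out_{T+1} = Out_T + f_pred(GRU(Out_T, H_T)).
    last_pose: (n, 3) last observed frame; H: (n, D) encoder output used as H_0."""
    out, h = last_pose, H
    preds = []
    for _ in range(steps):
        h = gru(out, h)          # H_{T+1} = GRU(Out_T, H_T)
        out = out + f_pred(h)    # residual update of the pose
        preds.append(out)
    return np.stack(preds)       # (steps, n, 3)

rng = np.random.default_rng(0)
n, D, steps = 6, 16, 25          # e.g. steps = f - t output frames
H = rng.standard_normal((n, D))
last_pose = rng.standard_normal((n, 3))
gru = GRUCell(3, D, rng)
f_pred = MLP(D, 32, 3, rng)
pred = rollout(last_pose, H, gru, f_pred, steps)
```

Because of the residual form, the network predicts frame-to-frame changes in pose rather than absolute poses. If $f_{\mathrm{pred}}$ returned zeros, the rollout would simply repeat the last observed pose.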
The above embodiments are provided solely to describe the present invention and are not intended to limit its scope, which is defined by the appended claims. Equivalent substitutions and modifications made without departing from the spirit and principles of the invention are intended to fall within its scope.

Claims (3)

1. A human body posture prediction method based on an attention-mechanism-fused multi-flow graph neural network, characterized by comprising the following steps:
step S1: acquiring a three-dimensional position data sequence of key human joints for training, dividing it into an input sequence and an output sequence according to the preset input and output sequence lengths, and constructing graph data from the input sequence, which specifically comprises:
step S11: acquiring an f×n×3 data sequence of the three-dimensional positions of key human joints for training, where f is the number of frames in the sequence and n is the number of joints;
step S12: taking the first t frames of the data sequence as the input sequence and the remaining (f-t) frames as the output sequence, so that the input sequence is t×n×3 and the output sequence is (f-t)×n×3;
step S13: constructing a node matrix $V$ and adjacency matrices $A$ from the t×n×3 input sequence, and building from them a fully connected graph $G_{\mathrm{all}} = (V, A_{\mathrm{all}})$, a high-degree-of-freedom joint graph $G_{\mathrm{mobile}} = (V, A_{\mathrm{mobile}})$, and a low-degree-of-freedom joint graph $G_{\mathrm{stable}} = (V, A_{\mathrm{stable}})$; where $A_{\mathrm{all}}$ is the fully connected adjacency matrix representing the relations between all joints, $A_{\mathrm{mobile}}$ is the adjacency matrix representing the connections between high-degree-of-freedom joints, and $A_{\mathrm{stable}}$ is the adjacency matrix representing the connections between low-degree-of-freedom joints;
step S2: constructing the attention-mechanism-fused multi-flow graph neural network model, inputting the graph data into the model for training, and obtaining the trained model, which specifically comprises:
step S21: the attention-mechanism-fused multi-flow graph neural network adopts an encoder-decoder structure in which the encoder comprises a plurality of encoding modules; the graph data are input into the encoder for feature extraction, and the output of the i-th encoding module is given by formula (4):
$$\mathrm{Out}_i = \mathrm{Att}_i\big(\mathrm{GCN}_{\mathrm{all}1}(G_{\mathrm{all}}),\ \mathrm{GCN}_{\mathrm{stable}1}(G_{\mathrm{stable}}),\ \mathrm{GCN}_{\mathrm{mobile}1}(G_{\mathrm{mobile}})\big) \tag{4}$$
where the output of the i-th encoding module, $\mathrm{Out}_i = (G_{\mathrm{all}1}, G_{\mathrm{stable}1}, G_{\mathrm{mobile}1})$, comprises three graphs; $\mathrm{GCN}(\cdot)$ denotes a single graph neural network layer; and $\mathrm{Att}_i(\cdot)$ denotes the i-th attention module;
step S22: a single graph neural network layer takes as input a graph $G = (V, A)$ and outputs $X$, computed by formula (5):
$$X = \mathrm{ReLU}(AVW + VU) \tag{5}$$
where $W$ and $U$ are trainable weight matrices of size $(t \times 3) \times D$, and $D$ is the desired output feature dimension of the graph neural network layer;
step S23: the attention module takes as input $G_1 = (V_1, A_{\mathrm{all}})$, $G_2 = (V_2, A_{\mathrm{stable}})$, $G_3 = (V_3, A_{\mathrm{mobile}})$ and outputs $X' = (G_{\mathrm{out}1}, G_{\mathrm{out}2}, G_{\mathrm{out}3})$; the relationship between input and output is given by formulas (6) to (8):
$$V_{\mathrm{mid}4} = \mathrm{GCN}_2\big(\mathrm{CONCAT}(V_{\mathrm{mid}1}, V_{\mathrm{mid}2}, V_{\mathrm{mid}3})\big) \tag{7}$$
where $\mathrm{CONCAT}(\cdot)$ denotes matrix concatenation along the feature dimension; $\mathrm{SoftMax}(\cdot)$ denotes the normalized exponential function; $W_{1,1}, W_{1,2}, \ldots, W_{3,2}$ denote the trainable parameters of different linear layers; and $V_{\mathrm{mid}1}$, $V_{\mathrm{mid}2}$, $V_{\mathrm{mid}3}$, $V_{\mathrm{mid}4}$ are intermediate variables;
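step S24: the hidden variable $H$ output by the encoder is given by formula (9):
$$H = \lambda_1 V_{\mathrm{all}f} + \lambda_2 V_{\mathrm{stable}f} + \lambda_3 V_{\mathrm{mobile}f} \tag{9}$$
where the output of the last encoding module is $\mathrm{Out}_f = (G_{\mathrm{all}f}, G_{\mathrm{stable}f}, G_{\mathrm{mobile}f})$, with $G_{\mathrm{all}f} = (V_{\mathrm{all}f}, A_{\mathrm{all}})$, $G_{\mathrm{stable}f} = (V_{\mathrm{stable}f}, A_{\mathrm{stable}})$, and $G_{\mathrm{mobile}f} = (V_{\mathrm{mobile}f}, A_{\mathrm{mobile}})$; and $\lambda_1$, $\lambda_2$, $\lambda_3$ are configurable parameters;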
step S25: inputting the hidden variable $H$ into the decoder and using the decoder to predict the human body posture, as expressed by formula (10):
$$\mathrm{Out}_{T+1} = \mathrm{Out}_T + f_{\mathrm{pred}}\big(\mathrm{GRU}(\mathrm{Out}_T, H_T)\big) \tag{10}$$
where $\mathrm{Out}_T$ is the human body posture at time T; $f_{\mathrm{pred}}(\cdot)$ denotes a multilayer perceptron; $\mathrm{GRU}(\cdot)$ denotes a graph gated recurrent unit network; and $H_T$ is the hidden variable at time T;
step S26: training the parameters by gradient descent, with the loss function set as follows:
where Output is the (f-t)-frame sequence output by the decoder, of size (f-t)×n×3, and $\mathrm{Output}_{gt}$ is the expected output;
step S3: acquiring a three-dimensional position data sequence of key human joints for prediction, constructing the graph data, and inputting it into the trained attention-mechanism-fused multi-flow graph neural network model to obtain a predicted value of the human body posture.
2. The human body posture prediction method based on an attention-mechanism-fused multi-flow graph neural network according to claim 1, wherein step S3 comprises the following steps. Step S3 acquires a three-dimensional position data sequence of key human joints for prediction, constructs the graph data, and inputs it into the trained model to obtain a predicted value of the human body posture.
step S31: acquiring a three-dimensional position data sequence of key human joints for prediction in the same manner as in step S1, and constructing the graph data;
step S32: inputting the graph data into the attention-mechanism-fused multi-flow graph neural network model trained in step S2, the encoder extracting features to yield the hidden variable $H$; and inputting $H$ into the decoder, which outputs the predicted value of the human body posture.
3. A human body posture prediction system based on an attention-mechanism-fused multi-flow graph neural network, characterized by comprising the following modules:
a training data acquisition module, configured to acquire a three-dimensional position data sequence of key human joints for training, divide it into an input sequence and an output sequence according to the preset input and output sequence lengths, and construct graph data from the input sequence, which specifically comprises:
step S11: acquiring an f×n×3 data sequence of the three-dimensional positions of key human joints for training, where f is the number of frames in the sequence and n is the number of joints;
step S12: taking the first t frames of the data sequence as the input sequence and the remaining (f-t) frames as the output sequence, so that the input sequence is t×n×3 and the output sequence is (f-t)×n×3;
step S13: constructing a node matrix $V$ and adjacency matrices $A$ from the t×n×3 input sequence, and building from them a fully connected graph $G_{\mathrm{all}} = (V, A_{\mathrm{all}})$, a high-degree-of-freedom joint graph $G_{\mathrm{mobile}} = (V, A_{\mathrm{mobile}})$, and a low-degree-of-freedom joint graph $G_{\mathrm{stable}} = (V, A_{\mathrm{stable}})$; where $A_{\mathrm{all}}$ is the fully connected adjacency matrix representing the relations between all joints, $A_{\mathrm{mobile}}$ is the adjacency matrix representing the connections between high-degree-of-freedom joints, and $A_{\mathrm{stable}}$ is the adjacency matrix representing the connections between low-degree-of-freedom joints;
a model training module, configured to construct the attention-mechanism-fused multi-flow graph neural network model, input the graph data into the model for training, and obtain the trained model, which specifically comprises:
step S21: the attention-mechanism-fused multi-flow graph neural network adopts an encoder-decoder structure in which the encoder comprises a plurality of encoding modules; the graph data are input into the encoder for feature extraction, and the output of the i-th encoding module is given by formula (4):
$$\mathrm{Out}_i = \mathrm{Att}_i\big(\mathrm{GCN}_{\mathrm{all}1}(G_{\mathrm{all}}),\ \mathrm{GCN}_{\mathrm{stable}1}(G_{\mathrm{stable}}),\ \mathrm{GCN}_{\mathrm{mobile}1}(G_{\mathrm{mobile}})\big) \tag{4}$$
where the output of the i-th encoding module, $\mathrm{Out}_i = (G_{\mathrm{all}1}, G_{\mathrm{stable}1}, G_{\mathrm{mobile}1})$, comprises three graphs; $\mathrm{GCN}(\cdot)$ denotes a single graph neural network layer; and $\mathrm{Att}_i(\cdot)$ denotes the i-th attention module;
step S22: a single graph neural network layer takes as input a graph $G = (V, A)$ and outputs $X$, computed by formula (5):
$$X = \mathrm{ReLU}(AVW + VU) \tag{5}$$
where $W$ and $U$ are trainable weight matrices of size $(t \times 3) \times D$, and $D$ is the desired output feature dimension of the graph neural network layer;
step S23: the attention module takes as input $G_1 = (V_1, A_{\mathrm{all}})$, $G_2 = (V_2, A_{\mathrm{stable}})$, $G_3 = (V_3, A_{\mathrm{mobile}})$ and outputs $X' = (G_{\mathrm{out}1}, G_{\mathrm{out}2}, G_{\mathrm{out}3})$; the relationship between input and output is given by formulas (6) to (8):
$$V_{\mathrm{mid}4} = \mathrm{GCN}_2\big(\mathrm{CONCAT}(V_{\mathrm{mid}1}, V_{\mathrm{mid}2}, V_{\mathrm{mid}3})\big) \tag{7}$$
where $\mathrm{CONCAT}(\cdot)$ denotes matrix concatenation along the feature dimension; $\mathrm{SoftMax}(\cdot)$ denotes the normalized exponential function; $W_{1,1}, W_{1,2}, \ldots, W_{3,2}$ denote the trainable parameters of different linear layers; and $V_{\mathrm{mid}1}$, $V_{\mathrm{mid}2}$, $V_{\mathrm{mid}3}$, $V_{\mathrm{mid}4}$ are intermediate variables;
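step S24: the hidden variable $H$ output by the encoder is given by formula (9):
$$H = \lambda_1 V_{\mathrm{all}f} + \lambda_2 V_{\mathrm{stable}f} + \lambda_3 V_{\mathrm{mobile}f} \tag{9}$$
where the output of the last encoding module is $\mathrm{Out}_f = (G_{\mathrm{all}f}, G_{\mathrm{stable}f}, G_{\mathrm{mobile}f})$, with $G_{\mathrm{all}f} = (V_{\mathrm{all}f}, A_{\mathrm{all}})$, $G_{\mathrm{stable}f} = (V_{\mathrm{stable}f}, A_{\mathrm{stable}})$, and $G_{\mathrm{mobile}f} = (V_{\mathrm{mobile}f}, A_{\mathrm{mobile}})$; and $\lambda_1$, $\lambda_2$, $\lambda_3$ are configurable parameters;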
step S25: inputting the hidden variable $H$ into the decoder and using the decoder to predict the human body posture, as expressed by formula (10):
$$\mathrm{Out}_{T+1} = \mathrm{Out}_T + f_{\mathrm{pred}}\big(\mathrm{GRU}(\mathrm{Out}_T, H_T)\big) \tag{10}$$
where $\mathrm{Out}_T$ is the human body posture at time T; $f_{\mathrm{pred}}(\cdot)$ denotes a multilayer perceptron; $\mathrm{GRU}(\cdot)$ denotes a graph gated recurrent unit network; and $H_T$ is the hidden variable at time T;
step S26: training the parameters by gradient descent, with the loss function set as follows:
where Output is the (f-t)-frame sequence output by the decoder, of size (f-t)×n×3, and $\mathrm{Output}_{gt}$ is the expected output;
a human body posture prediction module, configured to acquire a three-dimensional position data sequence of key human joints for prediction, construct the graph data, and input it into the trained attention-mechanism-fused multi-flow graph neural network model to obtain a predicted value of the human body posture.
CN202110539624.0A 2021-05-18 2021-05-18 Human body posture prediction method and system based on attention mechanism fusion multi-flow diagram Active CN113642379B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110539624.0A CN113642379B (en) 2021-05-18 2021-05-18 Human body posture prediction method and system based on attention mechanism fusion multi-flow diagram


Publications (2)

Publication Number Publication Date
CN113642379A CN113642379A (en) 2021-11-12
CN113642379B true CN113642379B (en) 2024-03-01

Family

ID=78415817


Country Status (1)

Country Link
CN (1) CN113642379B (en)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114419105B (en) * 2022-03-14 2022-07-15 Shenzhen Haiqing Shixun Technology Co., Ltd. Multi-target pedestrian trajectory prediction model training method, prediction method and device
CN114943324B (en) * 2022-05-26 2023-10-13 Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences Neural network training method, human motion recognition method and device, and storage medium
CN115407874B (en) * 2022-08-18 2023-07-28 China Ordnance Industry Standardization Research Institute VR maintenance training operation proficiency prediction method based on neural network
CN115862338B (en) * 2023-03-01 2023-05-16 Tianjin University Airport traffic flow prediction method, airport traffic flow prediction system, electronic equipment and medium

Citations (2)

Publication number Priority date Publication date Assignee Title
WO2020258611A1 (en) * 2019-06-28 2020-12-30 Shandong University of Science and Technology Lymph node CT detection system employing recurrent spatio-temporal attention mechanism
CN112183862A (en) * 2020-09-29 2021-01-05 Changchun University of Science and Technology Traffic flow prediction method and system for urban road network

Non-Patent Citations (1)

Title
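Information cascade prediction model based on hierarchical attention; Zhang Zhiyang, Zhang Fengli, Chen Xueqin, Wang Ruijin; Computer Science (No. 06); full text *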

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant