CN113642379B

CN113642379B - Human body posture prediction method and system based on attention mechanism fusion multi-flow diagram

Info

Publication number: CN113642379B
Application number: CN202110539624.0A
Authority: CN
Inventors: 袁丁; 曹哲; 魏晓东; 尹继豪; 张雪怡
Original assignee: Beihang University
Current assignee: Beihang University
Priority date: 2021-05-18
Filing date: 2021-05-18
Publication date: 2024-03-01
Anticipated expiration: 2041-05-18
Also published as: CN113642379A

Abstract

The invention relates to a human body posture prediction method and system based on a multi-flow graph neural network fused by an attention mechanism. The method comprises the following steps. S1: acquire a three-dimensional position data sequence of key human joints for training, divide it into an input sequence and an output sequence according to preset input and output sequence lengths, and construct graph data from the input sequence. S2: construct a multi-flow graph neural network model based on attention mechanism fusion, and input the graph data into the model for training to obtain a trained model. S3: acquire a three-dimensional position data sequence of key human joints for prediction, construct graph data, and input it into the trained multi-flow graph neural network model based on attention mechanism fusion to obtain a predicted value of the human body posture. The method constructs a plurality of graph models based on the position data and structural characteristics of human joints, thereby modeling the human motion system and predicting the human body posture with higher accuracy.

Description

Human body posture prediction method and system based on attention mechanism fusion multi-flow diagram

Technical Field

The invention relates to the field of computer vision and image processing, in particular to a human body posture prediction method and system based on a multi-flow-graph neural network fused by an attention mechanism.

Background

In recent years, the widespread use of low-cost consumer-grade depth cameras has made it possible to acquire three-dimensional human motion poses cheaply and in real time. As a result, human body posture prediction has become a hot topic at the intersection of graphics and computer vision, with broad application prospects and rich application scenarios in robotics, medicine, and autonomous driving.

In the field of robotics, exoskeleton robot technology is a major research focus. It has very important applications in aerospace, where it is considered key supporting equipment for on-orbit construction and maintenance and for activities on the lunar and Martian surfaces, and it plays an extremely important role in completing extravehicular tasks and handling emergency faults. A human body posture prediction algorithm judges and predicts the astronaut's movement intention by identifying and analyzing the laws of human motion, and assists the exoskeleton robot in achieving compliant force control. This reduces abrupt force changes in exoskeleton control and effectively improves how well the exoskeleton adapts to the astronaut's movement.

Motion capture technology emerged as early as fifty years ago. It is widely used in entertainment industries such as film and games, and in the medical and health industry to help patients with movement disorders. However, it is still not possible to generate realistic motion data that conforms to the structural characteristics and balance of the human body without relying on real motion capture data. Moreover, current mainstream motion capture technology is still based on inertial sensors or complex optical sensing systems, which are expensive and require dedicated venues, so it remains far from being accessible to ordinary households and from benefiting patients who rely on assistive devices for daily exercise. Applying human body posture prediction algorithms to motion generation can greatly improve the utilization of motion capture data and reduce the post-processing workload. With the growing popularity of time-of-flight cameras, human body posture prediction algorithms can also be applied more easily to human-computer interaction. For example, based on the captured actions of a subject, the algorithm can guide the subject to perform the correct actions, assisting the rehabilitation of patients with limb cognitive impairment caused by neurodegenerative disease.

Human body posture prediction therefore has important application value. However, human motion has more than 200 degrees of freedom, and the degrees of freedom of the joints are highly correlated. Traditional methods such as inverse kinematics and dynamics therefore have great difficulty simulating human motion and cannot build an accurate human motion model. The laws of human motion are highly complex, and the balance and coordination mechanism of human motion is particularly difficult to simulate. This poses great difficulties for human motion prediction based on conventional statistical learning methods.

In recent years, with the development of computer hardware and machine learning algorithms, human body posture prediction methods based on deep learning have been proposed. With these methods, researchers do not need to concern themselves with complex kinematic constraints and parameters or with knowledge of different behaviors. They only need to construct a deep neural network model and train it on a large amount of motion data; the model learns latent motion features from the data and predicts a person's next motion trajectory from them. Early deep learning models were mainly based on recurrent and convolutional neural networks, but both have drawbacks. Recurrent neural networks emphasize the temporal relationships in the sequence and ignore spatial information. Convolutional neural network approaches flatten each frame of skeleton data into a one-dimensional vector and treat the sequence as a two-dimensional matrix; they focus on how the position of a single joint changes over time, neglect the correlation among the joints of the human body, and cannot make full use of the topological structure of the human body.

The graph neural network, a deep learning model that has attracted attention over the past two years, is a neural network designed specifically for processing graph data. It has been applied in research fields such as recommendation systems, reasoning and proof, chemistry, traffic, and brain-inspired intelligence. Recent work has proposed graph-neural-network-based methods that represent the three-dimensional human skeleton as a graph, which makes good use of the prior structural information of the skeleton. However, existing graph-neural-network-based methods simply construct an adjacency matrix from the spatial adjacency of human joints and do not consider how human actions are generated, so they cannot fully represent the connections among joints in motion. Human movement is accomplished by the cooperation of many interrelated joints in a kinematic chain, not by only a few adjacent joints. Ergonomics research has also shown that human kinematic chains typically contain joints with three degrees of freedom and joints with two degrees of freedom, and that these two kinds of joints tend to alternate along the chain. Joints with different degrees of freedom have different force characteristics in motion. Therefore, constructing the adjacency matrix from the spatial adjacency of human joints alone does not make full use of the prior information on the structure and kinematics of human joints.

Therefore, how to make full use of the prior information on the structure and kinematics of human joints has become an urgent problem.

Disclosure of Invention

In order to solve the technical problems, the invention provides a human body posture prediction method and a human body posture prediction system based on a multi-flow-graph neural network fused by an attention mechanism.

The technical scheme of the invention is as follows: a human body posture prediction method based on a multi-flow graph neural network fused by an attention mechanism comprises the following steps:

Step S1: acquiring a three-dimensional position data sequence of key human joints for training, and dividing it into an input sequence and an output sequence according to preset input and output sequence lengths; constructing graph data from the input sequence;

Step S2: constructing a multi-flow graph neural network model based on attention mechanism fusion; inputting the graph data into the model for training to obtain a trained multi-flow graph neural network model based on attention mechanism fusion;

Step S3: acquiring a three-dimensional position data sequence of key human joints for prediction, constructing graph data, and inputting it into the trained multi-flow graph neural network model based on attention mechanism fusion to obtain a predicted value of the human body posture.

Compared with the prior art, the invention has the following advantages:

1. The human body posture prediction method based on the multi-flow graph neural network provided by the invention constructs a plurality of graph models based on the position data and structural characteristics of human joints, thereby modeling the human motion system and predicting the human body posture with higher accuracy. The multi-flow graph neural network extracts joint motion features, and the attention model fuses global information.

2. The invention overcomes the defect of larger error of the existing human body posture prediction method for severe movement and long-term movement, and obtains better experimental results; the method has the advantages of simple network structure and real-time operation, can realize real-time prediction of the human body posture, and is beneficial to practical application. Compared with other existing human body posture prediction methods, the method provided by the invention has better performance when performing medium-long motion prediction of 500-1000 milliseconds and when performing intense motion.

Drawings

FIG. 1 is a flowchart of a human body posture prediction method based on a multi-flow graph neural network fused by an attention mechanism in an embodiment of the invention;

FIG. 2 is a flowchart of step S1 of the human body posture prediction method based on a multi-flow graph neural network fused by an attention mechanism in an embodiment of the invention: acquiring a three-dimensional position data sequence of key human joints for training, dividing it into an input sequence and an output sequence according to preset input and output sequence lengths, and constructing graph data from the input sequence;

FIG. 3 is a schematic structural diagram of the multi-flow graph neural network model based on attention mechanism fusion in an embodiment of the invention;

FIG. 4 is a flowchart of step S2 of the human body posture prediction method based on a multi-flow graph neural network fused by an attention mechanism in an embodiment of the invention: constructing a multi-flow graph neural network model based on attention mechanism fusion, and inputting the graph data into the model for training to obtain a trained model;

FIG. 5 is a schematic structural diagram of the attention module in the multi-flow graph neural network based on attention mechanism fusion in an embodiment of the invention;

FIG. 6 is a flowchart of step S3 of the human body posture prediction method based on a multi-flow graph neural network fused by an attention mechanism in an embodiment of the invention: acquiring a three-dimensional position data sequence of key human joints for prediction, constructing graph data, and inputting it into the trained model to obtain a predicted value of the human body posture;

FIG. 7 is a block diagram of a human body posture prediction system based on a multi-flow graph neural network fused by an attention mechanism in an embodiment of the invention.

Detailed Description

To address the problem that the prior art cannot make full use of the prior information on the structure and kinematics of human joints, the invention provides a human body posture prediction method based on a multi-flow graph neural network fused by an attention mechanism, which models the human motion system and predicts the human body posture with higher accuracy.

The present invention will be further described in detail below with reference to the accompanying drawings by way of specific embodiments in order to make the objects, technical solutions and advantages of the present invention more apparent.

Example 1

As shown in fig. 1, the human body posture prediction method based on the attention mechanism fusion multi-flow graph neural network provided by the embodiment of the invention comprises the following steps:

As shown in fig. 2, step S1 is as follows: acquiring a three-dimensional position data sequence of key human joints for training, and dividing it into an input sequence and an output sequence according to preset input and output sequence lengths; constructing graph data from the input sequence. This step specifically comprises:

Step S11: acquiring an f × n × 3 data sequence of the three-dimensional positions of key human joints for training, where f represents the number of frames of the data sequence and n represents the number of joints;

The training data sequence used in the embodiment of the invention can be a sequence from a public data set (such as the Human 3.6M data set or the CMU Mocap data set), or a human motion data sequence acquired with an RGB-D camera or other equipment. The key joints or key points obtained generally include at least 20 positions, such as the head, neck, chest, waist, and the left and right shoulders, elbows, wrists, hands, hips, knees, ankles, and feet. A three-dimensional position data sequence can be expressed as an f × n × 3 matrix, where f is the number of frames in the sequence and n is the number of joints in the data; the three-dimensional coordinates of a joint in the world coordinate system in a given frame are denoted $(X_W, Y_W, Z_W)$. For joints that are not captured during acquisition because of occlusion or other reasons, the three-dimensional coordinates are set to (0, 0, 0), which keeps the matrix size fixed and simplifies subsequent processing.

Step S12: taking the first t frames of the data sequence as the input sequence and the remaining (f-t) frames as the output sequence; the input sequence is expressed as t × n × 3 and the output sequence as (f-t) × n × 3;

The three-dimensional position data sequence is divided into an input sequence and an output sequence according to the preset input and output sequence lengths. For example, for a training sequence of key human joint positions of size f × n × 3, the first t frames are used as input, giving a t × n × 3 input sequence; the desired output has (f-t) frames, so the last (f-t) frames of the sequence are taken as the desired output, giving an (f-t) × n × 3 output sequence.
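The split described in steps S11 and S12 can be sketched in a few lines of NumPy. This is only an illustration: the function name `split_sequence` and the toy sizes (f = 5, n = 4, t = 3) are assumptions made here and do not come from the patent.

```python
import numpy as np

def split_sequence(seq, t):
    """Split an (f, n, 3) joint-position sequence into input and output parts.

    The first t frames form the input sequence; the remaining f - t frames
    form the desired output sequence.
    """
    seq = np.asarray(seq, dtype=float)
    if seq.ndim != 3 or seq.shape[2] != 3:
        raise ValueError("expected an array of shape (f, n, 3)")
    f = seq.shape[0]
    if not 0 < t < f:
        raise ValueError("t must satisfy 0 < t < f")
    return seq[:t], seq[t:]

# Toy data (illustrative): f = 5 frames, n = 4 joints.
rng = np.random.default_rng(0)
sequence = rng.normal(size=(5, 4, 3))
inputs, targets = split_sequence(sequence, t=3)
print(inputs.shape, targets.shape)  # (3, 4, 3) (2, 4, 3)
```

Joints missing from a frame would already be stored as (0, 0, 0) in `sequence`, so the split itself needs no special handling for them.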

Step S13: constructing a node matrix V and adjacency matrices A from the t × n × 3 input sequence, and from them constructing a fully connected graph $G_{all} = (V, A_{all})$, a high-degree-of-freedom joint graph $G_{mobile} = (V, A_{mobile})$, and a low-degree-of-freedom joint graph $G_{stable} = (V, A_{stable})$; where $A_{all}$ is the fully connected adjacency matrix representing the connections among all joints, $A_{mobile}$ is the adjacency matrix representing the connections among high-degree-of-freedom joints, and $A_{stable}$ is the adjacency matrix representing the connections among low-degree-of-freedom joints.

The t × n × 3 input sequence obtained in step S12 is rearranged by joint into an n × (t × 3) matrix, which forms the node matrix V. Each joint is a node of the graph data, so the feature of each node can be represented by a t × 3 matrix. The degree of freedom of a human joint is the number of directions in which the joint can move; for example, joints such as ball-and-socket joints and condyloid joints have 3 degrees of freedom, and joints such as saddle joints and flexion joints have 2. These joints have different kinematic and dynamic characteristics. According to the degrees of freedom of the human joints, three different adjacency matrices can be constructed: the fully connected adjacency matrix $A_{all}$ representing the connections among all joints, the high-degree-of-freedom joint adjacency matrix $A_{mobile}$ representing the connections among high-degree-of-freedom joints, and the low-degree-of-freedom joint adjacency matrix $A_{stable}$ representing the connections among low-degree-of-freedom joints. An adjacency matrix represents the connections between nodes in the graph data. The embodiment of the invention constructs the adjacency matrices as follows:

For the i-th of the n joints, if its degree of freedom is 3, i.e. the joint can move in the three angular dimensions of yaw, pitch, and rotation, it is regarded as a high-degree-of-freedom joint and $hof_i = 1$; otherwise $hof_i = 0$, where hof stands for high degrees of freedom.

Thus, for the fully connected adjacency matrix $A_{all}$ of size n × n, the element $A_{all}(i,j)$ is defined by the following formula (1):

$$A_{all}(i,j) = 1 \tag{1}$$

For the high-degree-of-freedom joint adjacency matrix $A_{mobile}$ of size n × n, the element $A_{mobile}(i,j)$ is defined by the following formula (2):

For the low-degree-of-freedom joint adjacency matrix $A_{stable}$ of size n × n, the element $A_{stable}(i,j)$ is defined by the following formula (3):

The three sets of graph data have the same number of nodes and the same features for each node; they differ only in their adjacency matrices, which are $A_{all}$, $A_{mobile}$, and $A_{stable}$ respectively. With the n × t × 3 input sequence taken as the node matrix V, the three adjacency matrices above give three graphs: the fully connected graph $G_{all} = (V, A_{all})$, the high-degree-of-freedom joint graph $G_{mobile} = (V, A_{mobile})$, and the low-degree-of-freedom joint graph $G_{stable} = (V, A_{stable})$.
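The graph construction of step S13 can be sketched as follows. Formula (1) is taken from the text, but formulas (2) and (3) are not legible in this text, so the rule used here for $A_{mobile}$ and $A_{stable}$ (two joints are linked when both are high-degree-of-freedom, or both are low-degree-of-freedom) is an assumption inferred from the verbal description and may differ from the patent's actual definition. The function name `build_graphs` and the toy `hof` flags are likewise illustrative.

```python
import numpy as np

def build_graphs(inputs, hof):
    """Build the node matrix V and three adjacency matrices.

    inputs: array of shape (t, n, 3), the input sequence.
    hof:    length-n sequence of 0/1 flags, 1 for a high-DOF joint.
    """
    inputs = np.asarray(inputs, dtype=float)
    t, n, _ = inputs.shape
    hof = np.asarray(hof, dtype=int)
    # Arrange by joint: (t, n, 3) -> (n, t, 3) -> (n, t*3).
    V = inputs.transpose(1, 0, 2).reshape(n, t * 3)
    # Formula (1): every pair of joints is connected.
    A_all = np.ones((n, n))
    # ASSUMPTION (formulas (2) and (3) are unavailable): connect i and j
    # when both are high-DOF (mobile) or both are low-DOF (stable).
    A_mobile = np.outer(hof, hof).astype(float)
    A_stable = np.outer(1 - hof, 1 - hof).astype(float)
    return V, A_all, A_mobile, A_stable

# Toy example: t = 2 frames, n = 3 joints; joints 0 and 2 are high-DOF.
toy_inputs = np.arange(18).reshape(2, 3, 3)
V, A_all, A_mobile, A_stable = build_graphs(toy_inputs, hof=[1, 0, 1])
print(V.shape)   # (3, 6)
print(A_mobile)
```

All three graphs share the same `V`; only the adjacency matrix changes, which matches the statement that the graphs differ only in their adjacency matrices.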

As shown in fig. 3, the multi-flow graph neural network based on attention mechanism fusion in the embodiment of the invention adopts an encoder-decoder structure. The graph data constructed in step S1 is input into the encoder for feature extraction, which yields a hidden variable H; H is then input into the decoder, whose output is the predicted value of the human body posture. The average error between the predicted value and the true value is used as the loss function, and the model parameters are trained by gradient descent. The specific training steps are as follows:

as shown in fig. 4, in one embodiment, step S2 described above: constructing a multi-flow graph neural network model based on attention mechanism fusion; inputting the graph data into a multi-flow graph neural network model based on an attention mechanism for training, and obtaining a trained multi-flow graph neural network model based on the attention mechanism, wherein the method specifically comprises the following steps of:

Step S21: the multi-flow graph neural network based on attention mechanism fusion adopts an encoder-decoder structure, wherein the encoder comprises a plurality of encoding modules; the graph data is input into the encoder for feature extraction, and the output of each encoding module is given by the following formula (4):

$$\mathrm{Out}_i = \mathrm{Att}_i\big(\mathrm{GCN}_{all,i}(G_{all}),\ \mathrm{GCN}_{stable,i}(G_{stable}),\ \mathrm{GCN}_{mobile,i}(G_{mobile})\big) \tag{4}$$

wherein the plurality of encoding modules are connected in series; the output of the i-th encoding module, $\mathrm{Out}_i = (G_{all,i}, G_{stable,i}, G_{mobile,i})$, comprises three graph data; $\mathrm{GCN}(\cdot)$ represents a single graph neural network layer; and $\mathrm{Att}_i(\cdot)$ represents the i-th attention module;

Step S22: a single graph neural network layer, with input $G = (V, A)$ and output X, is expressed by the following formula (5):

$$X = \mathrm{ReLU}(AVW + VU) \tag{5}$$

wherein W and U are trainable weight matrices of size (t × 3) × D, and D is the desired feature dimension of the output of the graph neural network layer;
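Formula (5) translates almost directly into matrix code. The sketch below uses NumPy with randomly initialized weights; the sizes (n = 4, t = 3, D = 8) and the 0.1 weight scale are illustrative assumptions, not values from the patent.

```python
import numpy as np

def gcn_layer(V, A, W, U):
    """Single graph layer of formula (5): X = ReLU(A V W + V U).

    V: (n, t*3) node matrix, A: (n, n) adjacency matrix,
    W, U: (t*3, D) trainable weight matrices.
    """
    return np.maximum(A @ V @ W + V @ U, 0.0)

n, t, D = 4, 3, 8                       # illustrative sizes
rng = np.random.default_rng(0)
V = rng.normal(size=(n, t * 3))
A = np.ones((n, n))                     # e.g. the fully connected A_all
W = rng.normal(size=(t * 3, D)) * 0.1
U = rng.normal(size=(t * 3, D)) * 0.1
X = gcn_layer(V, A, W, U)
print(X.shape)  # (4, 8)
```

The `A @ V @ W` term aggregates features from connected joints, while `V @ U` keeps a per-joint contribution that does not depend on the adjacency matrix.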

Step S23: the inputs of the attention module are $G_1 = (V_1, A_{all})$, $G_2 = (V_2, A_{stable})$, and $G_3 = (V_3, A_{mobile})$, and its output is $X' = (G_{out1}, G_{out2}, G_{out3})$. FIG. 5 is a schematic diagram of the attention module structure, which is expressed by the following formulas (6) to (8):

$$V_{mid4} = \mathrm{GCN}_2\big(\mathrm{CONCAT}(V_{mid1}, V_{mid2}, V_{mid3})\big) \tag{7}$$

wherein $\mathrm{CONCAT}(\cdot)$ represents matrix concatenation along the feature dimension; $\mathrm{SoftMax}(\cdot)$ represents the normalized exponential function; $W_{1,1}, W_{1,2}, \ldots, W_{3,2}$ represent the trainable parameters of different linear layers; and $V_{mid1}$, $V_{mid2}$, $V_{mid3}$, $V_{mid4}$ are intermediate variables.

The above is one of a plurality of coding modules connected in series in the encoder, and the model structures of the modules are the same, but the parameters are independent from each other, so that the model structures can be independently trained.

Step S24: the hidden variable H output by the encoder is given by the following formula (9):

$$H = \lambda_1 V_{all,f} + \lambda_2 V_{stable,f} + \lambda_3 V_{mobile,f} \tag{9}$$

wherein the output of the last encoding module is $\mathrm{Out}_f = (G_{all,f}, G_{stable,f}, G_{mobile,f})$, with $G_{all,f} = (V_{all,f}, A_{all})$, $G_{stable,f} = (V_{stable,f}, A_{stable})$, and $G_{mobile,f} = (V_{mobile,f}, A_{mobile})$; $\lambda_1$, $\lambda_2$, and $\lambda_3$ are configurable parameters.
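Formula (9) is a weighted sum of the node features of the three streams. A minimal sketch follows; the function name `fuse_streams` and the example weights are assumptions for illustration, since the patent only states that the λ values are configurable.

```python
import numpy as np

def fuse_streams(V_all, V_stable, V_mobile, lambdas):
    """Formula (9): H = l1 * V_all + l2 * V_stable + l3 * V_mobile."""
    l1, l2, l3 = lambdas
    return l1 * V_all + l2 * V_stable + l3 * V_mobile

# Toy node features of the last encoding module: n = 2 joints, d = 3.
V_all_f = np.full((2, 3), 1.0)
V_stable_f = np.full((2, 3), 2.0)
V_mobile_f = np.full((2, 3), 3.0)
H = fuse_streams(V_all_f, V_stable_f, V_mobile_f, lambdas=(0.5, 0.25, 0.25))
print(H)  # every entry is 0.5*1 + 0.25*2 + 0.25*3 = 1.75
```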

The decoder of the multi-flow graph neural network based on attention mechanism fusion uses a graph gated recurrent network for recursive prediction. Its input is the hidden variable H output by the encoder, which is an n × d matrix, where d is the feature dimension.

Step S25: the hidden variable H is input to the decoder, which predicts the human body posture according to the following formula (10):

$$\mathrm{Out}_{T+1} = \mathrm{Out}_T + f_{pred}\big(\mathrm{GRU}(\mathrm{Out}_T, H_T)\big) \tag{10}$$

wherein $\mathrm{Out}_T$ is the human body posture at time T; $f_{pred}(\cdot)$ represents a multi-layer perceptron; $\mathrm{GRU}(\cdot)$ represents the graph gated recurrent network; $H_T$ is the hidden variable at time T; and $\mathrm{Out}_{T+1}$ is the predicted human body posture at time T+1.

At the initial moment, i.e. T = 0, $\mathrm{Out}_0$ is initialized to the last frame of the input sequence, i.e. the last n × 3 matrix in the t × n × 3 input sequence, and $H_0$ is initialized to the hidden variable H.

The operation of the graph gated recurrent network can be expressed by the following formulas (11) to (14):

$$r_T = \sigma\big(r_{in}(\mathrm{Out}_T) + r_{hid}(A_{all} H_T W_H)\big) \tag{11}$$

$$u_T = \sigma\big(u_{in}(\mathrm{Out}_T) + u_{hid}(A_{all} H_T W_H)\big) \tag{12}$$

$$c_T = \tanh\big(c_{in}(\mathrm{Out}_T) + r_T \odot c_{hid}(A_{all} H_T W_H)\big) \tag{13}$$

$$H_{T+1} = u_T \odot H_T + (1 - u_T) \odot c_T \tag{14}$$

wherein $r_{in}$, $r_{hid}$, $u_{in}$, $u_{hid}$, $c_{in}$, and $c_{hid}$ represent trainable linear layers; $W_H$ represents a trainable weight matrix; $A_{all}$ is the adjacency matrix of the fully connected graph; and $H_{T+1}$ is the output of the graph gated recurrent network, i.e. $H_{T+1} = \mathrm{GRU}(\mathrm{Out}_T, H_T)$.
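One update step of formulas (11) to (14) can be sketched in NumPy as below. Several details are assumptions made for illustration: each linear layer is modeled as a bias-free weight matrix, the weights are stored in a dictionary named `params`, and the sizes (n = 4 joints, d = 8 features) are arbitrary.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def graph_gru_step(out_T, H_T, A_all, p):
    """One step of formulas (11)-(14); returns H_{T+1}.

    out_T: (n, 3) current pose, H_T: (n, d) hidden state,
    A_all: (n, n) adjacency matrix, p: dict of weight matrices.
    """
    msg = A_all @ H_T @ p["W_H"]                             # A_all H_T W_H
    r = sigmoid(out_T @ p["r_in"] + msg @ p["r_hid"])        # (11)
    u = sigmoid(out_T @ p["u_in"] + msg @ p["u_hid"])        # (12)
    c = np.tanh(out_T @ p["c_in"] + r * (msg @ p["c_hid"]))  # (13)
    return u * H_T + (1.0 - u) * c                           # (14)

n, d = 4, 8                                  # illustrative sizes
rng = np.random.default_rng(0)
shapes = {"W_H": (d, d),
          "r_in": (3, d), "u_in": (3, d), "c_in": (3, d),
          "r_hid": (d, d), "u_hid": (d, d), "c_hid": (d, d)}
# ASSUMPTION: linear layers are plain weight matrices without bias.
params = {name: rng.normal(scale=0.1, size=s) for name, s in shapes.items()}

out_0 = rng.normal(size=(n, 3))
H_0 = rng.normal(size=(n, d))
H_1 = graph_gru_step(out_0, H_0, np.ones((n, n)), params)
print(H_1.shape)  # (4, 8)
```

Because formula (14) is an element-wise convex combination of `H_T` and the candidate `c`, each entry of the new hidden state lies between the old entry and a value in [-1, 1].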

Once the hidden variable H has been input, the decoder, owing to the recursive nature of the graph gated recurrent network, takes the predicted value of the previous moment as input and can recursively generate a predicted human posture sequence of any number of frames. Here it outputs (f-t) frames of data, denoted Output.
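The recursion of formula (10) can be sketched as a simple loop. The `gru_step` and `f_pred` arguments stand for the graph gated recurrent network and the multi-layer perceptron; the lambda stand-ins used in the example are deliberately trivial placeholders chosen for illustration, not the patent's trained components.

```python
import numpy as np

def rollout(last_frame, H, gru_step, f_pred, num_frames):
    """Recursively predict num_frames poses following formula (10).

    last_frame: (n, 3) last frame of the input sequence (Out_0).
    H:          (n, d) encoder hidden variable (H_0).
    gru_step:   callable (out, hidden) -> new hidden, shape (n, d).
    f_pred:     callable hidden -> pose offset, shape (n, 3).
    """
    out, hidden = last_frame, H
    frames = []
    for _ in range(num_frames):
        hidden = gru_step(out, hidden)   # H_{T+1} = GRU(Out_T, H_T)
        out = out + f_pred(hidden)       # Out_{T+1} = Out_T + f_pred(H_{T+1})
        frames.append(out)
    return np.stack(frames)              # (num_frames, n, 3)

# Placeholder components (illustrative only): halve the hidden state,
# and read the first three hidden features as the pose offset.
toy_gru = lambda out, h: 0.5 * h
toy_pred = lambda h: h[:, :3]

pred = rollout(np.zeros((2, 3)), np.ones((2, 4)), toy_gru, toy_pred, num_frames=3)
print(pred.shape)  # (3, 2, 3)
```

Since each prediction feeds the next step, `num_frames` can be set to any length; in training it would be f - t so that the result lines up with the desired output sequence.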

Step S26: the parameters are trained using the gradient descent method, with the loss function set as shown in the following formula (15):

wherein Output is the output sequence of the decoder, consisting of (f-t) frames of data with size (f-t) × n × 3, and $\mathrm{Output}_{gt}$ is the desired output.
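Formula (15) is not legible in this text; the description only says that the loss is the average error between the predicted value and the true value. The sketch below therefore assumes one common choice, the mean Euclidean distance per joint per frame. The exact form used in the patent may differ (for example, a mean absolute error), and the function name `mean_joint_error` is hypothetical.

```python
import numpy as np

def mean_joint_error(output, output_gt):
    """ASSUMED loss: mean Euclidean distance between predicted and true joints.

    output, output_gt: arrays of shape (f - t, n, 3).
    """
    output = np.asarray(output, dtype=float)
    output_gt = np.asarray(output_gt, dtype=float)
    if output.shape != output_gt.shape:
        raise ValueError("prediction and ground truth must have the same shape")
    return float(np.linalg.norm(output - output_gt, axis=-1).mean())

# Toy check: every joint is displaced by (3, 4, 0), so each distance is 5.
gt = np.zeros((2, 3, 3))
prediction = gt + np.array([3.0, 4.0, 0.0])
loss = mean_joint_error(prediction, gt)
print(loss)  # 5.0
```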

As shown in fig. 6, in one embodiment, step S3 described above: acquiring a three-dimensional position data sequence of a human body key joint for prediction, constructing graph data, inputting a trained multi-flow graph neural network model based on an attention mechanism, and obtaining a predicted value of a human body posture, wherein the method comprises the following steps:

step S31: according to the step S1, a three-dimensional position data sequence of a key joint of a human body for prediction is obtained, and the graph data is constructed;

In the same manner as for the training data in step S1, the three-dimensional position data sequence of key human joints is first acquired and expressed as a t × n × 3 matrix, which is rearranged by joint into an n × (t × 3) matrix V. The three graphs constructed are the fully connected graph $G_{all} = (V, A_{all})$, the high-degree-of-freedom joint graph $G_{mobile} = (V, A_{mobile})$, and the low-degree-of-freedom joint graph $G_{stable} = (V, A_{stable})$.

Step S32: inputting the graph data into the multi-flow graph neural network model based on attention mechanism fusion trained in step S2, where the encoder extracts features to obtain the hidden variable H; H is then input into the decoder, whose output is the predicted value of the human body posture.

As shown in Table 1 below, experiments verify that the embodiment of the invention achieves higher accuracy when predicting various actions on the public CMU Mocap data set.

Table 1 mean joint angle error comparison of the method of the present invention with other methods on CMU Mocap dataset

* The subscripts represent the prediction accuracy ranking of the different methods, with the prediction duration units being milliseconds.

As shown in Table 2, experiments verify that the method achieves the best average accuracy over all actions on the public Human 3.6M data set.

Table 2 mean joint angle error comparison of the method of the present invention with other methods on a Human 3.6M dataset

According to the human body posture prediction method based on the multi-flow graph neural network, provided by the invention, a plurality of graph models are constructed based on the human body joint position data and the structural characteristics, so that modeling of a human body movement system is realized, the human body posture is predicted, and higher accuracy is achieved. And extracting joint motion characteristics by using a multi-flow graph neural network, and fusing global information by using an attention model. The invention overcomes the defect of larger error of the existing human body posture prediction method for severe movement and long-term movement, and obtains better experimental results; the method has the advantages of simple network structure and real-time operation, can realize real-time prediction of the human body posture, and is beneficial to practical application. Compared with other existing human body posture prediction methods, the method provided by the invention has better performance when performing medium-long motion prediction of 500-1000 milliseconds and when performing intense motion.

Example two

As shown in fig. 7, an embodiment of the present invention provides a human body posture prediction system based on a multi-flow graph neural network fused by an attention mechanism, which includes the following modules:

the training data acquisition module 41 is configured to acquire a three-dimensional position data sequence of a key joint of a human body for training, and divide the three-dimensional position data sequence into an input sequence and an output sequence according to lengths of a preset input sequence and an output sequence; constructing graph data according to the input sequence;

model training module 42 for constructing a fused multi-flow graph neural network model based on an attention mechanism; inputting the graph data into a multi-flow graph neural network model based on an attention mechanism for training, and obtaining a trained multi-flow graph neural network model based on the attention mechanism;

the human body posture prediction module 43 is configured to acquire a three-dimensional position data sequence of key human joints for prediction, construct graph data, and input it into the trained multi-flow graph neural network model based on attention mechanism fusion to obtain a predicted value of the human body posture.

The above examples are provided for the purpose of describing the present invention only and are not intended to limit the scope of the present invention. The scope of the invention is defined by the appended claims. Various equivalents and modifications that do not depart from the spirit and principles of the invention are intended to be included within the scope of the invention.

Claims

1. A human body posture prediction method based on a multi-flow graph neural network fused by an attention mechanism is characterized by comprising the following steps:

step S1: acquiring a three-dimensional position data sequence of a key joint of a human body for training, and dividing the three-dimensional position data sequence into an input sequence and an output sequence according to the lengths of the preset input sequence and the preset output sequence; constructing graph data according to the input sequence, wherein the graph data specifically comprises:

step S12: taking the first t frames of the data sequence as the input sequence and the last (f-t) frames as the output sequence; wherein the input sequence has size t×n×3 and the output sequence has size (f-t)×n×3;

step S13: constructing a node matrix V and an adjacency matrix A according to the input sequence of size t×n×3, and constructing, according to the node matrix V and the adjacency matrix A, a fully connected graph $G_{all} = (V, A_{all})$, a high-degree-of-freedom joint graph $G_{mobile} = (V, A_{mobile})$ and a low-degree-of-freedom joint graph $G_{stable} = (V, A_{stable})$; wherein $A_{all}$ is the fully connected adjacency matrix representing the relations among all joints, $A_{mobile}$ is the high-degree-of-freedom joint adjacency matrix representing the relations among high-degree-of-freedom joints, and $A_{stable}$ is the low-degree-of-freedom joint adjacency matrix representing the relations among low-degree-of-freedom joints;
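Steps S12 and S13 can be sketched in Python with NumPy as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the sizes, the joint index sets `mobile_joints` and `stable_joints`, the helper names, the layout of V, and the choice to fully connect each joint subset are all hypothetical, since this section does not specify them.

```python
import numpy as np

def split_sequence(seq, t):
    """Split an (f, n, 3) sequence into the first t frames and the other f - t frames."""
    seq = np.asarray(seq, dtype=float)
    return seq[:t], seq[t:]

def subset_adjacency(n, joints):
    """n x n adjacency linking every pair of distinct joints in `joints`.

    Fully connecting the subset is an assumption made for illustration.
    """
    idx = np.array(sorted(joints), dtype=int)
    a = np.zeros((n, n))
    a[np.ix_(idx, idx)] = 1.0
    a[idx, idx] = 0.0  # drop self-loops
    return a

f, t, n = 10, 6, 5                      # hypothetical sizes
seq = np.random.default_rng(0).normal(size=(f, n, 3))
x_in, x_out = split_sequence(seq, t)    # shapes (t, n, 3) and (f - t, n, 3)

# Node matrix V: one row per joint holding its t stacked 3-D positions (assumed layout).
V = x_in.transpose(1, 0, 2).reshape(n, t * 3)

mobile_joints = {3, 4}                  # hypothetical high-degree-of-freedom joints
stable_joints = {0, 1, 2}               # hypothetical low-degree-of-freedom joints
A_all = subset_adjacency(n, range(n))
A_mobile = subset_adjacency(n, mobile_joints)
A_stable = subset_adjacency(n, stable_joints)

# The three graphs share the node matrix V and differ only in their adjacency.
graphs = {"all": (V, A_all), "mobile": (V, A_mobile), "stable": (V, A_stable)}
```

All three graphs reuse the same V, which matches the definitions $G_{all} = (V, A_{all})$, $G_{mobile} = (V, A_{mobile})$ and $G_{stable} = (V, A_{stable})$ in step S13.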

step S2: constructing a multi-flow graph neural network model based on attention mechanism fusion; inputting the graph data into the attention-mechanism-based fusion multi-flow graph neural network model for training, and obtaining the trained attention-mechanism-based fusion multi-flow graph neural network model, which specifically comprises the following steps:

step S21: the attention mechanism based fusion multi-flow graph neural network adopts an encoder and decoder structure, wherein the encoder comprises a plurality of encoding modules; inputting the graph data into the encoder for feature extraction, wherein the output of the encoding module is shown in the following formula (4):

$$\mathrm{Out}_i = \mathrm{Att}_i\big(\mathrm{GCN}_{all1}(G_{all}),\ \mathrm{GCN}_{stable1}(G_{stable}),\ \mathrm{GCN}_{mobile1}(G_{mobile})\big) \tag{4}$$

wherein the output of the i-th encoding module, $\mathrm{Out}_i = (G_{all1}, G_{stable1}, G_{mobile1})$, comprises three graph data; $\mathrm{GCN}(\cdot)$ represents a single graph neural network layer; $\mathrm{Att}_i(\cdot)$ represents the i-th attention module;

step S22: the input of the single graph neural network layer is $G = (V, A)$ and the output is X, the relationship between the two being expressed as the following formula (5):

$$X = \mathrm{ReLU}(AVW + VU) \tag{5}$$
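Formula (5) can be sketched with NumPy as below. W and U are taken here to be weight matrices of shape (input features, output features); that shape, the function name `gcn_layer` and the toy values are assumptions for illustration, because this section does not define W and U.

```python
import numpy as np

def gcn_layer(A, V, W, U):
    """One graph layer following formula (5): X = ReLU(A V W + V U).

    A: (n, n) adjacency, V: (n, d_in) node matrix,
    W, U: (d_in, d_out) weight matrices (assumed shapes).
    """
    return np.maximum(A @ V @ W + V @ U, 0.0)

n, d_in, d_out = 4, 3, 2                          # hypothetical sizes
A = np.ones((n, n)) - np.eye(n)                   # fully connected, no self-loops
V = np.arange(n * d_in, dtype=float).reshape(n, d_in)
W = np.full((d_in, d_out), 0.1)
U = np.full((d_in, d_out), -0.1)

X = gcn_layer(A, V, W, U)                         # shape (n, d_out)
```

The term AVW aggregates features from neighbouring joints, while VU keeps a separately weighted copy of each joint's own features.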

step S23: the input of the attention module is $G_1 = (V_1, A_{all})$, $G_2 = (V_2, A_{stable})$, $G_3 = (V_3, A_{mobile})$ and the output is $X' = (G_{out1}, G_{out2}, G_{out3})$; the relationship between the two is represented by the following formulas (6) to (8):

$$V_{mid4} = \mathrm{GCN}_2(\mathrm{CONCAT}(V_{mid1}, V_{mid2}, V_{mid3})) \tag{7}$$

wherein $\mathrm{CONCAT}(\cdot)$ represents a matrix concatenation operation along the feature dimension; $\mathrm{SoftMax}(\cdot)$ represents the normalized exponential function; $W_{1,1}, W_{1,2}, \dots, W_{3,2}$ represent trainable parameters of different linear layers; $V_{mid1}$, $V_{mid2}$, $V_{mid3}$, $V_{mid4}$ are intermediate variables;

$$H = \lambda_1 V_{allf} + \lambda_2 V_{stablef} + \lambda_3 V_{mobilef} \tag{9}$$

wherein the output of the last encoding module is $\mathrm{Out}_f = (G_{allf}, G_{stablef}, G_{mobilef})$, with $G_{allf} = (V_{allf}, A_{all})$, $G_{stablef} = (V_{stablef}, A_{stable})$, $G_{mobilef} = (V_{mobilef}, A_{mobile})$; $\lambda_1$, $\lambda_2$, $\lambda_3$ are configurable parameters;
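The weighted fusion of formula (9) can be sketched as follows. The function name `fuse_hidden`, the toy node matrices and the λ values are hypothetical; the section states only that the three weights are configurable.

```python
import numpy as np

def fuse_hidden(V_allf, V_stablef, V_mobilef, lambdas=(1.0, 1.0, 1.0)):
    """Hidden variable H per formula (9): a weighted sum of the three final node matrices."""
    l1, l2, l3 = lambdas
    return l1 * V_allf + l2 * V_stablef + l3 * V_mobilef

n, d = 5, 8                                  # hypothetical sizes
V_allf = np.ones((n, d))
V_stablef = 2.0 * np.ones((n, d))
V_mobilef = 3.0 * np.ones((n, d))

# Example weights chosen only for illustration.
H = fuse_hidden(V_allf, V_stablef, V_mobilef, lambdas=(0.5, 0.25, 0.25))
```

Because the sum is element-wise, the three node matrices must share the same shape, and H keeps that shape.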

step S25: inputting the hidden variable H to the decoder, and predicting the human body posture using the decoder, which is expressed as the following formula (10):

$$\mathrm{Out}_{T+1} = \mathrm{Out}_T + f_{pred}(\mathrm{GRU}(\mathrm{Out}_T, H_T)) \tag{10}$$

wherein, at time T, the posture of the human body is $\mathrm{Out}_T$; $f_{pred}(\cdot)$ represents a graph multi-layer perceptron; $\mathrm{GRU}(\cdot)$ represents a graph gated recurrent network; and $H_T$ is the hidden variable at that time;
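The residual update of formula (10) can be illustrated with the rollout loop below. This sketch makes several simplifying assumptions that are not taken from the source: an ordinary GRU cell applied independently to each joint stands in for the graph gated recurrent network, a single linear map `W_pred` stands in for the graph multi-layer perceptron $f_{pred}$, the GRU output is reused as the next hidden variable, and all weights are random rather than trained.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class GRUCell:
    """Plain GRU cell applied row-wise (per joint); a stand-in for the graph GRU."""

    def __init__(self, d_in, d_h, rng, scale=0.1):
        self.Wz, self.Wr, self.Wh = (rng.normal(scale=scale, size=(d_in, d_h)) for _ in range(3))
        self.Uz, self.Ur, self.Uh = (rng.normal(scale=scale, size=(d_h, d_h)) for _ in range(3))

    def __call__(self, x, h):
        z = sigmoid(x @ self.Wz + h @ self.Uz)               # update gate
        r = sigmoid(x @ self.Wr + h @ self.Ur)               # reset gate
        h_new = np.tanh(x @ self.Wh + (r * h) @ self.Uh)     # candidate state
        return (1.0 - z) * h + z * h_new

def rollout(out_t, h, gru, W_pred, steps):
    """Predict `steps` future poses with Out_{T+1} = Out_T + f_pred(GRU(Out_T, H_T))."""
    frames = []
    for _ in range(steps):
        h = gru(out_t, h)              # new hidden variable (assumed to carry over)
        out_t = out_t + h @ W_pred     # residual pose update, formula (10)
        frames.append(out_t)
    return np.stack(frames)            # shape (steps, n, 3)

rng = np.random.default_rng(0)
n, d_h, steps = 5, 8, 4                # hypothetical sizes; steps plays the role of f - t
last_pose = rng.normal(size=(n, 3))    # last observed frame
H0 = rng.normal(size=(n, d_h))         # hidden variable H from the encoder
gru = GRUCell(3, d_h, rng)
W_pred = rng.normal(scale=0.1, size=(d_h, 3))

pred = rollout(last_pose, H0, gru, W_pred, steps)
```

Because the decoder predicts a displacement that is added to the previous pose, a zero displacement leaves the pose unchanged from frame to frame.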

step S26: training the parameters using the gradient descent method, with the loss function set as follows:

wherein Output is the (f-t)-frame data sequence output by the decoder, of size (f-t)×n×3, and $\mathrm{Output}_{gt}$ is the desired output;

step S3: acquiring a three-dimensional position data sequence of key joints of a human body for prediction, constructing the graph data, and inputting the graph data into the trained multi-flow graph neural network model based on attention mechanism fusion to obtain a predicted value of the human body posture.

2. The human body posture prediction method based on the attention mechanism fusion multi-flow graph neural network according to claim 1, wherein the step S3 of acquiring a three-dimensional position data sequence of key joints of a human body for prediction, constructing the graph data, and inputting the graph data into the trained multi-flow graph neural network model based on attention mechanism fusion to obtain a predicted value of the human body posture specifically comprises the following steps:

step S32: inputting the graph data into the multi-flow graph neural network model based on attention mechanism fusion trained in step S2, wherein the result of feature extraction by the encoder is the hidden variable H; and inputting H into the decoder to output a predicted value of the human body posture.

3. A human body posture prediction system based on a multi-flow graph neural network fused by an attention mechanism is characterized by comprising the following modules:

the training data acquisition module is used for acquiring a three-dimensional position data sequence of a key joint of a human body for training, and dividing the three-dimensional position data sequence into an input sequence and an output sequence according to the lengths of the preset input sequence and the preset output sequence; constructing graph data according to the input sequence, wherein the graph data specifically comprises:

the model training module is used for constructing a multi-flow graph neural network model based on attention mechanism fusion; inputting the graph data into the attention-mechanism-based fusion multi-flow graph neural network model for training, and obtaining the trained attention-mechanism-based fusion multi-flow graph neural network model, which specifically comprises the following steps:

$$X = \mathrm{ReLU}(AVW + VU) \tag{5}$$

$$V_{mid4} = \mathrm{GCN}_2(\mathrm{CONCAT}(V_{mid1}, V_{mid2}, V_{mid3})) \tag{7}$$

wherein $\mathrm{CONCAT}(\cdot)$ represents a matrix concatenation operation along the feature dimension; $\mathrm{SoftMax}(\cdot)$ represents the normalized exponential function; $W_{1,1}, W_{1,2}, \dots, W_{3,2}$ represent trainable parameters of different linear layers; $V_{mid1}$, $V_{mid2}$, $V_{mid3}$, $V_{mid4}$ are intermediate variables;

$$H = \lambda_1 V_{allf} + \lambda_2 V_{stablef} + \lambda_3 V_{mobilef} \tag{9}$$

wherein the output of the last encoding module is $\mathrm{Out}_f = (G_{allf}, G_{stablef}, G_{mobilef})$, with $G_{allf} = (V_{allf}, A_{all})$, $G_{stablef} = (V_{stablef}, A_{stable})$, $G_{mobilef} = (V_{mobilef}, A_{mobile})$; $\lambda_1$, $\lambda_2$, $\lambda_3$ are configurable parameters;

$$\mathrm{Out}_{T+1} = \mathrm{Out}_T + f_{pred}(\mathrm{GRU}(\mathrm{Out}_T, H_T)) \tag{10}$$

wherein, at time T, the posture of the human body is $\mathrm{Out}_T$; $f_{pred}(\cdot)$ represents a graph multi-layer perceptron; $\mathrm{GRU}(\cdot)$ represents a graph gated recurrent network; and $H_T$ is the hidden variable at that time;

the human body posture prediction module is used for acquiring a three-dimensional position data sequence of key joints of a human body for prediction, constructing the graph data, and inputting the graph data into the trained multi-flow graph neural network model based on attention mechanism fusion to obtain a predicted value of the human body posture.