CN113642379A

CN113642379A - Human body posture prediction method and system based on attention mechanism fusion multi-flow graph

Info

Publication number: CN113642379A
Application number: CN202110539624.0A
Authority: CN
Inventors: 袁丁; 曹哲; 魏晓东; 尹继豪; 张雪怡
Original assignee: Beihang University
Current assignee: Beihang University
Priority date: 2021-05-18
Filing date: 2021-05-18
Publication date: 2021-11-12
Anticipated expiration: 2041-05-18
Also published as: CN113642379B

Abstract

The invention relates to a human body posture prediction method and system based on an attention-fused multi-flow graph neural network. The method comprises the following steps. S1: acquire a three-dimensional position data sequence of human key joints for training, divide it into an input sequence and an output sequence according to preset input and output sequence lengths, and construct graph data from the input sequence. S2: construct an attention-fused multi-flow graph neural network model and train it on the graph data to obtain a trained model. S3: acquire a three-dimensional position data sequence of human key joints for prediction, construct graph data from it, and input the graph data into the trained model to obtain a predicted value of the human body posture. The method constructs several graph models from the position data and structural characteristics of the human joints, thereby modeling the human motion system and predicting human posture with higher accuracy.

Description

Human body posture prediction method and system based on an attention-fused multi-flow graph neural network

Technical Field

The invention relates to the field of computer vision and image processing, and in particular to a human body posture prediction method and system based on an attention-fused multi-flow graph neural network.

Background

In recent years, the wide availability of low-cost consumer-grade depth cameras has made low-cost, real-time acquisition and integration of three-dimensional human motion postures possible. Human posture prediction has consequently become a hot topic at the intersection of graphics and computer vision, with broad application prospects in robotics, medical treatment and automatic driving.

In robotics, exoskeleton robots are a research hot spot with important aerospace applications. They are considered key support equipment for on-orbit construction and maintenance and for lunar and Mars surface activities, and they play an extremely important role in completing extravehicular tasks and handling emergency faults. A human posture prediction algorithm judges and predicts the astronaut's movement intention by identifying and analyzing the rules of human movement, assists the exoskeleton robot in force compliance control, and reduces abrupt force changes in exoskeleton control, thereby effectively improving the adaptability of the space suit to the astronaut's movement.

Motion capture technology has existed for some fifty years. It is widely used in entertainment industries such as film, television and games, and it helps patients with movement disorders in the medical and health industries. At present, however, it is still not possible to generate realistic motion data that conforms to the structural characteristics and balance of the human body without real motion capture data. Mainstream motion capture technology still relies on inertial sensors or complex optical sensing systems, which are expensive and require a professional venue, so it remains far from reaching ordinary households and from benefiting patients who depend on assistive devices for daily movement. Applying a human posture prediction algorithm to motion generation can greatly improve the utilization of motion capture data and reduce the post-processing workload. With increasingly popular time-of-flight cameras, human posture prediction can also be applied more easily to human-computer interaction; for example, the algorithm can guide a subject to perform correct actions based on the subject's captured movements, assisting the rehabilitation of patients with limb cognitive deficits caused by neurodegenerative diseases.

Human posture prediction therefore has important application value. However, human motion has more than 200 degrees of freedom, and the degrees of freedom of the individual joints are strongly correlated. Traditional methods such as inverse kinematics and dynamics therefore struggle to simulate human motion and cannot establish an accurate human motion model; the rules of human motion are highly complex, and the balance and coordination mechanism in particular is difficult to simulate. This makes human motion prediction based on conventional statistical learning methods very difficult.

In recent years, with the development of computer hardware and machine learning algorithms, human posture prediction methods based on deep learning have been proposed. With such methods a researcher only needs to construct a deep neural network model, without explicitly handling complex kinematic constraints and parameters or behavior-specific knowledge; by training on a large amount of motion data, the model learns latent motion features and predicts a person's subsequent motion trajectory from them. Early deep learning models were mainly based on recurrent neural networks and convolutional neural networks, both of which have drawbacks. Recurrent neural networks emphasize the temporal relations of the sequence and ignore spatial information. Convolutional approaches flatten the skeleton data of each frame into a one-dimensional vector and treat the sequence as a two-dimensional matrix; they focus on how the position of a single joint changes over time, ignore the correlations among the joints of the human body, and cannot make full use of the topological structure of the human body.

The graph neural network, a deep learning model that has attracted attention in the past two years, is a neural network designed specifically for graph data and has been applied in research fields such as recommendation systems, reasoning and proving, chemistry, traffic and brain-inspired intelligence. Recent work proposes graph-neural-network-based methods that represent the three-dimensional human skeleton as a graph and can make good use of the prior structural information of the skeleton. Existing methods of this kind simply construct an adjacency matrix from the spatial adjacency of human joints without considering how human motion is generated, so they cannot fully express the relations among joints in motion. Human motion is accomplished by the cooperation of many joints that are related to each other along the kinematic chain, not by only a few adjacent joints. Ergonomics-based studies have also shown that the human kinematic chain usually comprises three-degree-of-freedom joints and two-degree-of-freedom joints, and that these two kinds of joints are often distributed alternately along the chain. Joints with different degrees of freedom have different force characteristics during movement. Constructing the adjacency matrix only from the spatial adjacency of human joints therefore does not make sufficient use of the prior structural and kinematic information of the joints.

Therefore, how to make full use of the prior structural and kinematic information of human joints has become an urgent problem.

Disclosure of Invention

In order to solve the above technical problems, the invention provides a human body posture prediction method and system based on an attention-fused multi-flow graph neural network.

The technical solution of the invention is as follows: a human body posture prediction method based on an attention-fused multi-flow graph neural network, comprising the following steps:

step S1: acquiring a three-dimensional position data sequence of a human body key joint for training, and dividing the three-dimensional position data sequence into an input sequence and an output sequence according to the lengths of a preset input sequence and a preset output sequence; constructing graph data according to the input sequence;

step S2: constructing an attention-fused multi-flow graph neural network model; inputting the graph data into the model for training to obtain a trained attention-fused multi-flow graph neural network model;

step S3: acquiring a three-dimensional position data sequence of human key joints for prediction, constructing graph data, and inputting the graph data into the trained attention-fused multi-flow graph neural network model to obtain a predicted value of the human body posture.

Compared with the prior art, the invention has the following advantages:

1. The human posture prediction method based on the multi-flow graph neural network provided by the invention constructs several graph models from human joint position data and structural characteristics, thereby modeling the human motion system and predicting human posture with higher accuracy. Joint movement features are extracted by the multi-flow graph neural network, and global information is fused by an attention model.

2. The invention overcomes the larger errors of existing human posture prediction methods on vigorous and long-term movements and obtains better experimental results. The network structure is simple and runs in real time, so the human posture can be predicted in real time, which benefits practical application. Compared with other existing human posture prediction methods, the method performs better on medium- and long-term motion prediction of 500 to 1000 milliseconds and on vigorous movements.

Drawings

FIG. 1 is a flowchart of a human body posture prediction method based on an attention-fused multi-flow graph neural network according to an embodiment of the present invention;

FIG. 2 is a flowchart of step S1 of the method in an embodiment of the present invention: acquiring a three-dimensional position data sequence of human key joints for training, dividing it into an input sequence and an output sequence according to preset input and output sequence lengths, and constructing graph data from the input sequence;

FIG. 3 is a schematic structural diagram of the attention-fused multi-flow graph neural network model in an embodiment of the present invention;

FIG. 4 is a flowchart of step S2 of the method in an embodiment of the present invention: constructing an attention-fused multi-flow graph neural network model, and inputting the graph data into the model for training to obtain a trained model;

FIG. 5 is a schematic structural diagram of the attention module in the attention-fused multi-flow graph neural network according to an embodiment of the present invention;

FIG. 6 is a flowchart of step S3 of the method in an embodiment of the present invention: acquiring a three-dimensional position data sequence of human key joints for prediction, constructing graph data, and inputting the graph data into the trained model to obtain a predicted value of the human body posture;

FIG. 7 is a block diagram of a human body posture prediction system based on an attention-fused multi-flow graph neural network.

Detailed Description

The invention addresses the problem that the prior art cannot make full use of the prior structural and kinematic information of human joints. It provides a human body posture prediction method based on an attention-fused multi-flow graph neural network model, which models the human motion system and predicts human posture with higher accuracy.

In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is described in further detail below with reference to the accompanying drawings.

Example one

As shown in fig. 1, a human body posture prediction method based on an attention-fused multi-flow graph neural network provided in an embodiment of the present invention includes the following steps:

As shown in fig. 2, the above step S1: acquiring a three-dimensional position data sequence of a human body key joint for training, and dividing the three-dimensional position data sequence into an input sequence and an output sequence according to the lengths of a preset input sequence and the preset output sequence; constructing graph data according to the input sequence, which specifically comprises the following steps:

step S11: acquiring a three-dimensional position data sequence of human key joints for training, of size f × n × 3, wherein f represents the number of frames in the data sequence and n represents the number of joints;

The training data sequence used in the embodiment of the present invention may be a data sequence from a public dataset (such as the Human 3.6M dataset or the CMU Mocap dataset), or a human motion data sequence acquired by an RGB-D camera or another device. The key joints or key points obtained generally comprise no fewer than 20 positions, such as the head, the neck, the chest, the waist, the left and right shoulder joints, the left and right elbow joints, the left and right wrist joints, the left and right hands, the left and right hip joints, the left and right knee joints, the left and right ankle joints and the left and right feet. A three-dimensional position data sequence may be represented as an f × n × 3 matrix, where f is the number of frames in the sequence and n is the number of joints in the data; the three-dimensional coordinates of a joint in a frame, in the world coordinate system, are denoted (X_W, Y_W, Z_W). For joints that are not captured during acquisition because of occlusion or other reasons, the three-dimensional coordinates are recorded as (0, 0, 0), so that the matrix size stays fixed and subsequent processing is convenient.
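The zero-filling convention for uncaptured joints can be illustrated with a minimal NumPy sketch. The per-frame dictionary format and the name `frames_to_array` are assumptions made for this example only; the source does not specify how raw captures are stored.

```python
import numpy as np

def frames_to_array(frames, n_joints):
    """Stack per-frame joint observations into an (f, n, 3) array.

    `frames` is a list of dicts mapping joint index -> (X_W, Y_W, Z_W)
    (hypothetical input format). Joints absent from a frame, e.g. because
    of occlusion, keep the coordinates (0, 0, 0).
    """
    seq = np.zeros((len(frames), n_joints, 3), dtype=np.float64)
    for k, frame in enumerate(frames):
        for joint, xyz in frame.items():
            seq[k, joint] = xyz
    return seq

# Two frames, two joints; joint 1 is not captured in the second frame.
frames = [{0: (0.1, 1.7, 0.0), 1: (0.1, 1.5, 0.0)},
          {0: (0.1, 1.7, 0.1)}]
seq = frames_to_array(frames, n_joints=2)
```

Because the array is pre-allocated with zeros, its shape stays f × n × 3 regardless of how many joints were observed in each frame.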

Step S12: taking the first t frames of the data sequence as the input sequence and the last (f-t) frames as the output sequence; wherein the input sequence is expressed as t × n × 3 and the output sequence is expressed as (f-t) × n × 3;

The three-dimensional position data sequence is divided into an input sequence and an output sequence according to the preset lengths of the input sequence and the output sequence. For example, for a training sequence of size f × n × 3, the first t frames are taken as input, giving a t × n × 3 input sequence; the last (f-t) frames are taken as the desired output, giving an (f-t) × n × 3 output sequence.
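The split in step S12 amounts to slicing along the frame axis. A minimal sketch, assuming the sequence is held as a NumPy array; the sizes f = 35, n = 20 and t = 10 are illustrative values, not taken from the source.

```python
import numpy as np

def split_sequence(seq, t):
    """Split an (f, n, 3) sequence into the first t frames and the last f - t frames."""
    f = seq.shape[0]
    if not 0 < t < f:
        raise ValueError("t must satisfy 0 < t < f")
    return seq[:t], seq[t:]

# Illustrative sizes: f = 35 frames, n = 20 joints, t = 10 input frames.
seq = np.random.default_rng(0).normal(size=(35, 20, 3))
x_in, y_out = split_sequence(seq, t=10)
```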

Step S13: constructing a node matrix V and adjacency matrices A from the t × n × 3 input sequence, and thereby constructing a fully connected graph G_all = (V, A_all), a high-degree-of-freedom joint graph G_mobile = (V, A_mobile) and a low-degree-of-freedom joint graph G_stable = (V, A_stable); wherein A_all is the fully connected adjacency matrix representing the relations among all joints, A_mobile is the high-degree-of-freedom joint adjacency matrix representing the relations among high-degree-of-freedom joints, and A_stable is the low-degree-of-freedom joint adjacency matrix representing the relations among low-degree-of-freedom joints.

The t × n × 3 input sequence obtained in step S12 is rearranged by joint into an n × (t × 3) matrix V. Each joint is regarded as a node of the graph data, so the feature of each node can be represented by a t × 3 matrix. The degree of freedom of a human joint is the number of directions in which the joint can move; for example, ball joints, condyloid joints and the like have 3 degrees of freedom, while saddle joints, hinge joints and the like have 2. Joints of these kinds have different kinematic and dynamic properties. According to the degrees of freedom of the human joints, three different adjacency matrices can be constructed: a fully connected adjacency matrix A_all representing the relations among all joints, a high-degree-of-freedom joint adjacency matrix A_mobile representing the relations among high-degree-of-freedom joints, and a low-degree-of-freedom joint adjacency matrix A_stable representing the relations among low-degree-of-freedom joints. An adjacency matrix represents the adjacency relations between nodes in graph data; in the embodiment of the invention the adjacency matrices are constructed as follows:

For the i-th of the n joints, if its degree of freedom is 3, that is, the joint can move in three-dimensional space in yaw, pitch and roll, the i-th joint is regarded as a high-degree-of-freedom joint and hof_i = 1; otherwise hof_i = 0, where hof stands for high degree of freedom.

Thus, for the fully connected adjacency matrix A_all of size n × n, the element A_all(i,j) is defined by the following formula (1):

A_all(i,j) = 1 (1)

For the high-degree-of-freedom joint adjacency matrix A_mobile of size n × n, the element A_mobile(i,j) is defined by the following formula (2):

For the low-degree-of-freedom joint adjacency matrix A_stable of size n × n, the element A_stable(i,j) is defined by the following formula (3):

In the three groups of graph data, the number of nodes and the features of each node are the same; the difference is that their adjacency matrices are A_all, A_mobile and A_stable respectively. The n × (t × 3) input sequence is taken as the node matrix V, and with the three adjacency matrices the three constructed graphs are the fully connected graph G_all = (V, A_all), the high-degree-of-freedom joint graph G_mobile = (V, A_mobile) and the low-degree-of-freedom joint graph G_stable = (V, A_stable).
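The graph construction of step S13 can be sketched as follows. Formula (1) is taken directly from the text. The bodies of formulas (2) and (3) are not reproduced in this text, so the sketch assumes one plausible reading: A_mobile(i,j) = 1 when joints i and j are both high-degree-of-freedom (hof_i = hof_j = 1), and A_stable(i,j) = 1 when both are low-degree-of-freedom. That rule, and the function name `build_graphs`, are assumptions and may differ from the patented definition.

```python
import numpy as np

def build_graphs(x_in, hof):
    """Build the node matrix V and three adjacency matrices.

    x_in: (t, n, 3) input sequence.
    hof:  length-n sequence, 1 for a high-degree-of-freedom joint, else 0.
    """
    t, n, _ = x_in.shape
    # Arrange by joint: (t, n, 3) -> (n, t, 3) -> (n, t * 3).
    V = x_in.transpose(1, 0, 2).reshape(n, t * 3)
    hof = np.asarray(hof, dtype=np.float64)
    A_all = np.ones((n, n))                 # formula (1): every entry is 1
    A_mobile = np.outer(hof, hof)           # ASSUMPTION: both joints high-DOF
    A_stable = np.outer(1 - hof, 1 - hof)   # ASSUMPTION: both joints low-DOF
    return V, A_all, A_mobile, A_stable

# Toy example: t = 2 frames, n = 3 joints, joints 0 and 2 are high-DOF.
x_in = np.arange(18, dtype=float).reshape(2, 3, 3)
V, A_all, A_mobile, A_stable = build_graphs(x_in, hof=[1, 0, 1])
```

Each row of V concatenates one joint's coordinates over the t input frames, so all three graphs share the same node features and differ only in their adjacency matrix.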

As shown in fig. 3, the attention-fused multi-flow graph neural network in the embodiment of the present invention adopts an encoder-decoder structure. The graph data constructed in step S1 are input into the encoder for feature extraction, which yields a hidden variable H; H is input into the decoder, whose output is the predicted value of the human posture. The average error between the predicted values and the true values is used as the loss function, and the model parameters are trained by gradient descent. The specific training steps are as follows:

As shown in fig. 4, in one embodiment, the step S2: constructing an attention-fused multi-flow graph neural network model, and inputting the graph data into the model for training to obtain the trained attention-fused multi-flow graph neural network model, specifically comprises the following steps:

step S21: the attention-fused multi-flow graph neural network adopts an encoder-decoder structure, wherein the encoder comprises a plurality of encoding modules; the graph data are input into the encoder for feature extraction, and the output of an encoding module is given by the following formula (4):

Out_i = Att_i(GCN_all,i(G_all), GCN_stable,i(G_stable), GCN_mobile,i(G_mobile)) (4)

wherein the encoding modules are connected in series, and the output of the i-th encoding module, Out_i = (G_all,i, G_stable,i, G_mobile,i), contains three graphs; GCN(·) represents a single graph neural network layer; Att_i(·) represents the i-th attention module;

step S22: a single graph neural network layer takes as input a graph G = (V, A) and produces an output X given by the following formula (5):

X＝ReLU(AVW+VU) (5)

wherein W and U are trainable weight matrices of size (t × 3) × D, D being the desired feature dimension of the output of the graph neural network layer;
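Formula (5) translates almost directly into array code. The following is a minimal forward-pass sketch in NumPy; the sizes n = 20, t = 10, D = 64 and the random initialization are illustrative assumptions, and no training is shown.

```python
import numpy as np

def gcn_layer(V, A, W, U):
    """Formula (5): X = ReLU(A V W + V U).

    V: (n, t*3) node features, A: (n, n) adjacency matrix,
    W, U: (t*3, D) trainable weights. Returns X with shape (n, D).
    """
    return np.maximum(A @ V @ W + V @ U, 0.0)

# Illustrative sizes and random weights (assumptions).
rng = np.random.default_rng(0)
n, t, D = 20, 10, 64
V = rng.normal(size=(n, t * 3))
A = np.ones((n, n))
W = rng.normal(scale=0.1, size=(t * 3, D))
U = rng.normal(scale=0.1, size=(t * 3, D))
X = gcn_layer(V, A, W, U)
```

The term A V W aggregates features from the joints connected to each node, while V U keeps a separate path for the node's own features.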

step S23: the inputs to the attention module are G₁ = (V₁, A_all), G₂ = (V₂, A_stable) and G₃ = (V₃, A_mobile), and the output is X' = (G_out1, G_out2, G_out3). The structure of the attention module is shown in fig. 5 and is described by the following formulas (6) to (8):

V_mid4＝GCN₂(CONCAT(V_mid1,V_mid2,V_mid3)) (7)

wherein CONCAT(·) represents matrix concatenation along the feature dimension; SoftMax(·) denotes the normalized exponential function; W_1,1, W_1,2, …, W_3,2 represent the trainable parameters of different linear layers; V_mid1, V_mid2, V_mid3 and V_mid4 are intermediate variables.

The above describes one of the serially connected encoding modules in the encoder; the modules share the same model structure but have independent parameters and can be trained independently.

Step S24: the output hidden variable H of the encoder is represented by the following equation (9):

H = λ₁V_all,f + λ₂V_stable,f + λ₃V_mobile,f (9)

wherein the output of the last encoding module is Out_f = (G_all,f, G_stable,f, G_mobile,f), with G_all,f = (V_all,f, A_all), G_stable,f = (V_stable,f, A_stable) and G_mobile,f = (V_mobile,f, A_mobile); λ₁, λ₂ and λ₃ are configurable parameters.

The decoder of the attention-fused multi-flow graph neural network uses a graph gated recurrent unit (GRU) network for recursive prediction. Its input is the hidden variable H output by the encoder, an n × d matrix, where d is the feature dimension.

Step S25: the hidden variable H is input into the decoder, which predicts the human posture according to the following formula (10):

Out_{T+1} = Out_T + f_pred(GRU(Out_T, H_T)) (10)

wherein Out_T is the human posture at time T; f_pred(·) represents a multi-layer perceptron; GRU(·) represents the graph gated recurrent network; H_T is the hidden variable at time T; and Out_{T+1} is the predicted value of the human posture at time T+1.

At the initial time, T = 0, Out₀ is initialized to the last frame of the input sequence, i.e. the last n × 3 matrix of the t × n × 3 input sequence, and H₀ is initialized to the hidden variable H.

The operation of the graph gated recurrent network can be expressed by the following equations (11) to (14):

r_T＝σ(r_in(Out_T)+r_hid(A_allH_TW_H)) (11)

u_T＝σ(u_in(Out_T)+u_hid(A_allH_TW_H)) (12)

c_T＝tanh(c_in(Out_T)+r_T⊙c_hid(A_allH_TW_H)) (13)

H_{T+1} = u_T⊙H_T + (1-u_T)⊙c_T (14)

wherein r_in, r_hid, u_in, u_hid, c_in and c_hid represent trainable linear layers; W_H represents a trainable weight matrix; A_all is the adjacency matrix of the fully connected graph; and H_{T+1} is the output of the graph gated recurrent network, i.e. H_{T+1} = GRU(Out_T, H_T).
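Equations (11) to (14) can be sketched as a single update step. The layer shapes are assumptions, since the source does not state them: each `*_in` layer is taken to map the n × 3 posture to n × d, each `*_hid` layer to map n × d to n × d, W_H to be d × d, and each linear layer to carry a bias. The helper names `linear` and `graph_gru_step` are hypothetical.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def linear(layer, x):
    """Apply a linear layer stored as a (weight, bias) pair."""
    weight, bias = layer
    return x @ weight + bias

def graph_gru_step(out_T, H_T, A_all, W_H, layers):
    """One update of equations (11) to (14); returns H_{T+1}.

    out_T: (n, 3) posture, H_T: (n, d) hidden state,
    A_all: (n, n) adjacency matrix, W_H: (d, d) weight (assumed shape).
    """
    msg = A_all @ H_T @ W_H  # graph-propagated hidden state shared by all gates
    r = sigmoid(linear(layers["r_in"], out_T) + linear(layers["r_hid"], msg))      # (11)
    u = sigmoid(linear(layers["u_in"], out_T) + linear(layers["u_hid"], msg))      # (12)
    c = np.tanh(linear(layers["c_in"], out_T) + r * linear(layers["c_hid"], msg))  # (13)
    return u * H_T + (1.0 - u) * c                                                 # (14)

# Illustrative sizes and random initial weights (assumptions).
rng = np.random.default_rng(0)
n, d = 20, 16
names = ("r_in", "r_hid", "u_in", "u_hid", "c_in", "c_hid")
layers = {name: (rng.normal(scale=0.1, size=(3 if name.endswith("_in") else d, d)),
                 np.zeros(d))
          for name in names}
H_next = graph_gru_step(rng.normal(size=(n, 3)), rng.normal(size=(n, d)),
                        np.ones((n, n)), rng.normal(scale=0.1, size=(d, d)), layers)
```

The update gate u_T blends the previous hidden state with the candidate c_T elementwise, exactly as in equation (14).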

Once the hidden variable H has been input, the recursive nature of the graph gated recurrent network lets the decoder take the predicted value of the previous time step as input and recursively generate a predicted human posture sequence with any number of frames. Here the decoder outputs (f-t) frames of data, denoted Output.
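The recursive decoding of formula (10) can be sketched as a loop that feeds each prediction back in. The `gru` and `f_pred` arguments below are toy stand-ins chosen only so the example runs on its own; they are not the trained graph gated recurrent network or multi-layer perceptron, and the function name `rollout` is hypothetical.

```python
import numpy as np

def rollout(last_frame, H, gru, f_pred, steps):
    """Recursively apply formula (10) for `steps` frames.

    last_frame: (n, 3) final frame of the input sequence, used as Out_0.
    H:          (n, d) encoder output, used as H_0.
    gru:        callable (Out_T, H_T) -> H_{T+1}.
    f_pred:     callable mapping an (n, d) state to an (n, 3) displacement.
    """
    out, h, frames = last_frame, H, []
    for _ in range(steps):
        h = gru(out, h)           # H_{T+1} = GRU(Out_T, H_T)
        out = out + f_pred(h)     # Out_{T+1} = Out_T + f_pred(H_{T+1})
        frames.append(out)
    return np.stack(frames)       # (steps, n, 3)

# Toy stand-ins (assumptions): halve the state, read off its first 3 features.
toy_gru = lambda out, h: 0.5 * h
toy_f_pred = lambda h: h[:, :3]
pred = rollout(np.zeros((4, 3)), np.ones((4, 5)), toy_gru, toy_f_pred, steps=3)
```

Setting `steps` to f - t yields an output of size (f-t) × n × 3, matching the desired output sequence used during training.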

Step S26: the parameters are trained using a gradient descent method, and a loss function is set as the following equation (15):

wherein Output is the output sequence of the decoder, consisting of (f-t) frames of data with size (f-t) × n × 3, and Output_gt is the desired output.
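The body of formula (15) is not reproduced in this text; the description only states that the average error between the predicted and true values serves as the loss. The sketch below therefore assumes one common choice, the Euclidean distance per joint averaged over all frames and joints. The exact form used in the invention may differ, and the function name `mean_joint_error` is hypothetical.

```python
import numpy as np

def mean_joint_error(output, output_gt):
    """Average per-joint Euclidean distance between prediction and target.

    Both arrays have shape (f - t, n, 3). ASSUMPTION: an L2 distance averaged
    over frames and joints; the source only specifies an 'average error'.
    """
    return float(np.linalg.norm(output - output_gt, axis=-1).mean())

# Toy check: every predicted joint is offset by (3, 4, 0) from the target.
output_gt = np.zeros((2, 3, 3))
output = np.tile([3.0, 4.0, 0.0], (2, 3, 1))
loss = mean_joint_error(output, output_gt)
```

In training, a scalar of this kind would be minimized by gradient descent over the encoder and decoder parameters, typically through an automatic differentiation framework.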

As shown in fig. 6, in one embodiment, the step S3: acquiring a three-dimensional position data sequence of human key joints for prediction, constructing graph data, and inputting the graph data into the trained attention-fused multi-flow graph neural network model to obtain a predicted value of the human body posture, specifically comprises the following steps:

step S31: acquiring a three-dimensional position data sequence of human key joints for prediction in the manner of step S1, and constructing the graph data;

In the same manner as the training sequence is acquired in step S1, a three-dimensional position data sequence of human key joints is first acquired, expressed as a t × n × 3 matrix, and rearranged by joint into an n × (t × 3) matrix V. The three constructed graphs are the fully connected graph G_all = (V, A_all), the high-degree-of-freedom joint graph G_mobile = (V, A_mobile) and the low-degree-of-freedom joint graph G_stable = (V, A_stable).

Step S32: inputting the graph data into the attention-fused multi-flow graph neural network model trained in step S2; the encoder performs feature extraction to obtain the hidden variable H, and H is input into the decoder, which outputs the predicted value of the human body posture.

As shown in table 1 below, experiments verify that the embodiment of the present invention achieves high accuracy when predicting various actions on the public CMU Mocap dataset.

TABLE 1 mean joint angle error comparison of the method of the present invention to other methods on CMU Mocap dataset

The index represents the prediction accuracy ranking for the different methods, with prediction duration in milliseconds.

As shown in table 2, experiments verify that the present method achieves the best average accuracy over all actions on the public Human 3.6M dataset.

TABLE 2 mean joint angle error of the method of the invention versus other methods on a Human 3.6M dataset

The human posture prediction method based on the multi-flow graph neural network provided by the invention constructs several graph models from human joint position data and structural characteristics, thereby modeling the human motion system and predicting human posture with higher accuracy. Joint movement features are extracted by the multi-flow graph neural network, and global information is fused by an attention model. The invention overcomes the larger errors of existing human posture prediction methods on vigorous and long-term movements and obtains better experimental results. The network structure is simple and runs in real time, so the human posture can be predicted in real time, which benefits practical application. Compared with other existing human posture prediction methods, the method performs better on medium- and long-term motion prediction of 500 to 1000 milliseconds and on vigorous movements.

Example two

As shown in fig. 7, an embodiment of the present invention provides a human body posture prediction system based on an attention-fused multi-flow graph neural network, including the following modules:

a training data acquisition module 41, configured to acquire a three-dimensional position data sequence of a human body key joint for training, and divide the three-dimensional position data sequence into an input sequence and an output sequence according to lengths of a preset input sequence and a preset output sequence; constructing graph data according to the input sequence;

a model training module 42, configured to construct an attention-fused multi-flow graph neural network model, and to input the graph data into the model for training to obtain a trained attention-fused multi-flow graph neural network model;

a human body posture prediction module 43, configured to acquire a three-dimensional position data sequence of human key joints for prediction, construct graph data, and input the graph data into the trained attention-fused multi-flow graph neural network model to obtain a predicted value of the human body posture.

The above examples are provided only for the purpose of describing the present invention, and are not intended to limit the scope of the present invention. The scope of the invention is defined by the appended claims. Various equivalent substitutions and modifications can be made without departing from the spirit and principles of the invention, and are intended to be within the scope of the invention.

Claims

1. A human body posture prediction method based on an attention-fused multi-flow graph neural network, characterized by comprising the following steps:

step S1: acquiring a three-dimensional position data sequence of a human body key joint for training, and dividing the three-dimensional position data sequence into an input sequence and an output sequence according to the lengths of a preset input sequence and the preset output sequence; constructing graph data according to the input sequence;

step S2: constructing an attention-fused multi-flow graph neural network model; inputting the graph data into the attention-fused multi-flow graph neural network model for training to obtain the trained attention-fused multi-flow graph neural network model;

step S3: acquiring a three-dimensional position data sequence of human key joints for prediction, constructing the graph data, and inputting the graph data into the trained attention-fused multi-flow graph neural network model to obtain a predicted value of the human body posture.

2. The method for predicting the posture of the human body based on the attention mechanism fusion multi-flow graph neural network according to claim 1, wherein the step S1: acquiring a three-dimensional position data sequence of a human body key joint for training, and dividing the three-dimensional position data sequence into an input sequence and an output sequence according to the lengths of a preset input sequence and the preset output sequence; constructing graph data according to the input sequence, which specifically comprises the following steps:

step S12: taking the first t frames of the data sequence as the input sequence, and the remaining (f-t) frames as the output sequence, wherein f is the total number of frames in the data sequence and n is the number of key joints; the input sequence has size t × n × 3; the output sequence has size (f-t) × n × 3;

step S13: constructing a node matrix V and an adjacency matrix A according to the input sequence of size t × n × 3, and thereby constructing a fully connected graph G_all = (V, A_all), a high-degree-of-freedom joint graph G_mobile = (V, A_mobile) and a low-degree-of-freedom joint graph G_stable = (V, A_stable); wherein A_all is the fully connected adjacency matrix representing the relationships among all joints, A_mobile is the high-degree-of-freedom joint adjacency matrix representing the relationships among high-degree-of-freedom joints, and A_stable is the low-degree-of-freedom joint adjacency matrix representing the relationships among low-degree-of-freedom joints.
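Steps S12 and S13 can be illustrated with a minimal sketch in plain Python. Several details here are assumptions rather than part of the claim: the helper names `split_sequence` and `adjacency` are hypothetical, the node matrix V is assumed to hold one row per joint with the t × 3 coordinates as features, the adjacency matrices are assumed to be fixed 0/1 matrices that fully connect the joints of each group, and the choice of which joints count as high or low degree of freedom is invented for the toy data.

```python
def split_sequence(seq, t):
    """Split f frames (each n joints x 3 coordinates) into the first t and the remaining f - t."""
    if not 0 < t < len(seq):
        raise ValueError("t must satisfy 0 < t < f")
    return seq[:t], seq[t:]


def adjacency(n, joints):
    """Return an n x n 0/1 matrix that connects every pair of joints in `joints`."""
    members = set(joints)
    return [[1 if i in members and j in members else 0 for j in range(n)]
            for i in range(n)]


# Hypothetical toy data: f = 5 frames, n = 4 joints, t = 3 input frames.
f, n, t = 5, 4, 3
seq = [[[float(frame), float(joint), 0.0] for joint in range(n)]
       for frame in range(f)]
inp, out = split_sequence(seq, t)

# Assumed layout of V: one row per joint, t * 3 feature columns.
V = [[c for frame in inp for c in frame[j]] for j in range(n)]

# Assumed joint partition; the claim does not list the joints in each group.
mobile_joints = [2, 3]
stable_joints = [0, 1]
A_all = adjacency(n, range(n))
A_mobile = adjacency(n, mobile_joints)
A_stable = adjacency(n, stable_joints)

# The three graphs share the node matrix V and differ only in adjacency.
G_all, G_mobile, G_stable = (V, A_all), (V, A_mobile), (V, A_stable)
```

In this sketch the three graphs differ only in which joint pairs are allowed to exchange information, which is what lets each stream of the network specialise.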

3. The method for predicting the posture of the human body based on the attention mechanism fusion multi-flow graph neural network according to claim 1, wherein the step S2 of constructing an attention mechanism fusion multi-flow graph neural network model and inputting the graph data into the attention mechanism fusion multi-flow graph neural network model for training to obtain the trained attention mechanism fusion multi-flow graph neural network model specifically comprises the following steps:

step S21: the attention mechanism fusion multi-flow graph neural network adopts an encoder-decoder structure, wherein the encoder comprises a plurality of encoding modules; inputting the graph data into the encoder for feature extraction, wherein the output of each encoding module is given by the following formula (4):

Out_i＝Att_i(GCN_all1(G_all),GCN_stable1(G_stable),GCN_mobile1(G_mobile)) (4)

wherein the output of the i-th encoding module, Out_i = (G_all1, G_stable1, G_mobile1), contains three graph data; GCN(·) represents a single graph neural network layer; Att_i(·) represents the i-th attention module;

step S22: the input of the single graph neural network layer is G = (V, A) and the output is X, as expressed by the following formula (5):

X＝ReLU(AVW+VU) (5)
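Formula (5) can be checked on a small example. The following is a minimal sketch in plain Python, assuming that W and U are trainable weight matrices (the claim does not define them explicitly) and using hypothetical helper names `matmul` and `gcn_layer`; the numeric values are invented for illustration.

```python
def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]


def gcn_layer(A, V, W, U):
    """Compute X = ReLU(A V W + V U) for one graph neural network layer."""
    neighbour_term = matmul(matmul(A, V), W)  # features aggregated over the graph
    self_term = matmul(V, U)                  # each node's own features
    return [[max(0.0, p + q) for p, q in zip(r1, r2)]
            for r1, r2 in zip(neighbour_term, self_term)]


# Toy graph with 2 nodes and 2 features per node (illustrative values only).
A = [[1.0, 1.0], [0.0, 1.0]]
V = [[1.0, -2.0], [3.0, 0.5]]
W = [[1.0, 0.0], [0.0, 1.0]]    # assumed weight for the neighbour term
U = [[-1.0, 0.0], [0.0, -2.0]]  # assumed weight for the self term
X = gcn_layer(A, V, W, U)
print(X)  # [[3.0, 2.5], [0.0, 0.0]]
```

The second row shows the effect of ReLU: the pre-activation values 0.0 and -0.5 are both clipped to 0.0.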

step S23: the inputs of the attention module are G₁ = (V₁, A_all), G₂ = (V₂, A_stable) and G₃ = (V₃, A_mobile), and the output is X' = (G_out1, G_out2, G_out3); the relationship between them is expressed by the following equations (6) to (8):

V_mid4＝GCN₂(CONCAT(V_mid1,V_mid2,V_mid3)) (7)

wherein CONCAT(·) represents matrix concatenation along the feature dimension; SoftMax(·) denotes the normalized exponential function; W_1,1, W_1,2, ……, W_3,2 represent trainable parameters of different linear layers; V_mid1, V_mid2, V_mid3 and V_mid4 are intermediate variables;

step S24: the output hidden variable H of the encoder is represented by the following formula (9):

H＝λ₁V_allf+λ₂V_stablef+λ₃V_mobilef (9)

wherein the output of the last encoding module is Out_f = (G_allf, G_stablef, G_mobilef), with G_allf = (V_allf, A_all), G_stablef = (V_stablef, A_stable) and G_mobilef = (V_mobilef, A_mobile); λ₁, λ₂ and λ₃ are configurable parameters;
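The fusion in formula (9) is an element-wise weighted sum of the three node matrices produced by the last encoding module. A minimal sketch in plain Python follows; the function name `fuse_streams` and the values chosen for λ₁, λ₂ and λ₃ are hypothetical, since the claim only states that the weights are configurable.

```python
def fuse_streams(V_all, V_stable, V_mobile, lambdas):
    """Compute H = l1 * V_all + l2 * V_stable + l3 * V_mobile element-wise."""
    l1, l2, l3 = lambdas
    return [[l1 * a + l2 * s + l3 * m for a, s, m in zip(ra, rs, rm)]
            for ra, rs, rm in zip(V_all, V_stable, V_mobile)]


# Illustrative node matrices with one node and two features each.
V_allf = [[1.0, 2.0]]
V_stablef = [[10.0, 20.0]]
V_mobilef = [[100.0, 200.0]]
H = fuse_streams(V_allf, V_stablef, V_mobilef, (0.5, 0.25, 0.25))
print(H)  # [[28.0, 56.0]]
```

All three node matrices must have the same shape for the sum to be defined, which holds here because the three graphs share the same set of joints.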

step S25: inputting the hidden variable H into the decoder, and predicting the human body posture by using the decoder, wherein the formula (10) is represented as follows:

Out_{T+1} = Out_T + f_pred(GRU(Out_T, H_T)) (10)

wherein, at time T, the posture of the human body is Out_T; f_pred(·) represents a multi-layer perceptron; GRU(·) represents a graph gated recurrent network; H_T is the hidden variable at time T;
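Formula (10) describes residual, autoregressive decoding: each step predicts an offset that is added to the previous pose. The sketch below in plain Python shows only this control flow. It assumes that the output of GRU(·) is the next hidden state, which is both passed to f_pred(·) and carried to the following step; the function name `rollout` is hypothetical, and the two callables passed in are trivial stand-ins, not the graph gated recurrent network or the multi-layer perceptron of the claim.

```python
def rollout(pose, hidden, gru_step, f_pred, steps):
    """Apply Out_{T+1} = Out_T + f_pred(GRU(Out_T, H_T)) for `steps` frames."""
    poses = []
    for _ in range(steps):
        hidden = gru_step(pose, hidden)   # assumed: GRU returns the next hidden state
        delta = f_pred(hidden)            # predicted offset for every coordinate
        pose = [p + d for p, d in zip(pose, delta)]
        poses.append(pose)
    return poses


# Stand-in components for illustration only: the hidden state is kept
# unchanged and read directly as a per-frame displacement.
def constant_gru(pose, hidden):
    return hidden


def identity_pred(hidden):
    return list(hidden)


start_pose = [0.0, 0.0, 0.0]  # one joint, three coordinates, flattened
H0 = [1.0, 2.0, 3.0]          # hidden variable from the encoder (toy values)
predicted = rollout(start_pose, H0, constant_gru, identity_pred, steps=2)
print(predicted)  # [[1.0, 2.0, 3.0], [2.0, 4.0, 6.0]]
```

Because the decoder predicts offsets rather than absolute positions, a network that outputs zeros reproduces the last observed pose, which is a sensible default for short prediction horizons.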

step S26: the parameters are trained using a gradient descent method, and a loss function is set as the following formula:

wherein Output is the (f-t)-frame sequence output by the decoder, with size (f-t) × n × 3, and Output_gt is the desired output.

4. The method for predicting the posture of the human body based on the attention mechanism fusion multi-flow graph neural network according to claim 1, wherein the step S3 of acquiring a three-dimensional position data sequence of human body key joints for prediction, constructing the graph data, and inputting the graph data into the trained attention mechanism fusion multi-flow graph neural network model to obtain a predicted value of the human body posture specifically comprises the following steps:

step S32: inputting the graph data into the attention mechanism fusion multi-flow graph neural network model trained in step S2, and performing feature extraction with the encoder to obtain the hidden variable H; inputting H into the decoder, and outputting a predicted value of the human body posture.

5. A human body posture prediction system based on attention mechanism fusion multi-flow graph neural network is characterized by comprising the following modules:

the training data acquisition module is used for acquiring a three-dimensional position data sequence of human body key joints for training and dividing the three-dimensional position data sequence into an input sequence and an output sequence according to a preset input sequence length and a preset output sequence length; constructing graph data according to the input sequence;

the model training module is used for constructing an attention mechanism fusion multi-flow graph neural network model; inputting the graph data into the attention mechanism fusion multi-flow graph neural network model for training to obtain the trained attention mechanism fusion multi-flow graph neural network model;

and the human body posture prediction module is used for acquiring a three-dimensional position data sequence of human body key joints for prediction, constructing the graph data, and inputting the graph data into the trained attention mechanism fusion multi-flow graph neural network model to obtain a predicted value of the human body posture.