CN113642379A - Human body posture prediction method and system based on attention mechanism fusion multi-flow graph - Google Patents

Publication number
CN113642379A
Authority
CN
China
Prior art keywords
sequence
neural network
human body
attention
graph
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202110539624.0A
Other languages
Chinese (zh)
Other versions
CN113642379B (en)
Inventor
袁丁
曹哲
魏晓东
尹继豪
张雪怡
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beihang University
Original Assignee
Beihang University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beihang University
Priority to CN202110539624.0A
Publication of CN113642379A
Application granted
Publication of CN113642379B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/044 Recurrent networks, e.g. Hopfield networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Image Analysis (AREA)

Abstract

The invention relates to a human body posture prediction method and system based on an attention-mechanism-fused multi-flow graph neural network. The method comprises the following steps. S1: acquiring a three-dimensional position data sequence of key human joints for training, dividing it into an input sequence and an output sequence according to preset input and output sequence lengths, and constructing graph data from the input sequence. S2: constructing an attention-mechanism-fused multi-flow graph neural network model and training it on the graph data to obtain a trained model. S3: acquiring a three-dimensional position data sequence of key human joints for prediction, constructing graph data, and inputting the graph data into the trained model to obtain a predicted value of the human body posture. The method constructs a plurality of graph models based on the position data and structural characteristics of the human joints, thereby modeling the human motion system, predicting the human body posture, and achieving higher accuracy.

Description

Human body posture prediction method and system based on attention mechanism fusion multi-flow graph
Technical Field
The invention relates to the field of computer vision and image processing, and in particular to a human body posture prediction method and system based on an attention-mechanism-fused multi-flow graph neural network.
Background
In recent years, the wide availability of low-cost consumer-grade depth cameras has made low-cost, real-time acquisition and integration of three-dimensional human motion postures possible. Human body posture prediction has therefore become a hot topic at the intersection of graphics and computer vision, with broad application prospects and rich application scenarios in robotics, medical treatment, and autonomous driving.
In robotics, exoskeleton robot technology is a research hot spot with very important aerospace applications. It is considered key support equipment for on-orbit construction and maintenance and for lunar and Martian surface activities, and it plays an extremely important role in completing extravehicular tasks and handling emergency faults. A human body posture prediction algorithm judges and predicts the movement intention of the astronaut by identifying and analyzing the rules of human movement, assists the exoskeleton robot in achieving force compliance control, and reduces abrupt force changes in exoskeleton control, thereby effectively improving the adaptability of the space suit to the astronaut's movement.
Motion capture technology has existed for some fifty years. It is widely used in entertainment industries such as film, television, and games, and it helps patients with movement disorders in the medical and health industries. However, it is still not possible to generate realistic cloud data that conforms to the structural characteristics and motion balance of the human body without real motion capture data. Mainstream motion capture technology is still based on inertial sensors or complex optical sensing systems, which are costly and require a professional venue, so it remains far from entering ordinary households and from benefiting patients who rely on assistive devices for daily movement. Applying a human body posture prediction algorithm to motion generation can greatly improve the utilization of motion capture data and reduce the post-processing workload. With increasingly popular time-of-flight cameras, human body posture prediction algorithms can also be applied more easily to human-computer interaction, for example by guiding a subject to perform correct actions based on the subject's captured movements, thereby assisting the rehabilitation of patients with limb cognitive deficits caused by neurodegenerative diseases.
Human body posture prediction therefore has important application value. However, human motion has more than 200 degrees of freedom, and the degrees of freedom of the individual joints are strongly correlated. Traditional methods such as inverse kinematics and dynamics have great difficulty simulating human motion and cannot establish an accurate human motion model; the rules of human motion are highly complex, and the balance and coordination mechanism in particular is difficult to simulate. This makes human motion prediction based on conventional statistical learning methods very difficult.
In recent years, with the development of computer hardware and machine learning algorithms, human body posture prediction methods based on deep learning have been proposed. In these methods, a researcher only needs to construct a deep neural network model, without needing prior knowledge of different behaviors such as complex kinematic constraints and parameters. By training on a large amount of motion data, the model learns latent motion features in human motion data and predicts a person's next motion trajectory from them. Early deep learning models were mainly based on recurrent neural networks and convolutional neural networks, each with drawbacks: recurrent neural networks emphasize the temporal relation of the sequence and ignore spatial information; convolutional approaches flatten single-frame skeleton data into a one-dimensional vector and treat the sequence as a two-dimensional matrix, so they focus on the position change of a single joint over time, ignore the correlation among the joints, and cannot fully utilize the topological structure of the human body.
The graph neural network, a deep learning model that has attracted attention in the past two years, is a neural network dedicated to processing graph data and has been applied in research fields such as recommendation systems, reasoning and proving, chemistry, traffic, and brain-like intelligence. Recent work proposes methods based on graph neural networks that represent the three-dimensional human skeleton as a graph and can make good use of the prior structural information of the skeleton. Conventional graph-neural-network methods, however, simply construct an adjacency matrix from the spatial adjacency of the human joints and do not consider how human motion is generated, so they cannot fully express the connections among the joints during motion. Human motion is accomplished by the cooperation of many joints that are related to each other along the kinematic chain, not by a few adjacent joints alone. Ergonomics studies have further shown that the human kinematic chain usually comprises three-degree-of-freedom joints and two-degree-of-freedom joints, that these two kinds of joints are often distributed alternately along the chain, and that joints with different degrees of freedom have different force characteristics during movement. Constructing the adjacency matrix only from the spatial adjacency of the human joints therefore does not sufficiently utilize the prior information on the structure and kinematics of the joints.
Therefore, how to fully utilize the prior information on the structure and kinematics of the human joints has become an urgent problem to be solved.
Disclosure of Invention
In order to solve the above technical problem, the invention provides a human body posture prediction method and system based on an attention-mechanism-fused multi-flow graph neural network.
The technical solution of the invention is as follows: a human body posture prediction method based on an attention-mechanism-fused multi-flow graph neural network, comprising the following steps:
Step S1: acquiring a three-dimensional position data sequence of key human joints for training, and dividing the three-dimensional position data sequence into an input sequence and an output sequence according to preset input and output sequence lengths; constructing graph data from the input sequence;
Step S2: constructing an attention-mechanism-fused multi-flow graph neural network model; inputting the graph data into the model for training to obtain a trained attention-mechanism-fused multi-flow graph neural network model;
Step S3: acquiring a three-dimensional position data sequence of key human joints for prediction, constructing graph data, and inputting the graph data into the trained attention-mechanism-fused multi-flow graph neural network model to obtain a predicted value of the human body posture.
Compared with the prior art, the invention has the following advantages:
1. The human body posture prediction method based on the multi-flow graph neural network provided by the invention constructs a plurality of graph models based on human joint position data and structural characteristics, thereby modeling the human motion system, predicting the human body posture, and achieving higher accuracy. Joint motion features are extracted by the multi-flow graph neural network, and global information is fused by an attention model.
2. The invention overcomes the defect that existing human body posture prediction methods have larger errors on vigorous motion and long-term motion, and obtains better experimental results. The network structure is simple and runs in real time, so the method can predict the human body posture in real time, which benefits practical application. Compared with other existing human body posture prediction methods, the method performs better on medium- and long-term motion prediction (500-1000 milliseconds) and on vigorous motion.
Drawings
FIG. 1 is a flowchart of a human body posture prediction method based on an attention-mechanism-fused multi-flow graph neural network according to an embodiment of the present invention;
FIG. 2 is a flowchart of step S1 of the method in an embodiment of the present invention: acquiring a three-dimensional position data sequence of key human joints for training, dividing it into an input sequence and an output sequence according to preset input and output sequence lengths, and constructing graph data from the input sequence;
FIG. 3 is a schematic structural diagram of the attention-mechanism-fused multi-flow graph neural network model in an embodiment of the present invention;
FIG. 4 is a flowchart of step S2 of the method in an embodiment of the present invention: constructing an attention-mechanism-fused multi-flow graph neural network model, and inputting the graph data into the model for training to obtain a trained model;
FIG. 5 is a schematic structural diagram of the attention module in the attention-mechanism-fused multi-flow graph neural network according to an embodiment of the present invention;
FIG. 6 is a flowchart of step S3 of the method in an embodiment of the present invention: acquiring a three-dimensional position data sequence of key human joints for prediction, constructing graph data, and inputting the graph data into the trained model to obtain a predicted value of the human body posture;
FIG. 7 is a block diagram of a human body posture prediction system based on an attention-mechanism-fused multi-flow graph neural network.
Detailed Description
Aiming at the problem that the prior art cannot fully utilize the prior information on the structure and kinematics of the human joints, the invention provides a human body posture prediction method based on an attention-mechanism-fused multi-flow graph neural network model, which models the human motion system, predicts the human body posture, and achieves higher accuracy.
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is described in further detail below with reference to the accompanying drawings.
Example one
As shown in FIG. 1, a human body posture prediction method based on an attention-mechanism-fused multi-flow graph neural network provided in an embodiment of the present invention includes the following steps:
Step S1: acquiring a three-dimensional position data sequence of key human joints for training, and dividing the three-dimensional position data sequence into an input sequence and an output sequence according to preset input and output sequence lengths; constructing graph data from the input sequence;
Step S2: constructing an attention-mechanism-fused multi-flow graph neural network model; inputting the graph data into the model for training to obtain a trained attention-mechanism-fused multi-flow graph neural network model;
Step S3: acquiring a three-dimensional position data sequence of key human joints for prediction, constructing graph data, and inputting the graph data into the trained attention-mechanism-fused multi-flow graph neural network model to obtain a predicted value of the human body posture.
As shown in FIG. 2, step S1 above (acquiring a three-dimensional position data sequence of key human joints for training, dividing it into an input sequence and an output sequence according to preset input and output sequence lengths, and constructing graph data from the input sequence) specifically comprises the following steps:
Step S11: acquiring a three-dimensional position data sequence of key human joints for training, of size $f \times n \times 3$, wherein $f$ represents the number of frames of the data sequence and $n$ represents the number of joints;
The training data sequence used in the embodiment of the present invention may be a data sequence from a public data set (such as the Human3.6M data set or the CMU Mocap data set), or a human motion data sequence acquired by an RGB-D camera or other device. The acquired key joints or key points of the human body generally comprise no fewer than 20 positions, including the head, neck, chest, waist, left and right shoulder joints, left and right elbow joints, left and right wrist joints, left and right hands, left and right hip joints, left and right knee joints, left and right ankle joints, and left and right feet. A three-dimensional position data sequence may be represented as an $f \times n \times 3$ matrix, where $f$ is the number of frames in the sequence, $n$ is the number of joints included in the data, and the three-dimensional coordinates of a joint in a frame, in the world coordinate system, are denoted $(X_W, Y_W, Z_W)$. For a joint that is not captured during acquisition because of occlusion or other reasons, the three-dimensional coordinates are recorded as $(0, 0, 0)$ so that the matrix size stays fixed and subsequent processing is convenient.
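The zero-filling convention described above can be sketched as follows. This is only an illustration: the per-frame dictionary format and the function name `fill_missing_joints` are assumptions made here, not part of the patent.

```python
import numpy as np

def fill_missing_joints(frames, n_joints):
    """Build an f x n x 3 array; joints absent from a frame stay at (0, 0, 0).

    `frames` is a hypothetical input format: one dict per frame that maps a
    joint index to its (X_W, Y_W, Z_W) world coordinates.
    """
    seq = np.zeros((len(frames), n_joints, 3), dtype=np.float32)
    for f_idx, frame in enumerate(frames):
        for joint_idx, xyz in frame.items():
            seq[f_idx, joint_idx] = xyz
    return seq

# Two frames, three joints; joint 2 is occluded in the second frame.
frames = [
    {0: (0.0, 1.0, 2.0), 1: (0.1, 1.1, 2.1), 2: (0.2, 1.2, 2.2)},
    {0: (0.0, 1.0, 2.0), 1: (0.1, 1.1, 2.1)},
]
sequence = fill_missing_joints(frames, n_joints=3)
print(sequence.shape)  # (2, 3, 3)
```

Keeping the array size fixed means every later step can rely on the same $n$, at the cost of treating an occluded joint as if it sat at the origin.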
Step S12: taking the first $t$ frames of the data sequence as the input sequence, and the last $(f-t)$ frames of the data sequence as the output sequence; wherein the input sequence is of size $t \times n \times 3$ and the output sequence is of size $(f-t) \times n \times 3$;
The three-dimensional position data sequence is divided into an input sequence and an output sequence according to the preset lengths of the two. For example, for a training sequence of size $f \times n \times 3$, the first $t$ frames are used as input, producing a $t \times n \times 3$ input sequence; the desired output is $(f-t)$ frames, so the last $(f-t)$ frames of the sequence are taken as the desired output, producing an $(f-t) \times n \times 3$ output sequence.
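A minimal sketch of this split, assuming the sequence is held as a NumPy array; the sizes `f = 35`, `n = 22`, `t = 10` and the function name `split_sequence` are illustrative choices, not values from the patent.

```python
import numpy as np

def split_sequence(sequence, t):
    """Split an f x n x 3 sequence into the first t frames and the last f - t frames."""
    f = sequence.shape[0]
    if not 0 < t < f:
        raise ValueError("t must satisfy 0 < t < f")
    return sequence[:t], sequence[t:]

f, n, t = 35, 22, 10  # illustrative sizes only
sequence = np.random.default_rng(0).normal(size=(f, n, 3))
input_seq, output_seq = split_sequence(sequence, t)
print(input_seq.shape, output_seq.shape)  # (10, 22, 3) (25, 22, 3)
```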
Step S13: constructing a node matrix $V$ and adjacency matrices from the $t \times n \times 3$ input sequence, and thereby constructing a fully connected graph $G_{all} = (V, A_{all})$, a high-degree-of-freedom joint graph $G_{mobile} = (V, A_{mobile})$, and a low-degree-of-freedom joint graph $G_{stable} = (V, A_{stable})$; wherein $A_{all}$ is the fully connected adjacency matrix representing the relation among all joints, $A_{mobile}$ is the high-degree-of-freedom joint adjacency matrix representing the linkage among high-degree-of-freedom joints, and $A_{stable}$ is the low-degree-of-freedom joint adjacency matrix representing the linkage among low-degree-of-freedom joints.
The $t \times n \times 3$ input sequence obtained in step S12 is rearranged by joint into $n \times (t \times 3)$ to construct the matrix $V$. Each joint is regarded as a node of the graph data, so the feature of each node can be represented by a $t \times 3$ matrix. The degree of freedom of a human joint indicates the number of directions in which the joint can move; for example, the degree of freedom of a ball joint, a condyloid joint, or the like is 3, and the degree of freedom of a saddle joint, a hinge joint, or the like is 2. These joints have different kinematic and dynamic properties. According to the degrees of freedom of the human joints, three different adjacency matrices can be constructed: a fully connected adjacency matrix $A_{all}$ representing the relation among all joints, a high-degree-of-freedom joint adjacency matrix $A_{mobile}$ representing the relation among the high-degree-of-freedom joints, and a low-degree-of-freedom joint adjacency matrix $A_{stable}$ representing the relation among the low-degree-of-freedom joints. An adjacency matrix represents the adjacency relation between nodes in graph data; in the embodiment of the invention it is constructed as follows:
For the $i$-th of the $n$ joints, if its degree of freedom is 3, i.e. the joint can rotate in three-dimensional space in yaw, pitch, and roll, it is considered a high-degree-of-freedom joint and $hof_i = 1$; otherwise $hof_i = 0$, where $hof$ stands for high degree of freedom.
Thus, for the fully connected adjacency matrix $A_{all}$ of size $n \times n$, the element $A_{all}(i,j)$ is defined by the following formula (1):
$A_{all}(i,j) = 1$ (1)
For the high-degree-of-freedom joint adjacency matrix $A_{mobile}$ of size $n \times n$, the element $A_{mobile}(i,j)$ is defined by the following formula (2):
(Formula (2) is an image in the original publication and is not reproduced in this text.)
For the low-degree-of-freedom joint adjacency matrix $A_{stable}$ of size $n \times n$, the element $A_{stable}(i,j)$ is defined by the following formula (3):
(Formula (3) is an image in the original publication and is not reproduced in this text.)
In the three groups of graph data, the number of nodes and the features of each node are identical; the difference is that the adjacency matrices of the three groups are $A_{all}$, $A_{mobile}$, and $A_{stable}$, respectively. The input sequence, arranged as $n \times (t \times 3)$, is denoted as the node matrix $V$, and with the three adjacency matrices the three constructed graphs are the fully connected graph $G_{all} = (V, A_{all})$, the high-degree-of-freedom joint graph $G_{mobile} = (V, A_{mobile})$, and the low-degree-of-freedom joint graph $G_{stable} = (V, A_{stable})$.
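The graph construction of step S13 might look like the sketch below. Formula (1) is taken from the text. Formulas (2) and (3) are not available in this text, so the sketch assumes, purely for illustration, that two joints are linked in $A_{mobile}$ when both have $hof = 1$ and in $A_{stable}$ when both have $hof = 0$; the patent's actual definitions may differ. The function names are hypothetical.

```python
import numpy as np

def build_node_matrix(input_seq):
    """Rearrange a t x n x 3 input sequence into an n x (t*3) node matrix V."""
    t, n, _ = input_seq.shape
    return input_seq.transpose(1, 0, 2).reshape(n, t * 3)

def build_adjacency(hof):
    """Return (A_all, A_mobile, A_stable) for a 0/1 vector hof of length n."""
    hof = np.asarray(hof, dtype=np.float32)
    n = hof.shape[0]
    a_all = np.ones((n, n), dtype=np.float32)   # formula (1): every entry is 1
    a_mobile = np.outer(hof, hof)               # ASSUMPTION: both joints high-DOF
    a_stable = np.outer(1 - hof, 1 - hof)       # ASSUMPTION: both joints low-DOF
    return a_all, a_mobile, a_stable

# Toy example: t = 2 frames, n = 3 joints, joints 0 and 2 are high-DOF.
input_seq = np.arange(2 * 3 * 3, dtype=np.float32).reshape(2, 3, 3)
V = build_node_matrix(input_seq)
a_all, a_mobile, a_stable = build_adjacency([1, 0, 1])
graphs = {"all": (V, a_all), "mobile": (V, a_mobile), "stable": (V, a_stable)}
print(V.shape)  # (3, 6)
```

All three graphs share the same node matrix `V`; only the adjacency matrix changes, which matches the description above.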
As shown in FIG. 3, the attention-mechanism-fused multi-flow graph neural network in the embodiment of the present invention adopts an encoder-decoder structure. The graph data constructed in step S1 is input into the encoder for feature extraction, yielding a hidden variable $H$; the hidden variable $H$ is input into the decoder, and the output of the decoder is the predicted value of the human body posture. The average error between the predicted value and the true value is taken as the loss function, and the parameters of the model are trained by gradient descent. The specific training steps are as follows:
As shown in FIG. 4, in one embodiment, step S2 above (constructing an attention-mechanism-fused multi-flow graph neural network model, and inputting the graph data into the model for training to obtain a trained model) specifically comprises the following steps:
Step S21: the attention-mechanism-fused multi-flow graph neural network adopts an encoder-decoder structure, wherein the encoder comprises a plurality of encoding modules; the graph data is input into the encoder for feature extraction, and the output of an encoding module is given by the following formula (4):
$Out_i = Att_i(GCN_{all,i}(G_{all}), GCN_{stable,i}(G_{stable}), GCN_{mobile,i}(G_{mobile}))$ (4)
wherein the encoding modules are connected in series, and the output of the $i$-th encoding module, $Out_i = (G_{all,i}, G_{stable,i}, G_{mobile,i})$, contains three graphs; $GCN(\cdot)$ represents a single graph neural network layer; $Att_i(\cdot)$ represents the $i$-th attention module;
Step S22: a single graph neural network layer, with input $G = (V, A)$ and output $X$, is given by the following formula (5):
$X = \mathrm{ReLU}(AVW + VU)$ (5)
wherein $W$ and $U$ are trainable weight matrices of size $(t \times 3) \times D$, and $D$ is the desired output feature dimension of the graph neural network layer;
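Formula (5) can be checked with a few lines of NumPy. The sketch below uses random weights and toy sizes (`n = 4`, `t = 2`, `D = 5`), all chosen here for illustration; it shows only the forward pass, not training.

```python
import numpy as np

def gcn_layer(V, A, W, U):
    """Single graph layer, formula (5): X = ReLU(A V W + V U)."""
    return np.maximum(A @ V @ W + V @ U, 0.0)

rng = np.random.default_rng(0)
n, t, D = 4, 2, 5                      # illustrative sizes only
V = rng.normal(size=(n, t * 3))        # node matrix, n x (t*3)
A = np.ones((n, n))                    # e.g. the fully connected A_all
W = rng.normal(size=(t * 3, D))        # weights for aggregated neighbour features
U = rng.normal(size=(t * 3, D))        # weights for each node's own features
X = gcn_layer(V, A, W, U)
print(X.shape)  # (4, 5)
```

The `A @ V @ W` term mixes features across the joints linked by `A`, while `V @ U` lets each joint keep a direct path from its own features, so the layer still produces an output for a joint whose row in `A` is all zeros.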
Step S23: the input to the attention module is $G_1 = (V_1, A_{all})$, $G_2 = (V_2, A_{stable})$, $G_3 = (V_3, A_{mobile})$, and the output is $X' = (G_{out1}, G_{out2}, G_{out3})$. The structure of the attention module is shown in FIG. 5 and is expressed by the following formulas (6) to (8):
(Formula (6) is an image in the original publication and is not reproduced in this text.)
$V_{mid4} = GCN_2(\mathrm{CONCAT}(V_{mid1}, V_{mid2}, V_{mid3}))$ (7)
(Formula (8) is an image in the original publication and is not reproduced in this text.)
wherein $\mathrm{CONCAT}(\cdot)$ represents matrix concatenation along the feature dimension; $\mathrm{SoftMax}(\cdot)$ denotes the normalized exponential function; $W_{1,1}, W_{1,2}, \ldots, W_{3,2}$ represent the trainable parameters of different linear layers; $V_{mid1}$, $V_{mid2}$, $V_{mid3}$, and $V_{mid4}$ are intermediate variables.
The above describes one of the serially connected encoding modules in the encoder. The modules have the same structure but independent parameters, which are trained independently.
Step S24: the hidden variable $H$ output by the encoder is given by the following formula (9):
$H = \lambda_1 V_{allf} + \lambda_2 V_{stablef} + \lambda_3 V_{mobilef}$ (9)
wherein the output of the last encoding module is $Out_f = (G_{allf}, G_{stablef}, G_{mobilef})$, with $G_{allf} = (V_{allf}, A_{all})$, $G_{stablef} = (V_{stablef}, A_{stable})$, and $G_{mobilef} = (V_{mobilef}, A_{mobile})$; $\lambda_1$, $\lambda_2$, and $\lambda_3$ are configurable parameters.
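As a small illustration of formula (9), the sketch below reads it as a weighted sum of the three node matrices from the last encoding module. That reading is an assumption based on the role of the configurable parameters $\lambda_1$, $\lambda_2$, $\lambda_3$; the function name and the example weights are hypothetical.

```python
import numpy as np

def fuse_streams(v_all, v_stable, v_mobile, lambdas=(1.0, 1.0, 1.0)):
    """Combine three n x d node matrices into the hidden variable H (weighted sum)."""
    l1, l2, l3 = lambdas
    return l1 * v_all + l2 * v_stable + l3 * v_mobile

n, d = 2, 3  # illustrative sizes only
H = fuse_streams(np.ones((n, d)), 2 * np.ones((n, d)), 3 * np.ones((n, d)),
                 lambdas=(0.5, 0.25, 1.0))
print(H.shape)  # (2, 3)
```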
The decoder of the attention-mechanism-fused multi-flow graph neural network uses a graph gated recurrent network for recursive prediction. The input of the decoder is the hidden variable $H$ output by the encoder, which is an $n \times d$ matrix, where $d$ is the feature dimension.
Step S25: the hidden variable $H$ is input into the decoder, and the decoder predicts the human body posture, as expressed by the following formula (10):
$Out_{T+1} = Out_T + f_{pred}(\mathrm{GRU}(Out_T, H_T))$ (10)
wherein $Out_T$ is the human body posture at time $T$; $f_{pred}(\cdot)$ represents a multi-layer perceptron; $\mathrm{GRU}(\cdot)$ represents the graph gated recurrent network; $H_T$ is the hidden variable at time $T$; and $Out_{T+1}$ is the predicted value of the human body posture at time $T+1$.
At the initial time, i.e. $T = 0$, $Out_0$ is initialized to the last frame of the input sequence, i.e. the last $n \times 3$ matrix in the $t \times n \times 3$ input sequence, and $H_0$ is initialized to the hidden variable $H$.
The operation of the graph gated recurrent network can be expressed by the following formulas (11) to (14):
$r_T = \sigma(r_{in}(Out_T) + r_{hid}(A_{all} H_T W_H))$ (11)
$u_T = \sigma(u_{in}(Out_T) + u_{hid}(A_{all} H_T W_H))$ (12)
$c_T = \tanh(c_{in}(Out_T) + r_T \odot c_{hid}(A_{all} H_T W_H))$ (13)
$H_{T+1} = u_T \odot H_T + (1 - u_T) \odot c_T$ (14)
wherein $r_{in}$, $r_{hid}$, $u_{in}$, $u_{hid}$, $c_{in}$, and $c_{hid}$ represent trainable linear layers; $W_H$ represents a trainable weight matrix; $A_{all}$ is the adjacency matrix of the fully connected graph; and $H_{T+1}$ is the output of the graph gated recurrent network, i.e. $H_{T+1} = \mathrm{GRU}(Out_T, H_T)$.
After the hidden variable $H$ is input, the recursive nature of the graph gated recurrent network lets the decoder take the predicted value at the previous time as input and recursively generate predicted values for a human posture sequence of any number of frames. Here the decoder outputs $(f-t)$ frames of data, denoted $Output$.
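The recursion of formulas (10) to (14) can be sketched as a forward pass in NumPy. Everything about parameterization here is an assumption for illustration: the random initialization, the bias terms in the linear layers, the single hidden layer of width 16 in $f_{pred}$, and all function names. The sketch shows only how each predicted frame feeds the next step.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def init_params(d, rng, hidden=16):
    """Randomly initialised stand-ins for the trainable layers (illustrative only)."""
    def linear(i, o):
        return rng.normal(scale=0.1, size=(i, o)), np.zeros(o)
    p = {name: linear(3, d) for name in ("r_in", "u_in", "c_in")}
    p.update({name: linear(d, d) for name in ("r_hid", "u_hid", "c_hid")})
    p["W_H"] = rng.normal(scale=0.1, size=(d, d))
    p["mlp1"], p["mlp2"] = linear(d, hidden), linear(hidden, 3)
    return p

def lin(layer, x):
    W, b = layer
    return x @ W + b

def graph_gru_step(out_T, H_T, A_all, p):
    """Formulas (11)-(14): returns H_{T+1}."""
    m = A_all @ H_T @ p["W_H"]
    r = sigmoid(lin(p["r_in"], out_T) + lin(p["r_hid"], m))      # (11)
    u = sigmoid(lin(p["u_in"], out_T) + lin(p["u_hid"], m))      # (12)
    c = np.tanh(lin(p["c_in"], out_T) + r * lin(p["c_hid"], m))  # (13)
    return u * H_T + (1.0 - u) * c                               # (14)

def f_pred(H, p):
    """Assumed two-layer perceptron mapping n x d features to n x 3 offsets."""
    return lin(p["mlp2"], np.maximum(lin(p["mlp1"], H), 0.0))

def decode(last_frame, H, A_all, p, n_frames):
    """Formula (10) applied recursively; returns an n_frames x n x 3 array."""
    out, frames = last_frame, []
    for _ in range(n_frames):
        H = graph_gru_step(out, H, A_all, p)
        out = out + f_pred(H, p)
        frames.append(out)
    return np.stack(frames)

rng = np.random.default_rng(0)
n, d, n_frames = 5, 8, 4               # illustrative sizes only
params = init_params(d, rng)
A_all = np.ones((n, n))
H0 = rng.normal(size=(n, d))           # stands in for the encoder output H
last_frame = rng.normal(size=(n, 3))   # last frame of the input sequence
predictions = decode(last_frame, H0, A_all, params, n_frames)
print(predictions.shape)  # (4, 5, 3)
```

Because each step predicts an offset that is added to the previous pose, the first predicted frame stays close to the last observed frame when the offsets are small.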
Step S26: the parameters are trained by gradient descent, with the loss function set as the following formula (15):
(Formula (15) is an image in the original publication and is not reproduced in this text.)
wherein $Output$ is the output sequence of the decoder, comprising $(f-t)$ frames of data with size $(f-t) \times n \times 3$, and $Output_{gt}$ is the desired output.
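Since formula (15) is not available in this text, the following is only one plausible reading of "average error between the predicted value and the true value": the mean Euclidean distance per joint and frame. The choice of norm and the function name `mean_joint_error` are assumptions, and the patent's actual loss may be defined differently.

```python
import numpy as np

def mean_joint_error(output, output_gt):
    """ASSUMED loss: mean Euclidean distance between predicted and true joints."""
    return float(np.linalg.norm(output - output_gt, axis=-1).mean())

# Toy check: every joint is off by the vector (3, 4, 0), i.e. a distance of 5.
output = np.zeros((2, 3, 3))                  # (f - t) x n x 3 prediction
output_gt = np.tile([3.0, 4.0, 0.0], (2, 3, 1))
loss = mean_joint_error(output, output_gt)
```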
As shown in FIG. 6, in one embodiment, step S3 above (acquiring a three-dimensional position data sequence of key human joints for prediction, constructing graph data, and inputting the graph data into the trained attention-mechanism-fused multi-flow graph neural network model to obtain a predicted value of the human body posture) comprises the following steps:
Step S31: obtaining a three-dimensional position data sequence of key human joints for prediction in the manner of step S1, and constructing the graph data;
In the same manner as the training sequence is acquired in step S1, the three-dimensional position data sequence of key human joints is first acquired and expressed as a $t \times n \times 3$ matrix, which is rearranged by joint into $n \times (t \times 3)$ and denoted as the matrix $V$. The three constructed graphs are the fully connected graph $G_{all} = (V, A_{all})$, the high-degree-of-freedom joint graph $G_{mobile} = (V, A_{mobile})$, and the low-degree-of-freedom joint graph $G_{stable} = (V, A_{stable})$.
Step S32: inputting the graph data into the attention-mechanism-fused multi-flow graph neural network model trained in step S2; the encoder performs feature extraction and yields the hidden variable $H$; $H$ is input into the decoder, whose output is the predicted value of the human body posture.
As shown in Table 1 below, experiments verify that the embodiment of the present invention achieves high accuracy when predicting various actions on the public CMU Mocap data set.
Table 1. Mean joint angle error of the method of the present invention compared with other methods on the CMU Mocap data set
(Table 1 is an image in the original publication and is not reproduced in this text.)
The index indicates the prediction accuracy ranking of the different methods; prediction durations are in milliseconds.
As shown in Table 2, experiments verify that the method achieves the best average accuracy over all actions on the public Human3.6M data set.
Table 2. Mean joint angle error of the method of the present invention compared with other methods on the Human3.6M data set
(Table 2 is an image in the original publication and is not reproduced in this text.)
The index indicates the prediction accuracy ranking of the different methods; prediction durations are in milliseconds.
The human body posture prediction method based on the multi-flow graph neural network provided by the invention constructs a plurality of graph models based on human joint position data and structural characteristics, thereby modeling the human motion system, predicting the human body posture, and achieving higher accuracy. Joint motion features are extracted by the multi-flow graph neural network, and global information is fused by an attention model. The invention overcomes the defect that existing human body posture prediction methods have larger errors on vigorous motion and long-term motion, and obtains better experimental results. The network structure is simple and runs in real time, so the method can predict the human body posture in real time, which benefits practical application. Compared with other existing human body posture prediction methods, the method performs better on medium- and long-term motion prediction (500-1000 milliseconds) and on vigorous motion.
Example two
As shown in fig. 7, an embodiment of the present invention provides a human body posture prediction system based on an attention-mechanism-fused multi-flow graph neural network, including the following modules:
a training data acquisition module 41, configured to acquire a three-dimensional position data sequence of human body key joints for training, divide the three-dimensional position data sequence into an input sequence and an output sequence according to preset input-sequence and output-sequence lengths, and construct graph data from the input sequence;
a model training module 42, configured to construct the attention-mechanism-fused multi-flow graph neural network model, and input the graph data into the model for training to obtain the trained attention-mechanism-fused multi-flow graph neural network model;
a human body posture prediction module 43, configured to acquire a three-dimensional position data sequence of human body key joints for prediction, construct graph data, and input the graph data into the trained attention-mechanism-fused multi-flow graph neural network model to obtain a predicted value of the human body posture.
The above examples are provided only for the purpose of describing the present invention, and are not intended to limit the scope of the present invention. The scope of the invention is defined by the appended claims. Various equivalent substitutions and modifications can be made without departing from the spirit and principles of the invention, and are intended to be within the scope of the invention.

Claims (5)

1. A human body posture prediction method based on an attention-mechanism-fused multi-flow graph neural network, characterized by comprising the following steps:
step S1: acquiring a three-dimensional position data sequence of human body key joints for training, and dividing the three-dimensional position data sequence into an input sequence and an output sequence according to preset input-sequence and output-sequence lengths; constructing graph data according to the input sequence;
step S2: constructing an attention-mechanism-fused multi-flow graph neural network model; inputting the graph data into the attention-mechanism-fused multi-flow graph neural network model for training to obtain the trained attention-mechanism-fused multi-flow graph neural network model;
step S3: acquiring a three-dimensional position data sequence of human body key joints for prediction, constructing the graph data, and inputting the graph data into the trained attention-mechanism-fused multi-flow graph neural network model to obtain a predicted value of the human body posture.
2. The human body posture prediction method based on the attention-mechanism-fused multi-flow graph neural network according to claim 1, wherein the step S1 of acquiring a three-dimensional position data sequence of human body key joints for training, dividing the three-dimensional position data sequence into an input sequence and an output sequence according to preset input-sequence and output-sequence lengths, and constructing graph data according to the input sequence specifically comprises the following steps:
step S11: acquiring a data sequence of size $f \times n \times 3$ of three-dimensional positions of human body key joints for training, wherein $f$ represents the number of frames of the data sequence and $n$ represents the number of joints;
step S12: taking the first $t$ frames of the data sequence as the input sequence and the remaining $(f-t)$ frames as the output sequence; wherein the input sequence is of size $t \times n \times 3$ and the output sequence is of size $(f-t) \times n \times 3$;
step S13: constructing a node matrix $V$ and adjacency matrices $A$ from the input sequence of size $t \times n \times 3$, and thereby constructing a fully connected graph $G_{all}=(V,A_{all})$, a high-degree-of-freedom joint graph $G_{mobile}=(V,A_{mobile})$ and a low-degree-of-freedom joint graph $G_{stable}=(V,A_{stable})$; wherein $A_{all}$ is the fully connected adjacency matrix representing the relationships among all joints, $A_{mobile}$ is the high-degree-of-freedom joint adjacency matrix representing the relationships among high-degree-of-freedom joints, and $A_{stable}$ is the low-degree-of-freedom joint adjacency matrix representing the relationships among low-degree-of-freedom joints.
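By way of non-limiting illustration only, steps S11 to S13 might be sketched as follows using plain Python lists. The helper names are hypothetical. Which joints are treated as high or low degree of freedom, and the choice to link every pair of joints inside a group, are assumptions of this sketch and are not specified by the claim. The node matrix is laid out as one row per joint with $t \times 3$ columns, which matches the stated weight size $(t \times 3) \times D$.

```python
def split_sequence(sequence, t):
    """Split an f x n x 3 sequence into the first t frames and the other f - t."""
    if not 0 < t < len(sequence):
        raise ValueError("t must satisfy 0 < t < f")
    return sequence[:t], sequence[t:]


def build_node_matrix(input_seq):
    """Flatten a t x n x 3 input into an n x (t * 3) node matrix V."""
    t, n = len(input_seq), len(input_seq[0])
    return [
        [input_seq[frame][joint][axis] for frame in range(t) for axis in range(3)]
        for joint in range(n)
    ]


def build_adjacency(n, joints):
    """Dense n x n matrix linking every pair of joints in `joints`.

    Assumption: joints in the same group are fully linked (self-loops included).
    """
    group = set(joints)
    return [
        [1.0 if i in group and j in group else 0.0 for j in range(n)]
        for i in range(n)
    ]


if __name__ == "__main__":
    f, n, t = 5, 4, 3
    sequence = [
        [[float(frame), float(joint), 0.0] for joint in range(n)]
        for frame in range(f)
    ]
    inputs, targets = split_sequence(sequence, t)
    V = build_node_matrix(inputs)
    mobile, stable = [2, 3], [0, 1]  # hypothetical joint index groups
    A_all = build_adjacency(n, range(n))
    A_mobile = build_adjacency(n, mobile)
    A_stable = build_adjacency(n, stable)
    print(len(V), len(V[0]), len(targets))
```

All three graphs share the same node matrix and differ only in their adjacency matrix, mirroring $G_{all}$, $G_{mobile}$ and $G_{stable}$.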
3. The human body posture prediction method based on the attention-mechanism-fused multi-flow graph neural network according to claim 1, wherein the step S2 of constructing an attention-mechanism-fused multi-flow graph neural network model and inputting the graph data into the attention-mechanism-fused multi-flow graph neural network model for training to obtain the trained attention-mechanism-fused multi-flow graph neural network model specifically comprises the following steps:
step S21: the attention-mechanism-fused multi-flow graph neural network adopts an encoder-decoder structure, wherein the encoder comprises a plurality of encoding modules; inputting the graph data into the encoder for feature extraction, wherein the output of an encoding module is given by the following formula (4):
$Out_i = \mathrm{Att}_i\left(\mathrm{GCN}_{all1}(G_{all}),\ \mathrm{GCN}_{stable1}(G_{stable}),\ \mathrm{GCN}_{mobile1}(G_{mobile})\right)$ (4)
wherein the output of the $i$-th encoding module, $Out_i=(G_{all1},G_{stable1},G_{mobile1})$, contains three graph data; $\mathrm{GCN}(\cdot)$ represents a single graph neural network layer; $\mathrm{Att}_i(\cdot)$ represents the $i$-th attention module;
step S22: the input of the single graph neural network layer is $G=(V,A)$ and the output is $X$, as expressed by the following formula (5):
$X = \mathrm{ReLU}(AVW + VU)$ (5)
wherein $W$ and $U$ are trainable weight matrices of size $(t \times 3) \times D$, $D$ being the desired feature dimension of the output of the graph neural network layer;
step S23: the input of the attention module is $G_1=(V_1,A_{all})$, $G_2=(V_2,A_{stable})$, $G_3=(V_3,A_{mobile})$, the output is $X'=(G_{out1},G_{out2},G_{out3})$, and the relationship between them is expressed by the following formulas (6) to (8):
Figure FDA0003071167200000021
$V_{mid4} = \mathrm{GCN}_2\left(\mathrm{CONCAT}(V_{mid1}, V_{mid2}, V_{mid3})\right)$ (7)
Figure FDA0003071167200000022
wherein $\mathrm{CONCAT}(\cdot)$ represents a matrix concatenation operation along the feature dimension; $\mathrm{SoftMax}(\cdot)$ denotes the normalized exponential function; $W_{1,1}, W_{1,2}, \ldots, W_{3,2}$ represent trainable parameters of different linear layers; $V_{mid1}$, $V_{mid2}$, $V_{mid3}$ and $V_{mid4}$ are intermediate variables;
step S24: the output hidden variable H of the encoder is represented by the following formula (9):
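$H = \lambda_1 V_{allf} + \lambda_2 V_{stablef} + \lambda_3 V_{mobilef}$ (9)
wherein the output of the last encoding module is $Out_f=(G_{allf},G_{stablef},G_{mobilef})$, with $G_{allf}=(V_{allf},A_{all})$, $G_{stablef}=(V_{stablef},A_{stable})$ and $G_{mobilef}=(V_{mobilef},A_{mobile})$; $\lambda_1$, $\lambda_2$ and $\lambda_3$ are configurable parameters;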
step S25: inputting the hidden variable $H$ into the decoder, and predicting the human body posture by using the decoder, as expressed by the following formula (10):
$Out_{T+1} = Out_T + f_{pred}\left(\mathrm{GRU}(Out_T, H_T)\right)$ (10)
wherein $Out_T$ is the human body posture at time $T$; $f_{pred}(\cdot)$ represents a multi-layer perceptron; $\mathrm{GRU}(\cdot)$ represents a graph gated recurrent network; and $H_T$ is the hidden variable at time $T$;
step S26: the parameters are trained using a gradient descent method, and a loss function is set as the following formula:
Figure FDA0003071167200000031
wherein $Output$ is the $(f-t)$-frame sequence output by the decoder, of size $(f-t) \times n \times 3$, and $Output_{gt}$ is the desired output.
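By way of non-limiting illustration only, the single graph neural network layer of formula (5) and the weighted combination of formula (9) might be sketched as follows using plain Python lists. The function names are hypothetical, and reading formula (9) as a weighted sum of the three node matrices is an assumption of this sketch. The attention module of formulas (6) and (8) and the loss function are not reproduced here.

```python
def matmul(a, b):
    """Multiply two dense matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def gcn_layer(A, V, W, U):
    """Single graph layer X = ReLU(A V W + V U), as in formula (5).

    A: n x n adjacency, V: n x (t*3) node matrix, W and U: (t*3) x D weights.
    """
    neighbour_term = matmul(matmul(A, V), W)  # features aggregated over linked joints
    self_term = matmul(V, U)                  # each joint's own features
    return [
        [max(0.0, p + q) for p, q in zip(row_n, row_s)]
        for row_n, row_s in zip(neighbour_term, self_term)
    ]


def fuse(v_all, v_stable, v_mobile, lambdas):
    """Weighted sum of the three final node matrices (assumed form of formula (9))."""
    l1, l2, l3 = lambdas
    return [
        [l1 * a + l2 * s + l3 * m for a, s, m in zip(row_a, row_s, row_m)]
        for row_a, row_s, row_m in zip(v_all, v_stable, v_mobile)
    ]


if __name__ == "__main__":
    A = [[1.0, 1.0], [0.0, 1.0]]
    V = [[1.0, 2.0], [3.0, -4.0]]
    W = [[1.0], [0.0]]
    U = [[0.0], [1.0]]
    X = gcn_layer(A, V, W, U)
    print(X)
    print(fuse(X, X, X, (0.5, 0.25, 0.25)))
```

The term $VU$ lets each joint keep a contribution from its own features regardless of how the adjacency matrix is filled, while $AVW$ mixes in features from linked joints.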
4. The human body posture prediction method based on the attention-mechanism-fused multi-flow graph neural network according to claim 1, wherein the step S3 of acquiring a three-dimensional position data sequence of human body key joints for prediction, constructing the graph data, and inputting the graph data into the trained attention-mechanism-fused multi-flow graph neural network model to obtain a predicted value of the human body posture specifically comprises the following steps:
step S31: obtaining a three-dimensional position data sequence of human body key joints for prediction according to step S1, and constructing the graph data;
step S32: inputting the graph data into the attention-mechanism-fused multi-flow graph neural network model trained in step S2, and performing feature extraction by the encoder to obtain the hidden variable H; and inputting H into the decoder to output the predicted value of the human body posture.
5. A human body posture prediction system based on an attention-mechanism-fused multi-flow graph neural network, characterized by comprising the following modules:
a training data acquisition module, configured to acquire a three-dimensional position data sequence of human body key joints for training, and divide the three-dimensional position data sequence into an input sequence and an output sequence according to preset input-sequence and output-sequence lengths; and to construct graph data according to the input sequence;
a model training module, configured to construct an attention-mechanism-fused multi-flow graph neural network model; and to input the graph data into the attention-mechanism-fused multi-flow graph neural network model for training to obtain the trained attention-mechanism-fused multi-flow graph neural network model;
a human body posture prediction module, configured to acquire a three-dimensional position data sequence of human body key joints for prediction, construct the graph data, and input the graph data into the trained attention-mechanism-fused multi-flow graph neural network model to obtain a predicted value of the human body posture.
CN202110539624.0A 2021-05-18 2021-05-18 Human body posture prediction method and system based on attention mechanism fusion multi-flow diagram Active CN113642379B (en)


Publications (2)

Publication Number Publication Date
CN113642379A true CN113642379A (en) 2021-11-12
CN113642379B CN113642379B (en) 2024-03-01

Family

ID=78415817

Country Status (1)

Country Link
CN (1) CN113642379B (en)

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2020258611A1 (en) * 2019-06-28 2020-12-30 山东科技大学 Lymph node ct detection system employing recurrent spatio-temporal attention mechanism
CN112183862A (en) * 2020-09-29 2021-01-05 长春理工大学 Traffic flow prediction method and system for urban road network

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
张志扬;张凤荔;陈学勤;王瑞锦;: "基于分层注意力的信息级联预测模型", 计算机科学, no. 06 *

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114419105A (en) * 2022-03-14 2022-04-29 深圳市海清视讯科技有限公司 Multi-target pedestrian trajectory prediction model training method, prediction method and device
CN114926860A (en) * 2022-05-12 2022-08-19 哈尔滨工业大学 Three-dimensional human body attitude estimation method based on millimeter wave radar
CN114943324A (en) * 2022-05-26 2022-08-26 中国科学院深圳先进技术研究院 Neural network training method, human motion recognition method and device, and storage medium
CN114943324B (en) * 2022-05-26 2023-10-13 中国科学院深圳先进技术研究院 Neural network training method, human motion recognition method and device, and storage medium
CN115407874A (en) * 2022-08-18 2022-11-29 中国兵器工业标准化研究所 Neural network-based VR maintenance training operation proficiency prediction method
CN115862338A (en) * 2023-03-01 2023-03-28 天津大学 Airport traffic flow prediction method, system, electronic device and medium

Also Published As

Publication number Publication date
CN113642379B (en) 2024-03-01

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant