CN113642379A - Human body posture prediction method and system based on attention mechanism fusion multi-flow graph - Google Patents
- Publication number
- CN113642379A (application CN202110539624.0A)
- Authority
- CN
- China
- Prior art keywords
- sequence
- neural network
- human body
- attention
- graph
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/044—Recurrent networks, e.g. Hopfield networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
Abstract
The invention relates to a human body posture prediction method and system based on an attention-mechanism-fused multi-flow graph neural network. The method comprises the following steps. S1: acquiring a three-dimensional position data sequence of human body key joints for training, dividing it into an input sequence and an output sequence according to preset input and output sequence lengths, and constructing graph data from the input sequence. S2: constructing an attention-mechanism-fused multi-flow graph neural network model, and inputting the graph data into the model for training to obtain a trained model. S3: acquiring a three-dimensional position data sequence of human body key joints for prediction, constructing graph data, and inputting the graph data into the trained attention-mechanism-fused multi-flow graph neural network model to obtain a predicted value of the human body posture. The method constructs a plurality of graph models from the position data and structural characteristics of the human joints, thereby modeling the human motion system and predicting the human posture with higher accuracy.
Description
Technical Field
The invention relates to the field of computer vision and image processing, and in particular to a human body posture prediction method and system based on an attention-mechanism-fused multi-flow graph neural network.
Background
In recent years, the widespread use of low-cost consumer-grade depth cameras has made low-cost, real-time acquisition and integration of three-dimensional human motion poses possible. Human posture prediction has consequently become a hot topic at the intersection of graphics and computer vision, with broad application prospects and rich application scenarios in robotics, medical care, and autonomous driving.
In robotics, exoskeleton robot technology is a research hot spot with important applications in aerospace. It is regarded as key support equipment for on-orbit construction and maintenance and for lunar and Martian surface activities, and it plays an extremely important role in completing extravehicular tasks and handling emergency faults. By identifying and analyzing the regularities of human motion, a human body posture prediction algorithm judges and predicts the astronaut's movement intention, assists the exoskeleton robot in achieving force compliance control, and reduces abrupt force changes in exoskeleton control, thereby effectively improving how well the space suit adapts to the astronaut's movements.
Motion capture technology has existed for some fifty years. It is widely used in entertainment industries such as film, television, and games, and it helps patients with dyskinesia in the medical and health industries. However, it is still not possible to generate realistic motion data that conforms to the structural characteristics and balance of the human body without relying on real motion capture data. Moreover, mainstream motion capture technology is still based on inertial sensors or complex optical sensing systems; it is costly and requires a dedicated venue, so it remains far from reaching ordinary households and from benefiting patients who depend on assistive devices for daily movement. Applying a human body posture prediction algorithm to motion generation can greatly improve the utilization of motion capture data and reduce the post-processing workload. With increasingly popular time-of-flight cameras, human posture prediction algorithms can also be applied more easily to human-computer interaction; for example, an algorithm can guide a subject to perform correct actions based on the subject's captured movements, thereby assisting the rehabilitation of patients with limb cognitive deficits caused by neurodegenerative diseases.
Human posture prediction therefore has important application value. However, human motion has more than 200 degrees of freedom, and the degrees of freedom of the individual joints are strongly correlated. Traditional methods such as inverse kinematics and dynamics have great difficulty simulating human motion and cannot establish an accurate human motion model; the regularities of human motion are highly complex, and the balance and coordination mechanism in particular is hard to simulate. This makes human action prediction based on conventional statistical learning methods very difficult.
In recent years, with the development of computer hardware and machine learning algorithms, human posture prediction methods based on deep learning have been proposed. In such methods, a researcher only needs to construct a deep neural network model, without explicitly modeling complex kinematic constraints and parameters or having prior knowledge of different behaviors. By training on a large amount of motion data, the model learns latent motion characteristics of human motion and uses them to predict a person's subsequent motion trajectory. Early deep learning models were mainly based on recurrent neural networks and convolutional neural networks, both of which have drawbacks. Recurrent neural networks emphasize the temporal relations of the sequence and ignore spatial information. Work based on convolutional neural networks flattens the skeleton data of a single frame into a one-dimensional vector and treats the sequence as a two-dimensional matrix; it focuses on how the position of a single joint changes over time and ignores the correlations among the joints of the human body, so the topological structure information of the human body cannot be fully utilized.
The graph neural network, a deep learning model that has attracted attention in recent years, is a neural network designed specifically for processing graph data, and it has been applied in research fields such as recommendation systems, reasoning and proving, chemistry, traffic, and brain-inspired intelligence. Recent work proposes methods based on graph neural networks that represent the three-dimensional human skeleton as a graph and can make good use of the prior structural information of the skeleton. However, existing methods based on graph neural networks simply construct an adjacency matrix from the spatial adjacency of the human joints and do not consider how human motion is generated, so the connections among the joints during motion cannot be fully expressed. Human motion is accomplished by the cooperation of many joints that are related to one another along the kinematic chain, rather than by only a few adjacent joints. Ergonomics-based studies have also shown that the human kinematic chain usually comprises three-degree-of-freedom joints and two-degree-of-freedom joints, and that these two kinds of joints are often distributed alternately along the chain. Joints with different degrees of freedom in the kinematic chain have different force characteristics during movement. Therefore, constructing the adjacency matrix only from the spatial adjacency of the human joints does not sufficiently utilize the prior structural and kinematic information of the joints.
Therefore, how to fully utilize the prior structural and kinematic information of the human joints has become an urgent problem to be solved.
Disclosure of Invention
In order to solve the above technical problems, the invention provides a human body posture prediction method and system based on an attention-mechanism-fused multi-flow graph neural network.
The technical solution of the invention is as follows: a human body posture prediction method based on attention mechanism fusion multi-flow graph neural network comprises the following steps:
step S1: acquiring a three-dimensional position data sequence of human body key joints for training, and dividing it into an input sequence and an output sequence according to preset input and output sequence lengths; constructing graph data from the input sequence;
step S2: constructing an attention-mechanism-fused multi-flow graph neural network model; inputting the graph data into the model for training to obtain a trained attention-mechanism-fused multi-flow graph neural network model;
step S3: acquiring a three-dimensional position data sequence of human body key joints for prediction, constructing graph data, and inputting the graph data into the trained attention-mechanism-fused multi-flow graph neural network model to obtain a predicted value of the human body posture.
Compared with the prior art, the invention has the following advantages:
1. The human posture prediction method based on a multi-flow graph neural network provided by the invention constructs a plurality of graph models from human joint position data and structural characteristics, thereby modeling the human motion system and predicting the human posture with higher accuracy. Joint movement characteristics are extracted by the multi-flow graph neural network, and global information is fused by the attention model.
2. The invention overcomes the defect that existing human posture prediction methods have larger errors on vigorous motion and long-term motion, and it obtains better experimental results. The network structure is simple and runs in real time, so the human body posture can be predicted in real time, which is beneficial to practical application. Compared with other existing human body posture prediction methods, the method provided by the invention performs better on medium- and long-term motion prediction (500-1000 milliseconds) and on vigorous motion.
Drawings
FIG. 1 is a flowchart of the human body posture prediction method based on an attention-mechanism-fused multi-flow graph neural network according to an embodiment of the present invention;
FIG. 2 is a flowchart of step S1 of the method in an embodiment of the present invention: acquiring a three-dimensional position data sequence of human body key joints for training, dividing it into an input sequence and an output sequence according to preset input and output sequence lengths, and constructing graph data from the input sequence;
FIG. 3 is a schematic structural diagram of the attention-mechanism-fused multi-flow graph neural network model in an embodiment of the present invention;
FIG. 4 is a flowchart of step S2 of the method in an embodiment of the present invention: constructing the attention-mechanism-fused multi-flow graph neural network model, and inputting the graph data into the model for training to obtain a trained model;
FIG. 5 is a schematic structural diagram of the attention module in the attention-mechanism-fused multi-flow graph neural network according to an embodiment of the present invention;
FIG. 6 is a flowchart of step S3 of the method in an embodiment of the present invention: acquiring a three-dimensional position data sequence of human body key joints for prediction, constructing graph data, and inputting the graph data into the trained model to obtain a predicted value of the human body posture;
FIG. 7 is a block diagram of the human body posture prediction system based on an attention-mechanism-fused multi-flow graph neural network.
Detailed Description
To address the problem that the prior art cannot fully utilize the prior structural and kinematic information of the human joints, the invention provides a human body posture prediction method based on an attention-mechanism-fused multi-flow graph neural network model, which models the human motion system and predicts the human posture with higher accuracy.
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is described in further detail below with reference to the accompanying drawings.
Example one
As shown in FIG. 1, the human body posture prediction method based on an attention-mechanism-fused multi-flow graph neural network provided in an embodiment of the present invention includes the following steps:
step S1: acquiring a three-dimensional position data sequence of human body key joints for training, and dividing it into an input sequence and an output sequence according to preset input and output sequence lengths; constructing graph data from the input sequence;
step S2: constructing an attention-mechanism-fused multi-flow graph neural network model; inputting the graph data into the model for training to obtain a trained attention-mechanism-fused multi-flow graph neural network model;
step S3: acquiring a three-dimensional position data sequence of human body key joints for prediction, constructing graph data, and inputting the graph data into the trained attention-mechanism-fused multi-flow graph neural network model to obtain a predicted value of the human body posture.
As shown in FIG. 2, step S1 above (acquiring a three-dimensional position data sequence of human body key joints for training, dividing it into an input sequence and an output sequence according to preset input and output sequence lengths, and constructing graph data from the input sequence) specifically comprises the following steps:
Step S11: acquiring a data sequence of size f × n × 3 of the three-dimensional positions of human body key joints for training, wherein f represents the number of frames in the data sequence and n represents the number of joints;
The training data sequence used in the embodiment of the present invention may be a data sequence from a public data set (such as the Human 3.6M data set or the CMU Mocap data set), or a human motion data sequence acquired by an RGB-D camera or another device. The acquired key joints or key points of the human body generally comprise no fewer than 20 positions, such as the head, the neck, the chest, the waist, the left and right shoulder joints, the left and right elbow joints, the left and right wrist joints, the left and right hands, the left and right hip joints, the left and right knee joints, the left and right ankle joints, and the left and right feet. A three-dimensional position data sequence may be represented as an f × n × 3 matrix, where f represents the number of frames in the sequence, n represents the number of joints included in the data, and the three-dimensional coordinates of a joint in a frame in the world coordinate system are denoted $(X_W, Y_W, Z_W)$. For joints that are not captured during acquisition because of occlusion or other reasons, the three-dimensional coordinates are recorded as (0, 0, 0), so that the size of the matrix stays fixed and subsequent processing is convenient.
Step S12: taking the first t frames of the data sequence as the input sequence, and taking the last (f-t) frames of the data sequence as the output sequence; wherein the input sequence has size t × n × 3 and the output sequence has size (f-t) × n × 3;
The three-dimensional position data sequence is divided into an input sequence and an output sequence according to the preset lengths of the input sequence and the output sequence. For example, for a training three-dimensional position data sequence of size f × n × 3, the first t frames are used as input, generating a t × n × 3 input sequence; the last (f-t) frames are taken as the desired output, generating an (f-t) × n × 3 output sequence.
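The split described in steps S11 and S12 can be sketched as follows. This is a minimal illustration rather than the patented implementation: the function name `split_sequence` and the sizes f = 35, n = 25, t = 10 are assumptions chosen only for the example.

```python
import numpy as np

def split_sequence(seq, t):
    """Split an (f, n, 3) joint-position sequence into input and target parts."""
    seq = np.asarray(seq, dtype=float)
    if seq.ndim != 3 or seq.shape[2] != 3:
        raise ValueError("expected an array of shape (f, n, 3)")
    f = seq.shape[0]
    if not 0 < t < f:
        raise ValueError("t must satisfy 0 < t < f")
    # First t frames are the input sequence, the remaining f - t frames the output.
    return seq[:t], seq[t:]

# Illustrative sizes only: f = 35 frames, n = 25 joints, t = 10 input frames.
sequence = np.random.default_rng(0).normal(size=(35, 25, 3))
inputs, targets = split_sequence(sequence, t=10)
print(inputs.shape, targets.shape)  # (10, 25, 3) (25, 25, 3)
```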
Step S13: constructing a node matrix V and adjacency matrices A from the t × n × 3 input sequence, and thereby constructing a fully connected graph $G_{all} = (V, A_{all})$, a high-degree-of-freedom joint graph $G_{mobile} = (V, A_{mobile})$, and a low-degree-of-freedom joint graph $G_{stable} = (V, A_{stable})$; wherein $A_{all}$ is the fully connected adjacency matrix representing the relations among all joints, $A_{mobile}$ is the high-degree-of-freedom joint adjacency matrix representing the linkage among high-degree-of-freedom joints, and $A_{stable}$ is the low-degree-of-freedom joint adjacency matrix representing the linkage among low-degree-of-freedom joints.
The t × n × 3 input sequence obtained in step S12 is rearranged by joint into an n × (t × 3) matrix, which forms the matrix V. Each joint is regarded as a node of the graph data, so the feature of each node can be represented by a t × 3 matrix. The degree of freedom of a human joint indicates the number of directions in which the joint can move; for example, the degree of freedom of a ball joint, a condyloid joint, or the like is 3, and the degree of freedom of a saddle joint, a hinge joint, or the like is 2. These joints have different kinematic and dynamic properties. According to the degrees of freedom of the human joints, three different adjacency matrices can be constructed: a fully connected adjacency matrix $A_{all}$ representing the relations among all joints, a high-degree-of-freedom joint adjacency matrix $A_{mobile}$ representing the relations among the high-degree-of-freedom joints, and a low-degree-of-freedom joint adjacency matrix $A_{stable}$ representing the relations among the low-degree-of-freedom joints. An adjacency matrix represents the adjacency relations between nodes in graph data; in the embodiment of the invention the adjacency matrices are constructed as follows:
For the i-th of the n joints, if its degree of freedom is 3, that is, the joint can rotate in three-dimensional space in yaw, pitch, and roll, the joint is regarded as a high-degree-of-freedom joint and $hof_i = 1$; otherwise $hof_i = 0$, where hof stands for high degree of freedom.
Thus, for the fully connected adjacency matrix $A_{all}$ of size n × n, each element $A_{all}(i,j)$ is defined by the following formula (1):
$A_{all}(i,j) = 1$ (1)
For the high-degree-of-freedom joint adjacency matrix $A_{mobile}$ of size n × n, each element $A_{mobile}(i,j)$ is defined by formula (2):
[Formula (2) is not reproduced in this text.]
For the low-degree-of-freedom joint adjacency matrix $A_{stable}$ of size n × n, each element $A_{stable}(i,j)$ is defined by formula (3):
[Formula (3) is not reproduced in this text.]
In the three groups of graph data, the number of nodes and the features of each node are the same; the groups differ only in their adjacency matrices, which are $A_{all}$, $A_{mobile}$, and $A_{stable}$ respectively. With the input sequence, rearranged to n × (t × 3), recorded as the node matrix V, the three constructed graphs are the fully connected graph $G_{all} = (V, A_{all})$, the high-degree-of-freedom joint graph $G_{mobile} = (V, A_{mobile})$, and the low-degree-of-freedom joint graph $G_{stable} = (V, A_{stable})$.
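The construction of V and the three adjacency matrices in step S13 can be sketched as below. Formulas (2) and (3) are not available in this text, so the rules used here are an assumption inferred from the prose: $A_{mobile}(i,j) = 1$ when both joints are high-degree-of-freedom and $A_{stable}(i,j) = 1$ when both are low-degree-of-freedom, and 0 otherwise. The function names and the four-joint `hof` vector are hypothetical.

```python
import numpy as np

def build_node_matrix(inputs):
    """Rearrange a (t, n, 3) input sequence into an n x (t*3) node matrix V."""
    t, n, _ = inputs.shape
    # Put the joint axis first, then flatten each joint's t frames of xyz.
    return inputs.transpose(1, 0, 2).reshape(n, t * 3)

def build_adjacency(hof):
    """Return (A_all, A_mobile, A_stable) from a 0/1 hof vector of length n."""
    hof = np.asarray(hof, dtype=float)
    n = hof.shape[0]
    a_all = np.ones((n, n))                    # formula (1): every entry is 1
    a_mobile = np.outer(hof, hof)              # ASSUMED rule: both joints high-DoF
    a_stable = np.outer(1.0 - hof, 1.0 - hof)  # ASSUMED rule: both joints low-DoF
    return a_all, a_mobile, a_stable

# Made-up example: t = 2 frames, n = 4 joints, joints 0 and 2 high-DoF.
t, n = 2, 4
inputs = np.arange(t * n * 3, dtype=float).reshape(t, n, 3)
V = build_node_matrix(inputs)
A_all, A_mobile, A_stable = build_adjacency([1, 0, 1, 0])
print(V.shape)  # (4, 6)
```

Under these assumptions the three graphs share the same V and differ only in which joint pairs exchange information.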
As shown in FIG. 3, the attention-mechanism-fused multi-flow graph neural network in the embodiment of the present invention adopts an encoder-decoder structure. The graph data constructed in step S1 is input into the encoder for feature extraction, which produces a hidden variable H; the hidden variable H is input into the decoder, and the output of the decoder is the predicted value of the human posture. The average error between the predicted value and the true value is taken as the loss function, and the parameters of the model are trained by gradient descent. The specific training steps are as follows:
As shown in FIG. 4, in one embodiment, step S2 above (constructing the attention-mechanism-fused multi-flow graph neural network model, and inputting the graph data into the model for training to obtain the trained model) specifically comprises the following steps:
Step S21: the attention-mechanism-fused multi-flow graph neural network adopts an encoder-decoder structure, wherein the encoder comprises a plurality of encoding modules; the graph data is input into the encoder for feature extraction, and the output of an encoding module is given by the following formula (4):
$Out_i = Att_i(GCN_{all,i}(G_{all}), GCN_{stable,i}(G_{stable}), GCN_{mobile,i}(G_{mobile}))$ (4)
wherein the encoding modules are connected in series, and the output of the i-th encoding module, $Out_i = (G_{all,i}, G_{stable,i}, G_{mobile,i})$, contains three graphs; $GCN(\cdot)$ denotes a single graph neural network layer; $Att_i(\cdot)$ denotes the i-th attention module;
Step S22: for a single graph neural network layer with input $G = (V, A)$ and output X, the computation is given by the following formula (5):
$X = \mathrm{ReLU}(AVW + VU)$ (5)
wherein W and U are trainable weight matrices of size (t × 3) × D, and D is the desired feature dimension of the output of the graph neural network layer;
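Formula (5) maps directly onto a few matrix products. The sketch below assumes nothing beyond the formula itself; the function name `gcn_layer` and the tiny two-node example are illustrative only.

```python
import numpy as np

def gcn_layer(a, v, w, u):
    """Single graph layer X = ReLU(A V W + V U), as in formula (5).

    a: (n, n) adjacency, v: (n, t*3) node features,
    w, u: (t*3, D) trainable weight matrices.
    """
    return np.maximum(a @ v @ w + v @ u, 0.0)

# Two nodes with one feature each, fully connected graph, unit weights.
A = np.ones((2, 2))
V = np.array([[3.0], [-2.0]])
W = np.array([[1.0]])
U = np.array([[1.0]])
X = gcn_layer(A, V, W, U)
print(X)  # [[4.] [0.]]
```

The term `A V W` aggregates features from neighbouring joints, while `V U` keeps each joint's own features, so the layer still carries information for a joint whose row in A is all zeros.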
Step S23: the input to the attention module is $G_1 = (V_1, A_{all})$, $G_2 = (V_2, A_{stable})$, $G_3 = (V_3, A_{mobile})$, and the output is $X' = (G_{out1}, G_{out2}, G_{out3})$. The structure of the attention module is shown in FIG. 5 and is described by formulas (6) to (8):
[Formula (6) is not reproduced in this text.]
$V_{mid4} = GCN_2(\mathrm{CONCAT}(V_{mid1}, V_{mid2}, V_{mid3}))$ (7)
[Formula (8) is not reproduced in this text.]
wherein $\mathrm{CONCAT}(\cdot)$ denotes matrix concatenation along the feature dimension; $\mathrm{SoftMax}(\cdot)$ denotes the normalized exponential function; $W_{1,1}, W_{1,2}, \ldots, W_{3,2}$ denote the trainable parameters of different linear layers; and $V_{mid1}$, $V_{mid2}$, $V_{mid3}$, $V_{mid4}$ are intermediate variables.
The above is one of a plurality of serially connected encoding modules in an encoder, and the modules have the same model structure, but have independent parameters and can be independently trained.
Step S24: the output hidden variable H of the encoder is given by the following formula (9):
$H = \lambda_1 V_{all,f} + \lambda_2 V_{stable,f} + \lambda_3 V_{mobile,f}$ (9)
wherein the output of the last encoding module is $Out_f = (G_{all,f}, G_{stable,f}, G_{mobile,f})$, with $G_{all,f} = (V_{all,f}, A_{all})$, $G_{stable,f} = (V_{stable,f}, A_{stable})$, and $G_{mobile,f} = (V_{mobile,f}, A_{mobile})$; $\lambda_1$, $\lambda_2$, $\lambda_3$ are configurable parameters.
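Formula (9) is a weighted sum of the three node matrices produced by the last encoding module. A minimal sketch follows; the function name `fuse_streams` and the default weights of 1/3 each are assumptions, since the text only says the weights are configurable.

```python
import numpy as np

def fuse_streams(v_all, v_stable, v_mobile, lambdas=(1 / 3, 1 / 3, 1 / 3)):
    """Hidden variable H = l1*V_all + l2*V_stable + l3*V_mobile, formula (9)."""
    l1, l2, l3 = lambdas
    return l1 * v_all + l2 * v_stable + l3 * v_mobile

# Illustrative n = 4 joints, d = 2 features; weights chosen arbitrarily.
n, d = 4, 2
H = fuse_streams(np.ones((n, d)), 2 * np.ones((n, d)), 3 * np.ones((n, d)),
                 lambdas=(0.5, 0.25, 0.25))
print(H[0, 0])  # 1.75
```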
The decoder in the attention-mechanism-fused multi-flow graph neural network uses a graph gated recurrent network to perform recursive prediction. The input of the decoder is the output hidden variable H of the encoder, which is an n × d matrix, where d is the feature dimension.
Step S25: the hidden variable H is input into the decoder, and the decoder predicts the human body posture as expressed by the following formula (10):
$Out_{T+1} = Out_T + f_{pred}(GRU(Out_T, H_T))$ (10)
wherein $Out_T$ is the posture of the human body at time T; $f_{pred}(\cdot)$ denotes a multi-layer perceptron; $GRU(\cdot)$ denotes the graph gated recurrent network; $H_T$ is the hidden variable at time T; and $Out_{T+1}$ is the predicted value of the human body posture at time T+1.
At the initial time T = 0, $Out_0$ is initialized to the last frame of the input sequence, i.e., the last n × 3 matrix in the t × n × 3 input sequence, and $H_0$ is initialized to the hidden variable H.
The operation of the graph gated recurrent network can be expressed by the following formulas (11) to (14):
$r_T = \sigma(r_{in}(Out_T) + r_{hid}(A_{all} H_T W_H))$ (11)
$u_T = \sigma(u_{in}(Out_T) + u_{hid}(A_{all} H_T W_H))$ (12)
$c_T = \tanh(c_{in}(Out_T) + r_T \odot c_{hid}(A_{all} H_T W_H))$ (13)
$H_{T+1} = u_T \odot H_T + (1 - u_T) \odot c_T$ (14)
wherein $r_{in}$, $r_{hid}$, $u_{in}$, $u_{hid}$, $c_{in}$, $c_{hid}$ denote trainable linear layers; $W_H$ denotes a trainable weight matrix; $A_{all}$ is the adjacency matrix of the fully connected graph; and $H_{T+1}$ is the output of the graph gated recurrent network, i.e., $H_{T+1} = GRU(Out_T, H_T)$.
After the hidden variable H is input, the recursive nature of the graph gated recurrent network lets the decoder take the predicted value at the previous time step as input, so it can recursively generate a predicted human posture sequence with any number of frames. Here it outputs (f-t) frames of data, denoted Output.
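The recursion of formulas (10) to (14) can be sketched in NumPy as follows. This is an illustration under stated simplifications, not the trained model: the linear layers are bias-free matrices, $f_{pred}$ is reduced to a single linear map instead of a multi-layer perceptron, parameters are random, and all names (`init_params`, `graph_gru_step`, `decode`) are hypothetical.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def init_params(d, rng):
    """Random parameters; key names mirror the symbols in formulas (10)-(14)."""
    p = {k: rng.normal(scale=0.1, size=(3, d)) for k in ("r_in", "u_in", "c_in")}
    p.update({k: rng.normal(scale=0.1, size=(d, d))
              for k in ("r_hid", "u_hid", "c_hid", "W_H")})
    p["pred"] = rng.normal(scale=0.1, size=(d, 3))  # linear stand-in for f_pred
    return p

def graph_gru_step(out_t, h_t, a_all, p):
    """One step of formulas (11)-(14); returns H_{T+1}."""
    msg = a_all @ h_t @ p["W_H"]                             # A_all H_T W_H
    r = sigmoid(out_t @ p["r_in"] + msg @ p["r_hid"])        # (11)
    u = sigmoid(out_t @ p["u_in"] + msg @ p["u_hid"])        # (12)
    c = np.tanh(out_t @ p["c_in"] + r * (msg @ p["c_hid"]))  # (13)
    return u * h_t + (1.0 - u) * c                           # (14)

def decode(last_frame, h0, a_all, p, steps):
    """Formula (10): Out_{T+1} = Out_T + f_pred(GRU(Out_T, H_T)), applied recursively."""
    out, h, frames = last_frame, h0, []
    for _ in range(steps):
        h = graph_gru_step(out, h, a_all, p)
        out = out + h @ p["pred"]  # residual update of the pose
        frames.append(out)
    return np.stack(frames)

rng = np.random.default_rng(0)
n, d = 5, 8  # illustrative sizes
params = init_params(d, rng)
last_frame = rng.normal(size=(n, 3))
h0 = rng.normal(size=(n, d))
pred = decode(last_frame, h0, np.ones((n, n)), params, steps=4)
print(pred.shape)  # (4, 5, 3)
```

Because formula (10) adds the network output to the previous pose, the decoder effectively predicts per-frame displacements, and a prediction head that outputs zeros simply repeats the last observed frame.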
Step S26: the parameters are trained by gradient descent, with the loss function set as formula (15):
[Formula (15) is not reproduced in this text.]
wherein Output is the output sequence of the decoder, comprising (f-t) frames of data with size (f-t) × n × 3, and $Output_{gt}$ is the desired output.
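Since formula (15) is not available here, the exact loss cannot be restated. The text describes it only as the average error between the predicted and true values, so the sketch below assumes one common choice, the mean Euclidean distance per joint per frame; the function name `mean_joint_error` and this specific norm are assumptions.

```python
import numpy as np

def mean_joint_error(output, output_gt):
    """ASSUMED loss: mean Euclidean distance between predicted and true joints.

    output, output_gt: arrays of shape (f - t, n, 3).
    """
    return np.linalg.norm(output - output_gt, axis=-1).mean()

# Every joint is off by the vector (3, 4, 0), i.e. by a distance of 5.
output = np.zeros((2, 3, 3))
output_gt = np.tile(np.array([3.0, 4.0, 0.0]), (2, 3, 1))
loss = mean_joint_error(output, output_gt)
print(loss)  # 5.0
```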
As shown in Fig. 6, in one embodiment, step S3 (acquiring a three-dimensional position data sequence of human key joints for prediction, constructing graph data, and inputting the graph data into the trained attention mechanism fusion multi-flow graph neural network model to obtain a predicted value of the human body posture) specifically comprises the following steps:
step S31: obtaining a three-dimensional position data sequence of human key joints for prediction according to step S1, and constructing the graph data;
In the same manner as the three-dimensional position data sequence of the human key joints for training is acquired in step S1, the three-dimensional position data sequence of the human key joints is first acquired and expressed as a t × n × 3 matrix; it is then rearranged by joint into an n × (t × 3) matrix, denoted V. The three constructed graphs are the fully connected graph $G_{all} = (V, A_{all})$, the high-degree-of-freedom joint graph $G_{mobile} = (V, A_{mobile})$, and the low-degree-of-freedom joint graph $G_{stable} = (V, A_{stable})$.
Step S32: inputting the graph data into the attention mechanism fusion multi-flow graph neural network model trained in step S2, where the encoder performs feature extraction and its result is the hidden variable H; H is then input into the decoder, which outputs the predicted value of the human body posture.
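The graph construction of step S31 can be sketched as below. The rearrangement from t × n × 3 to n × (t × 3) follows the text directly. How $A_{mobile}$ and $A_{stable}$ are filled is an assumption here (every pair of joints inside the given subset is connected), and the function name `build_graphs` and the joint index lists are hypothetical.

```python
import numpy as np

def build_graphs(input_seq, mobile_joints, stable_joints):
    """Return the node matrix V and three adjacency matrices for one input sequence."""
    t, n, _ = input_seq.shape
    # t x n x 3 -> n x (t*3): one row per joint, holding that joint's whole trajectory.
    V = input_seq.transpose(1, 0, 2).reshape(n, t * 3)

    A_all = np.ones((n, n))  # fully connected graph over all joints

    def subset_adjacency(joints):
        # Assumption: joints in the subset are mutually connected, others isolated.
        mask = np.zeros(n)
        mask[list(joints)] = 1.0
        return np.outer(mask, mask)

    return V, A_all, subset_adjacency(mobile_joints), subset_adjacency(stable_joints)

# Usage with illustrative sizes: t = 10 frames, n = 6 joints.
rng = np.random.default_rng(0)
seq = rng.normal(size=(10, 6, 3))
V, A_all, A_mobile, A_stable = build_graphs(seq, mobile_joints=[3, 4, 5], stable_joints=[0, 1, 2])
print(V.shape)  # (6, 30)
```

All three graphs share the same node matrix V and differ only in their adjacency matrix, so the three network streams see the same joint trajectories under different connectivity.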
As shown in Table 1 below, experiments verify that the embodiment of the present invention achieves high accuracy when predicting various actions on the public CMU Mocap dataset.
TABLE 1 mean joint angle error comparison of the method of the present invention to other methods on CMU Mocap dataset
The index represents the prediction accuracy ranking for the different methods, with prediction duration in milliseconds.
As shown in Table 2, experiments verify that the present method achieves the best average accuracy over all actions on the public Human 3.6M dataset.
TABLE 2 mean joint angle error of the method of the invention versus other methods on a Human 3.6M dataset
The index represents the prediction accuracy ranking for the different methods, with prediction duration in milliseconds.
The human body posture prediction method based on the multi-flow graph neural network provided by the invention constructs several graph models from human joint position data and structural characteristics, thereby modeling the human motion system and predicting the human body posture with higher accuracy. Joint motion features are extracted with the multi-flow graph neural network, and global information is fused with an attention model. The invention overcomes the large errors that existing human body posture prediction methods exhibit for vigorous motion and long-term motion, and obtains better experimental results. The network structure is simple and runs in real time, so the human body posture can be predicted in real time, which favors practical application. Compared with other existing human body posture prediction methods, the method performs better for medium- and long-term motion prediction of 500 to 1000 milliseconds and for vigorous motion.
Example two
As shown in Fig. 7, an embodiment of the present invention provides a human body posture prediction system based on an attention mechanism fusion multi-flow graph neural network, including the following modules:
a training data acquisition module 41, configured to acquire a three-dimensional position data sequence of a human body key joint for training, and divide the three-dimensional position data sequence into an input sequence and an output sequence according to lengths of a preset input sequence and a preset output sequence; constructing graph data according to the input sequence;
the model training module 42 is used for constructing an attention mechanism fusion multi-flow graph neural network model, and inputting the graph data into the attention mechanism fusion multi-flow graph neural network model for training to obtain a trained attention mechanism fusion multi-flow graph neural network model;
and the human body posture prediction module 43 is used for acquiring a three-dimensional position data sequence of human key joints for prediction, constructing graph data, and inputting the graph data into the trained attention mechanism fusion multi-flow graph neural network model to obtain a predicted value of the human body posture.
The above examples are provided only for the purpose of describing the present invention, and are not intended to limit the scope of the present invention. The scope of the invention is defined by the appended claims. Various equivalent substitutions and modifications can be made without departing from the spirit and principles of the invention, and are intended to be within the scope of the invention.
Claims (5)
1. A human body posture prediction method based on attention mechanism fusion multi-flow graph neural network is characterized by comprising the following steps:
step S1: acquiring a three-dimensional position data sequence of a human body key joint for training, and dividing the three-dimensional position data sequence into an input sequence and an output sequence according to the lengths of a preset input sequence and the preset output sequence; constructing graph data according to the input sequence;
step S2: constructing an attention mechanism fusion multi-flow graph neural network model; inputting the graph data into the attention mechanism fusion multi-flow graph neural network model for training to obtain the trained attention mechanism fusion multi-flow graph neural network model;
step S3: and acquiring a three-dimensional position data sequence of the human body key joint for prediction, constructing the graph data, and inputting the trained attention-based mechanism fusion multi-flow graph neural network model to obtain a predicted value of the human body posture.
2. The method for predicting the posture of the human body based on the attention mechanism fusion multi-flow graph neural network according to claim 1, wherein the step S1: acquiring a three-dimensional position data sequence of a human body key joint for training, and dividing the three-dimensional position data sequence into an input sequence and an output sequence according to the lengths of a preset input sequence and the preset output sequence; constructing graph data according to the input sequence, which specifically comprises the following steps:
step S11: acquiring a data sequence f × n × 3 of three-dimensional positions of human key joints for training, wherein f represents the number of frames of the data sequence, and n represents the number of joints;
step S12: taking the first t frames of the data sequence as the input sequence, and the remaining (f-t) frames as the output sequence; wherein the input sequence is represented as t × n × 3, and the output sequence is represented as (f-t) × n × 3;
step S13: constructing a node matrix V and adjacency matrices A from the input sequence t × n × 3, and thereby constructing a fully connected graph $G_{all} = (V, A_{all})$, a high-degree-of-freedom joint graph $G_{mobile} = (V, A_{mobile})$, and a low-degree-of-freedom joint graph $G_{stable} = (V, A_{stable})$; wherein $A_{all}$ is the fully connected adjacency matrix representing the relations among all joints, $A_{mobile}$ is the high-degree-of-freedom joint adjacency matrix representing the relations among high-degree-of-freedom joints, and $A_{stable}$ is the low-degree-of-freedom joint adjacency matrix representing the relations among low-degree-of-freedom joints.
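As an illustration of steps S11 and S12, the split of an f × n × 3 sequence into a t-frame input and an (f-t)-frame output can be sketched as follows. The function name `split_sequence` and the sizes used are hypothetical.

```python
import numpy as np

def split_sequence(seq, t):
    """Split an f x n x 3 sequence into the first t frames and the remaining f - t frames."""
    f = seq.shape[0]
    if not 0 < t < f:
        raise ValueError("t must satisfy 0 < t < f")
    return seq[:t], seq[t:]

# Usage: f = 25 frames, n = 5 joints, t = 10 input frames (illustrative values).
seq = np.zeros((25, 5, 3))
input_seq, output_seq = split_sequence(seq, t=10)
print(input_seq.shape, output_seq.shape)  # (10, 5, 3) (15, 5, 3)
```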
3. The method for predicting the posture of the human body based on the attention mechanism fusion multi-flow graph neural network according to claim 1, wherein the step S2: constructing an attention mechanism fusion multi-flow graph neural network model; inputting the graph data into the attention mechanism fusion multi-flow graph neural network model for training to obtain the trained attention mechanism fusion multi-flow graph neural network model, which specifically comprises the following steps:
step S21: the attention-based mechanism fusion multi-flow graph neural network adopts an encoder and a decoder structure, wherein the encoder comprises a plurality of encoding modules; inputting the graph data into the encoder for feature extraction, wherein the output of the encoding module is shown as the following formula (4):
$$Out_i = Att_i\left(GCN_{all1}(G_{all}),\ GCN_{stable1}(G_{stable}),\ GCN_{mobile1}(G_{mobile})\right) \tag{4}$$
wherein the output of the i-th encoding module, $Out_i = (G_{all1}, G_{stable1}, G_{mobile1})$, contains three graph data; $GCN(\cdot)$ represents a single graph neural network layer; and $Att_i(\cdot)$ represents the i-th attention module;
step S22: the input of the single graph neural network layer is $G = (V, A)$ and the output is X, as expressed by the following formula (5):
$$X = \mathrm{ReLU}(AVW + VU) \tag{5}$$
wherein W and U are trainable weight matrices of size (t × 3) × D, D being the desired feature dimension of the output of the graph neural network layer;
step S23: the input of the attention module is $G_1 = (V_1, A_{all})$, $G_2 = (V_2, A_{stable})$, $G_3 = (V_3, A_{mobile})$, and the output is $X' = (G_{out1}, G_{out2}, G_{out3})$; the relationship between them is expressed by the following equations (6) to (8):
$$V_{mid4} = GCN_2\left(\mathrm{CONCAT}(V_{mid1}, V_{mid2}, V_{mid3})\right) \tag{7}$$
wherein $\mathrm{CONCAT}(\cdot)$ represents a matrix concatenation along the feature dimension; $\mathrm{SoftMax}(\cdot)$ denotes the normalized exponential function; $W_{1,1}, W_{1,2}, \ldots, W_{3,2}$ represent trainable parameters of different linear layers; and $V_{mid1}$, $V_{mid2}$, $V_{mid3}$, $V_{mid4}$ are intermediate variables;
step S24: the output hidden variable H of the encoder is represented by the following formula (9):
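$$H = \lambda_1 V_{allf} + \lambda_2 V_{stablef} + \lambda_3 V_{mobilef} \tag{9}$$
wherein the output of the last encoding module is $Out_f = (G_{allf}, G_{stablef}, G_{mobilef})$, with $G_{allf} = (V_{allf}, A_{all})$, $G_{stablef} = (V_{stablef}, A_{stable})$, $G_{mobilef} = (V_{mobilef}, A_{mobile})$; and $\lambda_1$, $\lambda_2$, $\lambda_3$ are configurable parameters;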
step S25: inputting the hidden variable H into the decoder, and predicting the human body posture by using the decoder, wherein the formula (10) is represented as follows:
$$Out_{T+1} = Out_T + f_{pred}\left(GRU(Out_T, H_T)\right) \tag{10}$$
wherein $Out_T$ is the posture of the human body at time T; $f_{pred}(\cdot)$ represents a multi-layer perceptron; $GRU(\cdot)$ represents a graph gated loop network; and $H_T$ is the hidden variable at time T;
step S26: the parameters are trained using a gradient descent method, and a loss function is set as the following formula:
wherein Output is the (f-t)-frame sequence output by the decoder, of size (f-t) × n × 3, and $Output_{gt}$ is the desired output.
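For illustration, the single graph neural network layer of formula (5) and the weighted fusion of formula (9) can be sketched in NumPy as below. The function names `gcn_layer` and `fuse_hidden`, the random weights, and the example sizes are assumptions for the sketch; the attention module of equations (6) to (8) is not reproduced here.

```python
import numpy as np

def gcn_layer(A, V, W, U):
    """Formula (5): X = ReLU(A V W + V U)."""
    return np.maximum(A @ V @ W + V @ U, 0.0)

def fuse_hidden(V_allf, V_stablef, V_mobilef, lambdas=(1.0, 1.0, 1.0)):
    """Formula (9): H = lambda_1 V_allf + lambda_2 V_stablef + lambda_3 V_mobilef."""
    l1, l2, l3 = lambdas
    return l1 * V_allf + l2 * V_stablef + l3 * V_mobilef

# Usage: n = 4 joints, t = 10 frames, D = 16 output features (illustrative sizes).
rng = np.random.default_rng(0)
n, t, D = 4, 10, 16
V = rng.normal(size=(n, t * 3))                 # node matrix, n x (t*3)
A = np.ones((n, n))                             # fully connected adjacency (assumed all-ones)
W = rng.normal(scale=0.1, size=(t * 3, D))      # (t*3) x D, as stated for formula (5)
U = rng.normal(scale=0.1, size=(t * 3, D))
X = gcn_layer(A, V, W, U)
H = fuse_hidden(X, X, X, lambdas=(0.5, 0.25, 0.25))
print(X.shape, H.shape)  # (4, 16) (4, 16)
```

The term V U acts as a per-joint transform that does not depend on the adjacency, while A V W aggregates information from connected joints.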
4. The method for predicting the posture of the human body based on the attention mechanism fusion multi-flow graph neural network according to claim 1, wherein the step S3: acquiring a three-dimensional position data sequence of a human body key joint for prediction, constructing graph data, inputting the trained attention-based mechanism fusion multi-flow graph neural network model, and obtaining a predicted value of a human body posture, wherein the predicted value comprises the following steps:
step S31: obtaining a three-dimensional position data sequence of human key joints for prediction according to step S1, and constructing the graph data;
step S32: inputting the graph data into the attention mechanism fusion multi-flow graph neural network model trained in step S2, and performing feature extraction by the encoder to obtain a result as the hidden variable H; and inputting H into the decoder, and outputting to obtain a predicted value of the human body posture.
5. A human body posture prediction system based on attention mechanism fusion multi-flow graph neural network is characterized by comprising the following modules:
the training data acquisition module is used for acquiring a three-dimensional position data sequence of a human body key joint for training and dividing the three-dimensional position data sequence into an input sequence and an output sequence according to the length of a preset input sequence and the length of the preset output sequence; constructing graph data according to the input sequence;
the model training module is used for constructing an attention mechanism fusion multi-flow graph neural network model; inputting the graph data into the attention mechanism fusion multi-flow graph neural network model for training to obtain the trained attention mechanism fusion multi-flow graph neural network model;
and the human body posture prediction module is used for acquiring a three-dimensional position data sequence of a human body key joint for prediction, constructing the graph data, and inputting the graph data into the trained attention mechanism fusion multi-flow graph neural network model to obtain a predicted value of the human body posture.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110539624.0A CN113642379B (en) | 2021-05-18 | 2021-05-18 | Human body posture prediction method and system based on attention mechanism fusion multi-flow diagram |
Publications (2)
Publication Number | Publication Date |
---|---|
CN113642379A true CN113642379A (en) | 2021-11-12 |
CN113642379B CN113642379B (en) | 2024-03-01 |
Family
ID=78415817
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202110539624.0A Active CN113642379B (en) | 2021-05-18 | 2021-05-18 | Human body posture prediction method and system based on attention mechanism fusion multi-flow diagram |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN113642379B (en) |
Also Published As
Publication number | Publication date |
---|---|
CN113642379B (en) | 2024-03-01 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||