CN111681321B - Method for synthesizing three-dimensional human motion with a recurrent neural network based on hierarchical learning - Google Patents

Method for synthesizing three-dimensional human motion with a recurrent neural network based on hierarchical learning

Info

Publication number
CN111681321B
CN111681321B (application CN202010506080.3A)
Authority
CN
China
Prior art keywords
motion
network
data
level
input
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202010506080.3A
Other languages
Chinese (zh)
Other versions
CN111681321A (en)
Inventor
周东生
郭重阳
杨鑫
张强
魏小鹏
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Dalian University
Original Assignee
Dalian University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Dalian University filed Critical Dalian University
Priority to CN202010506080.3A priority Critical patent/CN111681321B/en
Publication of CN111681321A publication Critical patent/CN111681321A/en
Application granted granted Critical
Publication of CN111681321B publication Critical patent/CN111681321B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T19/00Manipulating 3D models or images for computer graphics
    • G06T19/20Editing of 3D images, e.g. changing shapes or colours, aligning objects or positioning parts
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Mathematical Physics (AREA)
  • Health & Medical Sciences (AREA)
  • Computing Systems (AREA)
  • Artificial Intelligence (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Molecular Biology (AREA)
  • General Health & Medical Sciences (AREA)
  • Computer Hardware Design (AREA)
  • Architecture (AREA)
  • Computer Graphics (AREA)
  • Processing Or Creating Images (AREA)

Abstract

The invention provides a method for synthesizing three-dimensional human motion using a recurrent neural network based on hierarchical learning, comprising a model-training step and a model-testing step. The training step comprises: constructing a low-level motion-information extraction network from GRU units; building a high-level motion synthesis network from a GRU network; and taking motion data of different styles as input to the high-level network, combining the skeleton features of the motion data with the motion features extracted by the low-level network, and training the high-level network to learn the skeleton spatio-temporal relations of motions of different styles. In the testing step, the first 30 frames of each motion sequence in the test set are fed into the trained high-level motion synthesis network, and the result is finally verified. The invention can synthesize motion that follows an input trajectory, generate transition motion between two different types of motion, and synthesize motion sequences with different emotional styles.

Description

Method for synthesizing three-dimensional human motion with a recurrent neural network based on hierarchical learning
Technical Field
The invention relates to the technical fields of computer graphics and human motion modeling, and in particular to a method for synthesizing three-dimensional human motion with a recurrent neural network based on hierarchical learning.
Background
Three-dimensional human motion capture equipment is high-technology apparatus that accurately measures the motion of a human body in three-dimensional space. Using techniques such as multi-view video and computer graphics, the three-dimensional coordinates of human joints can be obtained accurately, and a human motion dataset can then be reconstructed from the human topological structure. Such datasets have wide application value in fields related to human motion analysis, such as computer animation, virtual reality, security monitoring, and human-robot interaction. On the other hand, the tension between the individuality of human motion and the need for general-purpose data limits the reusability of motion data. Acquiring new data is costly in money and time, and the same motion type tends to accumulate large amounts of redundancy. How to reuse existing motion data effectively has therefore become one of the key problems to be solved in both academia and engineering.
Human motion synthesis technology synthesizes a variety of new motions that meet given requirements from an existing dataset. It not only addresses the data-reuse problem described above but also lowers the hardware barrier in fields related to human motion analysis, and thus has important research value and significance. Meanwhile, in representative artificial-intelligence fields such as natural human-robot interaction and autonomous driving, motion synthesis serves as one of the basic supporting technologies for machines to understand, analyze, and predict human behavior; it has shown increasingly broad research and application value and has attracted more and more researchers.
Because of its research difficulty, strong practicality, and great commercial value, human motion synthesis has become a research hotspot in both academia and industry. Representative motion synthesis methods fall into three classes: optimization-based methods, deep-learning-based methods, and reinforcement-learning-based methods. Optimization-based methods can synthesize motion satisfying constraints, but their modeling process is complex and they struggle with large datasets. Reinforcement-learning-based methods can interact with the external environment, but are still limited by complex modeling and a single form of motion. In contrast, deep-learning-based methods can handle large datasets with diverse motion patterns and can encode complex motion data into a small, fixed-size network. These advantages have made deep learning a growing research hotspot in the field of motion synthesis.
Disclosure of Invention
In view of the above technical problems, a method for synthesizing three-dimensional human motion with a recurrent neural network based on hierarchical learning is provided. The method comprises a model-training step and a model-testing step, characterized in that the training step comprises the following steps:
Step S11: constructing a low-level motion-information extraction network from GRU units, where the network takes the curvature and average-speed information of each skeleton frame in the dataset as input and, after training, outputs the motion features of the character in each frame;
Step S12: building a high-level motion synthesis network from a GRU network; the skeleton features in the dataset are combined with the motion features extracted in step S11 as input, and the network is trained to learn the spatio-temporal relations between motion sequences and to synthesize motion sequences that follow a user-input trajectory;
Step S13: taking motion data of different styles as input to the high-level motion synthesis network, combining the skeleton features of the motion data with the motion features extracted by the low-level network in step S11, and training the high-level network to learn the skeleton spatio-temporal relations of motions of different styles and to synthesize motion sequences of different styles, i.e., converting normal walking-style motion data into emotional walking-style data.
Further, the model-testing step comprises the following steps:
Step S21: randomly selecting test data from the dataset as a test set; feeding the first 30 frames of each motion sequence in the test set into the trained high-level motion synthesis network to synthesize motion sequences of different types; and evaluating the accuracy with which the low-level network extracts motion information, the joint-distance error between the synthesized and real motion sequences, and the quality of the synthesized animation, to test the performance of the trained high-level network;
Step S22: in the style motion synthesis task, demonstrating the effectiveness of the model by comparing the synthesized animations of different motion styles.
Further, in step S11, the recurrent neural network model is obtained by feeding the input motion data into a GRU network, defined as follows:
z_t = σ(W_z · [h_{t-1}, x_t])
r_t = σ(W_r · [h_{t-1}, x_t])
h̃_t = tanh(W · [r_t · h_{t-1}, x_t])
h_t = (1 - z_t) · h_{t-1} + z_t · h̃_t
where σ denotes the sigmoid activation function, tanh the hyperbolic-tangent activation function, and W a weight parameter; (·) denotes point-wise multiplication and (×) matrix multiplication. z_t is the state of the update gate, which combines the previous hidden state h_{t-1} with the updated candidate hidden state h̃_t to form the final hidden state h_t of the GRU unit; W_z is the weight of the update gate in the GRU unit; W_r is the weight of the reset gate r_t, which resets the candidate hidden state h̃_t of the unit according to the previous hidden-state information; and x_t is the input to the GRU unit.
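For illustration, the following is a minimal PyTorch sketch of a single GRU cell implementing the four equations above (the patent names PyTorch as its framework; the module and variable names here are illustrative, not from the patent). In practice, torch.nn.GRU provides an equivalent optimized implementation.

```python
import torch
import torch.nn as nn

class GRUCellSketch(nn.Module):
    """Minimal GRU cell following the update-gate/reset-gate equations above."""
    def __init__(self, input_size: int, hidden_size: int):
        super().__init__()
        # W_z, W_r, W act on the concatenation [h_{t-1}, x_t]
        self.W_z = nn.Linear(input_size + hidden_size, hidden_size)
        self.W_r = nn.Linear(input_size + hidden_size, hidden_size)
        self.W = nn.Linear(input_size + hidden_size, hidden_size)

    def forward(self, x_t: torch.Tensor, h_prev: torch.Tensor) -> torch.Tensor:
        hx = torch.cat([h_prev, x_t], dim=-1)
        z_t = torch.sigmoid(self.W_z(hx))   # update gate
        r_t = torch.sigmoid(self.W_r(hx))   # reset gate
        # candidate hidden state: reset gate rescales the previous hidden state
        h_tilde = torch.tanh(self.W(torch.cat([r_t * h_prev, x_t], dim=-1)))
        # final hidden state: interpolation between h_{t-1} and the candidate
        return (1.0 - z_t) * h_prev + z_t * h_tilde
```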
Still further, the low-level motion features are extracted as follows.
First, a function f over X = (x_2, x_1), X ∈ R^2, is defined (its defining formula is rendered only as an image in the original document).
q_i = p_{i+1} - p_i
where q_i ∈ R^2 is the offset of the character at frame i, and p_i, p_{i+1} ∈ R^2 are the x- and y-axis world-coordinate positions of the character's root joint at frames i and i+1, respectively;
c_i = f(q_i)
where c_i is the curvature feature used as input;
s_i is the instantaneous speed of the character's root joint after Gaussian filtering, where exp(·) denotes the Gaussian filter function and σ its parameter (the filter formula is rendered only as an image in the original);
b = (1/L) Σ_{i=1}^{L} s_i
where b is the average speed of the character and L is the length of the motion sequence; θ_i denotes the foot-contact information, with θ_i = 2π when the left foot contacts the ground and θ_i = π when the right foot contacts the ground;
d_i = {cos δ_i, sin δ_i}
where d_i is the motion orientation of the character in each frame and δ_i is the Euler direction angle about the x and y axes;
f_i = ||(s_i cos θ_i, s_i sin θ_i)||_2
where f_i is the local speed feature of the character; the step motion feature f_i of the human body is computed from the instantaneous speed of the root joint;
β_i denotes a motion parameter of the complete sequence (its formula is rendered only as an image in the original);
g_i = f(β_{i+1}) - f(β_i)
where g_i is the step-frequency feature of the character, computed by differencing.
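As a minimal sketch, the following Python code computes the features that are fully specified above (offsets q_i, curvature c_i, smoothed speed s_i, average speed b, orientation d_i, and local speed f_i). Since the function f and the exact Gaussian-filter formula are given only as images in the original, the turning-angle surrogate for f and the filter form used here are assumptions, and the step-frequency feature g_i is omitted because β_i is unspecified.

```python
import numpy as np

def root_features(p, delta, theta, sigma=2.0):
    """p: (L, 2) root-joint x/y world positions; delta: (L,) Euler heading angles;
    theta: (L,) foot-contact codes (2*pi = left foot down, pi = right foot down)."""
    q = p[1:] - p[:-1]                    # per-frame offsets q_i = p_{i+1} - p_i
    # curvature feature c_i = f(q_i); f is unspecified in the text, so a
    # turning-angle surrogate is assumed here
    c = np.arctan2(q[:, 1], q[:, 0])
    speed = np.linalg.norm(q, axis=1)     # raw instantaneous root-joint speed
    # Gaussian smoothing of the speed (assumed form of the filter in the text)
    i = np.arange(-3 * int(sigma), 3 * int(sigma) + 1)
    kernel = np.exp(-i**2 / (2 * sigma**2))
    kernel /= kernel.sum()
    s = np.convolve(speed, kernel, mode="same")
    b = s.mean()                          # average speed over the sequence
    # per-frame motion orientation d_i = (cos delta_i, sin delta_i)
    d = np.stack([np.cos(delta[:-1]), np.sin(delta[:-1])], axis=1)
    th = theta[:-1]
    # local speed f_i = ||(s_i cos theta_i, s_i sin theta_i)||_2, implemented
    # literally as given (with theta in {pi, 2*pi} this reduces to s_i)
    f_local = np.linalg.norm(
        np.stack([s * np.cos(th), s * np.sin(th)], axis=1), axis=1)
    return c, s, b, d, f_local
```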
Further, the input features of the high-level motion synthesis network described in S12 are the per-frame control-parameter vectors E_i (their exact composition is rendered only as an image in the original); the first feature in the control parameters of the i-th frame involves θ, the contact information of the foot joint.
The process by which the high-level network synthesizes motion can be expressed as:
x_{k+1} = P({x_1, E_1, T_1}, {x_2, E_2, T_2}, ..., {x_k, E_k, T_k}, φ)
where T_i ∈ R^2 is the skeleton height and average speed of the character at the previous moment, used as additional conversion parameters; the additional conversion parameters at time T_1 are the character's own height and average speed; and φ denotes the training parameters of the network.
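A structural sketch of the two-level design described above, in PyTorch: a low-level GRU extracts per-frame motion features E from the control inputs, and a high-level GRU consumes poses x concatenated with E and the conversion parameters T and predicts the next pose. All dimensions (pose size, hidden sizes, feature size) are assumptions, as the patent does not specify them.

```python
import torch
import torch.nn as nn

class LowLevelExtractor(nn.Module):
    """Low-level network: curvature/average-speed inputs -> per-frame features E."""
    def __init__(self, in_dim=4, hid=64, feat_dim=16):
        super().__init__()
        self.gru = nn.GRU(in_dim, hid, batch_first=True)
        self.head = nn.Linear(hid, feat_dim)

    def forward(self, ctrl):               # ctrl: (B, L, in_dim)
        h, _ = self.gru(ctrl)
        return self.head(h)                # E: (B, L, feat_dim)

class HighLevelSynthesizer(nn.Module):
    """High-level network P: past {x_i, E_i, T_i} triples -> next pose x_{k+1}."""
    def __init__(self, pose_dim=63, feat_dim=16, t_dim=2, hid=256):
        super().__init__()
        self.gru = nn.GRU(pose_dim + feat_dim + t_dim, hid, batch_first=True)
        self.head = nn.Linear(hid, pose_dim)

    def forward(self, x, E, T):            # each: (B, k, *)
        inp = torch.cat([x, E, T], dim=-1)
        h, _ = self.gru(inp)
        return self.head(h[:, -1])         # predicted x_{k+1}: (B, pose_dim)
```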
Compared with the prior art, the invention has the following advantages:
1) The invention can be used for synthesizing motion conforming to an input track, generating transition motion between two different types of motion, and synthesizing motion sequences with different emotion styles.
2) Compared with other methods, the error value of the synthesized motion sequence is lower, and the synthesized motion is more accurate.
Drawings
In order to more clearly illustrate the embodiments of the present invention and the technical solutions in the prior art, the drawings required by the embodiments are briefly described below. The drawings described below illustrate only some embodiments of the invention; other drawings can be obtained from them by a person skilled in the art without inventive effort.
Fig. 1 is a schematic diagram of an overall framework of a cyclic neural network based on layered learning for a three-dimensional human motion synthesis method in the invention.
FIG. 2 is a diagram showing the accuracy of the low-level network in extracting motion information for (a) kicking, (b) punching, (c) rolling, and (d) jogging.
FIG. 3 is an error comparison of motion synthesized by the high-level network for (a) kicking, (b) punching, (c) rolling, and (d) jogging.
FIG. 4 is a schematic diagram of an animation effect of a composite kick motion; (a) is a true value; (b) is a method of the invention; (c) is a reference method.
FIG. 5 is a schematic diagram of an animation effect of a synthetic punch motion; (a) is a true value; (b) is a method of the invention; (c) is a reference method.
FIG. 6 is a schematic diagram of an animation effect of the synthesized roll motion; (a) is a true value; (b) is a method of the invention; (c) is a reference method.
FIG. 7 is a schematic diagram of an animation effect of a synthesized jogging motion; (a) is a true value; (b) is a method of the invention; (c) is a reference method.
Fig. 8 is a schematic diagram of the transition effect of synthesized motion from kicking to running.
Fig. 9 is a schematic diagram of the transition effect of synthesized motion from rolling to running.
Fig. 10 is a schematic diagram of a network architecture for athletic style conversion.
FIG. 11 is a schematic diagram of the generation of different motion styles. From top to bottom: (a) neutral, (b) angry, (c) depressed, (d) old, and (e) proud.
Detailed Description
In order that those skilled in the art may better understand the present invention, the technical solutions in the embodiments of the invention are described below clearly and completely with reference to the accompanying drawings. The described embodiments are evidently only some, not all, embodiments of the invention. All other embodiments obtained by those skilled in the art from these embodiments without inventive effort shall fall within the scope of the invention.
It should be noted that the terms "first," "second," and the like in the description and the claims of the present invention and the above figures are used for distinguishing between similar objects and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used may be interchanged where appropriate such that the embodiments of the invention described herein may be implemented in sequences other than those illustrated or otherwise described herein. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
As shown in figs. 1 to 11, the present invention provides a method for three-dimensional human motion synthesis with a recurrent neural network based on hierarchical learning, comprising a model-training step and a model-testing step.
As a preferred embodiment of the present application, the training model step in the present application includes the steps of:
Step S11: constructing a low-level motion-information extraction network from GRU units, where the network takes the curvature and average-speed information of each skeleton frame in the dataset as input and, after training, outputs the motion features of the character in each frame;
Step S12: building a high-level motion synthesis network from a GRU network; the skeleton features in the dataset are combined with the motion features extracted in step S11 as input, and the network is trained to learn the spatio-temporal relations between motion sequences and to synthesize motion sequences that follow a user-input trajectory. In this application, the network is trained to learn skeleton spatio-temporal relation information of emotional motion, for example features such as arm-swing amplitude when walking in a depressed mood. At run time, the network can then convert input normal walking data into emotional walking data. The relevant posture characteristics are those in which an emotional walking style differs from a neutral one; for example, when depressed, the gait becomes heavy and the arm-swing amplitude becomes small.
Step S13: taking motion data of different styles as input to the high-level motion synthesis network, combining the skeleton features of the motion data with the motion features extracted by the low-level network in step S11, and training the high-level network to learn the skeleton spatio-temporal relations of motions of different styles and to synthesize motion sequences of different styles, i.e., converting normal walking-style motion data into emotional walking-style data.
Preferably, the model-testing step comprises the following steps:
Step S21: randomly selecting test data from the dataset as a test set; feeding the first 30 frames of each motion sequence in the test set into the trained high-level motion synthesis network to synthesize motion sequences of different types; and evaluating the accuracy with which the low-level network extracts motion information, the joint-distance error between the synthesized and real motion sequences, and the quality of the synthesized animation, to test the performance of the trained high-level network;
Step S22: in the style motion synthesis task, demonstrating the effectiveness of the model by comparing the synthesized animations of different motion styles.
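A sketch of the test procedure of step S21, reusing the two modules sketched earlier: the first 30 frames of a test sequence seed the trained high-level network, which is then rolled out autoregressively. The seed length of 30 frames and the 60-frame horizon follow the text, while tensor shapes and function names are assumptions.

```python
import torch

@torch.no_grad()
def synthesize(high_net, low_net, seed_poses, ctrl, T, n_new=60, seed_len=30):
    """seed_poses: (1, seed_len, pose_dim); ctrl: (1, seed_len+n_new, ctrl_dim);
    T: (1, seed_len+n_new, 2) height/average-speed conversion parameters."""
    E = low_net(ctrl)                      # motion features for the whole horizon
    poses = [seed_poses[:, t] for t in range(seed_len)]
    for k in range(seed_len, seed_len + n_new):
        x_hist = torch.stack(poses, dim=1) # all poses available so far
        x_next = high_net(x_hist, E[:, :k], T[:, :k])
        poses.append(x_next)               # feed the prediction back as input
    return torch.stack(poses[seed_len:], dim=1)   # (1, n_new, pose_dim)
```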
As a preferred embodiment of the present application, in step S11 the recurrent neural network model is obtained by feeding the input motion data into a GRU network, defined as follows:
z_t = σ(W_z · [h_{t-1}, x_t])
r_t = σ(W_r · [h_{t-1}, x_t])
h̃_t = tanh(W · [r_t · h_{t-1}, x_t])
h_t = (1 - z_t) · h_{t-1} + z_t · h̃_t
where σ denotes the sigmoid activation function, tanh the hyperbolic-tangent activation function, and W a weight parameter; (·) denotes point-wise multiplication and (×) matrix multiplication. z_t is the state of the update gate, which combines the previous hidden state h_{t-1} with the updated candidate hidden state h̃_t to form the final hidden state h_t of the GRU unit; W_z is the weight of the update gate in the GRU unit; W_r is the weight of the reset gate r_t, which resets the candidate hidden state h̃_t of the unit according to the previous hidden-state information; and x_t is the input to the GRU unit.
The motion data input here may refer either to the curvature and average-speed information fed to the low-level network or to the input data of the high-level network, since both the low-level and high-level networks use the GRU structure; the above serves only to explain the GRU architecture.
As a preferred embodiment, the low-level motion features are extracted as follows.
First, a function f over X = (x_2, x_1), X ∈ R^2, is defined (its defining formula is rendered only as an image in the original document).
q_i = p_{i+1} - p_i
where q_i ∈ R^2 is the offset of the character at frame i, and p_i, p_{i+1} ∈ R^2 are the x- and y-axis world-coordinate positions of the character's root joint at frames i and i+1, respectively;
c_i = f(q_i)
where c_i is the curvature feature used as input;
s_i is the instantaneous speed of the character's root joint after Gaussian filtering, where exp(·) denotes the Gaussian filter function and σ its parameter (the filter formula is rendered only as an image in the original);
b = (1/L) Σ_{i=1}^{L} s_i
where b is the average speed of the character and L is the length of the motion sequence; θ_i denotes the foot-contact information, with θ_i = 2π when the left foot contacts the ground and θ_i = π when the right foot contacts the ground;
d_i = {cos δ_i, sin δ_i}
where d_i is the motion orientation of the character in each frame and δ_i is the Euler direction angle about the x and y axes;
f_i = ||(s_i cos θ_i, s_i sin θ_i)||_2
where f_i is the local speed feature of the character; the step motion feature f_i of the human body is computed from the instantaneous speed of the root joint;
β_i denotes a motion parameter of the complete sequence (its formula is rendered only as an image in the original);
g_i = f(β_{i+1}) - f(β_i)
where g_i is the step-frequency feature of the character, computed by differencing.
It will be appreciated that the input features of the high-level motion synthesis network described in step S12 are the per-frame control-parameter vectors E_i (their exact composition is rendered only as an image in the original); the first feature in the control parameters of the i-th frame involves θ, the contact information of the foot joint.
The process by which the high-level network synthesizes motion can be expressed as:
x_{k+1} = P({x_1, E_1, T_1}, {x_2, E_2, T_2}, ..., {x_k, E_k, T_k}, φ)
where T_i ∈ R^2 is the skeleton height and average speed of the character at the previous moment, used as additional conversion parameters; the additional conversion parameters at time T_1 are the character's own height and average speed; and φ denotes the training parameters of the network.
Example 1
As an embodiment of the present invention, the effect on synthesized human motion can be further explained by the following experiments:
Experimental conditions:
1) The motion dataset used in the experiments is the CMU human motion capture dataset, a large online motion database containing sequences of various motions such as running, walking, kicking, and tumbling.
2) The programming platform used is Python 3.6 and the deep learning framework is PyTorch.
3) The server used is configured with a Quadro K6000 graphics card with 12 GB of graphics memory, an Intel(R) Xeon(R) CPU E5-2620 v3 @ 2.40 GHz, 64.0 GB of RAM, and Ubuntu 16.04 LTS as operating system.
4) In the experiments, the performance of the low-level network is evaluated by the accuracy with which it extracts motion information, and the effect of the synthesized motion is evaluated by the joint-position error between a generated 60-frame motion sequence and the real motion sequence.
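A short sketch of the joint-position-error metric described in 4): the mean Euclidean distance between corresponding joints of the synthesized and real sequences, per frame (array shapes are assumptions).

```python
import numpy as np

def joint_position_error(pred, gt):
    """pred, gt: (frames, joints, 3) joint world positions.
    Returns the mean per-joint Euclidean distance for each frame."""
    return np.linalg.norm(pred - gt, axis=-1).mean(axis=-1)   # shape: (frames,)
```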
the experimental contents are as follows:
the experiment is based on the method in the text and the method in the document [1], and is aimed at four representative actions of kicking, punching, rolling, jogging and the like, and the accuracy of extracting the motion information is analyzed and compared, and the result is shown in figure 2.
Table 1: Comparison of errors within 60 frames for motion features extracted by different methods (the table body is rendered as images in the original document).
Next, the joint-position errors of the four synthesized motions (kicking, punching, rolling, jogging) were compared experimentally. The generated motion sequences are compared with the joint positions of the real motion in the database to obtain the per-frame position error, as shown in fig. 3.
A combination experiment on two different types of motion data was also carried out, the core of which is to generate motion data for the transition stage. Monotonous, regular movements such as walking and running are defined as simple motion forms, while violent, variable actions such as rolling, jumping, and kicking are called complex motion forms. Whether the transitional motion synthesized by combining a simple and a complex motion form is natural, realistic, and smooth is one of the research difficulties. Two experiments were performed: the first combines kicking with running, and the second combines rolling with running.
The method is also applied to motion style conversion, where the task is defined as generating emotional motion data given a motion trajectory.
Analysis of experimental results:
as can be seen from table1, the present method compares the motion characteristics extracted by two methods to errors within 60 frames. The black bolded is a smaller error value, and as can be seen from the table, the motion features extracted by the method have smaller errors than the true value, and the step motion speed features extracted based on the method do not completely fit the true value, but are compared with the literature [1] The method has lower error and can embody the dynamic characteristic of the role in motion.
Experiments show that the method can generate a motion sequence with higher fitting degree with the original motion data aiming at motions with smaller character displacement change, such as kicking and punching motions. And adopt the literature [1] The same class of motion synthesized by the method of (a) has irregular rotation and sliding phenomena, which lead to insufficient smoothness and fluency of the overall motion, and the visualized animation sequence is qualitatively shown in fig. 4 and 5. The comparison animations all have the same time axis, wherein the first row of green is the real motion sequenceColumn, second row red is a motion sequence generated based on the method herein, third row black is literature-based [1] A framework sequence generated by the method of (a).
For the motion type with larger motion amplitude and complexity, such as rolling motion, as shown in fig. 6, the method can also generate motion data with higher fitting degree with the original data. But based on literature [1] The method of (a) does not perform well enough in generating such data and the generated motion data is ambiguous. As shown in the figure, the raw data input is a roll motion, but the motion generated is a jogging motion.
In addition, for movements of lesser magnitude and complexity, such as jogging datasets, as shown in FIG. 7, although the method and method [1] The method can generate smooth jogging motion, but the method has higher fitting degree with a true value, so that the error degree of the joint is lower and is more close to the characteristic of the true motion.
In the generated transition animation from kicking to running, the model automatically generates smooth transition frames without any manual editing, as shown in the black-box portion of fig. 8. In the generated transition animation from rolling to running, the generated transition frames show the body regaining balance after an unstable landing from the roll; such lifelike details reflect how a human spontaneously adjusts body balance during motion, as shown in the black-box portion of fig. 9. Notably, these synthesized actions are not contained in the existing dataset; they are entirely new actions generated by the present method.
Given neutral emotional motion data, the network can generate motion data with different emotions, such as angry, depressed, old, and proud walking styles, as shown in fig. 11.
Reference to the literature
[1] D. Pavllo, C. Feichtenhofer, M. Auli, and D. Grangier, "Modeling Human Motion with Quaternion-based Neural Networks," Int. J. Comput. Vis., pp. 1-18, Oct. 2019, doi:10.1007/s11263-019-01245-6.
The foregoing embodiment numbers of the present invention are merely for the purpose of description, and do not represent the advantages or disadvantages of the embodiments.
In the foregoing embodiments of the present invention, each embodiment has its own emphasis; for any part not described in detail in one embodiment, reference may be made to the related descriptions of other embodiments. It should be understood that the technical content disclosed in the several embodiments provided in this application may be implemented in other ways; the test-method embodiments described above are merely illustrative.
Finally, it should be noted that the above embodiments are intended only to illustrate, not to limit, the technical solution of the present invention. Although the invention has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art will understand that the technical solutions described therein can still be modified, or some or all of their technical features replaced by equivalents, without such modifications and substitutions departing from the spirit of the invention.

Claims (2)

1. A method for synthesizing three-dimensional human motion with a recurrent neural network based on hierarchical learning, comprising a model-training step and a model-testing step, characterized in that:
the training model step comprises the following steps:
s11: constructing a low-layer motion information extraction network by adopting GRU units, wherein the network takes curvature and average speed information of each frame of a skeleton in a data set as input, and the network can output motion characteristics of each frame of a role after training;
the low-level motion features are extracted as follows:
first, a function f over X = (x_2, x_1), X ∈ R^2, is defined (its defining formula is rendered only as an image in the original document);
q_i = p_{i+1} - p_i
where q_i ∈ R^2 is the offset of the character at frame i, and p_i, p_{i+1} ∈ R^2 are the x- and y-axis world-coordinate positions of the character's root joint at frames i and i+1, respectively;
c_i = f(q_i)
where c_i is the curvature feature used as input;
s_i is the instantaneous speed of the character's root joint after Gaussian filtering, where exp(·) denotes the Gaussian filter function and σ its parameter (the filter formula is rendered only as an image in the original);
b = (1/L) Σ_{i=1}^{L} s_i
where b is the average speed of the character and L is the length of the motion sequence; θ_i denotes the foot-contact information, with θ_i = 2π when the left foot contacts the ground and θ_i = π when the right foot contacts the ground;
d_i = {cos δ_i, sin δ_i}
where d_i is the motion orientation of the character in each frame and δ_i is the Euler direction angle about the x and y axes;
f_i = ||(s_i cos θ_i, s_i sin θ_i)||_2
where f_i is the local speed feature of the character; the step motion feature f_i of the human body is computed from the instantaneous speed of the root joint;
β_i denotes a motion parameter of the complete sequence (its formula is rendered only as an image in the original);
g_i = f(β_{i+1}) - f(β_i)
where g_i is the step-frequency feature of the character, computed by differencing;
s12: establishing a high-level motion synthesis network by adopting a GRU network; combining the skeleton features in the data set with the motion features extracted in the step S11 as input, training the space-time relationship between the network learning motion sequences, and synthesizing the motion sequences following the user input track;
the input features of the high-level motion synthesis network described in S12 can be expressed as:
Figure QLYQS_6
wherein,,
Figure QLYQS_7
representing the first characteristic in the control parameter of the ith frame, wherein theta is contact information of the foot joint;
the process of high-level network composition motion can be expressed as the formula:
x k+1 =P({x 1 ,E 1 ,T 1 },{x 2 ,E 2 ,T 2 },...,{x k ,E k ,T k },φ);
wherein T is i Skeleton height and average speed expressed as last moment character as additional conversion parameter, T i ∈R 2 ,T 1 The additional conversion parameters at the moment are the self height and average speed, phi represents the training parameters of the network;
s13: adopting motion data with different styles as input of a high-level motion synthesis network, combining the motion data skeleton characteristics with the motion characteristics extracted by the low-level motion information extraction network in the step S11 as input, training the high-level motion synthesis network to learn skeleton space-time relation information of motions with different styles, and synthesizing motion sequences with different styles, namely converting normal walking style motion data into emotion walking style data;
the model-testing step comprises the following steps:
S21: randomly selecting test data from the dataset as a test set; feeding the first 30 frames of each motion sequence in the test set into the trained high-level motion synthesis network to synthesize motion sequences of different types; and evaluating the accuracy with which the low-level motion-information extraction network extracts motion information, the joint-distance error between the synthesized and real motion sequences, and the quality of the synthesized animation, to test the performance of the trained high-level network;
S22: in the style motion synthesis task, demonstrating the effectiveness of the model by comparing the synthesized animations of different motion styles.
2. The method for synthesizing three-dimensional human motion with a recurrent neural network based on hierarchical learning according to claim 1, wherein in step S11 the recurrent neural network model is obtained by feeding the input motion data into a GRU network, defined as follows:
z_t = σ(W_z · [h_{t-1}, x_t])
r_t = σ(W_r · [h_{t-1}, x_t])
h̃_t = tanh(W · [r_t · h_{t-1}, x_t])
h_t = (1 - z_t) · h_{t-1} + z_t · h̃_t
where σ denotes the sigmoid activation function, tanh the hyperbolic-tangent activation function, and W a weight parameter; (·) denotes point-wise multiplication and (×) matrix multiplication; z_t is the state of the update gate, which combines the previous hidden state h_{t-1} with the updated candidate hidden state h̃_t to form the final hidden state h_t of the GRU unit; W_z is the weight of the update gate in the GRU unit; W_r is the weight of the reset gate r_t, which resets the candidate hidden state h̃_t of the unit according to the previous hidden-state information; and x_t is the input to the GRU unit.
CN202010506080.3A 2020-06-05 2020-06-05 Method for synthesizing three-dimensional human motion with a recurrent neural network based on hierarchical learning Active CN111681321B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010506080.3A CN111681321B (en) 2020-06-05 2020-06-05 Method for synthesizing three-dimensional human motion with a recurrent neural network based on hierarchical learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010506080.3A CN111681321B (en) 2020-06-05 2020-06-05 Method for synthesizing three-dimensional human motion with a recurrent neural network based on hierarchical learning

Publications (2)

Publication Number Publication Date
CN111681321A CN111681321A (en) 2020-09-18
CN111681321B true CN111681321B (en) 2023-07-04

Family

ID=72434985

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010506080.3A Active CN111681321B (en) Method for synthesizing three-dimensional human motion with a recurrent neural network based on hierarchical learning

Country Status (1)

Country Link
CN (1) CN111681321B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114972441A (en) * 2022-06-27 2022-08-30 南京信息工程大学 Motion synthesis framework based on deep neural network

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109584345A (en) * 2018-11-12 2019-04-05 大连大学 Human motion synthetic method based on convolutional neural networks
CN110321833A (en) * 2019-06-28 2019-10-11 南京邮电大学 Human bodys' response method based on convolutional neural networks and Recognition with Recurrent Neural Network
CN110473284A (en) * 2019-07-29 2019-11-19 电子科技大学 A kind of moving object method for reconstructing three-dimensional model based on deep learning
CN111079928A (en) * 2019-12-14 2020-04-28 大连大学 Method for predicting human motion by using recurrent neural network based on antagonistic learning

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109584345A (en) * 2018-11-12 2019-04-05 大连大学 Human motion synthetic method based on convolutional neural networks
CN110321833A (en) * 2019-06-28 2019-10-11 南京邮电大学 Human bodys' response method based on convolutional neural networks and Recognition with Recurrent Neural Network
CN110473284A (en) * 2019-07-29 2019-11-19 电子科技大学 A kind of moving object method for reconstructing three-dimensional model based on deep learning
CN111079928A (en) * 2019-12-14 2020-04-28 大连大学 Method for predicting human motion by using recurrent neural network based on antagonistic learning

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Xiao Guo et al. Human Motion Prediction via Learning Local Structure Representations and Temporal Dependencies. arXiv, 2019 (full text). *
Wang Xin et al. Research Progress on Neural-Network-Based Character Motion Synthesis. Computer Science, 2019, vol. 46, no. 9 (full text). *

Also Published As

Publication number Publication date
CN111681321A (en) 2020-09-18

Similar Documents

Publication Publication Date Title
US11017577B2 (en) Skinned multi-person linear model
Tanco et al. Realistic synthesis of novel human movements from a database of motion capture examples
Liu et al. Investigating pose representations and motion contexts modeling for 3D motion prediction
CN110599573B (en) Method for realizing real-time human face interactive animation based on monocular camera
Yamane et al. Human motion database with a binary tree and node transition graphs
CN101241600B (en) Chain-shaped bone matching method in movement capturing technology
CN110310351A (en) A kind of 3 D human body skeleton cartoon automatic generation method based on sketch
Al Borno et al. Robust Physics‐based Motion Retargeting with Realistic Body Shapes
CN111681321B (en) Method for synthesizing three-dimensional human motion with a recurrent neural network based on hierarchical learning
Ji Combining knowledge with data for efficient and generalizable visual learning
Mici et al. An incremental self-organizing architecture for sensorimotor learning and prediction
Alemi et al. Machine learning for data-driven movement generation: a review of the state of the art
CN114170353B (en) Multi-condition control dance generation method and system based on neural network
CN109584345B (en) Human motion synthesis method based on convolutional neural network
Zhou et al. 3D human motion synthesis based on convolutional neural network
Wang et al. A cyclic consistency motion style transfer method combined with kinematic constraints
Tian et al. Augmented Reality Animation Image Information Extraction and Modeling Based on Generative Adversarial Network
CN112949419A (en) Action recognition method based on limb hierarchical structure
Jia et al. A Novel Training Quantitative Evaluation Method Based on Virtual Reality
Ettehadi et al. Learning from demonstration: Generalization via task segmentation
CN116276956B (en) Method and device for simulating and learning operation skills of customized medicine preparation robot
CN114618147B (en) Taijiquan rehabilitation training action recognition method
Yin et al. One-shot SADI-EPE: a visual framework of event progress estimation
CN111951359A (en) Interactive motion control method and system based on neural network
Chen et al. Analysis of moving human body detection and recovery aided training in the background of multimedia technology

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant