CN114972441A - Motion synthesis framework based on deep neural network - Google Patents

Motion synthesis framework based on deep neural network Download PDF

Info

Publication number
CN114972441A
Authority
CN
China
Prior art keywords
motion
joint
sequence
frame
network
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210735748.0A
Other languages
Chinese (zh)
Inventor
何方展
薛鹏
夏贵羽
罗东
张泽远
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nanjing University of Information Science and Technology
Original Assignee
Nanjing University of Information Science and Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nanjing University of Information Science and Technology filed Critical Nanjing University of Information Science and Technology
Priority to CN202210735748.0A priority Critical patent/CN114972441A/en
Publication of CN114972441A publication Critical patent/CN114972441A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 7/00 Image analysis
    • G06T 7/20 Analysis of motion
    • G06T 7/246 Analysis of motion using feature-based methods, e.g. the tracking of corners or segments
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F 18/211 Selection of the most significant subset of features
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F 18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/044 Recurrent networks, e.g. Hopfield networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/08 Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Computational Linguistics (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • General Health & Medical Sciences (AREA)
  • Mathematical Physics (AREA)
  • Computing Systems (AREA)
  • Molecular Biology (AREA)
  • Software Systems (AREA)
  • Multimedia (AREA)
  • Processing Or Creating Images (AREA)

Abstract

The invention relates to the field of computer technology, and in particular to a motion synthesis framework based on a deep neural network, which comprises the following steps: preparing training data and normalizing the joint coordinates; extracting the motion law of each motion sequence; training a motion-law extraction network; training a motion synthesis network to establish the relationship between the first and last frames of a motion sequence and its motion law; and generating the corresponding motion law from given first and last frames. The invention synthesizes realistic human motion when the first and last frames of a motion sequence are given, and addresses the problems of complex control and limited synthesis content in existing motion synthesis methods.

Description

Motion synthesis framework based on deep neural network
Technical Field
The invention relates to the technical field of computers, in particular to a motion synthesis framework based on a deep neural network.
Background
Motion data acquired by capture devices can be used to study the characteristics of human motion, for example for motion pattern recognition and motion tracking, and it also enables other promising applications in fields such as animation, robot driving and motion rehabilitation. However, motion capture is very expensive and is limited by the range of the actor's performance, so motion synthesis is an effective means of overcoming the high cost of motion capture.
Existing motion synthesis algorithms face two main problems. In one direction, methods that spare the user any professional operation of the synthesis process reduce the coordination of the synthesized motion, so the content of the results is limited, user requirements are hard to satisfy and creativity is hard to exercise. In the other direction, users often need professional motion synthesis knowledge to complete the synthesis task successfully. The invention provides a motion synthesis framework based on a deep neural network: a deep model is built to establish the relationship between the first and last frames of a motion sequence and its motion law, the corresponding motion sequence is synthesized from given first and last frames, and the controllability of motion synthesis is enhanced.
Disclosure of Invention
The present invention is directed to a motion synthesis framework based on a deep neural network, so as to solve the problems mentioned in the background art.
The technical scheme of the invention is as follows: a motion synthesis framework based on a deep neural network involves training data, joint coordinates, the motion law of a motion sequence, the relationship between a motion sequence and its motion law, and the relationship between the first and last frames of a motion sequence and its motion law. The motion synthesis method of the framework comprises the following steps:
S1, preparing training data and normalizing joint coordinates: collecting a number of motion sequences of a single motion type as training data and converting them into joint coordinates, then normalizing the joint coordinates and taking the coordinates of each joint relative to its parent joint as the features of that joint;
S2, extracting the motion law of part of the motion sequences involved in S1: calculating the angle between the position of a joint at any moment and its position in the starting frame, and taking the change curve of this angle as the motion law of the motion sequence;
S3, training a deep network on the normalized motion data and establishing the relationship between a motion sequence and its motion law: taking a motion sequence and the motion law extracted in S2 as a training data pair, and training an LSTM-based deep network to construct the relationship between the motion sequence and its motion law;
S4, extracting the motion laws of all motion sequences with the motion-law extraction network trained in S3;
S5, training a deep network on the normalized motion data and establishing the relationship between the first and last frames of a motion sequence and its motion law: taking the first and last frames of a motion sequence and the motion law extracted in S4 as a training data pair, and training an LSTM-based deep network to construct the relationship between the first and last frames and the motion law;
S6, generating the corresponding motion law, i.e. polynomial coefficients, from the given first and last frames with the network trained in S5;
and S7, computing the position of each joint at any time from the polynomial coefficients obtained in S6, thereby synthesizing a complete motion sequence.
Preferably, the position of each joint in the joint coordinates in S1 is represented by a three-dimensional vector and is normalized; the normalized coordinate is defined by a normalization formula (given as an equation image in the original).
Preferably, the motion law of the part of the motion sequences involved in S1 is extracted in S2. The angle θ is defined as the angle of the current-frame joint position relative to its starting position; θ is normalized, with the direction from the starting-frame joint position towards the corresponding end-frame joint position taken as positive.
The correspondence between the joint-position angle θ and the three-dimensional coordinates is expressed by a formula (given as an equation image in the original) whose terms denote, respectively, the position of each joint in the starting frame and in the end frame, and the angular change of the end frame relative to the starting frame. Solving this relation by the least-squares method then yields the sequence of the angle θ with respect to time t.
Preferably, in S3 the deep network is trained on the normalized motion data and the relationship between a motion sequence and its motion law is established: the input is a motion sequence and the output is the polynomial coefficients corresponding to its motion law. The preprocessed motion sequence is passed through a three-layer LSTM network to extract its temporal features, and the corresponding motion law is then output through a fully connected layer. The corresponding loss function (given as an equation image in the original) compares, at every sampling time point, the joint-angle value computed by the network with the actual joint-angle value, where the remaining symbols denote the number of sampling time points of the selected sequences and the number of input motion sequences.
Preferably, in S4, the motion law extraction network referred to in S3 is used to extract the motion laws of all motion sequences:
the network input is a motion sequence, and the output is a joint angle
Figure 403503DEST_PATH_IMAGE021
With respect to time
Figure 855344DEST_PATH_IMAGE022
Is represented by
Figure 453816DEST_PATH_IMAGE023
Wherein
Figure 268188DEST_PATH_IMAGE025
Representing the number of sampling time points of the selected sequence,
Figure 261289DEST_PATH_IMAGE026
representing the number of input motion sequences.
Preferably, in S5, the deep network is trained according to the normalized motion data and the association between the head and end frames and the motion law is established:
angle obtained in S2
Figure 731585DEST_PATH_IMAGE027
With respect to time
Figure 196064DEST_PATH_IMAGE028
Of (2)
Figure 5889DEST_PATH_IMAGE029
The corresponding relation of which can be used as a function
Figure 202515DEST_PATH_IMAGE030
Represents, i.e.:
Figure 118694DEST_PATH_IMAGE031
said function
Figure 324548DEST_PATH_IMAGE032
By a polynomial of order 5, using
Figure 113512DEST_PATH_IMAGE033
Representing the coefficients of the joint point corresponding polynomial.
The number of the synthesis modules is consistent with that of the human joints, namely, a single module is responsible for feature extraction of a single joint motion rule, a given head frame and a given tail frame are input, and a required motion rule is output. The synthesis module comprises three layers of LSTM units, namely a batch normalization layer and a final full-connection layer, and the LSTM network is responsible for extracting characteristic information of a first frame and a last frame; the loss function for each synthesis module is represented as:
Figure 684302DEST_PATH_IMAGE034
wherein
Figure 129190DEST_PATH_IMAGE035
Denotes the first
Figure 935472DEST_PATH_IMAGE036
The first and last frames of an input
Figure 218423DEST_PATH_IMAGE037
At each of the sampling time points, the sampling time point,
Figure 22431DEST_PATH_IMAGE038
represents the angle value extracted by the S4 motion law extraction network,
Figure 751353DEST_PATH_IMAGE039
and expressing polynomial coefficients corresponding to the motion law generated by the motion synthesis network.
Preferably, in S6, a corresponding motion law, i.e. polynomial coefficient, is generated through the trained network according to the given motion head and end frames:
the network inputs the frame head and tail of the motion sequence, and outputs polynomial coefficients corresponding to the motion law, namely:
Figure 299009DEST_PATH_IMAGE040
Figure 203511DEST_PATH_IMAGE041
indicating the number of human joints.
Preferably, in S7, the human motion is synthesized according to the motion law. For the first
Figure 975158DEST_PATH_IMAGE042
First of frame
Figure 896102DEST_PATH_IMAGE043
Angle corresponding to each joint
Figure 919553DEST_PATH_IMAGE044
It can be expressed as:
Figure 803196DEST_PATH_IMAGE045
wherein the content of the first and second substances,
Figure 683427DEST_PATH_IMAGE046
Figure 527886DEST_PATH_IMAGE047
is the first
Figure 479662DEST_PATH_IMAGE043
Polynomial coefficients of the individual joint motion curves; then converting the angle into corresponding three-dimensional coordinates according to the idea of spherical interpolation
Figure 388450DEST_PATH_IMAGE048
The formula is as follows:
Figure 908424DEST_PATH_IMAGE049
Figure 99234DEST_PATH_IMAGE050
Figure 526804DEST_PATH_IMAGE051
indicating the normalized coordinates of the first and last frames of a motion sequence,
Figure 260405DEST_PATH_IMAGE052
is the first
Figure 810335DEST_PATH_IMAGE043
The angle change from the starting frame to the ending frame of each joint; and when the normalized position of each joint is obtained, calculating the absolute position coordinate of each joint according to the structure of the human body and the length of the skeleton, and finally reconstructing the real human body motion.
Through these improvements, the invention provides a motion synthesis framework based on a deep neural network that, compared with the prior art, offers the following improvements and advantages:
First, the motion synthesis framework based on the deep neural network synthesizes realistic human motion when the first and last frames of a motion sequence are given, and addresses the problems of complex control and limited synthesis content in existing motion synthesis methods;
Second, the framework can generate natural intermediate motion from the first and last frames of a motion sequence provided by the user, which both ensures convenience of operation and allows rich motion content to be synthesized by controlling the first and last frames;
Third, the framework can be applied in many fields: in the film and television industry it can synthesize 3D human motion to drive virtual characters; in robotics it can synthesize special actions to drive humanoid robots; and in medical rehabilitation it can synthesize the normal motion posture of a patient with movement disorders to assist psychological therapy.
Drawings
The invention is further explained below with reference to the figures and examples:
FIG. 1 is a block flow diagram of the present invention;
FIG. 2 is a normalized graph of the present invention;
FIG. 3 is a diagram of a law of motion extraction network of the present invention;
fig. 4 is a diagram of a motion synthesis network of the present invention.
Detailed Description
The present invention is described in detail below, and the technical solutions in the embodiments of the invention are described clearly and completely. Obviously, the described embodiments are only some, not all, of the embodiments of the invention. All other embodiments obtained by a person skilled in the art from the embodiments given herein without creative effort shall fall within the protection scope of the invention.
The invention provides a motion synthesis framework based on a deep neural network, and the technical scheme of the invention is as follows:
As shown in fig. 1, a motion synthesis framework based on a deep neural network involves training data, joint coordinates, the motion law of a motion sequence, the relationship between a motion sequence and its motion law, and the relationship between the first and last frames of a motion sequence and its motion law. The motion synthesis method of the framework comprises the following steps:
S1, preparing training data and normalizing joint coordinates: collecting a number of motion sequences of a single motion type as training data and converting them into joint coordinates, then normalizing the joint coordinates and taking the coordinates of each joint relative to its parent joint as the features of that joint;
S2, extracting the motion law of part of the motion sequences involved in S1: calculating the angle between the position of a joint at any moment and its position in the starting frame, and taking the change curve of this angle as the motion law of the motion sequence;
S3, training a deep network on the normalized motion data and establishing the relationship between a motion sequence and its motion law: taking a motion sequence and the motion law extracted in S2 as a training data pair, and training an LSTM-based deep network to construct the relationship between the motion sequence and its motion law;
S4, extracting the motion laws of all motion sequences with the motion-law extraction network trained in S3;
S5, training a deep network on the normalized motion data and establishing the relationship between the first and last frames of a motion sequence and its motion law: taking the first and last frames of a motion sequence and the motion law extracted in S4 as a training data pair, and training an LSTM-based deep network to construct the relationship between the first and last frames and the motion law;
S6, generating the corresponding motion law, i.e. polynomial coefficients, from the given first and last frames with the network trained in S5;
and S7, computing the position of each joint at any time from the polynomial coefficients obtained in S6, thereby synthesizing a complete motion sequence.
Here the position of each joint in the joint coordinates in S1 is represented by a three-dimensional vector and is normalized as shown in fig. 2; the normalized coordinate is defined by a normalization formula (given as an equation image in the original).
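As an illustration of this preprocessing, the following sketch normalizes joint coordinates by expressing each joint relative to its parent and scaling the offset to unit length. This is only an assumed reading of the normalization step (the defining formula is not reproduced in the text), and the function name, the parents array and the array shapes are hypothetical.

```python
import numpy as np

def normalize_joints(positions, parents):
    """Normalize joint coordinates frame by frame.

    positions: array of shape (num_frames, num_joints, 3) with absolute 3D joint positions.
    parents:   list of parent-joint indices, -1 for the root joint.
    Returns an array of the same shape holding, for every non-root joint, the
    unit-length offset from its parent joint (the root is left at the origin).
    """
    normalized = np.zeros_like(positions, dtype=float)
    for j, p in enumerate(parents):
        if p < 0:                       # root joint: no parent, keep zero offset
            continue
        offset = positions[:, j] - positions[:, p]            # coordinate relative to the parent joint
        length = np.linalg.norm(offset, axis=-1, keepdims=True)
        normalized[:, j] = offset / np.maximum(length, 1e-8)  # scale to unit bone length
    return normalized

# Example call (hypothetical skeleton): normalize_joints(raw_positions, parents=[-1, 0, 1, 2])
```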
The motion law of the part of the motion sequences involved in S1 is extracted in S2. The angle θ is defined as the angle of the current-frame joint position relative to its starting position; θ is normalized, with the direction from the starting-frame joint position towards the corresponding end-frame joint position taken as positive.
The correspondence between the joint-position angle θ and the three-dimensional coordinates is expressed by a formula (given as an equation image in the original) whose terms denote, respectively, the position of each joint in the starting frame and in the end frame, and the angular change of the end frame relative to the starting frame. Solving this relation by the least-squares method then yields the sequence of the angle θ with respect to time t.
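The angle curve and its polynomial fit could be computed along the following lines, a minimal sketch assuming the per-frame angle is the unsigned angle between the current and start-frame joint positions and that the least-squares fit is a 5th-order polynomial in normalized time; the sign convention towards the end frame and the exact formulas of the original are simplified away here.

```python
import numpy as np

def joint_angle_curve(joint_traj, degree=5):
    """joint_traj: (num_frames, 3) normalized positions of one joint over time.

    Returns (t, theta, coeffs): sample times in [0, 1], the per-frame angle of the
    joint relative to the start frame, and the least-squares polynomial coefficients."""
    start = joint_traj[0]

    def unit(v):
        return v / max(np.linalg.norm(v), 1e-8)

    theta = []
    for p in joint_traj:
        # unsigned angle between the current position and the start-frame position
        cos_a = np.clip(np.dot(unit(p), unit(start)), -1.0, 1.0)
        theta.append(np.arccos(cos_a))
    theta = np.asarray(theta)

    t = np.linspace(0.0, 1.0, len(joint_traj))
    coeffs = np.polyfit(t, theta, deg=degree)   # least-squares fit, highest power first
    return t, theta, coeffs
```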
In S3 the deep network is trained on the normalized motion data and the relationship between a motion sequence and its motion law is established. As shown in fig. 3, the input is a motion sequence and the output is the polynomial coefficients corresponding to its motion law: the preprocessed motion sequence is passed through a three-layer LSTM network to extract its temporal features, and the corresponding motion law is then output through a fully connected layer. The corresponding loss function (given as an equation image in the original) compares, at every sampling time point, the joint-angle value computed by the network with the actual joint-angle value, where the remaining symbols denote the number of sampling time points of the selected sequences and the number of input motion sequences.
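A PyTorch sketch of a motion-law extraction network of this kind is given below: three stacked LSTM layers over the normalized motion sequence followed by a fully connected layer that outputs per-joint polynomial coefficients, trained with a mean-squared-error loss on the reconstructed angles. The layer sizes, coefficient ordering and the exact loss form are assumptions, not the patent's own definitions.

```python
import torch
import torch.nn as nn

class MotionLawExtractor(nn.Module):
    """Maps a motion sequence to the polynomial coefficients of its motion law."""
    def __init__(self, num_joints, hidden_size=256, poly_degree=5):
        super().__init__()
        self.lstm = nn.LSTM(input_size=num_joints * 3, hidden_size=hidden_size,
                            num_layers=3, batch_first=True)        # three-layer LSTM
        self.fc = nn.Linear(hidden_size, num_joints * (poly_degree + 1))
        self.num_joints, self.poly_degree = num_joints, poly_degree

    def forward(self, seq):
        # seq: (batch, num_frames, num_joints * 3) normalized joint features
        features, _ = self.lstm(seq)
        coeffs = self.fc(features[:, -1])                           # last time step -> coefficients
        return coeffs.view(-1, self.num_joints, self.poly_degree + 1)

def angle_loss(pred_coeffs, true_angles, t):
    """Mean squared error between angles reconstructed from the predicted coefficients
    and the actual angles at the sampled time points (an assumed form of the loss).

    pred_coeffs: (batch, joints, degree + 1) with coefficient i multiplying t**i;
    true_angles: (batch, joints, T); t: (T,) sample times in [0, 1]."""
    powers = torch.stack([t ** i for i in range(pred_coeffs.shape[-1])], dim=0)  # (degree + 1, T)
    pred_angles = torch.einsum('bjc,ct->bjt', pred_coeffs, powers)
    return torch.mean((pred_angles - true_angles) ** 2)
```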
In S4 the motion-law extraction network of S3 is used to extract the motion laws of all motion sequences: the network input is a motion sequence and the output is the sequence of the joint angle θ with respect to time t, sampled at the selected time points for each of the input motion sequences.
In S5 the deep network is trained on the normalized motion data and the association between the first and last frames and the motion law is established:
the sequence of the angle θ with respect to time t obtained in S2 can be represented by a function f, i.e. θ = f(t); the function f is represented by a polynomial of order 5, whose coefficients are the polynomial coefficients associated with each joint point.
As shown in fig. 4, the number of synthesis modules equals the number of human joints, i.e. a single module is responsible for feature extraction of a single joint's motion law; the input is the given first and last frames and the output is the required motion law. Each synthesis module comprises three LSTM layers, a batch normalization layer and a final fully connected layer, with the LSTM network responsible for extracting the feature information of the first and last frames. The loss function of each synthesis module (given as an equation image in the original) measures, at each sampling time point of each pair of input first and last frames, the difference between the angle value extracted by the S4 motion-law extraction network and the angle obtained from the polynomial coefficients generated by the motion synthesis network.
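One synthesis module of this kind might look like the following PyTorch sketch: a three-layer LSTM reads the two-frame sequence formed by the first and last frames, followed by batch normalization and a fully connected layer that outputs that joint's polynomial coefficients. The dimensions, class names and the way the two frames are fed to the LSTM are assumptions.

```python
import torch
import torch.nn as nn

class JointSynthesisModule(nn.Module):
    """Predicts the motion-law polynomial coefficients of a single joint
    from the first and last frames of a motion sequence."""
    def __init__(self, num_joints, hidden_size=128, poly_degree=5):
        super().__init__()
        self.lstm = nn.LSTM(input_size=num_joints * 3, hidden_size=hidden_size,
                            num_layers=3, batch_first=True)   # feature extraction from the two frames
        self.bn = nn.BatchNorm1d(hidden_size)                  # batch normalization layer
        self.fc = nn.Linear(hidden_size, poly_degree + 1)      # final fully connected layer

    def forward(self, first_frame, last_frame):
        # first_frame, last_frame: (batch, num_joints * 3) normalized joint features
        pair = torch.stack([first_frame, last_frame], dim=1)   # (batch, 2, num_joints * 3)
        features, _ = self.lstm(pair)
        h = self.bn(features[:, -1])
        return self.fc(h)                                      # polynomial coefficients of this joint

# One module per human joint, as in the framework's synthesis network.
def build_synthesis_network(num_joints):
    return nn.ModuleList(JointSynthesisModule(num_joints) for _ in range(num_joints))
```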
In S6 the corresponding motion law, i.e. the polynomial coefficients, is generated by the trained network from the given first and last frames of the motion: the network input is the first and last frames of the motion sequence and the output is the polynomial coefficients corresponding to the motion law of each human joint, one set of coefficients per joint.
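Inference for this step can then be a single pass over the per-joint modules, as in the sketch below; synthesis_modules is assumed to be a collection of per-joint modules with the interface sketched above, or any callables taking the first and last frames and returning one joint's coefficients.

```python
import torch

@torch.no_grad()
def generate_motion_laws(synthesis_modules, first_frame, last_frame):
    """Run every per-joint synthesis module on the given first/last frames.

    first_frame, last_frame: tensors of shape (1, num_joints * 3).
    Returns a tensor of shape (num_joints, poly_degree + 1) with one row of
    polynomial coefficients per human joint.
    Assumes the modules have already been switched to eval() mode."""
    rows = [module(first_frame, last_frame).squeeze(0) for module in synthesis_modules]
    return torch.stack(rows)
```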
In S7 the human motion is synthesized according to the motion law. The angle of the j-th joint in the i-th frame is obtained by evaluating the motion-curve polynomial of the j-th joint with its polynomial coefficients (formula given as an equation image in the original). The angle is then converted into the corresponding three-dimensional coordinates following the idea of spherical interpolation, using the normalized coordinates of the first and last frames of the motion sequence and the angular change of the j-th joint from the starting frame to the end frame (formulas given as equation images in the original). Once the normalized position of each joint is obtained, the absolute position coordinates of each joint are calculated according to the structure of the human body and the bone lengths, and the real human motion is finally reconstructed.
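The reconstruction in S7 can be sketched as follows, under explicit assumptions: the joint angle at each sampled time is the value of the fitted polynomial, and the normalized position is obtained by rotating the start-frame position towards the end-frame position by that angle in their common plane, in the spirit of spherical linear interpolation. The conversion formula used by the patent is not reproduced in the text, so this is an illustrative stand-in rather than the patented formula.

```python
import numpy as np

def reconstruct_joint(coeffs, p_start, p_end, num_frames):
    """coeffs: polynomial coefficients (highest power first, as from np.polyfit);
    p_start, p_end: normalized 3D positions of one joint in the first and last frames.

    Returns (num_frames, 3) normalized positions obtained by rotating p_start toward
    p_end by the polynomial angle at each time step (slerp-style reconstruction)."""
    p_start, p_end = np.asarray(p_start, float), np.asarray(p_end, float)
    omega = np.arccos(np.clip(np.dot(p_start, p_end) /
                              (np.linalg.norm(p_start) * np.linalg.norm(p_end) + 1e-8), -1.0, 1.0))
    t = np.linspace(0.0, 1.0, num_frames)
    theta = np.polyval(coeffs, t)                  # joint angle at each sampled time
    frames = []
    for a in theta:
        if omega < 1e-6:                           # degenerate case: start and end coincide
            frames.append(p_start)
            continue
        # rotate within the plane spanned by p_start and p_end by angle a
        frames.append((np.sin(omega - a) * p_start + np.sin(a) * p_end) / np.sin(omega))
    return np.stack(frames)
```

Absolute joint positions would then be recovered, as the description states, by scaling each normalized offset by the corresponding bone length and accumulating the offsets along the skeleton hierarchy.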

Claims (10)

1. A motion synthesis framework based on a deep neural network, characterized by: preparing training data and normalizing joint coordinates; extracting the motion law of each motion sequence; training a motion-law extraction network; training a motion synthesis network to establish the relationship between the first and last frames of a motion sequence and its motion law; generating the corresponding motion law from given first and last frames; and converting the generated motion law into the position of each joint at any moment, thereby synthesizing a complete motion sequence, wherein the motion synthesis method of the motion synthesis framework comprises the following steps:
S1, preparing training data and normalizing joint coordinates: collecting a number of motion sequences of a single motion type as training data and converting them into joint coordinates, then normalizing the joint coordinates and taking the coordinates of each joint relative to its parent joint as the features of that joint;
S2, extracting the motion law of part of the motion sequences involved in S1: calculating the angle between the position of a joint at any moment and its position in the starting frame, and taking the change curve of this angle as the motion law of the motion sequence;
S3, training a deep network on the normalized motion data and establishing the relationship between a motion sequence and its motion law: taking a motion sequence and the motion law extracted in S2 as a training data pair, and training an LSTM-based deep network to construct the relationship between the motion sequence and its motion law;
S4, extracting the motion laws of all motion sequences with the motion-law extraction network trained in S3;
S5, training a deep network on the normalized motion data and establishing the relationship between the first and last frames of a motion sequence and its motion law: taking the first and last frames of a motion sequence and the motion law extracted in S4 as a training data pair, and training an LSTM-based deep network to construct the relationship between the first and last frames and the motion law;
S6, generating the corresponding motion law, i.e. polynomial coefficients, from the given first and last frames with the network trained in S5;
and S7, computing the position of each joint at any time from the polynomial coefficients obtained in S6, thereby synthesizing a complete motion sequence.
2. The deep neural network-based motion synthesis framework of claim 1, wherein: the position of each joint in the joint coordinates in S1 is represented by a three-dimensional vector and is normalized, the normalized coordinate being defined by a normalization formula (given as an equation image in the original).
3. The deep neural network-based motion synthesis framework of claim 1, wherein: in S2 the motion law of the part of the motion sequences involved in S1 is extracted; the angle θ is defined as the angle of the current-frame joint position relative to its starting position, θ is normalized, and the direction from the starting-frame joint position towards the corresponding end-frame joint position is taken as positive;
the correspondence between the joint-position angle θ and the three-dimensional coordinates is expressed by a formula (given as an equation image in the original) whose terms denote, respectively, the position of each joint in the starting frame and in the end frame, and the angular change of the end frame relative to the starting frame; solving this relation by least squares then yields the sequence of the angle θ with respect to time t.
4. The deep neural network-based motion synthesis framework of claim 1, wherein: in S3 the deep network is trained on the normalized motion data and the relationship between a motion sequence and its motion law is established; the input is a motion sequence and the output is the polynomial coefficients corresponding to its motion law; the preprocessed motion sequence first passes through a three-layer LSTM network that extracts its temporal features, and the corresponding motion law is then output through a fully connected layer; the corresponding loss function (given as an equation image in the original) compares, at every sampling time point, the joint-angle value computed by the network with the actual joint-angle value, where the remaining symbols denote the number of sampling time points of the selected sequences and the number of input motion sequences.
5. The deep neural network-based motion synthesis framework of claim 1, wherein: in S4 the motion-law extraction network of S3 is used to extract the motion laws of all motion sequences: the network input is a motion sequence and the output is the sequence of the joint angle θ with respect to time t, sampled at the selected time points for each of the input motion sequences.
6. The deep neural network-based motion synthesis framework of claim 1, wherein: in S5 the deep network is trained on the normalized motion data and the relationship between the first and last frames and the motion law is established:
the sequence of the angle θ with respect to time t obtained in S2 can be represented by a function f, i.e. θ = f(t); the function f is represented by a polynomial of order 5, whose coefficients are the polynomial coefficients associated with each joint point;
the number of synthesis modules equals the number of human joints, i.e. a single module is responsible for feature extraction of a single joint's motion law; the input is the given first and last frames and the output is the required motion law.
7. The synthesis module comprises three LSTM layers, a batch normalization layer and a final fully connected layer, with the LSTM network responsible for extracting the feature information of the first and last frames; the loss function of each synthesis module (given as an equation image in the original) measures, at each sampling time point of each pair of input first and last frames, the difference between the angle value extracted by the S4 motion-law extraction network and the angle obtained from the polynomial coefficients generated by the motion synthesis network.
8. The deep neural network-based motion synthesis framework of claim 1, wherein: in S6 the corresponding motion law, i.e. the polynomial coefficients, is generated by the trained network from the given first and last frames of the motion: the network input is the first and last frames of the motion sequence and the output is the polynomial coefficients corresponding to the motion law of each human joint, one set of coefficients per joint.
9. The deep neural network-based motion synthesis framework of claim 1, wherein: in S7, the human body motion is synthesized according to the motion law.
10. The angle of the j-th joint in the i-th frame is obtained by evaluating the motion-curve polynomial of the j-th joint with its polynomial coefficients (formula given as an equation image in the original); the angle is then converted into the corresponding three-dimensional coordinates following the idea of spherical interpolation, using the normalized coordinates of the first and last frames of the motion sequence and the angular change of the j-th joint from the starting frame to the end frame (formulas given as equation images in the original); and once the normalized position of each joint is obtained, the absolute position coordinates of each joint are calculated according to the structure of the human body and the bone lengths, and the real human motion is finally reconstructed.
CN202210735748.0A 2022-06-27 2022-06-27 Motion synthesis framework based on deep neural network Pending CN114972441A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210735748.0A CN114972441A (en) 2022-06-27 2022-06-27 Motion synthesis framework based on deep neural network

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210735748.0A CN114972441A (en) 2022-06-27 2022-06-27 Motion synthesis framework based on deep neural network

Publications (1)

Publication Number Publication Date
CN114972441A true CN114972441A (en) 2022-08-30

Family

ID=82965826

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210735748.0A Pending CN114972441A (en) 2022-06-27 2022-06-27 Motion synthesis framework based on deep neural network

Country Status (1)

Country Link
CN (1) CN114972441A (en)

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110853670A (en) * 2019-11-04 2020-02-28 南京理工大学 Music-driven dance generating method
CN111310641A (en) * 2020-02-12 2020-06-19 南京信息工程大学 Motion synthesis method based on spherical nonlinear interpolation
WO2021234151A1 (en) * 2020-05-22 2021-11-25 Motorica Ab Speech-driven gesture synthesis
CN111681321A (en) * 2020-06-05 2020-09-18 大连大学 Method for synthesizing three-dimensional human motion by using recurrent neural network based on layered learning
CN114170353A (en) * 2021-10-21 2022-03-11 北京航空航天大学 Multi-condition control dance generation method and system based on neural network

Non-Patent Citations (5)

* Cited by examiner, † Cited by third party
Title
GUIYU XIA et al.: "A Deep Learning Framework for Start–End Frame Pair-Driven Motion Synthesis" *
WENLIN ZHUANG et al.: "Towards 3D Dance Motion Synthesis and Control" *
YI ZHOU et al.: "AUTO-CONDITIONED RECURRENT NETWORKS FOR EXTENDED COMPLEX HUMAN MOTION SYNTHESIS" *
庄文林: "Human Motion Modeling and Synthesis" (人体运动建模与合成) *
彭淑娟 et al.: "A Survey of Deep Learning Models for Human Motion Generation" (人体运动生成中的深度学习模型综述) *

Similar Documents

Publication Publication Date Title
KR102081854B1 (en) Method and apparatus for sign language or gesture recognition using 3D EDM
WO2021169839A1 (en) Action restoration method and device based on skeleton key points
Beymer et al. Example based image analysis and synthesis
CN107239728A (en) Unmanned plane interactive device and method based on deep learning Attitude estimation
CN110096156B (en) Virtual reloading method based on 2D image
CN111553968B (en) Method for reconstructing animation of three-dimensional human body
CN113706699B (en) Data processing method and device, electronic equipment and computer readable storage medium
US20160004905A1 (en) Method and system for facial expression transfer
CN110310351B (en) Sketch-based three-dimensional human skeleton animation automatic generation method
CN112837215B (en) Image shape transformation method based on generation countermeasure network
CN111062326A (en) Self-supervision human body 3D posture estimation network training method based on geometric drive
CN109116981A (en) A kind of mixed reality interactive system of passive touch feedback
Zhu et al. Human motion generation: A survey
CN108908353B (en) Robot expression simulation method and device based on smooth constraint reverse mechanical model
CN111310641A (en) Motion synthesis method based on spherical nonlinear interpolation
Liu et al. Real-time robotic mirrored behavior of facial expressions and head motions based on lightweight networks
CN112634413B (en) Method, apparatus, device and storage medium for generating model and generating 3D animation
CN113989928A (en) Motion capturing and redirecting method
CN111539288B (en) Real-time detection method for gestures of both hands
Tang et al. Real-time conversion from a single 2D face image to a 3D text-driven emotive audio-visual avatar
Agarwal et al. Imitating human movement with teleoperated robotic head
CN113079136A (en) Motion capture method, motion capture device, electronic equipment and computer-readable storage medium
CN114972441A (en) Motion synthesis framework based on deep neural network
US11734889B2 (en) Method of gaze estimation with 3D face reconstructing
JPH10255070A (en) Three-dimensional image generating device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20220830