CN113345061A - Training method and device for motion completion model, completion method and device, and medium

Training method and device for motion completion model, completion method and device, and medium

Info

Publication number
CN113345061A
CN113345061A
Authority
CN
China
Prior art keywords
frame
joint points
preset
vector
joint
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202110890084.0A
Other languages
Chinese (zh)
Other versions
CN113345061B (en)
Inventor
何雨龙
唐浩
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Chengdu Tishi infinite Technology Co.,Ltd.
Original Assignee
Chengdu Tishi Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Chengdu Tishi Technology Co ltd filed Critical Chengdu Tishi Technology Co ltd
Priority to CN202110890084.0A priority Critical patent/CN113345061B/en
Publication of CN113345061A publication Critical patent/CN113345061A/en
Application granted granted Critical
Publication of CN113345061B publication Critical patent/CN113345061B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 15/00 3D [Three Dimensional] image rendering
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 13/00 Animation
    • G06T 13/20 3D [Three Dimensional] animation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 19/00 Manipulating 3D models or images for computer graphics
    • G06T 19/20 Editing of 3D images, e.g. changing shapes or colours, aligning objects or positioning parts

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Graphics (AREA)
  • Software Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Evolutionary Computation (AREA)
  • Computational Linguistics (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Biophysics (AREA)
  • Data Mining & Analysis (AREA)
  • Mathematical Physics (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Architecture (AREA)
  • Computer Hardware Design (AREA)
  • Processing Or Creating Images (AREA)

Abstract

The application provides a training method and device for a motion completion model, a completion method, a device and a medium. The training method comprises: collecting a large amount of raw motion data of human body movement, preprocessing the raw motion data to obtain a data set, and training and optimizing a motion completion neural network model on the data set to obtain a trained motion completion neural network model. The trained motion completion neural network model can predict and automatically generate the data of a large number of motion transition frames, is suitable for scenarios with long generation times, many transition frames and complex motions, and the transition frames it generates move realistically, thereby reducing the labor cost required by the original interpolation scheme, improving project quality and shortening the project period. The model training device, the motion completion method, the computer device and the computer-readable storage medium provided by the application share the same beneficial effects.

Description

Training method and device for motion completion model, completion method and device, and medium
Technical Field
The application relates to the field of neural network model training, in particular to a training method, a device, a completion method, equipment and a medium for an action completion model.
Background
In applications such as three-dimensional animation and games, three-dimensional virtual characters are driven by animation data to perform various movements such as walking, running, jumping and dancing. The animation data may be produced manually by an animator or captured with a motion capture device. Animation data captured by a motion capture device often cannot be used directly, and unsatisfactory motion segments usually need to be modified manually by an animator. In both cases, therefore, manual editing by an animator is required.
When using animation software, an animator edits or modifies a number of key frames, designates several key frames that are related to one another, and lets the mathematical interpolation method built into the animation software generate a specified number of transition frames between them, thereby obtaining a smooth action sequence group. However, this approach is only suitable for motion transitions with a short generation time, a small number of transition frames and simple motion. In actual production, an animator has to edit or modify a large number of key frames, and if the key frames are too few, the time intervals too short, or the motion too complex, interpolation yields unrealistic and unreasonable transition sequences. The existing scheme therefore depends heavily on the skill and experience of the animator, which increases labor cost; because unreasonable transition sequences occur easily and the technical level of animators varies, project quality also suffers; and the need to debug and edit a large number of key frames greatly lengthens the project period.
Therefore, how to provide a solution to the above technical problems is a problem that needs to be solved by those skilled in the art.
Disclosure of Invention
The application aims to provide a training method for an action completion model, an action completion method, a training device for the action completion model, computer equipment and a computer-readable storage medium, which can handle complex, high-frequency and rapid action completion and produce realistic, rich completed actions. The specific scheme is as follows:
the application provides a training method of an action completion model, which comprises the following steps:
s101: acquiring a large amount of original motion data of human motion, performing data preprocessing according to a father node list corresponding to all preset joint points and a standard skeleton offset value list corresponding to all preset joint points relative to the father nodes to obtain 6D rotation vectors, speed vectors, four-dimensional Boolean vectors and 3D space coordinates of all joint points of each frame except a first frame as a data set, and dividing the data set into a training set and a verification set;
s102: establishing an action completion model, and setting a standard difference value as a preset standard difference value;
s103: extracting a preset batch number of action data batches from the training set as input data of an action completion model, inputting the action completion model to obtain an output result of the action completion model, wherein each action data batch comprises a preset group number of action sequence groups, each action sequence group comprises a past frame, a target frame and S transition frames, and S is a preset group number of uniform sampling values in a preset sampling range value;
s104: and calculating the total loss item of the system in each action sequence group according to the input data and the output result of the action completion model, and optimizing the action completion model according to the total loss item of the system to obtain the trained action completion model.
Preferably, the S101 includes:
acquiring 3D space coordinates of Hips joint points and Euler angles of all joint points of each frame in original motion data, and performing data format conversion on the Euler angles of all joint points of each frame to obtain a rotation matrix of all joint points of each frame;
reserving the first two column vectors of the rotation matrix of all the joint points of each frame and compressing the rotation matrix into a 6-dimensional vector to obtain 6D rotation vectors of all the joint points of each frame;
calculating to obtain the 3D space coordinates of all the joint points of each frame according to the 3D space coordinates of the Hips joint points of each frame, the rotation matrixes of all the joint points of each frame, a preset father node list corresponding to all the joint points and a preset standard skeleton offset value list of all the joint points relative to the corresponding father node, wherein the method is used for calculating the 3D space coordinates of all the joint points of each frame by taking the result of the product of the rotation matrixes of the joint points and the corresponding standard skeleton offset values of the joint points and then the sum of the product and the 3D space coordinates of the corresponding father nodes as the 3D space coordinates of the joint points;
subtracting the 3D space coordinates of all the joint points of the previous frame corresponding to the first frame from the 3D space coordinates of all the joint points of each frame except the first frame to obtain a difference value, and multiplying the difference value by the animation frame rate to obtain the velocity vectors of all the joint points of each frame except the first frame;
reading the velocity vectors of a preset first joint point, a preset second joint point, a preset third joint point and a preset fourth joint point of each frame except the first frame, respectively judging the magnitude relation between the velocity vector of each preset joint point and the preset velocity vector, generating velocity judgment Boolean values of the preset first joint point, the preset second joint point, the preset third joint point and the preset fourth joint point, and splicing the velocity judgment Boolean values to obtain a first Boolean vector of each frame except the first frame;
reading 3D space coordinates of a preset first joint point, a preset second joint point, a preset third joint point and a preset fourth joint point of each frame except the first frame, respectively judging the magnitude relation between the y coordinate value of each preset joint point and a preset length value, generating distance judgment Boolean values of the preset first joint point, the preset second joint point, the preset third joint point and the preset fourth joint point, and splicing the distance judgment Boolean values to obtain a second Boolean vector of each frame except the first frame;
performing AND operation on the first Boolean vector and the second Boolean vector of each frame except the first frame to obtain a four-dimensional Boolean vector of each frame except the first frame,
and saving the 6D rotation vectors, the 3D space coordinates, the speed vectors and the four-dimensional Boolean vectors of all the joint points of each frame except the first frame as a data set, wherein the data set with a first preset proportion is divided into a training set, and the data set with a second preset proportion is divided into a testing set.
Preferably, the motion completion model in S102 includes: the system comprises a first mapping sub-network, a second mapping sub-network, a third mapping sub-network, a gated neural network and 128 expert neural networks EPi, wherein the first mapping sub-network, the second mapping sub-network and the third mapping sub-network form a coding system, motion state quantities are coded onto 768-dimensional hidden vectors, the gated neural network receives the hidden vectors and makes decisions, different expert neural networks EPi and corresponding weights are selected to perform weighted fusion of hidden vector calculation, and the expert neural networks EPi calculate the hidden vectors to obtain motion state increment of the next frame.
Preferably, the S103 includes:
s1031: appointing a past frame as a current frame, acquiring the distance frame number of the current frame from a target frame, and initializing a transition frame number k = 1;
s1032: extracting 6D rotation vectors of all joint points of the current frame, speed vectors of all joint points of the current frame and four-dimensional Boolean vectors of the current frame, splicing the vectors into a first input vector, inputting a first mapping sub-network of the motion completion model, and outputting a first output vector Cs;
s1033: extracting 3D space coordinates of all joint points of a target frame and 6D rotation vectors of all joint points, subtracting the 3D space coordinates of all joint points of a current frame from the 3D space coordinates of all joint points of the target frame to obtain a first offset vector, subtracting the 6D rotation vectors of all joint points of the current frame from the 6D rotation vectors of all joint points of the target frame to obtain a second offset vector, splicing the first offset vector and the second offset vector into a second input vector, inputting the second mapping sub-network of the motion compensation model, and outputting a second output vector Co;
s1034: extracting 6D rotation vectors of all joint points of the target frame as third input vectors, inputting a third mapping sub-network of the motion completion model, and outputting third output vectors Ct;
s1035: splicing the first output vector Cs, the second output vector Co and the third output vector Ct into a vector D, inputting the vector D into a gated neural network of the motion completion model, outputting 128 weights Wi by the gated neural network, wherein the weights Wi are between 0 and 1 and correspond to 128 expert neural networks EPi, freezing the corresponding expert neural networks EPi if the weights are 0, and activating the corresponding expert neural networks EPi if the weights are not 0;
s1036: respectively inputting the vector D, the distance frame number and the standard deviation into an activated expert neural network EPi, outputting a posture increment Ti, i =0, 1, 2.. 127 calculated by each expert network, multiplying the Ti by a weight Wi to obtain a weighted posture increment Mi calculated by each expert network, adding the weighted posture increments Mi calculated by each expert network to obtain a posture increment Tk calculated by a kth transition frame, storing the posture increment Tk, subtracting 1 from the distance frame number, and numbering k = k +1 for the transition frame;
s1037: if the distance frame number is greater than 0, the next transition frame is designated as the current frame, the step returns to S1032, and if the distance frame number is 0, the step enters S1038;
s1038: and splicing the obtained attitude increment Tk of the S transition frames to obtain the results T of all the transition frames in the action sequence group, namely the output result of the action completion model.
Preferably, the S104 includes:
s1041: obtaining 6D rotation vector increment of all joint points of all transition frames predicted by the motion completion model, 3D space coordinate increment of Hips joint points of all predicted transition frames and four-dimensional Boolean vectors of all predicted transition frames from results T of all transition frames;
s1042: adding the 6D rotation vector increment of all the joint points of all the predicted transition frames to the 6D rotation vectors of all the joint points of the corresponding previous frame in the action sequence group to obtain the 6D rotation vectors of all the joint points of all the predicted transition frames, and calculating 2 norms of the difference values between the 6D rotation vectors of all the joint points of all the predicted transition frames and the 6D rotation vectors of all the joint points of corresponding transition frames in the action sequence group to obtain a first loss term;
s1043: adding the predicted 3D space coordinate increment of the Hips joint points of all the transition frames to the 3D space coordinate of the Hips joint point of the previous frame corresponding to the action sequence group to obtain the predicted 3D space coordinate of the Hips joint points of all the transition frames;
taking the first 3 values of the 6D rotation vectors of all the joint points of all the predicted transition frames as a vector x, taking the last 3 values as a vector y, standardizing the vector x to obtain an updated vector x, solving a cross product of the vector y and the updated vector x to obtain a vector z, standardizing the vector z to obtain an updated vector z, solving a cross product of the updated vector z and the updated vector x to obtain an updated vector y, and respectively taking the updated vector x, the updated vector y and the updated vector z as column vectors of a three-dimensional square matrix to obtain the rotation matrices of all the joint points of all the predicted transition frames;
calculating the 3D space coordinates of all the joint points of all the predicted transition frames according to the predicted 3D space coordinates of the Hips joint points of all the transition frames, the predicted rotation matrixes of all the joint points of all the transition frames, a preset father node list corresponding to all the joint points and a preset standard skeleton offset value list of all the joint points relative to the corresponding father nodes, wherein the method is used for calculating the 3D space coordinates of all the joint points of all the predicted transition frames by taking the result of the product of the rotation matrixes of the joint points and the corresponding standard skeleton offset values of the joint points and then the sum of the product and the 3D space coordinates of the corresponding father nodes as the 3D space coordinates of the joint points;
calculating 2 norms of differences between the predicted 3D space coordinates of all joint points of all transition frames and the 3D space coordinates of all joint points of the corresponding transition frames in the action sequence group to obtain a second loss term;
s1044: calculating 2 norms of differences between the predicted four-dimensional Boolean vectors of the transition frames and the four-dimensional Boolean vectors of the corresponding transition frames in the action sequence group to obtain third loss items;
s1045: in the action sequence group, summing the weights Wi of different samples of each expert neural network EPi to obtain the importance of each expert neural network EPi in the action sequence group, and calculating the square of the variance according to the importance of 128 expert neural networks EPi to obtain a fourth loss term;
s1046: and adding the first loss term, the second loss term, the third loss term and the fourth loss term to obtain a total loss term, optimizing the neural network parameters of the action completion model by using a preset optimizer according to the total loss term, verifying the action completion model by using the verification set, and selecting the neural network parameters which enable the loss function of the verification set to be convergent and have the lowest loss as the neural network parameters of the trained action completion model.
The application also provides an action completion method, which comprises the following steps:
s201: acquiring a target frame, at least two past frames, an animation frame rate, a standard deviation and a number of motion frames to be supplemented of motion data, and performing data preprocessing according to a father node list corresponding to all preset joint points and a standard skeleton offset value list corresponding to all preset joint points relative to the father nodes to obtain 6D rotation vectors of all joint points, 3D space coordinates of all joint points, speed vectors of all joint points, four-dimensional Boolean vectors, 6D rotation vectors of all joint points of the target frame and 3D space coordinates of all joint points of the current frame;
s202: loading a trained motion completion model, and inputting an animation frame rate and a standard deviation, wherein the trained motion completion model is obtained by training the motion completion model by using the training method;
s203: performing data splicing on the 6D rotation vectors of all joint points of the current frame, the 3D space coordinates of all joint points, the velocity vectors and the four-dimensional Boolean vectors of all joint points, the 6D rotation vectors of all joint points of the target frame and the 3D space coordinates of all joint points to obtain a first input vector, a second input vector and a third input vector;
s204: inputting the first input vector, the second input vector, the third input vector and the number of motion frames to be compensated into the trained motion compensation model, and outputting predicted 6D rotation vector increment of all joint points of all transition frames and predicted 3D space coordinate increment of Hips joint points of all transition frames;
s205: adding the 6D rotation vector increment of all the joint points of all the predicted transition frames to the 6D rotation vector increment of all the joint points of the current frame to obtain the 6D rotation vector of all the joint points of the next transition frame, storing the 6D rotation vector increment of all the joint points of all the predicted transition frames, adding the 3D space coordinate increment of the Hips joint points of all the predicted transition frames to the 3D space coordinate of the Hips joint points of the current frame to obtain the 3D space coordinate of the Hips joint points of the next transition frame, storing the 3D space coordinate increment of the Hips joint points of the next transition frame, and subtracting 1 from the number of the action frames to be supplemented;
s206: if the number of the action frames to be supplemented is greater than 0, calculating to obtain rotation matrixes of all joint points of the current frame according to 6D rotation vectors of all joint points of the next transition frame, calculating to obtain 3D space coordinates, speed vectors and four-dimensional Boolean vectors of all joint points of the current frame according to the 3D space coordinates of the Hips joint points of the next transition frame and the rotation matrixes of all joint points of the current frame, returning to S203, and if the number of the action frames to be supplemented is equal to 0, entering S207;
s207: and 6D rotation vectors of all the joint points of each transition frame and 3D space coordinates of the Hips joint points are obtained, and the 6D rotation vectors of all the joint points of each transition frame are converted into a preset data format, so that the motion completion data of all the transition frames are obtained.
The application also provides a training device of the motion completion model, which comprises:
the training data acquisition module is used for acquiring a large amount of original motion data of human motion, performing data preprocessing according to a father node list corresponding to all preset joint points and a standard skeleton offset value list corresponding to all preset joint points relative to the father nodes to obtain 6D rotation vectors, speed vectors and four-dimensional Boolean vectors of all joint points and 3D space coordinates of all joint points of each frame except a first frame as a data set, and dividing the data set into a training set and a verification set;
the network model establishing module is used for establishing an action completion model and setting the standard deviation to a preset standard deviation value;
the network model training module is used for extracting a preset batch number of action data batches from a training set as input data of an action completion model, inputting the action completion model to obtain an output result of the action completion model, wherein each action data batch comprises a preset group number of action sequence groups, each action sequence group comprises a past frame, a target frame and S transition frames, and S is a preset group number of uniform sampling values in a preset sampling range value;
and the network model optimization module is used for calculating the total loss item of the system in each action sequence group according to the input data and the output result of the action completion model, and optimizing the action completion model according to the total loss item of the system to obtain the trained action completion model.
Preferably, the training data obtaining module includes:
the rotation matrix acquisition unit is used for acquiring the 3D space coordinates of the Hips joint points of each frame and the Euler angles of all the joint points in the original motion data, and performing data format conversion on the Euler angles of all the joint points of each frame to obtain the rotation matrix of all the joint points of each frame;
the rotation vector acquisition unit is used for reserving the first two column vectors of the rotation matrix of all the joint points of each frame and compressing the two column vectors into a 6-dimensional vector to obtain 6D rotation vectors of all the joint points of each frame;
the spatial coordinate acquisition unit is used for calculating the 3D spatial coordinates of all the joint points of each frame according to the 3D spatial coordinates of the Hips joint points of each frame, the rotation matrixes of all the joint points of each frame, a preset father node list corresponding to all the joint points and a preset standard skeleton offset value list corresponding to all the joint points relative to the father nodes, wherein the 3D spatial coordinates of all the joint points of each frame are obtained by adopting a method that the rotation matrixes of the joint points are multiplied by the corresponding standard skeleton offset values and then are summed with the 3D spatial coordinates of the corresponding father nodes to serve as the 3D spatial coordinates of the joint points;
the speed vector acquisition unit is used for subtracting the 3D space coordinates of all the joint points of the previous frame corresponding to the frame from the 3D space coordinates of all the joint points of each frame except the first frame to obtain a difference value, and multiplying the difference value by the animation frame rate to obtain the speed vectors of all the joint points of each frame except the first frame;
a boolean vector acquisition unit, configured to read velocity vectors of a preset first joint point, a preset second joint point, a preset third joint point, and a preset fourth joint point of each frame except for the first frame, respectively determine a magnitude relationship between the velocity vector of each preset joint point and the preset velocity vector, generate velocity determination boolean values of the preset first joint point, the preset second joint point, the preset third joint point, and the preset fourth joint point, and concatenate the velocity determination boolean values to obtain a first boolean vector of each frame except for the first frame; reading 3D space coordinates of a preset first joint point, a preset second joint point, a preset third joint point and a preset fourth joint point of each frame except the first frame, respectively judging the magnitude relation between the y coordinate value of each preset joint point and a preset length value, generating distance judgment Boolean values of the preset first joint point, the preset second joint point, the preset third joint point and the preset fourth joint point, and splicing the distance judgment Boolean values to obtain a second Boolean vector of each frame except the first frame; performing AND operation on the first Boolean vector and the second Boolean vector of each frame except the first frame to obtain a four-dimensional Boolean vector of each frame except the first frame,
and the data set dividing unit is used for saving the 6D rotation vectors, the 3D space coordinates, the speed vectors and the four-dimensional Boolean vectors of all the joint points of each frame except the first frame as data sets, wherein the data sets with a first preset proportion are divided into training sets, and the data sets with a second preset proportion are divided into testing sets.
The present application further provides a computer device, comprising:
a memory for storing a computer program;
and the processor is used for realizing the steps of the training method of the motion completion model when the computer program is executed.
The present application also provides a computer-readable storage medium, in which a computer program is stored, which, when being executed by a processor, carries out the steps of the training method of the above-mentioned motion completion model.
The application provides a training method of an action completion model, which comprises the following steps: acquiring a large amount of original motion data of human motion, preprocessing the original motion data to obtain a training set and a verification set, establishing an action completion model, extracting training-set data to form action data batches that are input into the model for training, calculating the total loss term of the model in each action sequence group, and optimizing the model according to the total loss term to obtain the trained action completion model.
Therefore, the method obtains a data set by collecting and preprocessing a large amount of original motion data of human body movement, and trains and optimizes the action completion model on this data set to obtain a trained action completion model. Based on the trained model, the data of a large number of action transition frames can be predicted and generated automatically; the method is suitable for scenes with long generation times, many transition frames and complex actions, and the generated transition frames move realistically, which reduces the labor cost required by the original interpolation scheme, improves project quality and shortens the project period. The application also provides an action completion method, a training device for the action completion model, computer equipment and a computer-readable storage medium, which all have the above beneficial effects and are not repeated here.
Drawings
In order to more clearly illustrate the embodiments of the present application or the technical solutions in the prior art, the drawings needed to be used in the description of the embodiments or the prior art will be briefly introduced below, it is obvious that the drawings in the following description are only embodiments of the present application, and for those skilled in the art, other drawings can be obtained according to the provided drawings without creative efforts.
Fig. 1 is a flowchart of a training method of an action completion model according to an embodiment of the present disclosure;
fig. 2 is a schematic structural diagram of an action completion model according to an embodiment of the present disclosure;
fig. 3 is a schematic structural diagram of a first mapping sub-network according to an embodiment of the present disclosure;
fig. 4 is a schematic structural diagram of a second mapping sub-network according to an embodiment of the present disclosure;
fig. 5 is a schematic structural diagram of a third mapping sub-network according to an embodiment of the present disclosure;
fig. 6 is a schematic structural diagram of a gate control neural network provided in an embodiment of the present application;
fig. 7 is a schematic structural diagram of an expert neural network provided in an embodiment of the present application;
fig. 8 is a flowchart of an action completion method according to an embodiment of the present application;
fig. 9 is a flowchart of a data preprocessing method according to an embodiment of the present application;
fig. 10 is a schematic structural diagram of a training device for an action completion model according to an embodiment of the present application.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present application clearer, the technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are some embodiments of the present application, but not all embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
In applications such as three-dimensional animation and games, three-dimensional virtual characters are driven by animation data to perform various movements such as walking, running, jumping and dancing. The animation data may be produced manually by an animator or captured with a motion capture device. Animation data captured by a motion capture device often cannot be used directly, and unsatisfactory motion segments usually need to be modified manually by an animator. In both cases, therefore, manual editing by an animator is required. When using animation software, an animator edits or modifies a number of key frames, designates several key frames that are related to one another, and lets the mathematical interpolation method built into the animation software generate a specified number of transition frames between them, thereby obtaining a smooth action sequence group. However, this approach is only suitable for motion transitions with a short generation time, a small number of transition frames and simple motion. In actual production, an animator has to edit or modify a large number of key frames, and if the key frames are too few, the time intervals too short, or the motion too complex, interpolation yields unrealistic and unreasonable transition sequences. The existing scheme therefore depends heavily on the skill and experience of the animator, which increases labor cost; because unreasonable transition sequences occur easily and the technical level of animators varies, project quality also suffers; and the need to debug and edit a large number of key frames greatly lengthens the project period.
Based on the above problems, this embodiment provides a training method for a motion completion model. A large amount of raw motion data of human body movement is collected and preprocessed to obtain a data set, and the motion completion model is trained and optimized on this data set to obtain a trained motion completion model. The trained model can predict and automatically generate the data of a large number of motion transition frames, is suitable for scenes with long generation times, many transition frames and complex motions, and produces transition frames with realistic motion, thereby reducing the labor cost required by the original interpolation scheme, improving project quality and shortening the project period. Specifically, refer to fig. 1, which is a flowchart of the training method for a motion completion model provided by this embodiment of the present application. The method specifically includes:
s101: acquiring a large amount of original motion data of human motion, performing data preprocessing according to a father node list corresponding to all preset joint points and a standard skeleton deviation value list corresponding to all preset joint points relative to the father nodes to obtain 6D rotation vectors, speed vectors, four-dimensional Boolean vectors and 3D space coordinates of all joint points of each frame except a first frame as a data set, and dividing the data set into a training set and a verification set.
The present embodiment does not limit the manner of collecting the original motion data of human body motion; any manner that yields motion data meeting the requirements of this embodiment is acceptable. For example, a large amount of human motion data can be acquired with a Vicon motion capture device, where the number of joint points of the human skeleton is preset to 33 and the frame rate is 25 FPS.
Specific joint points include: {0, Hips}, {1, Spine}, {2, Spine1}, {3, Spine2}, {4, Spine3}, {5, Neck}, {6, Neck1}, {7, Head}, {8, HeadEnd}, {9, RightShoulder}, {10, RightArm}, {11, RightForeArm}, {12, RightHand}, {13, RightHandThumb}, {14, RightHandMiddle}, {15, LeftShoulder}, {16, LeftArm}, {17, LeftForeArm}, {18, LeftHand}, {19, LeftHandThumb}, {20, LeftHandMiddle}, {21, RightUpLeg}, {22, RightLeg}, {23, RightFoot}, {24, RightForeFoot}, {25, RightToeBase}, {26, RightToeBaseEnd}, {27, LeftUpLeg}, {28, LeftLeg}, {29, LeftFoot}, {30, LeftForeFoot}, {31, LeftToeBase}, {32, LeftToeBaseEnd}. The number before the comma in each pair is the serial number of the joint point, and the name after it is the name of the joint.
The rotation axes of each joint point are: {0, {X, Y, Z}}, {1, {X, Y, Z}}, {2, {X, Y, Z}}, {3, {X, Y, Z}}, {4, {X, Y, Z}}, {5, {X, Y, Z}}, {6, {X, Y, Z}}, {7, {X, Y, Z}}, {8, {}}, {9, {X, Z}}, {10, {X, Y, Z}}, {11, {X, Y}}, {12, {X, Z}}, {13, {}}, {14, {}}, {15, {X, Z}}, {16, {X, Y, Z}}, {17, {X, Y}}, {18, {X, Z}}, {19, {}}, {20, {X, Z}}, {21, {X, Y, Z}}, {22, {X}}, {23, {X, Y, Z}}, {24, {}}, {25, {X}}, {26, {}}, {27, {X, Y, Z}}, {28, {X}}, {29, {X, Y, Z}}, {30, {}}, {31, {X}}, {32, {}}. Here 0-32 are the serial numbers of the joint points, and X, Y, Z denote the X-axis, Y-axis and Z-axis, respectively.
The preset parent-node list corresponding to all joint points is: {0, {}}, {1, {0}}, {2, {1}}, {3, {2}}, {4, {3}}, {5, {4}}, {6, {5}}, {7, {6}}, {8, {7}}, {9, {4}}, {10, {9}}, {11, {10}}, {12, {11}}, {13, {12}}, {14, {12}}, {15, {4}}, {16, {15}}, {17, {16}}, {18, {17}}, {19, {18}}, {20, {18}}, {21, {0}}, {22, {21}}, {23, {22}}, {24, {23}}, {25, {24}}, {26, {25}}, {27, {0}}, {28, {27}}, {29, {28}}, {30, {29}}, {31, {30}}, {32, {31}}. The first number in each outer bracket (0-32) is the serial number of the joint point, and the number in the inner bracket is the serial number of its parent node.
Presetting a standard bone offset value list of all joint points relative to a corresponding father node as follows: {1, {0.0,10.973504172820446, -1.8905411922178885}, {2, {0.0,9.839155069029879,0.0} }, {3, {0.0,9.839155069029879,0.0} }, {4, {0.0,9.839155069029879,0.0} }, {5, {0.0,12.650342231609844,1.2650342231609846} }, {6, {0.0,5.627352749797471,0.0} }, {7, {0.0,5.627352749797471,0.0} }, {8, {0.0,9.727720099469252,0.0} }, {9, {0.0, 68653, 7.079303955478363} }, {10, {0.0,18.540949701045772,0.0} }, {11, {0.0,28.886655990727327,0.0} }, {12, {0.0,22.323240911467128,0.0} }, {13, {3.0643109737346546, 3.0643109737346546, {0, 3.0643109737346546, { 72, { 0.72, {3.0643109737346546, }, {0.0,8.673840303542688, -0.055601540407324915} }, {21, { -9.336602969020769,0.0,0.0} }, {22, {0.0, -41.79405063115827,0.0} }, {23, {0.0, -39.71819658810898,0.0} }, {24, {0.0, -3.5319294628875477,0.0} }, {25, {0.0,0.0,13.83215668621111} }, {26, {0.0,0.0,7.599936005436189} }, {27, {9.336602969020769,0.0,0.0} }, {28, {0.0, -41.79405063115827,0.0} }, {29, {0.0, -39.71819658810898,0.0} }, {30, {0.0, -3.5319294628875477,0.0} }, {31, {0.0,0.0,13.83215668621111} }, {32, {0.0,0.0, 0. 7.329371066117648 }.0. Wherein the first numbers 1-32 in the outer brackets represent the serial numbers of the joint points, and the data in the inner brackets represent the standard bone offset values, which are three-dimensional vectors.
In one implementable embodiment, the method of step S101 comprises:
s1011: and acquiring the 3D space coordinates of the Hips joint points and the Euler angles of all the joint points of each frame in the original motion data, and performing data format conversion on the Euler angles of all the joint points of each frame to obtain the rotation matrix of all the joint points of each frame.
The raw motion data include the 3D space coordinates of the Hips joint point (the root node) of each frame and the Euler angles of all joint points, i.e., rotation is expressed with Euler angles. The Euler angle data can be converted into rotation matrices using the scipy mathematical tool library.
S1012: and reserving the first two column vectors of the rotation matrix of all the joint points of each frame and compressing the column vectors into a 6-dimensional vector to obtain 6D rotation vectors of all the joint points of each frame.
S1013: and calculating the 3D space coordinates of all the joint points of each frame according to the 3D space coordinates of the Hips joint points of each frame, the rotation matrixes of all the joint points of each frame, a preset father node list corresponding to all the joint points and a preset standard skeleton offset value list of all the joint points relative to the corresponding father node, wherein the 3D space coordinates of all the joint points of each frame are obtained by taking the result of the summation of the product of the rotation matrixes of the joint points and the corresponding standard skeleton offset values of the joint points and the 3D space coordinates of the corresponding father nodes as the 3D space coordinates of the joint points.
S1014: and subtracting the 3D space coordinates of all the joint points of the previous frame corresponding to the frame from the 3D space coordinates of all the joint points of each frame except the first frame to obtain a difference value, and multiplying the difference value by the animation frame rate to obtain the velocity vectors of all the joint points of each frame except the first frame.
Since the first frame data has no previous frame data, the velocity vectors of all the joint points of the first frame cannot be solved, and therefore only the velocity vectors of all the joint points of each frame except the first frame can be obtained.
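A short sketch of S1014, assuming the per-frame joint coordinates are stacked into an array of shape (num_frames, num_joints, 3) and using the 25 FPS frame rate mentioned earlier:

```python
import numpy as np

def joint_velocities(coords, frame_rate=25.0):
    """coords: (num_frames, num_joints, 3) 3D space coordinates of all joints per frame.

    Returns velocity vectors for every frame except the first: (num_frames - 1, num_joints, 3).
    """
    # (current frame - previous frame) * animation frame rate
    return (coords[1:] - coords[:-1]) * frame_rate
```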
S1015: reading the velocity vectors of a preset first joint point, a preset second joint point, a preset third joint point and a preset fourth joint point of each frame except the first frame, respectively judging the magnitude relation between the velocity vector of each preset joint point and the preset velocity vector, generating velocity judgment Boolean values of the preset first joint point, the preset second joint point, the preset third joint point and the preset fourth joint point, and splicing the velocity judgment Boolean values to obtain a first Boolean vector of each frame except the first frame; reading 3D space coordinates of a preset first joint point, a preset second joint point, a preset third joint point and a preset fourth joint point of each frame except the first frame, respectively judging the magnitude relation between the y coordinate value of each preset joint point and a preset length value, generating distance judgment Boolean values of the preset first joint point, the preset second joint point, the preset third joint point and the preset fourth joint point, and splicing the distance judgment Boolean values to obtain a second Boolean vector of each frame except the first frame; and the first Boolean vector and the second Boolean vector of each frame except the first frame are subjected to AND operation to obtain the four-dimensional Boolean vector of each frame except the first frame.
The preset first joint point, the preset second joint point, the preset third joint point and the preset fourth joint point are RightForeFoot, RightToeBaseEnd, LeftForeFoot and LeftToeBaseEnd, respectively. The preset velocity value is not limited here and may be set according to specific needs, for example to 0.7 m/s. The preset length value is likewise not limited and may be set according to specific needs, for example to 5 cm. Specifically, according to the velocity vectors of the RightForeFoot, RightToeBaseEnd, LeftForeFoot and LeftToeBaseEnd joint points of the current frame, if the velocity at a joint point is greater than or equal to 0.7 m/s, its velocity judgment Boolean value is assigned 0; if it is less than 0.7 m/s, the velocity judgment Boolean value is assigned 1. The Boolean values of the four joint points are concatenated to obtain the first Boolean vector. According to the 3D space coordinates of the RightForeFoot, RightToeBaseEnd, LeftForeFoot and LeftToeBaseEnd joint points of the current frame, if the y coordinate of a joint point is greater than or equal to 5 cm, its distance judgment Boolean value is assigned 0; if the y coordinate is less than 5 cm, the distance judgment Boolean value is assigned 1. The distance judgment Boolean values of the four joint points are concatenated to obtain the second Boolean vector, and the first Boolean vector and the second Boolean vector of each frame except the first frame are ANDed to obtain the four-dimensional Boolean vector of each frame except the first frame.
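The contact-label rule of S1015 can be sketched as follows, using the example thresholds quoted above (0.7 m/s and 5 cm); comparing velocity magnitudes rather than full vectors, and the unit handling, are assumptions:

```python
import numpy as np

def contact_labels(foot_vel, foot_pos_y, vel_thresh=0.7, height_thresh=5.0):
    """foot_vel   : (4, 3) velocity vectors of the four preset foot joints (m/s).
       foot_pos_y : (4,)   y coordinates of the same joints (cm).

    Returns the four-dimensional Boolean vector of the frame.
    """
    speed = np.linalg.norm(foot_vel, axis=1)
    slow_enough = (speed < vel_thresh).astype(np.int32)          # first Boolean vector
    low_enough = (foot_pos_y < height_thresh).astype(np.int32)   # second Boolean vector
    return slow_enough & low_enough                              # element-wise AND
```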
S1016: and saving the 6D rotation vectors, the 3D space coordinates, the speed vectors and the four-dimensional Boolean vectors of all the joint points of each frame except the first frame as a data set, wherein the data set with a first preset proportion is divided into a training set, and the data set with a second preset proportion is divided into a testing set.
The data set is divided into a training set and a verification set, wherein the proportion of the training set accounts for 80 percent, and the proportion of the testing set accounts for 20 percent.
S102: and establishing an action completion model, and setting the standard difference value as a preset standard difference value.
The preset standard deviation value is sigma = 0.01; setting the standard deviation to 0.01 ensures that the variability and diversity of the completed motion remain in an ideal range.
In one implementation, fig. 2 is a schematic structural diagram of the motion completion model provided by the embodiment of the present application. The motion completion model comprises a first mapping sub-network E1, a second mapping sub-network E2, a third mapping sub-network E3, a gated neural network G and 128 expert neural networks EPi (i = 1, 2, 3, ..., 128). The first mapping sub-network E1, the second mapping sub-network E2 and the third mapping sub-network E3 form an encoding system that encodes the motion state quantities into a 768-dimensional hidden vector; the gated neural network G receives the hidden vector and makes a decision, selecting different expert neural networks EPi and corresponding weights for weighted fusion of the hidden-vector computation; and the expert neural networks EPi compute, from the hidden vector, the motion state increment of the next frame, from which all motion data of all transition frames are obtained.
The functions of the motion completion model include: the system generates the motion data of the intermediate transition frames from the motion data of a past frame and the motion data of a target frame. The advantages include: with a large data set and a deep-learning method, the system can automatically generate reasonable transition-frame animation, reducing manual intervention by the animator, and it can synthesize richer and more realistic animation than simpler interpolation techniques. By adopting a mixture-of-experts technique and adding a position-information module, the system gains strong synthesis capability and robustness. By adding a control-information module, the system increases the randomness of synthesis, allowing the user to control the synthesis of transition-frame animation under different, changing scenarios and guaranteeing the flexibility of the system in a production environment. The design of the added position-information module also accelerates the convergence of model training and makes the generated transition-frame animation smooth.
Specifically, fig. 3, fig. 4 and fig. 5 show the structures of the first mapping sub-network, the second mapping sub-network and the third mapping sub-network provided in an embodiment of the present application. In all neural network structures, square brackets indicate data dimensions, J indicates the number of joint points, and J = 33 in this embodiment. A fully connected layer [m, n] is a parameterized layer computing y = Wx + b, where the matrix W (of size n × m) and the bias vector b (of size n) are the model parameters. The linear rectification with leakage (leaky ReLU) is LeakyReLU(x) = max(x, λx), where λ = 0.01.
The first, second and third mapping sub-networks each consist of fully connected layers and linear rectification with leakage. Each of Figs. 3-5 is a parameterized, learnable composite function from input to output whose role is to map the motion input data onto a hidden space of dimension 256. The fully connected layers provide the linear mapping and the leaky linear rectification provides the nonlinear mapping, which benefits feature extraction and model training of the neural network.
Specifically, in fig. 3, the input data [J × 9 + 4] indicates that the dimension of the input data is (J × 9 + 4), where J = 33, matching the input dimension of the model; the fully connected layer [J × 9 + 4, 1024] transforms the (J × 9 + 4)-dimensional data into 1024-dimensional data; the fully connected layer [1024, 256] transforms the 1024-dimensional data into 256-dimensional data; and the output data [256] is a hidden-space vector of dimension 256. In fig. 4, the input data [J × 9] has dimension (J × 9), where J = 33; the fully connected layer [J × 9, 1024] transforms it into 1024-dimensional data; the fully connected layer [1024, 256] transforms that into 256-dimensional data; and the output data [256] is a hidden-space vector of dimension 256. In fig. 5, the input data [J × 6] has dimension (J × 6), where J = 33; the fully connected layer [J × 6, 1024] transforms it into 1024-dimensional data; the fully connected layer [1024, 256] transforms that into 256-dimensional data; and the output data [256] is a hidden-space vector of dimension 256. Here 1024 is a fully-connected-layer width commonly used in deep learning; it is an empirical value and is generally a power of 2, such as 512, 1024 or 2048.
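For illustration, a minimal PyTorch sketch of one mapping sub-network with the layer widths quoted above; the class and variable names are illustrative, and the exact placement of the leaky rectification between the two fully connected layers is an assumption:

```python
import torch
import torch.nn as nn

J = 33  # number of joint points in this embodiment

class MappingSubNetwork(nn.Module):
    """Fully connected layers with leaky linear rectification, mapping motion input data
    to a 256-dimensional hidden vector (Figs. 3-5).
    """
    def __init__(self, in_dim, hidden=1024, out_dim=256, negative_slope=0.01):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(in_dim, hidden),
            nn.LeakyReLU(negative_slope),   # linear rectification with leakage, lambda = 0.01
            nn.Linear(hidden, out_dim),
            nn.LeakyReLU(negative_slope),
        )

    def forward(self, x):
        return self.net(x)

# The three 256-dimensional outputs are concatenated into the 768-dimensional hidden vector D.
first_map = MappingSubNetwork(J * 9 + 4)   # input vector of S1032
second_map = MappingSubNetwork(J * 9)      # input vector of S1033
third_map = MappingSubNetwork(J * 6)       # input vector of S1034
```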
Fig. 6 is a schematic structural diagram of the gated neural network provided in an embodiment of the present application. The noise adding module in fig. 6 operates element by element: to the i-th element of the gating activation it adds the term StandardNormal() · Softplus(n_i), where i indexes the elements of each vector, StandardNormal() is Gaussian noise with mean 0 and standard deviation 1, n_i is the i-th element of the output of a parallel fully connected layer, Softplus(x) = ln(1 + e^x), and the addition of vectors is performed element-wise. The maximum K values are then retained (K = 16 in this embodiment) and all other elements are discarded. For normalization, the Softmax function is used: Softmax(x)_i = e^{x_i} / Σ_j e^{x_j}.
the gated neural network comprises a full connection layer, a noise adding module, a maximum K value and normalization. FIG. 8 is a parameterized, learnable combinatorial function from input to output. The functions include: the learnable gated neural network G reads the motion features in the hidden space with the dimension 768, and obtains 128 sparse (most of the weights are 0) weights, which correspond to 128 different expert networks EPi, respectively. The gated neural network G is used for learning and judgment, which expert network should select for subsequent data processing. The working principle is as follows: in a hidden space with the dimensionality of 768, a full-connection layer is used for carrying out linear transformation on input, and 128 sparse weights are obtained. In addition, a parallel full link layer linearly transforms the input for noise addition. After preserving the maximum K values and normalizing, 128 sparse weights can be obtained, and it is guaranteed that all components are added up to 1. The advantages include: based on the thought of 'professional expertise', different expert networks are selected for calculation according to different action states, and the modeling capacity of the system can be greatly improved through the module; because the role animation has many complex changes, the module can disambiguate different action changes, thereby being beneficial to the model training of the system; the module can help the system to improve robustness because it does not rely on an expert neural network, but rather uses a hybrid expert model.
Fig. 7 is a schematic structural diagram of an expert neural network provided in an embodiment of the present application. The position-information adding module in fig. 7 depends on the value of the system input s (the number of frames from the current frame to the target frame) and adds, in the hidden space, a position vector of dimension 256 generated from s by known functions. The control-information adding module depends on the values of the system inputs s and sigma: 256 samples are drawn from a normal distribution with mean 0 and standard deviation sigma to form a random vector vr(sigma), which is likewise added in the hidden space.
The functions of the expert neural network include: the learnable expert neural network EPi reads the action features in the hidden space with dimension 768 and outputs the action state increment of the next frame. This state increment is added to the action state of the current frame to obtain the action state of the next frame, and by iterating this calculation a completed motion animation can be obtained. The working principle is as follows: through the add-position-information module, the system can identify the position of the current frame by introducing the number of frames s remaining to the target frame, which is beneficial to the convergence of model training; through the add-control-information module, the user can control the variability of the completed animation by setting sigma, which adds noise in the hidden space. The advantages include: the module automatically completes reasonable action data frames by receiving the feature vectors of the hidden space; and with the add-control-information module the system changes from a deterministic system into a stochastic system, so that the user can generate different but equally reasonable action data frames. The design of the add-control-information module, which injects control information into the hidden vector, makes the system a stochastic model; adopting this method increases the diversity of the actions generated by the system.
Specifically, in fig. 7, the input data A, the input data B and the input data C are the output data of the first mapping sub-network, the second mapping sub-network and the third mapping sub-network, respectively; the fully connected layer [768, 1024] represents that the fully connected layer transforms the 768-dimensional data into 1024-dimensional data; the fully connected layer [1024, J × 6 + 7] represents that the fully connected layer transforms the 1024-dimensional data into (J × 6 + 7)-dimensional data; and the output data [J × 6 + 7] represents that the output data has dimension (J × 6 + 7), with J = 33. Here 1024 is a commonly used width for fully connected layers in deep learning; it is an empirical value and is generally a power of 2, such as 512, 1024 or 2048.
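The following sketch combines the two add-information modules with the fully connected layers of fig. 7. Where exactly the 256-dimensional position and control vectors are added to the 768-dimensional hidden vector is not stated, so here they are assumed to be added to its first 256 dimensions, and the sinusoidal position encoding follows the form given above:

```python
import torch
import torch.nn as nn

J = 33

def position_vector(s, dim=256):
    """Sinusoidal encoding of the number of frames s remaining to the target frame (assumed form)."""
    i = torch.arange(0, dim // 2, dtype=torch.float32)
    freq = 1.0 / (10000 ** (2 * i / dim))
    pe = torch.zeros(dim)
    pe[0::2] = torch.sin(s * freq)
    pe[1::2] = torch.cos(s * freq)
    return pe

class ExpertNetwork(nn.Module):
    """One expert EPi of fig. 7: add position/control information, then two fully connected layers."""
    def __init__(self, latent_dim=768, hidden_dim=1024, out_dim=J * 6 + 7):
        super().__init__()
        self.fc1 = nn.Linear(latent_dim, hidden_dim)
        self.fc2 = nn.Linear(hidden_dim, out_dim)

    def forward(self, d, s, sigma):
        extra = position_vector(s) + sigma * torch.randn(256)         # position + control information
        d = torch.cat([d[..., :256] + extra, d[..., 256:]], dim=-1)   # assumed injection point
        return self.fc2(torch.relu(self.fc1(d)))                      # action state increment of the next frame

expert = ExpertNetwork()
print(expert(torch.randn(768), s=10, sigma=0.5).shape)  # torch.Size([205]) = J * 6 + 7
```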
By adopting the gated neural network and expert neural network techniques, the model can be applied to large motion capture datasets. The system can automatically select the most suitable expert neural networks for mixed calculation according to different action states, which effectively improves the robustness and performance of the model.
S103: extracting a preset batch number of action data batches from the training set as input data of the action completion model and inputting them into the action completion model to obtain an output result of the action completion model, wherein each action data batch comprises a preset group number of action sequence groups, each action sequence group comprises a past frame, a target frame and S transition frames, and S is a value sampled uniformly within a preset sampling range.
In this embodiment, the preset number of batches is not limited, the composition of the action sequence groups is not limited, and the preset sampling range is not limited; the manner of extracting the action data from the training set may be designed according to specific requirements. The data extraction method adopted by this embodiment may be as follows: the data of the training set is divided into different action sequence groups and input into the action completion model in batches. For example, the number of batches is determined according to the data volume of the data set, each batch comprises 32 action sequence groups, and each action sequence group may comprise 1 past frame, S transition frames and 1 target frame, where S may be a value sampled uniformly from 5 to 100. During training, the model is trained batch by batch on these inputs.
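As an illustrative sketch of how one action sequence group and one batch could be cut from a preprocessed motion clip (the array shapes and the helper name are assumptions, not part of the original description):

```python
import numpy as np

def sample_action_sequence_group(frames, rng, s_min=5, s_max=100):
    """Cut one action sequence group (1 past frame, S transition frames, 1 target frame) from a clip."""
    S = rng.integers(s_min, s_max + 1)                  # uniformly sampled transition length
    start = rng.integers(0, len(frames) - (S + 2) + 1)  # need S + 2 consecutive frames
    group = frames[start:start + S + 2]
    return group[0], group[1:-1], group[-1]             # past frame, transition frames, target frame

rng = np.random.default_rng(0)
clip = np.zeros((500, 33 * 9 + 4))                      # a preprocessed motion clip (illustrative shape)
past, transitions, target = sample_action_sequence_group(clip, rng)
batch = [sample_action_sequence_group(clip, rng) for _ in range(32)]  # 32 groups per batch
```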
Specifically, a neural network forward operation is performed: data of the training set is extracted as the input of the motion completion model to obtain the output of the motion completion model. The method for extracting the data features of the data set comprises the following steps: the first mapping sub-network E1, the second mapping sub-network E2 and the third mapping sub-network E3 of the motion completion model are used to map the three motion input data to hidden spaces with dimension 256. The first input vector represents the action state of the current frame; the second input vector represents the deviation between the action state of the current frame and the action state of the target frame; the third input vector represents the action state of the target frame.
In one implementation, the method of step S103 includes:
S1031: designating a past frame as the current frame, acquiring the distance frame number of the current frame from the target frame, and initializing the transition frame number k = 1.
S1032: extracting the 6D rotation vectors of all the joint points of the current frame, the speed vectors of all the joint points of the current frame and the four-dimensional Boolean vectors of the current frame, splicing into a first input vector, inputting a first mapping sub-network of the motion completion model, and outputting a first output vector Cs.
S1033: extracting the 3D space coordinates of all the joint points of the target frame and the 6D rotation vectors of all the joint points, subtracting the 3D space coordinates of all the joint points of the current frame from the 3D space coordinates of all the joint points of the target frame to obtain a first offset vector, subtracting the 6D rotation vectors of all the joint points of the current frame from the 6D rotation vectors of all the joint points of the target frame to obtain a second offset vector, splicing the first offset vector and the second offset vector into a second input vector, inputting the second mapping sub-network of the motion completion model, and outputting a second output vector Co.
S1034: and extracting the 6D rotation vectors of all the joint points of the target frame as third input vectors, inputting a third mapping sub-network of the motion completion model, and outputting third output vectors Ct.
S1035: splicing the first output vector Cs, the second output vector Co and the third output vector Ct into a vector D, inputting the vector D into a gated neural network of the motion completion model, outputting 128 weights Wi by the gated neural network, wherein the value of the weight Wi is between 0 and 1, corresponding to 128 expert neural networks EPi, freezing the corresponding expert neural networks EPi if the weight is 0, and activating the corresponding expert neural networks EPi if the weight is not 0.
Wherein, expert weight calculation is carried out through the gated neural network G. And splicing the three 256-dimensional first output vectors Cs, second output vectors Co and third output vectors Ct to obtain a vector D, using the vector D as the input of the gated neural network G, and outputting to obtain sparse expert weights Wi. That is, the gated neural network G outputs a sparse weight Wi of not less than 0, whose dimension size is N = 128. The first output vector Cs represents: the implicit vector of the current frame attitude information, the second vector Co represents: the hidden vector of the offset information of the current frame and the target frame, and the third output vector Ct represents: and (4) a hidden vector of the target frame attitude information.
S1036: respectively inputting the vector D, the distance frame number and the standard deviation into the activated expert neural networks EPi, outputting the attitude increment Ti calculated by each expert network, i = 0, 1, 2, ..., 127, multiplying Ti by the weight Wi to obtain the weighted attitude increment Mi calculated by each expert network, adding the weighted attitude increments Mi calculated by the expert networks to obtain the attitude increment Tk calculated for the k-th transition frame, storing the attitude increment Tk, subtracting 1 from the distance frame number, and updating the transition frame number to k = k + 1.
The action completion model comprises 128 expert neural networks EPi with the same structure, the dimension of the weight corresponds to the expert neural networks EPi with different serial numbers, if the weight is 0, the corresponding expert neural network is frozen, and if the weight is more than 0, the expert neural network is activated.
The calculation uses the results of the 128 expert neural networks EPi: according to the 128 sparse weights Wi, several expert neural networks EPi are selected for weighted calculation with the vector D as the input data, and finally the attitude increment Tk calculated for the k-th transition frame is obtained, k = 1, 2, ..., S.
S1037: if the distance frame number is greater than 0, the next transition frame is designated as the current frame and the process returns to S1032; if the distance frame number is 0, the process proceeds to S1038;
wherein steps S1032 to S1037 are repeated S times to obtain the attitude increment Tk calculated for each transition frame.
S1038: and splicing the obtained attitude increment Tk of the S transition frames to obtain the results T of all the transition frames in the action sequence group, namely the output result of the action completion model.
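Putting the pieces together, a simplified rollout of steps S1031 to S1038 might look like the sketch below; it assumes the module sketches given earlier as arguments, and the update of the current-frame inputs between iterations is only indicated by a comment:

```python
import torch

def predict_transition_frames(E1, E2, E3, gate, experts, x1, x2, x3, S, sigma):
    """Roll out S transition frames from the first, second and third input vectors x1, x2, x3."""
    increments = []
    for k in range(1, S + 1):
        d = torch.cat([E1(x1), E2(x2), E3(x3)], dim=-1)   # 768-dimensional vector D
        w = gate(d)                                       # 128 sparse weights Wi
        tk = torch.zeros(experts[0].fc2.out_features)     # attitude increment Tk of the k-th frame
        for i, expert in enumerate(experts):
            if w[i] != 0:                                 # experts with weight 0 stay frozen
                tk = tk + w[i] * expert(d, S - k + 1, sigma)
        increments.append(tk)
        # here x1 and x2 would be updated from tk so that the current frame advances one step (S1036/S1037)
    return increments
```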
S104: and calculating the total loss item of the system in each action sequence group according to the input data and the output result of the action completion model, and optimizing the action completion model according to the total loss item of the system to obtain the trained action completion model.
In one implementation, the method of step S104 includes:
S1041: obtaining, from the results T of all transition frames, the 6D rotation vector increments of all joint points of all transition frames predicted by the motion completion model, the predicted 3D space coordinate increments of the Hips joint points of all transition frames, and the predicted four-dimensional Boolean vectors of all transition frames.
Wherein the result T of obtaining all transition frames can be divided into 6D rotation vector increments of all joint points of all transition frames predicted by the motion completion model, 3D spatial coordinate increments of Hips joint points of all transition frames predicted, and four-dimensional boolean vectors of all transition frames predicted.
S1042: and adding the 6D rotation vector increment of all the joint points of all the predicted transition frames to the 6D rotation vectors of all the joint points of the corresponding previous frame in the action sequence group to obtain the 6D rotation vectors of all the joint points of all the predicted transition frames, and calculating the 2 norm of the difference between the 6D rotation vectors of all the joint points of all the predicted transition frames and the 6D rotation vectors of all the joint points of corresponding transition frames in the action sequence group to obtain a first loss term.
S1043: adding the predicted 3D space coordinate increment of the Hips joint points of all the transition frames to the 3D space coordinate of the Hips joint point of the previous frame corresponding to the action sequence group to obtain the predicted 3D space coordinate of the Hips joint points of all the transition frames;
taking the first 3 values of the 6D rotation vectors of all the joint points of all the predicted transition frames as a vector x and the last 3 values as a vector y; normalizing the vector x to obtain an updated vector x; taking the cross product of the vector y and the updated vector x to obtain a vector z; normalizing the vector z to obtain an updated vector z; taking the cross product of the updated vector z and the updated vector x to obtain an updated vector y; and using the updated vectors x, y and z respectively as the column vectors of a three-dimensional square matrix to obtain the rotation matrices of all the joint points of all the predicted transition frames. The 3D space coordinates of all the joint points of all the predicted transition frames are then calculated from the predicted 3D space coordinates of the Hips joint points of all the transition frames, the predicted rotation matrices of all the joint points of all the transition frames, the preset father node list corresponding to all the joint points and the preset list of standard skeleton offset values of all the joint points relative to their corresponding father nodes; in this calculation, the 3D space coordinate of a joint point is obtained as the product of the rotation matrix of the joint point and the corresponding standard skeleton offset value, summed with the 3D space coordinate of the corresponding father node. Finally, the 2-norm of the difference between the predicted 3D space coordinates of all joint points of all transition frames and the 3D space coordinates of all joint points of the corresponding transition frames in the action sequence group is calculated to obtain a second loss term;
S1044: calculating the 2-norm of the difference between the predicted four-dimensional Boolean vectors of the transition frames and the four-dimensional Boolean vectors of the corresponding transition frames in the action sequence group to obtain a third loss term;
S1045: in the action sequence group, summing the weights Wi of each expert neural network EPi over the different samples to obtain the importance of each expert neural network EPi in the action sequence group, and calculating the square of the variance of the importances of the 128 expert neural networks EPi to obtain a fourth loss term;
S1046: adding the first loss term, the second loss term, the third loss term and the fourth loss term to obtain the total loss term, optimizing the neural network parameters of the action completion model with a preset optimizer according to the total loss term, verifying the action completion model with the verification set, and selecting the neural network parameters for which the loss function of the verification set converges with the lowest loss as the neural network parameters of the trained action completion model.
In the iteration of the action sequence group of each batch, according to the total loss item, an Adam optimizer can be used for optimizing the neural network parameters. Through continuous iteration, the loss function of the verification set is converged, namely the loss of the verification set is not reduced after 3 times of continuous traversal of the training set, and the neural network parameter with the lowest loss of the verification set is selected as the neural network parameter of the trained motion completion model.
Specifically, the design of the loss function helps the system to learn from the data set, resulting in a robust neural network model. The first loss function, the second loss function and the third loss function guarantee the precision of the calculation result; the fourth loss function guarantees load balancing of the expert neural network.
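A compact sketch of the four loss terms of steps S1042 to S1045 follows; it takes the "2 norm" as the Euclidean norm of the stacked differences and keeps the "square of the variance" exactly as stated in the text:

```python
import torch

def total_loss(pred_rot6d, true_rot6d, pred_pos3d, true_pos3d,
               pred_contact, true_contact, gate_weights):
    """Sum of the four loss terms; gate_weights has shape (num_samples, 128)."""
    l_rot = torch.norm(pred_rot6d - true_rot6d)          # first loss term: 6D rotations
    l_pos = torch.norm(pred_pos3d - true_pos3d)          # second loss term: 3D joint coordinates
    l_contact = torch.norm(pred_contact - true_contact)  # third loss term: 4D Boolean vectors
    importance = gate_weights.sum(dim=0)                 # per-expert importance within the group
    l_balance = importance.var() ** 2                    # fourth loss term ("square of the variance")
    return l_rot + l_pos + l_contact + l_balance
```

An Adam optimizer would then be stepped on this total loss for each batch, as described above.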
The embodiment provides an action completion method, which performs preprocessing and splicing by acquiring data of a past frame and a target frame, predicts and calculates data of a large number of action transition frames based on a trained action completion model, and is suitable for scenes with long generation time, many transition frame numbers and complex actions, the generated transition frames have real actions, the labor cost required by an original interpolation scheme is reduced, the project quality is improved, and the project period is shortened, specifically refer to fig. 8, where fig. 8 is a flowchart of the action completion method provided by the embodiment of the present application, and specifically includes:
s201: acquiring a target frame, at least two past frames, an animation frame rate, a standard deviation and a motion frame number to be supplemented of motion data, and performing data preprocessing according to a father node list corresponding to all preset joint points and a standard skeleton offset value list corresponding to all preset joint points relative to the father node to obtain 6D rotation vectors of all joint points, 3D space coordinates of all joint points, speed vectors of all joint points, four-dimensional Boolean vectors, 6D rotation vectors of all joint points of the target frame and 3D space coordinates of all joint points of the current frame.
The purpose of the past frame is to determine the current role action state, which refers to a frame before a transition frame to be supplemented, and the target frame is specified by a user and is used for meeting the requirements of the user, and is usually the action state modified by the user. That is, the target frame is the action state that the character should reach after the user wants to join the transition frame. The reason that at least two past frames of data are required here is that at least two frames of data are required to calculate the velocity vectors between frames because the velocity vectors of all the joint points of the current frame are calculated. The current frame here refers to the frame preceding the transition frame to be supplemented.
In this embodiment, the number of past frames is not limited and may be 2, 3 or more; the positions of the 2 past frames are also not limited, and they may or may not be adjacent frames, as long as the purpose of calculating the velocity vectors of all the joint points of the current frame can be achieved. For example, in a 20 FPS animation, 2 adjacent past frames differ by only 1/20 = 0.05 seconds; if the 2 past frames are not adjacent, the time between them is the number of frames separating them divided by the frame rate, and the position difference is divided by this time when the velocity is computed. The present embodiment does not limit the animation frame rate, and the animation frame rate may be set to 30 FPS. The standard deviation sigma is not limited in this embodiment; it is the standard deviation in the sense of probability theory, and sigma can adjust the variability of the action completion and is set according to requirements. The number of action frames to be completed is not limited and may be set by the user as needed. Specifically, the user can specify the animation data and animation frame rate of 2 past frames and 1 target frame, the standard deviation input sigma and the number of animation frames S to be completed through a user interaction interface provided by the plug-in.
The preset number of joint points of the human skeleton in this embodiment is 33, which specifically includes: {0, Hips}, {1, Spine}, {2, Spine1}, {3, Spine2}, {4, Spine3}, {5, Neck}, {6, Neck1}, {7, Head}, {8, HeadEnd}, {9, RightShoulder}, {10, RightArm}, {11, RightForeArm}, {12, RightHand}, {13, RightHandThumb}, {14, RightHandMiddle}, {15, LeftShoulder}, {16, LeftArm}, {17, LeftForeArm}, {18, LeftHand}, {19, LeftHandThumb}, {20, LeftHandMiddle}, {21, RightUpLeg}, {22, RightLeg}, {23, RightFoot}, {24, RightForeFoot}, {25, RightToeBase}, {26, RightToeBaseEnd}, {27, LeftUpLeg}, {28, LeftLeg}, {29, LeftFoot}, {30, LeftForeFoot}, {31, LeftToeBase}, {32, LeftToeBaseEnd}. Wherein the Hips joint point is the root joint point of the human skeleton.
The rotation axis of each joint point is: {0, { X, Y, Z } }, {1, { X, Y, Z } }, {2, { X, Y, Z } }, {3, { X, Y, Z } }, {4, { X, Y, Z } }, {5, { X, Y, Z } }, {6, { X, Y, Z } }, {7, { X, Y, Z } }, {8, { }, {9, { X, Z } }, {10, { X, Y, Z } }, {11, { X, Y } }, {12, { X, Z } }, {13, { }, {14, { }, {15, { X, Z } }, {16, { X, Y, Z } }, {17, { X, Y } }, {18, { X, Z } }, {19, { }, {20, { X, Z } }, {21, { X, Z } }, X, Y } }, {22, { X, Z } }, {22, { X, Y } }, { X, Z } }, {7, {3, { X, Y, Z } }, {3, { X, Z } }, {3, { X, Z } }, {3, { X, Z } }, {3, { X, Z } }, {3, { X, Z } }, X, Z } }, X, Z } }, and {22, Z } }, Z }, and {22, {6, Z } }, X, Z }, and {22, {6, Z }, and {22, Z } }, Z }, and {22, {6, Z }, and {22, Z }, a unit, X, and {6, X, Z } }, {6, X, and {6, Z }, in a unit, Z }, respectively, {23, { X, Y, Z } }, {24, { }, {25, { X } }, {26, { }, {27, { X, Y, Z } }, {28, { X } }, {29, { X, Y, Z } }, {30, { }, {31, { X } }, {32, { }. Where 0-32 represent the serial numbers of the joint points, X, Y, Z represent the X-axis, Y-axis, and Z-axis, respectively.
Presetting a father node list corresponding to all the joint points as follows: {0, {}}, {1, {0}}, {2, {1}}, {3, {2}}, {4, {3}}, {5, {4}}, {6, {5}}, {7, {6}}, {8, {7}}, {9, {4}}, {10, {9}}, {11, {10}}, {12, {11}}, {13, {12}}, {14, {12}}, {15, {4}}, {16, {15}}, {17, {16}}, {18, {17}}, {19, {18}}, {20, {18}}, {21, {0}}, {22, {21}}, {23, {22}}, {24, {23}}, {25, {24}}, {26, {25}}, {27, {0}}, {28, {27}}, {29, {28}}, {30, {29}}, {31, {30}}, {32, {31}}.
Presetting a standard bone offset value list of all joint points relative to a corresponding father node as follows: {1, {0.0,10.973504172820446, -1.8905411922178885}, {2, {0.0,9.839155069029879,0.0} }, {3, {0.0,9.839155069029879,0.0} }, {4, {0.0,9.839155069029879,0.0} }, {5, {0.0,12.650342231609844,1.2650342231609846} }, {6, {0.0,5.627352749797471,0.0} }, {7, {0.0,5.627352749797471,0.0} }, {8, {0.0,9.727720099469252,0.0} }, {9, {0.0, 68653, 7.079303955478363} }, {10, {0.0,18.540949701045772,0.0} }, {11, {0.0,28.886655990727327,0.0} }, {12, {0.0,22.323240911467128,0.0} }, {13, {3.0643109737346546, 3.0643109737346546, {0, 3.0643109737346546, { 72, { 0.72, {3.0643109737346546, }, {0.0,8.673840303542688, -0.055601540407324915} }, {21, { -9.336602969020769,0.0,0.0} }, {22, {0.0, -41.79405063115827,0.0} }, {23, {0.0, -39.71819658810898,0.0} }, {24, {0.0, -3.5319294628875477,0.0} }, {25, {0.0,0.0,13.83215668621111} }, {26, {0.0,0.0,7.599936005436189} }, {27, {9.336602969020769,0.0,0.0} }, {28, {0.0, -41.79405063115827,0.0} }, {29, {0.0, -39.71819658810898,0.0} }, {30, {0.0, -3.5319294628875477,0.0} }, {31, {0.0,0.0,13.83215668621111} }, {32, {0.0,0.0, 0. 7.329371066117648 }.0.
In an implementation manner, fig. 9 is a flowchart of a data preprocessing method provided in an embodiment of the present application, and step S201 includes:
S2011: acquiring the data of a first past frame, a second past frame and a target frame of the motion data to be supplemented, and performing data format conversion on the Euler angles of all the joint points of the first past frame, the second past frame and the target frame to obtain the rotation matrices of all the joint points of the first past frame, the second past frame and the target frame.
Here, the present embodiment employs 2 adjacent past frames, the second past frame is an adjacent frame closest to the transition frame to be generated, and the first past frame is a frame adjacent to the second past frame. At present, animation data frames generally adopt euler angle formats of all joint points, so that data format conversion needs to be performed on a first past frame, a second past frame and a target frame to obtain rotation matrices of all joint points required by the embodiment. Specifically, the euler angle data may be converted to a rotation matrix using a scipy mathematical tool library.
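For example, the Euler-angle-to-rotation-matrix conversion with the SciPy tool library mentioned above can be done as follows; the rotation order 'xyz' and the use of degrees are assumptions, since the text does not specify them:

```python
import numpy as np
from scipy.spatial.transform import Rotation as R

# Euler angles of one frame for J = 33 joints, one (x, y, z) triple per joint, in degrees.
euler_angles = np.zeros((33, 3))
rotation_matrices = R.from_euler('xyz', euler_angles, degrees=True).as_matrix()
print(rotation_matrices.shape)  # (33, 3, 3)
```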
S2012: and reserving the first two column vectors of the rotation matrixes of all the joint points of the first past frame, the second past frame and the target frame and compressing the two column vectors into a 6-dimensional vector to obtain 6D rotation vectors of all the joint points of the first past frame, the second past frame and the target frame.
Specifically, the first two column vectors of the rotation matrix of all the joint points of the first past frame are reserved and compressed into a 6-dimensional vector, so as to obtain 6D rotation vectors of all the joint points of the first past frame; reserving the first two column vectors of the rotation matrix of all the joint points of the second past frame and compressing the rotation matrix into a 6-dimensional vector to obtain 6D rotation vectors of all the joint points of the second past frame; and reserving the first two column vectors of the rotation matrix of all the joint points of the target frame and compressing the rotation matrix into a 6-dimensional vector to obtain 6D rotation vectors of all the joint points of the target frame.
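The compression to 6D rotation vectors, and the inverse Gram-Schmidt reconstruction used later in S2061, can be sketched as below; the cross-product order is chosen so that a valid rotation matrix round-trips to itself, which is the usual construction for the 6D rotation representation:

```python
import numpy as np

def matrix_to_6d(rot):
    """Keep the first two column vectors of a 3x3 rotation matrix and flatten them to 6 values."""
    return rot[:, :2].T.reshape(6)

def sixd_to_matrix(v):
    """Rebuild a rotation matrix from a 6D vector by Gram-Schmidt, as used later in S2061."""
    x = v[:3] / np.linalg.norm(v[:3])   # normalized first column
    z = np.cross(x, v[3:])              # third axis from the cross product
    z = z / np.linalg.norm(z)
    y = np.cross(z, x)                  # re-orthogonalized second column
    return np.stack([x, y, z], axis=1)  # columns of the recovered rotation matrix

print(sixd_to_matrix(matrix_to_6d(np.eye(3))))  # the identity matrix round-trips to itself
```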
S2013: and calculating the 3D space coordinates of all the joint points of the first past frame, the second past frame and the target frame according to the 3D space coordinates of the Hips joint points of the first past frame, the second past frame and the target frame, the rotation matrixes of all the joint points of the first past frame, the second past frame and the target frame, a preset father node list corresponding to all the joint points and a preset standard skeleton offset value list relative to the corresponding father node, wherein the 3D space coordinates of all the joint points of the first past frame, the second past frame and the target frame are obtained by adopting a method that the result of the product of the rotation matrixes of the joint points and the corresponding standard skeleton offset values and the sum of the product and the 3D space coordinates of the corresponding father nodes is taken as the 3D space coordinates of the joint points.
Specifically, the 3D space coordinates of all the joint points of the first past frame are calculated according to the 3D space coordinates of the Hips joint points of the first past frame, the rotation matrices of all the joint points of the first past frame, a preset parent node list corresponding to all the joint points, and a preset standard bone offset value list of all the joint points relative to the corresponding parent node, wherein the 3D space coordinates of all the joint points of the first past frame are obtained by taking the result of the product of the rotation matrices of the joint points and the corresponding standard bone offset values of the joint points and then the sum of the product and the 3D space coordinates of the corresponding parent nodes as the 3D space coordinates of the joint points.
And calculating the 3D space coordinates of all the joint points of the second past frame according to the 3D space coordinates of the Hips joint points of the second past frame, the rotation matrixes of all the joint points of the second past frame, a preset father node list corresponding to all the joint points and a preset standard skeleton offset value list of all the joint points relative to the corresponding father nodes, wherein the 3D space coordinates of all the joint points of the second past frame are obtained by adopting a method that the rotation matrixes of the joint points are multiplied by the corresponding standard skeleton offset values and then are summed with the 3D space coordinates of the corresponding father nodes to serve as the 3D space coordinates of the joint points.
And calculating the 3D space coordinates of all the joint points of the target frame according to the 3D space coordinates of the Hips joint points of the target frame, the rotation matrixes of all the joint points of the target frame, a preset father node list corresponding to all the joint points and a preset standard skeleton offset value list of all the joint points relative to the corresponding father nodes, wherein the method is used for calculating the 3D space coordinates of all the joint points of the target frame by taking the result of the summation of the rotation matrixes of the joint points and the corresponding standard skeleton offset values of the joint points and the 3D space coordinates of the corresponding father nodes as the 3D space coordinates of the joint points.
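A standard forward-kinematics sketch of this calculation is given below. The wording leaves it open whether the local or the accumulated parent-chain rotation multiplies the bone offset, so the usual accumulated form is used, and the parent list is assumed to be ordered so that every parent index precedes its children, as in the list above:

```python
import numpy as np

def forward_kinematics(hips_position, rotations, parents, offsets):
    """3D coordinates of every joint from the Hips position, local rotations, parent list and bone offsets.

    rotations: (J, 3, 3) local rotation matrices; parents: parent index per joint, -1 for the Hips root;
    offsets: (J, 3) standard bone offsets relative to the parent. Parents must precede their children.
    """
    num_joints = len(parents)
    positions = np.zeros((num_joints, 3))
    global_rot = np.zeros((num_joints, 3, 3))
    for j in range(num_joints):
        if parents[j] < 0:                 # Hips root joint
            positions[j] = hips_position
            global_rot[j] = rotations[j]
        else:
            p = parents[j]
            positions[j] = global_rot[p] @ offsets[j] + positions[p]  # rotated offset + parent position
            global_rot[j] = global_rot[p] @ rotations[j]
    return positions
```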
Wherein, since the second past frame is a frame closest to the transition frame to be generated, the second past frame is designated as the current frame, and then the 6D rotation vectors of all the joint points of the second past frame are the 6D rotation vectors of all the joint points of the current frame, and the 3D spatial coordinates of all the joint points of the second past frame are the 3D spatial coordinates of all the joint points of the current frame.
S2014: and subtracting the 3D space coordinates of all the joint points of the first past frame from the 3D space coordinates of all the joint points of the second past frame to obtain a difference value, and multiplying the difference value by the animation frame rate to obtain the velocity vectors of all the joint points of the current frame.
S2015: reading the velocity vectors of a preset first joint point, a preset second joint point, a preset third joint point and a preset fourth joint point of a second past frame, respectively judging the magnitude relation between the velocity vector of each preset joint point and the preset velocity vector, generating velocity judgment Boolean values of the preset first joint point, the preset second joint point, the preset third joint point and the preset fourth joint point, and splicing the velocity judgment Boolean values to obtain a first Boolean vector of the current frame; reading 3D space coordinates of a preset first joint point, a preset second joint point, a preset third joint point and a preset fourth joint point of a second past frame, respectively judging the size relationship between the y coordinate value of each preset joint point and a preset length value, generating distance judgment Boolean values of the preset first joint point, the preset second joint point, the preset third joint point and the preset fourth joint point, and splicing the distance judgment Boolean values to obtain a second Boolean vector of the current frame; and the first Boolean vector and the second Boolean vector of the current frame are subjected to AND operation to obtain a four-dimensional Boolean vector of the current frame.
The preset first joint point, the preset second joint point, the preset third joint point and the preset fourth joint point are rightForeFoot, rightToeBaseEnd, leftForeFoot and leftToeBaseEnd respectively. The preset velocity vector is not limited here, and may be set according to specific needs, for example, may be set to 0.7 m/s. The preset length value is not limited and can be set according to specific needs, for example, can be set to 5 cm. Specifically, according to the velocity vectors of the rightForeFoot, rightToeBaseEnd, leftForeFoot and leftToeBaseEnd joint points of the current frame, if the velocity vector on the joint point is greater than or equal to 0.7m/s, the velocity judgment Boolean value is assigned to be 0; and if the speed is less than 0.7m/s, the speed judges that the Boolean value is assigned to be 1. And splicing the Boolean values of the four joint points to obtain a first Boolean vector. According to the 3D space coordinates of the RightForeFoot, RightToBaseEnd, LeftForeFoot and LeftToeBaseEnd of the current frame, if the y coordinate on the joint point is more than or equal to 5cm, the distance judgment Boolean value is assigned to be 0; and if the y coordinate is less than 5cm, the distance judgment Boolean value is assigned to be 1. And splicing the distance judgment Boolean values of the four joint points to obtain a second Boolean vector, and carrying out AND operation on the first Boolean vector and the second Boolean vector of the current frame to obtain a four-dimensional Boolean vector of the current frame.
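A small sketch of the four-dimensional Boolean (foot-contact) vector follows; the joint indices are illustrative, and the thresholds mix m/s and cm exactly as in the text, so the units of the data must match:

```python
import numpy as np

FOOT_JOINTS = [24, 26, 30, 32]  # illustrative indices for rightForeFoot, rightToeBaseEnd, leftForeFoot, leftToeBaseEnd
SPEED_THRESHOLD = 0.7           # m/s
HEIGHT_THRESHOLD = 5.0          # cm

def contact_vector(velocities, positions):
    """Four-dimensional Boolean vector: 1 when a foot joint is slow AND close to the ground."""
    slow = np.array([np.linalg.norm(velocities[j]) < SPEED_THRESHOLD for j in FOOT_JOINTS])
    low = np.array([positions[j][1] < HEIGHT_THRESHOLD for j in FOOT_JOINTS])
    return (slow & low).astype(np.float32)   # AND operation of the first and second Boolean vectors
```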
Based on the technical scheme, the 6D rotation vectors of all the joint points of the current frame, the 3D space coordinates of all the joint points, the velocity vectors and the four-dimensional Boolean vectors of all the joint points of the target frame, the 6D rotation vectors of all the joint points of the target frame and the 3D space coordinates of all the joint points are obtained, data preprocessing is completed, and preparation is made for generating a transition frame later.
S202: and loading the trained motion completion model, and inputting the animation frame rate and the standard deviation.
After the action completion model is trained, the neural network parameters are stored on a hard disk or other storage spaces, when the action completion model needs to be used, the neural network parameters on the hard disk or other storage spaces are read, the process is called loading, the number of action frames to be completed, the standard deviation and the action frame rate are extra configurations of the action completion model after the loading is completed, and preparation is made for generating a transition frame later.
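Loading amounts to reading the stored parameters back into the network and attaching the extra configuration; a minimal, hypothetical sketch (using a stand-in module so the snippet runs on its own) is:

```python
import torch
import torch.nn as nn

# Stand-in module; in practice this would be the motion completion model assembled from the
# mapping, gating and expert networks sketched above.
model = nn.Linear(33 * 9 + 4, 33 * 6 + 7)
torch.save(model.state_dict(), "motion_completion.pt")     # training side: parameters stored to disk
model.load_state_dict(torch.load("motion_completion.pt"))  # loading side: parameters read back
model.eval()

frames_to_complete = 16   # extra configuration after loading
sigma = 0.5               # standard deviation controlling variability
frame_rate = 30           # animation frame rate (FPS)
```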
In an implementation manner, fig. 3 is a schematic structural diagram of the action completion model provided in this embodiment. The action completion model trained in S102 comprises a first mapping sub-network E1, a second mapping sub-network E2, a third mapping sub-network E3, a gated neural network G and 128 expert neural networks EPi, i = 1, 2, 3, ..., 128, wherein the first mapping sub-network E1, the second mapping sub-network E2 and the third mapping sub-network E3 form a coding system that encodes the motion state quantities into a 768-dimensional hidden vector; the gated neural network G receives the hidden vector and makes a decision, selecting different expert neural networks EPi and the corresponding weights for weighted fusion of the hidden-vector calculations; and the expert neural networks EPi calculate on the hidden vector to obtain the action state increment of the next frame.
S203: and performing data splicing on the 6D rotation vectors of all the joint points of the current frame, the 3D space coordinates of all the joint points, the velocity vectors and the four-dimensional Boolean vectors of all the joint points, the 6D rotation vectors of all the joint points of the target frame and the 3D space coordinates of all the joint points to obtain a first input vector, a second input vector and a third input vector.
In one implementable embodiment, step S203 includes: splicing the 6D rotation vectors of all the joint points of the current frame, the velocity vectors of all the joint points of the current frame and the four-dimensional Boolean vector of the current frame to obtain a first input vector; subtracting the 3D space coordinates of all the joint points of the current frame from the 3D space coordinates of all the joint points of the target frame to obtain a first offset vector, subtracting the 6D rotation vectors of all the joint points of the current frame from the 6D rotation vectors of all the joint points of the target frame to obtain a second offset vector, and splicing the first offset vector and the second offset vector to obtain a second input vector; and taking the 6D rotation vectors of all the joint points of the target frame as the third input vector.
The stitching refers to a data merging operation. For example, stitching merges the 6D rotation vectors of all joint points of the current frame, the velocity vectors of all joint points of the current frame and the four-dimensional Boolean vector of the current frame; that is, these three parts are merged together along the same dimension, and the obtained result contains the data information of all three.
S204: inputting the first input vector, the second input vector, the third input vector and the number of motion frames to be completed into the trained motion completion model, and outputting the predicted 6D rotation vector increments of all joint points of all transition frames and the predicted 3D space coordinate increments of the Hips joint points of all transition frames.
S205: and adding the 6D rotation vector increment of all the joint points of all the predicted transition frames to the 6D rotation vector of all the joint points of the current frame to obtain the 6D rotation vector of all the joint points of the next transition frame, storing the 6D rotation vector, adding the 3D space coordinate increment of the Hips joint points of all the predicted transition frames to the 3D space coordinate of the Hips joint points of the current frame to obtain the 3D space coordinate of the Hips joint points of the next transition frame, storing the 3D space coordinate increment of the Hips joint points of all the predicted transition frames, and subtracting 1 from the number of the action frames to be supplemented.
The obtained data of the next transition frame comprises the 6D rotation vectors of all joint points of the next transition frame and the 3D space coordinates of the Hips joint point of the next transition frame; these data are stored, and after the data of the next transition frame are obtained, the number of action frames to be completed is reduced by 1.
S206: if the number of the action frames to be supplemented is greater than 0, calculating to obtain rotation matrixes of all joint points of the current frame according to 6D rotation vectors of all joint points of the next transition frame, calculating to obtain 3D space coordinates of all joint points of the current frame, speed vectors of all joint points and four-dimensional Boolean vectors according to the 3D space coordinates of the Hips joint points of the next transition frame and the rotation matrixes of all joint points of the current frame, returning to S203, and if the number of the action frames to be supplemented is equal to 0, entering S207.
Judging whether the value of the number of motion frames to be supplemented is greater than 0, if the number of the motion frames to be supplemented is greater than 0, indicating that no transition frame is generated, calculating rotation matrixes of all joint points of the current frame according to 6D rotation vectors of all joint points of the next transition frame, calculating 3D space coordinates of all joint points of the current frame, speed vectors of all joint points and four-dimensional Boolean vectors according to the 3D space coordinates of all joint points of the next transition frame and the rotation matrixes of all joint points of the current frame, returning to the step S203, and executing the step of generating the transition frame again. Specifically, when the number of the action frames to be compensated is greater than 0, the currently obtained next transition frame data is used as the current frame data of the next transition frame data to be solved to participate in calculation so as to solve the data of the next transition frame. If the number of motion frames to be compensated is equal to 0, it indicates that all transition frames have been generated, and the process may proceed to step S207.
In an implementation manner, the method for calculating, in step S206, rotation matrices of all joint points of the current frame according to 6D rotation vectors of all joint points of the current frame, and calculating 3D spatial coordinates of all joint points of the current frame, velocity vectors of all joint points, and four-dimensional boolean vectors according to 3D spatial coordinates of Hips joint points of the current frame and rotation matrices of all joint points of the current frame includes:
s2061: taking the first 3 values of the 6D rotation vectors of all the joint points of the current frame as a vector x, taking the last 3 values as a vector y, standardizing the vector x to obtain an updated vector x, solving a cross product of the vector y and the updated vector x to obtain a vector z, standardizing the vector z to obtain an updated vector z, solving a cross product of the updated vector z and the updated vector x to obtain an updated vector y, and respectively taking the updated vector x, the updated vector y and the updated vector z as column vectors of a three-dimensional square matrix to obtain rotation matrices of all the joint points of the current frame;
s2062: according to the 3D space coordinates of the Hips joint points of the current frame, the rotation matrixes of all the joint points of the current frame, a father node list corresponding to all the joint points and a standard skeleton offset value list corresponding to all the joint points relative to the father node are preset, and for each joint point, the 3D space coordinates of all the joint points of the current frame are calculated by taking the result of the summation of the rotation matrix of the joint point and the corresponding standard skeleton offset value and the 3D space coordinates of the corresponding father node as the 3D space coordinates of the joint point;
s2063: subtracting the 3D space coordinates of all the joint points of the previous frame corresponding to the current frame from the 3D space coordinates of all the joint points of the current frame to obtain a difference value, and multiplying the difference value by the animation frame rate to obtain the velocity vectors of all the joint points of the current frame;
s2064: reading speed vectors of a preset first joint point, a preset second joint point, a preset third joint point and a preset fourth joint point of a current frame, respectively judging the magnitude relation between the speed vector of each preset joint point and the preset speed vector, generating speed judgment Boolean values of the preset first joint point, the preset second joint point, the preset third joint point and the preset fourth joint point, and splicing the speed judgment Boolean values to obtain a first Boolean vector of the current frame; reading 3D space coordinates of a preset first joint point, a preset second joint point, a preset third joint point and a preset fourth joint point of the current frame, respectively judging the size relationship between the y coordinate value of each preset joint point and the preset length value, generating distance judgment Boolean values of the preset first joint point, the preset second joint point, the preset third joint point and the preset fourth joint point, and splicing the distance judgment Boolean values to obtain a second Boolean vector of the current frame; and the first Boolean vector and the second Boolean vector of the current frame are subjected to AND operation to obtain a four-dimensional Boolean vector of the current frame.
S207: and 6D rotation vectors of all the joint points of each transition frame and 3D space coordinates of the Hips joint points are obtained, and the 6D rotation vectors of all the joint points of each transition frame are converted into a preset data format, so that the motion completion data of all the transition frames are obtained.
Specifically, the 6D rotation vectors of all the joint points and the 3D space coordinates of the Hips joint points form the complete animation data of each transition frame, and since the 6D rotation vectors are not in a data format commonly used for current animation data, the 6D rotation vectors need to be converted into a preset data format, such as an euler angle format, where a specific data format is not limited, and may be set according to specific situations, and finally the animation data of S transition frames is supplemented to the original animation data.
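The conversion of the generated 6D rotation vectors back to an Euler-angle format can be sketched with SciPy as follows; the Euler order and the use of degrees are assumptions:

```python
import numpy as np
from scipy.spatial.transform import Rotation as R

def sixd_to_euler(rot6d, order='xyz'):
    """Convert the 6D rotation vectors (J, 6) of one frame back to Euler angles for export."""
    mats = []
    for v in rot6d:
        x = v[:3] / np.linalg.norm(v[:3])
        z = np.cross(x, v[3:]); z = z / np.linalg.norm(z)
        y = np.cross(z, x)
        mats.append(np.stack([x, y, z], axis=1))      # Gram-Schmidt reconstruction of the rotation matrix
    return R.from_matrix(np.asarray(mats)).as_euler(order, degrees=True)

print(sixd_to_euler(np.tile([1., 0., 0., 0., 1., 0.], (33, 1))))  # an all-identity frame yields zeros
```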
The above is a complete flow of motion completion, and the user can call the above flow for many times, and can obtain reasonable transitional motion data in different forms to meet different requirements of the user.
Referring to fig. 10, fig. 10 is a schematic structural diagram of a training device for an action completion model provided in an embodiment of the present application, where the training device for an action completion model described below and the training method for an action completion model described above are referred to in correspondence, and the training device for an action completion model provided in an embodiment of the present application includes:
a training data acquisition module 301, configured to acquire a large amount of original motion data of human body motion, perform data preprocessing according to a father node list corresponding to all preset joint points and a standard skeleton offset value list corresponding to all preset joint points relative to the father node, obtain, as a data set, 6D rotation vectors of all joint points, velocity vectors of all joint points, four-dimensional boolean vectors, and 3D spatial coordinates of all joint points of each frame except a first frame, and divide the data set into a training set and a verification set;
a network model establishing module 302, configured to establish an action completion model, where a standard deviation value is set as a preset standard deviation value;
a network model training module 303, configured to extract a preset number of action data batches from the training set as input data of the action completion model, and input the action completion model to obtain an output result of the action completion model, where each action data batch includes a preset number of action sequence groups, and each action sequence group includes a past frame, a target frame, and S transition frames, where S is a preset number of uniform sampling values within a preset sampling range value;
and the network model optimization module 304 is configured to calculate a total loss item of the system in each action sequence group according to the input data and the output result of the action completion model, and optimize the action completion model according to the total loss item of the system to obtain a trained action completion model.
Optionally, the training data obtaining module 301 includes:
the rotation matrix acquisition unit is used for acquiring the 3D space coordinates of the Hips joint points of each frame and the Euler angles of all the joint points in the original motion data, and performing data format conversion on the Euler angles of all the joint points of each frame to obtain the rotation matrix of all the joint points of each frame;
the rotation vector acquisition unit is used for reserving the first two column vectors of the rotation matrix of all the joint points of each frame and compressing the two column vectors into a 6-dimensional vector to obtain 6D rotation vectors of all the joint points of each frame;
the spatial coordinate acquisition unit is used for calculating the 3D spatial coordinates of all the joint points of each frame according to the 3D spatial coordinates of the Hips joint points of each frame, the rotation matrixes of all the joint points of each frame, a preset father node list corresponding to all the joint points and a preset standard skeleton offset value list corresponding to all the joint points relative to the father nodes, wherein the 3D spatial coordinates of all the joint points of each frame are obtained by adopting a method that the rotation matrixes of the joint points are multiplied by the corresponding standard skeleton offset values and then are summed with the 3D spatial coordinates of the corresponding father nodes to serve as the 3D spatial coordinates of the joint points;
the speed vector acquisition unit is used for subtracting the 3D space coordinates of all the joint points of the previous frame corresponding to the frame from the 3D space coordinates of all the joint points of each frame except the first frame to obtain a difference value, and multiplying the difference value by the animation frame rate to obtain the speed vectors of all the joint points of each frame except the first frame;
a boolean vector acquisition unit, configured to read velocity vectors of a preset first joint point, a preset second joint point, a preset third joint point, and a preset fourth joint point of each frame except for the first frame, respectively determine a magnitude relationship between the velocity vector of each preset joint point and the preset velocity vector, generate velocity determination boolean values of the preset first joint point, the preset second joint point, the preset third joint point, and the preset fourth joint point, and concatenate the velocity determination boolean values to obtain a first boolean vector of each frame except for the first frame; reading 3D space coordinates of a preset first joint point, a preset second joint point, a preset third joint point and a preset fourth joint point of each frame except the first frame, respectively judging the magnitude relation between the y coordinate value of each preset joint point and a preset length value, generating distance judgment Boolean values of the preset first joint point, the preset second joint point, the preset third joint point and the preset fourth joint point, and splicing the distance judgment Boolean values to obtain a second Boolean vector of each frame except the first frame; performing AND operation on the first Boolean vector and the second Boolean vector of each frame except the first frame to obtain a four-dimensional Boolean vector of each frame except the first frame,
and the data set dividing unit is used for saving the 6D rotation vectors, the 3D space coordinates, the speed vectors and the four-dimensional Boolean vectors of all the joint points of each frame except the first frame as data sets, wherein the data sets with a first preset proportion are divided into training sets, and the data sets with a second preset proportion are divided into testing sets.
Optionally, the network model training module 303 includes:
a distance frame number obtaining unit, configured to designate a past frame as a current frame, obtain a distance frame number of the current frame from a target frame, and initialize a transition frame number k = 1;
the first vector acquisition unit is used for extracting 6D rotation vectors of all joint points of the current frame, velocity vectors of all joint points of the current frame and four-dimensional Boolean vectors of the current frame to be spliced into a first input vector, inputting a first mapping sub-network of the motion completion model and outputting a first output vector Cs;
the second vector acquisition unit is used for extracting the 3D space coordinates of all joint points and the 6D rotation vectors of all joint points of the target frame, subtracting the 3D space coordinates of all joint points of the current frame from the 3D space coordinates of all joint points of the target frame to obtain a first offset vector, subtracting the 6D rotation vectors of all joint points of the current frame from the 6D rotation vectors of all joint points of the target frame to obtain a second offset vector, splicing the first offset vector and the second offset vector into a second input vector, inputting the second mapping sub-network of the motion completion model, and outputting a second output vector Co;
a third vector acquisition unit, configured to extract 6D rotation vectors of all joint points of the target frame as a third input vector, input a third mapping sub-network of the motion completion model, and output a third output vector Ct;
the weight parameter obtaining unit is used for splicing the first output vector Cs, the second output vector Co and the third output vector Ct into a vector D, inputting the vector D into a gated neural network of the motion completion model, outputting 128 weights Wi by the gated neural network, wherein the weights Wi are between 0 and 1 and correspond to 128 expert neural networks EPi, if the weight is 0, the corresponding expert neural network EPi is frozen, and if the weight is not 0, the corresponding expert neural network EPi is activated;
a calculation result obtaining unit, configured to input the vector D, the distance frame number, and the standard deviation into an activated expert neural network EPi, output a posture increment Ti calculated by each expert network, where i =0, 1, 2.. 127, and the Ti is multiplied by a weight Wi to obtain a weighted posture increment Mi calculated by each expert network, add the weighted posture increments Mi calculated by each expert network to obtain a posture increment Tk calculated by a kth transition frame, store the posture increment Tk, subtract 1 from the distance frame number, and obtain a transition frame number k = k + 1;
the distance frame number judging unit is used for judging whether the distance frame number is greater than 0; if so, the next transition frame is designated as the current frame and the process returns to the first vector acquisition unit, and if the distance frame number is 0, the process proceeds to the model result acquisition unit;
and the model result acquisition unit is used for splicing the obtained attitude increments Tk of the S transition frames to obtain the results T of all the transition frames in the action sequence group, namely the output result of the action completion model.
Optionally, the network model optimization module 304 includes:
a transition frame data obtaining unit, configured to obtain, from results T of all transition frames, 6D rotation vector increments of all joint points of all transition frames predicted by the motion completion model, 3D spatial coordinate increments of Hips joint points of all predicted transition frames, and four-dimensional boolean vectors of all predicted transition frames;
a first loss obtaining unit, configured to add the predicted 6D rotation vector increments of all joint points of all transition frames to the 6D rotation vectors of all joint points of the previous frame corresponding to the action sequence group to obtain 6D rotation vectors of all joint points of all predicted transition frames, and calculate a 2 norm of a difference between the predicted 6D rotation vectors of all joint points of all transition frames and the 6D rotation vectors of all joint points of the corresponding transition frames in the action sequence group to obtain a first loss term;
the second loss acquisition unit is used for adding the 3D space coordinate increment of the Hips joint points of all the predicted transition frames to the 3D space coordinate of the Hips joint point of the corresponding previous frame in the action sequence group to obtain the 3D space coordinate of the Hips joint points of all the predicted transition frames; taking the first 3 values of the 6D rotation vectors of all the joint points of all the predicted transition frames as a vector x, taking the last 3 values as a vector y, standardizing the vector x to obtain an updated vector x, solving a cross product of the vector y and the updated vector x to obtain a vector z, standardizing the vector z to obtain an updated vector z, solving a cross product of the updated vector z and the updated vector x to obtain an updated vector y, and respectively taking the updated vector x, the updated vector y and the updated vector z as column vectors of a three-dimensional square matrix to obtain the rotation matrices of all the joint points of all the predicted transition frames; calculating the 3D space coordinates of all the joint points of all the predicted transition frames according to the predicted 3D space coordinates of the Hips joint points of all the transition frames, the predicted rotation matrixes of all the joint points of all the transition frames, a preset father node list corresponding to all the joint points and a preset standard skeleton offset value list of all the joint points relative to the corresponding father nodes, wherein the method is used for calculating the 3D space coordinates of all the joint points of all the predicted transition frames by taking the result of the product of the rotation matrixes of the joint points and the corresponding standard skeleton offset values of the joint points and then the sum of the product and the 3D space coordinates of the corresponding father nodes as the 3D space coordinates of the joint points; calculating 2 norms of differences between the predicted 3D space coordinates of all joint points of all transition frames and the 3D space coordinates of all joint points of the corresponding transition frames in the action sequence group to obtain a second loss term;
a third loss obtaining unit, configured to calculate the 2-norm of the difference between the predicted four-dimensional Boolean vectors of the transition frames and the four-dimensional Boolean vectors of the corresponding transition frames in the action sequence group to obtain the third loss term;
a fourth loss obtaining unit, configured to sum, within the action sequence group, the weights Wi of each expert neural network EPi over the different samples to obtain the importance of each expert neural network EPi in the action sequence group, and to calculate the square of the variance of the importances of the 128 expert neural networks EPi to obtain the fourth loss term;
and a network parameter optimization unit, configured to add the first, second, third and fourth loss terms to obtain the total loss term, to optimize the neural network parameters of the motion completion model with a preset optimizer according to the total loss term, to verify the motion completion model on the verification set, and to select the neural network parameters for which the loss function on the verification set converges with the lowest loss as the neural network parameters of the trained motion completion model.
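For illustration only, the following is a minimal Python/NumPy sketch of the 6D-rotation-vector reconstruction and forward-kinematics step used by the second loss obtaining unit above. The function and argument names (sixd_to_rotmat, forward_kinematics, parents, offsets) are placeholders introduced here, not names from this application; whether the rotation matrices fed to the forward kinematics are local or already accumulated to global is not spelled out in the text and is left open here.

```python
import numpy as np

def sixd_to_rotmat(sixd):
    """Rebuild a 3x3 rotation matrix from a 6D rotation vector as described
    above: x = first 3 values, y = last 3 values; x is normalized; z is the
    (normalized) cross product of y and x; y is recomputed as the cross
    product of z and x; x, y, z become the columns of the matrix."""
    x = np.asarray(sixd[:3], dtype=np.float64)
    y = np.asarray(sixd[3:], dtype=np.float64)
    x = x / np.linalg.norm(x)            # normalize x
    z = np.cross(y, x)                   # z = y x x
    z = z / np.linalg.norm(z)            # normalize z
    y = np.cross(z, x)                   # updated y = z x x
    return np.stack([x, y, z], axis=1)   # columns: x, y, z

def forward_kinematics(hips_pos, rotmats, parents, offsets):
    """Joint position = (rotation matrix of the joint) @ (standard skeleton
    offset of the joint relative to its parent) + position of the parent,
    with the Hips (root) joint fixed at hips_pos. Parents are assumed to be
    listed before their children, with parents[0] == -1 for the root."""
    num_joints = len(parents)
    pos = np.zeros((num_joints, 3))
    pos[0] = hips_pos
    for j in range(1, num_joints):
        pos[j] = rotmats[j] @ offsets[j] + pos[parents[j]]
    return pos

# Second loss term: 2-norm of the difference between predicted and
# ground-truth joint positions of the transition frames (shapes assumed).
def second_loss(pred_pos, true_pos):
    return np.linalg.norm(pred_pos - true_pos)
```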
In the following, a computer device provided in the embodiments of the present application is introduced, and the computer device described below and the training method of the motion completion model described above may be referred to correspondingly.
The present application provides a computer device comprising:
a memory for storing a computer program;
and a processor, configured to implement the steps of the above training method of the motion completion model when executing the computer program.
Since the embodiment of the computer device portion corresponds to the embodiment of the training method portion of the motion completion model, for the embodiment of the computer device portion please refer to the description of the embodiment of the training method portion of the motion completion model; details are not repeated here.
In the following, a computer-readable storage medium provided by an embodiment of the present application is introduced, and the computer-readable storage medium described below and the training method of the motion completion model described above may be referred to correspondingly.
The present application provides a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements the steps of the training method of the motion completion model described above.
Since the embodiment of the computer-readable storage medium portion corresponds to the embodiment of the training method portion of the motion completion model, for the embodiment of the computer-readable storage medium portion please refer to the description of the embodiment of the training method portion of the motion completion model; details are not repeated here.
The embodiments are described in a progressive manner in the specification, each embodiment focuses on differences from other embodiments, and the same and similar parts among the embodiments are referred to each other. The device disclosed by the embodiment corresponds to the method disclosed by the embodiment, so that the description is simple, and the relevant points can be referred to the method part for description.
Those of skill would further appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware, computer software, or combinations of both, and that the various illustrative components and steps have been described above generally in terms of their functionality in order to clearly illustrate this interchangeability of hardware and software. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the implementation. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present application.
The steps of a method or algorithm described in connection with the embodiments disclosed herein may be embodied directly in hardware, in a software module executed by a processor, or in a combination of the two. A software module may reside in random access memory (RAM), read-only memory (ROM), electrically programmable ROM, electrically erasable programmable ROM, registers, a hard disk, a removable disk, a CD-ROM, or any other form of storage medium known in the art.
The present application provides a method for training a motion completion model, a motion completion method, a device for training a motion completion model, a computer device, and a computer-readable storage medium. The principles and embodiments of the present application are explained herein using specific examples, which are provided only to help understand the method and the core idea of the present application. It should be noted that, for those skilled in the art, several improvements and modifications can be made to the present application without departing from its principle, and such improvements and modifications also fall within the scope of the claims of the present application.

Claims (10)

1. A training method of a motion completion model, characterized by comprising the following steps:
S101: acquiring a large amount of raw human motion data, performing data preprocessing according to a preset parent node list corresponding to all joint points and a preset list of standard skeleton offset values of all joint points relative to their parent nodes, to obtain the 6D rotation vectors, velocity vectors, four-dimensional Boolean vectors and 3D spatial coordinates of all joint points of each frame except the first frame as a data set, and dividing the data set into a training set and a verification set;
S102: establishing a motion completion model, and setting the standard deviation to a preset standard deviation value;
S103: extracting a preset number of motion data batches from the training set as input data of the motion completion model, and inputting them into the motion completion model to obtain its output result, wherein each motion data batch comprises a preset number of action sequence groups, each action sequence group comprises a past frame, a target frame and S transition frames, and S is one of a preset number of values sampled uniformly within a preset sampling range;
S104: calculating the total loss term of the system in each action sequence group according to the input data and the output result of the motion completion model, and optimizing the motion completion model according to the total loss term of the system to obtain the trained motion completion model.
2. The method for training the motion completion model according to claim 1, wherein step S101 comprises:
acquiring the 3D spatial coordinates of the Hips joint point and the Euler angles of all joint points of each frame in the raw motion data, and performing data format conversion on the Euler angles of all joint points of each frame to obtain the rotation matrices of all joint points of each frame;
retaining the first two column vectors of the rotation matrix of each joint point of each frame and compressing them into a 6-dimensional vector to obtain the 6D rotation vectors of all joint points of each frame;
calculating the 3D spatial coordinates of all joint points of each frame from the 3D spatial coordinates of the Hips joint point of each frame, the rotation matrices of all joint points of each frame, the preset parent node list corresponding to all joint points and the preset list of standard skeleton offset values of all joint points relative to their parent nodes, where the 3D spatial coordinates of a joint point are obtained by multiplying its rotation matrix by its standard skeleton offset value and adding the 3D spatial coordinates of its parent node;
subtracting the 3D spatial coordinates of all joint points of the corresponding previous frame from the 3D spatial coordinates of all joint points of each frame except the first frame, and multiplying the difference by the animation frame rate to obtain the velocity vectors of all joint points of each frame except the first frame;
reading the velocity vectors of a preset first joint point, a preset second joint point, a preset third joint point and a preset fourth joint point of each frame except the first frame, comparing the magnitude of the velocity vector of each preset joint point with a preset velocity value, generating the velocity-judgment Boolean values of the four preset joint points, and concatenating them to obtain the first Boolean vector of each frame except the first frame;
reading the 3D spatial coordinates of the preset first joint point, the preset second joint point, the preset third joint point and the preset fourth joint point of each frame except the first frame, comparing the y coordinate value of each preset joint point with a preset length value, generating the distance-judgment Boolean values of the four preset joint points, and concatenating them to obtain the second Boolean vector of each frame except the first frame;
performing an AND operation on the first Boolean vector and the second Boolean vector of each frame except the first frame to obtain the four-dimensional Boolean vector of each frame except the first frame;
and saving the 6D rotation vectors, 3D spatial coordinates, velocity vectors and four-dimensional Boolean vectors of all joint points of each frame except the first frame as a data set, wherein a first preset proportion of the data set is divided into the training set and a second preset proportion of the data set is divided into the verification set.
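For illustration only, a minimal Python/NumPy sketch of the preprocessing in claim 2 above. All names (rotmat_to_sixd, velocity_vectors, contact_booleans), the frame rate and the threshold values are placeholders assumed here; the claim does not name the four preset joint points (in practice typically foot and toe joints), nor does it state the direction of the two threshold comparisons, which is assumed below.

```python
import numpy as np

FPS = 30.0               # animation frame rate (assumed value)
VEL_THRESHOLD = 0.1      # preset velocity value (assumed)
HEIGHT_THRESHOLD = 0.05  # preset length value for the y coordinate (assumed)

def rotmat_to_sixd(rotmat):
    """Keep the first two columns of a 3x3 rotation matrix and flatten them
    into a 6D rotation vector (first column first, an assumed ordering)."""
    return np.concatenate([rotmat[:, 0], rotmat[:, 1]])

def velocity_vectors(coords):
    """coords: (frames, joints, 3) joint positions. Velocity of each frame
    except the first = (coords[t] - coords[t - 1]) * frame rate."""
    return (coords[1:] - coords[:-1]) * FPS

def contact_booleans(coords, joint_ids):
    """Four-dimensional Boolean vector of each frame except the first:
    for each of the four preset joints, the velocity-judgment value and the
    distance-judgment value are ANDed. 'True' is assumed here to mean the
    joint moves slower than the preset velocity value AND its y coordinate is
    below the preset length value (a foot-contact style criterion)."""
    vel = velocity_vectors(coords)                                   # (F-1, J, 3)
    slow = np.linalg.norm(vel[:, joint_ids, :], axis=-1) < VEL_THRESHOLD
    low = coords[1:, joint_ids, 1] < HEIGHT_THRESHOLD                # y coordinate
    return slow & low                                                # (F-1, 4)
```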
3. The method for training the motion completion model according to claim 1, wherein the motion completion model in S102 comprises: a first mapping sub-network, a second mapping sub-network, a third mapping sub-network, a gated neural network and 128 expert neural networks EPi, wherein the first, second and third mapping sub-networks form an encoding system that encodes the motion state quantities into a 768-dimensional hidden vector; the gated neural network receives the hidden vector and makes a decision, selecting different expert neural networks EPi and their corresponding weights for a weighted fusion of the hidden-vector computation; and the expert neural networks EPi operate on the hidden vector to obtain the motion state increment of the next frame.
4. The method for training the motion completion model according to claim 3, wherein step S103 comprises:
S1031: designating a past frame as the current frame, acquiring the distance frame number, i.e. the number of frames between the current frame and the target frame, and initializing the transition frame number k = 1;
S1032: extracting the 6D rotation vectors of all joint points of the current frame, the velocity vectors of all joint points of the current frame and the four-dimensional Boolean vector of the current frame, concatenating them into a first input vector, inputting it into the first mapping sub-network of the motion completion model, and outputting a first output vector Cs;
S1033: extracting the 3D spatial coordinates and 6D rotation vectors of all joint points of the target frame, subtracting the 3D spatial coordinates of all joint points of the current frame from the 3D spatial coordinates of all joint points of the target frame to obtain a first offset vector, subtracting the 6D rotation vectors of all joint points of the current frame from the 6D rotation vectors of all joint points of the target frame to obtain a second offset vector, concatenating the first offset vector and the second offset vector into a second input vector, inputting it into the second mapping sub-network of the motion completion model, and outputting a second output vector Co;
S1034: extracting the 6D rotation vectors of all joint points of the target frame as a third input vector, inputting it into the third mapping sub-network of the motion completion model, and outputting a third output vector Ct;
S1035: concatenating the first output vector Cs, the second output vector Co and the third output vector Ct into a vector D, and inputting the vector D into the gated neural network of the motion completion model; the gated neural network outputs 128 weights Wi between 0 and 1, corresponding to the 128 expert neural networks EPi; an expert neural network EPi is frozen if its weight is 0 and activated if its weight is not 0;
S1036: inputting the vector D, the distance frame number and the standard deviation into each activated expert neural network EPi, which outputs a pose increment Ti, i = 0, 1, 2, ..., 127; multiplying each Ti by its weight Wi to obtain the weighted pose increment Mi of each expert network; adding the weighted pose increments Mi of the expert networks to obtain the pose increment Tk of the k-th transition frame and storing it; subtracting 1 from the distance frame number and setting the transition frame number k = k + 1;
S1037: if the distance frame number is greater than 0, designating the next transition frame as the current frame and returning to S1032; if the distance frame number is 0, proceeding to S1038;
S1038: concatenating the obtained pose increments Tk of the S transition frames to obtain the result T of all transition frames in the action sequence group, namely the output result of the motion completion model.
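For illustration only, a minimal Python/NumPy sketch of one gating/expert step (S1035 to S1036) of claim 4 above. The callables gate and experts are placeholders standing in for the gated neural network and the 128 expert neural networks EPi; none of these names come from the application.

```python
import numpy as np

def moe_step(Cs, Co, Ct, dist_frames, std, gate, experts):
    """One transition-frame step: concatenate the three encoder outputs into
    the vector D, let the gating network output 128 weights Wi in [0, 1],
    run only the experts with non-zero weight (the others stay frozen), and
    sum the weighted pose increments Mi = Wi * Ti to obtain Tk."""
    D = np.concatenate([Cs, Co, Ct])           # hidden vector D
    W = gate(D)                                # 128 gating weights in [0, 1]
    Tk = None
    for i, w in enumerate(W):
        if w == 0.0:                           # frozen expert: skip
            continue
        Ti = experts[i](D, dist_frames, std)   # pose increment of expert i
        Tk = w * Ti if Tk is None else Tk + w * Ti
    return Tk
```

The outer loop of S1031 to S1037 simply repeats this step once per transition frame, re-encoding the newly predicted frame as the current frame each time.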
5. The method for training the motion completion model according to claim 1, wherein step S104 comprises:
S1041: obtaining, from the result T of all transition frames, the 6D rotation vector increments of all joint points of all transition frames predicted by the motion completion model, the predicted 3D spatial coordinate increments of the Hips joint point of all transition frames, and the predicted four-dimensional Boolean vectors of all transition frames;
S1042: adding the predicted 6D rotation vector increments of all joint points of all transition frames to the 6D rotation vectors of all joint points of the corresponding previous frame in the action sequence group to obtain the predicted 6D rotation vectors of all joint points of all transition frames, and calculating the 2-norm of the difference between the predicted 6D rotation vectors of all joint points of all transition frames and the 6D rotation vectors of all joint points of the corresponding transition frames in the action sequence group to obtain the first loss term;
S1043: adding the predicted 3D spatial coordinate increments of the Hips joint point of all transition frames to the 3D spatial coordinates of the Hips joint point of the corresponding previous frame in the action sequence group to obtain the predicted 3D spatial coordinates of the Hips joint point of all transition frames;
reconstructing the rotation matrices of all joint points of all predicted transition frames from their 6D rotation vectors by taking the first 3 values as a vector x and the last 3 values as a vector y, normalizing x to obtain the updated x, taking the cross product of y and the updated x to obtain z, normalizing z to obtain the updated z, taking the cross product of the updated z and the updated x to obtain the updated y, and using the updated x, the updated y and the updated z as the column vectors of a 3x3 matrix;
calculating the 3D spatial coordinates of all joint points of all predicted transition frames from the predicted 3D spatial coordinates of the Hips joint point of all transition frames, the predicted rotation matrices of all joint points of all transition frames, the preset parent node list corresponding to all joint points and the preset list of standard skeleton offset values of all joint points relative to their parent nodes, where the 3D spatial coordinates of a joint point are obtained by multiplying its rotation matrix by its standard skeleton offset value and adding the 3D spatial coordinates of its parent node;
calculating the 2-norm of the difference between the predicted 3D spatial coordinates of all joint points of all transition frames and the 3D spatial coordinates of all joint points of the corresponding transition frames in the action sequence group to obtain the second loss term;
S1044: calculating the 2-norm of the difference between the predicted four-dimensional Boolean vectors of the transition frames and the four-dimensional Boolean vectors of the corresponding transition frames in the action sequence group to obtain the third loss term;
S1045: summing, within the action sequence group, the weights Wi of each expert neural network EPi over the different samples to obtain the importance of each expert neural network EPi in the action sequence group, and calculating the square of the variance of the importances of the 128 expert neural networks EPi to obtain the fourth loss term;
S1046: adding the first loss term, the second loss term, the third loss term and the fourth loss term to obtain the total loss term, optimizing the neural network parameters of the motion completion model with a preset optimizer according to the total loss term, verifying the motion completion model on the verification set, and selecting the neural network parameters for which the loss function on the verification set converges with the lowest loss as the neural network parameters of the trained motion completion model.
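For illustration only, a minimal Python/NumPy sketch of the fourth loss term of S1045 above, which encourages the gating network to spread work across the 128 experts. It follows the wording of the claim (importance = sum of a weight over the samples of the group, loss = square of the variance of the 128 importances); the function name and array shape are assumptions, and the squared coefficient of variation often used for this purpose in mixture-of-experts models is mentioned only as a common alternative, not as what the claim states.

```python
import numpy as np

def importance_loss(weights):
    """weights: (num_samples, 128) gating weights Wi of the samples in one
    action sequence group. Importance of expert i = sum of Wi over the
    samples; the fourth loss term is the square of the variance of the 128
    importance values (per the wording of S1045)."""
    importance = weights.sum(axis=0)      # shape (128,)
    return np.var(importance) ** 2

# The total loss term of S1046 is then simply
#   total = first_loss + second_loss + third_loss + importance_loss(weights)
# and is handed to the preset optimizer.
```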
6. A motion completion method, comprising:
S201: acquiring, from the motion data, a target frame, at least two past frames, the animation frame rate, the standard deviation and the number of motion frames to be completed, and performing data preprocessing according to the preset parent node list corresponding to all joint points and the preset list of standard skeleton offset values of all joint points relative to their parent nodes, to obtain the 6D rotation vectors of all joint points, the 3D spatial coordinates of all joint points, the velocity vectors of all joint points, the four-dimensional Boolean vector, the 6D rotation vectors of all joint points of the target frame, and the 3D spatial coordinates of all joint points of the current frame;
S202: loading the trained motion completion model and inputting the animation frame rate and the standard deviation, wherein the trained motion completion model is obtained by the training method of the motion completion model according to any one of claims 1 to 5;
S203: concatenating the 6D rotation vectors of all joint points of the current frame, the 3D spatial coordinates of all joint points, the velocity vectors of all joint points and the four-dimensional Boolean vector, the 6D rotation vectors of all joint points of the target frame and the 3D spatial coordinates of all joint points, to obtain a first input vector, a second input vector and a third input vector;
S204: inputting the first input vector, the second input vector, the third input vector and the number of motion frames to be completed into the trained motion completion model, and outputting the predicted 6D rotation vector increments of all joint points of all transition frames and the predicted 3D spatial coordinate increments of the Hips joint point of all transition frames;
S205: adding the predicted 6D rotation vector increments of all joint points of all transition frames to the 6D rotation vectors of all joint points of the current frame to obtain the 6D rotation vectors of all joint points of the next transition frame and storing them, adding the predicted 3D spatial coordinate increments of the Hips joint point of all transition frames to the 3D spatial coordinates of the Hips joint point of the current frame to obtain the 3D spatial coordinates of the Hips joint point of the next transition frame and storing them, and subtracting 1 from the number of motion frames to be completed;
S206: if the number of motion frames to be completed is greater than 0, calculating the rotation matrices of all joint points of the current frame from the 6D rotation vectors of all joint points of the next transition frame, calculating the 3D spatial coordinates, velocity vectors and four-dimensional Boolean vector of all joint points of the current frame from the 3D spatial coordinates of the Hips joint point of the next transition frame and the rotation matrices of all joint points of the current frame, and returning to S203; if the number of motion frames to be completed is equal to 0, proceeding to S207;
S207: obtaining the 6D rotation vectors of all joint points and the 3D spatial coordinates of the Hips joint point of each transition frame, and converting the 6D rotation vectors of all joint points of each transition frame into a preset data format to obtain the motion completion data of all transition frames.
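For illustration only, a minimal Python sketch of the autoregressive completion loop of claim 6 above (S203 to S207). Here model stands for the trained motion completion model and rebuild_frame_state for the S206 step that recomputes the rotation matrices, 3D coordinates, velocity vectors and Boolean vector of the new current frame; both are placeholder callables passed in, not names from the application, and the frame states are assumed to hold NumPy arrays.

```python
def complete_motion(model, rebuild_frame_state, current, target, frames_to_fill):
    """Each iteration predicts the increments of the next transition frame,
    adds them to the current frame (S205), and feeds the resulting frame back
    in as the new current frame (S206) until no frames remain to be filled."""
    transition_frames = []
    while frames_to_fill > 0:
        rot_inc, hips_inc = model(current, target, frames_to_fill)      # S204
        next_rot = current["rot6d"] + rot_inc                           # S205
        next_hips = current["hips_pos"] + hips_inc
        transition_frames.append({"rot6d": next_rot, "hips_pos": next_hips})
        frames_to_fill -= 1
        if frames_to_fill > 0:                                          # S206
            current = rebuild_frame_state(next_rot, next_hips)
    return transition_frames                                            # S207
```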
7. A training device for a motion completion model, comprising:
a training data acquisition module, configured to acquire a large amount of raw human motion data, perform data preprocessing according to a preset parent node list corresponding to all joint points and a preset list of standard skeleton offset values of all joint points relative to their parent nodes to obtain the 6D rotation vectors, velocity vectors and four-dimensional Boolean vectors of all joint points and the 3D spatial coordinates of all joint points of each frame except the first frame as a data set, and divide the data set into a training set and a verification set;
a network model establishing module, configured to establish a motion completion model and set the standard deviation to a preset standard deviation value;
a network model training module, configured to extract a preset number of motion data batches from the training set as input data of the motion completion model and input them into the motion completion model to obtain its output result, wherein each motion data batch comprises a preset number of action sequence groups, each action sequence group comprises a past frame, a target frame and S transition frames, and S is one of a preset number of values sampled uniformly within a preset sampling range;
and a network model optimization module, configured to calculate the total loss term of the system in each action sequence group according to the input data and the output result of the motion completion model, and optimize the motion completion model according to the total loss term of the system to obtain the trained motion completion model.
8. The training device for the motion completion model according to claim 7, wherein the training data acquisition module comprises:
the rotation matrix acquisition unit is used for acquiring the 3D space coordinates of the Hips joint points of each frame and the Euler angles of all the joint points in the original motion data, and performing data format conversion on the Euler angles of all the joint points of each frame to obtain the rotation matrix of all the joint points of each frame;
the rotation vector acquisition unit is used for reserving the first two column vectors of the rotation matrix of all the joint points of each frame and compressing the two column vectors into a 6-dimensional vector to obtain 6D rotation vectors of all the joint points of each frame;
a spatial coordinate acquisition unit, configured to calculate the 3D spatial coordinates of all joint points of each frame from the 3D spatial coordinates of the Hips joint point of each frame, the rotation matrices of all joint points of each frame, the preset parent node list corresponding to all joint points and the preset list of standard skeleton offset values of all joint points relative to their parent nodes, where the 3D spatial coordinates of a joint point are obtained by multiplying its rotation matrix by its standard skeleton offset value and adding the 3D spatial coordinates of its parent node;
the speed vector acquisition unit is used for subtracting the 3D space coordinates of all the joint points of the previous frame corresponding to the frame from the 3D space coordinates of all the joint points of each frame except the first frame to obtain a difference value, and multiplying the difference value by the animation frame rate to obtain the speed vectors of all the joint points of each frame except the first frame;
a Boolean vector acquisition unit, configured to read the velocity vectors of a preset first joint point, a preset second joint point, a preset third joint point and a preset fourth joint point of each frame except the first frame, compare the magnitude of the velocity vector of each preset joint point with a preset velocity value, generate the velocity-judgment Boolean values of the four preset joint points, and concatenate them to obtain the first Boolean vector of each frame except the first frame; read the 3D spatial coordinates of the four preset joint points of each frame except the first frame, compare the y coordinate value of each preset joint point with a preset length value, generate the distance-judgment Boolean values of the four preset joint points, and concatenate them to obtain the second Boolean vector of each frame except the first frame; and perform an AND operation on the first Boolean vector and the second Boolean vector of each frame except the first frame to obtain the four-dimensional Boolean vector of each frame except the first frame;
and a data set dividing unit, configured to save the 6D rotation vectors, 3D spatial coordinates, velocity vectors and four-dimensional Boolean vectors of all joint points of each frame except the first frame as a data set, wherein a first preset proportion of the data set is divided into the training set and a second preset proportion of the data set is divided into the verification set.
9. A computer device, comprising:
a memory for storing a computer program;
a processor for implementing the steps of the training method of the motion completion model according to any of claims 1 to 5 when executing the computer program.
10. A computer-readable storage medium, characterized in that the computer-readable storage medium stores a computer program which, when being executed by a processor, carries out the steps of the training method of the motion completion model according to any one of claims 1 to 5.
CN202110890084.0A 2021-08-04 2021-08-04 Training method and device for motion completion model, completion method and device, and medium Active CN113345061B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110890084.0A CN113345061B (en) 2021-08-04 2021-08-04 Training method and device for motion completion model, completion method and device, and medium

Publications (2)

Publication Number Publication Date
CN113345061A true CN113345061A (en) 2021-09-03
CN113345061B CN113345061B (en) 2021-11-05

Family

ID=77480640

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110890084.0A Active CN113345061B (en) 2021-08-04 2021-08-04 Training method and device for motion completion model, completion method and device, and medium

Country Status (1)

Country Link
CN (1) CN113345061B (en)


Patent Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104573665A (en) * 2015-01-23 2015-04-29 北京理工大学 Continuous motion recognition method based on improved viterbi algorithm
US20190295305A1 (en) * 2018-03-20 2019-09-26 Adobe Inc. Retargeting skeleton motion sequences through cycle consistency adversarial training of a motion synthesis neural network with a forward kinematics layer
US20200356827A1 (en) * 2019-05-10 2020-11-12 Samsung Electronics Co., Ltd. Efficient cnn-based solution for video frame interpolation
CN111259779A (en) * 2020-01-13 2020-06-09 南京大学 Video motion detection method based on central point trajectory prediction
CN111583364A (en) * 2020-05-07 2020-08-25 江苏原力数字科技股份有限公司 Group animation generation method based on neural network
CN111709284A (en) * 2020-05-07 2020-09-25 西安理工大学 Dance emotion recognition method based on CNN-LSTM
CN112037312A (en) * 2020-11-04 2020-12-04 成都市谛视科技有限公司 Real-time human body posture inverse kinematics solving method and device
CN112949544A (en) * 2021-03-17 2021-06-11 上海大学 Action time sequence detection method based on 3D convolutional network
CN113112577A (en) * 2021-04-20 2021-07-13 网易(杭州)网络有限公司 Training method of transition frame prediction model and transition frame prediction method

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
SINA HONARI ET AL: "Unsupervised Learning on Monocular Videos for 3D Human Pose Estimation", 《ARXIV.ORG》 *
张倩: "基于深度学习的视频插帧技术研究", 《中国优秀硕士学位论文全文数据库 信息科技辑》 *

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113554736A (en) * 2021-09-22 2021-10-26 成都市谛视科技有限公司 Skeleton animation vertex correction method and model learning method, device and equipment
CN116664555A (en) * 2023-07-26 2023-08-29 瀚博半导体(上海)有限公司 Neural network slice deployment method and system in real-time application scene
CN116664555B (en) * 2023-07-26 2024-02-06 瀚博半导体(上海)有限公司 Neural network slice deployment method and system in real-time application scene

Also Published As

Publication number Publication date
CN113345061B (en) 2021-11-05


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
TR01 Transfer of patent right

Effective date of registration: 20211201

Address after: 610000 No. 04 and 05, 27th floor, building 1, No. 716, middle section of Jiannan Avenue, Chengdu hi tech Zone, China (Sichuan) pilot Free Trade Zone, Chengdu, Sichuan

Patentee after: Chengdu Tishi infinite Technology Co.,Ltd.

Address before: No.04 and No.05, 27th floor, building 1, No.716, middle section of Jiannan Avenue, Chengdu high tech Zone, China (Sichuan) pilot Free Trade Zone, Chengdu, Sichuan 610094

Patentee before: Chengdu Tishi Technology Co.,Ltd.