CN106971414B - Three-dimensional animation generation method based on a deep recurrent neural network algorithm - Google Patents

Three-dimensional animation generation method based on a deep recurrent neural network algorithm

Info

Publication number
CN106971414B
CN106971414B (application CN201710143013.8A; published as CN106971414A)
Authority
CN
China
Prior art keywords
dimensional animation
dimensional
neural network
animation
model
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201710143013.8A
Other languages
Chinese (zh)
Other versions
CN106971414A (en)
Inventor
罗国亮
项国雄
李玉华
易玉根
雷浩鹏
谢文强
姜永金
王金磊
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
East China Jiaotong University
Original Assignee
East China Jiaotong University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by East China Jiaotong University filed Critical East China Jiaotong University
Priority to CN201710143013.8A priority Critical patent/CN106971414B/en
Publication of CN106971414A publication Critical patent/CN106971414A/en
Application granted granted Critical
Publication of CN106971414B publication Critical patent/CN106971414B/en

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06T: IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T13/00: Animation
    • G06T13/20: 3D [Three Dimensional] animation
    • G06T2213/00: Indexing scheme for animation
    • G06T2213/12: Rule based animation

Abstract

A three-dimensional animation generation method based on a deep recurrent neural network algorithm. With reference to the attached Figure 1, which gives the technical route the method is intended to take, the method is driven by an existing three-dimensional animation data set: on the basis of extracting the dynamic features of three-dimensional animation, it fuses deep neural network model technology to generate three-dimensional animation and carries out the related theoretical and algorithmic design.

Description

Three-dimensional animation generation method based on a deep recurrent neural network algorithm
Technical Field
The invention relates to the field of computers, and in particular to a three-dimensional animation generation method based on a deep recurrent neural network algorithm.
Background
Three-dimensional Animation Generation (3D Animation Generation): given a three-dimensional model as input, the behaviors of a specified three-dimensional animation are automatically extracted and loaded onto the given model, driving it to produce a three-dimensional animation with similar behavior.
Artificial Neural Network (ANN): an artificial neural network model comprises an input layer, hidden layers, and an output layer. It simulates the working principle of the nervous system in the human brain and realizes information processing by adjusting the interconnections among the nodes of each layer. In theory, a neural network model can approximate any target function arbitrarily closely.
Deep Learning (DL): deep learning is a newer research field within machine learning. It extends the artificial neural network to a deep network structure comprising multiple hidden layers, further simulating the process of human cognition. In recent years, deep learning tools have been widely applied in fields such as computer vision, speech recognition, and natural language processing, and their ability to recognize information effectively has been thoroughly verified.
The invention is supported by the National Natural Science Foundation of China, project title: "Study of data-driven three-dimensional animation generation methods based on deep learning" (project no. 61602222).
Three-dimensional animation data has been widely used in digital entertainment, film and television, medicine, education, and other fields, because three-dimensional animation represents information accurately, realistically, and operably. For example, in film production, the 3D features "Monster Hunt" and "Monkey King: Hero Is Back", released in 2015, successively broke domestic box-office records, one widely felt impression left by three-dimensional animation. Driven by strong demand from many application fields, three-dimensional animation has gradually become an important research object in computer graphics. With the rapid development of motion capture and three-dimensional model scanning technologies, the ways of acquiring three-dimensional animation have diversified, methods for generating three-dimensional animation have been widely studied and designed, and generating three-dimensional animation is no longer an expensive technology. It is easy to predict that three-dimensional animation data will become another mainstream data carrier after text, speech, pictures, and video, and that research on three-dimensional animation generation methods will become one of the main research topics in computer graphics.
With the popularization of three-dimensional animation, convenient and accurate animation generation methods have become a research hotspot and focus of academic circles at home and abroad. At present there are professional modeling and animation tools such as 3ds Max, Maya, and Blender, as well as methods that scan a three-dimensional object frame by frame to obtain a three-dimensional animation. As motion capture and high-precision static model scanning technologies have matured and their costs have fallen, methods that fuse motion capture with a three-dimensional model, loading motion capture data onto a mesh model to generate three-dimensional animation, have been widely studied and designed in recent years. With the growing attention paid to generation methods, the amount of three-dimensional animation data available online has gradually increased, making data-driven three-dimensional animation generation feasible. Such methods extract features from training data, fuse a learning algorithm model, and refine the model parameters with the training data, so that the trained model directly generates new data sharing the features of the training data. Three-dimensional animation generation technologies are thus diversifying and data sets are growing, so efficient and convenient data-driven generation methods are an inevitable trend. Deep learning algorithms, as data-driven machine learning methods, have attracted great attention and wide application in recent years; in March 2016, Google's AlphaGo, developed on deep learning technology, won its "man vs. machine" match against the Korean Go master Lee Sedol with a decisive advantage. Deep learning algorithms likewise offer new opportunities and challenges for animation generation techniques. On one hand, given a neural network model trained on similar three-dimensional animation data and a mesh model as input, the trained network can generate a new three-dimensional animation at low time cost, with output that closely resembles the features of the training data. On the other hand, applying existing deep neural network models to three-dimensional animation generation still faces several problems: 1) descriptors of the dynamic features of three-dimensional animation are limited; 2) the computation of a neural network grows steeply with the scale of the input data, and three-dimensional animation data is usually too large to be fed to a neural network directly; 3) there is no unified method or criterion for evaluating the quality of generated three-dimensional animation. As the demand for three-dimensional animation from applied industries keeps growing, the requirements on the efficiency and quality of generation methods rise accordingly. The above problems greatly limit the design of three-dimensional animation generation methods and have become one of the major bottlenecks in the development of the related industries.
How to design a method that generates three-dimensional animation efficiently and conveniently while addressing the above problems is the main problem this research aims to solve.
Research on three-dimensional animation production and generation technology has also attracted wide attention from scholars at home and abroad, and generation technology is gradually developing toward diversity, convenience, and low cost. In general, the main categories of existing three-dimensional animation generation technologies, with their advantages and disadvantages, are as follows:
1. Frame-by-frame scanning, which relies on static scanning of a three-dimensional model: a three-dimensional animation is obtained by scanning a series of poses of a human motion to obtain a sequence of three-dimensional models. The industry's latest body-scanning devices and systems were presented at the international conference "3D Body Scanning Technologies" held in Lucerne, Switzerland, in October 2015, most of which can complete a static human-body scan within seconds. The advantage of this approach is that the data is real, but since scanning usually yields a point-cloud model, extensive post-processing is needed to obtain a mesh model and maintain a consistent mesh topology between frames;
2. Optimization-based methods, whose principle is to establish an optimization model so that the rigid motion and deformation of each patch of a given mesh model optimally match the corresponding patch in each frame of a given three-dimensional animation. The advantage is that the generated animation closely resembles the given one, but an optimization must be solved for every operation, resulting in a large amount of computation. To reduce this cost, one method based on space-time segmentation rigidly transforms and deforms each rigid space-time module (within which the mesh patches and vertices are relatively static);
3. Motion capture systems: three-dimensional animation can be generated directly by a motion capture system, whose principle is to capture the spatial trajectories of markers placed on a motion-demonstrating subject. Motion capture systems such as Vicon and Qualisys can quickly generate realistic three-dimensional animation with stable topology, but a cost running into the millions is prohibitive for most users;
4. Linear blending algorithms, whose core is to compute a linear mapping between each mesh patch (or vertex) and each bone (or feature point), so that a three-dimensional animation is generated by driving the mesh model with the bones (or feature points). Given a mesh model, one method maps the depth image acquired by a Kinect camera onto the three-dimensional mesh and gradually optimizes and updates the mapping from the real-time image data with an expectation-maximization (EM) algorithm, thereby driving the given mesh into a three-dimensional animation from Kinect video. Similarly, many existing methods map facial-expression feature points in video onto a parameterized mesh model controllable by those feature points, realizing video-driven three-dimensional animation. Research results in this direction make three-dimensional animation generation more convenient and promote many related applications, such as actors controlling virtual characters in 3D films.
With the development of three-dimensional animation production and generation technology, the growing three-dimensional animation databases provide ample data for research on data-driven methods. It has been proposed to generate a complete animation from one action to another by concatenating similar poses as transitions between different three-dimensional animation sub-sequences. A typical data-driven algorithm, however, is a learning algorithm that extracts features from training data. In recent years deep learning algorithms have been widely used by researchers for pattern recognition and applied to design a variety of data generation methods, including generation of picture descriptions, images, dialogue, and text. Some methods extract the content of an ordinary photo and the style of an artistic picture with a convolutional neural network and convert the photo into an artistically styled picture by linearly mixing content and style. Other methods use the long short-term memory (LSTM) model as the hidden unit of a recurrent neural network, avoiding exploding and vanishing gradients and realizing automatic generation of handwritten text. One approach generates pictures by analyzing how people observe picture features, combining an encoding recurrent neural network that compresses the picture with a decoding recurrent neural network that decodes the network output. Yet another models the behavior of motion capture data with a recurrent neural network and predicts the motion trend of each joint, thereby predicting and generating motion capture data. What these newly proposed methods have in common is that they design algorithms for generating time-series data such as text and dialogue on the basis of deep neural network models, which is highly instructive for the study of a three-dimensional animation generation algorithm based on a deep neural network model.
Surveying related research at home and abroad, we find that work based on deep learning algorithms mainly concentrates on the network structure design of deep neural network models and on applications including pattern recognition and data generation for images, text, and dialogue, while discussion of behavior analysis for three-dimensional animation and its application to data generation remains seriously insufficient. In this research on a three-dimensional animation generation method based on a deep recurrent neural network algorithm, an existing three-dimensional animation data set serves as the driver and deep learning algorithms serve as the basis of the data analysis and feature extraction methods; frontier work is carried out in the direction of data-driven three-dimensional animation generation, seeking breakthroughs on the relevant hot and difficult technical points.
Disclosure of Invention
The invention aims to provide a three-dimensional animation generation method based on a deep recurrent neural network algorithm, built on a local feature descriptor for mesh models and animations that is invariant to global translation and rotation.
The invention is realized as follows, referring to Figure 1, which gives the technical route the method is intended to take: driven by an existing three-dimensional animation data set, the method extracts the dynamic features of three-dimensional animation and, on that basis, fuses deep neural network model technology to generate three-dimensional animation, carrying out the related theoretical and algorithmic design. First, starting from basic theory, a three-dimensional animation dynamic descriptor is designed to quantify the dynamic behavior information of the animation's mesh patches, from which the similarity between three-dimensional animations is computed. A three-dimensional animation generation algorithm fusing a deep neural network model is then proposed, including an encoding method for three-dimensional animation, to satisfy the data-processing efficiency of the neural network model, and a decoding method for the network output, to finally generate the three-dimensional animation. From feature extraction to result output, from evaluating the generation effect to training the network model, these parts form a complete three-dimensional animation generation method based on a deep neural network model.
The technical effects of the invention are: 1. its main benefit is a data-driven three-dimensional animation generation method: based on real three-dimensional animation data and fused with deep neural network technology, the dynamic features of the training data are extracted and loaded onto a given mesh model to realize animation generation. Although training a deep neural network usually requires processing large training samples at considerable computational cost, once training is complete and all network parameters are obtained, a three-dimensional animation can be generated for a given mesh quickly and automatically through the forward computation of the network alone, so in the long run the data-driven method pays off clearly; 2. the strain-based dynamic descriptor for mesh patches is efficient, independent of the position and orientation of the three-dimensional model in the global coordinate system, and describes the deformation of the model accurately, making it suitable for application scenarios such as behavior-based correspondence computation between three-dimensional animations, similarity comparison, and model repair.
Drawings
Fig. 1 is a technical route diagram of the present invention.
FIG. 2 is a schematic diagram of the deformation and dynamic descriptor design for a three-dimensional animated mesh patch of the present invention.
FIG. 3 is a diagram of the basic architecture of the present invention.
FIG. 4 is a schematic diagram of a deep recurrent neural network model according to the present invention.
FIG. 5 is a schematic diagram of the long short-term memory (LSTM) model according to the present invention.
FIG. 6 is a schematic diagram of three-dimensional animation generation according to the present invention.
Detailed Description
The advantages of the present invention will be described in detail below with reference to the accompanying fig. 1-6 and examples, which are intended to help the reader to better understand the essence of the present invention, but are not intended to limit the scope of the invention.
As outlined above with reference to Figure 1, the method is driven by an existing three-dimensional animation data set: on the basis of extracting the dynamic features of three-dimensional animation, it fuses deep neural network model technology to generate three-dimensional animation. The dynamic descriptor quantifies the behavior of each mesh patch and yields a similarity measure between animations; the encoding method keeps the network's data processing efficient, and the decoding method turns the network output into the final animation. Together with the evaluation of the generation effect and the training of the network model, these form the complete generation method based on a deep neural network model.
Next, the details of the respective modules are described separately:
The three-dimensional animation dynamic descriptor.
For any three-dimensional animation, a neutral pose (Neutral Pose) is given as the reference pose for computing the deformation of each frame's mesh surface of the three-dimensional animation.
First, assume a three-dimensional animation comprises T frames, each frame a mesh of M patches and V vertices; that is, a three-dimensional animation is regarded as a deforming three-dimensional mesh with fixed topology. The v-th vertex in the t-th frame is then written $\mathbf{v}_v^t$ and the m-th patch in the t-th frame $f_m^t$, with $m = 1,\dots,M$, $v = 1,\dots,V$, $t = 1,\dots,T$; the neutral pose is treated as frame 0, i.e., $t = 0$.
Referring to FIG. 2, the top left shows a patch $f_m^0$ of the neutral-pose mesh and the top right the corresponding patch $f_m^t$ in the t-th frame after transformation, where the transformation comprises an affine transformation $F$ and a translation $d$:

$$\mathbf{v}_i^t = F\,\mathbf{v}_i + d, \quad i = 1, 2, 3, \qquad (1)$$

where $\mathbf{v}_i$ and $\mathbf{v}_i^t$ denote the corresponding vertices of the patch before and after the transformation. Notably, the translation $d$ is not a cause of patch deformation and must be removed from the expression. Researchers usually obtain a unit normal vector from the cross product of two edge vectors and take its end point as a fourth vertex of the patch, yielding a fourth equation of the same form as equation (1); subtracting the fourth equation from each of the first three gives

$$\tilde{V}^t = F\,\tilde{V}, \quad \tilde{V} = [\mathbf{v}_1 - \mathbf{v}_4 \;\; \mathbf{v}_2 - \mathbf{v}_4 \;\; \mathbf{v}_3 - \mathbf{v}_4], \quad \tilde{V}^t = [\mathbf{v}_1^t - \mathbf{v}_4^t \;\; \mathbf{v}_2^t - \mathbf{v}_4^t \;\; \mathbf{v}_3^t - \mathbf{v}_4^t],$$

after which the affine transformation $F$ is solved according to the deformation-gradient theory of continuum mechanics, giving the deformation of the patch.
Notably, the affine transformation $F \in \mathbb{R}^{3\times 3}$ above is a three-dimensional matrix; when the mesh model has many patches or the animation many frames, solving for it becomes a computational burden. To further optimize performance, it is first proposed to rigidly transform the patch $f_m^t$ into the plane of $f_m^0$, aligning one vertex and one edge, as shown in FIG. 2, thereby obtaining the patch $\tilde{f}_m^t$ and its three corresponding vertices $\tilde{\mathbf{v}}_1^t, \tilde{\mathbf{v}}_2^t, \tilde{\mathbf{v}}_3^t$. With the translation $d$ removed, equation (1) simplifies to

$$\tilde{\mathbf{v}}_i^t = F\,\mathbf{v}_i, \quad i = 1, 2, 3. \qquad (2)$$

Subtracting the first equation from the second and third gives

$$V^t = F\,V, \quad V = [\mathbf{v}_2 - \mathbf{v}_1 \;\; \mathbf{v}_3 - \mathbf{v}_1], \quad V^t = [\tilde{\mathbf{v}}_2^t - \tilde{\mathbf{v}}_1^t \;\; \tilde{\mathbf{v}}_3^t - \tilde{\mathbf{v}}_1^t].$$

According to the principles of continuum mechanics, the affine transformation $F$ contains the rotational strain $R$ and the stretch-shrink strain $U$ of the patch, i.e., $F = RU$; since the preprocessing shown in FIG. 2 has already removed the rotational strain $R$, we have $F = U$. Principal component analysis (PCA) of $F$ then yields two eigenvalues, $\lambda_1$ and $\lambda_2$, characterizing the stretching and shrinking of the patch respectively, with $\lambda_1 \ge 1$ and $0 < \lambda_2 \le 1$. It should be noted that although the newly proposed strain computation involves one more rigid-transformation step than the formulation of equation (1), it only has to solve a two-dimensional transformation matrix, and the total cost is usually smaller than a principal component analysis of a three-dimensional matrix; for extracting features from large-scale three-dimensional animation, the efficiency improvement of the proposed three-dimensional animation dynamic descriptor and its optimization is therefore significant.
Thus, the deformation of each mesh patch over the time and space domains of a three-dimensional animation is represented by the pair $(\lambda_1, \lambda_2)$, characterizing the stretching and shrinking of the patch respectively. Because the subsequent study of three-dimensional animation similarity takes mesh vertices as the basic unit, the patch deformation is further converted into a per-vertex deformation $(\lambda_1^v, \lambda_2^v)$: for each vertex, find all patches containing that vertex and average their $\lambda_1$ and $\lambda_2$ values.
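To make the descriptor concrete, the following Python sketch (a minimal illustration assuming NumPy, with triangles given as 3x3 vertex arrays; all function names are hypothetical) computes the pair $(\lambda_1, \lambda_2)$ for one patch along the lines described above. The rigid alignment of FIG. 2 is realized by expressing both triangles in a local 2D frame, and the eigenvalues of the stretch $U$ are obtained as the singular values of the in-plane $F$, which coincide with them once the rotation $R$ has been removed:

```python
import numpy as np

def patch_strain(neutral_tri, current_tri):
    """Deformation descriptor (lambda1, lambda2) of one triangle patch.

    neutral_tri, current_tri: (3, 3) arrays, one vertex per row.
    Follows the optimized 2D formulation: both triangles are expressed
    in a local in-plane frame (removing the translation d and the
    rotation R), then F = V_t V^{-1} is a 2x2 matrix whose singular
    values characterize stretching (lambda1) and shrinking (lambda2).
    """
    def to_local_2d(tri):
        # Edge vectors from the first vertex (this removes d).
        e1, e2 = tri[1] - tri[0], tri[2] - tri[0]
        # Orthonormal in-plane frame: aligning one vertex and one edge
        # plays the role of the rigid transform shown in FIG. 2.
        x = e1 / np.linalg.norm(e1)
        n = np.cross(e1, e2)
        y = np.cross(n / np.linalg.norm(n), x)
        # Columns are the two edges in local 2D coordinates.
        return np.column_stack([[e1 @ x, e1 @ y], [e2 @ x, e2 @ y]])

    V = to_local_2d(neutral_tri)    # neutral edges, V = [v2-v1  v3-v1]
    Vt = to_local_2d(current_tri)   # deformed edges in their own plane
    F = Vt @ np.linalg.inv(V)       # in-plane deformation gradient
    lam = np.linalg.svd(F, compute_uv=False)
    return lam[0], lam[1]           # lambda1 >= lambda2 > 0

def vertex_strain(lam_per_patch, patches_of_vertex):
    """Average (lambda1, lambda2) over all patches containing a vertex."""
    return np.mean([lam_per_patch[m] for m in patches_of_vertex], axis=0)
```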
A method for computing the similarity of three-dimensional animations.
The descriptor above provides a new way to compare three-dimensional animations and compute their similarity, which matters both for designing a reasonable neural network loss function and for evaluating the effect of the three-dimensional animation the invention outputs.
For two given three-dimensional animations $H_0$ and $H_1$, the complete vertex correspondence between the mesh models of their neutral poses (note: many research results in computer graphics can be used to precompute the complete correspondence between meshes) is expressed as $d_v \rightarrow C(d_v)$, where $d_v$ denotes the v-th vertex of $H_0$ and $C(d_v)$ the set of all vertices of $H_1$ corresponding to $d_v$. The difference between the v-th vertex of $H_0$ in the t-th frame and the corresponding vertices in $H_1$ is defined as

$$s_v^t = \frac{1}{k_v}\sum_{k=1}^{k_v}\bigl\|(\lambda_1,\lambda_2)_v^t - (\lambda_1,\lambda_2)_{c_k}^t\bigr\|, \qquad (3)$$

where $v = 1,\dots,V$, $t = 1,\dots,T$, $c_k$ denotes the k-th vertex of $C(d_v)$ corresponding to vertex $d_v$, $k = 1,\dots,k_v$, and $(\lambda_1,\lambda_2)$ is the proposed binary deformation descriptor of the v-th vertex in the t-th frame. The difference between the two three-dimensional animations $H_0$ and $H_1$ is then computed as

$$s(H_0,H_1) = \sum_{t=1}^{T}\sum_{v=1}^{V} s_v^t, \qquad (4)$$

i.e., the sum of descriptor differences between corresponding vertices over the whole time and space domain. This distance only accounts for the possibility that one vertex of $H_0$ corresponds to several vertices of $H_1$, while ignoring the opposite case, i.e., one vertex of $H_1$ corresponding to several vertices of $H_0$; the distance so defined is therefore asymmetric, i.e.,

$$s(H_0,H_1) \neq s(H_1,H_0).$$

To avoid this, the two directions can be computed separately and averaged, i.e.,

$$\mathrm{simi\_dynamic}(H_0,H_1) = \tfrac{1}{2}\bigl(s(H_0,H_1) + s(H_1,H_0)\bigr). \qquad (5)$$
Defined this way, the designed three-dimensional animation similarity satisfies non-negativity and symmetry. Because mesh patches reflect local deformation, observing all patches over the time-space domain captures the action information of the animation; that is, the similarity between three-dimensional animations reflects, to some extent, the similarity of their actions. The similarity computation is an open framework: other existing descriptors, such as the geodesic distance or the shape diameter function, can be introduced as needed and linearly combined by importance, defining a more complete similarity and laying the foundation for the loss-function design of the neural network in the sequel.
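As a rough illustration of equations (3)-(5), the sketch below (hypothetical helper names; it assumes both animations have the same frame count T and that the per-vertex descriptors and the correspondence map have already been computed) accumulates descriptor differences over the whole time-space domain and then symmetrizes the result:

```python
import numpy as np

def s_directed(desc0, desc1, corr):
    """Directed distance s(H0, H1) of equation (4).

    desc0, desc1: (T, V, 2) arrays of per-vertex (lambda1, lambda2)
    descriptors; corr[v] lists the vertex indices of H1 corresponding
    to vertex v of H0 (the precomputed map d_v -> C(d_v)).
    """
    total = 0.0
    for t in range(desc0.shape[0]):
        for v, matches in enumerate(corr):
            diffs = desc0[t, v] - desc1[t, matches]          # (k_v, 2)
            total += np.linalg.norm(diffs, axis=1).mean()    # eq. (3)
    return total

def simi_dynamic(desc0, desc1, corr01, corr10):
    """Symmetrized similarity of equation (5)."""
    return 0.5 * (s_directed(desc0, desc1, corr01)
                  + s_directed(desc1, desc0, corr10))
```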
Encoding and decoding of the three-dimensional animation.
As a machine learning tool, the deep neural network model requires large-scale three-dimensional animation input data to be simplified before serving as algorithm input. In natural language processing research, researchers have proposed word-vector methods, such as one-hot encoding, to encode words. Unlike a word, however, each input network node in the three-dimensional animation generation method designed here corresponds to one frame of the mesh model, whose data complexity and scale far exceed those of a single word; such high-dimensional data would severely increase the computational complexity of the algorithm, reduce its efficiency, and even cause small-sample and curse-of-dimensionality problems. The invention therefore proposes the following dimension-reduction-based encoding of three-dimensional animation suitable for a neural network.
A three-dimensional animation is regarded as a mesh model deforming in the time domain, where the t-th frame mesh can be expressed as the vector

$$h^t = [\,x_1^t\; y_1^t\; z_1^t\; \cdots\; x_V^t\; y_V^t\; z_V^t\,],$$

in which $(x_v^t, y_v^t, z_v^t)$ are the spatial coordinates of vertex $\mathbf{v}_v^t$, $v = 1,\dots,V$. A three-dimensional animation can thus be represented as a matrix $H$ of dimensions $T \times 3V$, i.e., containing $T$ samples of dimension $3V$, each sample being one frame of the three-dimensional model.
Many tools exist for reducing the dimension of high-dimensional samples; classical methods include principal component analysis (PCA), isometric feature mapping (ISOMAP), locally linear embedding (LLE), and Laplacian eigenmaps (LE). The PCA method is based on matrix decomposition, i.e., the original high-dimensional data is expressed as a linear combination of a low-dimensional representation of each sample and a set of basis vectors (eigenvectors); it computes quickly and makes data reconstruction convenient. Taking PCA as an example, the matrix $H$ of a three-dimensional animation can be decomposed as

$$H = x_{T\times d}\,B_{d\times 3V}, \qquad (6)$$

where $x$ is the low-dimensional representation of each frame mesh, $B$ is the basis-vector matrix, to be retained for data reconstruction, and $d$ is the reduced dimension. After dimension reduction the t-th frame mesh of the three-dimensional animation is thus represented as the vector $x^t = (x_1^t, \dots, x_d^t)$. The dimension $d$ is determined by the proportion of the principal components among all components (i.e., eigenvectors); for three-dimensional animation, $d \le 3$ generally suffices on the premise that this proportion is not less than 90%.

Obviously $d$ is much smaller than $3V$, so the neural network can comfortably accept the three-dimensional animation as input. The basis-vector matrix $B$ must be retained (see FIG. 1) and used to decode the network output, i.e.,

$$H_o = \hat{x}\,B_{d\times 3V}, \qquad (7)$$

where $\hat{x}$ denotes the network model output and $H_o$ the three-dimensional animation generated after decoding. Because encoding and decoding share the same $B_{d\times 3V}$, the input and output three-dimensional animation data remain consistent in the basis-vector space.
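A minimal sketch of this encode/decode step follows, assuming NumPy and obtaining the PCA basis via SVD; centering the frames on their mean is an implementation detail assumed here rather than stated in the text. Because the same basis B (and mean) is reused for decoding, the encoded input and decoded output stay consistent in the same basis-vector space, as required above:

```python
import numpy as np

def pca_encode(H, ratio=0.90):
    """Encode a T x 3V animation matrix H as in equation (6).

    Returns (x, B, mean): x is T x d, B is d x 3V, with d chosen so the
    retained principal-component proportion is at least `ratio`.
    """
    mean = H.mean(axis=0)
    U, S, Vt = np.linalg.svd(H - mean, full_matrices=False)
    energy = np.cumsum(S**2) / np.sum(S**2)
    d = int(np.searchsorted(energy, ratio)) + 1
    B = Vt[:d]                 # d x 3V basis-vector matrix
    x = (H - mean) @ B.T       # T x d low-dimensional frame codes
    return x, B, mean

def pca_decode(x_hat, B, mean):
    """Decode network output x_hat (T x d) back to meshes: H_o = x_hat B."""
    return x_hat @ B + mean
```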
A three-dimensional animation generation method based on a deep learning algorithm.
Fusing a deep learning algorithm, the loss function of the deep neural network model is designed on the basis of the above three-dimensional animation similarity computation, and the training method of the deep neural network model is discussed, thereby realizing a data-driven three-dimensional animation generation algorithm.
A traditional neural network is connected one way from the input layer to the hidden layer and on to the output layer, with no connections among the nodes within a layer. This limits its application to many problems; in particular it falls short on sequential problems (speech recognition, text generation, and the like), which often exhibit strong temporal dependence. The deep recurrent neural network (RNN) is different: it builds connections between hidden-layer nodes, realizing the network's memory of preceding information and establishing the dependence of a sequence's current output on the preceding outputs.
Referring to FIG. 3, the invention provides a three-dimensional animation generation method based on the RNN model. The model comprises an input layer, a hidden layer, and an output layer. First, the input three-dimensional animation is encoded at the input layer, and low-dimensional time-series features are extracted as the input to the corresponding network nodes; after processing by the hidden layer, the corresponding low-dimensional time-series output is obtained at the output nodes; the output layer then performs the decoding operation in combination with a mesh model, loading the dynamic information onto the given three-dimensional mesh and thus producing the three-dimensional animation. (Note: for readability, FIG. 3 does not show all connections between nodes; the detailed connection diagram of the RNN model is given in FIG. 4.)
Combining the characteristics of the RNN, the main features of the three-dimensional animation generation algorithm are as follows:
Implicit feature extraction from training data: as is well known, a multilayer feedforward neural network can in principle approximate a nonlinear function arbitrarily well. Besides the input and output layers, the model shown in FIG. 3 has a hidden layer comprising multiple hidden nodes that are correlated with one another and jointly affect the output nodes, giving the model a stronger capacity for extracting hidden features than a basic feedforward network.
Reduced computation through parameter sharing: for a neural network, the amount of computation (especially during training) grows steeply with the number of network nodes, and the RNN's network structure is clearly much more complex than a traditional one. One of the main features of the RNN, however, is parameter sharing: as shown in FIG. 3, the weight parameters on all edges of each time unit (each column) are identical to those of the corresponding nodes in the other time units. This greatly reduces the computation of network training without losing the generality of the network structure, and also makes it more feasible to improve learning accuracy by adding network layers.
Memory: a three-dimensional animation is a time series whose current data depends on the preceding information. A traditional neural network model can only model single-frame data independently and cannot propagate information along the time dimension. As the RNN model in FIG. 3 shows, each hidden unit receives not only the input node and hidden nodes of the current time unit (current column) but also the influence of the corresponding hidden nodes of the preceding units (and, in turn, influences the corresponding hidden nodes of subsequent ones). This dependency across time units can be understood as the memory of the hidden nodes.
The introduction of the RNN model makes a data-driven three-dimensional animation generation method possible and brings many advantages. For the designed RNN-based three-dimensional animation generation algorithm, each functional part is discussed as follows:
a) Network model structure of the RNN-based three-dimensional animation generation algorithm.

As shown in FIG. 1 and FIG. 3, the three-dimensional animation is encoded into low-dimensional data, which serves as the input of the RNN model; the data passes through the hidden-layer nodes, low-dimensional data of the same form is obtained at the output layer, and, fused with the topology of the mesh model onto which the animation is to be loaded, the three-dimensional animation is output. In FIG. 3, each input node corresponds to one frame mesh, i.e., a vector $x^t$, and the state of the hidden layer is determined jointly by the current input vector and the state of the corresponding hidden node at the previous moment (the previous frame), i.e., the historical state. The output layer has the same number of nodes as the input layer, namely T.
The detailed connections between the layers of the RNN model are shown in FIG. 4, taking $d = 2$ and a hidden layer of 3 hidden nodes as an example, so that each input-layer node receives 2 values; that is, the input of the t-th node is the vector $x^t = (x_1^t, x_2^t)$. The states of the hidden-layer nodes and their connections to the input nodes can be expressed as

$$h^t = f\bigl(W_{xh}\,x^t + W_{hh}\,h^{t-1} + b_h\bigr),$$

where $b$ denotes a correction constant (bias) and $f(\cdot)$ an activation function. Common activation functions for neural network models are the sigmoid and tanh functions, the latter generally converging better in RNN models.

Output layer state:

$$o^t = W_{ho}\,h^t + b_o.$$
for the RNN model architecture, the training process is actually a process of confirming the strength (or called weight) of the network connection. Generally, the RNN defines a loss function according to the result of an output layer, and solves an optimization problem by fitting training data, so that optimal network parameters are obtained through training.
b) The loss function.

The optimization goal of a neural network model is usually realized by defining a loss function and gradually adjusting the network parameters according to the error; commonly used loss functions include the mean squared error (MSE) and the softmax cost function. Since the three-dimensional animation similarity defined in equations (3)-(5) is closer in spirit to the MSE, the loss function is defined with reference to the MSE as follows.
The three-dimensional animation similarity defined in equation (5) is based on the new dynamic descriptor; since it considers only local strain relations and ignores the geometric adjacency between mesh vertices, it is difficult for it alone to guarantee a reasonable evaluation of the generated three-dimensional animation. Referring to the behavior-similarity definitions of equations (2)-(5), the similarity of the vertex trajectories of two three-dimensional animations can further be defined as the accumulated spatial Euclidean distance between all corresponding vertices:

$$s_{traj}(H_0,H_1) = \sum_{t=1}^{T}\sum_{v=1}^{V}\frac{1}{k_v}\sum_{k=1}^{k_v}\bigl\|\,p_v^t - p_{c_k}^t\,\bigr\|_2, \qquad (8)$$

where $p_v^t$ denotes the position of the v-th vertex in the t-th frame. Similarly, to guarantee the symmetry of the point-trajectory distance, the two directions are averaged:

$$\mathrm{simi\_traj}(H_0,H_1) = \tfrac{1}{2}\bigl(s_{traj}(H_0,H_1) + s_{traj}(H_1,H_0)\bigr). \qquad (9)$$

The target loss function of the three-dimensional animation generation model is thus defined as

$$L(x) = \alpha\cdot \mathrm{simi\_dynamic}(H_0,H_1) + (1-\alpha)\cdot \mathrm{simi\_traj}(H_0,H_1), \qquad (10)$$

where $\mathrm{simi\_dynamic}(H_0,H_1)$ reflects the similarity of dynamic behaviors between the two three-dimensional animations; $\mathrm{simi\_traj}(H_0,H_1)$ expresses the difference of corresponding vertex trajectories, and so reflects the geometric difference between the two animations more directly; and $\alpha \in [0,1]$ is a linear scale parameter controlling the degree to which the two factors influence the overall three-dimensional animation similarity.
It is worth noting that the loss function defined in equation (10) accounts for the behavior of the three-dimensional animation and for its geometric similarity (or difference) through the dynamic descriptors and the point trajectories respectively. In fact, other optional descriptors exist for describing the behavior and geometry of three-dimensional animation; for example, global descriptors such as shape histograms, spin images, and spherical harmonics can be used to compare mesh shapes. Remaining compatible with such descriptors and refining the loss function of equation (10) along the above lines can improve the effect of the algorithm to a certain extent, though the computation will of course grow accordingly. The loss function defined in equation (10) is thus an open framework in which optional three-dimensional model descriptors can be configured according to the performance requirements and the available computation.
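Putting equations (8)-(10) together, the combined loss can be sketched as follows (reusing the hypothetical simi_dynamic helper from the similarity sketch above; P0 and P1 are (T, V, 3) arrays of vertex positions):

```python
import numpy as np

def s_traj_directed(P0, P1, corr):
    """Directed trajectory distance of equation (8): accumulated
    Euclidean distance between corresponding vertex positions."""
    total = 0.0
    for t in range(P0.shape[0]):
        for v, matches in enumerate(corr):
            total += np.linalg.norm(P0[t, v] - P1[t, matches],
                                    axis=1).mean()
    return total

def animation_loss(P0, P1, desc0, desc1, corr01, corr10, alpha=0.5):
    """Combined loss of equation (10)."""
    traj = 0.5 * (s_traj_directed(P0, P1, corr01)
                  + s_traj_directed(P1, P0, corr10))     # eq. (9)
    dyn = simi_dynamic(desc0, desc1, corr01, corr10)     # eq. (5)
    return alpha * dyn + (1 - alpha) * traj              # eq. (10)
```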
c) An LSTM-based RNN model training method.

FIG. 4 describes the RNN model of FIG. 3 in more detail. As FIG. 4 shows, the RNN is very similar to a feed-forward neural network (FNN) model, so the model can be trained with reference to the conventional back-propagation (BP) algorithm. However, if the model in FIG. 4 is treated as an ordinary FNN, the long time dimension causes the training time of the BP algorithm to grow exponentially or even makes it unsolvable. Thanks to the parameter-sharing characteristic of the RNN structure, the connection weights are the same at every time step, so training the model can be completed in reasonable time.
Another problem may be encountered during training: after a batch of training samples, the error is computed and the weights are corrected by gradient descent, and the local gradient search space may contain huge, cliff-like gradients or plateaus with derivatives close to 0, the so-called exploding and vanishing gradients. In recent years, however, the RNN model fused with long short-term memory (LSTM), widely discussed by researchers, has proved very effective at avoiding this problem. As shown in FIG. 5, the LSTM model contains a memory cell for storing information. The memory cell is governed by three gate neurons, an input gate, a forget gate, and an output gate, which control the write, keep, and read operations respectively; all three gates are logistic units. The rest of the neural network may write content into the memory cell when the input gate is 1 and read content from it when the output gate is 1; when the forget gate is 1 the memory cell rewrites its own content, and when it is 0 its content is cleared. Under this control mechanism of the LSTM, the BP algorithm runs from the moment a value is loaded into the memory cell until that value is read out and cleared, and during this process the error derivative is held locally, so exploding or vanishing gradients need not be feared.
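For reference, one step of a standard LSTM cell with the three gates described above can be sketched as follows (per-gate parameter shapes are assumptions for illustration; this is the textbook formulation, not necessarily the exact variant of FIG. 5):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, W, U, b):
    """One LSTM step. W, U, b are dicts keyed 'i', 'f', 'o', 'g'
    holding per-gate input weights, recurrent weights, and biases."""
    i = sigmoid(W['i'] @ x_t + U['i'] @ h_prev + b['i'])  # write (input) gate
    f = sigmoid(W['f'] @ x_t + U['f'] @ h_prev + b['f'])  # keep/clear (forget) gate
    o = sigmoid(W['o'] @ x_t + U['o'] @ h_prev + b['o'])  # read (output) gate
    g = np.tanh(W['g'] @ x_t + U['g'] @ h_prev + b['g'])  # candidate content
    c = f * c_prev + i * g   # memory cell: kept content plus written content
    h = o * np.tanh(c)       # exposed (read) state
    return h, c
```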
Through the above scheme, the input three-dimensional animation is encoded so that deep neural network model processing can be applied, the three-dimensional animation similarity computation is integrated into the design of the model's loss function, and the network output is decoded, finally realizing a data-driven three-dimensional animation generation method.
The above-mentioned embodiments are merely illustrative of the preferred embodiments of the present invention, and do not limit the scope of the present invention, and various modifications and improvements of the technical solution of the present invention by those skilled in the art should fall within the protection scope defined by the claims of the present invention without departing from the spirit of the present invention.

Claims (1)

1. A three-dimensional animation generation method based on a deep recurrent neural network algorithm, characterized in that an existing three-dimensional animation data set serves as the driver; on the basis of extracting the dynamic features of three-dimensional animation, a three-dimensional animation is generated by fusing deep neural network model technology, and the related theoretical and algorithmic design is carried out;
firstly, a three-dimensional animation dynamic descriptor is designed to quantify the dynamic behavior information of the three-dimensional animation's mesh patches, from which the similarity between three-dimensional animations is computed; a three-dimensional animation generation algorithm fusing a deep neural network model is proposed, comprising the design of a three-dimensional animation encoding method, so as to satisfy the data-processing efficiency of the neural network model, and of a decoding method for the neural network model's output, so as to finally generate the three-dimensional animation; from feature extraction to result output, evaluation of the three-dimensional animation generation effect, and training of the neural network model, these form a complete three-dimensional animation generation method based on a deep neural network model;
wherein designing the three-dimensional animation dynamic descriptor comprises:
for any three-dimensional animation, a neutral pose is given as the reference pose for computing the deformation of each frame's mesh surface; starting from a single patch of the three-dimensional mesh model, a three-dimensional animation dynamic descriptor is designed that represents the strain of each patch of each frame's mesh model; specifically:
assuming that a three-dimensional animation comprises T frames, each frame being a mesh of M patches and V vertices, the v-th vertex in the t-th frame can be expressed as $\mathbf{v}_v^t$ and the m-th patch in the t-th frame as $f_m^t$, with $m = 1,\dots,M$, $v = 1,\dots,V$, $t = 1,\dots,T$; the neutral pose can be viewed as frame 0, i.e., $t = 0$, written $f_m^0$;

first, preprocessing removes from each patch $f_m^t$ the rotational strain $R$ relative to the neutral-pose patch $f_m^0$, yielding the patch $\tilde{f}_m^t$; the affine transformation $F$ is solved to obtain the deformation of the patch $\tilde{f}_m^t$; principal component analysis of the obtained $F$ yields the two eigenvalues $\lambda_1$ and $\lambda_2$ characterizing the stretching and shrinking of the patch; thus, the deformation of each mesh patch over the time and space domains of a three-dimensional animation can be represented by the pair $(\lambda_1, \lambda_2)$; wherein the preprocessing comprises rigidly transforming the patch $f_m^t$ into the plane of $f_m^0$, aligning one vertex and one edge, thereby obtaining the patch $\tilde{f}_m^t$ and its three corresponding vertices $\tilde{\mathbf{v}}_1^t, \tilde{\mathbf{v}}_2^t, \tilde{\mathbf{v}}_3^t$;
the designing of the three-dimensional animation encoding method, so as to satisfy the data-processing efficiency of the neural network model, and of the decoding method for the neural network model's output, so as to finally generate the three-dimensional animation, comprises: regarding the three-dimensional animation as a mesh model deforming in the time domain, wherein the t-th frame mesh can be expressed as the vector $h^t = [\,x_1^t\; y_1^t\; z_1^t\; \cdots\; x_V^t\; y_V^t\; z_V^t\,]$, in which $(x_v^t, y_v^t, z_v^t)$ are the spatial coordinates of vertex $\mathbf{v}_v^t$; a three-dimensional animation can thus be represented as a matrix $H$ of dimensions $T \times 3V$, i.e., containing $T$ samples of dimension $3V$, each sample being one frame of the three-dimensional model;

using a matrix decomposition method, the original high-dimensional data is expressed as a linear combination of a low-dimensional representation of each sample and a set of basis vectors, and the matrix $H$ of the three-dimensional animation can be decomposed as

$$H = x_{T \times d}\,B_{d \times 3V},$$

wherein $x$ is the low-dimensional representation of each frame mesh, $B$ is the basis-vector matrix, to be used for data reconstruction, and $d$ is the reduced dimension; the t-th frame mesh of the three-dimensional animation is thus represented after dimension reduction as the vector $x^t = (x_1^t, \dots, x_d^t)$, wherein $d$ is much smaller than $3V$, so that the neural network can fully accept the three-dimensional animation as input; $B$ is retained as the basis-vector matrix and used to decode the network output, i.e.,

$$H_o = \hat{x}\,B_{d \times 3V},$$

wherein $\hat{x}$ denotes the network model output and $H_o$ the three-dimensional animation generated after decoding.
CN201710143013.8A 2017-03-10 2017-03-10 Three-dimensional animation generation method based on a deep recurrent neural network algorithm Active CN106971414B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710143013.8A CN106971414B (en) 2017-03-10 2017-03-10 Three-dimensional animation generation method based on a deep recurrent neural network algorithm

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201710143013.8A CN106971414B (en) 2017-03-10 2017-03-10 Three-dimensional animation generation method based on a deep recurrent neural network algorithm

Publications (2)

Publication Number Publication Date
CN106971414A CN106971414A (en) 2017-07-21
CN106971414B true CN106971414B (en) 2021-02-23

Family

ID=59328890

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710143013.8A Active CN106971414B (en) 2017-03-10 2017-03-10 Three-dimensional animation generation method based on a deep recurrent neural network algorithm

Country Status (1)

Country Link
CN (1) CN106971414B (en)

Families Citing this family (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108230431B (en) * 2018-01-24 2022-07-12 深圳市云之梦科技有限公司 Human body action animation generation method and system of two-dimensional virtual image
CN108769361B (en) * 2018-04-03 2020-10-27 华为技术有限公司 Control method of terminal wallpaper, terminal and computer-readable storage medium
CN108734182B (en) * 2018-06-13 2022-04-05 大连海事大学 Interest feature identification detection method based on small data sample learning
CN109602421A (en) * 2019-01-04 2019-04-12 平安科技(深圳)有限公司 Health monitor method, device and computer readable storage medium
CN109925715B (en) * 2019-01-29 2021-11-16 腾讯科技(深圳)有限公司 Virtual water area generation method and device and terminal
CN111028335B (en) * 2019-11-26 2021-10-29 浙江大学 Point cloud data block surface patch reconstruction method based on deep learning
CN112991498B (en) * 2019-12-13 2023-05-23 上海懿百教育科技有限公司 System and method for rapidly generating lens animation
CN111340917B (en) * 2020-02-11 2023-02-28 腾讯科技(深圳)有限公司 Three-dimensional animation generation method and device, storage medium and computer equipment
CN111696178A (en) * 2020-05-06 2020-09-22 广东康云科技有限公司 Method, device and medium for generating portrait three-dimensional model and simulated portrait animation
CN112214915B (en) * 2020-09-25 2024-03-12 汕头大学 Method for determining nonlinear stress-strain relation of material
CN112529268B (en) * 2020-11-28 2023-06-27 广西大学 Medium-short term load prediction method and device based on manifold learning
CN112668559B (en) * 2021-03-15 2021-06-18 冠传网络科技(南京)有限公司 Multi-mode information fusion short video emotion judgment device and method
CN113436299B (en) * 2021-07-26 2023-04-07 网易(杭州)网络有限公司 Animation generation method, animation generation device, storage medium and electronic equipment
CN117333592B (en) * 2023-12-01 2024-03-08 北京妙音数科股份有限公司 AI digital population type animation drawing system based on big data fusion training model


Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7372463B2 (en) * 2004-04-09 2008-05-13 Paul Vivek Anand Method and system for intelligent scalable animation with intelligent parallel processing engine and intelligent animation engine

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101216948A (en) * 2008-01-14 2008-07-09 浙江大学 Cartoon animation fabrication method based on video extracting and reusing
CN103218842A (en) * 2013-03-12 2013-07-24 西南交通大学 Voice synchronous-drive three-dimensional face mouth shape and face posture animation method
CN104541306A (en) * 2013-08-02 2015-04-22 奥克兰单一服务有限公司 System for neurobehavioural animation
CN106203624A (en) * 2016-06-23 2016-12-07 上海交通大学 Vector Quantization based on deep neural network and method
CN106447748A (en) * 2016-09-14 2017-02-22 厦门幻世网络科技有限公司 Method and device for generating animation data

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
An efficient animation generation technique; Lin Shuxin; Video Application and Engineering; 2011-12-31; vol. 35, no. 21; sections 1-5 *
A face animation system based on a single photograph; Du Zhijun et al.; Proceedings of the 4th National Conference on Digital Entertainment and Arts; 2010-09-29; full text *
Neural-network-based three-dimensional face modeling from a single photograph; Guo Yang; China Master's Theses Full-text Database, Information Science and Technology; 2011-09-15; pp. 1, 19-49 *

Also Published As

Publication number Publication date
CN106971414A (en) 2017-07-21

Similar Documents

Publication Publication Date Title
CN106971414B (en) Three-dimensional animation generation method based on a deep recurrent neural network algorithm
Yang et al. Weakly-supervised disentangling with recurrent transformations for 3d view synthesis
CN110728219B (en) 3D face generation method based on multi-column multi-scale graph convolution neural network
CN109086869B (en) Human body action prediction method based on attention mechanism
Xin et al. Arch: Adaptive recurrent-convolutional hybrid networks for long-term action recognition
Zhang et al. Facial expression retargeting from human to avatar made easy
CN112085836A (en) Three-dimensional face reconstruction method based on graph convolution neural network
CN111798369A (en) Face aging image synthesis method for generating confrontation network based on circulation condition
CN111028319B (en) Three-dimensional non-photorealistic expression generation method based on facial motion unit
CN113051420B (en) Robot vision man-machine interaction method and system based on text generation video
Li et al. Face sketch synthesis using regularized broad learning system
CN110310351A (en) A kind of 3 D human body skeleton cartoon automatic generation method based on sketch
CN111462274A (en) Human body image synthesis method and system based on SMPL model
CN115880724A (en) Light-weight three-dimensional hand posture estimation method based on RGB image
Xu Fast modelling algorithm for realistic three-dimensional human face for film and television animation
Pradhyumna A survey of modern deep learning based generative adversarial networks (gans)
Wu et al. MPCT: Multiscale Point Cloud Transformer with a Residual Network
CN117315069A (en) Human body posture migration method based on image feature alignment
CN116485962A (en) Animation generation method and system based on contrast learning
CN114155560A (en) Light weight method of high-resolution human body posture estimation model based on space dimension reduction
Kim et al. MHCanonNet: Multi-Hypothesis Canonical lifting Network for self-supervised 3D human pose estimation in the wild video
Qinran et al. Video‐Driven 2D Character Animation
Guo Design and Development of an Intelligent Rendering System for New Year's Paintings Color Based on B/S Architecture
Dhondse et al. Generative adversarial networks as an advancement in 2D to 3D reconstruction techniques
Jia et al. Facial expression synthesis based on motion patterns learned from face database

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
TA01 Transfer of patent application right

Effective date of registration: 20200403

Address after: 330000 Jiangxi city of Nanchang Province Economic and Technological Development Zone East Shuanggang Street No. 808

Applicant after: East China Jiaotong University

Address before: 330013 Jiangxi province Nanchang Shuanggang economic and Technological Development Zone No. 605 West Street, the new village of 6 row 3, 202

Applicant before: JIANGXI DUDAFEI TECHNOLOGY Co.,Ltd.

GR01 Patent grant