CN106971414B - Three-dimensional animation generation method based on a deep recurrent neural network algorithm - Google Patents

Three-dimensional animation generation method based on a deep recurrent neural network algorithm

Info

Publication number
CN106971414B
CN106971414B (application CN201710143013.8A; published as CN106971414A)
Authority
CN
China
Prior art keywords
dimensional animation
dimensional
neural network
animation
model
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201710143013.8A
Other languages
Chinese (zh)
Other versions
CN106971414A (en)
Inventor
罗国亮
项国雄
李玉华
易玉根
雷浩鹏
谢文强
姜永金
王金磊
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
East China Jiaotong University
Original Assignee
East China Jiaotong University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by East China Jiaotong University filed Critical East China Jiaotong University
Priority to CN201710143013.8A priority Critical patent/CN106971414B/en
Publication of CN106971414A publication Critical patent/CN106971414A/en
Application granted granted Critical
Publication of CN106971414B publication Critical patent/CN106971414B/en

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06T: IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T13/00: Animation
    • G06T13/20: 3D [Three Dimensional] animation
    • G06T2213/00: Indexing scheme for animation
    • G06T2213/12: Rule based animation

Abstract

A three-dimensional animation generation method based on a deep recurrent neural network algorithm. With reference to the attached Figure 1, which gives the technical route the method is intended to take, the method is driven by an existing three-dimensional animation data set: on the basis of extracting the dynamic features of three-dimensional animation, it fuses deep neural network model technology to generate three-dimensional animation and carries out the related theoretical and algorithmic design.

Description

Three-dimensional animation generation method based on a deep recurrent neural network algorithm
Technical Field
The invention relates to the field of computers, and in particular to a three-dimensional animation generation method based on a deep recurrent neural network algorithm.
Background
Three-dimensional Animation Generation (3D Animation Generation): given a three-dimensional model as input, the behaviors of a specified three-dimensional animation are automatically extracted and loaded onto the given model, driving it to produce a three-dimensional animation with similar behavior.
Artificial Neural Network (ANN): an artificial neural network model comprises an input layer, hidden layers, and an output layer. It simulates the working principle of the nervous system in the human brain and realizes information processing by adjusting the interconnections among the nodes of each layer. In theory, a neural network model can approximate any target function arbitrarily closely.
Deep Learning (DL): deep learning is a newer research field within machine learning. It extends the artificial neural network to a deep network structure comprising multiple hidden layers, further simulating the process of human cognition. In recent years, deep learning tools have been widely applied in fields such as computer vision, speech recognition, and natural language processing, and their ability to recognize information effectively has been thoroughly verified.
The invention is supported by the National Natural Science Foundation of China, project title: "Study of data-driven three-dimensional animation generation methods based on deep learning" (project no. 61602222).
Three-dimensional animation data has been widely used in digital entertainment, film and television, medicine, education, and other fields, because three-dimensional animation represents information accurately, realistically, and operably. For example, in film production, the 3D features "Monster Hunt" and "Monkey King: Hero Is Back", released in 2015, successively broke domestic box-office records, one widely felt impression left by three-dimensional animation. Driven by strong demand from many application fields, three-dimensional animation has gradually become an important research object in computer graphics. With the rapid development of motion capture and three-dimensional model scanning technologies, the ways of acquiring three-dimensional animation have diversified, methods for generating three-dimensional animation have been widely studied and designed, and generating three-dimensional animation is no longer an expensive technology. It is easy to predict that three-dimensional animation data will become another mainstream data carrier after text, speech, pictures, and video, and that research on three-dimensional animation generation methods will become one of the main research topics in computer graphics.
With the popularization of three-dimensional animation, convenient and accurate animation generation methods have become a research hotspot and focus of academic circles at home and abroad. At present there are professional modeling and animation tools such as 3ds Max, Maya, and Blender, as well as methods that scan a three-dimensional object frame by frame to obtain a three-dimensional animation. As motion capture and high-precision static model scanning technologies have matured and their costs have fallen, methods that fuse motion capture with a three-dimensional model, loading motion capture data onto a mesh model to generate three-dimensional animation, have been widely studied and designed in recent years. With the growing attention paid to generation methods, the amount of three-dimensional animation data available online has gradually increased, making data-driven three-dimensional animation generation feasible. Such methods extract features from training data, fuse a learning algorithm model, and refine the model parameters with the training data, so that the trained model directly generates new data sharing the features of the training data. Three-dimensional animation generation technologies are thus diversifying and data sets are growing, so efficient and convenient data-driven generation methods are an inevitable trend. Deep learning algorithms, as data-driven machine learning methods, have attracted great attention and wide application in recent years; in March 2016, Google's AlphaGo, developed on deep learning technology, won its "man vs. machine" match against the Korean Go master Lee Sedol with a decisive advantage. Deep learning algorithms likewise offer new opportunities and challenges for animation generation techniques. On one hand, given a neural network model trained on similar three-dimensional animation data and a mesh model as input, the trained network can generate a new three-dimensional animation at low time cost, with output that closely resembles the features of the training data. On the other hand, applying existing deep neural network models to three-dimensional animation generation still faces several problems: 1) descriptors of the dynamic features of three-dimensional animation are limited; 2) the computation of a neural network grows steeply with the scale of the input data, and three-dimensional animation data is usually too large to be fed to a neural network directly; 3) there is no unified method or criterion for evaluating the quality of generated three-dimensional animation. As the demand for three-dimensional animation from applied industries keeps growing, the requirements on the efficiency and quality of generation methods rise accordingly. The above problems greatly limit the design of three-dimensional animation generation methods and have become one of the major bottlenecks in the development of the related industries.
How to design a method that generates three-dimensional animation efficiently and conveniently while addressing the above problems is the main problem this research aims to solve.
Research on three-dimensional animation production and generation technology has also attracted wide attention from scholars at home and abroad, and generation technology is gradually developing toward diversity, convenience, and low cost. In general, the main categories of existing three-dimensional animation generation technologies, with their advantages and disadvantages, are as follows:
1. Frame-by-frame scanning, which relies on static scanning of a three-dimensional model: a three-dimensional animation is obtained by scanning a series of poses of a human motion to obtain a sequence of three-dimensional models. The industry's latest body-scanning devices and systems were presented at the international conference "3D Body Scanning Technologies" held in Lucerne, Switzerland, in October 2015, most of which can complete a static human-body scan within seconds. The advantage of this approach is that the data is real, but since scanning usually yields a point-cloud model, extensive post-processing is needed to obtain a mesh model and maintain a consistent mesh topology between frames;
2. Optimization-based methods, whose principle is to establish an optimization model so that the rigid motion and deformation of each patch of a given mesh model optimally match the corresponding patch in each frame of a given three-dimensional animation. The advantage is that the generated animation closely resembles the given one, but an optimization must be solved for every operation, resulting in a large amount of computation. To reduce this cost, one method based on space-time segmentation rigidly transforms and deforms each rigid space-time module (within which the mesh patches and vertices are relatively static);
3. Motion capture systems: three-dimensional animation can be generated directly by a motion capture system, whose principle is to capture the spatial trajectories of markers placed on a motion-demonstrating subject. Motion capture systems such as Vicon and Qualisys can quickly generate realistic three-dimensional animation with stable topology, but a cost running into the millions is prohibitive for most users;
4. Linear blending algorithms, whose core is to compute a linear mapping between each mesh patch (or vertex) and each bone (or feature point), so that a three-dimensional animation is generated by driving the mesh model with the bones (or feature points). Given a mesh model, one method maps the depth image acquired by a Kinect camera onto the three-dimensional mesh and gradually optimizes and updates the mapping from the real-time image data with an expectation-maximization (EM) algorithm, thereby driving the given mesh into a three-dimensional animation from Kinect video. Similarly, many existing methods map facial-expression feature points in video onto a parameterized mesh model controllable by those feature points, realizing video-driven three-dimensional animation. Research results in this direction make three-dimensional animation generation more convenient and promote many related applications, such as actors controlling virtual characters in 3D films.
With the development of three-dimensional animation production and generation technology, the growing three-dimensional animation databases provide ample data for research on data-driven methods. It has been proposed to generate a complete animation from one action to another by concatenating similar poses as transitions between different three-dimensional animation sub-sequences. A typical data-driven algorithm, however, is a learning algorithm that extracts features from training data. In recent years deep learning algorithms have been widely used by researchers for pattern recognition and applied to design a variety of data generation methods, including generation of picture descriptions, images, dialogue, and text. Some methods extract the content of an ordinary photo and the style of an artistic picture with a convolutional neural network and convert the photo into an artistically styled picture by linearly mixing content and style. Other methods use the long short-term memory (LSTM) model as the hidden unit of a recurrent neural network, avoiding exploding and vanishing gradients and realizing automatic generation of handwritten text. One approach generates pictures by analyzing how people observe picture features, combining an encoding recurrent neural network that compresses the picture with a decoding recurrent neural network that decodes the network output. Yet another models the behavior of motion capture data with a recurrent neural network and predicts the motion trend of each joint, thereby predicting and generating motion capture data. What these newly proposed methods have in common is that they design algorithms for generating time-series data such as text and dialogue on the basis of deep neural network models, which is highly instructive for the study of a three-dimensional animation generation algorithm based on a deep neural network model.
Surveying related research at home and abroad, we find that work based on deep learning algorithms mainly concentrates on the network structure design of deep neural network models and on applications including pattern recognition and data generation for images, text, and dialogue, while discussion of behavior analysis for three-dimensional animation and its application to data generation remains seriously insufficient. In this research on a three-dimensional animation generation method based on a deep recurrent neural network algorithm, an existing three-dimensional animation data set serves as the driver and deep learning algorithms serve as the basis of the data analysis and feature extraction methods; frontier work is carried out in the direction of data-driven three-dimensional animation generation, seeking breakthroughs on the relevant hot and difficult technical points.
Disclosure of Invention
The invention aims to provide a three-dimensional animation generation method based on a deep recurrent neural network algorithm, built on a local feature descriptor for mesh models and animations that is invariant to global translation and rotation.
The invention is realized as follows, referring to Figure 1, which gives the technical route the method is intended to take: driven by an existing three-dimensional animation data set, the method extracts the dynamic features of three-dimensional animation and, on that basis, fuses deep neural network model technology to generate three-dimensional animation, carrying out the related theoretical and algorithmic design. First, starting from basic theory, a three-dimensional animation dynamic descriptor is designed to quantify the dynamic behavior information of the animation's mesh patches, from which the similarity between three-dimensional animations is computed. A three-dimensional animation generation algorithm fusing a deep neural network model is then proposed, including an encoding method for three-dimensional animation, to satisfy the data-processing efficiency of the neural network model, and a decoding method for the network output, to finally generate the three-dimensional animation. From feature extraction to result output, from evaluating the generation effect to training the network model, these parts form a complete three-dimensional animation generation method based on a deep neural network model.
The technical effects of the invention are: 1. its main benefit is a data-driven three-dimensional animation generation method: based on real three-dimensional animation data and fused with deep neural network technology, the dynamic features of the training data are extracted and loaded onto a given mesh model to realize animation generation. Although training a deep neural network usually requires processing large training samples at considerable computational cost, once training is complete and all network parameters are obtained, a three-dimensional animation can be generated for a given mesh quickly and automatically through the forward computation of the network alone, so in the long run the data-driven method pays off clearly; 2. the strain-based dynamic descriptor for mesh patches is efficient, independent of the position and orientation of the three-dimensional model in the global coordinate system, and describes the deformation of the model accurately, making it suitable for application scenarios such as behavior-based correspondence computation between three-dimensional animations, similarity comparison, and model repair.
Drawings
Fig. 1 is a technical route diagram of the present invention.
FIG. 2 is a schematic diagram of the deformation and dynamic descriptor design for a three-dimensional animated mesh patch of the present invention.
FIG. 3 is a diagram of the basic architecture of the present invention.
FIG. 4 is a schematic diagram of a deep recurrent neural network model according to the present invention.
FIG. 5 is a schematic diagram of the long short-term memory (LSTM) model according to the present invention.
FIG. 6 is a schematic diagram of three-dimensional animation generation according to the present invention.
Detailed Description
The advantages of the present invention will be described in detail below with reference to the accompanying fig. 1-6 and examples, which are intended to help the reader to better understand the essence of the present invention, but are not intended to limit the scope of the invention.
As outlined above with reference to Figure 1, the method is driven by an existing three-dimensional animation data set: on the basis of extracting the dynamic features of three-dimensional animation, it fuses deep neural network model technology to generate three-dimensional animation. The dynamic descriptor quantifies the behavior of each mesh patch and yields a similarity measure between animations; the encoding method keeps the network's data processing efficient, and the decoding method turns the network output into the final animation. Together with the evaluation of the generation effect and the training of the network model, these form the complete generation method based on a deep neural network model.
Next, the details of the respective modules are described separately:
The three-dimensional animation dynamic descriptor.
For any three-dimensional animation, a neutral pose (Neutral Pose) is given as the reference pose for computing the deformation of each frame's mesh surface of the three-dimensional animation.
First, assume a three-dimensional animation comprises T frames, each frame a mesh of M patches and V vertices; that is, a three-dimensional animation is regarded as a deforming three-dimensional mesh with fixed topology. The v-th vertex in the t-th frame is then written $\mathbf{v}_v^t$ and the m-th patch in the t-th frame $f_m^t$, with $m = 1,\dots,M$, $v = 1,\dots,V$, $t = 1,\dots,T$; the neutral pose is treated as frame 0, i.e., $t = 0$.
Referring to FIG. 2, the top left shows a patch $f_m^0$ of the neutral-pose mesh and the top right the corresponding patch $f_m^t$ in the t-th frame after transformation, where the transformation comprises an affine transformation $F$ and a translation $d$:

$$\mathbf{v}_i^t = F\,\mathbf{v}_i + d, \quad i = 1, 2, 3, \qquad (1)$$

where $\mathbf{v}_i$ and $\mathbf{v}_i^t$ denote the corresponding vertices of the patch before and after the transformation. Notably, the translation $d$ is not a cause of patch deformation and must be removed from the expression. Researchers usually obtain a unit normal vector from the cross product of two edge vectors and take its end point as a fourth vertex of the patch, yielding a fourth equation of the same form as equation (1); subtracting the fourth equation from each of the first three gives

$$\tilde{V}^t = F\,\tilde{V}, \quad \tilde{V} = [\mathbf{v}_1 - \mathbf{v}_4 \;\; \mathbf{v}_2 - \mathbf{v}_4 \;\; \mathbf{v}_3 - \mathbf{v}_4], \quad \tilde{V}^t = [\mathbf{v}_1^t - \mathbf{v}_4^t \;\; \mathbf{v}_2^t - \mathbf{v}_4^t \;\; \mathbf{v}_3^t - \mathbf{v}_4^t],$$

after which the affine transformation $F$ is solved according to the deformation-gradient theory of continuum mechanics, giving the deformation of the patch.
Notably, the affine transformation $F \in \mathbb{R}^{3\times 3}$ above is a three-dimensional matrix; when the mesh model has many patches or the animation many frames, solving for it becomes a computational burden. To further optimize performance, it is first proposed to rigidly transform the patch $f_m^t$ into the plane of $f_m^0$, aligning one vertex and one edge, as shown in FIG. 2, thereby obtaining the patch $\tilde{f}_m^t$ and its three corresponding vertices $\tilde{\mathbf{v}}_1^t, \tilde{\mathbf{v}}_2^t, \tilde{\mathbf{v}}_3^t$. With the translation $d$ removed, equation (1) simplifies to

$$\tilde{\mathbf{v}}_i^t = F\,\mathbf{v}_i, \quad i = 1, 2, 3. \qquad (2)$$

Subtracting the first equation from the second and third gives

$$V^t = F\,V, \quad V = [\mathbf{v}_2 - \mathbf{v}_1 \;\; \mathbf{v}_3 - \mathbf{v}_1], \quad V^t = [\tilde{\mathbf{v}}_2^t - \tilde{\mathbf{v}}_1^t \;\; \tilde{\mathbf{v}}_3^t - \tilde{\mathbf{v}}_1^t].$$

According to the principles of continuum mechanics, the affine transformation $F$ contains the rotational strain $R$ and the stretch-shrink strain $U$ of the patch, i.e., $F = RU$; since the preprocessing shown in FIG. 2 has already removed the rotational strain $R$, we have $F = U$. Principal component analysis (PCA) of $F$ then yields two eigenvalues, $\lambda_1$ and $\lambda_2$, characterizing the stretching and shrinking of the patch respectively, with $\lambda_1 \ge 1$ and $0 < \lambda_2 \le 1$. It should be noted that although the newly proposed strain computation involves one more rigid-transformation step than the formulation of equation (1), it only has to solve a two-dimensional transformation matrix, and the total cost is usually smaller than a principal component analysis of a three-dimensional matrix; for extracting features from large-scale three-dimensional animation, the efficiency improvement of the proposed three-dimensional animation dynamic descriptor and its optimization is therefore significant.
Thus, the deformation of each mesh patch over the time and space domains of a three-dimensional animation is represented by the pair $(\lambda_1, \lambda_2)$, characterizing the stretching and shrinking of the patch respectively. Because the subsequent study of three-dimensional animation similarity takes mesh vertices as the basic unit, the patch deformation is further converted into a per-vertex deformation $(\lambda_1^v, \lambda_2^v)$: for each vertex, find all patches containing that vertex and average their $\lambda_1$ and $\lambda_2$ values.
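To make the descriptor concrete, the following Python sketch (a minimal illustration assuming NumPy, with triangles given as 3x3 vertex arrays; all function names are hypothetical) computes the pair $(\lambda_1, \lambda_2)$ for one patch along the lines described above. The rigid alignment of FIG. 2 is realized by expressing both triangles in a local 2D frame, and the eigenvalues of the stretch $U$ are obtained as the singular values of the in-plane $F$, which coincide with them once the rotation $R$ has been removed:

```python
import numpy as np

def patch_strain(neutral_tri, current_tri):
    """Deformation descriptor (lambda1, lambda2) of one triangle patch.

    neutral_tri, current_tri: (3, 3) arrays, one vertex per row.
    Follows the optimized 2D formulation: both triangles are expressed
    in a local in-plane frame (removing the translation d and the
    rotation R), then F = V_t V^{-1} is a 2x2 matrix whose singular
    values characterize stretching (lambda1) and shrinking (lambda2).
    """
    def to_local_2d(tri):
        # Edge vectors from the first vertex (this removes d).
        e1, e2 = tri[1] - tri[0], tri[2] - tri[0]
        # Orthonormal in-plane frame: aligning one vertex and one edge
        # plays the role of the rigid transform shown in FIG. 2.
        x = e1 / np.linalg.norm(e1)
        n = np.cross(e1, e2)
        y = np.cross(n / np.linalg.norm(n), x)
        # Columns are the two edges in local 2D coordinates.
        return np.column_stack([[e1 @ x, e1 @ y], [e2 @ x, e2 @ y]])

    V = to_local_2d(neutral_tri)    # neutral edges, V = [v2-v1  v3-v1]
    Vt = to_local_2d(current_tri)   # deformed edges in their own plane
    F = Vt @ np.linalg.inv(V)       # in-plane deformation gradient
    lam = np.linalg.svd(F, compute_uv=False)
    return lam[0], lam[1]           # lambda1 >= lambda2 > 0

def vertex_strain(lam_per_patch, patches_of_vertex):
    """Average (lambda1, lambda2) over all patches containing a vertex."""
    return np.mean([lam_per_patch[m] for m in patches_of_vertex], axis=0)
```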
A method for computing the similarity of three-dimensional animations.
The descriptor above provides a new way to compare three-dimensional animations and compute their similarity, which matters both for designing a reasonable neural network loss function and for evaluating the effect of the three-dimensional animation the invention outputs.
For two given three-dimensional animations $H_0$ and $H_1$, the complete vertex correspondence between the mesh models of their neutral poses (note: many research results in computer graphics can be used to precompute the complete correspondence between meshes) is expressed as $d_v \rightarrow C(d_v)$, where $d_v$ denotes the v-th vertex of $H_0$ and $C(d_v)$ the set of all vertices of $H_1$ corresponding to $d_v$. The difference between the v-th vertex of $H_0$ in the t-th frame and the corresponding vertices in $H_1$ is defined as

$$s_v^t = \frac{1}{k_v}\sum_{k=1}^{k_v}\bigl\|(\lambda_1,\lambda_2)_v^t - (\lambda_1,\lambda_2)_{c_k}^t\bigr\|, \qquad (3)$$

where $v = 1,\dots,V$, $t = 1,\dots,T$, $c_k$ denotes the k-th vertex of $C(d_v)$ corresponding to vertex $d_v$, $k = 1,\dots,k_v$, and $(\lambda_1,\lambda_2)$ is the proposed binary deformation descriptor of the v-th vertex in the t-th frame. The difference between the two three-dimensional animations $H_0$ and $H_1$ is then computed as

$$s(H_0,H_1) = \sum_{t=1}^{T}\sum_{v=1}^{V} s_v^t, \qquad (4)$$

i.e., the sum of descriptor differences between corresponding vertices over the whole time and space domain. This distance only accounts for the possibility that one vertex of $H_0$ corresponds to several vertices of $H_1$, while ignoring the opposite case, i.e., one vertex of $H_1$ corresponding to several vertices of $H_0$; the distance so defined is therefore asymmetric, i.e.,

$$s(H_0,H_1) \neq s(H_1,H_0).$$

To avoid this, the two directions can be computed separately and averaged, i.e.,

$$\mathrm{simi\_dynamic}(H_0,H_1) = \tfrac{1}{2}\bigl(s(H_0,H_1) + s(H_1,H_0)\bigr). \qquad (5)$$
Defined this way, the designed three-dimensional animation similarity satisfies non-negativity and symmetry. Because mesh patches reflect local deformation, observing all patches over the time-space domain captures the action information of the animation; that is, the similarity between three-dimensional animations reflects, to some extent, the similarity of their actions. The similarity computation is an open framework: other existing descriptors, such as the geodesic distance or the shape diameter function, can be introduced as needed and linearly combined by importance, defining a more complete similarity and laying the foundation for the loss-function design of the neural network in the sequel.
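As a rough illustration of equations (3)-(5), the sketch below (hypothetical helper names; it assumes both animations have the same frame count T and that the per-vertex descriptors and the correspondence map have already been computed) accumulates descriptor differences over the whole time-space domain and then symmetrizes the result:

```python
import numpy as np

def s_directed(desc0, desc1, corr):
    """Directed distance s(H0, H1) of equation (4).

    desc0, desc1: (T, V, 2) arrays of per-vertex (lambda1, lambda2)
    descriptors; corr[v] lists the vertex indices of H1 corresponding
    to vertex v of H0 (the precomputed map d_v -> C(d_v)).
    """
    total = 0.0
    for t in range(desc0.shape[0]):
        for v, matches in enumerate(corr):
            diffs = desc0[t, v] - desc1[t, matches]          # (k_v, 2)
            total += np.linalg.norm(diffs, axis=1).mean()    # eq. (3)
    return total

def simi_dynamic(desc0, desc1, corr01, corr10):
    """Symmetrized similarity of equation (5)."""
    return 0.5 * (s_directed(desc0, desc1, corr01)
                  + s_directed(desc1, desc0, corr10))
```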
Encoding and decoding of the three-dimensional animation.
As a machine learning tool, the deep neural network model requires large-scale three-dimensional animation input data to be simplified before serving as algorithm input. In natural language processing research, researchers have proposed word-vector methods, such as one-hot encoding, to encode words. Unlike a word, however, each input network node in the three-dimensional animation generation method designed here corresponds to one frame of the mesh model, whose data complexity and scale far exceed those of a single word; such high-dimensional data would severely increase the computational complexity of the algorithm, reduce its efficiency, and even cause small-sample and curse-of-dimensionality problems. The invention therefore proposes the following dimension-reduction-based encoding of three-dimensional animation suitable for a neural network.
A three-dimensional animation is regarded as a mesh model deforming in the time domain, where the t-th frame mesh can be expressed as the vector

$$h^t = [\,x_1^t\; y_1^t\; z_1^t\; \cdots\; x_V^t\; y_V^t\; z_V^t\,],$$

in which $(x_v^t, y_v^t, z_v^t)$ are the spatial coordinates of vertex $\mathbf{v}_v^t$, $v = 1,\dots,V$. A three-dimensional animation can thus be represented as a matrix $H$ of dimensions $T \times 3V$, i.e., containing $T$ samples of dimension $3V$, each sample being one frame of the three-dimensional model.
Many tools exist for reducing the dimension of high-dimensional samples; classical methods include principal component analysis (PCA), isometric feature mapping (ISOMAP), locally linear embedding (LLE), and Laplacian eigenmaps (LE). The PCA method is based on matrix decomposition, i.e., the original high-dimensional data is expressed as a linear combination of a low-dimensional representation of each sample and a set of basis vectors (eigenvectors); it computes quickly and makes data reconstruction convenient. Taking PCA as an example, the matrix $H$ of a three-dimensional animation can be decomposed as

$$H = x_{T\times d}\,B_{d\times 3V}, \qquad (6)$$

where $x$ is the low-dimensional representation of each frame mesh, $B$ is the basis-vector matrix, to be retained for data reconstruction, and $d$ is the reduced dimension. After dimension reduction the t-th frame mesh of the three-dimensional animation is thus represented as the vector $x^t = (x_1^t, \dots, x_d^t)$. The dimension $d$ is determined by the proportion of the principal components among all components (i.e., eigenvectors); for three-dimensional animation, $d \le 3$ generally suffices on the premise that this proportion is not less than 90%.

Obviously $d$ is much smaller than $3V$, so the neural network can comfortably accept the three-dimensional animation as input. The basis-vector matrix $B$ must be retained (see FIG. 1) and used to decode the network output, i.e.,

$$H_o = \hat{x}\,B_{d\times 3V}, \qquad (7)$$

where $\hat{x}$ denotes the network model output and $H_o$ the three-dimensional animation generated after decoding. Because encoding and decoding share the same $B_{d\times 3V}$, the input and output three-dimensional animation data remain consistent in the basis-vector space.
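A minimal sketch of this encode/decode step follows, assuming NumPy and obtaining the PCA basis via SVD; centering the frames on their mean is an implementation detail assumed here rather than stated in the text. Because the same basis B (and mean) is reused for decoding, the encoded input and decoded output stay consistent in the same basis-vector space, as required above:

```python
import numpy as np

def pca_encode(H, ratio=0.90):
    """Encode a T x 3V animation matrix H as in equation (6).

    Returns (x, B, mean): x is T x d, B is d x 3V, with d chosen so the
    retained principal-component proportion is at least `ratio`.
    """
    mean = H.mean(axis=0)
    U, S, Vt = np.linalg.svd(H - mean, full_matrices=False)
    energy = np.cumsum(S**2) / np.sum(S**2)
    d = int(np.searchsorted(energy, ratio)) + 1
    B = Vt[:d]                 # d x 3V basis-vector matrix
    x = (H - mean) @ B.T       # T x d low-dimensional frame codes
    return x, B, mean

def pca_decode(x_hat, B, mean):
    """Decode network output x_hat (T x d) back to meshes: H_o = x_hat B."""
    return x_hat @ B + mean
```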
A three-dimensional animation generation method based on a deep learning algorithm.
Fusing a deep learning algorithm, the loss function of the deep neural network model is designed on the basis of the above three-dimensional animation similarity computation, and the training method of the deep neural network model is discussed, thereby realizing a data-driven three-dimensional animation generation algorithm.
A traditional neural network is connected one way from the input layer to the hidden layer and on to the output layer, with no connections among the nodes within a layer. This limits its application to many problems; in particular it falls short on sequential problems (speech recognition, text generation, and the like), which often exhibit strong temporal dependence. The deep recurrent neural network (RNN) is different: it builds connections between hidden-layer nodes, realizing the network's memory of preceding information and establishing the dependence of a sequence's current output on the preceding outputs.
Referring to FIG. 3, the invention provides a three-dimensional animation generation method based on the RNN model. The model comprises an input layer, a hidden layer, and an output layer. First, the input three-dimensional animation is encoded at the input layer, and low-dimensional time-series features are extracted as the input to the corresponding network nodes; after processing by the hidden layer, the corresponding low-dimensional time-series output is obtained at the output nodes; the output layer then performs the decoding operation in combination with a mesh model, loading the dynamic information onto the given three-dimensional mesh and thus producing the three-dimensional animation. (Note: for readability, FIG. 3 does not show all connections between nodes; the detailed connection diagram of the RNN model is given in FIG. 4.)
Combining the characteristics of the RNN, the main features of the three-dimensional animation generation algorithm are as follows:
Implicit feature extraction from training data: as is well known, a multilayer feedforward neural network can in principle approximate a nonlinear function arbitrarily well. Besides the input and output layers, the model shown in FIG. 3 has a hidden layer comprising multiple hidden nodes that are correlated with one another and jointly affect the output nodes, giving the model a stronger capacity for extracting hidden features than a basic feedforward network.
Reduced computation through parameter sharing: for a neural network, the amount of computation (especially during training) grows steeply with the number of network nodes, and the RNN's network structure is clearly much more complex than a traditional one. One of the main features of the RNN, however, is parameter sharing: as shown in FIG. 3, the weight parameters on all edges of each time unit (each column) are identical to those of the corresponding nodes in the other time units. This greatly reduces the computation of network training without losing the generality of the network structure, and also makes it more feasible to improve learning accuracy by adding network layers.
Memory: a three-dimensional animation is a time series whose current data depends on the preceding information. A traditional neural network model can only model single-frame data independently and cannot propagate information along the time dimension. As the RNN model in FIG. 3 shows, each hidden unit receives not only the input node and hidden nodes of the current time unit (current column) but also the influence of the corresponding hidden nodes of the preceding units (and, in turn, influences the corresponding hidden nodes of subsequent ones). This dependency across time units can be understood as the memory of the hidden nodes.
The introduction of the RNN model makes a data-driven three-dimensional animation generation method possible and brings many advantages. For the designed RNN-based three-dimensional animation generation algorithm, each functional part is discussed as follows:
a) Network model structure of the RNN-based three-dimensional animation generation algorithm.

As shown in FIG. 1 and FIG. 3, the three-dimensional animation is encoded into low-dimensional data, which serves as the input of the RNN model; the data passes through the hidden-layer nodes, low-dimensional data of the same form is obtained at the output layer, and, fused with the topology of the mesh model onto which the animation is to be loaded, the three-dimensional animation is output. In FIG. 3, each input node corresponds to one frame mesh, i.e., a vector $x^t$, and the state of the hidden layer is determined jointly by the current input vector and the state of the corresponding hidden node at the previous moment (the previous frame), i.e., the historical state. The output layer has the same number of nodes as the input layer, namely T.
The detailed connections between the layers of the RNN model are shown in FIG. 4, taking $d = 2$ and a hidden layer of 3 hidden nodes as an example, so that each input-layer node receives 2 values; that is, the input of the t-th node is the vector $x^t = (x_1^t, x_2^t)$. The states of the hidden-layer nodes and their connections to the input nodes can be expressed as

$$h^t = f\bigl(W_{xh}\,x^t + W_{hh}\,h^{t-1} + b_h\bigr),$$

where $b$ denotes a correction constant (bias) and $f(\cdot)$ an activation function. Common activation functions for neural network models are the sigmoid and tanh functions, the latter generally converging better in RNN models.

Output layer state:

$$o^t = W_{ho}\,h^t + b_o.$$
for the RNN model architecture, the training process is actually a process of confirming the strength (or called weight) of the network connection. Generally, the RNN defines a loss function according to the result of an output layer, and solves an optimization problem by fitting training data, so that optimal network parameters are obtained through training.
b) The loss function.

The optimization goal of a neural network model is usually realized by defining a loss function and gradually adjusting the network parameters according to the error; commonly used loss functions include the mean squared error (MSE) and the softmax cost function. Since the three-dimensional animation similarity defined in equations (3)-(5) is closer in spirit to the MSE, the loss function is defined with reference to the MSE as follows.
The three-dimensional animation similarity defined in equation (5) is based on the new dynamic descriptor; since it considers only local strain relations and ignores the geometric adjacency between mesh vertices, it is difficult for it alone to guarantee a reasonable evaluation of the generated three-dimensional animation. Referring to the behavior-similarity definitions of equations (2)-(5), the similarity of the vertex trajectories of two three-dimensional animations can further be defined as the accumulated spatial Euclidean distance between all corresponding vertices:

$$s_{traj}(H_0,H_1) = \sum_{t=1}^{T}\sum_{v=1}^{V}\frac{1}{k_v}\sum_{k=1}^{k_v}\bigl\|\,p_v^t - p_{c_k}^t\,\bigr\|_2, \qquad (8)$$

where $p_v^t$ denotes the position of the v-th vertex in the t-th frame. Similarly, to guarantee the symmetry of the point-trajectory distance, the two directions are averaged:

$$\mathrm{simi\_traj}(H_0,H_1) = \tfrac{1}{2}\bigl(s_{traj}(H_0,H_1) + s_{traj}(H_1,H_0)\bigr). \qquad (9)$$

The target loss function of the three-dimensional animation generation model is thus defined as

$$L(x) = \alpha\cdot \mathrm{simi\_dynamic}(H_0,H_1) + (1-\alpha)\cdot \mathrm{simi\_traj}(H_0,H_1), \qquad (10)$$

where $\mathrm{simi\_dynamic}(H_0,H_1)$ reflects the similarity of dynamic behaviors between the two three-dimensional animations; $\mathrm{simi\_traj}(H_0,H_1)$ expresses the difference of corresponding vertex trajectories, and so reflects the geometric difference between the two animations more directly; and $\alpha \in [0,1]$ is a linear scale parameter controlling the degree to which the two factors influence the overall three-dimensional animation similarity.
It is worth noting that the loss function defined in equation (10) accounts for the behavior of the three-dimensional animation and for its geometric similarity (or difference) through the dynamic descriptors and the point trajectories respectively. In fact, other optional descriptors exist for describing the behavior and geometry of three-dimensional animation; for example, global descriptors such as shape histograms, spin images, and spherical harmonics can be used to compare mesh shapes. Remaining compatible with such descriptors and refining the loss function of equation (10) along the above lines can improve the effect of the algorithm to a certain extent, though the computation will of course grow accordingly. The loss function defined in equation (10) is thus an open framework in which optional three-dimensional model descriptors can be configured according to the performance requirements and the available computation.
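Putting equations (8)-(10) together, the combined loss can be sketched as follows (reusing the hypothetical simi_dynamic helper from the similarity sketch above; P0 and P1 are (T, V, 3) arrays of vertex positions):

```python
import numpy as np

def s_traj_directed(P0, P1, corr):
    """Directed trajectory distance of equation (8): accumulated
    Euclidean distance between corresponding vertex positions."""
    total = 0.0
    for t in range(P0.shape[0]):
        for v, matches in enumerate(corr):
            total += np.linalg.norm(P0[t, v] - P1[t, matches],
                                    axis=1).mean()
    return total

def animation_loss(P0, P1, desc0, desc1, corr01, corr10, alpha=0.5):
    """Combined loss of equation (10)."""
    traj = 0.5 * (s_traj_directed(P0, P1, corr01)
                  + s_traj_directed(P1, P0, corr10))     # eq. (9)
    dyn = simi_dynamic(desc0, desc1, corr01, corr10)     # eq. (5)
    return alpha * dyn + (1 - alpha) * traj              # eq. (10)
```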
c) An LSTM-based RNN model training method.

FIG. 4 describes the RNN model of FIG. 3 in more detail. As FIG. 4 shows, the RNN is very similar to a feed-forward neural network (FNN) model, so the model can be trained with reference to the conventional back-propagation (BP) algorithm. However, if the model in FIG. 4 is treated as an ordinary FNN, the long time dimension causes the training time of the BP algorithm to grow exponentially or even makes it unsolvable. Thanks to the parameter-sharing characteristic of the RNN structure, the connection weights are the same at every time step, so training the model can be completed in reasonable time.
Another problem may be encountered during training: after a batch of training samples, the error is computed and the weights are corrected by gradient descent, and the local gradient search space may contain huge, cliff-like gradients or plateaus with derivatives close to 0, the so-called exploding and vanishing gradients. In recent years, however, the RNN model fused with long short-term memory (LSTM), widely discussed by researchers, has proved very effective at avoiding this problem. As shown in FIG. 5, the LSTM model contains a memory cell for storing information. The memory cell is governed by three gate neurons, an input gate, a forget gate, and an output gate, which control the write, keep, and read operations respectively; all three gates are logistic units. The rest of the neural network may write content into the memory cell when the input gate is 1 and read content from it when the output gate is 1; when the forget gate is 1 the memory cell rewrites its own content, and when it is 0 its content is cleared. Under this control mechanism of the LSTM, the BP algorithm runs from the moment a value is loaded into the memory cell until that value is read out and cleared, and during this process the error derivative is held locally, so exploding or vanishing gradients need not be feared.
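For reference, one step of a standard LSTM cell with the three gates described above can be sketched as follows (per-gate parameter shapes are assumptions for illustration; this is the textbook formulation, not necessarily the exact variant of FIG. 5):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, W, U, b):
    """One LSTM step. W, U, b are dicts keyed 'i', 'f', 'o', 'g'
    holding per-gate input weights, recurrent weights, and biases."""
    i = sigmoid(W['i'] @ x_t + U['i'] @ h_prev + b['i'])  # write (input) gate
    f = sigmoid(W['f'] @ x_t + U['f'] @ h_prev + b['f'])  # keep/clear (forget) gate
    o = sigmoid(W['o'] @ x_t + U['o'] @ h_prev + b['o'])  # read (output) gate
    g = np.tanh(W['g'] @ x_t + U['g'] @ h_prev + b['g'])  # candidate content
    c = f * c_prev + i * g   # memory cell: kept content plus written content
    h = o * np.tanh(c)       # exposed (read) state
    return h, c
```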
Through the above scheme, the input three-dimensional animation is encoded so that deep neural network model processing can be applied, the three-dimensional animation similarity computation is integrated into the design of the model's loss function, and the network output is decoded, finally realizing a data-driven three-dimensional animation generation method.
The above-mentioned embodiments are merely illustrative of the preferred embodiments of the present invention, and do not limit the scope of the present invention, and various modifications and improvements of the technical solution of the present invention by those skilled in the art should fall within the protection scope defined by the claims of the present invention without departing from the spirit of the present invention.

Claims (1)

1. A three-dimensional animation generation method based on a deep recurrent neural network algorithm, characterized in that an existing three-dimensional animation data set serves as the driver; on the basis of extracting the dynamic features of three-dimensional animation, a three-dimensional animation is generated by fusing deep neural network model technology, and the related theoretical and algorithmic design is carried out;
firstly, a three-dimensional animation dynamic descriptor is designed to quantify the dynamic behavior information of the three-dimensional animation's mesh patches, from which the similarity between three-dimensional animations is computed; a three-dimensional animation generation algorithm fusing a deep neural network model is proposed, comprising the design of a three-dimensional animation encoding method, so as to satisfy the data-processing efficiency of the neural network model, and of a decoding method for the neural network model's output, so as to finally generate the three-dimensional animation; from feature extraction to result output, evaluation of the three-dimensional animation generation effect, and training of the neural network model, these form a complete three-dimensional animation generation method based on a deep neural network model;
wherein designing the three-dimensional animation dynamic descriptor comprises:
for any three-dimensional animation, a neutral pose is given as the reference pose for computing the deformation of each frame's mesh surface; starting from a single patch of the three-dimensional mesh model, a three-dimensional animation dynamic descriptor is designed that represents the strain of each patch of each frame's mesh model; specifically:
assuming that a three-dimensional animation comprises T frames, each frame being a mesh of M patches and V vertices, the v-th vertex in the t-th frame can be expressed as $\mathbf{v}_v^t$ and the m-th patch in the t-th frame as $f_m^t$, with $m = 1,\dots,M$, $v = 1,\dots,V$, $t = 1,\dots,T$; the neutral pose can be viewed as frame 0, i.e., $t = 0$, written $f_m^0$;

first, preprocessing removes from each patch $f_m^t$ the rotational strain $R$ relative to the neutral-pose patch $f_m^0$, yielding the patch $\tilde{f}_m^t$; the affine transformation $F$ is solved to obtain the deformation of the patch $\tilde{f}_m^t$; principal component analysis of the obtained $F$ yields the two eigenvalues $\lambda_1$ and $\lambda_2$ characterizing the stretching and shrinking of the patch; thus, the deformation of each mesh patch over the time and space domains of a three-dimensional animation can be represented by the pair $(\lambda_1, \lambda_2)$; wherein the preprocessing comprises rigidly transforming the patch $f_m^t$ into the plane of $f_m^0$, aligning one vertex and one edge, thereby obtaining the patch $\tilde{f}_m^t$ and its three corresponding vertices $\tilde{\mathbf{v}}_1^t, \tilde{\mathbf{v}}_2^t, \tilde{\mathbf{v}}_3^t$;
the designing of the three-dimensional animation encoding method, so as to satisfy the data-processing efficiency of the neural network model, and of the decoding method for the neural network model's output, so as to finally generate the three-dimensional animation, comprises: regarding the three-dimensional animation as a mesh model deforming in the time domain, wherein the t-th frame mesh can be expressed as the vector $h^t = [\,x_1^t\; y_1^t\; z_1^t\; \cdots\; x_V^t\; y_V^t\; z_V^t\,]$, in which $(x_v^t, y_v^t, z_v^t)$ are the spatial coordinates of vertex $\mathbf{v}_v^t$; a three-dimensional animation can thus be represented as a matrix $H$ of dimensions $T \times 3V$, i.e., containing $T$ samples of dimension $3V$, each sample being one frame of the three-dimensional model;

using a matrix decomposition method, the original high-dimensional data is expressed as a linear combination of a low-dimensional representation of each sample and a set of basis vectors, and the matrix $H$ of the three-dimensional animation can be decomposed as

$$H = x_{T \times d}\,B_{d \times 3V},$$

wherein $x$ is the low-dimensional representation of each frame mesh, $B$ is the basis-vector matrix, to be used for data reconstruction, and $d$ is the reduced dimension; the t-th frame mesh of the three-dimensional animation is thus represented after dimension reduction as the vector $x^t = (x_1^t, \dots, x_d^t)$, wherein $d$ is much smaller than $3V$, so that the neural network can fully accept the three-dimensional animation as input; $B$ is retained as the basis-vector matrix and used to decode the network output, i.e.,

$$H_o = \hat{x}\,B_{d \times 3V},$$

wherein $\hat{x}$ denotes the network model output and $H_o$ the three-dimensional animation generated after decoding.
CN201710143013.8A 2017-03-10 2017-03-10 Three-dimensional animation generation method based on a deep recurrent neural network algorithm Active CN106971414B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710143013.8A CN106971414B (en) 2017-03-10 2017-03-10 Three-dimensional animation generation method based on a deep recurrent neural network algorithm

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201710143013.8A CN106971414B (en) 2017-03-10 2017-03-10 Three-dimensional animation generation method based on a deep recurrent neural network algorithm

Publications (2)

Publication Number Publication Date
CN106971414A CN106971414A (en) 2017-07-21
CN106971414B true CN106971414B (en) 2021-02-23

Family

ID=59328890

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710143013.8A Active CN106971414B (en) 2017-03-10 2017-03-10 Three-dimensional animation generation method based on a deep recurrent neural network algorithm

Country Status (1)

Country Link
CN (1) CN106971414B (en)

Families Citing this family (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108230431B (en) * 2018-01-24 2022-07-12 深圳市云之梦科技有限公司 Human body action animation generation method and system of two-dimensional virtual image
CN108769361B (en) * 2018-04-03 2020-10-27 华为技术有限公司 Control method of terminal wallpaper, terminal and computer-readable storage medium
CN108734182B (en) * 2018-06-13 2022-04-05 大连海事大学 Interest feature identification detection method based on small data sample learning
CN109602421A (en) * 2019-01-04 2019-04-12 平安科技(深圳)有限公司 Health monitor method, device and computer readable storage medium
CN109925715B (en) * 2019-01-29 2021-11-16 腾讯科技(深圳)有限公司 Virtual water area generation method and device and terminal
CN111028335B (en) * 2019-11-26 2021-10-29 浙江大学 Point cloud data block surface patch reconstruction method based on deep learning
CN112991498B (en) * 2019-12-13 2023-05-23 上海懿百教育科技有限公司 System and method for rapidly generating lens animation
CN111340917B (en) * 2020-02-11 2023-02-28 腾讯科技(深圳)有限公司 Three-dimensional animation generation method and device, storage medium and computer equipment
CN111696178A (en) * 2020-05-06 2020-09-22 广东康云科技有限公司 Method, device and medium for generating portrait three-dimensional model and simulated portrait animation
CN112214915B (en) * 2020-09-25 2024-03-12 汕头大学 Method for determining nonlinear stress-strain relation of material
CN112529268B (en) * 2020-11-28 2023-06-27 广西大学 Medium-short term load prediction method and device based on manifold learning
CN112668559B (en) * 2021-03-15 2021-06-18 冠传网络科技(南京)有限公司 Multi-mode information fusion short video emotion judgment device and method
CN113436299B (en) * 2021-07-26 2023-04-07 网易(杭州)网络有限公司 Animation generation method, animation generation device, storage medium and electronic equipment
CN117333592B (en) * 2023-12-01 2024-03-08 北京妙音数科股份有限公司 AI digital population type animation drawing system based on big data fusion training model


Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7372463B2 (en) * 2004-04-09 2008-05-13 Paul Vivek Anand Method and system for intelligent scalable animation with intelligent parallel processing engine and intelligent animation engine

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101216948A (en) * 2008-01-14 2008-07-09 浙江大学 Cartoon animation fabrication method based on video extracting and reusing
CN103218842A (en) * 2013-03-12 2013-07-24 西南交通大学 Voice synchronous-drive three-dimensional face mouth shape and face posture animation method
CN104541306A (en) * 2013-08-02 2015-04-22 奥克兰单一服务有限公司 System for neurobehavioural animation
CN106203624A (en) * 2016-06-23 2016-12-07 上海交通大学 Vector Quantization based on deep neural network and method
CN106447748A (en) * 2016-09-14 2017-02-22 厦门幻世网络科技有限公司 Method and device for generating animation data

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
An efficient animation generation technique; Lin Shuxin; Video Application and Engineering; 2011-12-31; vol. 35, no. 21; sections 1-5 *
A face animation system based on a single photograph; Du Zhijun et al.; Proceedings of the 4th National Conference on Digital Entertainment and Arts; 2010-09-29; full text *
Neural-network-based three-dimensional face modeling from a single photograph; Guo Yang; China Master's Theses Full-text Database, Information Science and Technology; 2011-09-15; pp. 1, 19-49 *

Also Published As

Publication number Publication date
CN106971414A (en) 2017-07-21

Similar Documents

Publication Publication Date Title
CN106971414B (en) Three-dimensional animation generation method based on a deep recurrent neural network algorithm
Yang et al. Weakly-supervised disentangling with recurrent transformations for 3d view synthesis
CN110728219B (en) 3D face generation method based on multi-column multi-scale graph convolution neural network
CN109086869B (en) Human body action prediction method based on attention mechanism
Xin et al. Arch: Adaptive recurrent-convolutional hybrid networks for long-term action recognition
Zhang et al. Facial expression retargeting from human to avatar made easy
CN112085836A (en) Three-dimensional face reconstruction method based on graph convolution neural network
CN111798369A (en) Face aging image synthesis method for generating confrontation network based on circulation condition
CN111028319B (en) Three-dimensional non-photorealistic expression generation method based on facial motion unit
CN113051420B (en) Robot vision man-machine interaction method and system based on text generation video
Li et al. Face sketch synthesis using regularized broad learning system
CN110310351A (en) A kind of 3 D human body skeleton cartoon automatic generation method based on sketch
CN111462274A (en) Human body image synthesis method and system based on SMPL model
CN115880724A (en) Light-weight three-dimensional hand posture estimation method based on RGB image
Xu Fast modelling algorithm for realistic three-dimensional human face for film and television animation
Pradhyumna A survey of modern deep learning based generative adversarial networks (gans)
Wu et al. MPCT: Multiscale Point Cloud Transformer with a Residual Network
CN117315069A (en) Human body posture migration method based on image feature alignment
CN116485962A (en) Animation generation method and system based on contrast learning
CN114155560A (en) Light weight method of high-resolution human body posture estimation model based on space dimension reduction
Kim et al. MHCanonNet: Multi-Hypothesis Canonical lifting Network for self-supervised 3D human pose estimation in the wild video
Qinran et al. Video‐Driven 2D Character Animation
Guo Design and Development of an Intelligent Rendering System for New Year's Paintings Color Based on B/S Architecture
Dhondse et al. Generative adversarial networks as an advancement in 2D to 3D reconstruction techniques
Jia et al. Facial expression synthesis based on motion patterns learned from face database

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
TA01 Transfer of patent application right

Effective date of registration: 20200403

Address after: 330000 Jiangxi city of Nanchang Province Economic and Technological Development Zone East Shuanggang Street No. 808

Applicant after: East China Jiaotong University

Address before: 330013 Jiangxi province Nanchang Shuanggang economic and Technological Development Zone No. 605 West Street, the new village of 6 row 3, 202

Applicant before: JIANGXI DUDAFEI TECHNOLOGY Co.,Ltd.

GR01 Patent grant