CN117974852A - Intelligent binding method and device for facial images and computer equipment - Google Patents
Intelligent binding method and device for facial images and computer equipment
- Publication number
- CN117974852A CN117974852A CN202311543676.0A CN202311543676A CN117974852A CN 117974852 A CN117974852 A CN 117974852A CN 202311543676 A CN202311543676 A CN 202311543676A CN 117974852 A CN117974852 A CN 117974852A
- Authority
- CN
- China
- Prior art keywords
- facial
- image
- time sequence
- face
- binding
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T13/00—Animation
- G06T13/20—3D [Three Dimensional] animation
- G06T13/40—3D [Three Dimensional] animation of characters, e.g. humans, animals or virtual beings
-
- A—HUMAN NECESSITIES
- A63—SPORTS; GAMES; AMUSEMENTS
- A63F—CARD, BOARD, OR ROULETTE GAMES; INDOOR GAMES USING SMALL MOVING PLAYING BODIES; VIDEO GAMES; GAMES NOT OTHERWISE PROVIDED FOR
- A63F13/00—Video games, i.e. games using an electronically generated display having two or more dimensions
- A63F13/60—Generating or modifying game content before or while executing the game program, e.g. authoring tools specially adapted for game development or game-integrated level editor
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/0464—Convolutional networks [CNN, ConvNet]
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
-
- A—HUMAN NECESSITIES
- A63—SPORTS; GAMES; AMUSEMENTS
- A63F—CARD, BOARD, OR ROULETTE GAMES; INDOOR GAMES USING SMALL MOVING PLAYING BODIES; VIDEO GAMES; GAMES NOT OTHERWISE PROVIDED FOR
- A63F2300/00—Features of games using an electronically generated display having two or more dimensions, e.g. on a television screen, showing representations related to the game
- A63F2300/60—Methods for processing data by generating or executing the game program
- A63F2300/6009—Methods for processing data by generating or executing the game program for importing or creating game content, e.g. authoring tools during game development, adapting content to different platforms, use of a scripting language to create content
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2200/00—Indexing scheme for image data processing or generation, in general
- G06T2200/04—Indexing scheme for image data processing or generation, in general involving 3D image data
Abstract
The invention relates to the technical field of image processing, and in particular to an intelligent binding method and device for facial images and to computer equipment. The method comprises the following steps: acquiring a face video; obtaining a group of second facial images from a group of first facial images through a time sequence prediction network; performing feature extraction on the first facial image and the second facial image at each time sequence through a feature extraction network, correspondingly obtaining the binding facial features at each time sequence; performing facial image analysis on the binding facial features at each time sequence through a facial image restoration model to obtain a third facial image; and calculating the facial image binding parameters from the third facial image at each time sequence through a parameter calculation network to obtain the facial image binding parameters at each time sequence. The invention enables facial binding to balance time sequence fluency with spatial authenticity and improves the facial binding effect.
Description
Technical Field
The invention relates to the technical field of image processing, and in particular to an intelligent binding method and device for facial images and to computer equipment.
Background
Facial binding (rigging) is an important step in game and animation production: it gives a game or animation character facial expressions. For example, in the tactics mobile games currently on the market, character expressions are usually made by building many skeleton points in 3ds Max or Maya, skinning them, and then changing the appearance of the face through bone scaling, rotation and displacement.
However, when expressions are made in 3ds Max or Maya, the result depends on manual control and on the producer's subjective judgment during early design, so the actual effect of facial expression binding is difficult to grasp accurately, which directly reduces its authenticity. It is also difficult to adjust continuous expressions in a game or animation accurately at a later stage so that they have good temporal continuity; as a result, the facial expression binding in game animation appears jerky and the effect is poor. If each expression were instead adjusted into place for temporal continuity, the face would have to be adjusted point by point, frame by frame, which wastes a great deal of time and still yields a poor effect.
Therefore, in the prior art, the producer's subjective judgment reduces the authenticity of facial expression binding, and continuous expressions in a game or animation are difficult to adjust accurately, so facial expression binding in game animation appears jerky.
Disclosure of Invention
The invention aims to provide an intelligent binding method for facial images, to solve the technical problems in the prior art that the authenticity of facial expression binding is reduced and that facial expression binding in game animation appears jerky.
In order to solve the technical problems, the invention specifically provides the following technical scheme:
an intelligent binding method for facial images comprises the following steps:
acquiring a face video, wherein the face video comprises a group of first face images connected according to time sequence;
Obtaining a group of second facial images through a time sequence prediction network according to the group of first facial images, wherein each second facial image is located at the time sequence of a corresponding first facial image, the second facial images are the time sequence prediction results of the group of first facial images, and the time sequence prediction network is a neural network;
Performing feature extraction on the first face image and the second face image at each time sequence through a feature extraction network, and correspondingly obtaining binding face features at each time sequence, wherein the feature extraction network is a neural network;
performing facial image analysis on the binding facial features at each time sequence through a facial image restoration model to obtain a third facial image, wherein the facial image restoration model is a neural network;
Calculating facial image binding parameters from the third facial image at each time sequence through a parameter calculation network to obtain the facial image binding parameters at each time sequence, wherein the parameter calculation network is a neural network, the facial image binding parameters correspond to the Blend Shape binding parameters applied to a FLAME face model, and the FLAME face model automatically binds the face model to the facial image according to the Blend Shape binding parameters.
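As an illustrative aid only (not part of the claim), the five claimed steps can be sketched as a data-flow pipeline. Every function below is a hypothetical placeholder standing in for a trained neural network; the names, sizes and stand-in computations are assumptions made purely for the sketch:

```python
import numpy as np

# Hypothetical placeholder networks: each stands in for one trained network
# of the method (time sequence prediction, feature extraction, image
# restoration, parameter calculation). Names are illustrative only.
def predict_second_images(first_images):
    # G2_1 = G1_1, G2_2 = G1_2; later frames predicted from all earlier ones.
    # A running mean stands in for the LSTM here.
    second = [first_images[0], first_images[1]]
    for t in range(2, len(first_images)):
        second.append(np.mean(first_images[:t], axis=0))
    return second

def extract_binding_features(g1, g2):
    # Stand-in for the twin-CNN feature extraction network.
    return (g1.mean(), g2.mean())

def restore_third_image(features, shape):
    # Stand-in for the SRCNN-based facial image restoration model.
    return np.full(shape, np.mean(features))

def calculate_binding_params(g3, n_params=5):
    # Stand-in for the network mapping an image to Blend Shape parameters.
    return np.full(n_params, g3.mean())

# One first facial image per time sequence (tiny 4x4 grayscale frames).
rng = np.random.default_rng(0)
first_images = [rng.random((4, 4)) for _ in range(6)]

second_images = predict_second_images(first_images)
binding_params = []
for g1, g2 in zip(first_images, second_images):
    feats = extract_binding_features(g1, g2)
    g3 = restore_third_image(feats, g1.shape)
    binding_params.append(calculate_binding_params(g3))

print(len(binding_params), binding_params[0].shape)
```

The result is one binding-parameter vector per time sequence, which is the data the FLAME face model consumes in the final binding step.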
As a preferred embodiment of the present invention, obtaining a set of second face images through a time-series prediction network according to a set of first face images includes:
acquiring a time sequence of each first face image, and sequentially taking the time sequence as a prediction time sequence;
Performing deep learning with an LSTM network on each first facial image located at the time sequences preceding the prediction time sequence, to obtain a time sequence prediction model for the second facial image at the prediction time sequence;
And predicting the second face image at each prediction time sequence by using a time sequence prediction model at each prediction time sequence to obtain a group of second face images.
As a preferred embodiment of the present invention, the expression of the time sequence prediction network is:
G2_t = LSTM({G1_1, G1_2, G1_3, …, G1_{t-1}});
t ∈ [3, N];
G2_1 = G1_1, G2_2 = G1_2;
Wherein G2_t is the second facial image at the t-th time sequence; G1_1, G1_2, G1_3, …, G1_{t-1} are the first facial images at the 1st, 2nd, 3rd, …, (t-1)-th time sequences respectively; N is the total number of time sequences in the face video; G2_1 is the second facial image at the 1st time sequence; G2_2 is the second facial image at the 2nd time sequence; and t is the counting variable.
As a preferred solution of the present invention, feature extraction is performed on a first face image and a second face image at each time sequence through a feature extraction network, so as to obtain binding face features at each time sequence, including:
The first facial image and the second facial image at each time sequence are input simultaneously into the feature extraction network, and the feature extraction network outputs the binding facial features at each time sequence.
As a preferred embodiment of the present invention, the construction of the feature extraction network includes:
Randomly selecting a first face image and a second face image at a plurality of time sequences to serve as a first sample image and a second sample image respectively;
Inputting the first sample image at each timing to a first CNN neural network, outputting facial features of the first sample image by the first CNN neural network;
Inputting the second sample image at each time sequence to a second CNN neural network, outputting facial features of the second sample image by the second CNN neural network;
taking the difference between the facial features of the first sample image and the facial features of the second sample image as a loss function;
Based on the minimization of the loss function, learning and training the first CNN neural network and the second CNN neural network to obtain the feature extraction network;
The expression of the feature extraction network is as follows:
H1_t = CNN1(G1_t);
H2_t = CNN2(G2_t);
H3_t = H1_t or H2_t;
Loss = MSE(H1_t, H2_t);
Where H3_t is the binding facial feature at the t-th time sequence; H1_t is the facial feature of the first sample image at the t-th time sequence; H2_t is the facial feature of the second sample image at the t-th time sequence; G1_t is the first sample image at the t-th time sequence; G2_t is the second sample image at the t-th time sequence; CNN1 is the first CNN neural network; CNN2 is the second CNN neural network; Loss is the loss function value; MSE(H1_t, H2_t) is the mean square error between H1_t and H2_t; t is the counting variable; and "or" is the mathematical identifier meaning that either may be taken.
As a preferred scheme of the invention, performing facial image analysis on the binding facial features at each time sequence through the facial image restoration model to obtain the third facial image comprises the following steps:
The bound facial features at each timing are input to a facial image restoration model, and a third facial image at each timing is output by the facial image restoration model.
As a preferred embodiment of the present invention, the construction of the face image restoration model includes:
Randomly selecting a first face image and a second face image at a plurality of time sequences to serve as a first sample image and a second sample image respectively;
Acquiring binding facial features corresponding to the first sample image and the second sample image at each time sequence by using a feature extraction network, and taking the binding facial features as binding sample facial features at each time sequence;
Carrying out mean value fusion on the first sample image and the second sample image at each time sequence to obtain a third sample image;
taking the binding sample facial features at each time sequence as the input items of the SRCNN neural network, and taking the third sample image at each time sequence as the output items of the SRCNN neural network;
Training the SRCNN neural network on these input and output items to obtain the facial image restoration model;
The expression of the facial image restoration model is:
G3_t = SRCNN(H3_t);
Where G3_t is the third sample image at the t-th time sequence; H3_t is the binding sample facial feature at the t-th time sequence, with H3_t = H1_t or H2_t; SRCNN is the SRCNN neural network; H1_t is the facial feature of the first sample image at the t-th time sequence; H2_t is the facial feature of the second sample image at the t-th time sequence; and t is the counting variable.
As a preferred scheme of the invention, calculating the facial image binding parameters from the third facial image at each time sequence through the parameter calculation network to obtain the facial image binding parameters at each time sequence comprises the following steps:
Inputting the third facial image at each time sequence into the parameter calculation network, and outputting the facial image binding parameters at each time sequence by the parameter calculation network;
The construction of the parameter calculation network comprises the following steps:
marking the facial image binding parameters of the first sample image to obtain the facial image binding parameters of the first sample image;
taking the first sample image as an input item of the BP neural network, and taking the facial image binding parameter of the first sample image as an output item of the BP neural network;
Training the BP neural network on these input and output items to obtain the parameter calculation network;
The expression of the parameter calculation network is:
S_t = BP(G3_t);
Where S_t is the facial image binding parameter at the t-th time sequence; G3_t is the third facial image at the t-th time sequence; BP is the BP neural network; and t is the counting variable.
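As an illustrative sketch of the expression S_t = BP(G3_t): the BP neural network is a plain feed-forward regressor from the third facial image to the binding parameters. The layer sizes and the untrained random weights below are assumptions made for the sketch; in practice the weights come from the training described above:

```python
import numpy as np

rng = np.random.default_rng(42)

# Illustrative sizes: a flattened 8x8 third facial image in, 10 Blend Shape
# binding parameters out, one hidden layer. Real weights would be trained.
n_in, n_hidden, n_out = 64, 32, 10
W1 = rng.normal(0, 0.1, (n_hidden, n_in))
b1 = np.zeros(n_hidden)
W2 = rng.normal(0, 0.1, (n_out, n_hidden))
b2 = np.zeros(n_out)

def bp_forward(g3_t):
    """S_t = BP(G3_t): map the third facial image at time t to parameters."""
    x = g3_t.reshape(-1)              # flatten the image
    h = np.tanh(W1 @ x + b1)          # hidden layer with tanh activation
    return W2 @ h + b2                # linear output: binding parameters S_t

g3_t = rng.random((8, 8))             # a third facial image at one time sequence
s_t = bp_forward(g3_t)
print(s_t.shape)
```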
As a preferred embodiment of the present invention, the present invention provides an intelligent binding apparatus for facial images, including:
The data acquisition module is used for acquiring a face video, wherein the face video comprises a group of first face images connected according to time sequence;
The data processing module is used for obtaining a group of second face images through a time sequence prediction network according to a group of first face images;
The data processing module is further used for performing feature extraction on the first facial image and the second facial image at each time sequence through a feature extraction network, correspondingly obtaining the binding facial features at each time sequence, wherein the feature extraction network is a neural network;
The data processing module is further used for performing facial image analysis on the binding facial features at each time sequence through a facial image restoration model to obtain a third facial image, wherein the facial image restoration model is a neural network; and
The data processing module is further used for calculating the facial image binding parameters from the third facial image at each time sequence through a parameter calculation network to obtain the facial image binding parameters at each time sequence, wherein the parameter calculation network is a neural network, the facial image binding parameters correspond to the Blend Shape binding parameters applied to a FLAME face model, and the FLAME face model automatically binds the face model to the facial image according to the Blend Shape binding parameters;
The data storage module is used for storing the time sequence prediction network, the feature extraction network, the facial image restoration model and the parameter calculation network.
As a preferred aspect of the present invention, there is provided computer equipment comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor;
wherein the memory stores instructions executable by the at least one processor to cause the computer equipment to perform the intelligent binding method for facial images described above.
Compared with the prior art, the invention has the following beneficial effects:
By performing time sequence prediction on the facial images, the invention obtains predicted facial images that embody the time sequence law of facial expressions; it then performs feature extraction on the predicted facial images together with the real facial images and calculates the binding parameters to obtain the facial image binding parameters, so that facial binding balances time sequence fluency with spatial authenticity, and the facial binding effect is improved.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings needed in the description of the embodiments or the prior art are briefly introduced below. It will be apparent to those of ordinary skill in the art that the following drawings are exemplary only, and that other implementations can be obtained from the drawings provided without inventive effort.
FIG. 1 is a flowchart of a facial image intelligent binding method provided by an embodiment of the invention;
FIG. 2 is a block diagram of a facial image intelligent binding device according to an embodiment of the present invention;
Fig. 3 is an internal structure diagram of a computer device according to an embodiment of the present invention.
Detailed Description
The following description of the embodiments of the present invention will be made clearly and completely with reference to the accompanying drawings, in which it is apparent that the embodiments described are only some embodiments of the present invention, but not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
As shown in fig. 1, the present invention provides an intelligent binding method for facial images, comprising the following steps:
Acquiring a face video, wherein the face video comprises a group of first face images connected according to time sequence;
Obtaining a group of second facial images through a time sequence prediction network according to the group of first facial images, wherein each second facial image is located at the time sequence of a corresponding first facial image, the second facial images are the time sequence prediction results of the group of first facial images, and the time sequence prediction network is a neural network;
performing feature extraction on the first facial image and the second facial image at each time sequence through a feature extraction network, and correspondingly obtaining binding facial features (such as facial expression features, facial posture features and other facial features) at each time sequence, wherein the feature extraction network is a neural network;
Performing facial image analysis on the binding facial features at each time sequence through a facial image restoration model to obtain a third facial image, wherein the facial image restoration model is a neural network;
And calculating the facial image binding parameters from the third facial image at each time sequence through a parameter calculation network to obtain the facial image binding parameters at each time sequence, wherein the parameter calculation network is a neural network, the facial image binding parameters correspond to the Blend Shape binding parameters applied to a FLAME face model, and the FLAME face model automatically binds the face model to the facial image according to the Blend Shape binding parameters.
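The final binding step drives a FLAME-style face model with Blend Shape binding parameters. Below is a minimal sketch of linear blend-shape application, assuming illustrative mesh and basis sizes (a real FLAME model has about 5023 vertices and separate shape, pose and expression components):

```python
import numpy as np

# Sketch of how Blend Shape binding parameters drive a FLAME-style face
# model: the deformed mesh is the template plus a weighted sum of blend
# shape displacement bases. All sizes and data here are illustrative.
rng = np.random.default_rng(1)
n_vertices, n_blendshapes = 100, 10

template = rng.random((n_vertices, 3))                    # neutral face vertices
blendshapes = rng.normal(0, 0.01, (n_blendshapes, n_vertices, 3))

def apply_blendshapes(params):
    """Bind one time sequence's parameters S_t to the face model."""
    offset = np.tensordot(params, blendshapes, axes=1)    # sum_i w_i * B_i
    return template + offset

s_t = np.zeros(n_blendshapes)
assert np.allclose(apply_blendshapes(s_t), template)      # zero weights = neutral face

s_t[0] = 1.0                                              # activate one blend shape
deformed = apply_blendshapes(s_t)
print(deformed.shape)
```

Applying the per-time-sequence parameter vectors in order animates the mesh, which is why temporally smooth parameters yield smooth dynamic expressions.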
To realize facial binding, the invention comprehensively captures the facial expression and posture features of the face from the face video, and then binds the face model using the expression and posture features captured from the video, so that the facial image binding is more detailed and the display effect on the face model is better.
By using a face video for facial binding, the facial features required for binding can be captured more fully, and binding is performed over the time sequences, so the facial binding of the face model changes from static binding to dynamic binding: the face model acquires dynamic facial expressions and postures, which suits the characters of animations and games and meets their demand for dynamic expressions. The dynamic facial binding is obtained in one pass, with no need to build dynamic expressions and postures from static ones, which greatly improves the efficiency of facial binding for such characters.
Furthermore, to give the dynamic expressions and postures of the face model a better binding effect, the invention on the one hand analyzes the face video, uses an LSTM neural network to learn the law of change of facial expression and posture, and predicts the facial expression and posture at each single time sequence based on the learned law. Because these predictions conform to the law of change, performing facial binding with the predicted images gives the bound expressions and postures temporal continuity across the time sequences; that is, the dynamic expressions and postures on the face model are smoother.
On the other hand, although the predictions of facial expression and posture at each single time sequence are derived from the real expressions and postures in the face video, they are virtual in nature. If facial binding were performed only on the predicted images, the authenticity of the dynamic expressions and postures on the face model would be reduced. To preserve that authenticity, facial binding is also based on the facial images in the face video that carry the real expressions and postures, so that the bound expressions and postures at each time sequence remain authentic; that is, the dynamic expressions and postures on the face model are more real.
By performing facial binding jointly on the facial images with real expressions and postures and the predicted images with virtual expressions and postures, the invention preserves the spatial authenticity of the dynamic expressions and postures on the face model while keeping their temporal continuity, so that facial binding is both authentic and fluent and the binding effect is better.
According to the invention, the face video is analyzed, the LSTM neural network is used to learn the law of change of facial expression and posture, and the facial expression and posture at each single time sequence are predicted based on the learned law, specifically as follows:
obtaining a group of second face images through a time sequence prediction network according to the group of first face images, wherein the method comprises the following steps:
acquiring a time sequence of each first face image, and sequentially taking the time sequence as a prediction time sequence;
Performing deep learning with an LSTM network on each first facial image located at the time sequences preceding the prediction time sequence, to obtain a time sequence prediction model for the second facial image at the prediction time sequence;
And predicting the second face image at each prediction time sequence by using a time sequence prediction model at each prediction time sequence to obtain a group of second face images.
The expression of the timing prediction network is:
G2_t = LSTM({G1_1, G1_2, G1_3, …, G1_{t-1}});
t ∈ [3, N];
G2_1 = G1_1, G2_2 = G1_2;
Wherein G2_t is the second facial image at the t-th time sequence; G1_1, G1_2, G1_3, …, G1_{t-1} are the first facial images at the 1st, 2nd, 3rd, …, (t-1)-th time sequences respectively; N is the total number of time sequences in the face video; G2_1 is the second facial image at the 1st time sequence; G2_2 is the second facial image at the 2nd time sequence; and t is the counting variable.
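The recurrence above can be sketched as a prediction loop. The LSTM itself is stood in for by a placeholder (an exponentially weighted mean of the history, an assumption made only so the loop runs); the point of the sketch is the indexing: G2_1 = G1_1, G2_2 = G1_2, and each later G2_t is predicted from all earlier first facial images:

```python
import numpy as np

def lstm_placeholder(history):
    # Stand-in for a trained LSTM: a real implementation would consume the
    # frame history sequentially through its gates. An exponentially
    # weighted mean (recent frames weigh more) is used here for illustration.
    weights = np.array([0.5 ** (len(history) - i) for i in range(len(history))])
    weights /= weights.sum()
    return np.tensordot(weights, np.stack(history), axes=1)

def predict_sequence(first_images):
    """G2_1 = G1_1, G2_2 = G1_2, G2_t = LSTM({G1_1, ..., G1_{t-1}}) for t in [3, N]."""
    n = len(first_images)
    second = [first_images[0], first_images[1]]   # the first two frames pass through
    for t in range(3, n + 1):                     # t runs over [3, N]
        second.append(lstm_placeholder(first_images[:t - 1]))
    return second

rng = np.random.default_rng(7)
g1 = [rng.random((4, 4)) for _ in range(5)]       # N = 5 first facial images
g2 = predict_sequence(g1)
print(len(g2))
```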
The invention performs facial binding jointly on the facial images with real expressions and postures and the predicted images with virtual expressions and postures, preserving the spatial authenticity of the dynamic expressions and postures on the face model while keeping their temporal continuity, specifically as follows:
Performing feature extraction on the first face image and the second face image at each time sequence through a feature extraction network to correspondingly obtain binding face features at each time sequence, including:
The first facial image and the second facial image at each time sequence are input simultaneously into the feature extraction network, and the feature extraction network outputs the binding facial features at each time sequence.
The construction of the feature extraction network comprises the following steps:
Randomly selecting a first face image and a second face image at a plurality of time sequences to serve as a first sample image and a second sample image respectively;
Inputting the first sample image at each timing to a first CNN neural network, outputting facial features of the first sample image by the first CNN neural network;
Inputting the second sample image at each time sequence to a second CNN neural network, outputting facial features of the second sample image by the second CNN neural network;
taking the difference between the facial features of the first sample image and the facial features of the second sample image as a loss function;
based on the minimization of the loss function, learning and training the first CNN neural network and the second CNN neural network to obtain a feature extraction network;
the expression of the feature extraction network is:
H1_t = CNN1(G1_t);
H2_t = CNN2(G2_t);
H3_t = H1_t or H2_t;
Loss = MSE(H1_t, H2_t);
Where H3_t is the binding facial feature at the t-th time sequence; H1_t is the facial feature of the first sample image at the t-th time sequence; H2_t is the facial feature of the second sample image at the t-th time sequence; G1_t is the first sample image at the t-th time sequence; G2_t is the second sample image at the t-th time sequence; CNN1 is the first CNN neural network; CNN2 is the second CNN neural network; Loss is the loss function value; MSE(H1_t, H2_t) is the mean square error between H1_t and H2_t; t is the counting variable; and "or" is the mathematical identifier meaning that either may be taken.
When facial features (facial expression features, facial posture features and the like) are extracted, they are extracted through the constructed feature extraction network. The network structure comprises two CNN neural networks, which respectively extract facial features from the facial images with real expressions and postures (the first facial images) and from the predicted images with virtual expressions and postures (the second facial images). During training, the difference between the outputs of the two CNN neural networks is used as the loss function, so that the facial features extracted from the first facial image and from the second facial image at the same time sequence become as similar as possible. The facial features output by the feature extraction network therefore carry the characteristics of both the first and the second facial images; that is, they possess both spatial authenticity and time sequence fluency.
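A minimal numpy sketch of this training signal, with the two CNN branches stood in for by linear maps (an assumption made purely for illustration): the loss is the mean square error between the two branch outputs, and after minimization either output can serve as the binding feature:

```python
import numpy as np

# The two CNN branches are stood in for by simple linear maps; the point
# is the training signal: Loss = MSE(H1_t, H2_t), the mean square error
# between the features the two branches extract from a paired sample.
rng = np.random.default_rng(3)
W_cnn1 = rng.normal(size=(8, 16))   # stand-in for the first CNN branch
W_cnn2 = rng.normal(size=(8, 16))   # stand-in for the second CNN branch

def mse(h1, h2):
    return np.mean((h1 - h2) ** 2)

g1_t = rng.random(16)               # first sample image (flattened)
g2_t = rng.random(16)               # second sample image (flattened)
h1_t = W_cnn1 @ g1_t                # H1_t = CNN1(G1_t)
h2_t = W_cnn2 @ g2_t                # H2_t = CNN2(G2_t)
loss = mse(h1_t, h2_t)

# Minimizing this loss pulls the two feature vectors together, so after
# training either H1_t or H2_t can serve as the binding feature H3_t.
h3_t = h1_t                         # "H3_t = H1_t or H2_t"
print(loss >= 0.0, h3_t.shape)
```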
The facial image (the third facial image) restored from the facial features output by the feature extraction network therefore also has time sequence fluency and spatial authenticity, so the facial image binding parameters obtained from the third facial image give the facial expressions and postures of the bound face model both time sequence fluency and spatial authenticity.
Carrying out facial image analysis on the binding facial features at each time sequence position through a facial image restoration model to obtain a third facial image, wherein the facial image analysis comprises the following steps:
The bound facial features at each timing are input to a facial image restoration model, and a third facial image at each timing is output by the facial image restoration model.
The construction of the face image restoration model comprises the following steps:
Randomly selecting a first face image and a second face image at a plurality of time sequences to serve as a first sample image and a second sample image respectively;
Acquiring binding facial features corresponding to the first sample image and the second sample image at each time sequence by using a feature extraction network, and taking the binding facial features as binding sample facial features at each time sequence;
Carrying out mean value fusion on the first sample image and the second sample image at each time sequence to obtain a third sample image;
taking the binding sample facial features at each timing as input items of the SRCNN neural network, and taking the third sample image at each timing as output items of the SRCNN neural network;
Learning and training an input item of the SRCNN neural network and an output item of the SRCNN neural network by utilizing the SRCNN neural network to obtain a facial image restoration model;
The expression of the face image restoration model is:
G3_t = SRCNN(H3_t);
where G3_t is the third sample image at the t-th timing, H3_t is the binding sample facial feature at the t-th timing, SRCNN is the SRCNN neural network, H3_t = H1_t or H2_t, H1_t is the facial feature of the first sample image at the t-th timing, H2_t is the facial feature of the second sample image at the t-th timing, and t is the count variable.
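The mean-value fusion that builds the restoration model's training target can be illustrated as follows; the image sizes and pixel values are invented for the example, and the SRCNN itself is not reproduced here:

```python
# Sketch of the training-target construction: the third sample image
# G3_t is the pixel-wise mean of the first and second sample images
# at the same timing.

def mean_fuse(g1, g2):
    """Pixel-wise mean of two equally sized grayscale images,
    each given as a list of rows."""
    return [[(a + b) / 2 for a, b in zip(r1, r2)]
            for r1, r2 in zip(g1, g2)]

g1_t = [[0, 100], [200, 50]]    # first sample image at timing t
g2_t = [[100, 100], [0, 150]]   # second sample image at timing t
g3_t = mean_fuse(g1_t, g2_t)    # training target for SRCNN(H3_t)
```

Each fused image G3_t then serves as the output item paired with the binding sample facial feature H3_t during SRCNN training.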
Performing facial image binding parameter measurement according to the third facial image at each timing through a parameter measurement network to obtain facial image binding parameters at each timing, which comprises the following steps:
Inputting the third facial image at each timing into the parameter measurement network, and outputting, by the parameter measurement network, the facial image binding parameters at each timing;
The construction of the parameter measurement network comprises the following steps:
labeling the first sample image with facial image binding parameters to obtain the facial image binding parameters of the first sample image;
taking the first sample image as an input item of the BP neural network, and taking the facial image binding parameter of the first sample image as an output item of the BP neural network;
learning and training the input item of the BP neural network and the output item of the BP neural network by using the BP neural network to obtain the parameter measurement network;
The expression of the parameter measurement network is:
S_t = BP(G3_t);
where S_t is the facial image binding parameter at the t-th timing, G3_t is the third facial image at the t-th timing, BP is the BP neural network, and t is the count variable.
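A minimal sketch of the per-timing data flow S_t = BP(G3_t) follows. The trained BP neural network is replaced by a single hypothetical linear layer so that only the mapping from restored image to binding parameters is visible; the weights and shapes are assumptions, not the patent's values:

```python
# Stand-in for the parameter measurement step: flatten the restored
# image at timing t and apply one linear map in place of the trained
# BP (back-propagation) network.

def measure_params(image, weights):
    """Flatten a 2D image and apply one linear layer per parameter."""
    flat = [px for row in image for px in row]
    return [sum(w * x for w, x in zip(row, flat)) for row in weights]

g3_t = [[0.1, 0.2], [0.3, 0.4]]   # restored third facial image at timing t
w = [[1, 0, 0, 0], [0, 0, 0, 1]]  # toy weights: two "binding parameters"
s_t = measure_params(g3_t, w)     # facial image binding parameters S_t
```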
As shown in fig. 2, the present invention provides an intelligent binding apparatus for facial images, comprising:
the data acquisition module is used for acquiring a facial video which comprises a group of first facial images connected according to time sequence;
The data processing module is used for obtaining a group of second face images through a time sequence prediction network according to a group of first face images;
The data processing module is further used for performing feature extraction on the first facial image and the second facial image at each timing through a feature extraction network to correspondingly obtain binding facial features at each timing, wherein the feature extraction network is a neural network;
The data processing module is further used for performing facial image analysis on the binding facial features at each timing through a facial image restoration model to obtain a third facial image, wherein the facial image restoration model is a neural network; and
The data processing module is further used for performing facial image binding parameter measurement according to the third facial image at each timing through a parameter measurement network to obtain facial image binding parameters at each timing, wherein the parameter measurement network is a neural network, the facial image binding parameters correspond to Blender Shape binding parameters applied to a FLAME face model, and the FLAME face model is used for realizing automatic binding of the face model to the facial image according to the Blender Shape binding parameters;
the data storage module is used for storing a time sequence prediction network, a feature extraction network, a facial image restoration model and a parameter measurement network.
As shown in fig. 3, the present invention provides a computer device, comprising:
At least one processor; and
A memory communicatively coupled to the at least one processor;
The memory stores instructions executable by the at least one processor to cause the computer device to perform the above intelligent binding method for facial images.
According to the present application, time-sequence prediction is performed on the facial images to obtain predicted facial images that represent the time-sequence law of facial expressions; feature extraction is then performed on both the predicted facial images and the real facial images, and the binding parameters are calculated, so that facial binding takes both time-sequence fluency and spatial authenticity into account and the facial binding effect is improved.
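The overall pipeline summarized above can be sketched as a chain of four stages; every trained network is replaced here by a labeled stand-in function, so this only fixes the order of operations (predict, extract, restore, measure) and not any real model — all names and shapes are assumptions:

```python
# End-to-end data flow of the described method with stand-in stages.

def predict(first_images):            # time-sequence prediction network (LSTM)
    return list(first_images)         # stand-in: echo the input sequence

def extract(g1_t, g2_t):              # feature extraction network (two CNNs)
    return ("feat", g1_t, g2_t)       # stand-in binding feature H3_t

def restore(h3_t):                    # facial image restoration model (SRCNN)
    return ("img",) + h3_t[1:]        # stand-in third facial image G3_t

def measure(g3_t):                    # parameter measurement network (BP)
    return {"blend_shape": g3_t[1:]}  # stand-in binding parameters S_t

first = ["f1", "f2", "f3"]            # first facial images, one per timing
second = predict(first)               # second (predicted) facial images
params = [measure(restore(extract(g1, g2)))
          for g1, g2 in zip(first, second)]
```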
The above embodiments are only exemplary embodiments of the present application and are not intended to limit the present application, the scope of which is defined by the claims. Various modifications and equivalent arrangements of this application will occur to those skilled in the art, and are intended to be within the spirit and scope of the application.
Claims (10)
1. An intelligent binding method for facial images is characterized in that: the method comprises the following steps:
acquiring a face video, wherein the face video comprises a group of first face images connected according to time sequence;
Obtaining a group of second facial images through a time sequence prediction network according to the group of first facial images, wherein each second facial image is located at the timing of a corresponding first facial image, the group of second facial images is the time-sequence prediction result of the group of first facial images, and the time sequence prediction network is a neural network;
Performing feature extraction on the first face image and the second face image at each time sequence through a feature extraction network, and correspondingly obtaining binding face features at each time sequence, wherein the feature extraction network is a neural network;
carrying out facial image analysis on the binding facial features at each time sequence position through a facial image restoration model to obtain a third facial image, wherein the facial image restoration model is a neural network;
Carrying out facial image binding parameter measurement according to the third facial image at each time sequence through a parameter measurement network to obtain facial image binding parameters at each time sequence, wherein the parameter measurement network is a neural network, the facial image binding parameters correspond to Blender Shape binding parameters applied to a FLAME facial model, and the FLAME facial model is used for realizing automatic binding of the facial model to the facial image according to the Blender Shape binding parameters.
2. The intelligent binding method for facial images according to claim 1, wherein: obtaining a group of second face images through a time sequence prediction network according to the group of first face images, wherein the method comprises the following steps:
acquiring a time sequence of each first face image, and sequentially taking the time sequence as a prediction time sequence;
Performing deep learning, by using an LSTM network, on the first facial images located at timings preceding the prediction timing, so as to obtain a time sequence prediction model for the second facial image at the prediction timing;
And predicting the second face image at each prediction time sequence by using a time sequence prediction model at each prediction time sequence to obtain a group of second face images.
3. The intelligent binding method for facial images according to claim 2, wherein:
The expression of the time sequence prediction network is as follows:
G2_t = LSTM({G1_1, G1_2, G1_3, …, G1_(t-1)});
t ∈ [3, N];
G2_1 = G1_1, G2_2 = G1_2;
Wherein, G2_t is the second facial image at the t-th timing, G1_1, G1_2, G1_3, …, G1_(t-1) are the first facial images at the 1st, 2nd, 3rd, …, (t-1)-th timings respectively, N is the total number of timings in the facial video, G2_1 is the second facial image at the 1st timing, G2_2 is the second facial image at the 2nd timing, and t is the count variable.
4. A facial image intelligent binding method according to claim 3, wherein: performing feature extraction on the first face image and the second face image at each time sequence through a feature extraction network to correspondingly obtain binding face features at each time sequence, including:
The first facial image and the second facial image at each timing are simultaneously input into the feature extraction network, and the feature extraction network outputs the binding facial features at each timing.
5. The intelligent binding method for facial images according to claim 4, wherein: the construction of the feature extraction network comprises the following steps:
Randomly selecting a first face image and a second face image at a plurality of time sequences to serve as a first sample image and a second sample image respectively;
Inputting the first sample image at each timing to a first CNN neural network, outputting facial features of the first sample image by the first CNN neural network;
Inputting the second sample image at each time sequence to a second CNN neural network, outputting facial features of the second sample image by the second CNN neural network;
taking the difference between the facial features of the first sample image and the facial features of the second sample image as a loss function;
Based on the minimization of the loss function, learning and training the first CNN neural network and the second CNN neural network to obtain the feature extraction network;
The expression of the feature extraction network is as follows:
H1_t = CNN1(G1_t);
H2_t = CNN2(G2_t);
H3_t = H1_t or H2_t;
Loss = MSE(H1_t, H2_t);
Where H3_t is the binding facial feature at the t-th timing, H1_t is the facial feature of the first sample image at the t-th timing, H2_t is the facial feature of the second sample image at the t-th timing, G1_t is the first sample image at the t-th timing, G2_t is the second sample image at the t-th timing, CNN1 is the first CNN neural network, CNN2 is the second CNN neural network, Loss is the loss function value, MSE(H1_t, H2_t) is the mean square error between H1_t and H2_t, t is the count variable, and "or" is the mathematical identifier denoting that either of the two may be taken.
6. The intelligent binding method for facial images according to claim 5, wherein: carrying out facial image analysis on the binding facial features at each time sequence position through a facial image restoration model to obtain a third facial image, wherein the facial image analysis comprises the following steps:
The bound facial features at each timing are input to a facial image restoration model, and a third facial image at each timing is output by the facial image restoration model.
7. The intelligent binding method for facial images according to claim 6, wherein: the construction of the facial image restoration model comprises the following steps:
Randomly selecting a first face image and a second face image at a plurality of time sequences to serve as a first sample image and a second sample image respectively;
Acquiring binding facial features corresponding to the first sample image and the second sample image at each time sequence by using a feature extraction network, and taking the binding facial features as binding sample facial features at each time sequence;
Carrying out mean value fusion on the first sample image and the second sample image at each time sequence to obtain a third sample image;
taking the binding sample facial features at each timing as input items of the SRCNN neural network, and taking the third sample image at each timing as output items of the SRCNN neural network;
Learning and training an input item of the SRCNN neural network and an output item of the SRCNN neural network by utilizing the SRCNN neural network to obtain a facial image restoration model;
The expression of the facial image restoration model is:
G3_t = SRCNN(H3_t);
where G3_t is the third sample image at the t-th timing, H3_t is the binding sample facial feature at the t-th timing, SRCNN is the SRCNN neural network, H3_t = H1_t or H2_t, H1_t is the facial feature of the first sample image at the t-th timing, H2_t is the facial feature of the second sample image at the t-th timing, and t is the count variable.
8. The intelligent binding method for facial images according to claim 7, wherein: performing facial image binding parameter measurement according to the third facial image at each timing through a parameter measurement network to obtain facial image binding parameters at each timing comprises the following steps:
Inputting the third facial image at each timing into the parameter measurement network, and outputting, by the parameter measurement network, the facial image binding parameters at each timing;
the construction of the parameter measuring network comprises the following steps:
labeling the first sample image with facial image binding parameters to obtain the facial image binding parameters of the first sample image;
taking the first sample image as an input item of the BP neural network, and taking the facial image binding parameter of the first sample image as an output item of the BP neural network;
Learning and training the input item of the BP neural network and the output item of the BP neural network by using the BP neural network to obtain the parameter measurement network;
The expression of the parameter measurement network is:
S_t = BP(G3_t);
where S_t is the facial image binding parameter at the t-th timing, G3_t is the third facial image at the t-th timing, BP is the BP neural network, and t is the count variable.
9. An intelligent binding apparatus for facial images, comprising:
The data acquisition module is used for acquiring a face video, wherein the face video comprises a group of first face images connected according to time sequence;
The data processing module is used for obtaining a group of second face images through a time sequence prediction network according to a group of first face images;
The data processing module is further used for performing feature extraction on the first facial image and the second facial image at each timing through a feature extraction network to correspondingly obtain binding facial features at each timing, wherein the feature extraction network is a neural network;
The data processing module is further used for performing facial image analysis on the binding facial features at each timing through a facial image restoration model to obtain a third facial image, wherein the facial image restoration model is a neural network; and
The data processing module is further used for performing facial image binding parameter measurement according to the third facial image at each timing through a parameter measurement network to obtain facial image binding parameters at each timing, wherein the parameter measurement network is a neural network, the facial image binding parameters correspond to Blender Shape binding parameters applied to a FLAME face model, and the FLAME face model is used for realizing automatic binding of the face model to the facial image according to the Blender Shape binding parameters;
the data storage module is used for storing a time sequence prediction network, a feature extraction network, a facial image restoration model and a parameter measurement network.
10. A computer device, characterized by comprising:
At least one processor; and
A memory communicatively coupled to the at least one processor;
wherein the memory stores instructions executable by the at least one processor to cause a computer device to perform the method of any of claims 1-8.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202311543676.0A CN117974852A (en) | 2023-11-17 | 2023-11-17 | Intelligent binding method and device for facial images and computer equipment |
Publications (1)
Publication Number | Publication Date |
---|---|
CN117974852A true CN117974852A (en) | 2024-05-03 |
Family
ID=90852071
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20210264563A1 (en) * | 2019-04-26 | 2021-08-26 | Tencent Technology (Shenzhen) Company Limited | Method and apparatus for displaying face of virtual role, computer device, and readable storage medium |
CN114820917A (en) * | 2021-01-29 | 2022-07-29 | 宿迁硅基智能科技有限公司 | Automatic facial skeleton binding migration method and system based on fbx file |
CN115063847A (en) * | 2022-04-29 | 2022-09-16 | 网易(杭州)网络有限公司 | Training method and device for facial image acquisition model |
CN115311394A (en) * | 2022-07-27 | 2022-11-08 | 湖南芒果无际科技有限公司 | Method, system, equipment and medium for driving digital human face animation |
CN116863043A (en) * | 2023-05-25 | 2023-10-10 | 度小满科技(北京)有限公司 | Face dynamic capture driving method and device, electronic equipment and readable storage medium |
Non-Patent Citations (1)
Title |
---|
AI技术聚合, 扎眼的阳光: "FLAME 3D face model — explanation and annotated code for 'Learning a model of facial shape and expression from 4D scans'", pages 1 - 15, Retrieved from the Internet <URL:https://aitechtogether.com/article/17540.html> *
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||