CN117974852A - Intelligent binding method and device for facial images and computer equipment - Google Patents


Info

Publication number
CN117974852A
CN117974852A (application number CN202311543676.0A)
Authority
CN
China
Prior art keywords
facial
image
time sequence
face
binding
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202311543676.0A
Other languages
Chinese (zh)
Inventor
王子傑
田越
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Huichang Shuyu Technology Development Co ltd
Original Assignee
Beijing Huichang Shuyu Technology Development Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Huichang Shuyu Technology Development Co ltd filed Critical Beijing Huichang Shuyu Technology Development Co ltd
Priority to CN202311543676.0A priority Critical patent/CN117974852A/en
Publication of CN117974852A publication Critical patent/CN117974852A/en
Pending legal-status Critical Current


Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06T: IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 13/00: Animation
    • G06T 13/20: 3D [Three Dimensional] animation
    • G06T 13/40: 3D [Three Dimensional] animation of characters, e.g. humans, animals or virtual beings
    • A: HUMAN NECESSITIES
    • A63: SPORTS; GAMES; AMUSEMENTS
    • A63F: CARD, BOARD, OR ROULETTE GAMES; INDOOR GAMES USING SMALL MOVING PLAYING BODIES; VIDEO GAMES; GAMES NOT OTHERWISE PROVIDED FOR
    • A63F 13/00: Video games, i.e. games using an electronically generated display having two or more dimensions
    • A63F 13/60: Generating or modifying game content before or while executing the game program, e.g. authoring tools specially adapted for game development or game-integrated level editor
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00: Computing arrangements based on biological models
    • G06N 3/02: Neural networks
    • G06N 3/04: Architecture, e.g. interconnection topology
    • G06N 3/0464: Convolutional networks [CNN, ConvNet]
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00: Computing arrangements based on biological models
    • G06N 3/02: Neural networks
    • G06N 3/08: Learning methods
    • A: HUMAN NECESSITIES
    • A63: SPORTS; GAMES; AMUSEMENTS
    • A63F: CARD, BOARD, OR ROULETTE GAMES; INDOOR GAMES USING SMALL MOVING PLAYING BODIES; VIDEO GAMES; GAMES NOT OTHERWISE PROVIDED FOR
    • A63F 2300/00: Features of games using an electronically generated display having two or more dimensions, e.g. on a television screen, showing representations related to the game
    • A63F 2300/60: Methods for processing data by generating or executing the game program
    • A63F 2300/6009: Methods for processing data by generating or executing the game program for importing or creating game content, e.g. authoring tools during game development, adapting content to different platforms, use of a scripting language to create content
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06T: IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 2200/00: Indexing scheme for image data processing or generation, in general
    • G06T 2200/04: Indexing scheme for image data processing or generation, in general involving 3D image data

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • Computing Systems (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • General Engineering & Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Multimedia (AREA)
  • Image Analysis (AREA)
  • Image Processing (AREA)

Abstract

The invention relates to the technical field of image processing, and in particular to an intelligent binding method and device for facial images and computer equipment. The method comprises the following steps: acquiring a facial video; obtaining a group of second facial images from a group of first facial images through a time sequence prediction network; performing feature extraction on the first facial image and the second facial image at each time sequence through a feature extraction network to correspondingly obtain binding facial features at each time sequence; performing facial image analysis on the binding facial features at each time sequence through a facial image restoration model to obtain a third facial image; and calculating facial image binding parameters from the third facial image at each time sequence through a parameter calculation network to obtain the facial image binding parameters at each time sequence. The invention enables facial binding to give consideration to both time sequence fluency and spatial authenticity, improving the facial binding effect.

Description

Intelligent binding method and device for facial images and computer equipment
Technical Field
The invention relates to the technical field of image processing, in particular to an intelligent binding method and device for facial images and computer equipment.
Background
Facial binding is an important step in game and animation production that gives a game or animation character facial expressions. For example, in the battle-chess (tactical) mobile games currently on the market, character expressions are usually made by building many skeleton points in 3ds Max or Maya and skinning them, then changing the appearance of the face through skeleton scaling, rotation and displacement.
However, when expressions are made in 3ds Max or Maya, the manual workflow means the result rests on the producer's subjective judgment during early-stage design, so the actual effect of the facial expression binding is difficult to grasp accurately, which directly reduces its authenticity. In the later stage it is also difficult to adjust the continuous expressions of a game or animation precisely enough to obtain good temporal continuity, so the bound facial expressions in the game animation stutter; and if every expression were adjusted into place for temporal continuity, the face would have to be adjusted point by point, frame by frame, wasting a great deal of time for a still-poor result.
Therefore, in the prior art, the producer's subjective judgment of the effect reduces the authenticity of facial expression binding, and the continuous expressions of a game or animation are difficult to adjust accurately, so the bound facial expressions in game animation stutter.
Disclosure of Invention
The invention aims to provide an intelligent binding method for facial images, so as to solve the technical problems in the prior art that the authenticity of facial expression binding is reduced and that bound facial expressions in game animation stutter.
In order to solve the technical problems, the invention specifically provides the following technical scheme:
an intelligent binding method for facial images comprises the following steps:
acquiring a facial video, wherein the facial video comprises a group of first facial images connected in time sequence;
obtaining a group of second facial images from the group of first facial images through a time sequence prediction network, wherein each second facial image is located at the time sequence of a first facial image and corresponds to the time sequence prediction result for the group of first facial images, and the time sequence prediction network is a neural network;
performing feature extraction on the first facial image and the second facial image at each time sequence through a feature extraction network to correspondingly obtain the binding facial features at each time sequence, wherein the feature extraction network is a neural network;
performing facial image analysis on the binding facial features at each time sequence through a facial image restoration model to obtain a third facial image, wherein the facial image restoration model is a neural network;
and calculating facial image binding parameters from the third facial image at each time sequence through a parameter calculation network to obtain the facial image binding parameters at each time sequence, wherein the parameter calculation network is a neural network, the facial image binding parameters correspond to the Blend Shape binding parameters applied to a FLAME face model, and the FLAME face model automatically binds the face model to the facial image according to the Blend Shape binding parameters.
As a preferred embodiment of the present invention, obtaining a group of second facial images from the group of first facial images through the time sequence prediction network comprises:
acquiring the time sequence of each first facial image and taking each time sequence in turn as a prediction time sequence;
performing deep learning with an LSTM network on each first facial image located at the time sequences preceding the prediction time sequence, so as to obtain a time sequence prediction model for the second facial image at the prediction time sequence;
predicting the second facial image at each prediction time sequence with the time sequence prediction model at that prediction time sequence, so as to obtain the group of second facial images.
As a preferred embodiment of the present invention, the expression of the time sequence prediction network is:
G2_t = LSTM({G1_1, G1_2, G1_3, …, G1_(t-1)});
t ∈ [3, N];
G2_1 = G1_1, G2_2 = G1_2;
where G2_t is the second facial image at the t-th time sequence; G1_1, G1_2, G1_3, …, G1_(t-1) are the first facial images at time sequences 1, 2, 3, …, t-1 respectively; N is the total number of time sequences in the facial video; G2_1 is the second facial image at the 1st time sequence; G2_2 is the second facial image at the 2nd time sequence; and t is a counting variable.
As a preferred solution of the present invention, performing feature extraction on the first facial image and the second facial image at each time sequence through a feature extraction network to correspondingly obtain the binding facial features at each time sequence comprises:
inputting the first facial image and the second facial image at each time sequence into the feature extraction network simultaneously, the feature extraction network outputting the binding facial features at each time sequence.
As a preferred embodiment of the present invention, the construction of the feature extraction network includes:
Randomly selecting a first face image and a second face image at a plurality of time sequences to serve as a first sample image and a second sample image respectively;
inputting the first sample image at each time sequence into a first CNN neural network, the first CNN neural network outputting the facial features of the first sample image;
inputting the second sample image at each time sequence into a second CNN neural network, the second CNN neural network outputting the facial features of the second sample image;
taking the difference between the facial features of the first sample image and the facial features of the second sample image as a loss function;
Based on the minimization of the loss function, learning and training the first CNN neural network and the second CNN neural network to obtain the feature extraction network;
The expression of the feature extraction network is as follows:
H1_t = CNN1(G1_t);
H2_t = CNN2(G2_t);
H3_t = H1_t or H2_t;
Loss = MSE(H1_t, H2_t);
where H3_t is the binding facial feature at the t-th time sequence; H1_t is the facial feature of the first sample image at the t-th time sequence; H2_t is the facial feature of the second sample image at the t-th time sequence; G1_t is the first sample image at the t-th time sequence; G2_t is the second sample image at the t-th time sequence; CNN1 is the first CNN neural network; CNN2 is the second CNN neural network; Loss is the loss function value; MSE(H1_t, H2_t) is the mean square error between H1_t and H2_t; t is a counting variable; and "or" indicates that either feature may be taken.
As a preferred scheme of the invention, performing facial image analysis on the binding facial features at each time sequence through the facial image restoration model to obtain a third facial image comprises:
inputting the binding facial features at each time sequence into the facial image restoration model, the facial image restoration model outputting the third facial image at each time sequence.
As a preferred embodiment of the present invention, the construction of the face image restoration model includes:
Randomly selecting a first face image and a second face image at a plurality of time sequences to serve as a first sample image and a second sample image respectively;
Acquiring binding facial features corresponding to the first sample image and the second sample image at each time sequence by using a feature extraction network, and taking the binding facial features as binding sample facial features at each time sequence;
Carrying out mean value fusion on the first sample image and the second sample image at each time sequence to obtain a third sample image;
taking the binding sample facial features at each time sequence as the input items of an SRCNN neural network, and taking the third sample image at each time sequence as the output items of the SRCNN neural network;
training the SRCNN neural network on these input and output items to obtain the facial image restoration model;
The expression of the facial image restoration model is:
G3_t = SRCNN(H3_t);
where G3_t is the third sample image at the t-th time sequence; H3_t is the binding sample facial feature at the t-th time sequence; SRCNN is the SRCNN neural network; H3_t = H1_t or H2_t, where H1_t is the facial feature of the first sample image at the t-th time sequence and H2_t is the facial feature of the second sample image at the t-th time sequence; and t is a counting variable.
As a preferred scheme of the invention, calculating facial image binding parameters from the third facial image at each time sequence through the parameter calculation network to obtain the facial image binding parameters at each time sequence comprises:
inputting the third facial image at each time sequence into the parameter calculation network, the parameter calculation network outputting the facial image binding parameters at each time sequence;
the construction of the parameter calculation network comprises:
labelling the first sample images with facial image binding parameters to obtain the facial image binding parameters of the first sample images;
taking the first sample image as the input item of a BP neural network, and taking the facial image binding parameters of the first sample image as the output item of the BP neural network;
training the BP neural network on these input and output items to obtain the parameter calculation network;
S_t = BP(G3_t);
where S_t is the facial image binding parameter at the t-th time sequence; G3_t is the third facial image at the t-th time sequence; BP is the BP neural network; and t is a counting variable.
As a preferred embodiment of the present invention, the present invention provides an intelligent binding apparatus for facial images, including:
The data acquisition module is used for acquiring a facial video, wherein the facial video comprises a group of first facial images connected in time sequence;
The data processing module is used for obtaining a group of second facial images from the group of first facial images through a time sequence prediction network;
The data processing module is further used for performing feature extraction on the first facial image and the second facial image at each time sequence through a feature extraction network to correspondingly obtain the binding facial features at each time sequence, wherein the feature extraction network is a neural network;
The data processing module is further used for performing facial image analysis on the binding facial features at each time sequence through a facial image restoration model to obtain a third facial image, wherein the facial image restoration model is a neural network; and
The data processing module is further used for calculating facial image binding parameters from the third facial image at each time sequence through a parameter calculation network to obtain the facial image binding parameters at each time sequence, wherein the parameter calculation network is a neural network, the facial image binding parameters correspond to the Blend Shape binding parameters applied to a FLAME face model, and the FLAME face model automatically binds the face model to the facial image according to the Blend Shape binding parameters;
The data storage module is used for storing the time sequence prediction network, the feature extraction network, the facial image restoration model and the parameter calculation network.
As a preferred aspect of the present invention, there is provided a computer device, comprising:
At least one processor; and
A memory communicatively coupled to the at least one processor;
Wherein the memory stores instructions executable by the at least one processor to cause the computer device to perform the facial image intelligent binding method.
Compared with the prior art, the invention has the following beneficial effects:
According to the invention, time sequence prediction is performed on the facial images to obtain predicted facial images that represent the time sequence law of facial expressions; features are extracted from both the predicted facial images and the real facial images, and the binding parameters are then calculated to obtain the facial image binding parameters. Facial binding thereby gives consideration to both time sequence fluency and spatial authenticity, improving the facial binding effect.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below. It will be apparent to those of ordinary skill in the art that the drawings in the following description are exemplary only and that other implementations can be obtained from the extensions of the drawings provided without inventive effort.
FIG. 1 is a flowchart of a facial image intelligent binding method provided by an embodiment of the invention;
FIG. 2 is a block diagram of a facial image intelligent binding device according to an embodiment of the present invention;
Fig. 3 is an internal structure diagram of a computer device according to an embodiment of the present invention.
Detailed Description
The following description of the embodiments of the present invention will be made clearly and completely with reference to the accompanying drawings, in which it is apparent that the embodiments described are only some embodiments of the present invention, but not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
As shown in fig. 1, the present invention provides an intelligent binding method for facial images, comprising the following steps:
acquiring a facial video, wherein the facial video comprises a group of first facial images connected in time sequence;
obtaining a group of second facial images from the group of first facial images through a time sequence prediction network, wherein each second facial image is located at the time sequence of a first facial image and corresponds to the time sequence prediction result for the group of first facial images, and the time sequence prediction network is a neural network;
performing feature extraction on the first facial image and the second facial image at each time sequence through a feature extraction network to correspondingly obtain the binding facial features at each time sequence (such as facial expression features, facial posture features and other facial features), wherein the feature extraction network is a neural network;
performing facial image analysis on the binding facial features at each time sequence through a facial image restoration model to obtain a third facial image, wherein the facial image restoration model is a neural network;
and calculating facial image binding parameters from the third facial image at each time sequence through a parameter calculation network to obtain the facial image binding parameters at each time sequence, wherein the parameter calculation network is a neural network, the facial image binding parameters correspond to the Blend Shape binding parameters applied to a FLAME face model, and the FLAME face model automatically binds the face model to the facial image according to the Blend Shape binding parameters.
To realize facial binding, the invention comprehensively captures the facial expression and posture features of a person through a facial video, and then binds the face model with the expression and posture features captured from the video, so that the facial image binding is more detailed and its display effect on the face model is better.
Using a facial video for binding allows the facial features required for binding to be captured more fully, and binding along the time sequence turns the facial binding of the face model from static binding into dynamic binding: the face model carries dynamic facial expressions and postures. This suits animation and game characters, meets the dynamic expression requirements of animated games, and yields the dynamic facial binding in one pass, with no need to construct dynamic expressions and postures from static ones, greatly improving the efficiency of facial binding for characters.
Furthermore, to give the dynamic expressions and postures of the face model a better binding effect, the invention on the one hand analyzes the facial video, learns the law of change of facial expression and posture with an LSTM neural network, and predicts the facial expression and posture at each single time sequence from the learned law. Because these predictions conform to the law of change, binding through the predicted images gives the bound expressions and postures temporal continuity across the time sequences of the face model; that is, the dynamic expressions and postures on the face model are smoother.
On the other hand, although the prediction of the facial expression and posture at a single time sequence is derived from the real expressions and postures in the facial video, the prediction itself is virtual. If facial binding were performed only on the predicted images, the authenticity of the dynamic expressions and postures on the face model would fall. To preserve that authenticity, facial binding is also performed on the facial images carrying the real expressions and postures in the video, so that the binding retains the authenticity of expression and posture at every time sequence of the face model; that is, the dynamic expressions and postures on the face model are more real.
By performing facial binding jointly on the facial images with real expressions and postures and on the predicted images with virtual expressions and postures, the invention preserves the spatial authenticity of the dynamic expressions and postures on the face model while keeping their temporal continuity, so the facial binding is both realistic and fluent, and the binding effect is better.
The invention analyzes the facial video, learns the law of change of facial expression and posture with the LSTM neural network, and predicts a person's facial expression and posture at each single time sequence from the learned law, specifically as follows:
obtaining a group of second facial images from the group of first facial images through the time sequence prediction network comprises the following steps:
acquiring the time sequence of each first facial image and taking each time sequence in turn as a prediction time sequence;
performing deep learning with an LSTM network on each first facial image located at the time sequences preceding the prediction time sequence, so as to obtain a time sequence prediction model for the second facial image at the prediction time sequence;
predicting the second facial image at each prediction time sequence with the time sequence prediction model at that prediction time sequence, so as to obtain the group of second facial images.
The expression of the timing prediction network is:
G2_t = LSTM({G1_1, G1_2, G1_3, …, G1_(t-1)});
t ∈ [3, N];
G2_1 = G1_1, G2_2 = G1_2;
where G2_t is the second facial image at the t-th time sequence; G1_1, G1_2, G1_3, …, G1_(t-1) are the first facial images at time sequences 1, 2, 3, …, t-1 respectively; N is the total number of time sequences in the facial video; G2_1 is the second facial image at the 1st time sequence; G2_2 is the second facial image at the 2nd time sequence; and t is a counting variable.
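To make this step concrete, the following is a minimal PyTorch sketch of such a time sequence prediction network, assuming each facial image has first been encoded into a fixed-length feature vector; the class and function names, the feature dimension and the hidden size are illustrative assumptions rather than details fixed by the patent.

import torch
import torch.nn as nn

class TimingPredictionNetwork(nn.Module):
    # Given the encoded first facial images G1_1..G1_(t-1), predict the
    # encoding of the second facial image G2_t at time sequence t.
    def __init__(self, feat_dim=128, hidden_dim=256):
        super().__init__()
        self.lstm = nn.LSTM(feat_dim, hidden_dim, batch_first=True)
        self.head = nn.Linear(hidden_dim, feat_dim)   # hidden state -> image encoding

    def forward(self, g1_seq):                        # g1_seq: (batch, t-1, feat_dim)
        out, _ = self.lstm(g1_seq)
        return self.head(out[:, -1])                  # prediction for time sequence t

def predict_sequence(model, g1):                      # g1: (N, feat_dim), encoded first facial images
    g2 = [g1[0], g1[1]]                               # G2_1 = G1_1, G2_2 = G1_2, as above
    for t in range(2, g1.shape[0]):                   # 0-based index, i.e. time sequences 3..N
        g2.append(model(g1[:t].unsqueeze(0)).squeeze(0))
    return torch.stack(g2)                            # one predicted encoding per time sequence

In practice the predicted encodings would still have to be decoded back into second facial images; the patent leaves the encoding and decoding of images open.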
The invention performs facial binding jointly on the facial images with real expressions and postures and on the predicted images with virtual expressions and postures, which preserves the spatial authenticity of the dynamic expressions and postures on the face model while keeping their temporal continuity, specifically as follows:
Performing feature extraction on the first facial image and the second facial image at each time sequence through the feature extraction network to correspondingly obtain the binding facial features at each time sequence comprises:
inputting the first facial image and the second facial image at each time sequence into the feature extraction network simultaneously, the feature extraction network outputting the binding facial features at each time sequence.
The construction of the feature extraction network comprises the following steps:
Randomly selecting a first face image and a second face image at a plurality of time sequences to serve as a first sample image and a second sample image respectively;
inputting the first sample image at each time sequence into a first CNN neural network, the first CNN neural network outputting the facial features of the first sample image;
inputting the second sample image at each time sequence into a second CNN neural network, the second CNN neural network outputting the facial features of the second sample image;
taking the difference between the facial features of the first sample image and the facial features of the second sample image as a loss function;
based on the minimization of the loss function, learning and training the first CNN neural network and the second CNN neural network to obtain a feature extraction network;
the expression of the feature extraction network is:
H1_t = CNN1(G1_t);
H2_t = CNN2(G2_t);
H3_t = H1_t or H2_t;
Loss = MSE(H1_t, H2_t);
where H3_t is the binding facial feature at the t-th time sequence; H1_t is the facial feature of the first sample image at the t-th time sequence; H2_t is the facial feature of the second sample image at the t-th time sequence; G1_t is the first sample image at the t-th time sequence; G2_t is the second sample image at the t-th time sequence; CNN1 is the first CNN neural network; CNN2 is the second CNN neural network; Loss is the loss function value; MSE(H1_t, H2_t) is the mean square error between H1_t and H2_t; t is a counting variable; and "or" indicates that either feature may be taken.
When the facial features (facial expression features, facial posture features and the like) are extracted, they are extracted through the constructed feature extraction network. The network structure of the feature extraction network comprises two CNN neural networks, which respectively extract facial features from the facial images with real expressions and postures (the first facial images) and from the predicted images with virtual expressions and postures (the second facial images). During training, the difference between the outputs of the two CNN neural networks is used as the loss function, so that after training the features extracted from the first facial image and from the second facial image at the same time sequence reach the highest similarity. The binding facial feature output by the feature extraction network therefore carries the information of both the first facial image and the second facial image; that is, it possesses both spatial authenticity and time sequence fluency.
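As a rough illustration of this two-branch training scheme, the sketch below pulls the outputs of two small CNN encoders together with an MSE loss; the convolutional layout, feature dimension, optimizer and learning rate are assumptions made only for the example, since the patent fixes none of them.

import torch
import torch.nn as nn

def make_cnn(feat_dim=128):
    # Small convolutional encoder; the layer layout is illustrative.
    return nn.Sequential(
        nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
        nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
        nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        nn.Linear(64, feat_dim),
    )

cnn1, cnn2 = make_cnn(), make_cnn()                   # CNN1 for G1_t, CNN2 for G2_t
mse = nn.MSELoss()
opt = torch.optim.Adam(list(cnn1.parameters()) + list(cnn2.parameters()), lr=1e-4)

def train_step(g1_batch, g2_batch):                   # real and predicted images, (B, 3, H, W)
    h1, h2 = cnn1(g1_batch), cnn2(g2_batch)           # H1_t and H2_t
    loss = mse(h1, h2)                                # Loss = MSE(H1_t, H2_t)
    opt.zero_grad()
    loss.backward()
    opt.step()
    return loss.item()

# After training, either branch output may serve as the binding facial
# feature (H3_t = H1_t or H2_t), since the two branches are driven to agree.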
The facial image restored from the binding facial features output by the feature extraction network (the third facial image) likewise has time sequence fluency and spatial authenticity, so the facial image binding parameters obtained from the third facial image give the expressions and postures of the bound face model both time sequence fluency and spatial authenticity.
Performing facial image analysis on the binding facial features at each time sequence through the facial image restoration model to obtain a third facial image comprises:
inputting the binding facial features at each time sequence into the facial image restoration model, the facial image restoration model outputting the third facial image at each time sequence.
The construction of the face image restoration model comprises the following steps:
Randomly selecting a first face image and a second face image at a plurality of time sequences to serve as a first sample image and a second sample image respectively;
Acquiring binding facial features corresponding to the first sample image and the second sample image at each time sequence by using a feature extraction network, and taking the binding facial features as binding sample facial features at each time sequence;
Carrying out mean value fusion on the first sample image and the second sample image at each time sequence to obtain a third sample image;
taking the binding sample facial features at each time sequence as the input items of an SRCNN neural network, and taking the third sample image at each time sequence as the output items of the SRCNN neural network;
training the SRCNN neural network on these input and output items to obtain the facial image restoration model;
The expression of the face image restoration model is:
G3_t = SRCNN(H3_t);
where G3_t is the third sample image at the t-th time sequence; H3_t is the binding sample facial feature at the t-th time sequence; SRCNN is the SRCNN neural network; H3_t = H1_t or H2_t, where H1_t is the facial feature of the first sample image at the t-th time sequence and H2_t is the facial feature of the second sample image at the t-th time sequence; and t is a counting variable.
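The restoration model can be sketched with the classic three-layer SRCNN structure (9-1-5 kernels), as below. Reshaping the binding facial feature into a coarse single-channel map, upsampling it to the target resolution, and training with an MSE objective against the mean-fused target are assumptions; the patent specifies only that an SRCNN maps H3_t to G3_t.

import torch
import torch.nn as nn

class SRCNN(nn.Module):
    # Classic 9-1-5 SRCNN; spatial size is preserved, so the input map
    # should already be at (or upsampled to) the target image resolution.
    def __init__(self, in_ch=1, out_ch=3):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(in_ch, 64, 9, padding=4), nn.ReLU(),
            nn.Conv2d(64, 32, 1), nn.ReLU(),
            nn.Conv2d(32, out_ch, 5, padding=2),
        )

    def forward(self, h3_map):                        # binding facial feature as a 2D map
        return self.body(h3_map)

def restoration_loss(model, h3_map, g1_t, g2_t):
    target = (g1_t + g2_t) / 2                        # mean fusion: G3_t = (G1_t + G2_t) / 2
    return nn.functional.mse_loss(model(h3_map), target)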
Calculating facial image binding parameters from the third facial image at each time sequence through the parameter calculation network to obtain the facial image binding parameters at each time sequence comprises:
inputting the third facial image at each time sequence into the parameter calculation network, the parameter calculation network outputting the facial image binding parameters at each time sequence;
The construction of the parameter calculation network comprises:
labelling the first sample images with facial image binding parameters to obtain the facial image binding parameters of the first sample images;
taking the first sample image as the input item of a BP neural network, and taking the facial image binding parameters of the first sample image as the output item of the BP neural network;
training the BP neural network on these input and output items to obtain the parameter calculation network;
S_t = BP(G3_t);
where S_t is the facial image binding parameter at the t-th time sequence; G3_t is the third facial image at the t-th time sequence; BP is the BP neural network; and t is a counting variable.
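Since a BP neural network is a plain fully connected network trained by backpropagation, the parameter calculation network can be sketched as a small MLP regressor, as below. The image resolution, the layer widths and the number of output parameters are illustrative assumptions; in use, the output size would match the number of Blend Shape binding parameters of the FLAME face model being driven.

import torch
import torch.nn as nn

class ParameterCalculationNetwork(nn.Module):
    # BP network regressing the binding parameters S_t from the third
    # facial image G3_t.
    def __init__(self, img_pixels=3 * 64 * 64, n_params=50):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Flatten(),
            nn.Linear(img_pixels, 512), nn.ReLU(),
            nn.Linear(512, 256), nn.ReLU(),
            nn.Linear(256, n_params),                 # one output per binding parameter
        )

    def forward(self, g3_t):                          # g3_t: (B, 3, 64, 64)
        return self.mlp(g3_t)

def param_loss(model, g3_t, s_labels):
    # Supervision comes from the binding parameters labelled on the first
    # sample images; a plain MSE regression objective is assumed here.
    return nn.functional.mse_loss(model(g3_t), s_labels)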
As shown in fig. 2, the present invention provides an intelligent binding apparatus for facial images, comprising:
The data acquisition module is used for acquiring a facial video, wherein the facial video comprises a group of first facial images connected in time sequence;
The data processing module is used for obtaining a group of second facial images from the group of first facial images through a time sequence prediction network;
The data processing module is further used for performing feature extraction on the first facial image and the second facial image at each time sequence through a feature extraction network to correspondingly obtain the binding facial features at each time sequence, wherein the feature extraction network is a neural network;
The data processing module is further used for performing facial image analysis on the binding facial features at each time sequence through a facial image restoration model to obtain a third facial image, wherein the facial image restoration model is a neural network; and
The data processing module is further used for calculating facial image binding parameters from the third facial image at each time sequence through a parameter calculation network to obtain the facial image binding parameters at each time sequence, wherein the parameter calculation network is a neural network, the facial image binding parameters correspond to the Blend Shape binding parameters applied to a FLAME face model, and the FLAME face model automatically binds the face model to the facial image according to the Blend Shape binding parameters;
The data storage module is used for storing the time sequence prediction network, the feature extraction network, the facial image restoration model and the parameter calculation network.
As shown in fig. 3, the present invention provides a computer device,
At least one processor; and
A memory communicatively coupled to the at least one processor;
The memory stores instructions executable by the at least one processor to cause the computer device to perform the facial image intelligent binding method described above.
According to the invention, time sequence prediction is performed on the facial images to obtain predicted facial images that represent the time sequence law of facial expressions; features are extracted from both the predicted facial images and the real facial images, and the binding parameters are then calculated to obtain the facial image binding parameters. Facial binding thereby gives consideration to both time sequence fluency and spatial authenticity, improving the facial binding effect.
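Putting the pieces together, an inference pass over a facial video could look like the sketch below, reusing the hypothetical modules from the earlier sketches. Because training drives H1_t and H2_t to agree, the binding facial feature is taken here from the real-image branch alone (H3_t = H1_t); the reshape and resize steps merely connect the illustrative shapes and are not prescribed by the patent.

import torch
import torch.nn.functional as F

def bind_face_video(g1_images, cnn1, srcnn, param_net):
    # g1_images: (N, 3, H, W), the first facial images of the facial video.
    params = []
    for t in range(g1_images.shape[0]):               # one pass per time sequence
        h3 = cnn1(g1_images[t:t + 1])                 # binding facial feature, (1, 128)
        h3_map = h3.view(1, 1, 8, 16)                 # feature vector -> coarse 2D map
        h3_map = F.interpolate(h3_map, size=(64, 64), mode='bilinear')
        g3 = srcnn(h3_map)                            # third facial image, (1, 3, 64, 64)
        params.append(param_net(g3).squeeze(0))       # binding parameters S_t
    return torch.stack(params)                        # (N, n_params)

The returned per-time-sequence parameter vectors would then be applied as the Blend Shape binding parameters of the FLAME face model, which performs the automatic binding.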
The above embodiments are only exemplary embodiments of the present application and are not intended to limit the present application, the scope of which is defined by the claims. Various modifications and equivalent arrangements of this application will occur to those skilled in the art, and are intended to be within the spirit and scope of the application.

Claims (10)

1. An intelligent binding method for facial images, characterized in that the method comprises the following steps:
acquiring a facial video, wherein the facial video comprises a group of first facial images connected in time sequence;
obtaining a group of second facial images from the group of first facial images through a time sequence prediction network, wherein each second facial image is located at the time sequence of a first facial image and corresponds to the time sequence prediction result for the group of first facial images, and the time sequence prediction network is a neural network;
performing feature extraction on the first facial image and the second facial image at each time sequence through a feature extraction network to correspondingly obtain the binding facial features at each time sequence, wherein the feature extraction network is a neural network;
performing facial image analysis on the binding facial features at each time sequence through a facial image restoration model to obtain a third facial image, wherein the facial image restoration model is a neural network;
and calculating facial image binding parameters from the third facial image at each time sequence through a parameter calculation network to obtain the facial image binding parameters at each time sequence, wherein the parameter calculation network is a neural network, the facial image binding parameters correspond to the Blend Shape binding parameters applied to a FLAME face model, and the FLAME face model automatically binds the face model to the facial image according to the Blend Shape binding parameters.
2. The intelligent binding method for facial images according to claim 1, characterized in that obtaining a group of second facial images from the group of first facial images through a time sequence prediction network comprises:
acquiring the time sequence of each first facial image and taking each time sequence in turn as a prediction time sequence;
performing deep learning with an LSTM network on each first facial image located at the time sequences preceding the prediction time sequence, so as to obtain a time sequence prediction model for the second facial image at the prediction time sequence;
predicting the second facial image at each prediction time sequence with the time sequence prediction model at that prediction time sequence, so as to obtain the group of second facial images.
3. The intelligent binding method for facial images according to claim 2, wherein:
The expression of the time sequence prediction network is as follows:
G2_t = LSTM({G1_1, G1_2, G1_3, …, G1_(t-1)});
t ∈ [3, N];
G2_1 = G1_1, G2_2 = G1_2;
where G2_t is the second facial image at the t-th time sequence; G1_1, G1_2, G1_3, …, G1_(t-1) are the first facial images at time sequences 1, 2, 3, …, t-1 respectively; N is the total number of time sequences in the facial video; G2_1 is the second facial image at the 1st time sequence; G2_2 is the second facial image at the 2nd time sequence; and t is a counting variable.
4. The intelligent binding method for facial images according to claim 3, characterized in that performing feature extraction on the first facial image and the second facial image at each time sequence through a feature extraction network to correspondingly obtain the binding facial features at each time sequence comprises:
inputting the first facial image and the second facial image at each time sequence into the feature extraction network simultaneously, the feature extraction network outputting the binding facial features at each time sequence.
5. The intelligent binding method for facial images according to claim 4, wherein: the construction of the feature extraction network comprises the following steps:
Randomly selecting a first face image and a second face image at a plurality of time sequences to serve as a first sample image and a second sample image respectively;
inputting the first sample image at each time sequence into a first CNN neural network, the first CNN neural network outputting the facial features of the first sample image;
inputting the second sample image at each time sequence into a second CNN neural network, the second CNN neural network outputting the facial features of the second sample image;
taking the difference between the facial features of the first sample image and the facial features of the second sample image as a loss function;
Based on the minimization of the loss function, learning and training the first CNN neural network and the second CNN neural network to obtain the feature extraction network;
The expression of the feature extraction network is as follows:
H1_t = CNN1(G1_t);
H2_t = CNN2(G2_t);
H3_t = H1_t or H2_t;
Loss = MSE(H1_t, H2_t);
where H3_t is the binding facial feature at the t-th time sequence; H1_t is the facial feature of the first sample image at the t-th time sequence; H2_t is the facial feature of the second sample image at the t-th time sequence; G1_t is the first sample image at the t-th time sequence; G2_t is the second sample image at the t-th time sequence; CNN1 is the first CNN neural network; CNN2 is the second CNN neural network; Loss is the loss function value; MSE(H1_t, H2_t) is the mean square error between H1_t and H2_t; t is a counting variable; and "or" indicates that either feature may be taken.
6. The intelligent binding method for facial images according to claim 5, characterized in that performing facial image analysis on the binding facial features at each time sequence through the facial image restoration model to obtain a third facial image comprises:
inputting the binding facial features at each time sequence into the facial image restoration model, the facial image restoration model outputting the third facial image at each time sequence.
7. The intelligent binding method for facial images according to claim 6, wherein: the construction of the facial image restoration model comprises the following steps:
Randomly selecting a first face image and a second face image at a plurality of time sequences to serve as a first sample image and a second sample image respectively;
Acquiring binding facial features corresponding to the first sample image and the second sample image at each time sequence by using a feature extraction network, and taking the binding facial features as binding sample facial features at each time sequence;
Carrying out mean value fusion on the first sample image and the second sample image at each time sequence to obtain a third sample image;
taking the binding sample facial features at each time sequence as the input items of an SRCNN neural network, and taking the third sample image at each time sequence as the output items of the SRCNN neural network;
training the SRCNN neural network on these input and output items to obtain the facial image restoration model;
The expression of the facial image restoration model is:
G3_t = SRCNN(H3_t);
where G3_t is the third sample image at the t-th time sequence; H3_t is the binding sample facial feature at the t-th time sequence; SRCNN is the SRCNN neural network; H3_t = H1_t or H2_t, where H1_t is the facial feature of the first sample image at the t-th time sequence and H2_t is the facial feature of the second sample image at the t-th time sequence; and t is a counting variable.
8. The intelligent binding method for facial images according to claim 7, characterized in that calculating facial image binding parameters from the third facial image at each time sequence through the parameter calculation network to obtain the facial image binding parameters at each time sequence comprises:
inputting the third facial image at each time sequence into the parameter calculation network, the parameter calculation network outputting the facial image binding parameters at each time sequence;
the construction of the parameter calculation network comprises:
labelling the first sample images with facial image binding parameters to obtain the facial image binding parameters of the first sample images;
taking the first sample image as the input item of a BP neural network, and taking the facial image binding parameters of the first sample image as the output item of the BP neural network;
training the BP neural network on these input and output items to obtain the parameter calculation network;
S_t = BP(G3_t);
where S_t is the facial image binding parameter at the t-th time sequence; G3_t is the third facial image at the t-th time sequence; BP is the BP neural network; and t is a counting variable.
9. An intelligent binding apparatus for facial images, comprising:
The data acquisition module is used for acquiring a facial video, wherein the facial video comprises a group of first facial images connected in time sequence;
The data processing module is used for obtaining a group of second facial images from the group of first facial images through a time sequence prediction network;
The data processing module is further used for performing feature extraction on the first facial image and the second facial image at each time sequence through a feature extraction network to correspondingly obtain the binding facial features at each time sequence, wherein the feature extraction network is a neural network;
The data processing module is further used for performing facial image analysis on the binding facial features at each time sequence through a facial image restoration model to obtain a third facial image, wherein the facial image restoration model is a neural network; and
The data processing module is further used for calculating facial image binding parameters from the third facial image at each time sequence through a parameter calculation network to obtain the facial image binding parameters at each time sequence, wherein the parameter calculation network is a neural network, the facial image binding parameters correspond to the Blend Shape binding parameters applied to a FLAME face model, and the FLAME face model automatically binds the face model to the facial image according to the Blend Shape binding parameters;
The data storage module is used for storing the time sequence prediction network, the feature extraction network, the facial image restoration model and the parameter calculation network.
10. A computer device, characterized in that it comprises:
At least one processor; and
A memory communicatively coupled to the at least one processor;
wherein the memory stores instructions executable by the at least one processor to cause a computer device to perform the method of any of claims 1-8.
CN202311543676.0A 2023-11-17 2023-11-17 Intelligent binding method and device for facial images and computer equipment Pending CN117974852A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311543676.0A CN117974852A (en) 2023-11-17 2023-11-17 Intelligent binding method and device for facial images and computer equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202311543676.0A CN117974852A (en) 2023-11-17 2023-11-17 Intelligent binding method and device for facial images and computer equipment

Publications (1)

Publication Number Publication Date
CN117974852A 2024-05-03

Family

ID=90852071

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311543676.0A Pending CN117974852A (en) 2023-11-17 2023-11-17 Intelligent binding method and device for facial images and computer equipment

Country Status (1)

Country Link
CN (1) CN117974852A (en)

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20210264563A1 (en) * 2019-04-26 2021-08-26 Tencent Technology (Shenzhen) Company Limited Method and apparatus for displaying face of virtual role, computer device, and readable storage medium
CN114820917A (en) * 2021-01-29 2022-07-29 宿迁硅基智能科技有限公司 Automatic facial skeleton binding migration method and system based on fbx file
CN115063847A (en) * 2022-04-29 2022-09-16 网易(杭州)网络有限公司 Training method and device for facial image acquisition model
CN115311394A (en) * 2022-07-27 2022-11-08 湖南芒果无际科技有限公司 Method, system, equipment and medium for driving digital human face animation
CN116863043A (en) * 2023-05-25 2023-10-10 度小满科技(北京)有限公司 Face dynamic capture driving method and device, electronic equipment and readable storage medium

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
AI技术聚合, 扎眼的阳光: "3D face model FLAME: explanation and code annotations for the paper 'Learning a model of facial shape and expression from 4D scans'", pages 1 - 15, Retrieved from the Internet <URL:https://aitechtogether.com/article/17540.html> *

Similar Documents

Publication Publication Date Title
CN110503074A (en) Information labeling method, apparatus, equipment and the storage medium of video frame
CN110245550B (en) Human face noise data set CNN training method based on total cosine distribution
CN113762133A (en) Self-weight fitness auxiliary coaching system, method and terminal based on human body posture recognition
CN107679522A (en) Action identification method based on multithread LSTM
US20220237917A1 (en) Video comparison method and apparatus, computer device, and storage medium
CN110490173B (en) Intelligent action scoring system based on 3D somatosensory model
Xu et al. Learning self-supervised space-time CNN for fast video style transfer
CN114373050A (en) Chemistry experiment teaching system and method based on HoloLens
CN114360018A (en) Rendering method and device of three-dimensional facial expression, storage medium and electronic device
CN113282840B (en) Comprehensive training acquisition management platform
CN110287912A (en) Method, apparatus and medium are determined based on the target object affective state of deep learning
CN116701706B (en) Data processing method, device, equipment and medium based on artificial intelligence
CN113066074A (en) Visual saliency prediction method based on binocular parallax offset fusion
CN117711066A (en) Three-dimensional human body posture estimation method, device, equipment and medium
CN117974852A (en) Intelligent binding method and device for facial images and computer equipment
CN112101306B (en) Fine facial expression capturing method and device based on RGB image
CN113723233B (en) Student learning participation assessment method based on hierarchical time sequence multi-example learning
CN116259104A (en) Intelligent dance action quality assessment method, device and system
CN115719497A (en) Student concentration degree identification method and system
Palanimeera et al. Yoga posture recognition by learning spatial-temporal feature with deep learning techniques
Chen et al. Movement Evaluation Algorithm‐Based Form Tracking Technology and Optimal Control of Limbs for Dancers
CN114785978A (en) Video image quality determination method for video conference
CN113706650A (en) Image generation method based on attention mechanism and flow model
CN112200739A (en) Video processing method and device, readable storage medium and electronic equipment
CN111597997A (en) Computer control teaching equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination