CN113095134B - Facial expression extraction model generation method and device and facial image generation method and device

Info

Publication number: CN113095134B
Application number: CN202110251948.4A
Authority: CN (China)
Prior art keywords: expression, image, facial, face, dimensional
Legal status: Active (granted)
Other languages: Chinese (zh)
Other versions: CN113095134A (en)
Inventors: 饶强, 黄旭为, 张国鑫
Assignee: Beijing Dajia Internet Information Technology Co Ltd
Application filed by Beijing Dajia Internet Information Technology Co Ltd, with priority to CN202110251948.4A

Classifications

    • G06V40/161 - Human faces, e.g. facial parts, sketches or expressions: Detection; Localisation; Normalisation
    • G06F18/214 - Pattern recognition: Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G06N3/045 - Neural networks: Combinations of networks
    • G06N3/08 - Neural networks: Learning methods
    • G06T17/00 - Three dimensional [3D] modelling, e.g. data description of 3D objects
    • G06V40/174 - Facial expression recognition

Abstract

The disclosure relates to a facial expression extraction model generation method and device and a facial image generation method and device. The facial expression extraction model generation method comprises: obtaining a three-dimensional face material image and a source face image; projecting the three-dimensional face material image to obtain a corresponding two-dimensional face material image; migrating the facial expression in the source face image to the two-dimensional face material image to obtain a target material image; performing three-dimensional reconstruction according to the target material image to obtain a target three-dimensional face model; performing solution optimization on the target three-dimensional face model to obtain expression feature information corresponding to the target material image; constructing a training sample from the target material image and its corresponding expression feature information; and training a preset neural network based on the training sample to obtain a facial expression extraction model. The method and the device improve the speed and accuracy of acquiring expression feature information.

Description

Facial expression extraction model generation method and device and facial image generation method and device
Technical Field
The disclosure relates to the technical field of computer vision, and in particular to a facial expression extraction model generation method and device and a facial image generation method and device.
Background
In the related art, the expression of a real face can be migrated to a three-dimensional face material based on three-dimensional reconstruction technology, so that the two-dimensional face material image corresponding to the migrated target three-dimensional face material makes the same expression as the real face. To realize expression migration, expression feature information needs to be extracted; this information describes the action of a facial expression and the amplitude of the expression. The extraction accuracy of facial expression feature information has an obvious influence on the migration effect, and both the extraction accuracy and the speed of expression feature information in the related art need to be further improved.
Disclosure of Invention
The disclosure provides a facial expression extraction model generation method and device and a facial image generation method and device, so as to at least solve the problem of low extraction accuracy of expression feature information in the related art. The technical scheme of the present disclosure is as follows:
according to a first aspect of an embodiment of the present disclosure, there is provided a facial expression extraction model generation method, including:
acquiring a three-dimensional face material image and a source face image;
projecting the three-dimensional face material image to obtain a two-dimensional face material image corresponding to the three-dimensional face material image;
migrating the facial expression in the source face image to the two-dimensional face material image to obtain a target material image;
performing three-dimensional reconstruction according to the target material image to obtain a target three-dimensional face model;
performing solution optimization on the target three-dimensional face model to obtain expression characteristic information corresponding to the target material image;
constructing a training sample according to the target material image and expression characteristic information corresponding to the target material image;
and training a preset neural network based on the training sample to obtain a facial expression extraction model.
In an exemplary embodiment, the preset neural network is an expression correction network;
the training of a preset neural network based on the training sample to obtain a facial expression extraction model comprises the following steps:
inputting the target material images in the training samples into an expression prediction network to obtain expression prediction information;
inputting the expression prediction information into the expression correction network to obtain corrected expression information;
obtaining correction loss according to the difference value between the corrected expression information and the expression characteristic information in the training sample;
training the expression correction network according to the correction loss;
and determining the expression prediction network and the trained expression correction network as the facial expression extraction model.
In an exemplary embodiment, the transferring the facial expression in the source face image to the two-dimensional face material image to obtain a target material image includes:
determining a reference face image from the at least one source face image; the reference face image is the source face image whose facial expression differs least from a non-expression state;
inputting each source face image into a first-order motion model to extract expression change parameters, and obtaining expression change parameters corresponding to each source face image; the expression change parameters represent the change quantity of the expression description parameters of each source facial image relative to the expression description parameters of the reference facial image;
inputting the two-dimensional face material image into the first-order motion model for expression parameter extraction processing to obtain expression description parameters of the two-dimensional face material image;
obtaining expression parameters of the target material according to the expression change parameters and the expression description parameters of the two-dimensional face material image;
and generating the target material image according to the target material expression parameter and the target face material image.
In an exemplary embodiment, the expression change parameters include a key point position change parameter and a motion state change parameter, and the expression description parameters include a key point position parameter and a motion state description parameter; obtaining the target material expression parameters according to the expression change parameters and the expression description parameters of the two-dimensional face material image comprises the following steps:
obtaining the key point position parameters of the target material according to the key point position change parameters and the key point position parameters of the two-dimensional face material image;
obtaining the motion state description parameters of the target materials according to the motion state change parameters and the motion state description parameters of the two-dimensional face material images;
and obtaining the expression parameters of the target material according to the key point position parameters of the target material and the motion state description parameters of the target material.
In an exemplary embodiment, the performing a solution optimization on the target three-dimensional face model to obtain expression feature information corresponding to the target material image includes:
acquiring an energy item corresponding to each key point in the target three-dimensional face model, wherein the energy item takes the expression feature information as an independent variable;
and calculating the energy items corresponding to the key points with the goal of minimizing the sum of the energy items corresponding to the key points, to obtain the expression feature information corresponding to the target material image.
According to a second aspect of the embodiments of the present disclosure, there is provided a face image generating method, including:
acquiring a reference face image and a three-dimensional face material;
inputting the reference face image into a facial expression extraction model to obtain expression characteristic information of the reference face image;
carrying out three-dimensional reconstruction according to the three-dimensional face material and the expression characteristic information of the reference face image to obtain a three-dimensional mapping face model;
projecting the three-dimensional mapping face model to obtain a target face image;
the facial expression extraction model is obtained according to the facial expression extraction model generation method of any one of the first aspect.
In an exemplary embodiment, the inputting the reference facial image into a facial expression extraction model to obtain expression feature information of the reference facial image includes:
inputting the reference facial image into an expression prediction network of the facial expression extraction model to obtain expression prediction information of the reference facial image;
and inputting the expression prediction information of the reference face image into an expression correction network of the facial expression extraction model to obtain corrected expression information of the reference face image, and determining the corrected expression information of the reference face image as expression characteristic information of the reference face image.
According to a third aspect of the embodiments of the present disclosure, there is provided a facial expression extraction model generating apparatus, including:
an image acquisition module configured to perform acquisition of a three-dimensional face material image and a source face image;
the two-dimensional face material image acquisition module is configured to perform projection on the three-dimensional face material image to obtain a two-dimensional face material image corresponding to the three-dimensional face material image;
the target material image acquisition module is configured to perform migration of the facial expression in the source facial image to the two-dimensional facial material image to obtain a target material image;
the three-dimensional reconstruction module is configured to execute three-dimensional reconstruction according to the target material image to obtain a target three-dimensional face model;
the solution optimization module is configured to perform solution optimization on the target three-dimensional face model to obtain expression characteristic information corresponding to the target material image;
the training sample construction module is configured to execute construction of a training sample according to the target material image and expression characteristic information corresponding to the target material image;
and the training module is configured to execute training of a preset neural network based on the training sample to obtain a facial expression extraction model.
In an exemplary embodiment, the preset neural network includes an expression correction network, and the training module includes:
the expression prediction unit is configured to input target material images in the training samples into an expression prediction network to obtain expression prediction information;
the corrected expression information prediction unit is configured to input the expression prediction information into the expression correction network to obtain corrected expression information;
a correction loss calculation unit configured to perform obtaining a correction loss according to a difference value between the corrected expression information and expression characteristic information in the training sample;
an expression correction network training unit configured to perform training of the expression correction network according to the correction loss;
and a facial expression extraction model determination unit configured to determine the expression prediction network and the trained expression correction network as the facial expression extraction model.
In an exemplary embodiment, the target material image acquisition module includes:
a reference face image determination unit configured to perform determination of a reference face image in at least one of the source face images; the reference face image is the source face image with the minimum difference between the facial expression and the non-expression state;
the expression change parameter acquisition unit is configured to execute the process of inputting each source face image into a first-order motion model to extract expression change parameters, so as to obtain expression change parameters corresponding to each source face image; the expression change parameters represent the change quantity of the expression description parameters of each source facial image relative to the expression description parameters of the reference facial image;
the expression description parameter acquisition unit is configured to input the two-dimensional face material image into the first-order motion model for expression parameter extraction processing, so as to obtain expression description parameters of the two-dimensional face material image;
the target material expression parameter determining unit is configured to obtain target material expression parameters according to the expression change parameters and the expression description parameters of the two-dimensional face material image;
and a target material image generation unit configured to perform generation of the target material image according to the target material expression parameter and the target face material image.
In an exemplary embodiment, the expression change parameters include a key point position change parameter and a motion state change parameter, and the expression description parameters include a key point position parameter and a motion state description parameter; the target material expression parameter determining unit is configured to obtain the key point position parameter of the target material according to the key point position change parameter and the key point position parameter of the two-dimensional face material image; obtain the motion state description parameter of the target material according to the motion state change parameter and the motion state description parameter of the two-dimensional face material image; and obtain the target material expression parameters according to the key point position parameter of the target material and the motion state description parameter of the target material.
In an exemplary embodiment, the solution optimization module is configured to acquire an energy item corresponding to each key point in the target three-dimensional face model, where the energy item takes the expression feature information as an independent variable; and calculate the energy items corresponding to the key points with the goal of minimizing the sum of the energy items corresponding to the key points, to obtain the expression feature information corresponding to the target material image.
According to a fourth aspect of the embodiments of the present disclosure, there is provided a face image generating apparatus, including:
an image material acquisition module configured to perform acquisition of a reference face image and a three-dimensional face material;
the expression characteristic information acquisition module is configured to input the reference face image into a face expression extraction model to obtain expression characteristic information of the reference face image;
the three-dimensional mapping face model construction module is configured to perform three-dimensional reconstruction according to the three-dimensional face material and the expression feature information of the reference face image to obtain a three-dimensional mapping face model;
the target face image output module is configured to project the three-dimensional mapping face model to obtain a target face image;
The facial expression extraction model is obtained according to the facial expression extraction model generation method of any one of the first aspect.
In an exemplary embodiment, the expression feature information obtaining module is configured to perform an expression prediction network that inputs the reference facial image into the facial expression extraction model, so as to obtain expression prediction information of the reference facial image; and inputting the expression prediction information of the reference face image into an expression correction network of the facial expression extraction model to obtain corrected expression information of the reference face image, and determining the corrected expression information of the reference face image as expression characteristic information of the reference face image.
According to a fifth aspect of embodiments of the present disclosure, there is provided an electronic device, comprising:
a processor;
a memory for storing the processor-executable instructions;
wherein the processor is configured to execute the instructions to implement the facial expression extraction model generation method according to any one of the first aspect or the facial image generation method according to any one of the second aspect.
According to a sixth aspect of embodiments of the present disclosure, there is provided a computer readable storage medium; when instructions in the computer readable storage medium are executed by a processor of an electronic device, the electronic device is enabled to perform the facial expression extraction model generation method as set forth in any one of the first aspects or the facial image generation method as set forth in any one of the second aspects.
According to a seventh aspect of embodiments of the present disclosure, there is provided a computer program product comprising a computer program which, when executed by a processor, implements the facial expression extraction model generation method as described in any one of the first aspects or the facial image generation method as described in any one of the second aspects.
The technical scheme provided by the embodiment of the disclosure at least brings the following beneficial effects:
according to the method and the device, three-dimensional expression feature information corresponding to a face image can be obtained in real time through the generated expression extraction model, without complicated three-dimensional modeling and solution optimization processes; this improves the speed of obtaining the expression feature information and obviously improves its accuracy. Moreover, a target face image can be generated based on the obtained expression extraction model; the generated target face image restores the expression in the reference face image with high accuracy and is generated quickly, achieving the effect of performing expression migration in real time.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the disclosure.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the disclosure and together with the description, serve to explain the principles of the disclosure and do not constitute an undue limitation on the disclosure.
FIG. 1 is an application environment diagram illustrating a face image generation method according to an exemplary embodiment;
FIG. 2 is a flowchart illustrating a method of facial expression extraction model generation, according to an exemplary embodiment;
FIG. 3 is a flowchart illustrating a method of acquiring a source face image, according to an exemplary embodiment;
fig. 4 is a flowchart showing obtaining a target material image by migrating a facial expression in a source face image according to another exemplary embodiment;
FIG. 5 is a flowchart illustrating the calculation of target material expression parameters according to an exemplary embodiment;
fig. 6 is a flowchart illustrating calculation of expression feature information corresponding to a target material image by a solution optimization method according to an exemplary embodiment;
FIG. 7 is a flowchart illustrating a facial expression extraction model obtained by training a preset neural network, according to an exemplary embodiment;
FIG. 8 is a flowchart illustrating a face image generation method according to an exemplary embodiment;
FIG. 9 is a block diagram illustrating a facial expression extraction model generation apparatus according to one exemplary embodiment;
FIG. 10 is a block diagram of a face image generation apparatus according to an exemplary embodiment;
fig. 11 is a block diagram of an electronic device, according to an example embodiment.
Detailed Description
In order to enable those skilled in the art to better understand the technical solutions of the present disclosure, the technical solutions of the embodiments of the present disclosure will be clearly and completely described below with reference to the accompanying drawings.
It should be noted that the terms "first," "second," and the like in the description and claims of the present disclosure and in the foregoing figures are used for distinguishing between similar objects and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used may be interchanged where appropriate such that the embodiments of the disclosure described herein may be capable of operation in sequences other than those illustrated or described herein. The implementations described in the following exemplary examples are not representative of all implementations consistent with the present disclosure. Rather, they are merely examples of apparatus and methods consistent with some aspects of the present disclosure as detailed in the accompanying claims.
At present, based on three-dimensional reconstruction technology, the expression of a face can be mapped to a three-dimensional face material, so that the three-dimensional face material makes the same expression as the real face; examples include Apple's Animoji and Kuaishou's Emoji magic-expression feature. In applications that migrate a facial expression to a three-dimensional face material, expression feature information needs to be extracted in order to restore facial expression actions such as opening the mouth or tilting the mouth; however, the extraction accuracy of the related art, which is generally based on deep learning, is not high.
In order to extract more accurate expression feature information from a facial image, the disclosure provides a facial expression extraction model generation method, and, based on the facial expression extraction model generated by this method, further provides a facial image generation method.
Referring to fig. 1, an application environment diagram of a face image generation method according to an exemplary embodiment is shown, where the application environment may include a terminal 110 and a server 120.
The terminal 110 may be, but is not limited to, a smart phone, a tablet computer, a notebook computer, a desktop computer, etc. The terminal 110 may have a client running therein that is served by the server 120 in the background. The client may select a three-dimensional face material from the three-dimensional face material library provided by the server 120, acquire a reference face image, send the acquired reference face image and the three-dimensional face material to the server 120, and acquire and display a target face image returned by the server 120. For example, the client may capture a reference face image of a laugh of a user, select a three-dimensional face material of a person in the three-dimensional face material library, send the reference face image of the laugh of the user and the three-dimensional face material of the person to the server 120, obtain a target face image of the laugh of the person returned by the server 120, and display the target face image.
The server 120 may synthesize a target face image according to the reference face image and the three-dimensional face material sent by the client, and return the target face image to the client. The server 120 may also be used to train the facial expression extraction model.
The server 120 shown in fig. 1 may be an independent physical server, or may be a server cluster or a distributed system formed by a plurality of physical servers, and the terminal 110 and the server 120 may be connected through a wired network or a wireless network.
Fig. 2 is a flowchart illustrating a facial expression extraction model generation method according to an exemplary embodiment. As shown in fig. 2, the method is described as applied to the server 120 shown in fig. 1 and includes the following steps.
In step S10, a three-dimensional face material image and a source face image are acquired.
In the embodiment of the disclosure, the three-dimensional face material image can be selected from a preset three-dimensional face material library; the content of the three-dimensional face material library is not limited, as long as each three-dimensional face material image contains a face.
In the embodiment of the present disclosure, the source face image may be derived from a video file or may be captured by an image capturing device, and the present disclosure is not limited to the source of the source face image. The number of source face images is not limited in this disclosure, and may be one or more.
In some embodiments, the at least one source face image may be obtained from a video file. As shown in fig. 3, fig. 3 is a flowchart illustrating a method of acquiring a source face image according to an exemplary embodiment, including:
in step S11, a video file is acquired.
In step S12, the video file is parsed to obtain an image frame sequence.
In step S13, image frames including a face in the image frame sequence are sequentially extracted, so as to obtain at least one source face image.
According to the embodiment of the disclosure, at least one source face image can be obtained by parsing an existing video file; such abundant sources of source face images help optimize the quality of the training samples and speed up training sample acquisition, thereby improving the training speed.
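As an illustration, the frame extraction of steps S11 to S13 might be implemented as in the following minimal Python sketch; the use of OpenCV and its Haar cascade face detector is an assumption, since the disclosure does not prescribe a particular face-detection method:

    import cv2

    def extract_source_face_images(video_path):
        """Parse a video file into frames and keep the frames that contain a face."""
        detector = cv2.CascadeClassifier(
            cv2.data.haarcascades + "haarcascade_frontalface_default.xml")
        capture = cv2.VideoCapture(video_path)   # step S11: acquire the video file
        source_face_images = []
        while True:
            ok, frame = capture.read()           # step S12: parse into an image frame sequence
            if not ok:
                break
            gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
            faces = detector.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
            if len(faces) > 0:                   # step S13: keep frames that include a face
                source_face_images.append(frame)
        capture.release()
        return source_face_images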
In step S20, the three-dimensional face material image is projected to obtain a two-dimensional face material image corresponding to the three-dimensional face material image.
Specifically, this step comprises carrying out a plane projection of the three-dimensional face material image to obtain the two-dimensional face material image corresponding to it. For the method of planar projection, reference may be made to the related art, which is not repeated here.
In step S30, the facial expression in the source face image is migrated to the two-dimensional face material image, so as to obtain a target material image.
In the embodiment of the disclosure, the source face image and the two-dimensional face material image may be input into a first-order motion model, and the migration of the expression may be implemented based on the first-order motion model, so as to obtain the target material image.
A first-order motion model (First Order Motion Model, FOMM) in the present disclosure may be used to perform expression migration. Illustratively, a source face image and a two-dimensional face material image are input to the first-order motion model, which migrates the expression of the source face image to the two-dimensional face material image to obtain a target material image having the expression of the source face image. The idea of the first-order motion model is to build a complex motion model using a set of self-learned key points and local affine transformations, so as to solve the problem of poor generation quality of traditional models under large dynamic changes of the target pose. In an embodiment of the present disclosure, the first-order motion model may include an expression feature extraction module and an image generation module; the expression feature extraction module may extract the expression features of the faces in the input pictures, and the image generation module may realize expression migration using the input pictures.
For example, a source face image showing a grimace and a face material of a character are input into the first-order motion model. The expression feature extraction module in the first-order motion model extracts first expression description parameters corresponding to the grimace in the source face image and second expression description parameters of the face material; the image generation module in the first-order motion model obtains, from the first and second expression description parameters, third expression description parameters for the face material making the grimace, and generates the target face material according to the third expression description parameters and the face material.
The expression description parameters output by the first-order motion model in the embodiment of the disclosure include key point position parameters and motion state description parameters. A motion state description parameter may be expressed by a Jacobian matrix, which may be a matrix of first-order expansion coefficients of the variation within a preset range around a key point.
A source face image in an embodiment of the present disclosure may be denoted s_i (i is a positive integer less than or equal to N, where N is the total number of source face images), and the corresponding target material image may be denoted d_j (j is a positive integer less than or equal to N); if i = j, then s_i and d_j have the same expression. In the embodiment of the disclosure, expression migration is performed through a first-order motion model; for the structure, parameters and training method of the first-order motion model, reference may be made to the related art, which is not described here again.
For example, if there are three source face images with crying, facial-distortion and smiling expressions respectively, three corresponding target material images are obtained, having the crying, facial-distortion and smiling expressions.
In some embodiments, as shown in fig. 4, fig. 4 is a flowchart illustrating migration of a facial expression in the source face image to the two-dimensional face material image to obtain a target material image according to an exemplary embodiment, where the flowchart includes:
in step S31, a reference face image is determined from at least one of the source face images; the reference face image is the source face image with the least difference between the facial expression and the non-expression state.
The embodiment of the disclosure is not limited to a specific selection manner of the reference face image, and a source face image closest to the non-expression face image may be selected from the at least one source face image as the reference face image. For example, three source face images respectively have cry, facial distortion and smiling expression, and the smiling expression is closest to the non-expression, so that the source face image with the smiling expression can be determined as the reference face image. For another example, if there are 4 source face images, which are cry, laugh, frowning and no expression, the source face image without expression is used as the reference face image.
In some embodiments, the reference face image may be automatically determined based on the first-order motion model, the first-order motion model obtains expression description parameters of each of the at least one source face image, determines a source face image with the smallest difference from the non-expression state according to the expression description parameters of each of the at least one source face image, and determines the source face image as the reference face image.
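For illustration only, the selection in this embodiment could look like the sketch below; the neutrality metric (distance of the key point Jacobians from the identity matrix) is an assumption, since the disclosure does not fix a specific measure of the difference from the non-expression state:

    import numpy as np

    def pick_reference_index(keypoints, jacobians):
        """keypoints: (N, K, 2) key point positions; jacobians: (N, K, 2, 2)
        motion state description parameters, one set per source face image,
        as output by the first-order motion model."""
        # Heuristic: the most neutral frame is the one whose Jacobians deviate
        # least from the identity, i.e. the one with the least local deformation.
        deviation = np.linalg.norm(jacobians - np.eye(2), axis=(2, 3)).sum(axis=1)
        return int(np.argmin(deviation))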
In step S32, inputting each of the source face images into a first-order motion model to perform expression change parameter extraction processing, so as to obtain expression change parameters corresponding to each of the source face images; the expression change parameter characterizes the change amount of the expression description parameter of each source face image relative to the expression description parameter of the reference face image.
For each source face image, the first-order motion model can output expression description parameters of the source face image, and can determine expression change parameters according to the expression description parameters of the source face image and the expression description parameters of the reference face image. In the embodiment of the disclosure, the expression description parameters include a key point position parameter and a motion state description parameter, and correspondingly, the expression change parameters include a key point position change parameter and a motion state change parameter, wherein the motion state change parameter can also be expressed by using a jacobian matrix.
For a source face image s_i, the corresponding expression description parameters can be obtained; specifically, they include a key point position parameter kp_i and a motion state description parameter jacob_i. The key point position parameter of the reference face image may be denoted kp_ref, and its motion state description parameter may be denoted jacob_ref.
The expression change parameters of the source face image with respect to the reference face image can be expressed as Δkp_i and Δjacob_i; specifically, Δkp_i = kp_i - kp_ref and Δjacob_i = jacob_i * jacob_ref^(-1).
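In code form, these two change parameters can be computed as below; a small numpy sketch where the kp arrays have shape (K, 2) and the jacob arrays have shape (K, 2, 2), matching the formulas Δkp_i = kp_i - kp_ref and Δjacob_i = jacob_i * jacob_ref^(-1):

    import numpy as np

    def expression_change_parameters(kp_i, jacob_i, kp_ref, jacob_ref):
        """kp_*: (K, 2) key point positions; jacob_*: (K, 2, 2) per-key-point
        motion state Jacobians from the first-order motion model."""
        delta_kp = kp_i - kp_ref                          # Δkp_i = kp_i - kp_ref
        delta_jacob = jacob_i @ np.linalg.inv(jacob_ref)  # Δjacob_i = jacob_i * jacob_ref^(-1)
        return delta_kp, delta_jacob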
In step S33, the two-dimensional face material image is input into the first-order motion model to perform expression parameter extraction processing, so as to obtain expression description parameters of the two-dimensional face material image.
After the two-dimensional face material image is input into the first-order motion model, the first-order motion model can obtain the expression description parameters of the two-dimensional face material image, including a key point position parameter kp_m and a motion state description parameter jacob_m, where m denotes the two-dimensional face material.
In step S34, according to the expression change parameter and the expression description parameter of the two-dimensional face material image, a target material expression parameter is obtained.
In some embodiments, as shown in fig. 5, fig. 5 is a flowchart illustrating obtaining expression parameters of a target material according to the expression change parameters and the expression description parameters of the two-dimensional face material image according to an exemplary embodiment, where the flowchart includes:
In step S341, the key point position parameter of the target material is obtained according to the key point position change parameter and the key point position parameter of the two-dimensional face material image.
In some embodiments, the key point position parameter of the target material may be obtained according to the formula kp_i^m = kp_m + s * Δkp_i, where kp_i^m denotes the key point position parameter of the target material corresponding to the source face image, kp_m is the key point position parameter of the two-dimensional face material image, Δkp_i denotes the key point position change parameter of the source face image relative to the reference face image, and s denotes a proportionality constant determined according to the face area of the two-dimensional face material.
In step S342, the motion state description parameter of the target material is obtained according to the motion state variation parameter and the motion state description parameter of the two-dimensional face material image.
In some embodiments, the motion state description parameter of the target material may be obtained according to the formula jacob_i^m = jacob_m * Δjacob_i, where jacob_i^m denotes the motion state description parameter of the target material image corresponding to the source face image, jacob_m is the motion state description parameter of the two-dimensional face material, and Δjacob_i denotes the motion state change parameter of the source face image relative to the reference face image.
In step S343, the expression parameters of the target material are obtained according to the key point position parameters of the target material and the motion state description parameters of the target material.
In some embodiments, the target material image may be obtained according to the formula I_i^m = M_1(kp_ref, kp_i^m, jacob_i^m), where I_i^m denotes the target material image corresponding to the source face image s_i and M_1 denotes the target material image generation method. The present disclosure does not limit M_1; it may be implemented by a trained first-order motion model. kp_ref denotes the key point position parameter among the expression description parameters of the reference source face image, which can be obtained based on the first-order motion model.
According to the embodiment of the disclosure, the key point position parameters and the motion state description parameters of the related images output by the first-order motion model are fully utilized, and the target material expression parameters for generating the target material image are obtained through calculation; this improves the ability of the target material expression parameters to express the source face image and finally improves the fidelity with which the target material image restores the expression of the source face image.
In step S35, the target material image is generated based on the target material expression parameter and the target face material image.
According to the embodiment of the disclosure, the target material image with the same expression as the source face image can be generated based on the first-order motion model, and the generation quality of the target material image can be ensured.
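Putting steps S341 to S343 and step S35 together, a hedged sketch follows; the generator call fomm_generate stands in for the image generation module M_1 of a trained first-order motion model and is a hypothetical name, as is the proportionality constant s passed in by the caller:

    import numpy as np

    def target_material_parameters(kp_m, jacob_m, delta_kp, delta_jacob, s):
        kp_target = kp_m + s * delta_kp        # step S341: kp_i^m = kp_m + s * Δkp_i
        jacob_target = jacob_m @ delta_jacob   # step S342: jacob_i^m = jacob_m * Δjacob_i
        return kp_target, jacob_target         # step S343: the target material expression parameters

    # step S35 (hypothetical generator API of a trained first-order motion model):
    # target_image = fomm_generate(material_image, kp_ref, kp_target, jacob_target)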
In step S40, three-dimensional reconstruction is performed according to the target material image, so as to obtain a target three-dimensional face model.
In the embodiment of the disclosure, the specific method for performing three-dimensional reconstruction is not limited; for example, three-dimensional reconstruction may be performed based on a 3DMM (3D Morphable Model, three-dimensional face variability model). In the field of computer vision, reconstructing the shape and texture of a three-dimensional face from a single face picture is an important research topic, and 3DMM-based methods can successfully reconstruct three-dimensional faces from a single face picture.
In step S50, the target three-dimensional face model is subjected to a solution optimization to obtain expression feature information corresponding to the target material image.
In the embodiment of the disclosure, by performing solution optimization on the target three-dimensional face model, three-dimensional expression feature information corresponding to the target material can be obtained. For example, if three target material images have crying, facial-distortion and smiling expressions respectively, three groups of expression feature information are obtained through the corresponding solution optimization, containing the three-dimensional expression feature information of crying, facial distortion and smiling respectively.
In some embodiments, the energy item of the target three-dimensional face model may be obtained, and the expression feature information corresponding to the target material image may be obtained by performing a minimum solution optimization on the energy item of the target three-dimensional face model. Specifically, as shown in fig. 6, fig. 6 is a flowchart illustrating that the target three-dimensional face model is subjected to solution optimization to obtain expression feature information corresponding to the target material image according to an exemplary embodiment, where the flowchart includes:
in step S51, an energy item corresponding to each key point in the target three-dimensional face model is obtained, where the energy item uses expression feature information as an independent variable.
Specifically, the energy item corresponding to each key point may be represented as E_k = || f * R * (C_r ×_2 w_id^T ×_3 w_exp^T) + t - s_k ||^2, where f and R represent a scale variable and a perspective transformation matrix respectively, t is a translation, s_k is the key point coordinate, C_r is a bilinear face model, w_id^T is the individual face coefficient, which is related to the two-dimensional face material image, and w_exp^T represents the expression feature information.
In step S52, with the goal of minimizing the sum of the energy items corresponding to the key points, the energy items are solved to obtain the expression feature information corresponding to the target material image.
The embodiment of the disclosure is not limited to a specific solving method; for example, a nonlinear optimization method, a least squares optimization method and the like can be used for the solution optimization, and the expression feature information corresponding to the target material image is obtained through the solution optimization.
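As one possible realization of the least squares variant, the sketch below fits the expression coefficients with scipy; the bilinear contraction and the tensor shape of C_r are assumptions consistent with the energy item above, and f, R, t and w_id are treated as already estimated:

    import numpy as np
    from scipy.optimize import least_squares

    def keypoint_residuals(w_exp, f, R, t, C_r, w_id, s_obs):
        """C_r: (3K, n_id, n_exp) bilinear face model restricted to the key
        points; w_id: (n_id,) identity coefficients; s_obs: (K, 3) observed
        key point coordinates of the target three-dimensional face model."""
        shape = np.tensordot(C_r, w_id, axes=([1], [0]))     # contract the identity mode
        shape = np.tensordot(shape, w_exp, axes=([1], [0]))  # contract the expression mode
        verts = shape.reshape(-1, 3)
        pred = f * (verts @ R.T) + t                         # inner term of each E_k
        return (pred - s_obs).ravel()

    # Minimizing the sum of squared residuals equals minimizing the sum of E_k:
    # w_exp = least_squares(keypoint_residuals, np.zeros(n_exp),
    #                       args=(f, R, t, C_r, w_id, s_obs)).x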
In the embodiment of the disclosure, by performing solution optimization on the target three-dimensional facial model, three-dimensional expression characteristic information of the target material image can be obtained, the three-dimensional expression characteristic information contains richer expression details, a training sample is constructed based on the expression characteristic information, and the accuracy of the facial expression extraction model obtained by training can be improved.
In step S60, a training sample is constructed according to the target material image and expression feature information corresponding to the target material image.
In step S70, training a preset neural network based on the training samples to obtain a facial expression extraction model.
In some embodiments, a preset neural network may be trained, that is, an image may be input to the trained preset neural network, and three-dimensional expression feature information of the image may be directly output. In order to obtain the expression extraction model through training, a training sample for training the expression extraction model needs to be constructed. In this embodiment of the present disclosure, the training samples may include the target material image and expression feature information corresponding to the target material image, and a new training sample may be obtained by changing a source face image and/or a three-dimensional face material image, so as to generate a training sample set including a large number of the training samples, and training the preset neural network according to the training sample set, so as to obtain the expression extraction model. The structure of the neural network is not limited in this disclosure.
In other embodiments, an expression extraction network may be trained; that is, for an input image, three-dimensional expression prediction information is first obtained based on an existing expression prediction network in the expression extraction network, and the expression prediction information is then corrected based on the preset neural network in the expression extraction network, so as to obtain three-dimensional expression feature information with higher accuracy. In this case, the preset neural network is actually used for expression correction and is referred to as an expression correction network in the embodiment of the disclosure, and the existing expression prediction network and the trained correction network may together be determined as the expression extraction model. To obtain the expression extraction model through training, a training sample for training the expression extraction model needs to be constructed. In this case, please refer to fig. 7, which is a flowchart illustrating training a preset neural network based on the training sample to obtain a facial expression extraction model according to an exemplary embodiment, including:
in step S71, the target material image in the training sample is input into an expression prediction network to obtain expression prediction information.
In the embodiment of the disclosure, a new training sample can be obtained by changing the source face image and/or the three-dimensional face material image, so as to form a training sample set, and the expression correction network can be trained according to each training sample in the set.
In step S72, the expression prediction information is input into the expression correction network, and corrected expression information is obtained.
In step S73, a correction loss is obtained according to the difference between the corrected expression information and the expression characteristic information in the training sample.
In step S74, the expression correction network is trained according to the correction loss.
Specifically, parameters of the expression correction network may be adjusted according to the correction loss feedback until a preset training stop condition is reached. In one embodiment, if the correction loss is greater than or equal to a preset training threshold, the parameters of the network to be trained are adjusted based on the correction loss, and if the loss value is less than the training threshold, training is completed. Embodiments of the present disclosure are not limited to a particular training threshold, and may be, for example, 0.1 or 0.5. In another possible embodiment, the number of parameter adjustments of the expression correction network may be counted, and if the number of parameter adjustments is greater than a preset number threshold, it may be determined that the preset training stop condition is reached, and the embodiment of the present disclosure does not limit a specific value of the preset number threshold.
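A compact training loop consistent with steps S72 to S74 and the stop conditions above might look like the following PyTorch sketch; the network classes, the mean squared error as the difference measure, and the optimizer settings are assumptions not fixed by the disclosure:

    import torch
    import torch.nn as nn

    def train_correction_network(prediction_net, correction_net, loader,
                                 loss_threshold=0.1, max_updates=10000):
        prediction_net.eval()                  # the existing prediction network stays frozen
        optimizer = torch.optim.Adam(correction_net.parameters(), lr=1e-4)
        criterion = nn.MSELoss()
        updates = 0
        for image, expression_features in loader:   # one training sample per step
            with torch.no_grad():
                prediction = prediction_net(image)      # step S71: expression prediction information
            corrected = correction_net(prediction)      # step S72: corrected expression information
            loss = criterion(corrected, expression_features)  # step S73: correction loss
            if loss.item() < loss_threshold or updates >= max_updates:
                break                                   # preset training stop condition reached
            optimizer.zero_grad()
            loss.backward()                             # step S74: adjust parameters by feedback
            optimizer.step()
            updates += 1
        return correction_net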
In step S75, the expression prediction network and the trained expression correction network are determined as the facial expression extraction model.
Specifically, the existing expression prediction network and the expression correction network obtained through the training in steps S71 to S75 are sequentially connected in series to obtain the facial expression extraction model. The structures of the expression prediction network and the expression correction network are not limited in the present disclosure.
In the embodiment of the disclosure, by training the expression correction network, the output result of the existing expression prediction network can be corrected, so that the finally output three-dimensional expression feature information has higher precision. Three-dimensional expression feature information with higher precision is obtained by reusing the existing expression prediction network and correcting its output result, which improves the utilization rate of the existing expression prediction model and reduces the acquisition cost of high-precision three-dimensional expression feature information.
According to the embodiment of the disclosure, by training the expression extraction model, three-dimensional expression feature information corresponding to a two-dimensional face image can be obtained in real time without complicated three-dimensional modeling and solution optimization processes; this improves the speed of obtaining the expression feature information and obviously improves its accuracy.
Based on the expression extraction model obtained by training, the disclosure further shows a face image generating method, as shown in fig. 8, fig. 8 is a flowchart of the face image generating method according to an exemplary embodiment, where the method includes:
In step S10-1, a reference face image and a three-dimensional face material are acquired.
In step S20-1, the reference facial image is input into a facial expression extraction model to obtain expression characteristic information of the reference facial image.
In one embodiment, the reference facial image may be directly input into the facial expression extraction model, to obtain the expression feature information of the output reference facial image.
In another embodiment, the reference facial image may be input into the expression prediction network of the expression extraction model to obtain expression prediction information, and the expression prediction information may then be input into the expression correction network of the expression extraction model to obtain the expression feature information. High-accuracy expression feature information is thus obtained by reusing the existing expression prediction network, which improves its utilization rate while avoiding a major transformation of the existing network.
In step S30-1, three-dimensional reconstruction is performed according to the three-dimensional face material and the expression characteristic information of the reference face image, so as to obtain a three-dimensional mapping face model.
The expression feature information output by the facial expression extraction model in the embodiment of the disclosure is three-dimensional expression feature information; the three-dimensional mapping face model can be obtained by performing three-dimensional reconstruction according to the three-dimensional face material and the expression feature information, for example based on a 3DMM (3D Morphable Model, three-dimensional face variability model).
In step S40-1, the three-dimensional mapping face model is projected to obtain a target face image.
According to the embodiment of the disclosure, the expression in the reference face image can be migrated to the generated target face image, which is generated using the three-dimensional face material as a reference; for example, if the reference face image is user A crying and the three-dimensional face material is the face of character B, the target face image is character B crying.
According to the embodiment of the disclosure, the reference face image is input into the trained facial expression extraction model to obtain high-precision three-dimensional expression feature information; three-dimensional reconstruction is performed based on the expression feature information and the three-dimensional face material, and a two-dimensional target face image is obtained from the reconstruction result. The target face image can thus highly restore the expression in the reference face image, and the target face image changes its expression in real time with the expression change of the reference face image, achieving high fidelity.
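End to end, steps S10-1 to S40-1 can be sketched as below; reconstruct_3dmm and project_to_image are hypothetical helper names standing for the three-dimensional reconstruction and projection operations referenced above, and expression_model is the trained facial expression extraction model's forward pass:

    def generate_target_face_image(reference_image, face_material_3d,
                                   expression_model, reconstruct_3dmm, project_to_image):
        # step S20-1: high-precision three-dimensional expression feature information
        expression_features = expression_model(reference_image)
        # step S30-1: rebuild the material with the reference expression (e.g. 3DMM-based)
        mapped_face_model = reconstruct_3dmm(face_material_3d, expression_features)
        # step S40-1: project the three-dimensional mapping face model to a 2D image
        return project_to_image(mapped_face_model)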
Fig. 9 is a block diagram illustrating a facial expression extraction model generation apparatus according to an exemplary embodiment. Referring to fig. 9, the apparatus includes:
an image acquisition module 10 configured to perform acquisition of a three-dimensional face material image and a source face image;
a two-dimensional face material image acquisition module 20 configured to perform projection of the three-dimensional face material image to obtain a two-dimensional face material image corresponding to the three-dimensional face material image;
a target material image acquisition module 30 configured to perform migration of a facial expression in the source face image to the two-dimensional face material image to obtain a target material image;
a three-dimensional reconstruction module 40 configured to perform three-dimensional reconstruction according to the target material image, so as to obtain a target three-dimensional face model;
the solution optimization module 50 is configured to perform solution optimization on the target three-dimensional face model to obtain expression characteristic information corresponding to the target material image;
a training sample construction module 60 configured to perform construction of a training sample from the target material image and expression feature information corresponding to the target material image;
the training module 70 is configured to perform training on a preset neural network based on the training samples, so as to obtain a facial expression extraction model.
In an exemplary embodiment, the preset neural network includes an expression correction network, and the training module includes:
the expression prediction unit is configured to input target material images in the training samples into an expression prediction network to obtain expression prediction information;
A corrected expression information prediction unit configured to perform inputting the expression prediction information into the expression correction network to obtain corrected expression information;
a correction loss calculation unit configured to obtain a correction loss according to a difference between the corrected expression information and the expression feature information in the training sample;
an expression correction network training unit configured to perform training of the expression correction network according to the correction loss;
and a facial expression extraction model determination unit configured to perform determination of the expression prediction network and the trained expression correction network as the facial expression extraction model.
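For illustration, a minimal training loop over these units might look as follows; the L1 form of the correction loss and the `train_loader` yielding (target material image, expression feature information) pairs are assumptions, since the embodiment only specifies that the loss is derived from the difference between the corrected expression information and the labels:

```python
import torch
import torch.nn.functional as F

# `prediction_net` is the existing expression prediction network (assumed given);
# only the correction network's parameters are optimized.
model = FacialExpressionExtractor(prediction_net)
optimizer = torch.optim.Adam(model.correction_net.parameters(), lr=1e-4)

for images, exp_labels in train_loader:
    exp_corrected = model(images)                # predict, then correct
    loss = F.l1_loss(exp_corrected, exp_labels)  # correction loss from the difference
    optimizer.zero_grad()
    loss.backward()                              # gradients reach only the correction net
    optimizer.step()
```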
In an exemplary embodiment, the target material image obtaining module includes:
a reference face image determining unit configured to determine a reference face image among at least one of the source face images, the reference face image being the source face image with the least difference between its facial expression and a non-expression state;
an expression change parameter acquisition unit configured to input each source face image into a first-order motion model for expression change parameter extraction, to obtain the expression change parameters corresponding to each source face image, the expression change parameters representing the amount of change of the expression description parameters of each source face image relative to the expression description parameters of the reference face image;
The expression description parameter acquisition unit is configured to input the two-dimensional face material image into the first-order motion model for expression parameter extraction processing, so as to obtain expression description parameters of the two-dimensional face material image;
a target material expression parameter determining unit configured to obtain the target material expression parameters according to the expression change parameters and the expression description parameters of the two-dimensional face material image;
and a target material image generation unit configured to generate the target material image based on the target material expression parameters and the two-dimensional face material image.
In an exemplary embodiment, the expression change parameters include a key point position change parameter and a motion state change parameter, and the expression description parameters include key point position parameters and motion state description parameters. The target material expression parameter determining unit is configured to: obtain the key point position parameters of the target material according to the key point position change parameters and the key point position parameters of the two-dimensional face material image; obtain the motion state description parameters of the target material according to the motion state change parameters and the motion state description parameters of the two-dimensional face material image; and obtain the target material expression parameters according to the key point position parameters of the target material and the motion state description parameters of the target material, as sketched below.
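In the sketch, the motion state is represented by per-keypoint local Jacobians, following the first-order motion model literature; this representation, and composing the change by matrix multiplication, are assumptions of the example:

```python
import numpy as np

def target_material_params(kp_material: np.ndarray,   # (K, 2) material keypoints
                           jac_material: np.ndarray,  # (K, 2, 2) material motion state
                           delta_kp: np.ndarray,      # (K, 2) keypoint change parameters
                           delta_jac: np.ndarray):    # (K, 2, 2) motion-state changes
    """Apply the source-derived change parameters to the material's parameters."""
    kp_target = kp_material + delta_kp      # shift keypoints by the observed change
    jac_target = delta_jac @ jac_material   # compose the motion-state change
    return kp_target, jac_target
```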
In an exemplary embodiment, the solution optimization module is configured to obtain an energy term corresponding to each key point in the target three-dimensional face model, where each energy term takes the expression feature information as its independent variable, and to solve the energy terms with the objective of minimizing the sum of the energy terms over all key points, thereby obtaining the expression feature information corresponding to the target material image.
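By way of example, one assumed form of such an energy is the squared distance between the model's projected keypoints and the keypoints detected in the target material image, solved as a least-squares problem over the expression coefficients:

```python
import numpy as np
from scipy.optimize import least_squares

def solve_expression(mean_kp: np.ndarray,       # (2K,) projected mean-face keypoints
                     exp_basis_kp: np.ndarray,  # (2K, K_exp) expression basis at keypoints
                     observed_kp: np.ndarray,   # (2K,) keypoints in the target material image
                     k_exp: int) -> np.ndarray:
    """Minimize the summed per-keypoint energy over the expression coefficients."""
    def residuals(exp_coeff):
        return mean_kp + exp_basis_kp @ exp_coeff - observed_kp
    result = least_squares(residuals, x0=np.zeros(k_exp))
    return result.x  # expression feature information for the training sample
```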
Fig. 10 is a block diagram illustrating a face image generation apparatus according to an exemplary embodiment. Referring to fig. 10, the apparatus includes:
an image material acquisition module 10-1 configured to perform acquisition of a reference face image and a three-dimensional face material;
the expression feature information obtaining module 20-1 is configured to perform inputting the reference face image into a facial expression extraction model to obtain expression feature information of the reference face image;
a three-dimensional mapping face model construction module 30-1 configured to perform three-dimensional reconstruction according to the three-dimensional face material and the expression feature information of the reference face image to obtain a three-dimensional mapping face model;
The target face image output module 40-1 is configured to perform projection on the three-dimensional mapping face model to obtain a target face image;
The facial expression extraction model is obtained by the facial expression extraction model generation method described in the above method embodiments.
In an exemplary embodiment, the expression feature information obtaining module is configured to input the reference face image into an expression prediction network of the facial expression extraction model to obtain expression prediction information of the reference face image; input the expression prediction information of the reference face image into an expression correction network of the facial expression extraction model to obtain corrected expression information of the reference face image; and determine the corrected expression information of the reference face image as the expression feature information of the reference face image.
The specific manner in which the various modules of the apparatuses in the above embodiments perform their operations has been described in detail in the method embodiments and will not be repeated here.
In an exemplary embodiment, there is also provided an electronic device including: a processor; and a memory for storing instructions executable by the processor; where the processor, when executing the instructions stored in the memory, is configured to implement the steps of the facial expression extraction model generation method or the face image generation method in any of the above embodiments.
The electronic device may be a terminal, a server, or a similar computing device. Taking a server as an example, Fig. 11 is a block diagram of an electronic device for the facial expression extraction model generation method or the face image generation method according to an exemplary embodiment. The electronic device 1000 may vary greatly in configuration or performance and may include one or more central processing units (CPUs) 1010 (a processor 1010 may include, but is not limited to, a processing device such as a microprocessor (MCU) or a programmable logic device (FPGA)), a memory 1030 for storing data, and one or more storage media 1020 (such as one or more mass storage devices) for storing applications 1023 or data 1022. The memory 1030 and the storage medium 1020 may be transitory or persistent storage. A program stored on the storage medium 1020 may include one or more modules, each of which may include a series of instruction operations for the electronic device. Further, the central processing unit 1010 may be configured to communicate with the storage medium 1020 and execute the series of instruction operations in the storage medium 1020 on the electronic device 1000. The electronic device 1000 may also include one or more power supplies 1060, one or more wired or wireless network interfaces 1050, one or more input/output interfaces 1040, and/or one or more operating systems 1021, such as Windows Server™, Mac OS X™, Unix™, Linux™, FreeBSD™, and the like.
The input/output interface 1040 may be used to receive or transmit data via a network. A specific example of such a network is a wireless network provided by a communication provider of the electronic device 1000. In one example, the input/output interface 1040 includes a network interface controller (NIC) that can be connected to other network devices via a base station to communicate with the Internet. In an exemplary embodiment, the input/output interface 1040 may be a radio frequency (RF) module for communicating with the Internet wirelessly.
It will be appreciated by those of ordinary skill in the art that the configuration shown in Fig. 11 is merely illustrative and is not intended to limit the configuration of the electronic device described above. For example, the electronic device 1000 may include more or fewer components than shown in Fig. 11, or have a different configuration than shown in Fig. 11.
In an exemplary embodiment, there is also provided a computer-readable storage medium storing instructions which, when executed by a processor of an electronic device, cause the electronic device to perform the facial expression extraction model generation method or the face image generation method provided in any one of the above embodiments.
In an exemplary embodiment, a computer program product is also provided, the computer program product comprising computer instructions stored in a computer readable storage medium. The processor of the electronic device reads the computer instructions from the computer-readable storage medium, and the processor executes the computer instructions, so that the electronic device performs the facial expression extraction model generation method or the facial image generation method provided in any one of the above embodiments.
Those skilled in the art will appreciate that implementing all or part of the above methods may be accomplished by a computer program stored on a non-transitory computer-readable storage medium which, when executed, may include the steps of the method embodiments described above. Any reference to memory, storage, a database, or another medium used in the embodiments provided herein may include non-volatile and/or volatile memory. Non-volatile memory can include Read-Only Memory (ROM), Programmable ROM (PROM), Electrically Programmable ROM (EPROM), Electrically Erasable Programmable ROM (EEPROM), or flash memory. Volatile memory can include Random Access Memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in many forms, such as Static RAM (SRAM), Dynamic RAM (DRAM), Synchronous DRAM (SDRAM), Double Data Rate SDRAM (DDR SDRAM), Enhanced SDRAM (ESDRAM), Synchlink DRAM (SLDRAM), Rambus Direct RAM (RDRAM), Direct Rambus Dynamic RAM (DRDRAM), and Rambus Dynamic RAM (RDRAM).
Other embodiments of the disclosure will be apparent to those skilled in the art from consideration of the specification and practice of the disclosure disclosed herein. This application is intended to cover any variations, uses, or adaptations of the disclosure following the general principles thereof and including such departures from the present disclosure as come within known or customary practice in the art to which the disclosure pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the disclosure being indicated by the following claims.
It is to be understood that the present disclosure is not limited to the precise arrangements and instrumentalities shown in the drawings, and that various modifications and changes may be effected without departing from the scope thereof. The scope of the present disclosure is limited only by the appended claims.

Claims (13)

1. A facial expression extraction model generation method, characterized by comprising the following steps:
acquiring a three-dimensional face material image and a source face image, wherein the source face image is obtained from a video file or captured by a photographing device;
projecting the three-dimensional face material image to obtain a two-dimensional face material image corresponding to the three-dimensional face material image;
migrating the facial expression in the source face image to the two-dimensional face material image to obtain a target material image;
performing three-dimensional reconstruction according to the target material image to obtain a target three-dimensional face model;
performing solution optimization on the target three-dimensional face model to obtain expression characteristic information corresponding to the target material image;
constructing a training sample according to the target material image and expression characteristic information corresponding to the target material image;
training a preset neural network based on the training sample to obtain a facial expression extraction model;
the step of transferring the facial expression in the source face image to the two-dimensional face material image to obtain a target material image comprises the following steps:
determining a reference face image in at least one of the source face images; the reference face image is the source face image with the minimum difference between the facial expression and the non-expression state;
inputting each source face image into a first-order motion model to extract expression change parameters, to obtain the expression change parameters corresponding to each source face image; wherein the expression change parameters represent the amount of change of the expression description parameters of each source face image relative to the expression description parameters of the reference face image, and the expression change parameters include a key point position change parameter and a motion state change parameter; and the expression description parameters include key point position parameters and motion state description parameters;
inputting the two-dimensional face material image into the first-order motion model for expression parameter extraction processing, to obtain expression description parameters of the two-dimensional face material image;
obtaining the key point position parameters of the target material according to the key point position change parameters and the key point position parameters of the two-dimensional face material image; obtaining the motion state description parameters of the target material according to the motion state change parameters and the motion state description parameters of the two-dimensional face material image; and obtaining the target material expression parameters according to the key point position parameters of the target material and the motion state description parameters of the target material;
and generating the target material image according to the target material expression parameters and the two-dimensional face material image.
2. The facial expression extraction model generation method according to claim 1, wherein the preset neural network includes an expression correction network, and training the preset neural network based on the training sample to obtain the facial expression extraction model includes:
inputting the target material images in the training samples into an expression prediction network to obtain expression prediction information;
inputting the expression prediction information into the expression correction network to obtain corrected expression information;
obtaining correction loss according to the difference value between the corrected expression information and the expression characteristic information in the training sample;
training the expression correction network according to the correction loss;
and determining the expression prediction network and the trained expression correction network as the facial expression extraction model.
3. The facial expression extraction model generation method according to claim 1, wherein performing solution optimization on the target three-dimensional face model to obtain the expression characteristic information corresponding to the target material image includes:
acquiring an energy term corresponding to each key point in the target three-dimensional face model, wherein the energy term takes the expression characteristic information as an independent variable;
and solving the energy terms with the objective of minimizing the sum of the energy terms corresponding to the key points, to obtain the expression characteristic information corresponding to the target material image.
4. A method for generating a face image, the method comprising:
acquiring a reference face image and a three-dimensional face material;
inputting the reference face image into a facial expression extraction model to obtain expression characteristic information of the reference face image;
carrying out three-dimensional reconstruction according to the three-dimensional face material and the expression characteristic information of the reference face image to obtain a three-dimensional mapping face model;
projecting the three-dimensional mapping face model to obtain a target face image;
the facial expression extraction model is obtained according to the facial expression extraction model generation method of any one of claims 1-3.
5. The face image generation method according to claim 4, wherein inputting the reference face image into the facial expression extraction model to obtain the expression characteristic information of the reference face image includes:
inputting the reference facial image into an expression prediction network of the facial expression extraction model to obtain expression prediction information of the reference facial image;
and inputting the expression prediction information of the reference face image into an expression correction network of the facial expression extraction model to obtain corrected expression information of the reference face image, and determining the corrected expression information of the reference face image as expression characteristic information of the reference face image.
6. A facial expression extraction model generation device, characterized by comprising:
the image acquisition module is configured to acquire a three-dimensional face material image and a source face image, wherein the source face image is obtained from a video file or captured by a photographing device;
the two-dimensional face material image acquisition module is configured to perform projection on the three-dimensional face material image to obtain a two-dimensional face material image corresponding to the three-dimensional face material image;
the target material image acquisition module is configured to perform migration of the facial expression in the source facial image to the two-dimensional facial material image to obtain a target material image;
the three-dimensional reconstruction module is configured to execute three-dimensional reconstruction according to the target material image to obtain a target three-dimensional face model;
the solution optimization module is configured to perform solution optimization on the target three-dimensional face model to obtain expression characteristic information corresponding to the target material image;
the training sample construction module is configured to execute construction of a training sample according to the target material image and expression characteristic information corresponding to the target material image;
the training module is configured to execute training of a preset neural network based on the training sample to obtain a facial expression extraction model;
wherein the migrating of the facial expression in the source face image to the two-dimensional face material image to obtain the target material image includes:
determining a reference face image in at least one of the source face images; the reference face image is the source face image with the minimum difference between the facial expression and the non-expression state;
inputting each source face image into a first-order motion model to extract expression change parameters, to obtain the expression change parameters corresponding to each source face image; wherein the expression change parameters represent the amount of change of the expression description parameters of each source face image relative to the expression description parameters of the reference face image, and the expression change parameters include a key point position change parameter and a motion state change parameter; and the expression description parameters include key point position parameters and motion state description parameters;
inputting the two-dimensional face material image into the first-order motion model for expression parameter extraction processing to obtain expression description parameters of the two-dimensional face material image;
obtaining the key point position parameters of the target material according to the key point position change parameters and the key point position parameters of the two-dimensional face material image; obtaining the motion state description parameters of the target material according to the motion state change parameters and the motion state description parameters of the two-dimensional face material image; and obtaining the target material expression parameters according to the key point position parameters of the target material and the motion state description parameters of the target material;
and generating the target material image according to the target material expression parameters and the two-dimensional face material image.
7. The facial expression extraction model generation apparatus according to claim 6, wherein the preset neural network includes an expression correction network, and the training module includes:
an expression prediction unit configured to input the target material images in the training samples into an expression prediction network to obtain expression prediction information;
a corrected expression information prediction unit configured to input the expression prediction information into the expression correction network to obtain corrected expression information;
a correction loss calculation unit configured to obtain a correction loss according to a difference between the corrected expression information and the expression characteristic information in the training sample;
an expression correction network training unit configured to perform training of the expression correction network according to the correction loss;
and a facial expression extraction model determination unit configured to perform determination of the expression prediction network and the trained expression correction network as the facial expression extraction model.
8. The facial expression extraction model generation apparatus according to claim 6, wherein the solution optimization module is configured to acquire an energy term corresponding to each key point in the target three-dimensional face model, the energy term taking the expression characteristic information as an independent variable, and to solve the energy terms with the objective of minimizing the sum of the energy terms corresponding to the key points, to obtain the expression characteristic information corresponding to the target material image.
9. A face image generation apparatus, the apparatus comprising:
an image material acquisition module configured to perform acquisition of a reference face image and a three-dimensional face material;
the expression characteristic information acquisition module is configured to input the reference face image into a face expression extraction model to obtain expression characteristic information of the reference face image;
the three-dimensional mapping face model construction module is configured to perform three-dimensional reconstruction according to the three-dimensional face material and the expression characteristic information of the reference face image to obtain a three-dimensional mapping face model;
The target face image output module is configured to perform projection on the three-dimensional mapping face model to obtain a target face image;
the facial expression extraction model is obtained according to the facial expression extraction model generation method of any one of claims 1-3.
10. The face image generation apparatus according to claim 9, wherein the expression characteristic information acquisition module is configured to input the reference face image into an expression prediction network of the facial expression extraction model to obtain expression prediction information of the reference face image; input the expression prediction information of the reference face image into an expression correction network of the facial expression extraction model to obtain corrected expression information of the reference face image; and determine the corrected expression information of the reference face image as the expression characteristic information of the reference face image.
11. An electronic device, comprising:
a processor;
a memory for storing the processor-executable instructions;
wherein the processor is configured to execute the instructions to implement the facial expression extraction model generation method of any one of claims 1 to 3 or the facial image generation method of claim 4 or 5.
12. A computer-readable storage medium storing instructions which, when executed by a processor of an electronic device, cause the electronic device to perform the facial expression extraction model generation method of any one of claims 1 to 3 or the face image generation method of claim 4 or 5.
13. A computer program product comprising a computer program, characterized in that the computer program when executed by a processor implements the facial expression extraction model generation method of any one of claims 1 to 3 or the facial image generation method of claim 4 or 5.