CN113095134A - Facial expression extraction model generation method and device, and facial image generation method and device

Facial expression extraction model generation method and device, and facial image generation method and device

Info

Publication number
CN113095134A
CN113095134A (application CN202110251948.4A)
Authority
CN
China
Prior art keywords
expression
image
facial
face
dimensional
Prior art date
Legal status
Granted
Application number
CN202110251948.4A
Other languages
Chinese (zh)
Other versions
CN113095134B (en)
Inventor
饶强 (Rao Qiang)
黄旭为 (Huang Xuwei)
张国鑫 (Zhang Guoxin)
Current Assignee
Beijing Dajia Internet Information Technology Co Ltd
Original Assignee
Beijing Dajia Internet Information Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Beijing Dajia Internet Information Technology Co Ltd
Priority to CN202110251948.4A
Publication of CN113095134A
Application granted
Publication of CN113095134B
Status: Active

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16 Human faces, e.g. facial parts, sketches or expressions
    • G06V40/161 Detection; Localisation; Normalisation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T17/00 Three dimensional [3D] modelling, e.g. data description of 3D objects
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16 Human faces, e.g. facial parts, sketches or expressions
    • G06V40/174 Facial expression recognition

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Data Mining & Analysis (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Software Systems (AREA)
  • Multimedia (AREA)
  • Oral & Maxillofacial Surgery (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Mathematical Physics (AREA)
  • Human Computer Interaction (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Computer Graphics (AREA)
  • Geometry (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Processing Or Creating Images (AREA)
  • Image Analysis (AREA)

Abstract

The disclosure relates to a facial expression extraction model generation method and device, and a facial image generation method and device. The facial expression extraction model generation method comprises: acquiring a three-dimensional face material image and a source face image; projecting the three-dimensional face material image to obtain a corresponding two-dimensional face material image; migrating the facial expression in the source face image to the two-dimensional face material image to obtain a target material image; performing three-dimensional reconstruction according to the target material image to obtain a target three-dimensional face model; performing solution optimization on the target three-dimensional face model to obtain expression feature information corresponding to the target material image; constructing a training sample from the target material image and its corresponding expression feature information; and training a preset neural network based on the training sample to obtain a facial expression extraction model. The method and device improve the acquisition speed and accuracy of the expression feature information.

Description

Facial expression extraction model generation method and device, and facial image generation method and device
Technical Field
The disclosure relates to the technical field of computer vision, in particular to methods and devices for generating a facial expression extraction model and generating a facial image.
Background
In the related art, the expression of a real face can be migrated to a three-dimensional face material based on three-dimensional reconstruction technology, so that the two-dimensional face material image corresponding to the migrated target three-dimensional face material follows the real face and makes the same expression. To realize expression migration, expression feature information needs to be extracted; this information describes the actions of facial expressions and their amplitude. The precision of facial expression feature extraction has a significant influence on the migration effect, and the extraction precision and speed of expression feature information in the related art need further improvement.
Disclosure of Invention
The disclosure provides a facial expression extraction model generation method and device, and a facial image generation method and device, to at least solve the problem of low extraction accuracy of expression feature information in the related art. The technical scheme of the disclosure is as follows:
according to a first aspect of the embodiments of the present disclosure, a method for generating a facial expression extraction model is provided, including:
acquiring a three-dimensional face material image and a source face image;
projecting the three-dimensional face material image to obtain a two-dimensional face material image corresponding to the three-dimensional face material image;
migrating the facial expression in the source facial image to the two-dimensional facial material image to obtain a target material image;
performing three-dimensional reconstruction according to the target material image to obtain a target three-dimensional face model;
performing solution optimization on the target three-dimensional face model to obtain expression characteristic information corresponding to the target material image;
constructing a training sample according to the target material image and the expression characteristic information corresponding to the target material image;
and training a preset neural network based on the training sample to obtain a facial expression extraction model.
In an exemplary embodiment, the preset neural network is an expression correction network;
the training of the preset neural network based on the training sample to obtain the facial expression extraction model comprises the following steps:
inputting the target material image in the training sample into an expression prediction network to obtain expression prediction information;
inputting the expression prediction information into the expression correction network to obtain corrected expression information;
obtaining a correction loss according to a difference value between the corrected expression information and the expression characteristic information in the training sample;
training the expression correction network according to the correction loss;
and determining the expression prediction network and the trained expression correction network as the facial expression extraction model.
In an exemplary embodiment, the migrating the facial expression in the source face image to the two-dimensional face material image to obtain a target material image includes:
determining a reference facial image from at least one of the source facial images; the reference facial image is the source facial image whose facial expression differs least from an expressionless (neutral) state;
inputting each source face image into a first-order motion model to perform expression change parameter extraction processing, so as to obtain an expression change parameter corresponding to each source face image; the expression change parameters represent the variation of the expression description parameters of each source face image relative to the expression description parameters of the reference face image;
inputting the two-dimensional face material image into the first-order motion model to perform expression parameter extraction processing to obtain expression description parameters of the two-dimensional face material image;
obtaining expression parameters of a target material according to the expression change parameters and expression description parameters of the two-dimensional face material image;
and generating the target material image according to the target material expression parameters and the target face material image.
In an exemplary embodiment, the expression change parameters include a key point position change parameter and a motion state change parameter; the expression description parameters comprise key point position parameters and motion state description parameters, and the expression parameters of the target material are obtained according to the expression change parameters and the expression description parameters of the two-dimensional face material image, and the method comprises the following steps:
obtaining the position parameters of the key points of the target material according to the position change parameters of the key points and the position parameters of the key points of the two-dimensional face material image;
obtaining motion state description parameters of the target material according to the motion state change parameters and the motion state description parameters of the two-dimensional face material image;
and obtaining the expression parameters of the target material according to the key point position parameters of the target material and the motion state description parameters of the target material.
In an exemplary embodiment, the performing solution optimization on the target three-dimensional face model to obtain expression feature information corresponding to the target material image includes:
acquiring an energy item corresponding to each key point in the target three-dimensional face model, wherein the energy item takes expression characteristic information as an independent variable;
and solving the energy items corresponding to the key points by taking the minimized sum value of the energy items corresponding to the key points as a target to obtain the expression characteristic information corresponding to the target material image.
According to a second aspect of the embodiments of the present disclosure, there is provided a face image generation method, including:
acquiring a reference face image and a three-dimensional face material;
inputting the reference face image into a face expression extraction model to obtain expression characteristic information of the reference face image;
performing three-dimensional reconstruction according to the three-dimensional face material and the expression characteristic information of the reference face image to obtain a three-dimensional mapping face model;
projecting the three-dimensional mapping face model to obtain a target face image;
the facial expression extraction model is obtained according to the method for generating the facial expression extraction model in any one of the first aspect.
In an exemplary embodiment, the inputting the reference facial image into a facial expression extraction model to obtain the expression feature information of the reference facial image includes:
inputting the reference facial image into an expression prediction network of the facial expression extraction model to obtain expression prediction information of the reference facial image;
and inputting the expression prediction information of the reference facial image into an expression correction network of the facial expression extraction model to obtain corrected expression information of the reference facial image, and determining the corrected expression information of the reference facial image as the expression characteristic information of the reference facial image.
According to a third aspect of the embodiments of the present disclosure, there is provided a facial expression extraction model generation apparatus, including:
an image acquisition module configured to perform acquiring a three-dimensional face material image and a source face image;
the two-dimensional face material image acquisition module is configured to project the three-dimensional face material image to obtain a two-dimensional face material image corresponding to the three-dimensional face material image;
the target material image acquisition module is configured to execute the migration of the facial expression in the source face image to the two-dimensional face material image to obtain a target material image;
the three-dimensional reconstruction module is configured to perform three-dimensional reconstruction according to the target material image to obtain a target three-dimensional face model;
the solution optimization module is configured to perform solution optimization on the target three-dimensional face model to obtain expression characteristic information corresponding to the target material image;
the training sample construction module is configured to execute construction of a training sample according to the target material image and expression characteristic information corresponding to the target material image;
and the training module is configured to train a preset neural network based on the training sample to obtain a facial expression extraction model.
In an exemplary embodiment, the preset neural network includes an expression correction network, and the training module includes:
the expression prediction unit is configured to input the target material image in the training sample into an expression prediction network to obtain expression prediction information;
a corrected expression information prediction unit configured to perform input of the expression prediction information into the expression correction network to obtain corrected expression information;
a correction loss calculation unit configured to perform calculation to obtain a correction loss according to a difference value between the corrected expression information and the expression feature information in the training sample;
an expression correction network training unit configured to perform training of the expression correction network according to the correction loss;
a facial expression extraction model determination unit configured to perform determination of the expression prediction network and the trained expression correction network as the facial expression extraction model.
In an exemplary embodiment, the target material image obtaining module includes:
a reference facial image determination unit configured to perform determination of a reference facial image from at least one of the source facial images; the reference facial image is the source facial image whose facial expression differs least from an expressionless (neutral) state;
the expression change parameter acquisition unit is configured to input each source face image into a first-order motion model for expression change parameter extraction processing to obtain expression change parameters corresponding to each source face image; the expression change parameters represent the variation of the expression description parameters of each source face image relative to the expression description parameters of the reference face image;
the expression description parameter acquisition unit is configured to input the two-dimensional face material image into the first-order motion model for expression parameter extraction processing to obtain expression description parameters of the two-dimensional face material image;
the target material expression parameter determining unit is configured to execute expression description parameters according to the expression change parameters and the two-dimensional face material image to obtain target material expression parameters;
and the target material image generating unit is configured to generate the target material image according to the target material expression parameters and the target face material image.
In an exemplary embodiment, the expression change parameters include a key point position change parameter and a motion state change parameter; the expression description parameters comprise key point position parameters and motion state description parameters, and the target material expression parameter determining unit is configured to execute the key point position parameters according to the key point position change parameters and the key point position parameters of the two-dimensional face material image to obtain the key point position parameters of the target material; obtaining motion state description parameters of the target material according to the motion state change parameters and the motion state description parameters of the two-dimensional face material image; and obtaining the expression parameters of the target material according to the key point position parameters of the target material and the motion state description parameters of the target material.
In an exemplary embodiment, the solution optimization module is configured to perform the step of obtaining an energy item corresponding to each key point in the target three-dimensional face model, where the energy item takes expression feature information as an independent variable; and solving the energy items corresponding to the key points by taking the minimized sum value of the energy items corresponding to the key points as a target to obtain the expression characteristic information corresponding to the target material image.
According to a fourth aspect of the embodiments of the present disclosure, there is provided a face image generation apparatus including:
an image material acquisition module configured to perform acquisition of a reference face image and a three-dimensional face material;
the expression characteristic information acquisition module is configured to input the reference face image into a facial expression extraction model to obtain expression characteristic information of the reference face image;
a three-dimensional mapping face model construction module configured to perform three-dimensional reconstruction according to the three-dimensional face material and the expression feature information of the reference face image to obtain a three-dimensional mapping face model;
The target face image output module is configured to project the three-dimensional mapping face model to obtain a target face image;
the facial expression extraction model is obtained according to the method for generating the facial expression extraction model in any one of the first aspect.
In an exemplary embodiment, the expression feature information obtaining module is configured to perform an expression prediction network that inputs the reference facial image into the facial expression extraction model, so as to obtain expression prediction information of the reference facial image; and inputting the expression prediction information of the reference facial image into an expression correction network of the facial expression extraction model to obtain corrected expression information of the reference facial image, and determining the corrected expression information of the reference facial image as the expression characteristic information of the reference facial image.
According to a fifth aspect of embodiments of the present disclosure, there is provided an electronic apparatus including:
a processor;
a memory for storing the processor-executable instructions;
wherein the processor is configured to execute the instructions to implement the facial expression extraction model generation method according to any one of the first aspect or the facial image generation method according to any one of the second aspect.
According to a sixth aspect of embodiments of the present disclosure, there is provided a computer-readable storage medium, wherein, when instructions in the storage medium are executed by a processor of an electronic device, the electronic device is enabled to perform the facial expression extraction model generation method according to any one of the first aspects or the facial image generation method according to any one of the second aspects.
According to a seventh aspect of embodiments of the present disclosure, there is provided a computer program product comprising a computer program which, when executed by a processor, implements the facial expression extraction model generation method according to any one of the first aspects or the facial image generation method according to any one of the second aspects.
The technical scheme provided by the embodiment of the disclosure at least brings the following beneficial effects:
according to the method, the three-dimensional expression characteristic information corresponding to the facial image can be obtained in real time by generating the expression extraction model, and complicated three-dimensional modeling and optimization solving processes are not needed, so that the obtaining speed of the expression characteristic information is improved, the precision of the expression characteristic information is obviously improved, the target facial image can be generated based on the obtained expression extraction model, the generated target facial image can restore the expression in the reference facial image with higher precision, the generation speed is higher, and the effect of performing expression migration in real time is achieved.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the disclosure.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the present disclosure and, together with the description, serve to explain the principles of the disclosure and are not to be construed as limiting the disclosure.
FIG. 1 is a diagram illustrating an application environment for a method for generating a face image according to an exemplary embodiment;
FIG. 2 is a flow diagram illustrating a method for generating a facial expression extraction model in accordance with an exemplary embodiment;
FIG. 3 is a flow diagram illustrating a method of acquiring a source face image in accordance with one illustrative embodiment;
FIG. 4 is a flowchart illustrating the migration of facial expressions in a source face image to obtain a target material image in accordance with another exemplary embodiment;
FIG. 5 is a flow diagram illustrating the calculation of target material expression parameters according to an exemplary embodiment;
FIG. 6 is a flowchart illustrating the calculation of expressive feature information corresponding to a target material image by a solution optimization method according to an exemplary embodiment;
FIG. 7 is a flowchart illustrating a process for deriving a facial expression extraction model by training a pre-defined neural network, according to an exemplary embodiment;
FIG. 8 is a flow diagram illustrating a method of generating a face image according to an exemplary embodiment;
FIG. 9 is a block diagram illustrating a facial expression extraction model generation apparatus in accordance with an exemplary embodiment;
FIG. 10 is a block diagram illustrating a face image generation apparatus according to an exemplary embodiment;
FIG. 11 is a block diagram illustrating an electronic device in accordance with an example embodiment.
Detailed Description
In order to make the technical solutions of the present disclosure better understood by those of ordinary skill in the art, the technical solutions in the embodiments of the present disclosure will be clearly and completely described below with reference to the accompanying drawings.
It should be noted that the terms "first," "second," and the like in the description and claims of the present disclosure and in the above-described drawings are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used is interchangeable under appropriate circumstances such that the embodiments of the disclosure described herein are capable of operation in sequences other than those illustrated or otherwise described herein. The implementations described in the exemplary embodiments below are not intended to represent all implementations consistent with the present disclosure. Rather, they are merely examples of apparatus and methods consistent with certain aspects of the present disclosure, as detailed in the appended claims.
At present, the expression of a face can be mapped to a three-dimensional face material based on three-dimensional reconstruction technology, so that the three-dimensional face material makes the same expression following a real face, as in Apple's Animoji and Kuaishou's magic emoji feature. In applications that migrate a facial expression to a three-dimensional face material, expression feature information needs to be extracted in order to restore facial expression actions such as opening the mouth or skewing the mouth. In the related art, expression feature information is usually extracted based on deep learning, but the extraction accuracy is not high.
In order to extract more accurate expression characteristic information from a face image, the present disclosure provides a facial expression extraction model generation method, and a facial expression extraction model generated based on the facial expression extraction model generation method, and provides a face image generation method.
Referring to fig. 1, an application environment of a face image generation method according to an exemplary embodiment is shown, where the application environment may include a terminal 110 and a server 120.
The terminal 110 may be, but is not limited to, a smart phone, a tablet computer, a notebook computer, a desktop computer, and the like. The terminal 110 may have a client running therein, which is provided with a background service by the server 120. The client may select a three-dimensional face material from the three-dimensional face material library provided by the server 120, acquire a reference face image, send the acquired reference face image and the three-dimensional face material to the server 120, and acquire and display a target face image returned by the server 120. Illustratively, the client may capture a reference face image of a user smiling, select a star's three-dimensional face material from the library, send the reference face image and the star's three-dimensional face material to the server 120, and obtain and display the returned target face image of the star smiling.
The server 120 may synthesize a target face image according to the reference face image and the three-dimensional face material sent by the client, and send the target face image back to the client. The server 120 may also be used to train the facial expression extraction model.
The server 120 shown in fig. 1 may be a single physical server, or may be a server cluster or a distributed system formed by a plurality of physical servers, and the terminal 110 and the server 120 may be connected through a wired network or a wireless network.
Fig. 2 is a flowchart illustrating a facial expression extraction model generation method according to an exemplary embodiment. As shown in fig. 2, the method is described as applied to the server 120 shown in fig. 1 and includes the following steps.
In step S10, a three-dimensional face material image and a source face image are acquired.
In the disclosure, the three-dimensional face material image can be selected from a preset three-dimensional face material library; the library can include three-dimensional face materials of figures such as entertainment celebrities, sports celebrities, social celebrities, and historical figures, and can also include three-dimensional face materials of characters in works such as movies and cartoons.
In the embodiment of the present disclosure, the source face image may be from a video file or may be obtained by shooting with a camera device. The number of source face images is not limited in this disclosure, and may be one or more.
In some embodiments, the at least one source facial image may be obtained from a video file. As shown in fig. 3, fig. 3 is a flowchart illustrating a method for acquiring a source face image according to an exemplary embodiment, including:
in step S11, a video file is acquired.
In step S12, the video file is parsed to obtain an image frame sequence.
In step S13, image frames including faces in the image frame sequence are sequentially extracted to obtain at least one source face image.
The embodiment of the disclosure can obtain at least one source face image by parsing an existing video file; such abundant sources of source face images help optimize the quality of the training samples and increase their acquisition speed, thereby increasing the training speed.
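As an illustrative sketch of steps S11–S13 (not part of the original disclosure — OpenCV and its Haar cascade face detector are assumptions made here, not tools the patent prescribes):

```python
import cv2

def extract_source_face_images(video_path):
    """Parse a video file into frames and keep frames that contain a face.

    Illustrative sketch of steps S11-S13; OpenCV and the Haar cascade
    face detector are assumptions, not the disclosure's prescribed tools.
    """
    detector = cv2.CascadeClassifier(
        cv2.data.haarcascades + "haarcascade_frontalface_default.xml")
    capture = cv2.VideoCapture(video_path)   # step S11: acquire the video file
    source_face_images = []
    while True:
        ok, frame = capture.read()           # step S12: decode the image frame sequence
        if not ok:
            break
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        faces = detector.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
        if len(faces) > 0:                   # step S13: keep frames that include a face
            source_face_images.append(frame)
    capture.release()
    return source_face_images
```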
In step S20, the three-dimensional face material image is projected to obtain a two-dimensional face material image corresponding to the three-dimensional face material image.
Specifically, the two-dimensional face material image corresponding to the three-dimensional face material image is obtained by performing plane projection on the three-dimensional face material image. For the method of plane projection, reference may be made to the related art; it is not described further here.
In step S30, the facial expression in the source face image is migrated to the two-dimensional face material image, and a target material image is obtained.
In the embodiment of the present disclosure, the source face image and the two-dimensional face material image may be input into a first-order motion model, and the transfer of the expression is realized based on the first-order motion model, so as to obtain the target material image.
In the present disclosure, a First Order Motion Model (FOMM) may be used to perform expression migration: for example, a source face image and a two-dimensional face material image are input to the first-order motion model, which migrates the expression of the source face image onto the two-dimensional face material image, obtaining a target material image bearing the expression of the source face image. The idea of the first-order motion model is to establish a complex motion model using a set of self-learned key points and local affine transformations, aiming to solve the poor generation quality of traditional models under large dynamic changes of the target's pose. In the disclosed embodiment, the first-order motion model may include an expression feature extraction module and an image generation module; the expression feature extraction module extracts expression features of faces in the input pictures, and the image generation module realizes expression migration using the input pictures.
Illustratively, a source facial image with a skewed-mouth expression and a star face material are input into the first-order motion model. The expression feature extraction module in the first-order motion model can extract a first expression description parameter corresponding to the skewed-mouth expression of the source facial image, as well as a second expression description parameter of the star face material. The image generation module in the first-order motion model then obtains, from the first and second expression description parameters, a third expression description parameter corresponding to a skewed-mouth expression on the star face material, and generates a target star face material according to the third expression description parameter and the star face material; the target star face material thus also has the skewed-mouth expression.
The expression description parameters output by the first-order motion model in the embodiment of the disclosure comprise a key point position parameter and a motion state description parameter. The motion state description parameter can be expressed by a Jacobian matrix, which can be a first-order expansion coefficient matrix of the variation within a preset neighborhood of each key point.
The source face image in the disclosed embodiment can be denoted s_i (i is a positive integer less than or equal to N, where N is the total number of source face images); correspondingly, the target material image can be denoted d_j (j is a positive integer less than or equal to N). If i = j, then s_i and d_j have the same expression. In the embodiment of the present disclosure, expression migration is performed through the first-order motion model; the structure, parameters, and training method of the first-order motion model may refer to the related art and are not described herein again.
Illustratively, there are three source face images in total, with crying, skewed-mouth, and smiling expressions; correspondingly, there are also three target material images, with crying, skewed-mouth, and smiling expressions respectively.
In some embodiments, as shown in fig. 4, fig. 4 is a flowchart illustrating transferring facial expressions in the source face image to the two-dimensional face material images to obtain target material images according to an exemplary embodiment, and includes:
in step S31, a reference face image is determined from at least one of the source face images; the reference face image is the source face image with the smallest difference between the facial expression and the non-expression state.
The embodiment of the present disclosure does not limit the specific manner of selecting the reference facial image; the source facial image closest to a neutral expression may be selected from the at least one source facial image. For example, if there are three source face images with crying, skewed-mouth, and smiling expressions respectively, and the smiling expression is the closest to neutral, the source face image with the smiling expression may be determined as the reference face image. For another example, if there are four source face images, with crying, laughing, frowning, and neutral expressions respectively, the neutral source face image is used as the reference face image.
In some embodiments, the reference facial image may be determined automatically based on the first-order motion model: the first-order motion model obtains the expression description parameters of each source facial image in the at least one source facial image, determines from these parameters the source facial image with the smallest difference from the neutral state, and takes that image as the reference facial image.
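As a minimal sketch of this selection (an illustration only, not the disclosed implementation), assuming the first-order motion model's keypoints are available as (K, 2) arrays and that a hypothetical neutral keypoint layout `kp_neutral` is given:

```python
import numpy as np

def select_reference_face(kp_list, kp_neutral):
    """Pick the source face image whose keypoint layout deviates least
    from a neutral (expressionless) layout.

    kp_list: list of (K, 2) keypoint arrays, one per source face image;
    kp_neutral: hypothetical neutral layout, assumed available here.
    """
    distances = [np.linalg.norm(kp - kp_neutral) for kp in kp_list]
    return int(np.argmin(distances))  # index of the reference face image
```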
In step S32, inputting each of the source face images into a first-order motion model to perform expression change parameter extraction processing, so as to obtain expression change parameters corresponding to each of the source face images; the expression change parameters represent the variation of the expression description parameters of each source face image relative to the expression description parameters of the reference face image.
For each source face image, the first-order motion model may output expression description parameters of the source face image, and may determine expression change parameters according to the expression description parameters of the source face image and the expression description parameters of the reference face image. The expression description parameters in the embodiment of the present disclosure include a key point position parameter and a motion state description parameter, and correspondingly, the expression change parameters include a key point position change parameter and a motion state change parameter, where the motion state change parameter may also be expressed using a jacobian matrix.
For a source face image s_i, the corresponding expression description parameters can be obtained; specifically, they include a key point position parameter kp_i and a motion state description parameter jacob_i. The key point position parameter of the reference face image can be denoted kp_ref, and its motion state description parameter can be denoted jacob_ref.
The expression change parameters of the source face image relative to the reference face image can be expressed as Δkp_i and Δjacob_i; specifically, Δkp_i = kp_i − kp_ref and Δjacob_i = jacob_i · jacob_ref⁻¹.
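These two update rules can be written directly in NumPy. The following sketch assumes the array conventions of common first-order-motion-model implementations (an assumption, not part of the disclosure): keypoints as (K, 2) arrays and Jacobians as (K, 2, 2) arrays.

```python
import numpy as np

def expression_change_parameters(kp_i, jacob_i, kp_ref, jacob_ref):
    """Compute the expression change of source face image s_i relative to
    the reference face image.

    Assumed shapes: kp_* is (K, 2); jacob_* is (K, 2, 2), one 2x2
    Jacobian per key point.
    """
    delta_kp = kp_i - kp_ref                          # Δkp_i = kp_i − kp_ref
    delta_jacob = jacob_i @ np.linalg.inv(jacob_ref)  # Δjacob_i = jacob_i · jacob_ref⁻¹
    return delta_kp, delta_jacob
```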
In step S33, the two-dimensional face material images are input into the first-order motion model to perform expression parameter extraction processing, so as to obtain expression description parameters of the two-dimensional face material images.
After the two-dimensional face material image is input into the first-order motion model, the first-order motion model can output the expression description parameters of the two-dimensional face material image, including a key point position parameter kp_m and a motion state description parameter jacob_m, where the subscript m denotes the two-dimensional face material.
In step S34, target material expression parameters are obtained according to the expression change parameters and the expression description parameters of the two-dimensional face material image.
In some embodiments, as shown in fig. 5, fig. 5 is a flowchart illustrating that the expression parameters of the target material are obtained according to the expression change parameters and the expression description parameters of the two-dimensional face material image according to an exemplary embodiment, and the flowchart includes:
in step S341, the keypoint location parameter of the target material is obtained according to the keypoint location variation parameter and the keypoint location parameter of the two-dimensional face material image.
In some embodiments, the key point position parameter of the target material can be obtained from the formula kp_i^m = kp_m + s·Δkp_i, where kp_i^m represents the key point position parameter of the target material corresponding to the source face image, kp_m is the key point position parameter of the two-dimensional face material image, Δkp_i is the key point position change parameter, and s is a proportionality constant determined according to the face area of the two-dimensional face material.
In step S342, the motion state description parameters of the target material are obtained according to the motion state variation parameters and the motion state description parameters of the two-dimensional face material image.
In some embodiments, the motion state description parameter of the target material can be obtained from the formula jacob_i^m = jacob_m · Δjacob_i, where jacob_i^m represents the motion state description parameter of the target material image corresponding to the source face image, jacob_m is the motion state description parameter of the two-dimensional face material, and Δjacob_i represents the motion state change parameter of the source face image relative to the reference face image.
In step S343, the expression parameters of the target material are obtained according to the key point position parameters of the target material and the motion state description parameters of the target material.
In some embodiments, the target material image can be obtained according to the formula I_i^m = M_1(kp_ref, kp_i^m, jacob_i^m), where I_i^m represents the target material image corresponding to the source face image s_i and M_1 represents the target material image generation method. The present disclosure does not limit M_1; its function can be realized by a trained first-order motion model. kp_ref, the key point position parameter in the expression description parameters of the reference face image, can be obtained based on the first-order motion model.
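Steps S341–S343 then reduce to the following sketch (continuing the array conventions assumed above; `generate_image`, standing in for M_1, is a hypothetical handle to a trained first-order motion model's image generation module):

```python
def target_material_expression_parameters(kp_m, jacob_m, delta_kp, delta_jacob, s):
    """Retarget the source expression change onto the two-dimensional
    face material (steps S341-S343).

    s is the proportionality constant determined from the face area of
    the two-dimensional face material.
    """
    kp_target = kp_m + s * delta_kp       # step S341: kp_i^m = kp_m + s·Δkp_i
    jacob_target = jacob_m @ delta_jacob  # step S342: jacob_i^m = jacob_m · Δjacob_i
    return kp_target, jacob_target        # step S343: the target material expression parameters

# Step S35 (illustrative only): I_i^m = M_1(kp_ref, kp_i^m, jacob_i^m), where
# generate_image is a hypothetical wrapper around the trained model:
#   target_image = generate_image(kp_ref, kp_target, jacob_target, material_image)
```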
By making full use of the key point position parameters and motion state description parameters that the first-order motion model outputs for the related images, the target material expression parameters used to generate the target material image are obtained through calculation; this improves the ability of the target material expression parameters to express the expression of the source face image, and ultimately improves the fidelity of the target material image's expression to that of the source face image.
In step S35, the target material image is generated based on the target material expression parameters and the target face material image.
In the embodiment of the disclosure, the target material image having the same expression as the source face image can be generated based on the first-order motion model, and the generation quality of the target material image can be ensured.
In step S40, a three-dimensional reconstruction is performed according to the target material image to obtain a target three-dimensional face model.
In the embodiment of the present disclosure, the specific method of three-dimensional reconstruction is not limited; for example, the reconstruction may be performed based on a 3D Morphable Model (3DMM, a deformable three-dimensional face model). In the field of computer vision, reconstructing the shape and texture of a three-dimensional face from a single face picture is an important research topic, and 3DMM-based methods can successfully reconstruct a three-dimensional face from a single face picture.
In step S50, performing solution optimization on the target three-dimensional face model to obtain expression feature information corresponding to the target material image.
In the embodiment of the disclosure, three-dimensional expression feature information can be obtained by performing solution optimization on the target three-dimensional face model, and this information corresponds to a target material. Illustratively, if there are three target material images, with crying, skewed-mouth, and smiling expressions, three groups of expression feature information can be obtained through the corresponding optimizations, corresponding respectively to the three-dimensional expression feature information of crying, of the skewed mouth, and of smiling.
In some embodiments, an energy item of the target three-dimensional face model may be obtained, and the expression feature information corresponding to the target material image is obtained by performing minimum solution optimization on the energy item of the target three-dimensional face model. Specifically, as shown in fig. 6, fig. 6 is a flowchart illustrating solution optimization performed on the target three-dimensional face model to obtain expression feature information corresponding to the target material image according to an exemplary embodiment, where the flowchart includes:
in step S51, an energy item corresponding to each key point in the target three-dimensional face model is obtained, where the energy item uses the expression feature information as an argument.
Specifically, the energy term corresponding to each key point k can be expressed as

E_k = ‖ s · R · (C_r ×₂ w_id^T ×₃ w_exp^T)_k + t − s_k ‖²

where s and R represent a scale variable and a rotation matrix respectively, t is a translation, s_k is the coordinate of key point k, C_r is a bilinear face model, w_id^T is the identity-related face coefficient of the individual in the two-dimensional face material image, and w_exp^T represents the expression feature information.
In step S52, with the sum of the energy items corresponding to each key point minimized as a target, the energy items corresponding to each key point are resolved to obtain the expression feature information corresponding to the target material image.
The embodiment of the present disclosure does not limit the specific solution method; for example, the solution optimization may be performed using methods such as nonlinear optimization or least-squares optimization, and the expression feature information corresponding to the target material image is obtained through this solution optimization.
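One possible least-squares realization is sketched below. This is an illustration only: SciPy's `least_squares` solver is an assumption, as is the FaceWarehouse-style tensor contraction implied by the variable definitions above; the disclosure leaves the solver open.

```python
import numpy as np
from scipy.optimize import least_squares

def solve_expression(C_r, w_id, keypoints, scale, R, t, n_exp):
    """Solve for the expression feature information w_exp by minimising
    the sum of per-keypoint energy terms E_k.

    Assumed conventions (illustrative): C_r is the bilinear face model
    restricted to the K key points, shape (K, 3, n_id, n_exp); w_id is
    the identity coefficient vector (n_id,); keypoints holds the observed
    key point coordinates s_k, shape (K, 3); scale, R (3, 3) and t (3,)
    are the pre-estimated rigid alignment.
    """
    # Contract the identity mode once: basis has shape (K, 3, n_exp).
    basis = np.einsum("kcie,i->kce", C_r, w_id)

    def residuals(w_exp):
        # Model key points: s · R · (C_r x2 w_id x3 w_exp)_k + t
        model = np.einsum("kce,e->kc", basis, w_exp)
        model = scale * (model @ R.T) + t
        return (model - keypoints).ravel()  # stacked E_k residuals

    result = least_squares(residuals, x0=np.zeros(n_exp))
    return result.x  # the expression feature information w_exp
```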
In the embodiment of the disclosure, three-dimensional expression characteristic information of a target material image can be obtained by performing solution optimization on a target three-dimensional face model, the three-dimensional expression characteristic information contains richer expression details, a training sample is constructed based on the expression characteristic information, and the precision of a face expression extraction model obtained by training can be improved.
In step S60, a training sample is constructed based on the target material image and the expression feature information corresponding to the target material image.
In step S70, a preset neural network is trained based on the training samples to obtain a facial expression extraction model.
In some embodiments, a preset neural network may be trained such that, when an image is input to the trained network, the three-dimensional expression feature information of the image is output directly. To obtain the expression extraction model through training, training samples for the model need to be constructed. In this embodiment of the present disclosure, a training sample may include a target material image and the expression feature information corresponding to that target material image; new training samples can be obtained by changing the source face images and/or the three-dimensional face material images, so as to generate a training sample set containing a large number of training samples, and the preset neural network is trained on this set to obtain the expression extraction model. The structure of the neural network is not limited in this disclosure.
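A minimal sketch of such a training sample set follows; representing it as a PyTorch `Dataset` is an assumption made here for illustration.

```python
import torch
from torch.utils.data import Dataset

class ExpressionSampleSet(Dataset):
    """Training sample set of (target material image, expression feature
    information) pairs, built by varying the source face images and/or
    three-dimensional face materials."""

    def __init__(self, target_images, expression_features):
        assert len(target_images) == len(expression_features)
        self.images = target_images          # target material images d_j
        self.features = expression_features  # w_exp from solution optimization

    def __len__(self):
        return len(self.images)

    def __getitem__(self, idx):
        image = torch.as_tensor(self.images[idx], dtype=torch.float32)
        label = torch.as_tensor(self.features[idx], dtype=torch.float32)
        return image, label
```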
In other embodiments, an expression extraction network of the following kind may also be trained: an image is input, three-dimensional expression prediction information is obtained from an existing expression prediction network within the expression extraction network, and the prediction is then corrected by the preset neural network within the expression extraction network, yielding three-dimensional expression feature information with higher precision. Since the preset neural network is used here to correct an expression, it is referred to in the embodiment of the present disclosure as an expression correction network, and the existing expression prediction network together with the trained correction network may be determined as the expression extraction model. To obtain the expression extraction model through training, training samples need to be constructed; for this case, please refer to fig. 7, a flowchart illustrating training a preset neural network based on the training samples to obtain a facial expression extraction model according to an exemplary embodiment, including:
in step S71, the target material images in the training sample are input to an expression prediction network, so as to obtain expression prediction information.
In the embodiment of the disclosure, new training samples can be obtained by changing the source face image and/or the three-dimensional face material image, so as to form a set of training samples, and the expression correction network can be trained according to each training sample in the set.
In step S72, the expression prediction information is input to the expression correction network, and corrected expression information is obtained.
In step S73, a correction loss is obtained according to the difference between the corrected expression information and the expression feature information in the training sample.
In step S74, the expression correction network is trained based on the correction loss.
Specifically, parameters of the expression correction network may be adjusted according to the correction loss feedback until a preset training stop condition is reached. In one embodiment, if the correction loss is greater than or equal to a preset training threshold, the parameters of the network being trained are adjusted based on the correction loss; if the loss value is less than the training threshold, the training is completed. The embodiment of the present disclosure does not limit the specific training threshold, which may be, for example, 0.1 or 0.5. In another feasible embodiment, the number of parameter adjustments of the expression correction network may be counted, and if it exceeds a preset threshold, the preset training stop condition may be determined to be reached.
In step S75, the expression prediction network and the trained expression correction network are determined as the facial expression extraction model.
Specifically, the facial expression extraction model can be obtained by connecting the existing expression prediction network and the expression correction network trained in steps S71–S75 in series. The present disclosure does not limit the structures of the expression prediction network and the expression correction network.
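A minimal PyTorch sketch of steps S71–S75 follows, assuming `expression_prediction_net` is the existing (frozen) prediction network and `correction_net` is the expression correction network being trained; both network structures, the MSE loss, and the concrete hyperparameters are illustrative choices, since the disclosure leaves them open.

```python
import torch
import torch.nn as nn

def train_correction_network(expression_prediction_net, correction_net,
                             loader, epochs=10, lr=1e-4, threshold=0.1):
    """Steps S71-S75: only the expression correction network is updated;
    the existing expression prediction network stays frozen."""
    expression_prediction_net.eval()
    optimizer = torch.optim.Adam(correction_net.parameters(), lr=lr)
    criterion = nn.MSELoss()
    for _ in range(epochs):
        for target_image, expression_feature in loader:
            with torch.no_grad():                    # S71: expression prediction
                predicted = expression_prediction_net(target_image)
            corrected = correction_net(predicted)    # S72: corrected expression
            loss = criterion(corrected, expression_feature)  # S73: correction loss
            if loss.item() < threshold:              # preset training stop condition
                return nn.Sequential(expression_prediction_net, correction_net)
            optimizer.zero_grad()
            loss.backward()                          # S74: train the correction network
            optimizer.step()
    # S75: the prediction network plus the trained correction network
    # together form the facial expression extraction model.
    return nn.Sequential(expression_prediction_net, correction_net)
```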
In the embodiment of the disclosure, by training the expression correction network, the output of the existing expression prediction network can be corrected, so that the finally output three-dimensional expression feature information has higher accuracy. Obtaining high-precision three-dimensional expression feature information by reusing the existing expression prediction network and correcting its output improves the utilization of the existing expression prediction model and reduces the acquisition cost of high-precision three-dimensional expression feature information.
In the embodiment of the disclosure, by training the expression extraction model, the three-dimensional expression feature information corresponding to a two-dimensional face image can be obtained in real time, without complicated three-dimensional modeling and optimization solving, so that the acquisition speed of the expression feature information is increased and its precision is significantly improved.
Based on the expression extraction model obtained by the above training, the present disclosure also shows a facial image generation method, as shown in fig. 8, fig. 8 is a flowchart of the facial image generation method shown according to an exemplary embodiment, and the method includes:
in step S10-1, a reference face image and three-dimensional face material are acquired.
In step S20-1, the reference face image is input into a facial expression extraction model, and the expression feature information of the reference face image is obtained.
In an embodiment, the reference facial image may be directly input into the facial expression extraction model to obtain the output facial expression feature information of the reference facial image.
In another embodiment, the reference facial image may be input to the expression prediction network of the expression extraction model to obtain expression prediction information, and the expression prediction information is then input to the expression correction network of the expression extraction model to obtain the expression feature information. By reusing the existing expression prediction network, high-accuracy expression feature information is obtained without heavily modifying the existing network, which improves the utilization of the existing expression prediction network.
In step S30-1, three-dimensional reconstruction is performed according to the three-dimensional face material and the expression feature information of the reference face image, so as to obtain a three-dimensional mapping face model.
The expression feature information output by the facial expression extraction model in the embodiment of the present disclosure is three-dimensional expression feature information, and a three-dimensional mapping face model can be obtained by performing three-dimensional reconstruction according to the three-dimensional face material and the expression feature information.
In step S40-1, the three-dimensional mapping face model is projected to obtain a target face image.
The expression in the reference face image is thus transferred to the generated target face image, with the three-dimensional face material as the generation basis; for example, if the reference face image shows user A crying and the three-dimensional face material is user B's face, the target face image shows user B crying.
According to the method and the device, the reference face image is input into the trained facial expression extraction model to obtain high-precision three-dimensional expression feature information; three-dimensional reconstruction is then performed based on this expression feature information and the three-dimensional face material, and a two-dimensional target face image is obtained from the reconstruction result. The target face image can therefore restore the expression in the reference face image with high fidelity, and its expression can change in real time as the expression in the reference face image changes.
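Putting steps S10-1 through S40-1 together, the generation pipeline can be sketched as follows. This is an illustration only: `reconstruct_3d` and `project_to_2d` are hypothetical helpers standing in for the 3DMM reconstruction and plane projection described above, and the extraction model is assumed to be a PyTorch module.

```python
import torch

def generate_face_image(extraction_model, face_material_3d, reference_image,
                        reconstruct_3d, project_to_2d):
    """Steps S10-1 to S40-1 as one pipeline. reconstruct_3d and
    project_to_2d are hypothetical callables; the disclosure does not
    fix their form."""
    extraction_model.eval()
    with torch.no_grad():
        # S20-1: extract three-dimensional expression feature information
        w_exp = extraction_model(reference_image.unsqueeze(0)).squeeze(0)
    # S30-1: rebuild the material with the reference expression applied
    mapped_model_3d = reconstruct_3d(face_material_3d, w_exp)
    # S40-1: project the three-dimensional mapping face model to 2D
    return project_to_2d(mapped_model_3d)
```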
Fig. 9 is a block diagram illustrating a facial expression extraction model generation apparatus according to an exemplary embodiment. Referring to fig. 9, the apparatus includes:
an image acquisition module 10 configured to perform acquiring a three-dimensional face material image and a source face image;
a two-dimensional face material image obtaining module 20 configured to perform projection on the three-dimensional face material image to obtain a two-dimensional face material image corresponding to the three-dimensional face material image;
a target material image obtaining module 30 configured to perform migration of the facial expression in the source face image to the two-dimensional face material image to obtain a target material image;
a three-dimensional reconstruction module 40 configured to perform three-dimensional reconstruction according to the target material image to obtain a target three-dimensional face model;
a solution optimization module 50 configured to perform solution optimization on the target three-dimensional face model to obtain expression feature information corresponding to the target material image;
a training sample construction module 60 configured to construct a training sample according to the target material image and the expression feature information corresponding to the target material image;
and the training module 70 is configured to execute training of a preset neural network based on the training samples to obtain a facial expression extraction model.
In an exemplary embodiment, the preset neural network includes an expression correction network, and the training module includes the following units (a minimal training sketch follows the list):
the expression prediction unit is configured to input the target material image in the training sample into an expression prediction network to obtain expression prediction information;
a corrected expression information prediction unit configured to perform input of the expression prediction information into the expression correction network to obtain corrected expression information;
a correction loss calculation unit configured to perform calculation to obtain a correction loss according to a difference between the corrected expression information and the expression feature information in the training sample;
an expression correction network training unit configured to perform training of the expression correction network in accordance with the correction loss;
and the facial expression extraction model determining unit is configured to determine the expression prediction network and the trained expression correction network as the facial expression extraction model.
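The units above may be read together as one training loop. The following Python (PyTorch) sketch assumes a mean-squared-error correction loss and a frozen, reused expression prediction network; both are illustrative choices, as the disclosure only requires a loss built from the difference between the corrected expression information and the labelled expression feature information.

    import torch
    import torch.nn as nn

    def train_correction_network(prediction_net, correction_net, loader,
                                 epochs=10, lr=1e-4):
        # The existing prediction network is multiplexed as-is and frozen.
        prediction_net.eval()
        optimizer = torch.optim.Adam(correction_net.parameters(), lr=lr)
        criterion = nn.MSELoss()
        for _ in range(epochs):
            for image, expression_label in loader:   # one training sample pair
                with torch.no_grad():
                    expression_prediction = prediction_net(image)
                corrected = correction_net(expression_prediction)
                loss = criterion(corrected, expression_label)  # correction loss
                optimizer.zero_grad()
                loss.backward()
                optimizer.step()
        return correction_net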
In an exemplary embodiment, the target material image obtaining module includes:
a reference facial image determination unit configured to perform determination of a reference facial image among the at least one source facial image, the reference facial image being the source facial image whose facial expression differs least from a non-expression (neutral) state;
an expression change parameter obtaining unit configured to perform expression change parameter extraction processing by inputting each of the source face images into a first-order motion model, so as to obtain an expression change parameter corresponding to each of the source face images; the expression change parameters represent the variation of the expression description parameters of each source face image relative to the expression description parameters of the reference face image;
the expression description parameter acquisition unit is configured to input the two-dimensional face material image into the first-order motion model for expression parameter extraction processing to obtain expression description parameters of the two-dimensional face material image;
the target material expression parameter determining unit is configured to obtain target material expression parameters according to the expression change parameters and the expression description parameters of the two-dimensional face material image;
and the target material image generating unit is configured to generate the target material image according to the target material expression parameters and the two-dimensional face material image.
In an exemplary embodiment, the expression change parameters include a key point position change parameter and a motion state change parameter, and the expression description parameters include key point position parameters and motion state description parameters. The target material expression parameter determining unit is configured to: obtain key point position parameters of the target material according to the key point position change parameters and the key point position parameters of the two-dimensional face material image; obtain motion state description parameters of the target material according to the motion state change parameters and the motion state description parameters of the two-dimensional face material image; and obtain the target material expression parameters according to the key point position parameters of the target material and the motion state description parameters of the target material, as sketched below.
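A sketch of this parameter composition in Python (NumPy). Additive composition of both the key point and motion state parts is an assumption; first-order-motion implementations also compose the local affine (motion state) part multiplicatively.

    import numpy as np

    def compose_target_material_params(material_keypoints,   # (K, 2)
                                       material_motion,      # (K, 2, 2) local affine
                                       keypoint_change,      # (K, 2) delta vs. reference
                                       motion_change):       # (K, 2, 2)
        # Key point position parameters of the target material.
        target_keypoints = material_keypoints + keypoint_change
        # Motion state description parameters of the target material.
        target_motion = material_motion + motion_change
        # Together these form the target material expression parameters.
        return target_keypoints, target_motion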
In an exemplary embodiment, the solution optimization module is configured to obtain an energy term corresponding to each key point in the target three-dimensional face model, where each energy term takes the expression feature information as its argument, and to solve the energy terms with the objective of minimizing the sum of the energy terms corresponding to the key points, thereby obtaining the expression feature information corresponding to the target material image.
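If each energy term is taken to be the squared residual of a keypoint model that is linear in the expression feature vector — an assumption, since the disclosure leaves the exact form of the energy terms open — the minimiser of their sum has a ridge-regularised closed form:

    import numpy as np

    def solve_expression_features(keypoint_basis,    # A: (K, M) linearised keypoint model
                                  keypoint_targets,  # b: (K,) observed keypoint residuals
                                  reg=1e-3):
        # Minimise sum_k ||A_k w - b_k||^2 + reg * ||w||^2 over w via the
        # normal equations; reg stabilises the solve numerically.
        A, b = keypoint_basis, keypoint_targets
        return np.linalg.solve(A.T @ A + reg * np.eye(A.shape[1]), A.T @ b)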
Fig. 10 is a block diagram illustrating a face image generation apparatus according to an exemplary embodiment. Referring to fig. 10, the apparatus includes:
an image material acquisition module 10-1 configured to perform acquisition of a reference face image and a three-dimensional face material;
the expression characteristic information acquisition module 20-1 is configured to input the reference face image into a facial expression extraction model to obtain expression characteristic information of the reference face image;
a three-dimensional mapping face model construction module 30-1 configured to perform three-dimensional reconstruction according to the three-dimensional face material and the expression feature information of the reference face image to obtain a three-dimensional mapping face model;
a target face image output module 40-1 configured to perform projection of the three-dimensional mapping face model to obtain a target face image;
wherein the facial expression extraction model is obtained according to the facial expression extraction model generation method described in the above embodiments.
In an exemplary embodiment, the expression feature information obtaining module is configured to input the reference facial image into an expression prediction network of the facial expression extraction model to obtain expression prediction information of the reference facial image, input the expression prediction information into an expression correction network of the facial expression extraction model to obtain corrected expression information of the reference facial image, and determine the corrected expression information as the expression feature information of the reference facial image.
With regard to the apparatus in the above-described embodiment, the specific manner in which each module performs the operation has been described in detail in the embodiment related to the method, and will not be elaborated here.
In an exemplary embodiment, there is also provided an electronic device, comprising a processor; a memory for storing processor-executable instructions; the processor is configured to implement the steps of any one of the facial expression extraction model generation methods or the facial image generation method in the above embodiments when executing the instructions stored in the memory.
The electronic device may be a terminal, a server, or a similar computing device. Taking a server as an example, fig. 11 is a block diagram of an electronic device for the facial expression extraction model generation method or the facial image generation method according to an exemplary embodiment. The electronic device 1000 may vary considerably depending on configuration or performance, and may include one or more Central Processing Units (CPUs) 1010 (a processor 1010 may include, but is not limited to, a processing device such as a microprocessor MCU or a programmable logic device FPGA), a memory 1030 for storing data, and one or more storage media 1020 (e.g., one or more mass storage devices) for storing an application 1023 or data 1022. The memory 1030 and the storage media 1020 may be transient or persistent storage. A program stored in the storage medium 1020 may include one or more modules, each of which may include a series of instructions operating on the electronic device. Further, the central processor 1010 may be configured to communicate with the storage medium 1020 and execute the series of instruction operations in the storage medium 1020 on the electronic device 1000. The electronic device 1000 may also include one or more power supplies 1060, one or more wired or wireless network interfaces 1050, one or more input/output interfaces 1040, and/or one or more operating systems 1021, such as Windows Server™, Mac OS X™, Unix™, Linux™, FreeBSD™, and so forth.
The input/output interface 1040 may be used to receive or transmit data via a network. A specific example of the network described above is a wireless network provided by a communication provider of the electronic device 1000. In one example, the input/output interface 1040 includes a network interface controller (NIC) that may be connected to other network devices via a base station so as to communicate with the Internet. In an exemplary embodiment, the input/output interface 1040 may be a radio frequency (RF) module used for communicating with the Internet wirelessly.
It will be understood by those skilled in the art that the structure shown in fig. 11 is only an illustration and is not intended to limit the structure of the electronic device. For example, the electronic device 1000 may also include more or fewer components than shown in FIG. 11, or have a different configuration than shown in FIG. 11.
In an exemplary embodiment, a computer-readable storage medium is further provided, and when executed by a processor of an electronic device, the instructions in the computer-readable storage medium enable the electronic device to execute the facial expression extraction model generation method or the facial image generation method provided in any one of the above embodiments.
In an exemplary embodiment, a computer program product is also provided that includes computer instructions stored in a computer readable storage medium. The processor of the electronic device reads the computer instructions from the computer-readable storage medium, and the processor executes the computer instructions, so that the electronic device executes the facial expression extraction model generation method or the facial image generation method provided in any one of the above embodiments.
It will be understood by those skilled in the art that all or part of the processes of the methods of the embodiments described above can be implemented by a computer program instructing relevant hardware; the program can be stored in a non-volatile computer-readable storage medium and, when executed, can include the processes of the embodiments of the methods described above. Any reference to memory, storage, a database, or another medium used in the embodiments provided herein may include non-volatile and/or volatile memory. Non-volatile memory can include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory. Volatile memory can include random access memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in many forms, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), synchronous link DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM).
Other embodiments of the disclosure will be apparent to those skilled in the art from consideration of the specification and practice of the disclosure disclosed herein. This application is intended to cover any variations, uses, or adaptations of the disclosure following, in general, the principles of the disclosure and including such departures from the present disclosure as come within known or customary practice within the art to which the disclosure pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the disclosure being indicated by the following claims.
It will be understood that the present disclosure is not limited to the precise arrangements described above and shown in the drawings and that various modifications and changes may be made without departing from the scope thereof. The scope of the present disclosure is limited only by the appended claims.

Claims (10)

1. A facial expression extraction model generation method is characterized by comprising the following steps:
acquiring a three-dimensional face material image and a source face image;
projecting the three-dimensional face material image to obtain a two-dimensional face material image corresponding to the three-dimensional face material image;
migrating the facial expression in the source facial image to the two-dimensional facial material image to obtain a target material image;
performing three-dimensional reconstruction according to the target material image to obtain a target three-dimensional face model;
performing solution optimization on the target three-dimensional face model to obtain expression characteristic information corresponding to the target material image;
constructing a training sample according to the target material image and the expression characteristic information corresponding to the target material image;
and training a preset neural network based on the training sample to obtain a facial expression extraction model.
2. The method for generating a facial expression extraction model according to claim 1, wherein the preset neural network is an expression correction network, and the training of the preset neural network based on the training samples to obtain the facial expression extraction model comprises:
inputting the target material image in the training sample into an expression prediction network to obtain expression prediction information;
inputting the expression prediction information into the expression correction network to obtain corrected expression information;
obtaining a correction loss according to a difference value between the corrected expression information and the expression characteristic information in the training sample;
training the expression correction network according to the correction loss;
and determining the expression prediction network and the trained expression correction network as the facial expression extraction model.
3. The method for generating a facial expression extraction model according to claim 1 or 2, wherein the step of migrating the facial expression in the source face image to the two-dimensional face material image to obtain a target material image comprises:
determining a reference facial image in at least one of the source facial images; the reference facial image is the source facial image whose facial expression differs least from a non-expression state;
inputting each source face image into a first-order motion model to perform expression change parameter extraction processing, so as to obtain an expression change parameter corresponding to each source face image; the expression change parameters represent the variation of the expression description parameters of each source face image relative to the expression description parameters of the reference face image;
inputting the two-dimensional face material image into the first-order motion model to perform expression parameter extraction processing to obtain expression description parameters of the two-dimensional face material image;
obtaining target material expression parameters according to the expression change parameters and the expression description parameters of the two-dimensional face material image;
and generating the target material image according to the target material expression parameters and the two-dimensional face material image.
4. The method of generating a facial expression extraction model according to claim 3, wherein the expression change parameters include a key point position change parameter and a motion state change parameter, and the expression description parameters include key point position parameters and motion state description parameters; and the obtaining of the target material expression parameters according to the expression change parameters and the expression description parameters of the two-dimensional face material image comprises the following steps:
obtaining the position parameters of the key points of the target material according to the position change parameters of the key points and the position parameters of the key points of the two-dimensional face material image;
obtaining motion state description parameters of the target material according to the motion state change parameters and the motion state description parameters of the two-dimensional face material image;
and obtaining the expression parameters of the target material according to the key point position parameters of the target material and the motion state description parameters of the target material.
5. A face image generation method, characterized in that the method comprises:
acquiring a reference face image and a three-dimensional face material;
inputting the reference face image into a face expression extraction model to obtain expression characteristic information of the reference face image;
performing three-dimensional reconstruction according to the three-dimensional face material and the expression characteristic information of the reference face image to obtain a three-dimensional mapping face model;
projecting the three-dimensional mapping face model to obtain a target face image;
the facial expression extraction model is obtained according to the method for generating the facial expression extraction model according to any one of claims 1 to 4.
6. A facial expression extraction model generation apparatus, comprising:
an image acquisition module configured to perform acquiring a three-dimensional face material image and a source face image;
the two-dimensional face material image acquisition module is configured to project the three-dimensional face material image to obtain a two-dimensional face material image corresponding to the three-dimensional face material image;
the target material image acquisition module is configured to execute the migration of the facial expression in the source face image to the two-dimensional face material image to obtain a target material image;
the three-dimensional reconstruction module is configured to perform three-dimensional reconstruction according to the target material image to obtain a target three-dimensional face model;
the solution optimization module is configured to perform solution optimization on the target three-dimensional face model to obtain expression characteristic information corresponding to the target material image;
the training sample construction module is configured to execute construction of a training sample according to the target material image and expression characteristic information corresponding to the target material image;
and the training module is configured to train a preset neural network based on the training sample to obtain a facial expression extraction model.
7. An apparatus for generating a face image, the apparatus comprising:
an image material acquisition module configured to perform acquisition of a reference face image and a three-dimensional face material;
the expression characteristic information acquisition module is configured to input the reference face image into a facial expression extraction model to obtain expression characteristic information of the reference face image;
a three-dimensional mapping face model construction module configured to perform three-dimensional reconstruction according to the three-dimensional face material and the expression feature information of the reference face image to obtain a three-dimensional mapping face model;
The target face image output module is configured to project the three-dimensional mapping face model to obtain a target face image;
the facial expression extraction model is obtained according to the method for generating the facial expression extraction model according to any one of claims 1 to 4.
8. An electronic device, comprising:
a processor;
a memory for storing the processor-executable instructions;
wherein the processor is configured to execute the instructions to implement the facial expression extraction model generation method of any one of claims 1 to 4 or the facial image generation method of claim 5.
9. A computer-readable storage medium in which instructions, when executed by a processor of an electronic device, enable the electronic device to perform the facial expression extraction model generation method of any one of claims 1 to 4 or the facial image generation method of claim 5.
10. A computer program product comprising a computer program, wherein the computer program, when executed by a processor, implements the facial expression extraction model generation method of any one of claims 1 to 4 or the facial image generation method of claim 5.
CN202110251948.4A 2021-03-08 2021-03-08 Facial expression extraction model generation method and device and facial image generation method and device Active CN113095134B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110251948.4A CN113095134B (en) 2021-03-08 2021-03-08 Facial expression extraction model generation method and device and facial image generation method and device

Publications (2)

Publication Number Publication Date
CN113095134A true CN113095134A (en) 2021-07-09
CN113095134B CN113095134B (en) 2024-03-29

Family

ID=76667024

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110251948.4A Active CN113095134B (en) 2021-03-08 2021-03-08 Facial expression extraction model generation method and device and facial image generation method and device

Country Status (1)

Country Link
CN (1) CN113095134B (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114880057A (en) * 2022-04-22 2022-08-09 北京三快在线科技有限公司 Image display method, image display device, terminal, server, and storage medium

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2019128508A1 (en) * 2017-12-28 2019-07-04 Oppo广东移动通信有限公司 Method and apparatus for processing image, storage medium, and electronic device
CN109087379A (en) * 2018-08-09 2018-12-25 北京华捷艾米科技有限公司 The moving method of human face expression and the moving apparatus of human face expression
CN109255830A (en) * 2018-08-31 2019-01-22 百度在线网络技术(北京)有限公司 Three-dimensional facial reconstruction method and device
CN111445568A (en) * 2018-12-28 2020-07-24 广州市百果园网络科技有限公司 Character expression editing method and device, computer storage medium and terminal
WO2021012590A1 (en) * 2019-07-22 2021-01-28 广州华多网络科技有限公司 Facial expression shift method, apparatus, storage medium, and computer device
CN111768477A (en) * 2020-07-06 2020-10-13 网易(杭州)网络有限公司 Three-dimensional facial expression base establishment method and device, storage medium and electronic equipment

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
PEI Yuru; ZHA Hongbin: "The Shape and Expression Space of Realistic Human Faces" (真实感人脸的形状与表情空间), Journal of Computer-Aided Design & Computer Graphics (计算机辅助设计与图形学学报), no. 05 *

Also Published As

Publication number Publication date
CN113095134B (en) 2024-03-29

Similar Documents

Publication Publication Date Title
US10789453B2 (en) Face reenactment
US11257299B2 (en) Face model processing for facial expression method and apparatus, non-volatile computer-readable storage-medium, and electronic device
CN111476871B (en) Method and device for generating video
CN110675475B (en) Face model generation method, device, equipment and storage medium
CN112967174B (en) Image generation model training, image generation method, image generation device and storage medium
US20240046557A1 (en) Method, device, and non-transitory computer-readable storage medium for reconstructing a three-dimensional model
CN111127668B (en) Character model generation method and device, electronic equipment and storage medium
KR20210040555A (en) Apparatus, method and computer program for providing facial motion retargeting of virtual character based on basis model
US20220237917A1 (en) Video comparison method and apparatus, computer device, and storage medium
CN113269700A (en) Video generation method and device, electronic equipment and storage medium
CN115984447B (en) Image rendering method, device, equipment and medium
WO2022148248A1 (en) Image processing model training method, image processing method and apparatus, electronic device, and computer program product
CN113095134B (en) Facial expression extraction model generation method and device and facial image generation method and device
CN111599002A (en) Method and apparatus for generating image
WO2024088111A1 (en) Image processing method and apparatus, device, medium, and program product
CN113822114A (en) Image processing method, related equipment and computer readable storage medium
CN109934926B (en) Model data processing method, device, readable storage medium and equipment
CN113538639B (en) Image processing method and device, electronic equipment and storage medium
CN114841851A (en) Image generation method, image generation device, electronic equipment and storage medium
CN113361380A (en) Human body key point detection model training method, detection method and device
CN116206026B (en) Track information processing method, track information processing device, computer equipment and readable storage medium
CN117808940A (en) Image generation method and device, storage medium and terminal
CN114581946B (en) Crowd counting method and device, storage medium and electronic equipment
CN111581411B (en) Method, device, equipment and storage medium for constructing high-precision face shape library
WO2023029289A1 (en) Model evaluation method and apparatus, storage medium, and electronic device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant