CN116012550A - Face deformation target correction method and device, equipment, medium and product thereof - Google Patents


Info

Publication number
CN116012550A
CN116012550A
Authority
CN
China
Prior art keywords
face
deformation target
coefficients
region
coefficient
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310089427.2A
Other languages
Chinese (zh)
Inventor
Gao Jie (高杰)
Current Assignee
Bigo Technology Pte Ltd
Original Assignee
Bigo Technology Pte Ltd
Priority date
Filing date
Publication date
Application filed by Bigo Technology Pte Ltd
Priority to CN202310089427.2A
Publication of CN116012550A
Legal status: Pending


Landscapes

  • Processing Or Creating Images (AREA)

Abstract

The application relates to a face deformation target correction method and a device, equipment, medium and product thereof. The method comprises the following steps: acquiring a base face coefficient set, wherein the base face coefficient set comprises neutral face coefficients and average face coefficients of a plurality of pre-acquired face bases, and the face coefficients comprise face shape coefficients; correspondingly dividing the base face coefficient set into regional face coefficients of each face region according to the face key point information corresponding to each preset face region in the face mesh of the three-dimensional face model; optimizing the vertices of the corresponding regions in the face mesh according to the regional face coefficients corresponding to each face region, so as to determine the face shape coefficients of the face mesh; and correcting the face deformation target of the three-dimensional face model according to the optimized face shape coefficients. While keeping the amount of computation low, the method allows each specific face region of the face deformation target to obtain a refined optimization effect.

Description

Face deformation target correction method and device, equipment, medium and product thereof
Technical Field
The application relates to the technical field of digital people, in particular to a face deformation target correction method and a device, equipment, medium and product thereof.
Background
The facial expression tracking technology has wide application in the digital human field, is a problem widely studied in academia and industry, and is mainly divided into two major categories, namely a traditional method and a deep learning method.
The traditional method uses a self-built 3D face base, defines energy terms through strict mathematical derivation, and fits the shape and expression of the face by minimizing these energy terms to optimize the identity and expression coefficients: the identity coefficient α fits the shape of the face, and the expression coefficient β fits its expression. The deep learning method trains a model end to end on acquired training images and their corresponding expression labels, so that the model learns identity and expression coefficients to fit the face and its expression.
In existing traditional methods, a neutral face is typically extracted from the self-built 3D face base to optimize the identity coefficients. The whole face is then optimized at once, so facial details are easily ignored and the optimization objective is difficult to converge.
For the deep learning method, on the one hand, a well-designed network model is required, along with a large amount of resources to train the model to convergence. On the other hand, as an end-to-end black box, a deep learning model makes it difficult to finely optimize the specific coefficient of each expression, so abnormal expressions may occur.
Therefore, the various prior arts optimize the face image of a digital person only coarsely, and it is difficult to obtain high-quality face images.
Disclosure of Invention
The present application aims to solve the above-mentioned problems and provide a face deformation target correction method, a corresponding device, equipment, a non-volatile readable storage medium and a computer program product thereof.
According to one aspect of the present application, there is provided a face deformation target correction method, including the steps of:
acquiring a base face coefficient set, wherein the base face coefficient set comprises neutral face coefficients and average face coefficients of a plurality of face bases acquired in advance, and the face coefficients comprise face shape coefficients;
according to the face key point information corresponding to each preset face area in the face grid of the three-dimensional face model, correspondingly dividing the base face coefficient set into area face coefficients of each face area;
Optimizing the vertexes of the corresponding areas in the face grids according to the area face coefficients corresponding to each face area to determine the face shape coefficients of the face grids;
and correcting the face deformation target of the three-dimensional face model according to the optimized face shape coefficient.
According to another aspect of the present application, there is provided a face deformation target correction device, including:
the base coefficient acquisition module is used for acquiring a base face coefficient set, wherein the base face coefficient set comprises neutral face coefficients and average face coefficients of a plurality of face bases acquired in advance, and the face coefficients comprise face shape coefficients;
the face region segmentation module is used for correspondingly segmenting the base face coefficient set into regional face coefficients of each face region according to the face key point information corresponding to each preset face region in the face grid of the three-dimensional face model;
the vertex partition optimization module is used for optimizing the vertices of the corresponding areas in the face grid according to the area face coefficients corresponding to each face area so as to determine the face shape coefficients of the face grid;
and the deformation target correction module is used for correcting the face deformation target of the three-dimensional face model according to the optimized face shape coefficient.
According to another aspect of the present application, there is provided a face deformation target correction apparatus comprising a central processor and a memory, the central processor being arranged to run a computer program stored in the memory so as to perform the steps of the face deformation target correction method described herein.
According to another aspect of the present application, there is provided a non-transitory readable storage medium storing, in the form of computer-readable instructions, a computer program implementing the face deformation target correction method, the computer program performing the steps of the method when invoked and run by a computer.
According to another aspect of the present application, there is provided a computer program product comprising computer programs/instructions which when executed by a processor implement the steps of the method as described in any of the embodiments of the present application.
Compared with the prior art, the face deformation target of the three-dimensional face model is corrected according to the face shape coefficients. While keeping the amount of computation low, each specific face region of the face deformation target can obtain a refined optimization effect, so that the face image generated from the face deformation target is more natural and fine, the imaging effect is good and stable, and the digital figure generated with the face deformation target acquires an excellent appearance.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings that are needed in the description of the embodiments will be briefly introduced below, and it is obvious that the drawings in the following description are only some embodiments of the present application, and that other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
Fig. 1 is a schematic diagram of a network architecture of an exemplary network live scenario of the present application;
FIG. 2 is a flowchart illustrating an embodiment of a face deformation target correction method according to the present application;
FIG. 3 is a schematic diagram of a partition relationship of face regions obtained by partitioning a face grid according to the present application;
fig. 4 is a flow chart of acquiring a regional face coefficient of each face region in the embodiment of the present application;
FIG. 5 is a flow chart of performing different optimization operations on different face regions in an embodiment of the present application;
FIG. 6 is a flowchart of a face deformation target correction process for distinguishing whether a key frame is performed according to an embodiment of the present application;
fig. 7 is a schematic flow chart of determining a keyframe based on a projection distance of a face deformation target in an embodiment of the present application;
Fig. 8 is a schematic flow chart of implementing digital live broadcasting by applying a face deformation target in an embodiment of the present application;
FIG. 9 is a schematic block diagram of a face deformation target correction device of the present application;
fig. 10 is a schematic structural diagram of a face deformation target correction device used in the present application.
Detailed Description
Referring to fig. 1, a network architecture adopted in an exemplary network live broadcast application scenario of the present application may be used to deploy a live broadcast service based on digital persons, and implement multiple purposes such as entertainment, e-commerce sales, explanation, etc. by performing network live broadcast by digital persons. Similarly, the network architecture of the present application may also be used to deploy services based on digital person social communications, gaming, entertainment, and the like.
The application server 81 shown in fig. 1 may be used to support the operation of the live service, while the media server 82 may be used to store or forward video streams of users, wherein a terminal device such as a computer 83, a mobile phone 84, etc. is typically provided as a client to the end user for uploading or downloading of playing video streams.
The method or apparatus of the present application may be implemented by programming as a computer program product, and run in the application server 81, the media server 82, and the terminal devices 83 and 84, where a face deformation target of a three-dimensional face model of a digital person generated according to a given face image is optimized, so as to apply the face deformation target to the three-dimensional face model of the digital person and render a face image of the corresponding digital person.
In another exemplary application scenario, the technical solution of the present application may be implemented in a terminal device independent of a public network, and by running the computer program product, a corresponding digital face image is generated for a face image of a real person input by a user, so as to create a corresponding cartoon image.
The above application scenarios are merely exemplary; the technical scheme implemented by the application is in fact a basic technology, and can generally be applied in any application scenario whose requirements match.
The face deformation target may be any face deformation target generated in any manner, namely a Blendshape: a set of parameter feature vectors used to drive each key point in the face mesh of a digital person's three-dimensional face model into a given action state. After a face deformation target is applied to the three-dimensional face model of a digital person, the model is switched to the expression pose described by the face deformation target; rendering and projecting the model into two-dimensional space then generates the digital person's face image corresponding to the face deformation target. Different three-dimensional face models may have different parameter feature vector representations. For example, in one embodiment, the parameter feature vector of a vertex may be represented as v = (R_i, t_i, s_i, δ_i), whose components are the rotation coefficient R_i, translation coefficient t_i, scaling coefficient s_i, and expression coefficient δ_i. In other embodiments, the parameter feature vector may instead be represented as each vertex's coordinates on the three axes of three-dimensional space together with its corresponding expression coefficients, as determined by the control scheme of the three-dimensional face model.
The face deformation target may be obtained with a statistics-based 3DMM (3D Morphable Model, a deformable face model) or generated automatically by a deep learning model.
Referring to fig. 2, in one embodiment, a method for correcting a face deformation target provided in the present application includes the following steps:
Step S1200, acquiring a base face coefficient set, wherein the base face coefficient set comprises neutral face coefficients and average face coefficients of a plurality of pre-acquired face bases, and the face coefficients comprise face shape coefficients;
in an embodiment in which the face deformation target is generated by using a 3DMM as the face generation model, the main flow may be divided into the following steps: a) load the self-built 3D face base; b) detect the face key points (2D landmarks) in the video frame; c) according to the face key points and the self-built 3D face base, establish an energy term related to the expression coefficient, and optimize the expression coefficient δ by minimizing this energy term; d) according to the face key points, the self-built 3D face base and the expression coefficient, establish an energy term related to the identity coefficient, and optimize the identity coefficient α by minimizing this energy term; e) fit a face image according to the optimized identity coefficient α and expression coefficient δ.
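The alternating optimization of steps c) and d) can be sketched as follows. This is a minimal, hedged illustration, not the patent's implementation: the projection is assumed to be folded into pre-flattened landmark vectors and bases, and all names and shapes are assumptions for illustration.

```python
import numpy as np

def fit_3dmm(landmarks, mean, B_id, B_exp, n_iters=10):
    """Alternating least-squares sketch of steps c) and d): fix the identity
    coefficient alpha while solving for the expression coefficient delta,
    then fix delta while solving for alpha (linearized energy terms)."""
    alpha = np.zeros(B_id.shape[1])
    delta = np.zeros(B_exp.shape[1])
    for _ in range(n_iters):
        # step c): minimize the expression energy term with alpha fixed
        delta, *_ = np.linalg.lstsq(B_exp, landmarks - mean - B_id @ alpha, rcond=None)
        # step d): minimize the identity energy term with delta fixed
        alpha, *_ = np.linalg.lstsq(B_id, landmarks - mean - B_exp @ delta, rcond=None)
    return alpha, delta
```

Each pass reduces the joint residual, mirroring the document's description of minimizing one energy term per coefficient at a time.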
In the embodiment in which the face deformation target is generated by a deep learning model as the face generation model, an image-label pair is first constructed, and a deep learning model is built to learn the expression coefficients so as to fit the facial expression and generate a face image. The main flow may be divided into the following steps: a) construct image-label pairs, where the label is the ground-truth expression coefficient; b) construct the network structure of the deep learning model, and output predicted expression coefficients from randomly initialized parameters; c) calculate the loss function and back-propagate to optimize the network parameters; d) iteratively optimize the network parameters so that the trained model learns the expression coefficients; e) in the inference stage, the expression coefficient of the current image frame is obtained simply by feeding the image frame into the deep learning model, so that the facial expression is fitted to generate a face image.
The face deformation targets generated by the models are generated based on the face grids in the three-dimensional face model of the digital person, so that the face deformation targets with better quality can be obtained by optimizing the face grids. Particularly, when the image frame for generating the face deformation target is a key frame in a video stream, the generated face deformation target can be ensured to be more accurate by reconstructing the face grid due to the fact that the facial expression action in the image frame generates larger change.
As a basis for constructing a face mesh, a plurality of pre-acquired 3D face bases is usually prepared. These face bases are pre-acquired for a plurality of three-dimensional face models, and the parameter description of the face mesh corresponding to each model's expressionless state is taken as a neutral face coefficient. A face coefficient maps to a corresponding expression state through control information describing each vertex, the control information comprising a face shape coefficient describing the corresponding face shape and a facial expression coefficient describing the corresponding facial expression. A base face coefficient set may be constructed from these neutral face coefficients to reconstruct the face mesh required to generate the face deformation target.
In one embodiment, to ensure stability of the reconstructed face mesh, after obtaining each neutral face coefficient, an average face coefficient may be further obtained by averaging all neutral face coefficients, and the average face coefficient and all neutral face coefficients are combined together to construct the base face coefficient set. The average face coefficient exists in the base face coefficient set, so that the solving space of all neutral face coefficients is unified, and the face grid reconstructed subsequently has higher stability.
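The construction of the base face coefficient set described above might look like the following sketch. The matrix layout (one coefficient vector per row, average face first) is an assumption for illustration, not the patent's data format.

```python
import numpy as np

def build_base_face_set(neutral_coeffs):
    """Combine pre-acquired neutral face coefficients with their mean into
    the base face coefficient set: the average face coefficient unifies the
    solving space of all neutral face coefficients."""
    neutral = np.stack(neutral_coeffs)           # (n_subjects, dim)
    average = neutral.mean(axis=0)               # the average face coefficient
    base_set = np.vstack([average[None, :], neutral])  # average + all neutrals
    return average, base_set
```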
In order to maintain consistency with the face deformation target generated by the model, the average face coefficient and the neutral face coefficients may be generated according to the expression coefficient δ of that face deformation target. In one embodiment, the average face coefficient m_id and the feature matrix B_id formed by the neutral face coefficients may be obtained with the following formulas:

m_id = b_0 = U_0 × δ′

B_id = {b_1, b_2, ..., b_n}, b_i = U_i × δ′

wherein the face base Model ∈ R^(a×(b+1)×(c+1)), a is the number of vertices of each face mesh, b is the number of face identity types in the self-built face base, and c is the number of predefined expression types; U_i ∈ R^(a×(c+1)) represents the i-th face and U_0 represents the average face; δ′ = (δ_0, δ) ∈ R^((c+1)×1) is the expression coefficient of the face deformation target obtained from the model, which plays the role of a weight constraint, δ_0 being the weight of the neutral expression of the average face;
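A minimal sketch of computing m_id = U_0 × δ′ and b_i = U_i × δ′ follows. The neutral-expression weight δ_0 = 1 − Σδ is an assumed convention (chosen so the weights sum to one); the patent defines δ_0 only in a figure, so this choice is illustrative.

```python
import numpy as np

def expression_weighted_bases(U, delta):
    """U: (n_faces + 1, a, c + 1) face basis tensor with U[0] the average face;
    delta: (c,) expression coefficients of the model's face deformation target.
    ASSUMPTION: delta_0 = 1 - sum(delta) for the neutral-expression weight."""
    dprime = np.concatenate([[1.0 - delta.sum()], delta])   # delta' = (delta_0, delta)
    m_id = U[0] @ dprime                                    # average face coefficient
    B_id = np.stack([U_i @ dprime for U_i in U[1:]])        # neutral face coefficients
    return m_id, B_id
```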
similarly, in an embodiment, according to actual needs, an average facial expression m_exp and an expression matrix B_exp may be constructed from the self-built face base according to the face shape coefficient α of the current face deformation target:

m_exp = b_0 = E_0 × α′

B_exp = {b′_1, b′_2, ..., b′_n}, b′_i = E_i × α′
It can be seen that, with the face shape coefficient and facial expression coefficient determined as needed from the face deformation target obtained by the model, the neutral face coefficients and average face coefficient required for reconstructing the face mesh can be obtained correspondingly, and the face mesh required by the model for generating the face deformation target can be optimized according to the base face coefficient set formed by these face coefficients.
Step S1300, correspondingly dividing the base face coefficient set into regional face coefficients of each face region according to the face key point information corresponding to each preset face region in the face grid of the three-dimensional face model;
as described above, each face generation model generates the face deformation target corresponding to the face mesh of one digital person's three-dimensional face model. The face key point information in the face image input to the face generation model must be mapped into the face mesh so that the mesh can be synchronized with the shape and expression of the face in the image. Therefore, when the face mesh is reconstructed, the face key points in the input face image and the vertices in the face mesh need to be aligned, so that the facial expression can be migrated from the real person's face image to the digital person's face image according to this correspondence. The face key point information of the face image input to the face generation model can be extracted with any feasible face key point detection model.
In the application, when the corresponding relation between the face key points of the input face image and the vertexes of the face grids of the digital person is established, the face is divided into a plurality of face areas, and the mapping relation data between the face key points and the vertexes are established according to different face areas, so that the partition mapping processing is realized.
In one embodiment, as shown in fig. 3, the face area includes five face areas, that is, a left eye area and a right eye area, a brow area, a nose area, a mouth area, and a face contour area, and each face area has a symmetrical structure. In other embodiments, only the eye region and the brow region may be combined into the same face region corresponding to the top half, and the other regions may be combined into the same face region. Alternatively, the different face regions may be divided in other manners, so that corresponding optimization processing is performed on vertices of the face mesh corresponding to the different face regions.
The base face coefficient set is the basic data for reconstructing the digital person's face mesh; therefore it must also be divided, according to the correspondence between the face key points and the vertices, into regional face coefficients for each face region, so that each regional face coefficient only comprises the neutral face coefficients and average face coefficient corresponding to its face region.
In one embodiment, the regional face coefficients corresponding to each face region may be determined as follows:
Firstly, aligning a face region to which a face key point belongs with a face region to which a face grid of the digital person belongs to obtain a vertex set of each face region:
V = {L_brow, L_eye, L_nose, L_mouth, L_outer_face}

wherein L is an index of the vertex set of each face region corresponding to the face mesh, and the subscripts brow, eye, nose, mouth, outer_face correspond to the brow region, eye region, nose region, mouth region and face contour region, respectively, by way of example.

Then, the base of each region is extracted from the base face coefficient set according to the vertex set V; the formulas for region-segmenting the corresponding average face coefficient m_id and neutral face coefficients B_id are expressed as follows:

m′_id = {m_brow, m_eye, m_nose, m_mouth, m_outer_face}

wherein each regional coefficient is obtained by indexing m_id with the corresponding vertex set, e.g. m_brow = m_id[L_brow];

and B′_id = {B_brow, B_eye, B_nose, B_mouth, B_outer_face}

wherein, likewise, each regional matrix is obtained by indexing the vertex dimension of B_id with the corresponding vertex set, e.g. B_brow = B_id[:, L_brow].
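The region segmentation of m_id and B_id can be sketched as plain index selection. The dictionary representation and region names are illustrative assumptions; the vertex index sets stand in for the aligned keypoint-to-mesh sets V described above.

```python
import numpy as np

def split_by_region(m_id, B_id, regions):
    """Index the average face coefficient m_id (a,) and the neutral
    coefficient matrix B_id (n, a) with per-region vertex index sets
    to obtain the regional face coefficients m'_id and B'_id."""
    m_split = {name: m_id[idx] for name, idx in regions.items()}
    B_split = {name: B_id[:, idx] for name, idx in regions.items()}
    return m_split, B_split
```

Because each region only sees its own slice of the base set, later per-region optimizations are decoupled from one another.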
step S1400, optimizing the vertexes of the corresponding areas in the face grids according to the area face coefficients corresponding to each face area to determine the face shape coefficients of the face grids;
After determining the regional face coefficients of the corresponding face regions, namely m′_id and B′_id, a base face coefficient subset corresponding to each face region has in effect been obtained from the base face coefficient set. Then, according to the regional face coefficients, the vertices of the corresponding face regions in the face mesh required by the face generation model are optimized; the main task is to optimize the face shape coefficients α′ = {α_brow, α_eye, α_nose, α_mouth, α_outer_face} corresponding to each region, which may specifically be done with the following formula:
α̂, β̂ = argmin_(α,β) Σ_i Σ_j ‖ P · X_(i,j)(α, β) − l_(i,j) ‖², X_(i,j)(α, β) = (m′_id + B′_id · α + B_exp · β)_(i,j)

wherein i, j respectively denote the i-th frame face coefficient and the j-th key point in the base face coefficient subset; α̂ denotes the face shape coefficient fitting the current face mesh, β̂ denotes the facial expression coefficient fitting the current face mesh, l_(i,j) is the face key point corresponding to the optimized vertex X_(i,j), and P is the projection matrix from three-dimensional space to two-dimensional space, defined by:

(x′, y′, z′)ᵀ = R · (x, y, z)ᵀ + T, u = f · x′ / z_cam + c_x, v = f · y′ / z_cam + c_y

wherein R is a rotation matrix, T is a translation vector, f is the focal length, (c_x, c_y) is the center position of the projection plane, z_cam denotes the vertical distance from the coordinate point to the camera's projection plane, x, y, z are the 3D coordinates, and u, v are the coordinates on the 2D pixel plane.
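The projection P described above corresponds to a standard pinhole model, sketched below. The array shapes are assumptions; R, T, f, c_x, c_y and z_cam follow the definitions in the text.

```python
import numpy as np

def project_points(X, R, T, f, cx, cy):
    """Project 3D vertices X (n, 3) to 2D pixels: rotate/translate into
    camera space, divide by the distance z_cam to the projection plane,
    scale by the focal length f, and offset by the plane center (cx, cy)."""
    Xc = X @ R.T + T                     # camera-space coordinates (x', y', z_cam)
    u = f * Xc[:, 0] / Xc[:, 2] + cx
    v = f * Xc[:, 1] / Xc[:, 2] + cy
    return np.stack([u, v], axis=1)
```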
As can be seen, the above formula may be modified to optimize the face shape coefficient α̂ of the face mesh alone, or used as-is to optimize the face shape coefficient and facial expression coefficient α̂, β̂ of the face mesh simultaneously. The optimization of the vertices in the face mesh is achieved by minimizing the error of the face shape coefficients of the face mesh according to the base face coefficient subset corresponding to each face region, and performing this optimization on the vertices of each face region in the face mesh achieves the reconstruction of the face mesh.
To balance the complexity and performance of the optimization, two regularization terms may be introduced on top of the above optimization formula, as follows:

E = E_fit + λ_1 ‖α_k − α_(k−1)‖ + λ_2 ‖α_k‖

wherein E_fit is the data term of the preceding optimization formula, λ_1‖α_k − α_(k−1)‖ is the regular term corresponding to the face deformation target, with α_(k−1) the coefficient of the previous frame, and λ_2‖α_k‖ is the regular term corresponding to the face shape coefficient.
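For a linearized per-region fit, the two regularizers admit a closed-form ridge solve, sketched below. Taking the regularizers in squared form (λ_1‖α − α_prev‖² and λ_2‖α‖²) is an assumption made so that a single linear solve suffices; the patent's norms may differ.

```python
import numpy as np

def solve_region_alpha(B, target, alpha_prev, lam1, lam2):
    """Minimize ||B a - target||^2 + lam1 ||a - alpha_prev||^2 + lam2 ||a||^2
    for one face region: the normal equations give a single linear system."""
    n = B.shape[1]
    A = B.T @ B + (lam1 + lam2) * np.eye(n)
    rhs = B.T @ target + lam1 * alpha_prev
    return np.linalg.solve(A, rhs)
```

Setting lam1 > 0 pulls the solution toward the previous frame's coefficient, which is the inter-frame smoothing role the text assigns to the first regular term.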
In some embodiments, each face region is optimized mainly by non-linear least squares and/or linear least squares (LSQLIN).
According to the above process, when vertex optimization is performed on the face mesh, it is based on the base face coefficient subset of each corresponding face region, so the optimizations of different face regions are decoupled from one another and the optimization of each face region is more targeted. A finer and more precise optimization effect can thus be obtained: the optimized face mesh achieves an excellent reconstruction, with a more precise face shape and a more natural facial expression. The optimized face mesh also determines the corresponding face shape parameters.
Step S1500, correcting the face deformation target of the three-dimensional face model according to the optimized face shape coefficients.
After the reconstruction of the face mesh is completed through the above process, the face shape parameters of the face mesh can be fixed. The face deformation target generated by the face generation model for the digital person's three-dimensional face model can then be corrected according to these face shape coefficients in the conventional manner, i.e. the face deformation target is optimized based on the reconstructed face mesh so that it fits the reconstructed mesh. The resulting face deformation target expresses the face shape and facial expression more accurately, and when it is applied to the digital person's three-dimensional face model, which is rendered and projected to obtain the corresponding face image, the facial expression in that image is more accurate, natural and fine.
When the face generation model generates corresponding face deformation targets for each image frame in the video stream one by one, after the face deformation targets are generated for one corresponding initial image frame, after the reconstruction of the face grid of the digital person is completed according to the above process, the face grid obtained by reconstruction can serve the subsequent image frames with the same face content, and the corresponding face deformation targets are generated for the subsequent image frames. Of course, if the face image therein generates a larger change in the subsequent image frame, a new face mesh can still be reconstructed according to the above procedure.
It can be seen that in the above process of performing optimization on the face mesh, no complicated deep learning model is needed, and the operation amount is relatively low, so that the method is friendly to calculation.
According to the above embodiment, the base face coefficient set formed by the neutral face coefficients and average face coefficients of the pre-collected face bases is divided, according to the preset face regions, into the regional face coefficients corresponding to each face region. Then, for each face region, the vertices of the corresponding region in the face mesh of the three-dimensional face model are optimized according to the corresponding regional face coefficients, so that the face mesh is reconstructed and the face shape coefficients, i.e. the identity coefficients, are determined accordingly. Finally, the face deformation target of the three-dimensional face model is corrected according to these face shape coefficients. While keeping the amount of computation low, a refined optimization effect can be obtained for each specific face region of the face deformation target, so that the face image generated from the face deformation target is more natural and fine, the imaging effect is good and stable, and the digital face image generated with the face deformation target acquires an excellent texture.
On the basis of any embodiment of the present application, referring to fig. 4, the step of dividing the base face coefficient set into the regional face coefficients of each face region according to the face key point information corresponding to each preset face region in the face grid of the three-dimensional face model includes:
step S1310, performing face key point detection on a face image for generating the face deformation target to obtain face key point information of a face area image in the face image;
A face detection model pre-trained to convergence is used to perform face detection on the picture or image frame for which the face deformation target is to be generated, obtaining the face bounding-box information in that picture or image frame. The face bounding box calibrates the position and size of the face in the image, and the calibration result can be represented by a set of four coordinate elements S_roi:

S_roi = {x_1, y_1, x_2, y_2}

where (x_1, y_1) are the pixel coordinates of the upper-left corner of the detected face, and (x_2, y_2) are the pixel coordinates of its lower-right corner.
The region image delimited by this set is cropped from the picture or image frame, yielding the face image. The face image fully contains the image content of the face while the redundant non-face regions are removed, and it can be input into the face generation model to generate the corresponding face deformation target.
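By way of illustration only, the cropping of the face region by the set S_roi can be sketched as follows; the frame size and box coordinates are hypothetical:

```python
import numpy as np

def crop_face(frame: np.ndarray, s_roi: dict) -> np.ndarray:
    """Crop the face region calibrated by S_roi = {x1, y1, x2, y2} from a frame."""
    x1, y1, x2, y2 = s_roi["x1"], s_roi["y1"], s_roi["x2"], s_roi["y2"]
    return frame[y1:y2, x1:x2]  # image rows index y, columns index x

# Hypothetical 640x480 frame and detected face box
frame = np.zeros((480, 640, 3), dtype=np.uint8)
face = crop_face(frame, {"x1": 200, "y1": 100, "x2": 360, "y2": 300})
# face.shape == (200, 160, 3): only the face region remains
```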
A face key point detection model pre-trained to convergence is then applied to the obtained face image to obtain the face key point information. The face key points characterize the key region locations of the face, such as the brows, eyes, nose, mouth, and facial contour. The detected face key points can be expressed as a set of points L_n:

L_n = {(x_1, y_1), (x_2, y_2), …, (x_n, y_n)}

where n is the number of face key points. The number of key points can be set by the practitioner according to actual requirements, for example 5, 30, 68, 106, or 240, and is not limited in the embodiments of the present application.
Both the face detection model and the face key point detection model are preferably implemented as neural network models; in practice, any well-performing face detection and face key point detection models from the related art may be used.
Step S1320, aligning each face key point in the face key point information with a standard face vertex in a face grid of the three-dimensional face model, and determining a mapping relation between the face key point and the vertex in each face area;
the face outline in the face image has different angles and sizes due to the diversity of actual scenes, and is easy to interfere the subsequent three-dimensional face parameter calibration work. Standard alignment of the face image is therefore required.
The face mesh of the digital person's three-dimensional face model describes the facial structure with vertices. To place the face key point information of the face image in correspondence with the vertex information of the face mesh, an alignment operation between the two must be performed. The vertex information of the face mesh can be obtained by projecting the face mesh of the digital person's three-dimensional face model into two-dimensional space, obtaining each standard face vertex and thereby forming the vertex information.
In this embodiment, point-to-point mapping relationship data between the face key points and the vertices of each face region is established according to a preset mapping relationship between face key points and vertices over the plurality of face regions. In the example of fig. 3, the face mesh is divided, according to the semantics of each facial part and their symmetry, into five parts corresponding to the brows, eyes, nose, mouth, and facial contour; for an exemplary digital person with a preset total number of vertices, the vertices of the face mesh are counted per face region, and the resulting point positions of each face region are shown in the following table:
[Table: point positions of each face region, reproduced as images in the original publication]
The face key point information in a face image is typically represented with 106 or 204 points, and only the bases corresponding to the face key points of each face region need to participate in the optimization during face mesh reconstruction. The indexes of the face key points must therefore be aligned with the vertex indexes of each region of the face mesh, building the point-to-point mapping relationship data.
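For illustration, the point-to-point mapping data can be sketched as follows; the key-point and vertex indices below are hypothetical stand-ins for the index tables of the actual key point detector and face mesh:

```python
# Hypothetical index tables: face key-point indices (e.g. from a 106-point
# detector) and face-mesh vertex indices, listed per semantic face region.
region_keypoints = {
    "brow": [33, 34, 35, 36],
    "eye":  [52, 53, 54, 55],
}
region_vertices = {
    "brow": [1001, 1002, 1003, 1004],
    "eye":  [2001, 2002, 2003, 2004],
}

# Point-to-point mapping data: within each region, pair the i-th key point
# with the i-th mesh vertex so their indexes are aligned.
mapping = {
    region: list(zip(region_keypoints[region], region_vertices[region]))
    for region in region_keypoints
}
# mapping["brow"][0] == (33, 1001)
```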
Step S1330, segmenting, according to the mapping relation, the regional face coefficients corresponding to each preset face region in the face grid from the base face coefficient set, each regional face coefficient comprising the neutral face coefficients and average face coefficients of the corresponding region.
After the mapping relation between the face key points in the face image and the vertices of the face mesh is determined for each face region, the vertex information corresponding to each face region can be determined from the face key point information of that region provided by the face image. The base face coefficient set is then segmented according to the vertex set of each face region, yielding a base face coefficient subset for each face region; each such subset is the regional face coefficients of the corresponding face region and, likewise, comprises the neutral face coefficients and average face coefficients of that region. The specific calculation process is as described in the foregoing embodiments and is not repeated here.
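As an illustration of this segmentation, the base face coefficient set can be split by per-region vertex index sets with plain array indexing; the coefficient values and index sets below are synthetic stand-ins:

```python
import numpy as np

n_vertices = 10
# Base face coefficient set: per-vertex neutral and average face coefficients
# (synthetic values; real sets come from the pre-collected face bases)
neutral = np.arange(n_vertices * 3, dtype=float).reshape(n_vertices, 3)
average = neutral + 0.5

# Hypothetical vertex index sets of two face regions
region_vertices = {"brow": [0, 1, 2], "eye": [3, 4]}

# Segment the set into regional face coefficients by fancy indexing;
# each subset keeps both the neutral and the average coefficients.
regional = {
    region: {"neutral": neutral[idx], "average": average[idx]}
    for region, idx in region_vertices.items()
}
# regional["eye"]["neutral"].shape == (2, 3)
```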
As can be seen from the above embodiment, face key point detection is performed on the face image located by the face detection model in a picture or image frame to obtain the corresponding face key point information, and the alignment operation then establishes the mapping relationship among the face key point information of the face image, the vertex information of the digital person's face mesh, and the preset face regions. The base face coefficient set can thereby be segmented more accurately into the face coefficients of each region, to be used for the regional optimization of the face mesh. Because the regional face coefficients are determined by partition, the data needed to reconstruct the face mesh in each face region is more accurate; stray interference, such as that caused by position offset and scale offset or by the redundant information of non-face regions, can be eliminated, making the optimization of the face mesh more precise.
On the basis of any embodiment of the present application, referring to fig. 5, optimizing vertices of corresponding regions in the face mesh according to region face coefficients corresponding to each face region to determine face shape coefficients of the face mesh includes:
step S1410, based on a nonlinear optimization algorithm, performing a first optimization operation on vertices corresponding to other face regions outside the top half of the face;
Owing to the natural structure of the human body, the top half of the face and the remaining parts of the face exhibit different shape-change characteristics when expressing emotion, so separate optimization operations are performed on the face regions belonging to the top half of the face and on the other face regions. In this embodiment, the brow region and eye region of the previous example constitute the top half of the face, while the nose region, mouth region, and face contour region constitute the other face regions.

For these other face regions, considering that the lower half of the face moves substantially when expressing emotion, a nonlinear optimization algorithm can be adopted for the first optimization operation so as to improve the optimization quality.
Recalling the previous example, the transformation from three-dimensional space to two-dimensional space through the projection matrix P is defined as:

z_cam · [u, v, 1]^T = [[f, 0, c_x], [0, f, c_y], [0, 0, 1]] · (R · [x, y, z]^T + T)

where R is the rotation matrix, T is the translation vector, f is the focal length, (c_x, c_y) is the center position of the projection plane, z_cam is the perpendicular distance from the coordinate point to the camera projection plane, (x, y, z) are the 3D coordinates, and (u, v) are the coordinates on the 2D pixel plane.
To carry out the nonlinear optimization, the reprojection error over the 2D pixel-plane coordinates is established as:

E = (u − u_0)² + (v − v_0)²

where (u_0, v_0) are the known projection coordinates, i.e. the face key point coordinates. Since the 2D coordinates (u, v) are nonlinear functions of the 3D coordinates (x, y, z), nonlinear least-squares optimization of the reprojection error can be performed with the Gauss-Newton method.
When optimizing the face shape coefficients α_nose, α_mouth and α_outer_face corresponding to the nose region, mouth region and face contour region, the first optimization operation is carried out by applying the above reprojection error formula.
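A minimal sketch of this Gauss-Newton reprojection optimization, on a toy problem: vertex positions are a linear function of two shape coefficients (a stand-in for coefficients such as α_nose), and the coefficients are recovered by iteratively minimizing the reprojection error. The camera parameters, neutral positions, and shape bases are synthetic assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)
f, cx, cy = 500.0, 320.0, 240.0              # focal length, projection-plane centre
R, T = np.eye(3), np.array([0.0, 0.0, 5.0])  # camera pose (rotation, translation)

n_vtx, n_coef = 6, 2
base = rng.normal(scale=0.2, size=(n_vtx, 3))           # neutral vertex positions
basis = rng.normal(scale=0.1, size=(n_vtx, 3, n_coef))  # shape basis per vertex

def project(p):
    X = R @ p + T                                        # camera coords; X[2] is z_cam
    return np.array([f * X[0] / X[2] + cx, f * X[1] / X[2] + cy])

def residuals(alpha):
    # stacked (u - u0, v - v0) reprojection residuals over all vertices
    return np.concatenate([project(base[i] + basis[i] @ alpha) - obs[i]
                           for i in range(n_vtx)])

alpha_true = np.array([0.3, -0.5])
obs = [project(base[i] + basis[i] @ alpha_true) for i in range(n_vtx)]

alpha = np.zeros(n_coef)
for _ in range(10):                          # Gauss-Newton iterations
    r = residuals(alpha)
    J = np.empty((r.size, n_coef))           # forward-difference Jacobian
    for j in range(n_coef):
        step = np.zeros(n_coef); step[j] = 1e-6
        J[:, j] = (residuals(alpha + step) - r) / 1e-6
    alpha -= np.linalg.solve(J.T @ J, J.T @ r)   # normal-equation update
# alpha converges to alpha_true
```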
Step S1420, based on the linear optimization algorithm, implementing a second optimization operation on the vertices corresponding to each face region belonging to the top half of the face;
The top half of the face, i.e. the brow region and eye region of the previous example, exhibits approximately linear behaviour when expressing emotion, so a linear optimization algorithm can further be introduced to perform the second optimization operation on the corresponding face shape coefficients α_brow and α_eye. To use a linear optimization algorithm, the known projection coordinates (u_0, v_0) are substituted back into the perspective projection transformation, which yields the linear constraints:

(u_0 − c_x) · z_cam = f · x_cam, (v_0 − c_y) · z_cam = f · y_cam

Substituting the camera projection matrix, i.e. [x_cam, y_cam, z_cam]^T = R · [x, y, z]^T + T, gives for each vertex two equations that are linear in its 3D coordinates (x, y, z). For each vertex i, these can be collected as:

A_i · p_i = b_i

where p_i = (x_i, y_i, z_i)^T. The reprojection error can then be expressed through the linear face model, under which the vertex coordinates are a linear function of the shape coefficients α:

p_i = p̄_i + B_i · α

Substituting the 3D coordinates into the above linear face model, the reprojection error of each vertex can be calculated as:

E_i = ||A_i · (p̄_i + B_i · α) − b_i||²

Finally, the reprojection errors of all the vertices are fused, for example:

E = Σ_i E_i
In this reprojection error function, all parameters other than the variable α are constants; the error is therefore a quadratic function of α, and quadratic programming can be adopted to perform linear least-squares optimization of the reprojection error.
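A sketch of this linear least-squares variant: substituting the known key-point coordinates (u_0, v_0) into the projection yields constraints linear in the vertex coordinates, hence linear in the shape coefficients, so the coefficients can be solved in one step. The camera parameters, neutral positions, and shape bases are synthetic assumptions:

```python
import numpy as np

rng = np.random.default_rng(1)
f, cx, cy = 500.0, 320.0, 240.0
R, T = np.eye(3), np.array([0.0, 0.0, 5.0])

n_vtx, n_coef = 6, 2
base = rng.normal(scale=0.2, size=(n_vtx, 3))           # neutral vertex positions
basis = rng.normal(scale=0.1, size=(n_vtx, 3, n_coef))  # shape basis per vertex
alpha_true = np.array([0.4, -0.2])

def project(p):
    X = R @ p + T
    return f * X[0] / X[2] + cx, f * X[1] / X[2] + cy

rows, rhs = [], []
for i in range(n_vtx):
    u0, v0 = project(base[i] + basis[i] @ alpha_true)   # known key-point coords
    for coord, centre, axis in ((u0, cx, 0), (v0, cy, 1)):
        # f*X[axis] - (coord - centre)*X[2] = 0 is linear in the 3D point,
        # and the point is linear in alpha via p = base + basis @ alpha.
        g = f * R[axis] - (coord - centre) * R[2]       # row acting on the 3D point
        rows.append(g @ basis[i])
        rhs.append(-(g @ base[i] + f * T[axis] - (coord - centre) * T[2]))

alpha, *_ = np.linalg.lstsq(np.array(rows), np.array(rhs), rcond=None)
# the system is linear and consistent, so alpha recovers alpha_true
```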
As can be seen from the above embodiment, α_brow and α_eye are optimized with a linear least-squares method, while the remaining regional coefficients α_nose, α_mouth and α_outer_face are optimized with a nonlinear method. That is, when optimizing the face shape coefficients of the face regions, different optimization algorithms are adopted for different regions according to the characteristics of their expressive motion. This balances fitting quality against computational cost: the amount of computation is reduced, the optimization of the face mesh is improved, and the face deformation target generated from the optimized face mesh is ensured a natural and fine facial expressiveness.
On the basis of any embodiment of the present application, referring to fig. 6, before obtaining the base face coefficient set, the method includes:
Step S1100, judging, according to the face deformation target of the three-dimensional face model, whether the face image used for generating the face deformation target belongs to a key frame of the video stream it comes from; the subsequent steps are executed when it belongs to a key frame and skipped otherwise.
In the digital-person network live-broadcast scenario of this embodiment, the facial image of the streaming user must be expression-tracked so that the user's live expression is migrated onto the digital person's face in real time and with high fidelity. To this end, face detection is performed on each image frame of the user's video stream to locate the face image, and the face generation model then generates the face deformation target corresponding to that face image. The face deformation targets are generated on the basis of the digital person's face mesh, and normally the corresponding face deformation targets of a series of consecutive face images can be generated from the same face mesh. However, when the facial expression changes with a large amplitude relative to the preceding frames, the face deformation target generated from the original face mesh may deviate, and if such deviations accumulate over time, the generated facial expression becomes inaccurate. Correction by technical means is therefore required.
In one embodiment, whether the current image frame is a key frame of the video stream, that is, an image frame at the initial moment of a large expression switch, can be judged by checking whether the inter-frame image difference between the current image frame and the previous image frame exceeds a preset range. When the current frame is such a key frame, the digital person's face mesh can be reconstructed according to the process of the embodiments of this application, and the face deformation target corresponding to the current frame is then corrected with that mesh. To judge whether the image difference between the face images of two adjacent frames exceeds the preset range, image feature vectors can be extracted from the two face images, their similarity computed with any feasible distance measure such as cosine similarity, Euclidean distance, or vector inner product, and the result compared against a preset threshold.
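A minimal sketch of the cosine-similarity variant of this key-frame test; the feature vectors and threshold are hypothetical placeholders for the actual extracted image features:

```python
import numpy as np

def is_key_frame(feat_prev: np.ndarray, feat_cur: np.ndarray,
                 threshold: float = 0.9) -> bool:
    """Flag a key frame when the cosine similarity between the face-image
    feature vectors of adjacent frames drops below a preset threshold,
    i.e. when the inter-frame difference exceeds the preset range."""
    cos = feat_prev @ feat_cur / (np.linalg.norm(feat_prev) * np.linalg.norm(feat_cur))
    return cos < threshold

same = np.array([1.0, 0.0, 0.0])
changed = np.array([0.2, 1.0, 0.0])
# is_key_frame(same, same)    -> False (no expression switch)
# is_key_frame(same, changed) -> True  (large expression switch)
```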
In another embodiment, the projection distance of the face deformation target of the current image frame may be calculated against the feature space accumulated from the previous face deformation targets, and whether the current frame is a key frame is decided by whether this projection distance exceeds a preset distance threshold.
As can be seen from the above embodiment, by directly or indirectly identifying whether the face image of the current frame of the video stream corresponds to a key frame, the magnitude of the real person's expression change is perceived and the reconstruction of the digital person's face mesh is controlled accordingly: the subsequent steps of this application are executed only when reconstruction is needed, and otherwise the most recently determined face mesh is reused to generate the face deformation target. Frequent reconstruction of the face model is thereby avoided, further saving computation, while timely reconstruction preserves the accuracy and fidelity of the facial expression carried by the generated face deformation target.
On the basis of any embodiment of the present application, referring to fig. 7, determining, according to a face deformation target of the three-dimensional face model, whether a face image for generating the face deformation target belongs to a key frame in a video stream where the face image belongs, includes:
step S1110, obtaining the face deformation target belonging to the current image frame in the video stream;
As described above, after the face image is detected in each successive image frame of the video stream, it is input into the face generation model to generate the corresponding face deformation target. When the face image of the current frame enters the face generation model and the corresponding face deformation target has been generated, that face deformation target is obtained in order to decide whether the current frame is a key frame.
Step S1120, carrying out statistical analysis on the vector representation of the face deformation target, and calculating the projection distance of the face deformation target;
To determine the projection distance of the current frame's face deformation target, a principal component analysis (PCA) is performed for dimensionality reduction on the vector representation V = (R_i, t_i, s_i, δ_i) corresponding to the vertices of the face deformation target, yielding a mean vector M and a matrix of feature vectors B. The PCA discards the trailing portion, e.g. 5%, of the feature vectors and composes the remaining feature vectors into a feature space; the projection distance of the current frame's face deformation target with respect to this feature space is then calculated as:

d(V, B) = ||M + B·B^T·(V − M) − V||
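The projection-distance formula can be sketched directly; the mean vector and basis below are synthetic stand-ins for the PCA outputs:

```python
import numpy as np

def projection_distance(V, M, B):
    """d(V, B) = ||M + B B^T (V - M) - V||: how far the vector representation V
    lies from the feature space spanned by the orthonormal columns of B."""
    return np.linalg.norm(M + B @ B.T @ (V - M) - V)

rng = np.random.default_rng(2)
M = rng.normal(size=4)                        # mean vector (stand-in for PCA mean)
B = np.linalg.qr(rng.normal(size=(4, 2)))[0]  # orthonormal feature-vector basis

inside = M + B @ np.array([1.0, -2.0])        # lies exactly in the feature space
x = rng.normal(size=4)
off = x - B @ (B.T @ x)                       # component orthogonal to the space

# projection_distance(inside, M, B) ≈ 0
# projection_distance(inside + off, M, B) equals ||off||
```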
Step S1130, determining whether the projection distance exceeds a preset distance threshold; the current image frame is determined to be a key frame when it does, and a non-key frame otherwise.
When the projection distance of the face deformation target is larger than the set distance threshold, the corresponding current image frame is judged to be a key frame and the feature space is updated; when the threshold is not exceeded, the current image frame is a non-key frame.
In other embodiments, other equivalent algorithms from the family of dimensionality-reduction ordination methods may replace principal component analysis for calculating the projection distance, for example principal coordinate analysis (PCoA), non-metric multidimensional scaling (NMDS), redundancy analysis (RDA), or canonical correspondence analysis (CCA).
According to the above embodiment, it can be seen that, based on the statistical analysis performed on the vector representation of the face deformation target, whether the corresponding current image frame belongs to the key frame is determined, so as to control whether the face grid needs to be reconstructed.
On the basis of any embodiment of the present application, referring to fig. 8, after correcting a face deformation target of the three-dimensional face model according to the optimized face shape coefficient, the method includes:
Step S1600, applying the corrected face deformation target to the three-dimensional face model, and rendering a face image of the digital person corresponding to the face deformation target;
The foregoing embodiments complete the reconstruction of the three-dimensional face model's face mesh. After the face deformation target corresponding to an image frame of the video stream is corrected with that mesh, the corrected target can be applied directly to the digital person's three-dimensional face model, so that the face shape coefficients and facial expression coefficients it contains drive the corresponding facial motion of the model, producing the corresponding facial expression pose and thus the digital person's three-dimensional face model after expression migration.
Further, according to actual needs and in combination with the illumination coefficients, texture coefficients, etc. required by the digital person, the digital person's three-dimensional face model is rendered in three dimensions and projected into the two-dimensional image space to obtain the digital person's face image, completing the expression migration from the face image of the video frame to the face of the digital person.
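The application of a deformation target to the mesh can be sketched as a standard blend-shape interpolation; this is an illustrative simplification, and the toy meshes and weight below are hypothetical:

```python
import numpy as np

def apply_deformation(neutral: np.ndarray, target: np.ndarray,
                      expression_weight: float) -> np.ndarray:
    """Blend the corrected face deformation target into the neutral mesh:
    each vertex moves along the target offset, scaled by the expression coefficient."""
    return neutral + expression_weight * (target - neutral)

neutral = np.zeros((4, 3))   # toy 4-vertex neutral mesh
target = np.ones((4, 3))     # corrected face deformation target
mesh = apply_deformation(neutral, target, 0.5)
# mesh lies midway between the neutral mesh and the deformation target
```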
Step S1700, substituting the digital person's face image for the face image in the image frame of the video stream used to generate the face deformation target, and pushing the video stream to the live-broadcast room for display.
To realize expression tracking, the digital person's face image is substituted into the corresponding current image frame of the video stream, so that the real person's face in the video stream is replaced with the digital person's face. In one embodiment, on a live-broadcast platform, the face image in every image frame of the video stream is acquired and replaced with the digital person's face image, so that the face images throughout the anchor user's video stream are replaced with digital-person face images, enabling synchronous digital-person-based live broadcasting.
As this embodiment shows, the pipeline from reconstructing the face mesh of the three-dimensional face model to migrating expressions onto the digital person can bring great application value to industries such as live broadcasting, film and television, and digital avatars, and the expression-migration application does not alter the other facial information.
Referring to fig. 9, in one embodiment, a face deformation target correction device provided according to an aspect of the present application includes a base coefficient obtaining module 1200, a face region segmentation module 1300, a vertex partition optimization module 1400, and a deformation target correction module 1500, where the base coefficient obtaining module 1200 is configured to obtain a base face coefficient set, the base face coefficient set includes neutral face coefficients and average face coefficients of a plurality of face bases acquired in advance, and the face coefficients include face shape coefficients; the face region segmentation module 1300 is configured to correspondingly segment the base face coefficient set into regional face coefficients of each face region according to face key point information corresponding to each preset face region in a face grid of the three-dimensional face model; the vertex partition optimization module 1400 is configured to optimize vertices of corresponding regions in the face mesh according to the regional face coefficients corresponding to each face region, so as to determine face shape coefficients of the face mesh; the deformation target correction module 1500 is configured to correct a face deformation target of the three-dimensional face model according to the optimized face shape coefficient.
On the basis of any embodiment of the present application, the face region segmentation module 1300 includes: the key point detection unit is used for carrying out face key point detection on a face image for generating the face deformation target and obtaining face key point information of a face area image in the face image; the vertex alignment unit is used for aligning each face key point in the face key point information with a standard face vertex in a face grid of the three-dimensional face model and determining a mapping relation between the face key point and the vertex in each face area; the base segmentation unit is configured to segment regional face coefficients corresponding to each preset face region in the face grid from the base face coefficient set according to the mapping relation, wherein each regional face coefficient comprises a neutral face coefficient and an average face coefficient of a corresponding region.
On the basis of any embodiment of the application, the face regions comprise an eye region, a brow region, a nose region, a mouth region and a face contour region, each face region having a symmetrical structure.
On the basis of any embodiment of the present application, the vertex partition optimization module 1400 includes: the first optimizing unit is used for executing first optimizing operation on the vertexes corresponding to other face areas except the top half part of the face based on a nonlinear optimizing algorithm; the second optimizing unit is configured to perform a second optimizing operation on vertices corresponding to the face regions belonging to the top half of the face based on a linear optimizing algorithm.
On the basis of any embodiment of the present application, a face deformation target correction device of the present application includes: the key frame identification module is used for judging whether the face image used for generating the face deformation target belongs to a key frame in a video stream where the face image belongs according to the face deformation target of the three-dimensional face model, and when the face image belongs to the key frame, other modules of the device are allowed to operate, and otherwise, other modules are forbidden to operate.
On the basis of any embodiment of the present application, the key frame identification module includes: a target acquisition unit configured to acquire the face deformation target belonging to a current image frame in a video stream; a distance calculating unit configured to perform statistical analysis on the vector representation of the face deformation target, and calculate a projection distance of the face deformation target; and the operation decision unit is used for judging whether the projection distance exceeds a preset distance threshold, and when the projection distance exceeds the distance threshold, other modules of the device are allowed to operate, and otherwise, the other modules are forbidden to operate.
On the basis of any embodiment of the present application, a face deformation target correction device of the present application includes: the rendering processing module is used for applying the corrected face deformation target to the three-dimensional face model and rendering a face image of a digital person corresponding to the face deformation target; and the live broadcast pushing module is used for replacing the face image of the digital person with the face image in the image frame of the video stream for generating the face deformation target and pushing the video stream to a live broadcast room for display.
Another embodiment of the present application further provides a face deformation target correction device, whose internal structure is schematically shown in fig. 10. The face deformation target correction device includes a processor, a computer readable storage medium, a memory, and a network interface connected by a system bus. The non-volatile readable storage medium of the device stores an operating system, a database and computer readable instructions; the database can store an information sequence, and the computer readable instructions, when executed by the processor, cause the processor to implement a face deformation target correction method.
The processor of the face deformation target correction device is used for providing computing and control capabilities and supporting the operation of the whole face deformation target correction device. The memory of the face deformation target correction device may store computer readable instructions, which when executed by the processor, may cause the processor to perform the face deformation target correction method of the present application. The network interface of the face deformation target correction device is used for being connected and communicated with a terminal.
It will be appreciated by those skilled in the art that the structure shown in fig. 10 is merely a block diagram of the part of the structure related to the present application and does not limit the face deformation target correction device to which the present application is applied; a specific face deformation target correction device may include more or fewer components than shown in the figure, combine certain components, or arrange the components differently.
The processor in this embodiment is configured to perform specific functions of each module in fig. 9, and the memory stores program codes and various types of data required for executing the above-described modules or sub-modules. The network interface is used for realizing data transmission between the user terminals or the servers. The nonvolatile readable storage medium in this embodiment stores therein program codes and data necessary for executing all modules in the face deformation target correction device of the present application, and the server can call the program codes and data of the server to execute the functions of all modules.
The present application also provides a non-transitory readable storage medium storing computer readable instructions that, when executed by one or more processors, cause the one or more processors to perform the steps of the face deformation target correction method of any of the embodiments of the present application.
The present application also provides a computer program product comprising computer programs/instructions which when executed by one or more processors implement the steps of the method described in any of the embodiments of the present application.
It will be appreciated by those skilled in the art that implementing all or part of the above-described methods according to the embodiments of the present application may be accomplished by way of a computer program stored in a non-transitory readable storage medium, which when executed may comprise the steps of the embodiments of the methods described above. The storage medium may be a computer readable storage medium such as a magnetic disk, an optical disk, a Read-Only Memory (ROM), or a random access Memory (Random Access Memory, RAM).
In summary, while keeping the amount of computation low, the present application performs partitioned correction of the face mesh vertices according to the preset face regions when the face mesh is reconstructed, and then applies the mesh to the face deformation target. Each specific face region of the face deformation target thereby obtains a refined optimization effect, the face image generated from the face deformation target is more natural and fine, the imaging effect is better and more stable, and the digital-person image generated with the face deformation target achieves excellent visual quality.

Claims (11)

1. The face deformation target correction method is characterized by comprising the following steps of:
acquiring a base face coefficient set, wherein the base face coefficient set comprises neutral face coefficients and average face coefficients of a plurality of face bases acquired in advance, and the face coefficients comprise face shape coefficients;
according to the face key point information corresponding to each preset face region in the face grid of the three-dimensional face model, correspondingly segmenting the base face coefficient set into region face coefficients of each face region;
optimizing the vertices of the corresponding regions in the face grid according to the region face coefficients corresponding to each face region, to determine the face shape coefficients of the face grid;
and correcting the face deformation target of the three-dimensional face model according to the optimized face shape coefficients.
2. The face deformation target correction method according to claim 1, wherein the dividing the base face coefficient set into the regional face coefficients of each face region according to the face key point information corresponding to each preset face region in the face grid of the three-dimensional face model comprises:
performing face key point detection on a face image for generating the face deformation target to obtain face key point information of a face area image in the face image;
aligning each face key point in the face key point information with a standard face vertex in a face grid of the three-dimensional face model, and determining a mapping relation between the face key point and the vertex in each face area;
and segmenting, according to the mapping relation, the region face coefficients corresponding to each preset face region in the face grid from the base face coefficient set, wherein each region face coefficient comprises a neutral face coefficient and an average face coefficient of the corresponding region.
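The alignment and segmentation steps of this claim can be sketched as below. The nearest-neighbour matching, the toy coordinates, and the helper names are assumptions; the claim does not specify how the key points are aligned to the standard vertices.

```python
import numpy as np

def map_keypoints_to_vertices(keypoints, mesh_vertices):
    """For each detected key point, find the nearest standard mesh
    vertex; returns an index array (key point -> vertex)."""
    # Pairwise squared distances, then argmin per key point.
    d2 = ((keypoints[:, None, :] - mesh_vertices[None, :, :]) ** 2).sum(-1)
    return d2.argmin(axis=1)

def split_region_coefficients(coeff_set, region_vertex_ids, mapping):
    """Collect from the base coefficient set the coefficients whose
    mapped vertices fall inside the given face region."""
    in_region = np.isin(mapping, region_vertex_ids)
    return coeff_set[in_region]

# Toy example: 4 detected key points against 3 standard mesh vertices.
kps = np.array([[0.0, 0.0], [1.1, 0.0], [2.0, 0.1], [0.1, 0.1]])
mesh = np.array([[0.0, 0.0], [1.0, 0.0], [2.0, 0.0]])
mapping = map_keypoints_to_vertices(kps, mesh)

# Segment the coefficients of a region that contains only vertex 0.
region = split_region_coefficients(np.arange(4), np.array([0]), mapping)
```

Once the mapping is fixed, each region's neutral and average coefficients are simply the rows of the base set selected by that region's vertex membership.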
3. The face deformation target correction method according to claim 1, wherein the face regions include an eye region, a brow region, a nose region, a mouth region, and a face contour region, each face region having a symmetrical structure.
4. The face deformation target correction method according to claim 1, wherein optimizing the vertices of the corresponding regions in the face grid according to the region face coefficients corresponding to each face region to determine the face shape coefficients of the face grid comprises:
based on a nonlinear optimization algorithm, performing a first optimization operation on the vertices corresponding to the face regions other than those in the upper half of the face;
and based on a linear optimization algorithm, performing a second optimization operation on the vertices corresponding to the face regions belonging to the upper half of the face.
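One possible reading of this two-track scheme is sketched below, with the linear branch as closed-form least squares and the nonlinear branch as a simple iterative descent standing in for a generic nonlinear optimizer. The toy basis, step size, and iteration count are assumptions, not the patent's algorithms.

```python
import numpy as np

def linear_fit(basis, target):
    """Second optimization (upper-face regions): closed-form linear
    least squares for the blend coefficients."""
    coeffs, *_ = np.linalg.lstsq(basis, target, rcond=None)
    return coeffs

def nonlinear_fit(basis, target, steps=200, lr=0.1):
    """First optimization (remaining regions): iterative gradient
    descent on the same residual, standing in for a generic
    nonlinear optimizer."""
    x = np.zeros(basis.shape[1])
    for _ in range(steps):
        grad = basis.T @ (basis @ x - target)
        x -= lr * grad
    return x

# Toy basis: two deformation directions over three vertex coordinates.
B = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
t = B @ np.array([0.3, 0.7])  # target built from known coefficients
lin = linear_fit(B, t)
non = nonlinear_fit(B, t)
```

The practical motivation for such a split is that regions with near-linear deformation behaviour (e.g. the upper face) admit a cheap closed-form solve, while the remaining regions need an iterative optimizer; both branches recover the same coefficients on this linear toy problem.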
5. The face deformation target correction method according to claim 1, characterized by comprising, before acquiring the base face coefficient set:
judging, according to the face deformation target of the three-dimensional face model, whether the face image used for generating the face deformation target is a key frame in the video stream to which the face image belongs; executing the subsequent steps when the face image is a key frame, and otherwise skipping the subsequent steps.
6. The face deformation target correction method according to claim 1, wherein judging, according to the face deformation target of the three-dimensional face model, whether the face image used for generating the face deformation target is a key frame in the video stream to which the face image belongs comprises:
acquiring the face deformation target belonging to the current image frame in the video stream;
carrying out statistical analysis on the vector representation of the face deformation target, and calculating the projection distance of the face deformation target;
and judging whether the projection distance exceeds a preset distance threshold, and determining that the current image frame is a key frame when the projection distance exceeds the distance threshold, or else, determining that the current image frame is a non-key frame.
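The claim does not fix a formula for the "projection distance", so the sketch below shows one plausible statistical reading: project the current deformation-target vector onto the subspace spanned by recent frames and measure the residual. The history window, QR-based projection, and threshold value are all assumptions.

```python
import numpy as np

def projection_distance(current, history):
    """Distance from the current deformation-target vector to its
    projection onto the subspace spanned by recent frames (one
    possible reading of the claim, not the patent's exact formula)."""
    # Orthonormal basis of the recent-frame subspace via QR.
    q, _ = np.linalg.qr(np.asarray(history).T)
    proj = q @ (q.T @ current)
    return float(np.linalg.norm(current - proj))

def is_key_frame(current, history, threshold=0.5):
    """A frame whose target leaves the recent subspace by more than
    the threshold is treated as a key frame."""
    return projection_distance(current, history) > threshold

hist = [np.array([1.0, 0.0, 0.0]), np.array([0.0, 1.0, 0.0])]
same = np.array([0.5, 0.5, 0.0])   # lies in the span of recent frames
novel = np.array([0.0, 0.0, 1.0])  # orthogonal to the recent frames
```

Under this reading, a frame whose expression is a mere blend of recent expressions has zero projection distance and is skipped, while a genuinely new expression exceeds the threshold and triggers the correction pipeline.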
7. The face deformation target correction method according to any one of claims 1 to 6, characterized by comprising, after correcting the face deformation target of the three-dimensional face model according to the optimized face shape coefficient:
applying the corrected face deformation target to the three-dimensional face model, and rendering a face image of a digital person corresponding to the face deformation target;
and replacing the face image in the image frame of the video stream used for generating the face deformation target with the face image of the digital person, then pushing the video stream to a live broadcast room for display.
8. A face deformation target correction device, characterized by comprising:
the base coefficient acquisition module is used for acquiring a base face coefficient set, wherein the base face coefficient set comprises neutral face coefficients and average face coefficients of a plurality of face bases acquired in advance, and the face coefficients comprise face shape coefficients;
the face region segmentation module is used for correspondingly segmenting the base face coefficient set into regional face coefficients of each face region according to the face key point information corresponding to each preset face region in the face grid of the three-dimensional face model;
the vertex partition optimization module is used for optimizing the vertices of the corresponding areas in the face grid according to the area face coefficients corresponding to each face area so as to determine the face shape coefficients of the face grid;
and the deformation target correction module is used for correcting the face deformation target of the three-dimensional face model according to the optimized face shape coefficient.
9. A face deformation target correction device, comprising a central processor and a memory, wherein the central processor is arranged to invoke and execute a computer program stored in the memory to perform the steps of the method according to any one of claims 1 to 7.
10. A non-transitory readable storage medium, characterized in that it stores a computer program in the form of computer-readable instructions which, when invoked and run by a computer, performs the steps of the method according to any one of claims 1 to 7.
11. A computer program product comprising computer programs/instructions which, when invoked by a processor, perform the steps of the method according to any one of claims 1 to 7.
CN202310089427.2A 2023-01-30 2023-01-30 Face deformation target correction method and device, equipment, medium and product thereof Pending CN116012550A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310089427.2A CN116012550A (en) 2023-01-30 2023-01-30 Face deformation target correction method and device, equipment, medium and product thereof


Publications (1)

Publication Number Publication Date
CN116012550A 2023-04-25

Family

ID=86029939

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310089427.2A Pending CN116012550A (en) 2023-01-30 2023-01-30 Face deformation target correction method and device, equipment, medium and product thereof

Country Status (1)

Country Link
CN (1) CN116012550A (en)


Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117523136A (en) * 2023-11-13 2024-02-06 书行科技(北京)有限公司 Face point position corresponding relation processing method, face reconstruction method, device and medium
CN117523136B (en) * 2023-11-13 2024-05-14 书行科技(北京)有限公司 Face point position corresponding relation processing method, face reconstruction method, device and medium

Similar Documents

Publication Publication Date Title
WO2020192568A1 (en) Facial image generation method and apparatus, device and storage medium
CN111768477B (en) Three-dimensional facial expression base establishment method and device, storage medium and electronic equipment
US20150054825A1 (en) Method for image and video virtual hairstyle modeling
EP3992919B1 (en) Three-dimensional facial model generation method and apparatus, device, and medium
US11562536B2 (en) Methods and systems for personalized 3D head model deformation
Chen et al. Face swapping: realistic image synthesis based on facial landmarks alignment
US11587288B2 (en) Methods and systems for constructing facial position map
US11461970B1 (en) Methods and systems for extracting color from facial image
CN111815768B (en) Three-dimensional face reconstruction method and device
US11417053B1 (en) Methods and systems for forming personalized 3D head and facial models
Ren et al. Human motion transfer from poses in the wild
CN115170559A (en) Personalized human head nerve radiation field substrate representation and reconstruction method based on multilevel Hash coding
CN116012550A (en) Face deformation target correction method and device, equipment, medium and product thereof
CN114943799A (en) Face image processing method and device and computer readable storage medium
CN114758374A (en) Expression generation method, computing device and storage medium
CN114373033A (en) Image processing method, image processing apparatus, image processing device, storage medium, and computer program
CN115460372A (en) Virtual image construction method, device, equipment and storage medium
CN111581411B (en) Method, device, equipment and storage medium for constructing high-precision face shape library
CN116433852B (en) Data processing method, device, equipment and storage medium
Schneider et al. Photo-Realistic Exemplar-Based Face Ageing
US20230316587A1 (en) Method and system for latent-space facial feature editing in deep learning based face swapping
CN118103873A (en) System and method for semi-supervised video driven facial animation migration
CN117671090A (en) Expression processing method and device, electronic equipment and storage medium
CN117765147A (en) Texture reconstruction method of face geometric model and electronic equipment
CN116168121A (en) Cartoon video generation method, generation device and computer readable storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination