CN112734895A - Three-dimensional face processing method and electronic equipment - Google Patents


Info

Publication number
CN112734895A
Authority
CN
China
Prior art keywords
dimensional face
dimensional
image
face
next frame
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202011643370.9A
Other languages
Chinese (zh)
Inventor
屈雁秋
何山
胡金水
殷兵
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
iFlytek Co Ltd
Original Assignee
iFlytek Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by iFlytek Co Ltd filed Critical iFlytek Co Ltd
Priority to CN202011643370.9A priority Critical patent/CN112734895A/en
Publication of CN112734895A publication Critical patent/CN112734895A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T15/00 3D [Three Dimensional] image rendering
    • G06T15/04 Texture mapping
    • G06T17/00 Three dimensional [3D] modelling, e.g. data description of 3D objects

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Computer Graphics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Geometry (AREA)
  • Software Systems (AREA)
  • Image Generation (AREA)

Abstract

The application provides a three-dimensional face processing method which comprises the following steps: reconstructing a three-dimensional face in a video sequence to obtain a three-dimensional face parameter set; and adding a time domain constraint for the three-dimensional face parameter set by using a two-dimensional face image texture corresponding to the three-dimensional face, and optimizing the three-dimensional face parameter set so that the three-dimensional face parameter set and the two-dimensional face image texture meet a preset condition in a time domain, wherein the time domain constraint is characterized by the consistency of the two-dimensional image texture. The application also provides corresponding electronic equipment. By the scheme, the time domain continuity of the reconstructed three-dimensional face parameters is guaranteed, the three-dimensional face parameters are stable and smooth, and the problem of delay is avoided.

Description

Three-dimensional face processing method and electronic equipment
Technical Field
The disclosed embodiments of the present application relate to the field of image processing technologies, and more particularly, to a three-dimensional face processing method and an electronic device.
Background
As the connection between computer graphics and computer vision technology becomes closer, research on parameterized three-dimensional face models based on 3D Morphable Models (3DMM) has advanced rapidly, and schemes that estimate the parameters of a corresponding three-dimensional face model from a single RGB face image have become increasingly common.
At present, video-based three-dimensional face reconstruction involves a relatively complex task flow, and the three-dimensional model parameters generated for consecutive frames can differ substantially, so the generated three-dimensional model mesh exhibits obvious jitter in the time domain.
Disclosure of Invention
According to an embodiment of the application, the application provides a three-dimensional face processing method and electronic equipment.
According to a first aspect of the present application, an exemplary three-dimensional face processing method is disclosed. An exemplary three-dimensional face processing method includes: reconstructing a three-dimensional face in a video sequence to obtain a three-dimensional face parameter set; and adding a time domain constraint for the three-dimensional face parameter set by using a two-dimensional face image texture corresponding to the three-dimensional face, and optimizing the three-dimensional face parameter set so that the three-dimensional face parameter set and the two-dimensional face image texture meet a preset condition in a time domain, wherein the time domain constraint is characterized by the consistency of the two-dimensional image texture.
In some embodiments, the video sequence includes a current frame face image and a next frame face image, and the three-dimensional face parameter set includes a current frame three-dimensional face parameter and a next frame three-dimensional face parameter, where the current frame three-dimensional face parameter is used to represent a current frame three-dimensional face mesh, and the next frame three-dimensional face parameter is used to represent a next frame three-dimensional face mesh; adding a time domain constraint to the three-dimensional face parameter set by using a two-dimensional image texture corresponding to the three-dimensional face, wherein the adding comprises: acquiring a current frame texture image according to the current frame face image and the current frame three-dimensional face grid; and rendering the next three-dimensional face grid by using the current frame texture image so as to project the next three-dimensional face grid to the current frame face image to obtain a current frame synthetic image.
In some embodiments, said optimizing said set of three-dimensional face parameters comprises: and acquiring an optical flow from the current frame synthetic image to the next frame face image, and correcting the projection from the vertex of the next frame three-dimensional face grid to the plane of the next frame face image through the optical flow to achieve an optimization target, so that the next frame three-dimensional face parameter and the texture of the next frame face image in the time domain meet the preset condition.
In some embodiments, the optimization objective includes a first sub-objective and a second sub-objective; the first sub-target is used for representing a texture continuous item, wherein the texture continuous item is calculated by the position of the nth vertex of the next frame of three-dimensional face mesh under the action of the optimized next frame of three-dimensional face parameter under the coordinate of the next frame of face image and the position of the nth vertex of the unoptimized next frame of three-dimensional face mesh under the coordinate of the next frame of face image after optical flow modification; the second sub-target is used for representing a Z coordinate smoothing item, wherein the Z coordinate smoothing item is calculated by a Z coordinate value of an m-th three-dimensional face key point under the action of the optimized next frame of three-dimensional face parameters and a Z coordinate value of an m-th three-dimensional face key point under the action of the current frame of three-dimensional face parameters.
In some embodiments, if the sum of the value of the texture continuation term and the value of the Z-coordinate smoothing term is smaller than a preset value, the continuity of the three-dimensional face parameter of the next frame in the time domain and the texture of the face image of the next frame satisfies a preset continuity condition, and the Z-coordinate of the three-dimensional face parameter of the next frame satisfies a preset smoothing condition.
In some embodiments, the current frame composite image is acquired using an arbitrary differentiable renderer.
In some embodiments, the method further comprises: adding two-dimensional key point constraints corresponding to the three-dimensional face parameter set; said optimizing said three-dimensional set of face parameters further comprises: and optimizing the three-dimensional face parameter set through the two-dimensional key point constraint so as to realize the consistency of the three-dimensional face parameter set and the two-dimensional face image.
According to a second aspect of the present application, an exemplary electronic device is disclosed, the exemplary electronic device comprising a processor and a memory, the memory storing instructions that, when executed, cause the processor to perform the three-dimensional face processing method according to the first aspect.
According to a third aspect of the present application, an example non-volatile computer storage medium is disclosed, the example non-volatile computer storage medium storing instructions that, when executed, cause a processor to perform the three-dimensional face processing method according to the first aspect.
The beneficial effect of this application has: after a three-dimensional face in a video sequence is reconstructed, a two-dimensional face image texture corresponding to the three-dimensional face is used, time domain constraint, namely consistency constraint of the two-dimensional image texture, is added to a three-dimensional face parameter set, and the three-dimensional face parameter set is optimized, so that the three-dimensional face parameter set and the two-dimensional face image texture meet preset conditions in the time domain, consistency with a two-dimensional image space is achieved, time domain continuity of the reconstructed three-dimensional face parameters is guaranteed, the three-dimensional face parameters are stable and smooth, and the problem of delay is avoided.
These and other objects of the present application will no doubt become obvious to those of ordinary skill in the art after having read the following detailed description of the preferred embodiments, which are illustrated in the various drawing figures and drawings.
Drawings
The present application will be further described with reference to the accompanying drawings and embodiments, in which:
fig. 1 is a flowchart of a three-dimensional face processing method according to an embodiment of the present application.
Fig. 2 is a partial flowchart of a three-dimensional face processing method according to an embodiment of the present application.
FIG. 3 is a schematic diagram of establishing a three-dimensional mesh space to two-dimensional image space relationship as employed in accordance with an embodiment of the present application.
Fig. 4 is a schematic structural diagram of an electronic device according to an embodiment of the present application.
FIG. 5 is a schematic diagram of a non-volatile storage medium according to an embodiment of the present application.
Detailed Description
In order to solve the problem that the generated three-dimensional model mesh has obvious jitter in the time domain, a parameterized three-dimensional face reconstruction scheme of a video sequence generally needs a scheme for ensuring the time sequence stability. The current scheme for ensuring the timing stability has the following modes:
one way is to ensure the timing stability of the input two-dimensional (2D) face keypoints, specifically, to perform timing filtering on the two-dimensional keypoints. The filtering of the input parameters can ensure the timing stability of the output parameters, but can cause the problem of input delay, thereby causing delay of the output parameters, and causing some parameters of the three-dimensional face to fail to track the two-dimensional image in time, such as opening and closing the mouth, and the opening and closing actions are not smooth enough.
The other way is to ensure the stability of the output three-dimensional face rigid pose and expression parameters, specifically, by applying Kalman filtering to the corresponding parameters. Because facial expression parameters are very sensitive to noise and readily produce unreasonable blend shapes, purely geometric constraints in the time domain cannot produce satisfactory results.
It can be seen that neither the filtering based on the input parameters nor the filtering for the output three-dimensional face parameters can ensure that the generated three-dimensional face and the image texture keep time sequence consistency.
Therefore, the application provides a three-dimensional face processing method and electronic equipment.
In order to make those skilled in the art better understand the technical solutions of the present application, the following detailed description is made with reference to the accompanying drawings and the detailed description.
Fig. 1 is a flowchart of a three-dimensional face processing method according to an embodiment of the present application. The method may be performed by an electronic device including, but not limited to, a computer, a server, and the like. The method comprises the following steps:
step 110: and reconstructing a three-dimensional face in the video sequence to obtain a three-dimensional face parameter set.
The video sequence comprises a plurality of frames of face images, each frame of face image is a two-dimensional image, a three-dimensional face in the video sequence is reconstructed, and a three-dimensional face parameter set is obtained.
And reconstructing a three-dimensional face, specifically, performing frame-by-frame parameterization on a video sequence and performing three-dimensional face reconstruction, thereby obtaining each frame of three-dimensional face parameters, wherein the each frame of three-dimensional face parameters is not subjected to any other processing, namely, the original data of each frame of three-dimensional face parameters is obtained.
In an example, after the video sequence is subjected to frame extraction, a current frame face image is obtained, two-dimensional key points, such as points at a face contour and points at a nose tip, are extracted from an original two-dimensional face image corresponding to the current frame face image by using a three-dimensional face reconstruction scheme based on the key points, and then three-dimensional face reconstruction is performed to obtain original data of three-dimensional face parameters of the current frame. In other examples, other three-dimensional face reconstruction schemes may also be used to reconstruct a three-dimensional face in a video sequence, for example, a three-dimensional face reconstruction scheme based on a differentiable renderer, a three-dimensional face reconstruction scheme based on a neural network, and the like.
Step 120: and adding time domain constraint for the three-dimensional face parameter set by using the two-dimensional face image texture corresponding to the three-dimensional face, and optimizing the three-dimensional face parameter set so that the three-dimensional face parameter set and the two-dimensional face image texture in the time domain meet preset conditions.
The time domain constraint is characterized by the consistency of the two-dimensional image texture, namely, the consistency constraint of the two-dimensional image texture is added to the three-dimensional face parameter set.
In an example, the preset condition indicates that the continuity of the three-dimensional face parameter set in the time domain reaches a preset degree, that is, the three-dimensional face parameter set is consistent with the two-dimensional face image texture in the time domain.
After the three-dimensional face is reconstructed, a time domain constraint is added to the three-dimensional face parameter set by using a two-dimensional face image texture corresponding to the three-dimensional face, namely, a time domain relation between a space of a three-dimensional face grid and a space of a two-dimensional image is established by using the image texture, and the three-dimensional face parameter set is optimized, so that the three-dimensional face parameter set and the two-dimensional face image texture in the time domain meet a preset condition.
In this embodiment, after a three-dimensional face in a video sequence is reconstructed, a two-dimensional face image texture corresponding to the three-dimensional face is used, a time domain constraint, that is, a consistency constraint of the two-dimensional image texture is added to a three-dimensional face parameter set, and the three-dimensional face parameter set is optimized, so that the three-dimensional face parameter set and the two-dimensional face image texture meet a preset condition in a time domain, consistency with a two-dimensional image space is achieved, time domain continuity of the reconstructed three-dimensional face parameter is ensured, stability and smoothness of the three-dimensional face parameter are ensured, and a delay problem is not caused.
The video sequence comprises a current frame face image and a next frame face image, and the three-dimensional face parameter set comprises a current frame three-dimensional face parameter and a next frame three-dimensional face parameter, wherein the current frame three-dimensional face parameter is used for representing a current frame three-dimensional face grid, and the next frame three-dimensional face parameter is used for representing a next frame three-dimensional face grid.
For convenience of description, assume the current frame face image is labeled as the i-th frame (i ≥ 0), the next frame face image as the (i+1)-th frame, the current frame three-dimensional face parameters as Θ_i, and the next frame three-dimensional face parameters as Θ_{i+1}. The current frame three-dimensional face parameters represent the current frame three-dimensional face mesh, specifically:

M(Θ_i), where Θ_i = (β_i, γ_i)

wherein β_i is the three-dimensional facial expression parameter among the current frame three-dimensional face parameters, i.e., the expression parameter among the i-th frame three-dimensional face parameters, and is the parameter of non-rigid facial motion; and γ_i = (s_i, R_i, t_i) are the three-dimensional face pose parameters among the current frame three-dimensional face parameters, i.e., the pose parameters among the i-th frame three-dimensional face parameters, representing the scaling, rotation, and translation parameters under weak perspective transformation, and are the parameters of rigid facial motion. By analogy, the representation of the next frame three-dimensional face mesh can be obtained; it is not repeated here for brevity and clarity.
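As a minimal sketch of the rigid part of this parameterization, the pose γ_i = (s_i, R_i, t_i) acts on the mesh vertices by scaling, rotation, and translation. NumPy is assumed, and the function name `apply_pose` is illustrative, not from the patent:

```python
import numpy as np

def apply_pose(vertices, s, R, t):
    """Rigid motion under weak perspective transformation: scale, rotate, translate.
    vertices: (N, 3) mesh vertices; s: scalar scale; R: (3, 3) rotation; t: (3,) translation."""
    return s * vertices @ R.T + t

# The identity pose (s=1, R=I, t=0) leaves the mesh unchanged.
verts = np.array([[0.0, 0.0, 1.0], [1.0, 2.0, 3.0]])
posed = apply_pose(verts, s=1.0, R=np.eye(3), t=np.zeros(3))
```

The non-rigid part (the expression parameter β_i deforming the mesh through 3DMM blend shapes) would be applied before this rigid transform.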
As above, the time domain constraint is added to the three-dimensional face parameter set by using the two-dimensional image texture corresponding to the three-dimensional face, and in some embodiments, as shown in fig. 2, this step includes:
step 221: and acquiring a current frame texture image according to the current frame face image and the current frame three-dimensional face grid.
According to the current frame face image and the current frame three-dimensional face mesh, project the current frame three-dimensional face mesh into the space of the current frame face image to obtain the current frame texture image: that is, the pixel of the current frame face image corresponding to each vertex of the mesh gives the texel at that vertex's UV coordinates, and together these texels form the current frame texture image. The following description takes the current frame as the i-th frame as an example.
From the i-th frame face image I_i and the i-th frame three-dimensional face mesh M(Θ_i), the current frame texture image t_i is obtained by:

t_i = F_J(I_i, M(Θ_i))

wherein the function F_J establishes the connection from the three-dimensional mesh space to the two-dimensional image space and acquires the UV texture of the three-dimensional face model. As shown in fig. 3, the function F_J establishes the relation between the i-th frame three-dimensional face mesh and the i-th frame face image (i.e., a two-dimensional image) and obtains the i-th frame texture image; the points marked in the i-th frame face image are the visible face vertices, the correspondingly marked points on the i-th frame three-dimensional face mesh are the mesh vertices, and the resulting UV coordinates of the corresponding vertices are shown in fig. 3.
Specifically, the process of implementing the function F_J is as follows: first, the visible face vertices in the current frame face image are obtained through a depth test; then, the pixel closest to each visible face vertex in the rasterized current frame face image is endowed with the UV texture corresponding to that face vertex. The original filing describes this process with pseudo-code (present in the source only as an image).
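Since the pseudo-code appears only as an image in the source, the following is a hedged NumPy sketch of the described process: a depth test selects the visible vertices, and each visible vertex then exchanges texture with its nearest image pixel. All names here (`extract_texture`, `owner`) are illustrative assumptions, and the depth convention (larger z is closer to the camera) is likewise assumed:

```python
import numpy as np

def extract_texture(image, verts_2d, verts_z):
    """Sketch of F_J: for each mesh vertex visible under a depth test, sample
    the colour of the nearest image pixel as that vertex's texel.
    image: (H, W, 3); verts_2d: (N, 2) projected vertex positions (x, y);
    verts_z: (N,) vertex depths, larger = closer to the camera (assumed)."""
    h, w, _ = image.shape
    # Depth test: per rasterised pixel, keep only the closest vertex.
    depth = np.full((h, w), -np.inf)
    owner = np.full((h, w), -1, dtype=int)
    px = np.clip(np.round(verts_2d).astype(int), 0, [w - 1, h - 1])
    for n, (x, y) in enumerate(px):
        if verts_z[n] > depth[y, x]:
            depth[y, x] = verts_z[n]
            owner[y, x] = n
    # Visible vertices take the colour of their nearest pixel as their texel.
    texels = np.zeros((len(verts_2d), 3))
    for n, (x, y) in enumerate(px):
        if owner[y, x] == n:  # vertex passed the depth test
            texels[n] = image[y, x]
    return texels

# Toy example: two vertices projecting to the same pixel; only the closer one
# receives the pixel colour, the occluded one stays black.
img = np.zeros((4, 4, 3)); img[1, 2] = [1.0, 0.0, 0.0]
tex = extract_texture(img, np.array([[2.0, 1.0], [2.0, 1.0]]), np.array([0.5, 0.9]))
```

A production implementation would rasterise triangles rather than single vertices, but the visibility-then-assignment structure is the same as described above.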
after a certain UV texture is obtained, the corresponding two-dimensional image features (namely visible surface vertexes in the current frame face image) are kept in the coordinates of the three-dimensional model of the surface vertexes, and the texture image tiEach of the texels
Figure BDA0002873472560000072
All correspond to a certain vertex on the three-dimensional face mesh of the current frame
Figure BDA0002873472560000073
Therefore, it can be seen that the constraint of the consistency of the texture in the time domain can be transmitted back to the expression parameter β of the current frame three-dimensional face imageiAnd pose parameter si,Ri,ti
Step 222: and rendering the next frame of three-dimensional face grid by using the current frame texture image so as to project the next frame of three-dimensional face grid to the current frame face image to obtain a current frame synthetic image.
The description is continued by taking the current frame as the ith frame as an example.
The next frame three-dimensional face mesh is rendered with the current frame texture image to obtain the current frame composite image, expressed as:

Î_i = F_R(t_i, M(Θ_{i+1}))

wherein the function F_R is an arbitrary differentiable renderer used to project the next frame three-dimensional face mesh onto the current frame face image. Because the next frame three-dimensional face mesh carries the two-dimensional image features of the i-th frame face image (the visible vertices in the current frame face image), the resulting current frame composite image Î_i contains the information of the next frame three-dimensional face parameters.
In particular, in an example, the differentiable renderer is implemented by a neural network. The neural network has two stages: a training stage, which constructs the parameterized network, and a testing stage, which runs a forward pass of the parameterized network as the rendering function. That is, the current frame composite image is obtained through the testing stage of the neural network.
Since the UV texture remains consistent in the three-dimensional mesh space, it follows from the above that after the i-th frame three-dimensional face mesh is transformed by the (i+1)-th frame three-dimensional face parameters, the UV texture still retains the two-dimensional image features of the i-th frame face image; this is the two-dimensional image texture consistency.
As described above, after the time domain constraint is added to the three-dimensional face parameter set, the three-dimensional face parameter set is optimized. In some embodiments, optimizing the set of three-dimensional face parameters comprises: and acquiring an optical flow from the current frame synthetic image to the next frame face image, and modifying the projection from the vertex of the next frame three-dimensional face grid to the plane of the next frame face image through the optical flow to achieve an optimization target, so that the texture of the next frame three-dimensional face parameter and the texture of the next frame face image in a time domain meet a preset condition.
The description is continued by taking the current frame as the ith frame as an example.
The optical flow from the i-th frame composite image Î_i to the (i+1)-th frame face image I_{i+1} is:

w_i = f(Î_i, I_{i+1})

where f is an arbitrary existing optical flow operator. For an arbitrary pixel position a of the (i+1)-th frame two-dimensional image plane, with coordinates (x, y), the optical flow (u, v) maps I(x, y) → I′(x + u, y + v), correcting the pixel position a to a′ with coordinates (x′ = x + u, y′ = y + v), thereby ensuring continuity between the i-th frame and (i+1)-th frame two-dimensional images.

In the above formula, the optical flow w_i corrects the difference between the current frame composite image Î_i rendered by the differentiable renderer F_R and the next frame face image I_{i+1}, and thereby further corrects the expression parameter β_i and the pose parameters s_i, R_i, t_i among the current frame three-dimensional face parameters Θ_i.
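A minimal sketch of this flow-based correction step, assuming a dense flow field of shape (H, W, 2) with channels (u, v) and nearest-neighbour sampling (the sampling scheme is an assumption; a real implementation might interpolate bilinearly):

```python
import numpy as np

def correct_by_flow(verts_2d, flow):
    """Shift each projected vertex by the optical flow sampled at its pixel:
    (x, y) -> (x + u, y + v).  flow: (H, W, 2) with channels (u, v)."""
    h, w, _ = flow.shape
    xi = np.clip(np.round(verts_2d[:, 0]).astype(int), 0, w - 1)
    yi = np.clip(np.round(verts_2d[:, 1]).astype(int), 0, h - 1)
    uv = flow[yi, xi]  # nearest-neighbour flow sample per vertex
    return verts_2d + uv

# A uniform flow of (+1, 0) shifts every projected vertex one pixel to the right.
flow = np.zeros((4, 4, 2)); flow[..., 0] = 1.0
moved = correct_by_flow(np.array([[1.0, 2.0]]), flow)
print(moved)  # [[2. 2.]]
```

These flow-corrected vertex positions serve as the fixed targets of the texture continuation term during optimization.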
The optimization objective quantifies whether the next frame three-dimensional face parameters and the texture of the next frame face image satisfy the preset condition in the time domain.
In some embodiments, the optimization goal includes a first sub-goal and a second sub-goal, wherein the first sub-goal is used for characterizing a texture continuum, wherein the texture continuum is calculated by the position of the nth vertex of the next frame of three-dimensional face mesh under the action of the optimized next frame of three-dimensional face parameters under the coordinates of the next frame of face image and the position of the nth vertex of the unoptimized next frame of three-dimensional face mesh under the coordinates of the next frame of face image after optical flow modification. And the second sub-target is used for representing a Z coordinate smoothing item, wherein the Z coordinate smoothing item is calculated by the Z coordinate value of the mth three-dimensional face key point under the action of the optimized three-dimensional face parameter of the next frame and the Z coordinate value of the mth three-dimensional face key point under the action of the three-dimensional face parameter of the current frame.
Specifically, the optimization objective is calculated as:

E(Θ̂_{i+1}) = E_tex + E_smooth

The first sub-objective characterizes the texture continuation term E_tex, calculated by the following equation 1:

E_tex = Σ_{n=1}^{N} ‖ P_n(Θ̂_{i+1}) − P̃_n ‖²    (equation 1)

wherein P_n(Θ̂_{i+1}) denotes the position, in the coordinates of the next frame face image, of the n-th vertex of the next frame three-dimensional face mesh under the action of the optimized next frame three-dimensional face parameters Θ̂_{i+1}; P̃_n = w_i(P_n(Θ_{i+1})) denotes the position of the n-th vertex of the unoptimized next frame three-dimensional face mesh in the coordinates of the next frame face image after optical-flow correction; and N denotes the number of vertices of the next frame three-dimensional face mesh. The corrected positions P̃_n have been calculated in advance, starting from frame 0, and remain constant during the optimization.

According to equation 1, the smaller the value of the texture continuation term E_tex, the better the texture continuity in the time domain between the next frame three-dimensional face parameters Θ_{i+1} and the next frame face image.
The second sub-objective characterizes the Z coordinate smoothing term E_smooth, calculated by the following equation 2:

E_smooth = Σ_{m=1}^{M} ( z_{k_m}(Θ̂_{i+1}) − z_{k_m}(Θ_i) )²    (equation 2)

wherein z_{k_m}(Θ̂_{i+1}) denotes the Z component (i.e., Z coordinate value) of the m-th three-dimensional face key point under the action of the optimized next frame three-dimensional face parameters Θ̂_{i+1}; z_{k_m}(Θ_i) denotes the Z component of the m-th three-dimensional face key point under the action of the current frame three-dimensional face parameters Θ_i (i.e., the original, unprocessed current frame three-dimensional face parameters); k_m denotes the index of the m-th three-dimensional face key point among the mesh vertices; and M denotes the number of three-dimensional face key points.

According to equation 2, the smaller the value of the Z coordinate smoothing term, the smoother the Z coordinate of the next frame three-dimensional face parameters.
Further, in some embodiments, if the sum of the texture continuation term and the Z coordinate smoothing term, E_tex + E_smooth, is smaller than a preset value, then the continuity between the next frame three-dimensional face parameters and the texture of the next frame face image in the time domain satisfies the preset continuity condition, and the Z coordinate of the next frame three-dimensional face parameters satisfies the preset smoothing condition.
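The two sub-objectives can be sketched directly from equations 1 and 2, assuming the vertex projections and key-point Z coordinates have already been computed as NumPy arrays (the function names are illustrative, not from the patent):

```python
import numpy as np

def texture_term(proj_opt, proj_flow):
    """E_tex (equation 1): squared distance between the projections of the mesh
    vertices under the optimized next-frame parameters and the flow-corrected
    projections of the unoptimized mesh.  Both arguments: (N, 2)."""
    return np.sum((proj_opt - proj_flow) ** 2)

def z_smooth_term(z_opt, z_cur):
    """E_smooth (equation 2): squared difference of key-point Z coordinates
    between the optimized next-frame and the current-frame parameters."""
    return np.sum((z_opt - z_cur) ** 2)

# When both terms vanish, their sum is trivially below any positive preset
# value, so the continuity and smoothing conditions hold.
e = texture_term(np.zeros((3, 2)), np.zeros((3, 2))) + z_smooth_term(np.zeros(5), np.zeros(5))
print(e)  # 0.0
```

Comparing `e` against the preset threshold implements the stopping test described above.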
In some embodiments, the method further comprises adding a two-dimensional keypoint constraint corresponding to the three-dimensional face to the set of three-dimensional face parameters. Due to the fact that corresponding two-dimensional key point constraints are added to the three-dimensional face parameter set, the consistency of the two-dimensional face image and the three-dimensional face is guaranteed, and the accuracy of three-dimensional face reconstruction is improved. The two-dimensional face key point constraint is to perform related operations by taking a key point in a certain face image as a reference. For example, a two-dimensional key point constraint in the current frame face image is added to the current frame three-dimensional face parameters.
At this time, optimizing the three-dimensional face parameter set further includes: and optimizing the three-dimensional face parameter set through two-dimensional key point constraint so as to realize the consistency of the three-dimensional face parameter set and the two-dimensional face image. For example, the three-dimensional face parameters of the current frame are optimized by taking key points in the face image of the current frame as a reference, so that the consistency between the face parameters of the current frame and the corresponding two-dimensional face image of the current frame is realized.
To ensure consistency between the two-dimensional face image and the three-dimensional face, the optimization objective further comprises a third sub-objective, which represents the current frame two-dimensional keypoint constraint and is computed as:

E_lan = Σ_m ‖ s_2d · Π · R · X_m(β_{i+1}) + t_2d − K_m ‖²

where s_2d, R, and t_2d are the weak-perspective camera parameters, K_m is the m-th two-dimensional keypoint, Π is the projection matrix of the orthographic projection, and X_m(β_{i+1}) is the coordinate of the m-th three-dimensional face keypoint under the action of the optimized next frame three-dimensional face parameters β_{i+1}.
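Under the definitions above, a minimal evaluation of this third sub-objective might look like the following sketch. The function name and array layouts are assumptions; the 2×3 orthographic projection matrix Π is written out explicitly as `P`.

```python
import numpy as np

def keypoint_loss(s2d, R, t2d, X3d, K2d):
    """Current frame 2D keypoint constraint under a weak-perspective camera
    (illustrative sketch of the third sub-objective).

    s2d : scalar scale of the weak-perspective camera
    R   : (3, 3) rotation matrix
    t2d : (2,) image-plane translation
    X3d : (M, 3) 3D face keypoints under the optimized next frame parameters
    K2d : (M, 2) detected 2D keypoints
    """
    # Orthographic projection matrix: keeps the X and Y components.
    P = np.array([[1.0, 0.0, 0.0],
                  [0.0, 1.0, 0.0]])
    # Rotate, project, scale, and translate each 3D keypoint to the image plane.
    proj = s2d * (X3d @ R.T) @ P.T + t2d   # shape (M, 2)
    # Sum of squared 2D distances to the detected keypoints.
    return float(np.sum(np.sum((proj - K2d) ** 2, axis=1)))
```

When the projected keypoints coincide with the detected ones, the loss vanishes.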
Specifically, in an embodiment where the optimization objective includes the first sub-objective, the second sub-objective, and the third sub-objective, the optimization objective is computed as:

E = E_tex + E_smooth + E_lan

where E_tex denotes the texture continuation term (first sub-objective), E_smooth the Z-coordinate smoothing term (second sub-objective), and E_lan the current frame two-dimensional keypoint constraint (third sub-objective); weighting coefficients may be applied to balance the three terms.
the optimization process comprises the following steps: firstly, a coordinate ascent method is used for respectively solving the pose parameters gammai=(si,Ri,ti) And expression parameter betaiAnd then jointly optimized with the two-dimensional key point constraint through the texture consistency constraint. It is given by the following pseudo code:
Figure BDA0002873472560000121
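Since the original pseudocode survives only as an image, the alternating scheme described above can be sketched as follows. The finite-difference gradient steps are purely illustrative stand-ins for whatever solver the patent actually uses, and each parameter block is minimized in turn (block-coordinate descent), even though the text names the scheme coordinate ascent.

```python
import numpy as np

def coordinate_optimize(energy, pose0, expr0, lr=1e-2, n_outer=10, n_inner=20):
    """Alternately refine pose and expression parameters (illustrative sketch).

    energy(pose, expr) -> float is the combined objective; each parameter
    block is updated with the other block held fixed.
    """
    def grad(f, x, h=1e-5):
        # Central finite-difference gradient, used here only for illustration.
        g = np.zeros_like(x)
        for i in range(x.size):
            d = np.zeros_like(x)
            d[i] = h
            g[i] = (f(x + d) - f(x - d)) / (2.0 * h)
        return g

    pose, expr = pose0.astype(float), expr0.astype(float)
    for _ in range(n_outer):
        for _ in range(n_inner):  # update pose with expression fixed
            pose -= lr * grad(lambda p: energy(p, expr), pose)
        for _ in range(n_inner):  # update expression with pose fixed
            expr -= lr * grad(lambda b: energy(pose, b), expr)
    return pose, expr
```

On a separable quadratic objective, each block converges toward its own minimizer over the outer iterations.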
fig. 4 is a schematic structural diagram of an electronic device according to an embodiment of the present invention. The electronic device 400 includes a memory 410 and a processor 420. The memory 410 is coupled to the processor 420.
Memory 410 may include read-only memory and/or random access memory, and provides instructions and data to processor 420. A portion of memory 410 may also include non-volatile random access memory (NVRAM). Memory 410 stores the following elements, executable modules or data structures, or subsets or extended sets thereof: operation instructions, comprising various instructions for carrying out various operations; and an operating system, comprising various system programs for implementing basic services and handling hardware-based tasks.
In a particular application, the various components of the terminal are coupled together by a bus 430. In addition to a data bus, bus 430 may include a power bus, a control bus, a status signal bus, and the like; for clarity of illustration, however, the various buses are all labeled as bus 430 in the figure.
In some embodiments, processor 420, by invoking the instructions stored in memory 410, may perform the following operations:
reconstructing a three-dimensional face in a video sequence to obtain a three-dimensional face parameter set; and
adding time domain constraint for the three-dimensional face parameter set by using the two-dimensional face image texture corresponding to the three-dimensional face, and optimizing the three-dimensional face parameter set so that the three-dimensional face parameter set and the two-dimensional face image texture meet preset conditions in the time domain, wherein the time domain constraint is represented by the consistency of the two-dimensional image texture.
Processor 420 may be an integrated circuit chip with signal processing capability. In implementation, the steps of the above method may be carried out by integrated logic circuits in hardware or by instructions in the form of software in processor 420. Processor 420 may be a general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field programmable gate array (FPGA) or other programmable logic device, discrete gate or transistor logic, or discrete hardware components, and may implement or execute the methods, steps, and logic blocks disclosed in the embodiments of the present invention. A general-purpose processor may be a microprocessor, or the processor may be any conventional processor. The steps of the method disclosed in connection with the embodiments of the present invention may be carried out directly by a hardware decoding processor, or by a combination of hardware and software modules in a decoding processor. The software module may reside in a storage medium well known in the art, such as RAM, flash memory, ROM, PROM, EPROM, or registers. The storage medium is located in memory 410; processor 420 reads the information in memory 410 and performs the steps of the above method in combination with its hardware.
The present invention also provides an embodiment of a non-volatile storage medium. As shown in fig. 5, the non-volatile storage medium 500 stores instructions 501 executable by a processor, the instructions 501 being used to execute the method in the above embodiments. Specifically, the storage medium 500 may be the memory 410 shown in fig. 4, or a part of the memory 410.
It will be apparent to those skilled in the art that many modifications and variations can be made to the devices and methods described above without departing from the teachings of the present application. Accordingly, the scope of the above disclosure should be limited only by the following claims.

Claims (10)

1. A three-dimensional face processing method is characterized by comprising the following steps:
reconstructing a three-dimensional face in a video sequence to obtain a three-dimensional face parameter set; and
adding time domain constraint to the three-dimensional face parameter set by using the two-dimensional face image texture corresponding to the three-dimensional face, and optimizing the three-dimensional face parameter set so that the three-dimensional face parameter set and the two-dimensional face image texture meet preset conditions in the time domain, wherein the time domain constraint is represented by the consistency of the two-dimensional image texture.
2. The three-dimensional face processing method according to claim 1, wherein the video sequence comprises a current frame face image and a next frame face image, and the three-dimensional face parameter set comprises a current frame three-dimensional face parameter used for representing a current frame three-dimensional face mesh and a next frame three-dimensional face parameter used for representing a next frame three-dimensional face mesh;
adding a time domain constraint to the three-dimensional face parameter set by using a two-dimensional image texture corresponding to the three-dimensional face, wherein the adding comprises:
acquiring a current frame texture image according to the current frame face image and the current frame three-dimensional face grid;
and rendering the next frame three-dimensional face grid by using the current frame texture image, so as to project the next frame three-dimensional face grid to the current frame face image to obtain a current frame synthetic image.
3. The three-dimensional face processing method of claim 2, wherein said optimizing said set of three-dimensional face parameters comprises:
and acquiring an optical flow from the current frame synthetic image to the next frame face image, and correcting the projection from the vertex of the next frame three-dimensional face grid to the plane of the next frame face image through the optical flow to achieve an optimization target, so that the next frame three-dimensional face parameter and the texture of the next frame face image in the time domain meet the preset condition.
4. A three-dimensional face processing method as claimed in claim 3, characterized in that the optimization objective comprises a first sub-objective and a second sub-objective;
the first sub-target is used for representing a texture continuous item, wherein the texture continuous item is calculated by the position of the nth vertex of the next frame of three-dimensional face mesh under the action of the optimized next frame of three-dimensional face parameter under the coordinate of the next frame of face image and the position of the nth vertex of the unoptimized next frame of three-dimensional face mesh under the coordinate of the next frame of face image after optical flow modification;
the second sub-target is used for representing a Z coordinate smoothing item, wherein the Z coordinate smoothing item is calculated by a Z coordinate value of an m-th three-dimensional face key point under the action of the optimized next frame of three-dimensional face parameters and a Z coordinate value of an m-th three-dimensional face key point under the action of the current frame of three-dimensional face parameters.
5. The three-dimensional face processing method according to claim 4, wherein a sum of a value of the texture continuation term and a value of the Z-coordinate smoothing term is smaller than a preset value, then the continuity of the next frame of three-dimensional face parameter and the texture of the next frame of face image in the time domain satisfies a preset continuity condition, and the Z-coordinate of the next frame of three-dimensional face parameter satisfies a preset smoothing condition.
6. A three-dimensional face processing method as claimed in claim 2, characterized in that the current frame composite image is obtained using an arbitrary differentiable renderer.
7. A three-dimensional face processing method as claimed in claim 3, further comprising:
adding two-dimensional key point constraints corresponding to the three-dimensional face parameter set;
said optimizing said three-dimensional set of face parameters further comprises:
and optimizing the three-dimensional face parameter set through the two-dimensional key point constraint so as to realize the consistency of the three-dimensional face parameter set and the two-dimensional face image.
8. The three-dimensional face processing method as claimed in claim 7, wherein the optimization objective further comprises a third sub-objective, the third sub-objective being used to represent a current frame two-dimensional keypoint constraint.
9. An electronic device comprising a processor and a memory, the memory storing instructions that, when executed, cause the processor to perform the three-dimensional face processing method of any one of claims 1-8.
10. A non-transitory computer storage medium having stored thereon instructions that, when executed, cause a processor to perform the three-dimensional face processing method according to any one of claims 1 to 8.
CN202011643370.9A 2020-12-30 2020-12-30 Three-dimensional face processing method and electronic equipment Pending CN112734895A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011643370.9A CN112734895A (en) 2020-12-30 2020-12-30 Three-dimensional face processing method and electronic equipment


Publications (1)

Publication Number Publication Date
CN112734895A true CN112734895A (en) 2021-04-30

Family

ID=75609240

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011643370.9A Pending CN112734895A (en) 2020-12-30 2020-12-30 Three-dimensional face processing method and electronic equipment

Country Status (1)

Country Link
CN (1) CN112734895A (en)

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1607551A (en) * 2003-08-29 2005-04-20 三星电子株式会社 Method and apparatus for image-based photorealistic 3D face modeling
CN1920886A (en) * 2006-09-14 2007-02-28 浙江大学 Video flow based three-dimensional dynamic human face expression model construction method
CN101751689A (en) * 2009-09-28 2010-06-23 中国科学院自动化研究所 Three-dimensional facial reconstruction method
US20180012407A1 (en) * 2016-07-08 2018-01-11 Microsoft Technology Licensing, Llc Motion Capture and Character Synthesis
CN109035388A (en) * 2018-06-28 2018-12-18 北京的卢深视科技有限公司 Three-dimensional face model method for reconstructing and device
CN109584353A (en) * 2018-10-22 2019-04-05 北京航空航天大学 A method of three-dimensional face expression model is rebuild based on monocular video
CN111294665A (en) * 2020-02-12 2020-06-16 百度在线网络技术(北京)有限公司 Video generation method and device, electronic equipment and readable storage medium
CN111368137A (en) * 2020-02-12 2020-07-03 百度在线网络技术(北京)有限公司 Video generation method and device, electronic equipment and readable storage medium


Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
SÁNDOR FAZEKAS ET AL.: "Analysis and performance evaluation of optical flow features for dynamic texture recognition", Signal Processing: Image Communication *
LYU Haiqing et al.: "A Survey of Photorealistic 3D Face Modeling Techniques", Software Guide, vol. 17, no. 1 *
ZHANG Jian: "3D Expression Reconstruction from Video Streams Fusing SFM and Dynamic Texture Mapping", Journal of Computer-Aided Design & Computer Graphics, no. 06 *


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination