CN112734895A - Three-dimensional face processing method and electronic equipment - Google Patents


Info

Publication number
CN112734895A
Authority
CN
China
Prior art keywords
dimensional face
dimensional
image
face
next frame
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202011643370.9A
Other languages
Chinese (zh)
Inventor
屈雁秋
何山
胡金水
殷兵
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
iFlytek Co Ltd
Original Assignee
iFlytek Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by iFlytek Co Ltd filed Critical iFlytek Co Ltd
Priority to CN202011643370.9A priority Critical patent/CN112734895A/en
Publication of CN112734895A publication Critical patent/CN112734895A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T15/00 3D [Three Dimensional] image rendering
    • G06T15/04 Texture mapping
    • G06T17/00 Three dimensional [3D] modelling, e.g. data description of 3D objects

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Computer Graphics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Geometry (AREA)
  • Software Systems (AREA)
  • Image Generation (AREA)

Abstract

The application provides a three-dimensional face processing method which comprises the following steps: reconstructing a three-dimensional face in a video sequence to obtain a three-dimensional face parameter set; and adding a time domain constraint for the three-dimensional face parameter set by using a two-dimensional face image texture corresponding to the three-dimensional face, and optimizing the three-dimensional face parameter set so that the three-dimensional face parameter set and the two-dimensional face image texture meet a preset condition in a time domain, wherein the time domain constraint is characterized by the consistency of the two-dimensional image texture. The application also provides corresponding electronic equipment. By the scheme, the time domain continuity of the reconstructed three-dimensional face parameters is guaranteed, the three-dimensional face parameters are stable and smooth, and the problem of delay is avoided.

Description

Three-dimensional face processing method and electronic equipment
Technical Field
The disclosed embodiments of the present application relate to the field of image processing technologies, and more particularly, to a three-dimensional face processing method and an electronic device.
Background
As the connection between computer graphics and computer vision technology becomes closer, research on parameterized three-dimensional face models based on 3D Morphable Models (3DMM) has advanced rapidly, and schemes that estimate the parameters of a corresponding three-dimensional face model from a single RGB face image have become increasingly common.
At present, video-based three-dimensional face reconstruction involves a relatively complex task flow, and the three-dimensional model parameters generated for consecutive frames can differ substantially, so the generated three-dimensional model mesh exhibits obvious jitter in the time domain.
Disclosure of Invention
According to an embodiment of the application, the application provides a three-dimensional face processing method and electronic equipment.
According to a first aspect of the present application, an exemplary three-dimensional face processing method is disclosed. An exemplary three-dimensional face processing method includes: reconstructing a three-dimensional face in a video sequence to obtain a three-dimensional face parameter set; and adding a time domain constraint for the three-dimensional face parameter set by using a two-dimensional face image texture corresponding to the three-dimensional face, and optimizing the three-dimensional face parameter set so that the three-dimensional face parameter set and the two-dimensional face image texture meet a preset condition in a time domain, wherein the time domain constraint is characterized by the consistency of the two-dimensional image texture.
In some embodiments, the video sequence includes a current frame face image and a next frame face image, and the three-dimensional face parameter set includes a current frame three-dimensional face parameter and a next frame three-dimensional face parameter, where the current frame three-dimensional face parameter is used to represent a current frame three-dimensional face mesh, and the next frame three-dimensional face parameter is used to represent a next frame three-dimensional face mesh; adding a time domain constraint to the three-dimensional face parameter set by using a two-dimensional image texture corresponding to the three-dimensional face, wherein the adding comprises: acquiring a current frame texture image according to the current frame face image and the current frame three-dimensional face grid; and rendering the next three-dimensional face grid by using the current frame texture image so as to project the next three-dimensional face grid to the current frame face image to obtain a current frame synthetic image.
In some embodiments, said optimizing said set of three-dimensional face parameters comprises: and acquiring an optical flow from the current frame synthetic image to the next frame face image, and correcting the projection from the vertex of the next frame three-dimensional face grid to the plane of the next frame face image through the optical flow to achieve an optimization target, so that the next frame three-dimensional face parameter and the texture of the next frame face image in the time domain meet the preset condition.
In some embodiments, the optimization objective includes a first sub-objective and a second sub-objective; the first sub-target is used for representing a texture continuous item, wherein the texture continuous item is calculated by the position of the nth vertex of the next frame of three-dimensional face mesh under the action of the optimized next frame of three-dimensional face parameter under the coordinate of the next frame of face image and the position of the nth vertex of the unoptimized next frame of three-dimensional face mesh under the coordinate of the next frame of face image after optical flow modification; the second sub-target is used for representing a Z coordinate smoothing item, wherein the Z coordinate smoothing item is calculated by a Z coordinate value of an m-th three-dimensional face key point under the action of the optimized next frame of three-dimensional face parameters and a Z coordinate value of an m-th three-dimensional face key point under the action of the current frame of three-dimensional face parameters.
In some embodiments, if the sum of the value of the texture continuation term and the value of the Z-coordinate smoothing term is smaller than a preset value, the continuity of the three-dimensional face parameter of the next frame in the time domain and the texture of the face image of the next frame satisfies a preset continuity condition, and the Z-coordinate of the three-dimensional face parameter of the next frame satisfies a preset smoothing condition.
In some embodiments, the current frame composite image is acquired using an arbitrary differentiable renderer.
In some embodiments, the method further comprises: adding two-dimensional key point constraints corresponding to the three-dimensional face parameter set; said optimizing said three-dimensional set of face parameters further comprises: and optimizing the three-dimensional face parameter set through the two-dimensional key point constraint so as to realize the consistency of the three-dimensional face parameter set and the two-dimensional face image.
According to a second aspect of the present application, an exemplary electronic device is disclosed, the exemplary electronic device comprising a processor and a memory, the memory storing instructions that, when executed, cause the processor to perform the three-dimensional face processing method according to the first aspect.
According to a third aspect of the present application, an example non-volatile computer storage medium is disclosed, the example non-volatile computer storage medium storing instructions that, when executed, cause a processor to perform the three-dimensional face processing method according to the first aspect.
The beneficial effect of this application has: after a three-dimensional face in a video sequence is reconstructed, a two-dimensional face image texture corresponding to the three-dimensional face is used, time domain constraint, namely consistency constraint of the two-dimensional image texture, is added to a three-dimensional face parameter set, and the three-dimensional face parameter set is optimized, so that the three-dimensional face parameter set and the two-dimensional face image texture meet preset conditions in the time domain, consistency with a two-dimensional image space is achieved, time domain continuity of the reconstructed three-dimensional face parameters is guaranteed, the three-dimensional face parameters are stable and smooth, and the problem of delay is avoided.
These and other objects of the present application will no doubt become obvious to those of ordinary skill in the art after having read the following detailed description of the preferred embodiments, which are illustrated in the various drawing figures and drawings.
Drawings
The present application will be further described with reference to the accompanying drawings and embodiments, in which:
fig. 1 is a flowchart of a three-dimensional face processing method according to an embodiment of the present application.
Fig. 2 is a partial flowchart of a three-dimensional face processing method according to an embodiment of the present application.
FIG. 3 is a schematic diagram of establishing a three-dimensional mesh space to two-dimensional image space relationship as employed in accordance with an embodiment of the present application.
Fig. 4 is a schematic structural diagram of an electronic device according to an embodiment of the present application.
FIG. 5 is a schematic diagram of a non-volatile storage medium according to an embodiment of the present application.
Detailed Description
In order to solve the problem that the generated three-dimensional model mesh has obvious jitter in the time domain, a parameterized three-dimensional face reconstruction scheme of a video sequence generally needs a scheme for ensuring the time sequence stability. The current scheme for ensuring the timing stability has the following modes:
one way is to ensure the timing stability of the input two-dimensional (2D) face keypoints, specifically, to perform timing filtering on the two-dimensional keypoints. The filtering of the input parameters can ensure the timing stability of the output parameters, but can cause the problem of input delay, thereby causing delay of the output parameters, and causing some parameters of the three-dimensional face to fail to track the two-dimensional image in time, such as opening and closing the mouth, and the opening and closing actions are not smooth enough.
The other way is to ensure the stability of the output three-dimensional face rigid pose and expression parameters, specifically, by applying Kalman filtering to the corresponding parameters. Because facial expression parameters are very sensitive to noise and readily produce unreasonable blend shapes, purely geometric constraints in the time domain cannot produce satisfactory results.
It can be seen that neither the filtering based on the input parameters nor the filtering for the output three-dimensional face parameters can ensure that the generated three-dimensional face and the image texture keep time sequence consistency.
Therefore, the application provides a three-dimensional face processing method and electronic equipment.
In order to make those skilled in the art better understand the technical solutions of the present application, the following detailed description is made with reference to the accompanying drawings and the detailed description.
Fig. 1 is a flowchart of a three-dimensional face processing method according to an embodiment of the present application. The method may be performed by an electronic device including, but not limited to, a computer, a server, and the like. The method comprises the following steps:
step 110: and reconstructing a three-dimensional face in the video sequence to obtain a three-dimensional face parameter set.
The video sequence comprises a plurality of frames of face images, each frame of face image is a two-dimensional image, a three-dimensional face in the video sequence is reconstructed, and a three-dimensional face parameter set is obtained.
And reconstructing a three-dimensional face, specifically, performing frame-by-frame parameterization on a video sequence and performing three-dimensional face reconstruction, thereby obtaining each frame of three-dimensional face parameters, wherein the each frame of three-dimensional face parameters is not subjected to any other processing, namely, the original data of each frame of three-dimensional face parameters is obtained.
In an example, after the video sequence is subjected to frame extraction, a current frame face image is obtained, two-dimensional key points, such as points at a face contour and points at a nose tip, are extracted from an original two-dimensional face image corresponding to the current frame face image by using a three-dimensional face reconstruction scheme based on the key points, and then three-dimensional face reconstruction is performed to obtain original data of three-dimensional face parameters of the current frame. In other examples, other three-dimensional face reconstruction schemes may also be used to reconstruct a three-dimensional face in a video sequence, for example, a three-dimensional face reconstruction scheme based on a differentiable renderer, a three-dimensional face reconstruction scheme based on a neural network, and the like.
Step 120: and adding time domain constraint for the three-dimensional face parameter set by using the two-dimensional face image texture corresponding to the three-dimensional face, and optimizing the three-dimensional face parameter set so that the three-dimensional face parameter set and the two-dimensional face image texture in the time domain meet preset conditions.
The time domain constraint is characterized by the consistency of the two-dimensional image texture, namely, the consistency constraint of the two-dimensional image texture is added to the three-dimensional face parameter set.
In an example, the preset condition indicates that the continuity of the three-dimensional face parameter set in the time domain reaches a preset degree, that is, the three-dimensional face parameter set is consistent with the two-dimensional face image texture in the time domain.
After the three-dimensional face is reconstructed, a time domain constraint is added to the three-dimensional face parameter set by using a two-dimensional face image texture corresponding to the three-dimensional face, namely, a time domain relation between a space of a three-dimensional face grid and a space of a two-dimensional image is established by using the image texture, and the three-dimensional face parameter set is optimized, so that the three-dimensional face parameter set and the two-dimensional face image texture in the time domain meet a preset condition.
In this embodiment, after a three-dimensional face in a video sequence is reconstructed, a two-dimensional face image texture corresponding to the three-dimensional face is used, a time domain constraint, that is, a consistency constraint of the two-dimensional image texture is added to a three-dimensional face parameter set, and the three-dimensional face parameter set is optimized, so that the three-dimensional face parameter set and the two-dimensional face image texture meet a preset condition in a time domain, consistency with a two-dimensional image space is achieved, time domain continuity of the reconstructed three-dimensional face parameter is ensured, stability and smoothness of the three-dimensional face parameter are ensured, and a delay problem is not caused.
The video sequence comprises a current frame face image and a next frame face image, and the three-dimensional face parameter set comprises a current frame three-dimensional face parameter and a next frame three-dimensional face parameter, wherein the current frame three-dimensional face parameter is used for representing a current frame three-dimensional face grid, and the next frame three-dimensional face parameter is used for representing a next frame three-dimensional face grid.
For convenience of description, assume the current frame face image is labeled as the i-th frame (i ≥ 0), the next frame face image as the (i+1)-th frame, the current frame three-dimensional face parameters as Θ_i, and the next frame three-dimensional face parameters as Θ_{i+1}. The current frame three-dimensional face parameters represent the current frame three-dimensional face mesh, specifically:

M(Θ_i), where Θ_i = (β_i, γ_i)

wherein β_i is the three-dimensional facial expression parameter among the current frame three-dimensional face parameters, i.e., the expression parameter among the i-th frame three-dimensional face parameters, and is the parameter of non-rigid facial motion; and γ_i = (s_i, R_i, t_i) are the three-dimensional face pose parameters among the current frame three-dimensional face parameters, i.e., the pose parameters among the i-th frame three-dimensional face parameters, representing the scaling, rotation, and translation parameters under weak perspective transformation, and are the parameters of rigid facial motion. By analogy, the representation of the next frame three-dimensional face mesh can be obtained; it is not repeated here for brevity and clarity.
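As a minimal sketch of the rigid part of this parameterization, the pose γ_i = (s_i, R_i, t_i) acts on the mesh vertices by scaling, rotation, and translation. NumPy is assumed, and the function name `apply_pose` is illustrative, not from the patent:

```python
import numpy as np

def apply_pose(vertices, s, R, t):
    """Rigid motion under weak perspective transformation: scale, rotate, translate.
    vertices: (N, 3) mesh vertices; s: scalar scale; R: (3, 3) rotation; t: (3,) translation."""
    return s * vertices @ R.T + t

# The identity pose (s=1, R=I, t=0) leaves the mesh unchanged.
verts = np.array([[0.0, 0.0, 1.0], [1.0, 2.0, 3.0]])
posed = apply_pose(verts, s=1.0, R=np.eye(3), t=np.zeros(3))
```

The non-rigid part (the expression parameter β_i deforming the mesh through 3DMM blend shapes) would be applied before this rigid transform.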
As above, the time domain constraint is added to the three-dimensional face parameter set by using the two-dimensional image texture corresponding to the three-dimensional face, and in some embodiments, as shown in fig. 2, this step includes:
step 221: and acquiring a current frame texture image according to the current frame face image and the current frame three-dimensional face grid.
According to the current frame face image and the current frame three-dimensional face mesh, project the current frame three-dimensional face mesh into the space of the current frame face image to obtain the current frame texture image: that is, the pixel of the current frame face image corresponding to each vertex of the mesh gives the texel at that vertex's UV coordinates, and together these texels form the current frame texture image. The following description takes the current frame as the i-th frame as an example.
From the i-th frame face image I_i and the i-th frame three-dimensional face mesh M(Θ_i), the current frame texture image t_i is obtained by:

t_i = F_J(I_i, M(Θ_i))

wherein the function F_J establishes the connection from the three-dimensional mesh space to the two-dimensional image space and acquires the UV texture of the three-dimensional face model. As shown in fig. 3, the function F_J establishes the relation between the i-th frame three-dimensional face mesh and the i-th frame face image (i.e., a two-dimensional image) and obtains the i-th frame texture image; the points marked in the i-th frame face image are the visible face vertices, the correspondingly marked points on the i-th frame three-dimensional face mesh are the mesh vertices, and the resulting UV coordinates of the corresponding vertices are shown in fig. 3.
Specifically, the process of implementing the function F_J is as follows: first, the visible face vertices in the current frame face image are obtained through a depth test; then, the pixel closest to each visible face vertex in the rasterized current frame face image is endowed with the UV texture corresponding to that face vertex. The original filing describes this process with pseudo-code (present in the source only as an image).
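Since the pseudo-code appears only as an image in the source, the following is a hedged NumPy sketch of the described process: a depth test selects the visible vertices, and each visible vertex then exchanges texture with its nearest image pixel. All names here (`extract_texture`, `owner`) are illustrative assumptions, and the depth convention (larger z is closer to the camera) is likewise assumed:

```python
import numpy as np

def extract_texture(image, verts_2d, verts_z):
    """Sketch of F_J: for each mesh vertex visible under a depth test, sample
    the colour of the nearest image pixel as that vertex's texel.
    image: (H, W, 3); verts_2d: (N, 2) projected vertex positions (x, y);
    verts_z: (N,) vertex depths, larger = closer to the camera (assumed)."""
    h, w, _ = image.shape
    # Depth test: per rasterised pixel, keep only the closest vertex.
    depth = np.full((h, w), -np.inf)
    owner = np.full((h, w), -1, dtype=int)
    px = np.clip(np.round(verts_2d).astype(int), 0, [w - 1, h - 1])
    for n, (x, y) in enumerate(px):
        if verts_z[n] > depth[y, x]:
            depth[y, x] = verts_z[n]
            owner[y, x] = n
    # Visible vertices take the colour of their nearest pixel as their texel.
    texels = np.zeros((len(verts_2d), 3))
    for n, (x, y) in enumerate(px):
        if owner[y, x] == n:  # vertex passed the depth test
            texels[n] = image[y, x]
    return texels

# Toy example: two vertices projecting to the same pixel; only the closer one
# receives the pixel colour, the occluded one stays black.
img = np.zeros((4, 4, 3)); img[1, 2] = [1.0, 0.0, 0.0]
tex = extract_texture(img, np.array([[2.0, 1.0], [2.0, 1.0]]), np.array([0.5, 0.9]))
```

A production implementation would rasterise triangles rather than single vertices, but the visibility-then-assignment structure is the same as described above.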
after a certain UV texture is obtained, the corresponding two-dimensional image features (namely visible surface vertexes in the current frame face image) are kept in the coordinates of the three-dimensional model of the surface vertexes, and the texture image tiEach of the texels
Figure BDA0002873472560000072
All correspond to a certain vertex on the three-dimensional face mesh of the current frame
Figure BDA0002873472560000073
Therefore, it can be seen that the constraint of the consistency of the texture in the time domain can be transmitted back to the expression parameter β of the current frame three-dimensional face imageiAnd pose parameter si,Ri,ti
Step 222: and rendering the next frame of three-dimensional face grid by using the current frame texture image so as to project the next frame of three-dimensional face grid to the current frame face image to obtain a current frame synthetic image.
The description is continued by taking the current frame as the ith frame as an example.
The next frame three-dimensional face mesh is rendered with the current frame texture image to obtain the current frame composite image, expressed as:

Î_i = F_R(t_i, M(Θ_{i+1}))

wherein the function F_R is an arbitrary differentiable renderer used to project the next frame three-dimensional face mesh onto the current frame face image. Because the next frame three-dimensional face mesh carries the two-dimensional image features of the i-th frame face image (the visible vertices in the current frame face image), the resulting current frame composite image Î_i contains the information of the next frame three-dimensional face parameters.
In particular, in an example, the differentiable renderer is implemented by a neural network. The neural network has two stages: a training stage, which constructs the parameterized network, and a testing stage, which runs a forward pass of the parameterized network as the rendering function. That is, the current frame composite image is obtained through the testing stage of the neural network.
Since the UV texture remains consistent in the three-dimensional mesh space, it follows from the above that after the i-th frame three-dimensional face mesh is transformed by the (i+1)-th frame three-dimensional face parameters, the UV texture still retains the two-dimensional image features of the i-th frame face image; this is the two-dimensional image texture consistency.
As described above, after the time domain constraint is added to the three-dimensional face parameter set, the three-dimensional face parameter set is optimized. In some embodiments, optimizing the set of three-dimensional face parameters comprises: and acquiring an optical flow from the current frame synthetic image to the next frame face image, and modifying the projection from the vertex of the next frame three-dimensional face grid to the plane of the next frame face image through the optical flow to achieve an optimization target, so that the texture of the next frame three-dimensional face parameter and the texture of the next frame face image in a time domain meet a preset condition.
The description is continued by taking the current frame as the ith frame as an example.
The optical flow from the i-th frame composite image Î_i to the (i+1)-th frame face image I_{i+1} is:

w_i = f(Î_i, I_{i+1})

where f is an arbitrary existing optical flow operator. For an arbitrary pixel position a of the (i+1)-th frame two-dimensional image plane, with coordinates (x, y), the optical flow (u, v) maps I(x, y) → I′(x + u, y + v), correcting the pixel position a to a′ with coordinates (x′ = x + u, y′ = y + v), thereby ensuring continuity between the i-th frame and (i+1)-th frame two-dimensional images.

In the above formula, the optical flow w_i corrects the difference between the current frame composite image Î_i rendered by the differentiable renderer F_R and the next frame face image I_{i+1}, and thereby further corrects the expression parameter β_i and the pose parameters s_i, R_i, t_i among the current frame three-dimensional face parameters Θ_i.
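A minimal sketch of this flow-based correction step, assuming a dense flow field of shape (H, W, 2) with channels (u, v) and nearest-neighbour sampling (the sampling scheme is an assumption; a real implementation might interpolate bilinearly):

```python
import numpy as np

def correct_by_flow(verts_2d, flow):
    """Shift each projected vertex by the optical flow sampled at its pixel:
    (x, y) -> (x + u, y + v).  flow: (H, W, 2) with channels (u, v)."""
    h, w, _ = flow.shape
    xi = np.clip(np.round(verts_2d[:, 0]).astype(int), 0, w - 1)
    yi = np.clip(np.round(verts_2d[:, 1]).astype(int), 0, h - 1)
    uv = flow[yi, xi]  # nearest-neighbour flow sample per vertex
    return verts_2d + uv

# A uniform flow of (+1, 0) shifts every projected vertex one pixel to the right.
flow = np.zeros((4, 4, 2)); flow[..., 0] = 1.0
moved = correct_by_flow(np.array([[1.0, 2.0]]), flow)
print(moved)  # [[2. 2.]]
```

These flow-corrected vertex positions serve as the fixed targets of the texture continuation term during optimization.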
The optimization objective quantifies whether the next frame three-dimensional face parameters and the texture of the next frame face image satisfy the preset condition in the time domain.
In some embodiments, the optimization goal includes a first sub-goal and a second sub-goal, wherein the first sub-goal is used for characterizing a texture continuum, wherein the texture continuum is calculated by the position of the nth vertex of the next frame of three-dimensional face mesh under the action of the optimized next frame of three-dimensional face parameters under the coordinates of the next frame of face image and the position of the nth vertex of the unoptimized next frame of three-dimensional face mesh under the coordinates of the next frame of face image after optical flow modification. And the second sub-target is used for representing a Z coordinate smoothing item, wherein the Z coordinate smoothing item is calculated by the Z coordinate value of the mth three-dimensional face key point under the action of the optimized three-dimensional face parameter of the next frame and the Z coordinate value of the mth three-dimensional face key point under the action of the three-dimensional face parameter of the current frame.
Specifically, the optimization objective is calculated as:

E(Θ̂_{i+1}) = E_tex + E_smooth

The first sub-objective characterizes the texture continuation term E_tex, calculated by the following equation 1:

E_tex = Σ_{n=1}^{N} ‖ P_n(Θ̂_{i+1}) − P̃_n ‖²    (equation 1)

wherein P_n(Θ̂_{i+1}) denotes the position, in the coordinates of the next frame face image, of the n-th vertex of the next frame three-dimensional face mesh under the action of the optimized next frame three-dimensional face parameters Θ̂_{i+1}; P̃_n = w_i(P_n(Θ_{i+1})) denotes the position of the n-th vertex of the unoptimized next frame three-dimensional face mesh in the coordinates of the next frame face image after optical-flow correction; and N denotes the number of vertices of the next frame three-dimensional face mesh. The corrected positions P̃_n have been calculated in advance, starting from frame 0, and remain constant during the optimization.

According to equation 1, the smaller the value of the texture continuation term E_tex, the better the texture continuity in the time domain between the next frame three-dimensional face parameters Θ_{i+1} and the next frame face image.
The second sub-objective characterizes the Z coordinate smoothing term E_smooth, calculated by the following equation 2:

E_smooth = Σ_{m=1}^{M} ( z_{k_m}(Θ̂_{i+1}) − z_{k_m}(Θ_i) )²    (equation 2)

wherein z_{k_m}(Θ̂_{i+1}) denotes the Z component (i.e., Z coordinate value) of the m-th three-dimensional face key point under the action of the optimized next frame three-dimensional face parameters Θ̂_{i+1}; z_{k_m}(Θ_i) denotes the Z component of the m-th three-dimensional face key point under the action of the current frame three-dimensional face parameters Θ_i (i.e., the original, unprocessed current frame three-dimensional face parameters); k_m denotes the index of the m-th three-dimensional face key point among the mesh vertices; and M denotes the number of three-dimensional face key points.

According to equation 2, the smaller the value of the Z coordinate smoothing term, the smoother the Z coordinate of the next frame three-dimensional face parameters.
Further, in some embodiments, if the sum of the texture continuation term and the Z coordinate smoothing term, E_tex + E_smooth, is smaller than a preset value, then the continuity between the next frame three-dimensional face parameters and the texture of the next frame face image in the time domain satisfies the preset continuity condition, and the Z coordinate of the next frame three-dimensional face parameters satisfies the preset smoothing condition.
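The two sub-objectives can be sketched directly from equations 1 and 2, assuming the vertex projections and key-point Z coordinates have already been computed as NumPy arrays (the function names are illustrative, not from the patent):

```python
import numpy as np

def texture_term(proj_opt, proj_flow):
    """E_tex (equation 1): squared distance between the projections of the mesh
    vertices under the optimized next-frame parameters and the flow-corrected
    projections of the unoptimized mesh.  Both arguments: (N, 2)."""
    return np.sum((proj_opt - proj_flow) ** 2)

def z_smooth_term(z_opt, z_cur):
    """E_smooth (equation 2): squared difference of key-point Z coordinates
    between the optimized next-frame and the current-frame parameters."""
    return np.sum((z_opt - z_cur) ** 2)

# When both terms vanish, their sum is trivially below any positive preset
# value, so the continuity and smoothing conditions hold.
e = texture_term(np.zeros((3, 2)), np.zeros((3, 2))) + z_smooth_term(np.zeros(5), np.zeros(5))
print(e)  # 0.0
```

Comparing `e` against the preset threshold implements the stopping test described above.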
In some embodiments, the method further comprises adding a two-dimensional keypoint constraint corresponding to the three-dimensional face to the set of three-dimensional face parameters. Due to the fact that corresponding two-dimensional key point constraints are added to the three-dimensional face parameter set, the consistency of the two-dimensional face image and the three-dimensional face is guaranteed, and the accuracy of three-dimensional face reconstruction is improved. The two-dimensional face key point constraint is to perform related operations by taking a key point in a certain face image as a reference. For example, a two-dimensional key point constraint in the current frame face image is added to the current frame three-dimensional face parameters.
At this time, optimizing the three-dimensional face parameter set further includes: and optimizing the three-dimensional face parameter set through two-dimensional key point constraint so as to realize the consistency of the three-dimensional face parameter set and the two-dimensional face image. For example, the three-dimensional face parameters of the current frame are optimized by taking key points in the face image of the current frame as a reference, so that the consistency between the face parameters of the current frame and the corresponding two-dimensional face image of the current frame is realized.
To ensure consistency between the two-dimensional face image and the three-dimensional face, the optimization objective further comprises a third sub-objective, which represents the current frame two-dimensional keypoint constraint and is computed as:

E_lan = Σ_m ‖ s_2d · Π · R · X_m(β_{i+1}) + t_2d − K_m ‖²

where s_2d, R, and t_2d are the weak-perspective camera parameters, K_m is the m-th two-dimensional keypoint, Π is the projection matrix of the orthographic projection, and X_m(β_{i+1}) is the coordinate of the m-th three-dimensional face keypoint under the action of the optimized next frame three-dimensional face parameters β_{i+1}.
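Under the definitions above, a minimal evaluation of this third sub-objective might look like the following sketch. The function name and array layouts are assumptions; the 2×3 orthographic projection matrix Π is written out explicitly as `P`.

```python
import numpy as np

def keypoint_loss(s2d, R, t2d, X3d, K2d):
    """Current frame 2D keypoint constraint under a weak-perspective camera
    (illustrative sketch of the third sub-objective).

    s2d : scalar scale of the weak-perspective camera
    R   : (3, 3) rotation matrix
    t2d : (2,) image-plane translation
    X3d : (M, 3) 3D face keypoints under the optimized next frame parameters
    K2d : (M, 2) detected 2D keypoints
    """
    # Orthographic projection matrix: keeps the X and Y components.
    P = np.array([[1.0, 0.0, 0.0],
                  [0.0, 1.0, 0.0]])
    # Rotate, project, scale, and translate each 3D keypoint to the image plane.
    proj = s2d * (X3d @ R.T) @ P.T + t2d   # shape (M, 2)
    # Sum of squared 2D distances to the detected keypoints.
    return float(np.sum(np.sum((proj - K2d) ** 2, axis=1)))
```

When the projected keypoints coincide with the detected ones, the loss vanishes.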
Specifically, in an embodiment where the optimization objective includes the first sub-objective, the second sub-objective, and the third sub-objective, the optimization objective is computed as:

E = E_tex + E_smooth + E_lan

where E_tex denotes the texture continuation term (first sub-objective), E_smooth the Z-coordinate smoothing term (second sub-objective), and E_lan the current frame two-dimensional keypoint constraint (third sub-objective); weighting coefficients may be applied to balance the three terms.
the optimization process comprises the following steps: firstly, a coordinate ascent method is used for respectively solving the pose parameters gammai=(si,Ri,ti) And expression parameter betaiAnd then jointly optimized with the two-dimensional key point constraint through the texture consistency constraint. It is given by the following pseudo code:
Figure BDA0002873472560000121
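Since the original pseudocode survives only as an image, the alternating scheme described above can be sketched as follows. The finite-difference gradient steps are purely illustrative stand-ins for whatever solver the patent actually uses, and each parameter block is minimized in turn (block-coordinate descent), even though the text names the scheme coordinate ascent.

```python
import numpy as np

def coordinate_optimize(energy, pose0, expr0, lr=1e-2, n_outer=10, n_inner=20):
    """Alternately refine pose and expression parameters (illustrative sketch).

    energy(pose, expr) -> float is the combined objective; each parameter
    block is updated with the other block held fixed.
    """
    def grad(f, x, h=1e-5):
        # Central finite-difference gradient, used here only for illustration.
        g = np.zeros_like(x)
        for i in range(x.size):
            d = np.zeros_like(x)
            d[i] = h
            g[i] = (f(x + d) - f(x - d)) / (2.0 * h)
        return g

    pose, expr = pose0.astype(float), expr0.astype(float)
    for _ in range(n_outer):
        for _ in range(n_inner):  # update pose with expression fixed
            pose -= lr * grad(lambda p: energy(p, expr), pose)
        for _ in range(n_inner):  # update expression with pose fixed
            expr -= lr * grad(lambda b: energy(pose, b), expr)
    return pose, expr
```

On a separable quadratic objective, each block converges toward its own minimizer over the outer iterations.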
fig. 4 is a schematic structural diagram of an electronic device according to an embodiment of the present invention. The electronic device 400 includes a memory 410 and a processor 420. The memory 410 is coupled to the processor 420.
Memory 410 may include read-only memory and/or random access memory, and provides instructions and data to processor 420. A portion of memory 410 may also include non-volatile random access memory (NVRAM). Memory 410 stores the following elements, executable modules or data structures, or subsets or extended sets thereof: operation instructions, comprising various instructions for carrying out various operations; and an operating system, comprising various system programs for implementing basic services and handling hardware-based tasks.
In a particular application, the various components of the terminal are coupled together by a bus 430. In addition to a data bus, bus 430 may include a power bus, a control bus, a status signal bus, and the like; for clarity of illustration, however, the various buses are all labeled as bus 430 in the figure.
In some embodiments, processor 420, by invoking the instructions stored in memory 410, may perform the following operations:
reconstructing a three-dimensional face in a video sequence to obtain a three-dimensional face parameter set; and
adding time domain constraint for the three-dimensional face parameter set by using the two-dimensional face image texture corresponding to the three-dimensional face, and optimizing the three-dimensional face parameter set so that the three-dimensional face parameter set and the two-dimensional face image texture meet preset conditions in the time domain, wherein the time domain constraint is represented by the consistency of the two-dimensional image texture.
Processor 420 may be an integrated circuit chip with signal processing capability. In implementation, the steps of the above method may be carried out by integrated logic circuits in hardware or by instructions in the form of software in processor 420. Processor 420 may be a general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field programmable gate array (FPGA) or other programmable logic device, discrete gate or transistor logic, or discrete hardware components, and may implement or execute the methods, steps, and logic blocks disclosed in the embodiments of the present invention. A general-purpose processor may be a microprocessor, or the processor may be any conventional processor. The steps of the method disclosed in connection with the embodiments of the present invention may be carried out directly by a hardware decoding processor, or by a combination of hardware and software modules in a decoding processor. The software module may reside in a storage medium well known in the art, such as RAM, flash memory, ROM, PROM, EPROM, or registers. The storage medium is located in memory 410; processor 420 reads the information in memory 410 and performs the steps of the above method in combination with its hardware.
The present invention also provides an embodiment of a non-volatile storage medium. As shown in fig. 5, the non-volatile storage medium 500 stores instructions 501 executable by a processor, the instructions 501 being used to execute the method in the above embodiments. Specifically, the storage medium 500 may be the memory 410 shown in fig. 4, or a part of the memory 410.
It will be apparent to those skilled in the art that many modifications and variations can be made to the devices and methods described above without departing from the teachings of the present application. Accordingly, the scope of the above disclosure should be limited only by the following claims.

Claims (10)

1. A three-dimensional face processing method is characterized by comprising the following steps:
reconstructing a three-dimensional face in a video sequence to obtain a three-dimensional face parameter set; and
adding time domain constraint to the three-dimensional face parameter set by using the two-dimensional face image texture corresponding to the three-dimensional face, and optimizing the three-dimensional face parameter set so that the three-dimensional face parameter set and the two-dimensional face image texture meet preset conditions in the time domain, wherein the time domain constraint is represented by the consistency of the two-dimensional image texture.
2. The three-dimensional face processing method according to claim 1, wherein the video sequence comprises a current frame face image and a next frame face image, and the three-dimensional face parameter set comprises a current frame three-dimensional face parameter used for representing a current frame three-dimensional face mesh and a next frame three-dimensional face parameter used for representing a next frame three-dimensional face mesh;
adding a time domain constraint to the three-dimensional face parameter set by using a two-dimensional image texture corresponding to the three-dimensional face, wherein the adding comprises:
acquiring a current frame texture image according to the current frame face image and the current frame three-dimensional face grid;
and rendering the next frame three-dimensional face grid by using the current frame texture image, so as to project the next frame three-dimensional face grid to the current frame face image to obtain a current frame synthetic image.
3. The three-dimensional face processing method of claim 2, wherein said optimizing said set of three-dimensional face parameters comprises:
and acquiring an optical flow from the current frame synthetic image to the next frame face image, and correcting the projection from the vertex of the next frame three-dimensional face grid to the plane of the next frame face image through the optical flow to achieve an optimization target, so that the next frame three-dimensional face parameter and the texture of the next frame face image in the time domain meet the preset condition.
4. A three-dimensional face processing method as claimed in claim 3, characterized in that the optimization objective comprises a first sub-objective and a second sub-objective;
the first sub-target is used for representing a texture continuous item, wherein the texture continuous item is calculated by the position of the nth vertex of the next frame of three-dimensional face mesh under the action of the optimized next frame of three-dimensional face parameter under the coordinate of the next frame of face image and the position of the nth vertex of the unoptimized next frame of three-dimensional face mesh under the coordinate of the next frame of face image after optical flow modification;
the second sub-target is used for representing a Z coordinate smoothing item, wherein the Z coordinate smoothing item is calculated by a Z coordinate value of an m-th three-dimensional face key point under the action of the optimized next frame of three-dimensional face parameters and a Z coordinate value of an m-th three-dimensional face key point under the action of the current frame of three-dimensional face parameters.
5. The three-dimensional face processing method according to claim 4, wherein a sum of a value of the texture continuation term and a value of the Z-coordinate smoothing term is smaller than a preset value, then the continuity of the next frame of three-dimensional face parameter and the texture of the next frame of face image in the time domain satisfies a preset continuity condition, and the Z-coordinate of the next frame of three-dimensional face parameter satisfies a preset smoothing condition.
6. A three-dimensional face processing method as claimed in claim 2, characterized in that the current frame composite image is obtained using an arbitrary differentiable renderer.
7. A three-dimensional face processing method as claimed in claim 3, further comprising:
adding two-dimensional key point constraints corresponding to the three-dimensional face parameter set;
said optimizing said three-dimensional set of face parameters further comprises:
and optimizing the three-dimensional face parameter set through the two-dimensional key point constraint so as to realize the consistency of the three-dimensional face parameter set and the two-dimensional face image.
8. The three-dimensional face processing method as claimed in claim 7, wherein the optimization objective further comprises a third sub-objective, the third sub-objective being used to represent a current frame two-dimensional keypoint constraint.
9. An electronic device comprising a processor and a memory, the memory storing instructions that, when executed, cause the processor to perform the three-dimensional face processing method of any one of claims 1-8.
10. A non-transitory computer storage medium having stored thereon instructions that, when executed, cause a processor to perform the three-dimensional face processing method according to any one of claims 1 to 8.
CN202011643370.9A 2020-12-30 2020-12-30 Three-dimensional face processing method and electronic equipment Pending CN112734895A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011643370.9A CN112734895A (en) 2020-12-30 2020-12-30 Three-dimensional face processing method and electronic equipment


Publications (1)

Publication Number Publication Date
CN112734895A true CN112734895A (en) 2021-04-30

Family

ID=75609240

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011643370.9A Pending CN112734895A (en) 2020-12-30 2020-12-30 Three-dimensional face processing method and electronic equipment

Country Status (1)

Country Link
CN (1) CN112734895A (en)

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1607551A (en) * 2003-08-29 2005-04-20 三星电子株式会社 Method and apparatus for image-based photorealistic 3D face modeling
CN1920886A (en) * 2006-09-14 2007-02-28 浙江大学 Video flow based three-dimensional dynamic human face expression model construction method
CN101751689A (en) * 2009-09-28 2010-06-23 中国科学院自动化研究所 Three-dimensional facial reconstruction method
US20180012407A1 (en) * 2016-07-08 2018-01-11 Microsoft Technology Licensing, Llc Motion Capture and Character Synthesis
CN109035388A (en) * 2018-06-28 2018-12-18 北京的卢深视科技有限公司 Three-dimensional face model method for reconstructing and device
CN109584353A (en) * 2018-10-22 2019-04-05 北京航空航天大学 A method of three-dimensional face expression model is rebuild based on monocular video
CN111294665A (en) * 2020-02-12 2020-06-16 百度在线网络技术(北京)有限公司 Video generation method and device, electronic equipment and readable storage medium
CN111368137A (en) * 2020-02-12 2020-07-03 百度在线网络技术(北京)有限公司 Video generation method and device, electronic equipment and readable storage medium


Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
SÁNDOR FAZEKAS ET AL.: "Analysis and performance evaluation of optical flow features for dynamic texture recognition", Signal Processing: Image Communication *
LYU Haiqing et al.: "A Survey of Photorealistic 3D Face Modeling Techniques", Software Guide, vol. 17, no. 1 *
ZHANG Jian: "3D Expression Reconstruction from Video Streams Fusing SFM and Dynamic Texture Mapping", Journal of Computer-Aided Design & Computer Graphics, no. 06 *


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination