CN115205487A - Monocular camera face reconstruction method and device - Google Patents
- Publication number
- CN115205487A (application number CN202210509305.XA)
- Authority
- CN
- China
- Prior art keywords
- face
- dimensional
- preset
- reconstruction
- monocular camera
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T17/00—Three dimensional [3D] modelling, e.g. data description of 3D objects
- G06T17/20—Finite element generation, e.g. wire-frame surface description, tesselation
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2200/00—Indexing scheme for image data processing or generation, in general
- G06T2200/08—Indexing scheme for image data processing or generation, in general involving all processing steps from image acquisition to 3D model generation
Landscapes
- Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- Theoretical Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Software Systems (AREA)
- Biomedical Technology (AREA)
- Computational Linguistics (AREA)
- Health & Medical Sciences (AREA)
- Life Sciences & Earth Sciences (AREA)
- Artificial Intelligence (AREA)
- Computer Graphics (AREA)
- Biophysics (AREA)
- Geometry (AREA)
- Data Mining & Analysis (AREA)
- Evolutionary Computation (AREA)
- General Health & Medical Sciences (AREA)
- Molecular Biology (AREA)
- Computing Systems (AREA)
- General Engineering & Computer Science (AREA)
- Mathematical Physics (AREA)
- Image Processing (AREA)
Abstract
The application relates to the technical field of three-dimensional vision and graphics, and in particular to a monocular camera face reconstruction method and device. The method includes the following steps: acquiring, with a monocular camera, face videos of a user's face at different angles; inputting the face video into a preset extended neural implicit function to obtain a geometric representation of the user's face; and aligning and deforming each face frame in the face video based on the latent vector of the preset extended neural implicit function, and deforming the geometric representation with a preset implicit function to generate a geometric face reconstruction result for the user's face. This solves the problems in the related art that constraining the reconstruction result with a face prior easily produces geometric defects, that reconstruction more accurate than the face prior is impossible, especially when the face geometry is further deformed, and that high-precision face reconstruction cannot be obtained from low-cost monocular camera capture equipment.
Description
Technical Field
The application relates to the technical field of three-dimensional vision and graphics, in particular to a monocular camera face reconstruction method and device.
Background
In the related art, the three-dimensional coordinates of each facial feature point can be determined from the feature point positions and a depth image, so as to locate the region where the face lies and remove non-face regions; each point cloud in the face region is then registered to a global model, thereby achieving three-dimensional face reconstruction.
However, in the related art, constraining the reconstruction result with a face prior easily produces geometric defects. In particular, when the face geometry is further deformed, reconstruction more accurate than the face prior cannot be achieved, so high-precision face reconstruction cannot be obtained from inexpensive monocular camera capture equipment and the requirements of geometric face reconstruction cannot be met. A solution is urgently needed.
Disclosure of Invention
The present application is based on the recognition by the inventors that:
the neuro implicit function is a geometrical representation method emerging in recent years, and has richer geometrical expression capability than a three-dimensional grid. It is in the form of a multi-layered perceptron for modeling the distance field of the space in which the object is located. Its input is a three-dimensional query point, and the directional distance from the query point to the surface of the object is output. Through the neural hidden function, a volume rendering algorithm can query the passing point of a ray emitted from the camera, so that the rendering result of an object represented by the neural hidden function on a specified camera can be subjected to derivative rendering. Because the rendering process is conductive, the rendering error of the camera space can transmit the gradient back to the neural implicit function for geometric representation optimization. The neuro-implicit function can be expanded, and is changed from representing an object to representing a family of similar objects, the method is that not only the coordinates of the query point but also an implicit vector representing a certain object are input, so that the multilayer perceptron outputs the position of the query point, and the method has great significance for reconstructing an accurate three-dimensional face from a cheap acquisition device for the directed distance of the object.
The application provides a monocular camera face reconstruction method and a monocular camera face reconstruction device, and aims to solve the problems that in the related art, when a reconstruction result is constrained by face prior, geometric defects easily occur, and particularly when the face geometry is further deformed, higher-precision reconstruction exceeding the face prior cannot be performed, higher-precision face reconstruction cannot be performed from low-cost monocular camera acquisition equipment, and the like.
An embodiment of a first aspect of the present application provides a monocular camera face reconstruction method, including the following steps: acquiring, with a monocular camera, face videos of a user's face at different angles; inputting the face video into a preset extended neural implicit function to obtain a geometric representation of the user's face; and aligning and deforming each face frame in the face video based on the latent vector of the preset extended neural implicit function, and deforming the geometric representation with a preset implicit function to generate a geometric face reconstruction result for the user's face.
Optionally, in an embodiment of the present application, before inputting the face video into the preset extended neural implicit function, the method further includes: acquiring three-dimensional face scan data of the user's face; and encoding the three-dimensional information of the scan data into the preset extended neural implicit function, so that when a latent vector is input, the function outputs the signed distance from a query point to the three-dimensional face in space.
Optionally, in an embodiment of the present application, the method further includes: converting the geometric face reconstruction result into a three-dimensional face mesh.
Optionally, in an embodiment of the present application, converting the geometric face reconstruction result into a three-dimensional face mesh includes: sampling a plurality of query points in three-dimensional space; obtaining a deformation value for each query point from the preset implicit function and deforming each query point accordingly, obtaining a plurality of deformed query points; and obtaining the signed distance of each deformed query point from the preset extended neural implicit function, and deriving the three-dimensional face mesh from these signed distances.
An embodiment of a second aspect of the present application provides a monocular camera face reconstruction device, including: a first obtaining module configured to acquire, with a monocular camera, face videos of a user's face at different angles; an input module configured to input the face video into a preset extended neural implicit function to obtain a geometric representation of the user's face; and a reconstruction module configured to align and deform each face frame in the face video based on the latent vector of the preset extended neural implicit function, and to deform the geometric representation with a preset implicit function to generate a geometric face reconstruction result for the user's face.
Optionally, in an embodiment of the present application, the apparatus further includes: a second obtaining module configured to acquire three-dimensional face scan data of the user's face; and an encoding module configured to encode the three-dimensional information of the scan data into the preset extended neural implicit function, so that when a latent vector is input, the function outputs the signed distance from a query point to the three-dimensional face in space.
Optionally, in an embodiment of the present application, the apparatus further includes: a conversion module configured to convert the geometric face reconstruction result into a three-dimensional face mesh.
Optionally, in an embodiment of the present application, the conversion module includes: a sampling unit configured to sample a plurality of query points in three-dimensional space; a processing unit configured to obtain a deformation value for each query point from the preset implicit function and deform each query point accordingly, obtaining a plurality of deformed query points; and a generating unit configured to obtain the signed distance of each deformed query point from the preset extended neural implicit function and derive the three-dimensional face mesh from these signed distances.
An embodiment of a third aspect of the present application provides an electronic device, including a memory, a processor, and a computer program stored on the memory and executable on the processor, wherein the processor, when executing the program, implements the monocular camera face reconstruction method of the above embodiments.
A fourth aspect of the present application provides a computer-readable storage medium storing computer instructions for causing a computer to execute the monocular camera face reconstruction method of the foregoing embodiments.
The embodiment of the application can align and deform each face frame in a face video based on the latent vector of the preset extended neural implicit function, use the neural implicit function as the three-dimensional representation of the face, and deform the geometric representation with the preset implicit function to generate a geometric face reconstruction result for the user's face. The RGB (red, green, blue) values of the monocular camera provide the constraint for geometric optimization, and by enhancing the expressive power of the geometric representation, high-precision face reconstruction can be performed with low-cost monocular camera capture equipment. This solves the problems in the related art that constraining the reconstruction result with a face prior easily produces geometric defects, that reconstruction more accurate than the face prior is impossible, especially when the face geometry is further deformed, and that high-precision face reconstruction cannot be obtained from low-cost monocular camera capture equipment.
Additional aspects and advantages of the present application will be set forth in part in the description which follows and, in part, will be obvious from the description, or may be learned by practice of the present application.
Drawings
The foregoing and/or additional aspects and advantages of the present application will become apparent and readily appreciated from the following description of the embodiments, taken in conjunction with the accompanying drawings of which:
fig. 1 is a flowchart of a monocular camera face reconstruction method according to an embodiment of the present application;
fig. 2 is a schematic structural diagram of a monocular camera face reconstruction device according to an embodiment of the present application; and
Fig. 3 is a schematic structural diagram of an electronic device according to an embodiment of the present application.
Detailed Description
Reference will now be made in detail to embodiments of the present application, examples of which are illustrated in the accompanying drawings, wherein like or similar reference numerals refer to the same or similar elements or elements having the same or similar function throughout. The embodiments described below with reference to the accompanying drawings are illustrative and intended to explain the present application and should not be construed as limiting the present application.
The monocular camera face reconstruction method and device according to embodiments of the present application are described below with reference to the drawings. In this method, each face frame in a face video is aligned and deformed based on the latent vector of a preset extended neural implicit function, the neural implicit function serves as the three-dimensional representation of the face, and a preset implicit function deforms the geometric representation to generate a geometric face reconstruction result for the user's face. The RGB (red, green, blue) values of the monocular camera thereby provide the constraint for geometric optimization, and by enhancing the expressive power of the geometric representation, high-precision face reconstruction is achieved with low-cost monocular camera capture equipment. This solves the problems in the related art that constraining the reconstruction result with a face prior easily produces geometric defects, that reconstruction more accurate than the face prior is impossible, especially when the face geometry is further deformed, and that high-precision face reconstruction cannot be obtained from low-cost monocular camera capture equipment.
Specifically, fig. 1 is a schematic flow chart of a monocular camera face reconstruction method according to an embodiment of the present disclosure.
As shown in fig. 1, the monocular camera face reconstruction method includes the following steps:
in step S101, facial videos of a user with different faces at different angles are acquired by a monocular camera.
It can be understood that in the embodiment of the application, a monocular camera can be used to obtain face videos of the user's face at different angles. For example, three-dimensional face scan data can be acquired and a face prior, in the form of a multilayer perceptron, learned from the scan data set to model the distance field of the space containing the face, thereby effectively avoiding geometric defects.
In step S102, the face video is input into a preset extended neural implicit function to obtain a geometric representation of the user's face.
Specifically, the face prior can be learned from the three-dimensional face scan data set in a data-driven manner, so that the reconstruction result is constrained to the geometric space of biologically plausible faces. The acquired face video of the user's face is input to the preset extended neural implicit function to obtain a geometric representation of the user's face, and by enhancing the expressive power of the geometric representation, the three-dimensional face is reconstructed more accurately.
Optionally, in an embodiment of the present application, before inputting the face video into the preset extended neural implicit function, the method further includes: acquiring three-dimensional face scan data of the user's face; and encoding the three-dimensional information of the scan data into the preset extended neural implicit function, so that when a latent vector is input, the function outputs the signed distance from a query point to the three-dimensional face in space.
It can be understood that when performing face prior modeling, the embodiment of the application acquires three-dimensional face scan data of the user's face and, through training of a multilayer perceptron, encodes the three-dimensional information of the scan data into the preset extended implicit function A. As a result, given a latent vector and the coordinates of a query point, the function outputs the signed distance from the query point to a particular three-dimensional face in space, making the reconstruction of the three-dimensional face more accurate.
In step S103, each face frame in the face video is aligned and deformed based on the latent vector of the preset extended neural implicit function, and the preset implicit function is used to deform the geometric representation of the face, generating a geometric face reconstruction result for the user's face.
As a possible implementation, in the embodiment of the present application, when performing three-dimensional reconstruction from monocular camera video, the user is asked to keep the face still while a monocular camera (for example, a phone's front camera) captures a video of the face observed from different angles; this video serves as the input for the reconstruction. The latent vector input of the preset extended implicit function A, obtained by differentiable rendering and fitting during face prior modeling, roughly aligns and deforms each face frame of the video; another network, the preset extended implicit function B, is then trained to further deform the face geometry represented by A, generating a geometric face reconstruction result for the user's face and achieving a more accurate reconstruction.
Further, in an embodiment of the present application, the method further includes: converting the geometric face reconstruction result into a three-dimensional face mesh.
It can be understood that, in order to import the reconstructed face into animation software or a game engine, the geometric face reconstruction result is converted into a three-dimensional face mesh as follows.
In an embodiment of the present application, converting the geometric face reconstruction result into a three-dimensional face mesh includes: sampling a plurality of query points in three-dimensional space; obtaining a deformation value for each query point from the preset implicit function and deforming each query point accordingly, obtaining a plurality of deformed query points; and obtaining the signed distance of each deformed query point from the preset extended neural implicit function, and deriving the three-dimensional face mesh from these signed distances.
In actual execution, query points conforming to the Marching Cubes layout are sampled in three-dimensional space. Each query point is passed through the preset extended implicit function B to obtain a deformation value and is deformed accordingly; the deformed query points are then passed through the preset extended implicit function A to obtain their signed distances, which are fed into the Marching Cubes algorithm to produce the converted three-dimensional face mesh, effectively improving the practicality of geometric face reconstruction.
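The grid-evaluation part of this conversion can be sketched as follows, with hypothetical stand-ins for both networks (a unit-sphere SDF for A, a constant shift for B):

```python
import numpy as np

def implicit_A(points):
    # Hypothetical stand-in for the trained prior network A: a unit sphere's SDF.
    return np.linalg.norm(points, axis=-1) - 1.0

def deform_B(points):
    # Hypothetical stand-in for deformation network B: a small constant offset.
    return points + np.array([0.0, 0.0, 0.05])

# Sample a regular grid of query points in the Marching Cubes layout.
n = 16
axis = np.linspace(-1.5, 1.5, n)
grid = np.stack(np.meshgrid(axis, axis, axis, indexing="ij"), axis=-1)
points = grid.reshape(-1, 3)

# Deform each query point with B, then query its signed distance from A.
sdf_grid = implicit_A(deform_B(points)).reshape(n, n, n)

# The face surface is the zero level set, where the grid changes sign; a
# Marching Cubes routine extracts the triangle mesh from this grid.
has_surface = bool(sdf_grid.min() < 0.0 < sdf_grid.max())
```

With scikit-image installed, `skimage.measure.marching_cubes(sdf_grid, level=0.0)` would return the vertices and faces of the converted mesh.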
The working principle of the embodiment of the present application is described in detail below with a specific embodiment.
Step S1: face prior modeling. A face prior is learned from a three-dimensional face scan data set and encoded in the preset extended implicit function A. To constrain the reconstruction result to the geometric space of biologically plausible faces, the face prior is learned from the three-dimensional face scan data set in a data-driven manner.
Step S1.1: pre-acquired or publicly available three-dimensional face scan mesh data are used and roughly rigidly aligned by labeling feature points at the eyes, nose, and mouth corners.
Step S1.2: for each three-dimensional scan mesh, some query points are randomly sampled in three-dimensional space and their signed distances to the mesh are recorded.
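A minimal sketch of this sampling step, assuming the scan is given as surface points with outward normals (a toy unit sphere here) so the sign can be taken from the nearest point's normal; real pipelines compute exact point-to-mesh signed distances:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy "scan": surface points and outward normals sampled on a unit sphere.
verts = rng.standard_normal((500, 3))
verts /= np.linalg.norm(verts, axis=1, keepdims=True)
normals = verts.copy()        # for a unit sphere the normal equals the position

def signed_distance(query):
    """Distance to the nearest scan point, signed by that point's outward normal."""
    diffs = query - verts
    i = int(np.argmin(np.linalg.norm(diffs, axis=1)))
    dist = float(np.linalg.norm(diffs[i]))
    sign = 1.0 if float(np.dot(diffs[i], normals[i])) >= 0.0 else -1.0
    return sign * dist

# Step S1.2: randomly sample query points and record their signed distances.
queries = rng.uniform(-1.5, 1.5, (64, 3))
samples = [(q, signed_distance(q)) for q in queries]

outside_positive = signed_distance(np.array([1.4, 0.0, 0.0])) > 0.0
inside_negative = signed_distance(np.array([0.1, 0.0, 0.0])) < 0.0
```

The recorded (point, distance) pairs become the supervision targets for training the extended implicit function in step S1.4.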
Step S1.3: a randomly initialized latent vector is assigned to each three-dimensional scan.
Step S1.4: each query point, together with the latent vector of the corresponding scan, is input to the multilayer perceptron of the preset extended implicit function A. The output distance value is compared with the true signed distance to build a loss function and compute gradients; the gradients are propagated back to both the perceptron parameters and the input latent vector, which are then iteratively updated by a first-order optimization method to reduce the loss.
Step S1.5: the above steps are repeated until the loss function converges.
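Steps S1.3 to S1.5 amount to auto-decoder training: the per-scan latent vectors and the network weights are optimized jointly against the recorded signed distances. In the sketch below, the multilayer perceptron is reduced to a single linear layer and the "scans" are synthetic signed distances generated by a teacher of the same form; both are assumptions for demonstration only.

```python
import numpy as np

rng = np.random.default_rng(0)
LATENT_DIM, N_SCANS, N_POINTS = 8, 4, 64

def features(pts, z):
    # Concatenate each query point with the scan's latent vector.
    return np.concatenate([pts, np.broadcast_to(z, (len(pts), LATENT_DIM))], axis=1)

# Synthetic "scans": signed distances produced by a teacher of the same form.
teacher_W = rng.standard_normal((3 + LATENT_DIM, 1))
teacher_codes = rng.standard_normal((N_SCANS, LATENT_DIM))
points = rng.uniform(-1.5, 1.5, (N_SCANS, N_POINTS, 3))
gt_sdf = np.stack([features(points[i], teacher_codes[i]) @ teacher_W
                   for i in range(N_SCANS)])[:, :, 0]

# Step S1.3: one randomly initialised latent vector per scan, plus the
# weights of the (here single-layer) perceptron of extended implicit function A.
latents = rng.standard_normal((N_SCANS, LATENT_DIM))
W = 0.01 * rng.standard_normal((3 + LATENT_DIM, 1))

lr, losses = 1e-2, []
for epoch in range(300):
    total = 0.0
    for i in range(N_SCANS):
        feats = features(points[i], latents[i])
        err = (feats @ W)[:, 0] - gt_sdf[i]                     # S1.4: compare with truth
        total += float(np.mean(err ** 2))
        grad_W = 2.0 * feats.T @ err[:, None] / N_POINTS        # gradient to the weights...
        grad_z = 2.0 * (err[:, None] * W[3:, 0]).mean(axis=0)   # ...and to the latent
        W -= lr * grad_W
        latents[i] -= lr * grad_z
    losses.append(total / N_SCANS)                              # S1.5: repeat to convergence

improved = losses[-1] < 0.5 * losses[0]
```

The key property mirrored here is that the gradient of the distance loss flows to both the shared weights and each scan's own latent vector.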
Step S1.6: once face prior modeling is complete, given a latent vector and a query point, the extended implicit function A correctly outputs the signed distance from the query point to a valid face, realizing a geometric representation of a whole family of valid faces.
Step S2: monocular camera video reconstruction. The input latent vector of the preset extended implicit function A and the preset extended implicit function B are optimized so that the rendering of the reconstructed face fits the input monocular video; that is, geometric reconstruction is performed on a face video shot by the user with a monocular camera.
Step S2.1: the user is asked to shoot a video of his or her face from different angles with a monocular camera (such as a phone's front camera), keeping the face as still and expressionless as possible.
Step S2.2: every sharp frame without motion blur is extracted from the video and blurred frames are discarded; the camera poses are then obtained with open-source SfM (structure from motion) software, and a two-step optimization algorithm is used for face reconstruction.
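The patent does not say how blurred frames are detected; a common heuristic, shown purely as an assumption, is to score each frame by the variance of its Laplacian response, which drops sharply for motion-blurred frames:

```python
import numpy as np

def laplacian_variance(gray):
    """Variance of a 5-point Laplacian response; low values suggest motion blur."""
    lap = (-4.0 * gray[1:-1, 1:-1]
           + gray[:-2, 1:-1] + gray[2:, 1:-1]
           + gray[1:-1, :-2] + gray[1:-1, 2:])
    return float(lap.var())

rng = np.random.default_rng(0)
sharp = rng.uniform(0.0, 255.0, (64, 64))      # toy frame full of fine detail

# Simulate horizontal motion blur with a 9-tap box filter applied to each row.
kernel = np.ones(9) / 9.0
blurred = np.apply_along_axis(lambda row: np.convolve(row, kernel, "same"), 1, sharp)

# Keep only frames whose sharpness score clears the blurred frame's score.
keep_sharp = laplacian_variance(sharp) > laplacian_variance(blurred)
```

In practice a fixed threshold (or the per-video median score) would decide which frames are kept before running the SfM software.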
Step S2.3: the first step optimizes the input latent vector of the preset extended implicit function A.
Step S2.3.1: according to the obtained camera poses, some rays are randomly cast from each frame's camera; the query points along each ray, together with the input latent vector, are fed into the preset extended implicit function A for differentiable volume rendering, yielding the rendered color of the current reconstruction result under that ray.
Step S2.3.2: to set the optimization target, the color of the image pixel corresponding to each ray is taken, the absolute error between the rendered color and the image color is penalized, and the input latent vector is optimized with a first-order optimization algorithm until the color error converges.
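A minimal sketch of this latent-vector fit: differentiable volume rendering is replaced by a hypothetical fixed linear renderer, so the photometric objective reduces to least squares solved by first-order updates, mirroring how color residuals drive the latent vector in the real pipeline.

```python
import numpy as np

rng = np.random.default_rng(0)
LATENT_DIM, N_RAYS = 8, 32

# Hypothetical stand-in for differentiable volume rendering: a fixed linear
# map from the latent vector to per-ray RGB values. The real renderer
# integrates colours along each camera ray through implicit function A.
render_matrix = 0.1 * rng.standard_normal((N_RAYS * 3, LATENT_DIM))

def render(latent):
    return render_matrix @ latent

target_latent = rng.standard_normal(LATENT_DIM)   # the "true" face
observed = render(target_latent)                  # pixel colours from a video frame

# Step S2.3.2: first-order optimisation of the input latent vector so the
# rendered colours match the observed image colours.
latent = np.zeros(LATENT_DIM)
lr = 20.0
for _ in range(300):
    residual = render(latent) - observed          # rendered minus image colour
    latent -= lr * render_matrix.T @ residual / len(residual)

color_error = float(np.abs(render(latent) - observed).mean())
```

With the real non-linear renderer the same loop shape applies, except the gradient comes from backpropagation through volume rendering and network A.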
Step S2.3.3: within the expression space of the preset extended implicit function A, this minimizes the difference between the reconstruction result and the real three-dimensional face. Because the deformation capacity of A is limited, however, the embodiment of the application continues with the optimization of step S2.4 and performs free-form deformation on top of this result.
Step S2.4: in the second step, the input latent vector of the preset extended implicit function A is fixed, and the weight parameters of the multilayer perceptron of the preset extended implicit function B are optimized.
Step S2.4.1: rays are cast for differentiable volume rendering in the same manner as step S2.3.1, but each query point is first fed into the preset extended implicit function B; the output deformation value is applied to the query point to obtain a new query position, which is then used as the input of the preset extended implicit function A.
Step S2.4.2: color values are compared as the optimization target in the same manner as step S2.3.2, the multilayer perceptron weight parameters of the preset extended implicit function B are updated with a first-order optimization algorithm, and the optimization is repeated until the rendering color error converges.
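The second-stage structure, with A's latent frozen and only B's parameters updated, can be sketched as below. As assumptions for brevity, B is reduced to a learnable constant offset, A to a sphere SDF, and the photometric loss is replaced by a direct signed-distance loss; the real method optimizes an MLP's weights against rendered colors.

```python
import numpy as np

rng = np.random.default_rng(0)

def implicit_A(points, latent):
    # Hypothetical stand-in for prior network A: a sphere whose radius
    # is controlled by the (now fixed) latent vector.
    return np.linalg.norm(points, axis=-1) - (1.0 + latent[0])

latent = np.array([0.1])           # fixed after the first optimisation step
offset = np.zeros(3)               # B's learnable parameters: a constant shift

# Toy target: the real surface is the prior shape translated by +0.2 in x,
# a residual that A alone cannot express and B must learn.
true_shift = np.array([0.2, 0.0, 0.0])
pts = rng.uniform(-1.5, 1.5, (256, 3))
target_sdf = implicit_A(pts + true_shift, latent)

lr = 0.5
for _ in range(300):
    warped = pts + offset          # step S2.4.1: deform first, then query A
    err = implicit_A(warped, latent) - target_sdf
    # Analytic gradient of the mean squared error w.r.t. B's offset.
    grad = 2.0 * np.mean(
        err[:, None] * warped / np.linalg.norm(warped, axis=1, keepdims=True),
        axis=0)
    offset -= lr * grad            # step S2.4.2: first-order update
```

The design point illustrated is the composition order: queries pass through B before A, so B captures residual geometry that lies outside A's prior space.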
Step S2.5: query points conforming to the Marching Cubes layout are sampled in three-dimensional space; each query point obtains a deformation value from the preset extended implicit function B and is deformed accordingly; the deformed query points obtain their signed distances from the preset extended implicit function A, and these signed distances are input to the Marching Cubes algorithm to obtain the converted three-dimensional face mesh.
According to the monocular camera face reconstruction method provided by the embodiment of the application, each face frame in a face video can be aligned and deformed based on the latent vector of the preset extended neural implicit function, the neural implicit function serves as the three-dimensional representation of the face, and the preset implicit function deforms the geometric representation to generate a geometric face reconstruction result for the user's face. The RGB (red, green, blue) values of the monocular camera provide the constraint for geometric optimization, and by enhancing the expressive power of the geometric representation, higher-precision face reconstruction can be performed with low-cost monocular camera capture equipment.
Next, a monocular camera face reconstruction device according to an embodiment of the present application will be described with reference to the drawings.
Fig. 2 is a block diagram of a monocular camera face reconstruction device according to an embodiment of the present application.
As shown in fig. 2, the monocular camera face reconstruction device 10 includes: a first acquisition module 100, an input module 200, and a reconstruction module 300.
Specifically, the first obtaining module 100 is configured to acquire, with a monocular camera, face videos of the user's face at different angles.
The input module 200 is configured to input the face video into the preset extended neural implicit function to obtain a geometric representation of the user's face.
The reconstruction module 300 is configured to align and deform each face frame in the face video based on the latent vector of the preset extended neural implicit function, and to deform the geometric representation with the preset implicit function to generate a geometric face reconstruction result for the user's face.
Optionally, in an embodiment of the present application, the apparatus 10 of the embodiment of the present application further includes: a second obtaining module and an encoding module.
The second obtaining module is configured to acquire three-dimensional face scan data of the user's face.
The encoding module is configured to encode the three-dimensional information of the scan data into the preset extended neural implicit function, so that when a latent vector is input, the function outputs the signed distance from a query point to the three-dimensional face in space.
Optionally, in an embodiment of the present application, the apparatus 10 of the embodiment of the present application further includes a conversion module.
The conversion module is configured to convert the geometric face reconstruction result into a three-dimensional face mesh.
Optionally, in an embodiment of the present application, the conversion module includes a sampling unit, a processing unit, and a generating unit.
The sampling unit is configured to sample a plurality of query points in three-dimensional space.
The processing unit is configured to obtain a deformation value for each query point from the preset implicit function and deform each query point accordingly, obtaining a plurality of deformed query points.
The generating unit is configured to obtain the signed distance of each deformed query point from the preset extended neural implicit function and derive the three-dimensional face mesh from these signed distances.
It should be noted that the foregoing explanation on the embodiment of the monocular camera face reconstruction method is also applicable to the monocular camera face reconstruction device of this embodiment, and is not repeated herein.
According to the monocular camera face reconstruction device provided by the embodiment of the application, each face frame in a face video can be aligned and deformed based on the latent vector of the preset extended neural implicit function, the neural implicit function serves as the three-dimensional representation of the face, and the preset implicit function deforms the geometric representation to generate a geometric face reconstruction result for the user's face. The RGB (red, green, blue) values of the monocular camera provide the constraint for geometric optimization, and by enhancing the expressive power of the geometric representation, higher-precision face reconstruction can be performed with low-cost monocular camera capture equipment.
Fig. 3 is a schematic structural diagram of an electronic device according to an embodiment of the present application. The electronic device may include:
a memory 301, a processor 302, and a computer program stored on the memory 301 and executable on the processor 302.
The processor 302, when executing the program, implements the monocular camera face reconstruction method provided in the above-described embodiments.
Further, the electronic device further includes:
a communication interface 303 for communication between the memory 301 and the processor 302.
The memory 301 is used for storing a computer program executable on the processor 302.
The memory 301 may comprise high-speed RAM and may also include non-volatile memory, such as at least one magnetic disk storage device.
If the memory 301, the processor 302 and the communication interface 303 are implemented independently, the communication interface 303, the memory 301 and the processor 302 may be connected to each other through a bus and perform communication with each other. The bus may be an Industry Standard Architecture (ISA) bus, a Peripheral Component Interconnect (PCI) bus, an Extended ISA (EISA) bus, or the like. The bus may be divided into an address bus, a data bus, a control bus, etc. For ease of illustration, only one thick line is shown in FIG. 3, but this does not mean only one bus or one type of bus.
Alternatively, in practical implementation, if the memory 301, the processor 302 and the communication interface 303 are integrated on one chip, the memory 301, the processor 302 and the communication interface 303 may complete communication with each other through an internal interface.
The processor 302 may be a central processing unit (CPU), an application-specific integrated circuit (ASIC), or one or more integrated circuits configured to implement the embodiments of the present application.
The present embodiment also provides a computer-readable storage medium, on which a computer program is stored, which when executed by a processor, implements the monocular camera face reconstruction method as described above.
In the description herein, reference to the terms "one embodiment," "some embodiments," "an example," "a specific example," or "some examples," etc., means that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the application. In this specification, schematic representations of these terms do not necessarily refer to the same embodiment or example. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or N embodiments or examples. Moreover, those skilled in the art may combine the various embodiments or examples, and features thereof, described in this specification, provided they are not mutually inconsistent.
Furthermore, the terms "first" and "second" are used for descriptive purposes only and are not to be construed as indicating or implying relative importance or implicitly indicating the number of technical features indicated. Thus, a feature defined as "first" or "second" may explicitly or implicitly include at least one such feature. In the description of the present application, "N" means at least two, e.g., two, three, etc., unless specifically limited otherwise.
Any process or method description in the flowcharts, or otherwise described herein, may be understood as representing a module, segment, or portion of code that includes one or more N executable instructions for implementing steps of a custom logic function or process. The scope of the preferred embodiments of the present application includes alternative implementations in which functions may be executed out of the order shown or discussed, including substantially concurrently or in reverse order depending on the functionality involved, as would be understood by those skilled in the art of the embodiments of the present application.
The logic and/or steps represented in the flowcharts or otherwise described herein, e.g., an ordered listing of executable instructions that can be considered to implement logical functions, can be embodied in any computer-readable medium for use by or in connection with an instruction execution system, apparatus, or device, such as a computer-based system, processor-containing system, or other system that can fetch the instructions from the instruction execution system, apparatus, or device and execute the instructions. For the purposes of this description, a "computer-readable medium" can be any means that can contain, store, communicate, propagate, or transport the program for use by or in connection with the instruction execution system, apparatus, or device. More specific examples (a non-exhaustive list) of the computer-readable medium would include the following: an electrical connection (electronic device) having one or N wires, a portable computer diskette (magnetic device), a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber device, and a portable compact disc read-only memory (CDROM). Additionally, the computer-readable medium could even be paper or another suitable medium upon which the program is printed, as the program can be electronically captured, via for instance optical scanning of the paper or other medium, then compiled, interpreted or otherwise processed in a suitable manner if necessary, and then stored in a computer memory.
It should be understood that portions of the present application may be implemented in hardware, software, firmware, or a combination thereof. In the above embodiments, the N steps or methods may be implemented in software or firmware stored in a memory and executed by a suitable instruction execution system. If implemented in hardware, as in another embodiment, any one or combination of the following techniques, which are well known in the art, may be used: a discrete logic circuit having a logic gate circuit for implementing a logic function on a data signal, an application specific integrated circuit having an appropriate combinational logic gate circuit, a Programmable Gate Array (PGA), a Field Programmable Gate Array (FPGA), or the like.
It will be understood by those skilled in the art that all or part of the steps carried by the method for implementing the above embodiments may be implemented by hardware related to instructions of a program, which may be stored in a computer readable storage medium, and when the program is executed, the program includes one or a combination of the steps of the method embodiments.
In addition, functional units in the embodiments of the present application may be integrated into one processing module, or each unit may exist alone physically, or two or more units are integrated into one module. The integrated module can be realized in a hardware mode, and can also be realized in a software functional module mode. The integrated module, if implemented in the form of a software functional module and sold or used as a stand-alone product, may also be stored in a computer readable storage medium.
The storage medium mentioned above may be a read-only memory, a magnetic or optical disk, etc. While embodiments of the present application have been shown and described above, it will be understood that the above embodiments are exemplary and should not be construed as limiting the present application and that changes, modifications, substitutions and alterations in the above embodiments may be made by those of ordinary skill in the art within the scope of the present application.
Claims (10)
1. A monocular camera face reconstruction method is characterized by comprising the following steps:
acquiring a face video of a user's face at different angles by using a monocular camera;
inputting the face video into a preset extended neural implicit function to obtain a face geometric representation of the user's face; and
aligning and deforming each face frame in the face video based on the latent vector of the preset extended neural implicit function, and deforming the face geometric representation by using the preset implicit function to generate a face geometry reconstruction result of the user's face.
2. The method of claim 1, further comprising, before inputting the face video into the preset extended neural implicit function:
acquiring three-dimensional face scan data of the user's face;
and encoding the three-dimensional information in the three-dimensional face scan data into the preset extended neural implicit function, so that when a latent vector is input, the preset extended neural implicit function outputs the signed distance of a query point in the space where the three-dimensional face is located.
3. The method of claim 2, further comprising:
and converting the face geometric reconstruction result into a three-dimensional face mesh.
4. The method of claim 3, wherein the converting the face geometric reconstruction result into a three-dimensional face mesh comprises:
sampling a plurality of query points in three-dimensional space;
acquiring a deformation value of each query point based on the preset implicit function, and deforming each query point according to its deformation value to obtain a plurality of deformed query points;
and obtaining the signed distance of each query point based on the preset extended neural implicit function, and obtaining the three-dimensional face mesh according to the signed distance of each query point.
5. A monocular camera face reconstruction device, comprising:
the first acquisition module is used for acquiring a face video of a user's face at different angles by using a monocular camera;
the input module is used for inputting the face video into a preset extended neural implicit function to obtain a face geometric representation of the user's face; and
the reconstruction module is used for aligning and deforming each face frame in the face video based on the latent vector of the preset extended neural implicit function, and deforming the face geometric representation by using the preset implicit function to generate a face geometry reconstruction result of the user's face.
6. The apparatus of claim 5, further comprising:
the second acquisition module is used for acquiring three-dimensional face scan data of the user's face;
and the encoding module is used for encoding the three-dimensional information in the three-dimensional face scan data into the preset extended neural implicit function, so that when a latent vector is input, the preset extended neural implicit function outputs the signed distance of a query point in the space where the three-dimensional face is located.
7. The apparatus of claim 6, further comprising:
and the conversion module is used for converting the face geometric reconstruction result into a three-dimensional face mesh.
8. The apparatus of claim 7, wherein the conversion module comprises:
the sampling unit is used for sampling a plurality of query points in a three-dimensional space;
the processing unit is used for acquiring a deformation value of each query point based on the preset implicit function, and deforming each query point according to its deformation value to obtain a plurality of deformed query points;
and the generating unit is used for obtaining the signed distance of each query point based on the preset extended neural implicit function, and obtaining the three-dimensional face mesh according to the signed distance of each query point.
9. An electronic device, comprising: a memory, a processor and a computer program stored on the memory and executable on the processor, the processor executing the program to implement the monocular camera face reconstruction method of any one of claims 1-4.
10. A computer-readable storage medium, on which a computer program is stored, wherein the program, when executed by a processor, implements the monocular camera face reconstruction method according to any one of claims 1-4.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210509305.XA CN115205487A (en) | 2022-05-10 | 2022-05-10 | Monocular camera face reconstruction method and device |
Publications (1)
Publication Number | Publication Date |
---|---|
CN115205487A true CN115205487A (en) | 2022-10-18 |
Family
ID=83575118
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202210509305.XA Pending CN115205487A (en) | 2022-05-10 | 2022-05-10 | Monocular camera face reconstruction method and device |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN115205487A (en) |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN116704084A (en) * | 2023-08-01 | 2023-09-05 | 苏州浪潮智能科技有限公司 | Training method of facial animation generation network, facial animation generation method and device |
CN116704084B (en) * | 2023-08-01 | 2023-11-03 | 苏州浪潮智能科技有限公司 | Training method of facial animation generation network, facial animation generation method and device |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US11748934B2 (en) | Three-dimensional expression base generation method and apparatus, speech interaction method and apparatus, and medium | |
CN108875935B (en) | Natural image target material visual characteristic mapping method based on generation countermeasure network | |
CN113628348B (en) | Method and equipment for determining viewpoint path in three-dimensional scene | |
CN112561978B (en) | Training method of depth estimation network, depth estimation method of image and equipment | |
CN110838122B (en) | Point cloud segmentation method and device and computer storage medium | |
CN113538682B (en) | Model training method, head reconstruction method, electronic device, and storage medium | |
CN116977522A (en) | Rendering method and device of three-dimensional model, computer equipment and storage medium | |
CN111489394A (en) | Object posture estimation model training method, system, device and medium | |
CN112149634A (en) | Training method, device and equipment of image generator and storage medium | |
CN116569218B (en) | Image processing method and image processing apparatus | |
CN112488067B (en) | Face pose estimation method and device, electronic equipment and storage medium | |
CN116416376A (en) | Three-dimensional hair reconstruction method, system, electronic equipment and storage medium | |
CN116958420A (en) | High-precision modeling method for three-dimensional face of digital human teacher | |
CN115761905A (en) | Diver action identification method based on skeleton joint points | |
CN117218246A (en) | Training method and device for image generation model, electronic equipment and storage medium | |
CN113822174B (en) | Sight line estimation method, electronic device and storage medium | |
CN115205487A (en) | Monocular camera face reconstruction method and device | |
CN113079136B (en) | Motion capture method, motion capture device, electronic equipment and computer-readable storage medium | |
CN110827394B (en) | Facial expression construction method, device and non-transitory computer readable recording medium | |
AU2022241513B2 (en) | Transformer-based shape models | |
CN116863044A (en) | Face model generation method and device, electronic equipment and readable storage medium | |
US12086965B2 (en) | Image reprojection and multi-image inpainting based on geometric depth parameters | |
US11113606B2 (en) | Learning method, learning device, program, and recording medium | |
CN113034675B (en) | Scene model construction method, intelligent terminal and computer readable storage medium | |
CN113066165B (en) | Three-dimensional reconstruction method and device for multi-stage unsupervised learning and electronic equipment |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||