CN110807836B - Three-dimensional face model generation method, device, equipment and medium


Info

Publication number: CN110807836B
Application number: CN202010018284.2A
Authority: CN (China)
Other versions: CN110807836A (Chinese)
Inventor: 林祥凯 (Lin Xiangkai)
Assignee (current and original): Tencent Technology Shenzhen Co Ltd
Legal status: Active (granted)

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06T: IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 19/00: Manipulating 3D models or images for computer graphics
    • G06T 19/20: Editing of 3D images, e.g. changing shapes or colours, aligning objects or positioning parts

Abstract

The application discloses a method, apparatus, device, and medium for generating a three-dimensional face model, belonging to the field of artificial intelligence computer vision technology. The method comprises the following steps: acquiring an input three-dimensional face mesh of a target object and a standard face model corresponding to a standard object; dividing the three-dimensional face mesh and the standard face model into at least two face sub-regions according to a correspondence; fitting each face sub-region in the standard face model to the corresponding face sub-region in the three-dimensional face mesh; and after the at least two face sub-regions are fitted, fusing adjacent face sub-regions to obtain a three-dimensional face model corresponding to the target object. By fitting each divided face sub-region separately, the quality of the output three-dimensional face model is improved, distortion of the three-dimensional face model is avoided, and the three-dimensional expression bases derived from the three-dimensional face model better match the expressions produced by the target object.

Description

Three-dimensional face model generation method, device, equipment and medium
Technical Field
The present application relates to the field of computer vision technology of artificial intelligence, and in particular, to a method, an apparatus, a device, and a medium for generating a three-dimensional face model.
Background
Three-dimensional (3D) face reconstruction refers to reconstructing a 3D model of a face from one or more two-dimensional (2D) images.
In the related art, three-dimensional face reconstruction is performed by fitting a three-dimensional face model of a target object, obtained from images, with a standard face model from a 3D Morphable Model (3DMM) library. Through a Deformation Transfer (DT) technique, points on the low-poly point cloud (the standard face model) are pulled to the positions of the corresponding points on the high-poly point cloud (the three-dimensional face model of the target object) using a correspondence between points in the two models, and the remaining points are obtained by fitting a smoothing term.
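As a rough illustration of this related-art pipeline, the sketch below pulls the correspondence vertices of the low-poly template toward their high-poly targets and lets a Laplacian smoothness term place the remaining vertices. The data layout, weights, and solver choice are assumptions for demonstration only, not the reference implementation of any cited work:

```python
import numpy as np
import scipy.sparse as sp
import scipy.sparse.linalg as spla

def deformation_transfer_fit(verts, laplacian, corr_idx, corr_targets,
                             w_fit=1.0, w_smooth=0.1):
    """Pull correspondence vertices to their targets; smooth the rest.

    verts:        (V, 3) low-poly template vertices
    laplacian:    (V, V) sparse mesh Laplacian of the template
    corr_idx:     (C,) template vertex indices that have a correspondence
    corr_targets: (C, 3) matching positions on the high-poly scan
    """
    V = verts.shape[0]
    C = len(corr_idx)
    # Selector matrix picking out the correspondence vertices.
    sel = sp.csr_matrix((np.ones(C), (np.arange(C), corr_idx)), shape=(C, V))
    # Stack the fitting term and the smoothness term into one system.
    A = sp.vstack([w_fit * sel, w_smooth * laplacian]).tocsr()
    out = np.empty_like(verts, dtype=np.float64)
    for d in range(3):  # x, y, z solved independently
        b = np.concatenate([w_fit * corr_targets[:, d],
                            w_smooth * (laplacian @ verts[:, d])])
        out[:, d] = spla.lsqr(A, b)[0]
    return out
```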
In the above situation, the three-dimensional face model fitted on the basis of the 3DMM library cannot fully reproduce a head similar to that of the target object; that is, the fitted low-poly model does not correspond well to the high-poly model, so the derived expression bases do not match the expressions produced by the target object.
Disclosure of Invention
The embodiments of the application provide a method, a device, equipment and a medium for generating a three-dimensional face model, so that the three-dimensional face model of a target object fitted on the basis of a 3DMM (3D Morphable Model) library closely resembles the face of the target object. The technical scheme is as follows:
according to an aspect of the present application, there is provided a method for generating a three-dimensional face model, the method including:
acquiring an input three-dimensional face mesh of a target object and a standard face model corresponding to a standard object;
dividing the three-dimensional face mesh and the standard face model into at least two face sub-regions according to a correspondence;
acquiring a pose parameter of each face sub-region and a local three-dimensional face mesh corresponding to the face sub-region, wherein the local three-dimensional face mesh is a part of the three-dimensional face mesh;
calculating an error loss when each face sub-region is fitted to the face sub-region corresponding to the three-dimensional face mesh according to the pose parameter of each face sub-region and the local three-dimensional face mesh, wherein the error loss comprises a vertex loss;
fitting each face sub-region to the corresponding face sub-region in the three-dimensional face mesh when the vertex loss converges;
and after the at least two face sub-regions are fitted, carrying out fusion processing on the adjacent face sub-regions to obtain a three-dimensional face model corresponding to the target object.
According to another aspect of the present application, there is provided a method of generating a three-dimensional object model, the method comprising:
acquiring an input three-dimensional shape mesh of a target object and a standard shape model corresponding to a standard object;
dividing the three-dimensional shape mesh and the standard shape model into at least two shape sub-regions according to a correspondence;
acquiring a pose parameter of each shape sub-region and a local three-dimensional shape mesh corresponding to the shape sub-region, wherein the local three-dimensional shape mesh is a part of the three-dimensional shape mesh;
calculating an error loss when each shape sub-region is fitted to the shape sub-region corresponding to the three-dimensional shape mesh according to the pose parameter of each shape sub-region and the local three-dimensional shape mesh, wherein the error loss comprises a vertex loss;
fitting each shape sub-region to the corresponding shape sub-region in the three-dimensional shape mesh when the vertex loss converges;
and after the at least two shape sub-regions are fitted, carrying out fusion processing on the adjacent shape sub-regions to obtain a three-dimensional object model corresponding to the target object.
According to another aspect of the present application, there is provided an apparatus for generating a three-dimensional face model, the apparatus comprising:
the first acquisition module is used for acquiring an input three-dimensional face mesh of a target object and a standard face model corresponding to a standard object;
the first processing module is used for dividing the three-dimensional face mesh and the standard face model into at least two face sub-regions according to a correspondence;
the first obtaining module is configured to obtain a pose parameter of each face subregion and a local three-dimensional face mesh corresponding to the face subregion, where the local three-dimensional face mesh is a part of the three-dimensional face mesh;
a first calculation module, configured to calculate, according to the pose parameter of each face sub-region and the local three-dimensional face mesh, an error loss when each face sub-region is fitted to a face sub-region corresponding to the three-dimensional face mesh, where the error loss includes a vertex loss;
a first fitting module, configured to fit the face sub-region to a corresponding face sub-region in the three-dimensional face mesh when the vertex loss converges;
and the first fusion module is used for performing fusion processing on the adjacent face subregions after the at least two face subregions are fitted to obtain the three-dimensional face model corresponding to the target object.
According to another aspect of the present application, there is provided an apparatus for generating a three-dimensional object model, the apparatus comprising:
the second acquisition module is used for acquiring an input three-dimensional shape mesh of a target object and a standard shape model corresponding to a standard object;
the second processing module is used for dividing the three-dimensional shape mesh and the standard shape model into at least two shape sub-regions according to a correspondence;
the second obtaining module is configured to obtain a pose parameter of each shape sub-region and a local three-dimensional shape mesh corresponding to the shape sub-region, where the local three-dimensional shape mesh is a part of the three-dimensional shape mesh;
a second calculation module, configured to calculate, according to the pose parameter of each shape sub-region and the local three-dimensional shape mesh, an error loss when each shape sub-region is fitted to the shape sub-region corresponding to the three-dimensional shape mesh, where the error loss includes a vertex loss;
a second fitting module, configured to fit each shape sub-region to the corresponding shape sub-region in the three-dimensional shape mesh when the vertex loss converges;
and the second fusion module is used for performing fusion processing on the adjacent shape sub-regions after the at least two shape sub-regions are fitted to obtain the three-dimensional object model corresponding to the target object.
According to another aspect of the present application, there is provided a computer device comprising: a processor and a memory, the memory having stored therein at least one instruction, at least one program, a set of codes, or a set of instructions, which is loaded and executed by the processor to implement the method of generating a three-dimensional face model and the method of generating a three-dimensional object model as described above.
According to another aspect of the present application, there is provided a computer-readable storage medium having stored therein at least one instruction, at least one program, code set, or set of instructions that is loaded and executed by a processor to implement the method of generating a three-dimensional face model and the method of generating a three-dimensional object model as described above.
The beneficial effects brought by the technical scheme provided by the embodiment of the application at least comprise:
Compared with fitting the whole three-dimensional face mesh of the target object to the standard face model, the three-dimensional face mesh and the standard face model are divided into at least two face sub-regions according to the correspondence, each face sub-region in the standard face model is fitted to the three-dimensional face mesh separately, and the three-dimensional face model corresponding to the target object is obtained by fusing adjacent face sub-regions. By fitting each divided face sub-region, the quality of the output three-dimensional face model is improved, distortion of the three-dimensional face model is avoided, and the three-dimensional expression bases derived from the three-dimensional face model better match the expressions produced by the target object.
Drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present application, the drawings needed to be used in the description of the embodiments are briefly introduced below, and it is obvious that the drawings in the following description are only some embodiments of the present application, and it is obvious for those skilled in the art to obtain other drawings based on these drawings without creative efforts.
FIG. 1 is a schematic diagram of a bilinear 3DMM library provided by an exemplary embodiment of the present application;
FIG. 2 is a schematic diagram of a set of emoticons provided by an exemplary embodiment of the present application;
FIG. 3 is a flow diagram of a process according to the present invention;
FIG. 4 is a schematic diagram of generating a three-dimensional face model and generating a three-dimensional expression base according to an exemplary embodiment of the present application;
FIG. 5 is a flowchart of a method for generating a three-dimensional face model according to an exemplary embodiment of the present application;
FIG. 6 is a flowchart of a method for generating a three-dimensional face model according to another exemplary embodiment of the present application;
FIG. 7 is a schematic diagram of facial sub-region segmentation for a standard face model according to another exemplary embodiment of the present application;
FIG. 8 shows face images from multiple viewing angles provided by an exemplary embodiment of the present application;
FIG. 9 is a schematic diagram of a face sub-region and a transition region of a standard face model provided by an exemplary embodiment of the present application;
FIG. 10 is a graphical illustration comparing the effect of a split face sub-region fit to an entire face region fit provided by an exemplary embodiment of the present application;
FIG. 11 is a flowchart of a method for generating a three-dimensional face model according to an exemplary embodiment of the present application;
FIG. 12 is a schematic diagram of a standard face model with and without fusion processing according to an exemplary embodiment of the present application;
FIG. 13 is a flowchart of a method for generating a three-dimensional object model provided by an exemplary embodiment of the present application;
FIG. 14 is a flow chart of a method of voice interaction provided by an exemplary embodiment of the present application;
FIG. 15 is a block diagram of an apparatus for generating a three-dimensional face model according to an exemplary embodiment of the present application;
FIG. 16 is a block diagram of an apparatus for generating a three-dimensional object model provided in an exemplary embodiment of the present application;
FIG. 17 is a schematic structural diagram of a computer device according to an exemplary embodiment of the present application.
Detailed Description
To make the objects, technical solutions and advantages of the present application more clear, embodiments of the present application will be described in further detail below with reference to the accompanying drawings.
First, terms referred to in the embodiments of the present application are described:
AI (Artificial Intelligence) is a theory, method, technique, and application system that uses a digital computer, or a machine controlled by a digital computer, to simulate, extend, and expand human intelligence, perceive the environment, acquire knowledge, and use that knowledge to obtain the best results. In other words, artificial intelligence is a comprehensive technique of computer science that attempts to understand the essence of intelligence and produce a new kind of intelligent machine that can react in a manner similar to human intelligence. Artificial intelligence studies the design principles and implementation methods of various intelligent machines, so that the machines have the functions of perception, reasoning, and decision making.
The artificial intelligence technology is a comprehensive subject and relates to the field of extensive technology, namely the technology of a hardware level and the technology of a software level. The artificial intelligence infrastructure generally includes technologies such as sensors, dedicated artificial intelligence chips, cloud computing, distributed storage, big data processing technologies, operation/interaction systems, mechatronics, and the like. The artificial intelligence software technology mainly comprises a computer vision technology, a voice processing technology, a natural language processing technology, machine learning/deep learning and the like.
With the research and progress of artificial intelligence technology, the artificial intelligence technology is developed and applied in a plurality of fields, such as common smart homes, smart wearable devices, virtual assistants, smart speakers, smart marketing, unmanned driving, automatic driving, unmanned aerial vehicles, robots, smart medical care, smart customer service, face recognition, three-dimensional face model reconstruction, and the like.
The scheme provided by the embodiment of the application relates to the technical field of 3D face reconstruction, a standard face model corresponding to a standard object is fitted with a three-dimensional face grid to generate a three-dimensional face model of a target object, and a group of expression bases of the target object are generated based on the three-dimensional face model.
Computer Vision technology (CV): computer vision is a science that studies how to make machines "see"; it uses cameras and computers instead of human eyes to identify, track, and measure targets, and performs further image processing so that the results are better suited for human observation or for transmission to instruments for detection. As a scientific discipline, computer vision studies theories and techniques that attempt to build artificial intelligence systems capable of capturing information from images or multidimensional data. Computer vision technology generally includes image processing, image recognition, image semantic understanding, image retrieval, OCR (Optical Character Recognition), video processing, video semantic understanding, video content/behavior recognition, three-dimensional object reconstruction, 3D technology, virtual reality, augmented reality, simultaneous localization and mapping, and other technologies, as well as common biometric technologies such as face recognition and fingerprint recognition.
The 3D Morphable Model (3DMM) library comprises two parts, expression bases and shape bases, and may take a linear or bilinear form. FIG. 1 shows a schematic diagram of a typical bilinear 3DMM library 10. Each row is the same person: there are m rows for m persons in total, each with a different shape, and each column within a row corresponds to a different expression, with n columns for n expressions in total.
Given the 3DMM library shown in FIG. 1, a face of any shape with any expression can be parameterized by the 3DMM library, as in the following equation:
face = exp × Cr × id
cr is a 3DMM library, the dimension is n multiplied by k multiplied by m, k is the number of single face point clouds, n is the number of expressions, m is the number of shape bases (or called as 'face pinching bases'), exp is an expression base coefficient, the dimension is 1 multiplied by n, id is a shape base coefficient, the dimension is m multiplied by 1, and n, m and k are positive integers.
From the above formula, any face can be parameterized as id + exp. For a given person the id is fixed, so a set of expression bases (also called blendshapes) specific to that person can be derived, as shown in FIG. 2. Clearly, once such a set of expression bases 20 of the person exists, the person can be driven by expression base coefficients, and a three-dimensional face model of the person under any expression can be generated by varying the expression base coefficients.
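To make the parameterization concrete, the following minimal sketch evaluates a bilinear 3DMM under the dimensions defined above; the sizes and random data are illustrative assumptions only:

```python
import numpy as np

def evaluate_face(Cr, exp, id_):
    """Contract the (n, k, m) core with exp (1 x n) and id (m x 1)."""
    per_expression = np.tensordot(exp.ravel(), Cr, axes=([0], [0]))  # (k, m)
    return per_expression @ id_.ravel()                              # (k,)

rng = np.random.default_rng(0)
n, k, m = 25, 3000, 50                    # illustrative sizes only
Cr = rng.standard_normal((n, k, m))
face = evaluate_face(Cr, rng.random((1, n)), rng.random((m, 1)))
# Fixing id and sweeping exp over the n unit vectors yields the person's
# n blendshapes, i.e., the derived person-specific expression base.
```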
FIG. 3 shows a flow chart of the present invention. The method provided by the embodiments of the application is applied to the process of generating a three-dimensional face model of a target object and generating a group of expression bases of the target object. The process comprises: calculating the vertex loss and the image loss of each facial sub-region, jointly optimizing the vertex loss and the image loss to obtain the three-dimensional face model of the target object, post-processing the three-dimensional face model to obtain an optimized three-dimensional face model, and generating a set of three-dimensional expression bases of the target object based on the three-dimensional face model.
After a group of expression bases of the target object is generated, the group of expression bases can be driven to generate a three-dimensional face model of the target object under any expression, thereby realizing different product functions. For example, the technical solution provided by the application can be applied to scenes such as Augmented Reality (AR) expressions used in games and social applications. In one example, a user uses an application program supporting three-dimensional face reconstruction on a smartphone; the application calls the camera of the smartphone to scan the user's face, generates a three-dimensional face model of the user based on the scanned images, and generates a set of drivable three-dimensional expression bases of the user. By driving this set of expression bases, the application switches between three-dimensional face models under different expressions to realize the corresponding functions, as shown in FIG. 4(b), where the set of expression bases 41 includes a plurality of expressions of one three-dimensional face model.
The method fits a standard face model in the 3DMM library to the three-dimensional face mesh. In the related art, a group of standard expression bases is generated in the 3DMM library in advance, and an input face model of arbitrary topology (the three-dimensional face mesh, a high-poly model with many keypoints) is aligned to each standard expression base (the standard face model, a low-poly model with few keypoints) to generate a group of expression bases of the target object: the deformation transfer technique pulls the standard face model to the positions corresponding to the three-dimensional face mesh through the correspondence, and the remaining points on the three-dimensional face mesh are fitted to the standard face model through a smoothing operation. In actual operation, the three-dimensional face mesh is quite noisy; as shown in FIG. 4(a), the surface of the three-dimensional face mesh 40 is not smooth and has some flaws, so the generated three-dimensional face model of the target object also has defects. In particular, the complicated parts of the target object, such as the mouth and the nose, are prone to defects, and the generated three-dimensional face model ends up dissimilar to the face of the target object, or distorted. To improve the quality of the generated three-dimensional face model, a high-quality correspondence is used in this process, and such a correspondence needs to be calibrated manually.
The embodiment of the application provides a method, which is characterized in that a standard face model is subjected to regional fitting, so that the generated three-dimensional face model of a target object has higher quality, and the derived expression base is similar to the target object.
In the method flow provided by the embodiments of the application, the execution body of each step may be a terminal such as a mobile phone, a tablet computer, a desktop computer, a notebook computer, a multimedia playing device, or a wearable device, or may be a server. For convenience of description, in the following method embodiments, the execution body of each step is described as a computer device; the computer device may be any electronic device with computing and storage capabilities, such as a terminal or a server.
Fig. 5 shows a flowchart of a method for generating a three-dimensional face model according to an exemplary embodiment of the present application. The method can be applied to the computer equipment, and comprises the following steps:
step 501, acquiring a three-dimensional face mesh of an input target object and a standard face model corresponding to a standard object.
A three-dimensional face mesh (Mesh) refers to three-dimensional data describing the face of the target object using a set of polyhedron vertices and polygons; to simplify the rendering process, each polygon is at least one of a triangle, a quadrilateral, or another simple convex polygon. Optionally, the three-dimensional face mesh may also include three-dimensional data composed of general polygons with holes; the application does not limit this. The three-dimensional face mesh is suitable for transformations such as rotation, translation, scaling, and affine transformation. In the embodiments of the application, the three-dimensional face mesh corresponds to a model with a large number of points (keypoints), i.e., the high-poly model, and the standard face model corresponds to a model with a small number of points, i.e., the low-poly model.
Alternatively, continuous shooting may be performed around the face (or the entire head) of the target object by an image capture device, i.e., a device capable of shooting color images and depth images; continuous shooting means photo capture at a preset frequency or video capture. Illustratively, the acquired image information of the target object is input into a model to obtain the three-dimensional face mesh of the target object, the model being a machine learning model with the capability of producing the three-dimensional face mesh. Optionally, the acquired images include images of the target object from multiple viewing angles, such as a front face pose image, side face pose images, and looking-up and looking-down pose images.
Optionally, the standard face model corresponding to the standard object is a three-dimensional face model in a 3DMM library, and when reconstructing the three-dimensional face model, the standard face model needs to be fitted to a given three-dimensional face mesh, for example, after performing translation transformation, scaling transformation, and rotation transformation on the standard face model, the standard face model is aligned to the given three-dimensional face mesh. The standard face model corresponds to a transformable face template.
Step 502, dividing the three-dimensional face mesh and the standard face model into at least two face sub-regions according to the corresponding relationship.
A face sub-region is a partial region of the face region of the corresponding model. Optionally, the face sub-regions may be divided according to requirements, according to a preset dividing manner, or according to a correspondence, as long as both models are divided in the same manner. Illustratively, the three-dimensional face mesh and the standard face model are divided into four face sub-regions according to the correspondence of the five sense organs: the sub-regions corresponding to the nose, the mouth, the chin, and the other parts. Illustratively, the three-dimensional face mesh and the standard face model are divided into two face sub-regions according to a surface/interior correspondence: a surface region (the cheek region, the nose bridge region, and the like) and an interior region (the nostril region). Illustratively, the three-dimensional face mesh and the standard face model are divided into two face sub-regions according to fitting difficulty: a simple face sub-region, such as a large single-layer region like the cheek, and a complex face sub-region, such as a multi-layer region like the nose-and-mouth region.
Illustratively, the three-dimensional face mesh and the standard face model are divided in the same manner, so that the face subregions divided by the three-dimensional face mesh and the standard face model are ensured to be in one-to-one correspondence.
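A minimal sketch of how such a division might be represented (the region names and index ranges are assumptions for illustration, not data from the patent): each mesh keeps its own per-region vertex index sets, and the correspondence pairs the regions one-to-one by name:

```python
import numpy as np

# Illustrative per-mesh vertex index sets (assumed, not from the patent).
MODEL_REGIONS = {"nose": np.arange(0, 300), "mouth": np.arange(300, 650),
                 "chin": np.arange(650, 900), "other": np.arange(900, 2000)}
SCAN_REGIONS = {"nose": np.arange(0, 4000), "mouth": np.arange(4000, 9000),
                "chin": np.arange(9000, 12000), "other": np.arange(12000, 30000)}

def paired_subregions(model_verts, scan_verts):
    """Yield (name, model sub-region vertices, scan sub-region vertices)."""
    for name in MODEL_REGIONS:
        yield (name,
               model_verts[MODEL_REGIONS[name]],
               scan_verts[SCAN_REGIONS[name]])
```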
Step 503, fitting each face sub-region in the standard face model to a corresponding face sub-region in the three-dimensional face mesh.
Schematically, dividing the standard face model and the three-dimensional face grid into four face sub-regions according to a corresponding relation, wherein the four face sub-regions are respectively corresponding to a nose, a mouth, a chin and other parts, fitting the nose region of the standard face model and the nose region of the three-dimensional face grid, and the nose region is a region surrounded by corresponding edges at the junction of the nose and the cheek and comprises a region where nostrils are located; and fitting the mouth region of the standard face model with the mouth region of the three-dimensional face grid.
Optionally, each face sub-region in the standard face model and the corresponding face sub-region in the three-dimensional face mesh may be fitted simultaneously, or may be fitted in a certain order, for example, fitting the nose region from top to bottom, and then fitting the mouth region. This is not limited in this application.
By continuously optimizing the error loss between the standard face model and the three-dimensional face mesh until the error loss converges, the degree of fit between the standard face model and the three-dimensional face mesh is continuously improved.
And step 504, after the at least two face sub-regions are fitted, performing fusion processing on the adjacent face sub-regions to obtain a three-dimensional face model corresponding to the target object.
The fusion processing refers to smoothing the uneven area that exists between two adjacent fitted face sub-regions. In one example, the two fitted face sub-regions are a nose region and a mouth region, and an uneven region exists between them, so the three-dimensional face model of the target object carries certain noise. After the uneven region is fused, it becomes a smooth region; the noise of the three-dimensional face model is reduced, the similarity between the output three-dimensional face model and the target object is improved, and the three-dimensional face model is thereby optimized.
Alternatively, the fusion process may be performed after the fitting of two adjacent face subregions, or after the fitting of partial face subregions, or after the fitting of each face subregion separately. For example, after fitting the nose region and the mouth region on the standard face model with the nose region and the mouth region on the three-dimensional face mesh, respectively, the nose region and the mouth region are subjected to fusion processing, or after fitting each face sub-region on the three-dimensional face mesh and the standard face model, adjacent face sub-regions are subjected to fusion processing.
In summary, in the method provided by this embodiment, compared with fitting the whole three-dimensional face mesh of the target object to the standard face model, the three-dimensional face mesh and the standard face model are divided into at least two face sub-regions according to the correspondence, each face sub-region in the standard face model is fitted to the three-dimensional face mesh, and the adjacent face sub-regions are then fused to obtain the three-dimensional face model corresponding to the target object. By fitting each divided face sub-region, the quality of the output three-dimensional face model is improved, distortion of the three-dimensional face model is avoided, and the three-dimensional expression bases derived from the three-dimensional face model better match the expressions produced by the target object.
Fig. 6 shows a flowchart of a method for generating a three-dimensional face model according to an exemplary embodiment of the present application. The method can be applied to the computer equipment. The method comprises the following steps:
step 601, acquiring a three-dimensional face mesh of an input target object and a standard face model corresponding to a standard object.
Step 601 is identical to step 501 shown in fig. 5, and is not described herein again.
Step 602, dividing the three-dimensional face mesh and the standard face model into at least two face subregions according to the corresponding relationship.
As shown in fig. 7, the standard face model 70 is divided into a nose region 701, a mouth region 702, a chin region 703 and other part corresponding regions 704 (such as cheek parts and eye socket parts) according to the corresponding position relationship of the five sense organs. Illustratively, the three-dimensional face mesh is also divided into four face sub-regions corresponding to the standard face model.
Step 603, for each face sub-region in the standard face model, calculating an error loss when fitting the face sub-region to the corresponding face sub-region in the three-dimensional face mesh.
Optionally, the error loss comprises a vertex loss and an image loss.
The method comprises the following substeps:
and 6031a, acquiring the attitude parameters of each face subregion and a local three-dimensional face grid corresponding to the face subregion, wherein the local three-dimensional face grid is a part of the three-dimensional face grid.
Illustratively, the pose parameter of the nose region 701 is pose1, the pose parameter of the mouth region 702 is pose2, the pose parameter of the chin region 703 is pose3, and the pose parameter of the other-parts region 704 is pose4.
And 6032a, calculating the vertex loss when each face sub-region is fitted to the face sub-region corresponding to the three-dimensional face mesh, according to the pose parameter of each face sub-region and the local three-dimensional face mesh.
Optionally, the vertex loss comprises a first shape base loss (idloss) and a first expression base loss (exploss).
The formula for calculating the first shape base loss for each facial sub-region is as follows:
idloss = ‖ sRT(Cr × exp × id) − M_high ‖²
wherein the optimization variable is the shape base coefficient id; sRT is the pose parameter pose, where s is a scaling parameter, R is a rotation matrix, and T is a translation parameter; Cr is the three-dimensional deformation model (the 3DMM library); exp is the mean of the current expression base coefficients; and M_high is the three-dimensional face mesh (i.e., the high-poly model). The best id coefficient can be solved by Gauss-Newton iteration.
The formula for calculating the first expression base loss for each facial sub-region is as follows:
exploss = ‖ sRT(Cr × exp × id) − M_high ‖²
wherein the optimization variable is the expression base coefficient exp; sRT is the pose parameter pose, where s is a scaling parameter, R is a rotation matrix, and T is a translation parameter; Cr is the 3DMM library; id is held at its current value; and M_high is the three-dimensional face mesh (i.e., the high-poly model). The best exp coefficient can be solved by Gauss-Newton iteration, and the face sub-region optimized by this formula is the part covered by the correspondence.
Optimizing the first shape base coefficient and the first expression base coefficient includes the following substeps:
S1, calculating a first shape base loss and a first expression base loss for fitting each face sub-region to the local three-dimensional face mesh, according to the pose parameter of each face sub-region, the shape base coefficient and expression base coefficient corresponding to each face sub-region, and the local three-dimensional face mesh.
And S2, optimizing the shape base coefficient according to the first shape base loss, and calculating the optimized first shape base coefficient of each face sub-region, wherein the first shape base coefficient is used for controlling the facial shape of the target object.
And S3, optimizing the expression base coefficient according to the first expression base loss, and calculating the optimized first expression base coefficient of each face sub-region, wherein the first expression base coefficient is used for controlling the expression of the target object.
And S4, repeating the above three steps until the first shape base loss and the first expression base loss respectively converge, as the sketch below illustrates.
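A minimal sketch of substeps S1 to S4 for a single sub-region, using linear least-squares solves in place of the Gauss-Newton step named above; the data layout, and the assumption that the pose sRT has already been removed from the target vertices, are assumptions of this sketch:

```python
import numpy as np

def fit_subregion_vertices(Cr_r, M_local, id0, exp0, iters=20, tol=1e-6):
    """Alternately solve id and exp for one face sub-region.

    Cr_r:    (n, 3V_r, m) slice of the 3DMM core at this region's vertices
    M_local: (3V_r,) flattened target vertices of the local face mesh, with
             the region's pose sRT assumed already removed by the caller
    id0:     (m,) initial shape base coefficients
    exp0:    (n,) initial expression base coefficients
    """
    id_c, exp_c = id0.copy(), exp0.copy()
    prev = np.inf
    for _ in range(iters):
        # S2: fix exp, solve id in ||Cr x exp x id - M_local||^2
        B_id = np.tensordot(exp_c, Cr_r, axes=([0], [0]))    # (3V_r, m)
        id_c = np.linalg.lstsq(B_id, M_local, rcond=None)[0]
        # S3: fix id, solve exp in the symmetric problem
        B_exp = np.tensordot(Cr_r, id_c, axes=([2], [0])).T  # (3V_r, n)
        exp_c = np.linalg.lstsq(B_exp, M_local, rcond=None)[0]
        # S4: stop once the vertex loss has converged
        loss = float(np.sum((B_exp @ exp_c - M_local) ** 2))
        if abs(prev - loss) < tol:
            break
        prev = loss
    return id_c, exp_c
```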
And step 6031b, acquiring n face images of the target object, wherein the n face images are used for generating the three-dimensional face mesh, and n is a positive integer.
The face image is an image acquired of the face (or the entire head) of the target object. The face image can be acquired through image acquisition equipment, which includes a camera, a video camera, a scanner, and a terminal with a shooting function (such as a mobile phone, a tablet computer, or a notebook or desktop computer connected to a camera). Illustratively, a mobile terminal is used to shoot continuously around the face of the target object, where continuous shooting is video capture or photo capture at a preset frequency. In one example, the face of the target object is kept still and the mobile terminal is moved around the target object for shooting; in another example, the mobile terminal is kept still while the face of the target object moves up, down, left, and right. The face images are captured from various viewing angles: FIG. 8(a) shows a face image 81 captured from the front view of the target object, FIG. 8(b) shows a face image 82 captured from the left side-face view of the target object, and FIG. 8(c) shows a face image 83 captured from the right side-face view of the target object.
Each shooting instant will simultaneously take a color image and a depth image. The color image and the depth image shot at the same shooting moment form an image pair. In other words, each image pair includes a color image and a depth image taken at the same time.
Illustratively, the color image is a color image in Red-Green-Blue (RGB) format. Each pixel in the depth (D) image stores the distance (depth) value from the depth camera to the corresponding real point in the scene.
Illustratively, the color image and the depth image are stored as two associated images, such as with the time of capture. Alternatively, the color image and the depth image are stored as the same image, for example, the image contains R, G, B, D four channels simultaneously. The embodiment of the present application does not limit the specific storage manner of the color image and the depth image.
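As one illustrative storage choice for the single-image option above (a sketch under assumed dtypes, not a format mandated by the application), the color image and depth map can be stacked into a four-channel array:

```python
import numpy as np

def make_rgbd(color, depth):
    """Stack an (H, W, 3) color image and an (H, W) depth map into R, G, B, D.

    The float32 cast is an assumed convention so the two dtypes can coexist.
    """
    assert color.shape[:2] == depth.shape
    return np.concatenate([np.asarray(color, dtype=np.float32),
                           np.asarray(depth, dtype=np.float32)[..., None]],
                          axis=-1)        # (H, W, 4)
```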
And step 6032b, acquiring the keypoint information in the face image, the pose parameters of the face image, and the keypoint index information in the standard face model.
Optionally, the face image is input into a model to obtain the keypoint information and the pose parameters of the face image, or the keypoints and pose parameters are calibrated manually. Illustratively, the model is a machine learning model that supports detecting keypoints and pose parameters. Illustratively, manual calibration of the pose parameters measures parameters such as the face pose angle (e.g., Euler angles), the face rotation angle, and the scaling by establishing a coordinate system on the face image.
And step 6033b, calculating the image loss when each face sub-region is fitted to the corresponding face sub-region in the three-dimensional face mesh, according to the keypoint information, the pose parameters, and the keypoint index information.
Optionally, the image loss includes a second shape base loss (idloss) and a second expression base loss (exploss), which are calculated over the keypoints (landmarks) in the face image.
The formula for calculating the second shape-based loss of each facial sub-region on the three-dimensional face mesh is as follows:
idloss = ‖ K · RT(Cr × exp × id) − landmark ‖²
k is a camera projection parameter, RT is a posture parameter (position) of a target object relative to an image, R is a rotation matrix, T is a translation parameter, the rotation matrix is obtained through calculation of a 3D-2D Point pair matching method (PnP), Cr is a 3DMM library, and landmark is a key Point in a detected face image.
The formula for calculating the loss of the second expression base of each facial sub-area on the three-dimensional face grid is as follows:
exploss = ‖ K · RT(Cr × exp × id) − landmark ‖²
k is a camera projection parameter, RT is a pos of a target object relative to an image, R is a rotation matrix, T is a translation parameter and is obtained through PnP calculation, Cr is a 3DMM library, and landmark is a key point in a detected face image.
Optimizing the second shape base coefficient and the second expression base coefficient includes the substeps of:
and S11, calculating second shape base loss and second expression base loss when each facial sub-region is fitted to the corresponding facial sub-region in the three-dimensional face grid according to the key point information, the posture parameters, the key point index information, the shape base coefficient and the expression base coefficient corresponding to each facial sub-region.
And establishing a calculation formula of a second shape base coefficient and a second expression base coefficient corresponding to the key points in the face image according to the key point index information.
And S12, optimizing the shape base coefficient according to the second shape base loss, and calculating the optimized second shape base coefficient of each face sub-region, wherein the second shape base coefficient is used for controlling the facial shape of the target object.
And S13, optimizing the expression base coefficients according to the second expression base loss, and calculating to obtain a second expression base coefficient after each facial sub-region is optimized, wherein the second expression base coefficient is used for controlling the expression of the target object.
And S14, repeating the three steps until the second shape base loss and the second expression base loss respectively converge.
And step 604, fitting the face sub-regions to corresponding face sub-regions in the three-dimensional face grid when the error loss is converged.
For each face sub-region, the part corresponding to the effective points (inliers) in the correspondence is fitted; an effective point is a point that meets the requirements and carries no error. For example, points in uneven regions of the three-dimensional face model are not effective points. Combined with the corresponding keypoints (landmarks) on the three-dimensional face mesh, each face sub-region corresponds to an independent shape base coefficient (id) and expression base coefficient (exp), and also has an independent pose parameter (pose, e.g., at least one of a rotation parameter R, a translation parameter T, and a scaling parameter s).
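Because each face sub-region carries its own pose parameter, the per-region sRT can be estimated from the inlier point pairs of the correspondence. The sketch below uses an Umeyama-style closed-form similarity fit, a solver chosen here for illustration; the application does not name a specific one:

```python
import numpy as np

def estimate_srt(src, dst):
    """Least-squares similarity transform so that dst ≈ s * R @ src + T.

    src: (N, 3) inlier points of a sub-region on the standard face model
    dst: (N, 3) corresponding inlier points on the three-dimensional face mesh
    """
    mu_s, mu_d = src.mean(axis=0), dst.mean(axis=0)
    src_c, dst_c = src - mu_s, dst - mu_d
    U, S, Vt = np.linalg.svd(dst_c.T @ src_c)
    d = np.sign(np.linalg.det(U) * np.linalg.det(Vt))  # avoid reflections
    D = np.array([1.0, 1.0, d])
    R = (U * D) @ Vt                       # U @ diag(D) @ Vt
    s = float((S * D).sum() / (src_c ** 2).sum())
    T = mu_d - s * (R @ mu_s)
    return s, R, T
```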
Step 605, after at least two facial sub-regions are fitted, a transition region in the adjacent facial sub-regions is obtained, where the transition region is a region composed of common points between the adjacent facial sub-regions.
As shown in fig. 9 (a), the standard face model is divided into four face sub-regions, including a nose region 901, a mouth region 902, a chin region 903, and a region 904 corresponding to other portions. Illustratively, there is a transition region 911 between the nose region 901 and the mouth region 902, and a transition region 912 between the mouth region 902 and the chin region 903, as shown in fig. 9 (b). It will be appreciated that there is a transition region between any two adjacent facial sub-regions.
And 606, performing fusion processing on the transition region to obtain a three-dimensional face model corresponding to the target object.
As shown in FIG. 10, column (a) corresponds to the three-dimensional face mesh, column (b) to the standard face model fitted as a whole head, and column (c) to the standard face model fitted by the method provided by this embodiment; each model is shown in a front view 101 and a side view 102. Clearly, at the positions corresponding to the nose region, the mouth region, and the chin region, the method provided by this embodiment fits the standard face model with better results than whole-head fitting.
And step 607, deriving a set of three-dimensional expression bases corresponding to each facial sub-region according to the shape base coefficient and the expression base coefficient of each facial sub-region.
Step 608, a set of three-dimensional expression bases corresponding to the at least two facial sub-regions is subjected to fusion processing, so as to obtain a set of three-dimensional expression bases of the target object.
It should be noted that the calculation of the vertex loss of the standard face model (steps 6031a and 6032a) can be implemented independently, the calculation of the image loss of the standard face model (steps 6031b to 6033b) can be implemented independently, and the vertex loss and image loss calculations can also be implemented in combination.
In summary, in the method provided by this embodiment, the three-dimensional face mesh and the standard face model are divided into at least two face sub-regions according to the correspondence, the error loss of each face sub-region of the standard face model is calculated, and each face sub-region is fitted to the corresponding face sub-region in the three-dimensional face mesh once the error loss converges. After the at least two face sub-regions are fitted, the transition regions between adjacent face sub-regions are fused to obtain a smooth three-dimensional face model of the target object, and a set of three-dimensional expression bases can be generated based on the three-dimensional face model. By fitting the region-divided standard face model region by region, the generated three-dimensional face model is closer to the face of the target object, so the expression bases are closer to the expressions produced by the target object.
Fig. 11 is a flowchart illustrating a method for generating a three-dimensional face model according to an exemplary embodiment of the present application, where the method is applicable to the computer device, and the method includes the following steps:
step 1101, calculating the vertex loss of each face sub-region of the standard face model.
The vertex loss for each facial sub-region includes shape base loss (idloss) and expression base loss (exploss). The formula for calculating idloss is as follows:
idloss = ‖ sRT(Cr × exp × id) − M_high ‖²
wherein the optimization variable is the shape base coefficient id; sRT is the pose parameter pose, where s is a scaling parameter, R is a rotation matrix, and T is a translation parameter; Cr is the three-dimensional deformation model (the 3DMM library); exp is the mean of the current expression base coefficients; and M_high is the high-poly mesh. The best id coefficient can be solved by Gauss-Newton iteration.
The formula for calculating exploss is as follows:
exploss = ‖ sRT(Cr × exp × id) − M_high ‖²
wherein the optimization variable is the expression base coefficient exp; sRT is the pose parameter pose, where s is a scaling parameter, R is a rotation matrix, and T is a translation parameter; Cr is the 3DMM library; id is held at its current value; and M_high is the high-poly mesh. The best exp coefficient can be solved by Gauss-Newton iteration, and the face sub-region optimized by this formula is the part covered by the correspondence.
Step 1102, calculating the image loss of each face subregion of the standard face model according to the key point information in the face image.
Some information can be acquired from the captured face images of the target object to establish further constraints for optimizing the shape base coefficients and expression base coefficients. Illustratively, the face images are photos of the target object from various viewing angles, each consisting of a color image and a depth image.
Because the landmarks in the face image of each view correspond to the keypoint information (keypoints) in the standard face model, and the index of each keypoint is known, the error loss over the landmarks can be established directly through the perspective projection relationship. Likewise, the landmark error loss comprises an expression base loss and a shape base loss. The formula is as follows:
idloss = ‖ K · RT(Cr × exp × id) − landmark ‖²
k is a camera projection parameter, RT is a posture parameter (position) of a target object relative to an image, R is a rotation matrix, T is a translation parameter, the rotation matrix is obtained through calculation of a 3D-2D Point pair matching method (PnP), Cr is a 3DMM library, and landmark is a key Point in a detected face image.
The formula for calculating the expression base loss is as follows:
exploss = ‖ K · RT(Cr × exp × id) − landmark ‖²
k is a camera projection parameter, RT is a pos of a target object relative to an image, R is a rotation matrix, T is a translation parameter and is obtained through PnP calculation, Cr is a 3DMM library, and landmark is a key point in a detected face image.
Through the above formulas, the shape base coefficients and the expression base coefficients are optimized against the landmarks in the face image.
Step 1103, optimize vertex loss and image loss simultaneously.
For each face sub-region, the part corresponding to the effective points in the correspondence is fitted; an effective point is a point that meets the requirements and carries no error. For example, flawed points on the three-dimensional face model are not effective points. Combined with the corresponding landmarks, each facial sub-region corresponds to an independent shape base coefficient and expression base coefficient, as well as an independent pose parameter (e.g., at least one of a rotation parameter, a translation parameter, and a scaling parameter). The specific process of the joint optimization is as follows:
1. the initial id and exp are set to the mean of the 3DMM library.
Illustratively, the 3DMM library contains a plurality of different standard face models, each containing a shape base coefficient and an expression base coefficient; the shape base coefficient controls the facial shape of the standard face model, the expression base coefficient controls its expression, and different standard face models have different coefficients. Optionally, the initial shape base coefficient is obtained by averaging the shape base coefficients of all standard face models in the 3DMM library, and the initial expression base coefficient is obtained by averaging their expression base coefficients.
2. Each face sub-region is traversed.
In one example, the three-dimensional face mesh and the standard face model are divided into 4 face sub-regions according to the corresponding relationship, and the position of each face sub-region and the shape base coefficient and the expression base coefficient corresponding to each face sub-region need to be determined. It will be appreciated that the shape basis coefficients and expression basis coefficients of the three-dimensional face mesh are different from those of a standard face model.
3. Using the current shape base coefficients and expression base coefficients, compute the corresponding face sub-region on the standard face model; compute the face pose of the target object using the correspondence; and compute the pose of the corresponding face sub-region relative to the face image via PnP.
In one example, the calculated shape base coefficient corresponding to the nose region of the standard face model is id1 and the expression base coefficient is exp1; the mouth region corresponds to id2 and exp2; the chin region to id3 and exp3; and the other-parts region to id4 and exp4. Illustratively, the face pose of the target object is pose1, and the pose of the nose region with respect to the face image is pose2.
4. Fix the expression base coefficients, and optimize the shape base coefficients of the corresponding face sub-region using the vertex loss vertex_loss and the landmark loss landmark_loss.
5. Fix the shape base coefficients, and optimize the expression base coefficients of the corresponding face sub-region using the vertex loss vertex_loss and the landmark loss landmark_loss.
6. Iterate steps 3, 4, and 5 until convergence.
7. Steps 3, 4, 5 and 6 are repeated for the next facial sub-region.
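The following structural sketch mirrors steps 1 to 7; it reuses fit_subregion_vertices from the earlier sketch, assumes the per-region data layout, and omits the landmark terms, which would be stacked into the same least-squares systems:

```python
import numpy as np

def fit_all_subregions(regions, id_mean, exp_mean):
    """Structural sketch of steps 1-7; the data layout is assumed.

    regions: dict mapping a sub-region name to (Cr_r, M_local), where Cr_r
             is the region's (n, 3V_r, m) core slice and M_local its
             pose-corrected flattened target vertices.
    """
    results = {}
    for name, (Cr_r, M_local) in regions.items():       # step 2: traverse
        # Step 1: start both coefficient vectors at the 3DMM library means.
        # Step 3 (region pose via correspondence + PnP) is assumed to have
        # been applied already when building M_local.
        # Steps 4-6: alternate the fixed-exp id solve and the fixed-id exp
        # solve until convergence (fit_subregion_vertices, defined in the
        # earlier sketch, implements exactly this loop).
        results[name] = fit_subregion_vertices(Cr_r, M_local,
                                               id_mean.copy(), exp_mean.copy())
        # Step 7: continue with the next facial sub-region.
    return results
```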
Based on the above process, the error loss of each face sub-region can be calculated, the error loss of each face sub-region is optimized respectively, the shape base coefficient and the expression base coefficient of each face sub-region are obtained, and therefore the three-dimensional face mesh corresponding to each face sub-region can be obtained, and the three-dimensional face mesh corresponding to each face sub-region is fused to obtain the final three-dimensional face mesh.
And step 1104, performing post-processing on the generated three-dimensional face model of the target object.
As can be seen from FIG. 12(a), the generated three-dimensional face model has obvious wrinkles at the boundaries of adjacent face sub-regions, so the transition regions need to be smoothed. The circle of common points along which each face sub-region borders its neighbors is determined first; as shown in FIG. 9, FIG. 9(a) divides the generated three-dimensional face model into a nose region 901, a mouth region 902, a chin region 903, and a region 904 corresponding to the other parts. Illustratively, the nose region 901 and the mouth region 902 share a common-point junction 911, and the mouth region 902 and the chin region 903 share a common-point junction 912.
The fusion processing of the transition region comprises the following sub-steps:
step 111, a first distance and a second distance are obtained, wherein the first distance is a distance between the common point in the transition region and the first surface sub-region, and the second distance is a distance between the common point in the transition region and the second surface sub-region.
In one example, the common point a is a common point of a transition region between the nose region and the mouth region, the first face sub-region is the nose region, the second face sub-region is the mouth region, a distance between the common point a and the nose region is a first distance S1, and a distance between the common point B and the nose region is a second distance S2.
Step 112, a first weight of the common point relative to the first face sub-region is determined based on the first distance, and a second weight of the common point relative to the second face sub-region is determined based on the second distance.
Illustratively, the inverse squares of S1 and S2 are normalized to obtain the first weight of the common point A relative to the nose region and the second weight of the common point A relative to the mouth region. It will be appreciated that the closer a common point is to a face sub-region, the greater the weight of that common point with respect to that face sub-region; conversely, the farther away it is, the smaller the weight.
And step 113, determining a boundary point according to the shape base coefficient and the expression base coefficient corresponding to the first facial sub-region, the first weight, the shape base coefficient and the expression base coefficient corresponding to the second facial sub-region, and the second weight.
That is, the position of the common point is reconstructed with the shape base coefficient and the expression base coefficient of each face sub-region, and the resulting positions are weighted by the corresponding weights and summed to determine the final boundary point (a short sketch of this blending follows the figure description below).
And step 114, replacing the common points with boundary points, wherein the boundary points form the region subjected to the fusion processing.
And step 115, obtaining a three-dimensional face model corresponding to the target object according to the fused region.
Illustratively, the standard face model is fused according to the final boundary point, and the transition area becomes smooth. As shown in fig. 12, (a) of fig. 12 is an effect diagram obtained by directly stitching the face sub-regions of the standard face model, where there are wrinkles at positions corresponding to the nose region and the mouth region of the standard face model, and (b) of fig. 12 is an effect diagram obtained by performing fusion processing on the transition region of the standard face model, and the positions corresponding to the nose region and the mouth region of the standard face model are smoothed.
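The inverse-square weighting and the weighted sum in steps 111 to 114 can be sketched as follows; the function and variable names are illustrative, and the two inputs are assumed to be the positions of the same common point as reconstructed with the coefficients of the two adjacent sub-regions.

```python
import numpy as np

def blend_common_point(p_region1, p_region2, s1, s2, eps=1e-8):
    # Following the text, the weights are the normalized inverse squares of
    # the distances S1 and S2: the closer sub-region contributes more.
    w1 = 1.0 / (s1 ** 2 + eps)
    w2 = 1.0 / (s2 ** 2 + eps)
    total = w1 + w2
    w1, w2 = w1 / total, w2 / total          # weights now sum to 1
    return w1 * p_region1 + w2 * p_region2   # the final boundary point

# Usage: positions of common point A as reconstructed with the nose-region
# and mouth-region coefficients, and its distances S1, S2 to the two regions.
p_nose = np.array([0.0, -1.2, 9.8])
p_mouth = np.array([0.1, -1.3, 9.7])
boundary_point = blend_common_point(p_nose, p_mouth, s1=0.4, s2=0.9)
```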
The smoothing method is not limited to the above method, and any other weighted average algorithm or method with the same effect can be used for smoothing the three-dimensional face model.
Step 1105, deriving an expression base based on the three-dimensional face model of the target object.
In accordance with the above method for deriving the expression bases of the target object, a group of expression bases is calculated on the 3DMM library through the shape base coefficients. The method provided by this embodiment derives a set of expression bases for each face sub-region separately and, at the same time, uses the method in step 1104 to smooth the transition regions, generating a set of three-dimensional expression bases for the target object.
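As a rough picture of this derivation, the sketch below assumes an additive 3DMM layout in which the fitted shape base coefficients fix a person-specific neutral face and each library expression contributes a displacement on top of it; the real 3DMM library organization (for example, a bilinear tensor) may differ, and all names are illustrative.

```python
import numpy as np

def derive_expression_bases(mean_shape, id_basis, id_coef, exp_offsets):
    # Person-specific neutral face fixed by the fitted shape base coefficients.
    neutral = mean_shape + id_basis @ id_coef
    # exp_offsets: (n_expressions, 3V) displacement of each library expression;
    # re-applying each offset to the neutral face yields one expression base.
    return neutral[None, :] + exp_offsets  # (n_expressions, 3V)
```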
In summary, in the method provided by this embodiment, the vertex loss of each face sub-region of the standard face model and the image loss of the key point information in the face image are calculated, the vertex loss and the image loss are optimized simultaneously, and the generated three-dimensional face model of the target object is post-processed, so that the generated three-dimensional face model has higher quality and is more similar to the input three-dimensional face mesh of the target object, and the set of three-dimensional expression bases derived based on the three-dimensional face model is therefore closer to the expressions produced by the target object.
The method for generating a three-dimensional face model can be generalized beyond faces: an embodiment of the present application further provides a method for generating a three-dimensional object model, as shown in fig. 13. The method may be applied to the above computer device and includes the following steps:
step 1301, acquiring the input three-dimensional shape grid of the target object and a standard shape model corresponding to the standard object.
The target object includes at least one of a face or a head of an animal, a body part of a human, and an object having an external shape. Illustratively, the target object is the face of a rabbit, the elbow joint of a human, or an object with a complicated shape and structure. The three-dimensional shape mesh of the target object can be obtained by acquiring images of the target object, the standard shape model corresponding to the standard object can be obtained from a corresponding shape model library, and the standard shape model is a model of the target object in a static or basic state (without additional parts).
Step 1302, the three-dimensional outline mesh and the standard outline model are divided into at least two shape sub-regions according to a corresponding relationship.
The shape sub-region is a partial region of the external shape region of the target object, and optionally, the shape sub-region may be divided according to a requirement, or divided according to a preset dividing manner, or divided according to a corresponding relationship, or divided in the same manner.
In one example, the target object is a rabbit head, and the three-dimensional shape mesh corresponding to the rabbit head and the standard shape model are divided into an eye region, a mouth region, an ear region, and other regions according to corresponding relations. Optionally, the standard outline model is a three-dimensional rabbit head model in a model library having standard outline models of rabbit heads corresponding to multiple types of rabbits.
And step 1303, fitting each shape sub-region in the standard outline model to a corresponding shape sub-region in the three-dimensional outline grid respectively.
Optionally, the three-dimensional shape mesh and the standard shape model are fitted by calculating at least one of vertex loss and image loss.
Consistent with the principle used for generating the three-dimensional face model, the error loss of fitting each shape sub-region to the corresponding shape sub-region of the three-dimensional shape mesh is calculated according to the attitude parameter of each shape sub-region and the local three-dimensional outline mesh of that sub-region. The local three-dimensional outline mesh is a part of the three-dimensional outline mesh.
Or, the image loss of fitting the shape sub-region to the corresponding shape sub-region in the three-dimensional outline mesh is calculated by acquiring, from the n images of the target object, the key point information of the target object, the attitude parameters, and the key point index information in the standard outline model. When the target object is the face or the head of an animal, the key points are points corresponding to organs of the face or the head, such as the eyes, the mouth and the ears of the animal; when the target object is a body part of an animal, the key points are points corresponding to the contour of the body part or to features that distinguish it from other animals, such as the contour of a tail; when the target object is a body part of a person, the key points are points corresponding to contours or joints of the body part, such as the elbow joint; when the target object is an object with a complex structure or shape, the key points are points corresponding to the contour or characteristic shape of the object, such as a spherical structure.
Optionally, the vertex loss is calculated in the same or different manner as the three-dimensional face model, and the image loss is calculated in the same or different manner as the three-dimensional face model.
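The image loss can be pictured as a landmark reprojection error. The sketch below assumes a weak-perspective camera (scale, rotation R, 2D translation t) for the attitude parameters; the patent does not fix the camera model, so this is an assumption, and the names are illustrative.

```python
import numpy as np

def image_loss(vertices, keypoint_indices, keypoints_2d, R, t, scale):
    # Select the model vertices named by the key point index information,
    # project them into the image with the estimated pose, and measure the
    # squared distance to the detected 2D key points.
    model_kpts = vertices[keypoint_indices]             # (K, 3)
    projected = scale * (model_kpts @ R.T)[:, :2] + t   # (K, 2)
    return float(np.sum((projected - keypoints_2d) ** 2))
```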
And step 1304, after the at least two shape sub-regions are fitted, fusing the adjacent shape sub-regions to obtain a three-dimensional object model corresponding to the target object.
Illustratively, the fitted two adjacent shape sub-regions are fused, or after each shape sub-region is fitted, the adjacent shape sub-regions are fused, so that the fitted shape sub-regions are smoother. The mode of the fusion processing is the same as or different from that of the three-dimensional face model.
The method in this embodiment may be applied to generating an AR expression package: a corresponding three-dimensional object model is generated from a three-dimensional face mesh, for example a three-dimensional object model with an animal face, and a set of three-dimensional expression bases is derived based on the three-dimensional object model, where the three-dimensional expression bases are expression bases of the animal face.
In one example, a user uses a smart phone on which an application program supporting generation of a three-dimensional object model is running. The application program calls the camera of the smart phone to collect facial images of the user and generates an animal face model with the corresponding expression, for example a rabbit face model, and a group of expression bases corresponding to the expressions of the user can be derived based on the rabbit face model.
In summary, the method provided in this embodiment is similar to the method for generating the three-dimensional face model, and performs regional fitting on the three-dimensional shape mesh corresponding to the target object and the standard shape model, and performs fusion processing on the fitted models, thereby improving the quality of the output three-dimensional object model corresponding to the target object.
In the following, the technical solution of the present application is described by taking the application to a voice interaction scenario as an example.
Please refer to fig. 14, which shows a flowchart of a voice interaction method according to an embodiment of the present application. The execution subject of the method can be a terminal such as a mobile phone, a tablet computer, a wearable device and the like. The method may include the steps of:
step 1401, determining voice information to be played and an expression sequence corresponding to the voice information, where the expression sequence includes at least one expression.
The mapping relationship between the voice information and the expression sequence can be stored in advance, and after the voice information to be played is determined, the expression sequence corresponding to the voice information can be found according to the mapping relationship. For example, the mapping relationship between each pronunciation and the expression may be stored, after the voice information to be played is determined, a pronunciation sequence corresponding to the voice information may be determined, the pronunciation sequence includes at least one pronunciation arranged in sequence, and the expression corresponding to each pronunciation is obtained, that is, the expression sequence corresponding to the voice information may be obtained.
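A minimal sketch of this two-stage lookup (voice information → pronunciation sequence → expression sequence) follows; the tables here are toy stand-ins, since a real system would use a pronunciation dictionary and a stored pronunciation-to-expression mapping.

```python
# Hypothetical lookup tables; entries are illustrative only.
PRONUNCIATIONS = {"hello": ["HH", "AH", "L", "OW"]}
EXPRESSION_FOR_PHONE = {"HH": 3, "AH": 7, "L": 2, "OW": 9}  # expression ids

def expression_sequence(text):
    # Map the voice information to its pronunciation sequence, then map each
    # pronunciation to an expression, preserving the order of pronunciations.
    seq = []
    for word in text.lower().split():
        for phone in PRONUNCIATIONS.get(word, []):
            seq.append(EXPRESSION_FOR_PHONE[phone])
    return seq

print(expression_sequence("hello"))  # -> [3, 7, 2, 9]
```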
Step 1402, for each expression in the expression sequence, generating a three-dimensional face model of the target object under each expression according to the expression base coefficient corresponding to each expression and the expression base of the target object.
Taking the target expression of the target object as an example, determining a target expression base coefficient corresponding to the target expression, and then generating a three-dimensional face model of the target object under the target expression according to the target expression base coefficient and the expression base of the target object.
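Concretely, generating the model for one expression amounts to a blendshape-style combination of the target object's expression bases. The following sketch assumes each expression base is stored as a full vertex set whose offsets from the neutral base are weighted by the expression base coefficients; this linear form is an assumption, not the patent's exact formula.

```python
import numpy as np

def apply_expression(neutral, expression_bases, exp_coef):
    # expression_bases: (n_exp, 3V) vertex sets, exp_coef: (n_exp,) weights.
    offsets = expression_bases - neutral[None, :]  # per-base displacement
    return neutral + exp_coef @ offsets            # posed face, shape (3V,)
```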
Optionally, rendering the three-dimensional face model under each expression by using a texture map of the target object to obtain the three-dimensional face model with textures under each expression.
Optionally, a set of expression bases of the target object is generated as follows: image pairs of the target object under n head poses are captured, wherein each image pair comprises an RGB (red, green and blue) image and a depth image under one head pose, and n is a positive integer; a set of expression bases for the target object is then generated from the n image pairs. For the generation process of the expression bases, reference may be made to the description in the above embodiments, which is not repeated in this embodiment.
Step 1403, the voice information is played.
And step 1404, sequentially displaying the three-dimensional face models under the expressions according to the sequence of the expressions contained in the expression sequence in the process of playing the voice information.
Optionally, in the process of playing the voice information, the three-dimensional face model with texture under each expression is sequentially displayed according to the sequence of each expression contained in the expression sequence.
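As a toy illustration of this playback loop, the sketch below gives each expression an equal share of the voice duration; real systems would align expressions to phoneme timing, and render_model / play_voice are hypothetical callables supplied by the application.

```python
import time

def play_with_expressions(voice_duration_s, expression_ids, render_model, play_voice):
    # Start (non-blocking) voice playback, then display the per-expression
    # face models in the order given by the expression sequence.
    play_voice()
    dt = voice_duration_s / len(expression_ids)
    for expr_id in expression_ids:
        render_model(expr_id)  # show the 3D face model for this expression
        time.sleep(dt)         # hold it for this expression's time slice
```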
To sum up, in the technical scheme provided by this embodiment of the present application, an expression sequence corresponding to the voice information to be played is determined, a three-dimensional face model of the target object under each expression is generated according to the expression base coefficient corresponding to each expression and the expression bases of the target object, and in the process of playing the voice information the three-dimensional face models under the expressions are displayed sequentially according to the order of the expressions contained in the expression sequence. A scheme of performing voice interaction based on a reconstructed three-dimensional face model is thus realized, and matched expressions can be displayed in step with the voice played in real time, which is more vivid.
The following are embodiments of the apparatus of the present application, and for details that are not described in detail in the embodiments of the apparatus, reference may be made to corresponding descriptions in the above method embodiments, and details are not described herein again.
Fig. 15 shows a schematic structural diagram of a device for generating a three-dimensional face model according to an exemplary embodiment of the present application. The apparatus can be implemented as all or a part of a terminal by software, hardware or a combination of both, and includes:
a first obtaining module 1510, configured to obtain an input three-dimensional face mesh of a target object and a standard face model corresponding to a standard object;
a first processing module 1520, configured to divide the three-dimensional face mesh and the standard face model into at least two face sub-regions according to a corresponding relationship;
the first fitting module 1530 is configured to respectively fit each face sub-region in the standard face model to a corresponding face sub-region in the three-dimensional face mesh;
the first fusion module 1540 is configured to perform fusion processing on the adjacent face subregions after the at least two face subregions are fitted, so as to obtain a three-dimensional face model corresponding to the target object.
In an alternative embodiment, the apparatus includes a first calculation module 1550;
the first calculating module 1550 is configured to calculate, for each face subregion in the standard face model, an error loss when the face subregion is fitted to a corresponding face subregion in the three-dimensional face mesh;
the first fitting module 1530 is configured to fit the face sub-region to a corresponding face sub-region in the three-dimensional face mesh when the error loss converges.
In an alternative embodiment, the error penalty comprises a vertex penalty;
the first obtaining module 1510 is configured to obtain a pose parameter of each facial sub-region and a local three-dimensional face mesh corresponding to the facial sub-region, where the local three-dimensional face mesh is a part of the three-dimensional face mesh;
the first calculating module 1550 is configured to calculate, according to the pose parameter of each facial sub-region and the local three-dimensional face mesh, a vertex loss when each facial sub-region is fitted to a facial sub-region corresponding to the three-dimensional face mesh.
In an alternative embodiment, the vertex penalty comprises a first shape base penalty and a first expression base penalty;
the first calculating module 1550 is configured to calculate a first shape base loss and a first expression base loss for fitting each facial sub-region to the local three-dimensional face grid according to the pose parameter of each facial sub-region, the shape base coefficient and the expression base coefficient corresponding to each facial sub-region, and the local three-dimensional face grid;
the first calculating module 1550 is configured to optimize the shape base coefficient according to the first shape base loss, and calculate to obtain a first shape base coefficient after each face subregion is optimized;
the first calculating module 1550 is configured to optimize the expression base coefficients according to the first expression base loss, and calculate to obtain a first expression base coefficient after each facial sub-region is optimized;
the first processing module 1520 is configured to repeat the above three steps until the first shape base loss and the first expression base loss converge respectively.
In an alternative embodiment, the error loss comprises image loss;
the first obtaining module 1510 is configured to obtain n face images of a target object, where the n face images are images used to generate a three-dimensional face mesh, and n is a positive integer;
the first obtaining module 1510 is configured to obtain key point information in a face image, pose parameters of the face image, and key point index information in a standard face model;
the first calculating module 1550 is configured to calculate, according to the key point information, the pose parameter, and the key point index information, an image loss when each face sub-region is fitted to a corresponding face sub-region in the three-dimensional face mesh.
In an alternative embodiment, the image loss comprises a second shape-based loss and a second expression-based loss;
the first calculating module 1550 is configured to calculate, according to the key point information, the pose parameter, the key point index information, and the shape basis coefficient and the expression basis coefficient corresponding to the face sub-regions, a second shape basis loss and a second expression basis loss when each face sub-region is fitted to a corresponding face sub-region in the three-dimensional face mesh;
the first calculating module 1550 is configured to optimize the shape base coefficient according to the second shape base loss, and calculate a second shape base coefficient after each facial sub-region is optimized;
the first calculating module 1550 is configured to optimize the expression base coefficient according to the second expression base loss, and calculate to obtain a second expression base coefficient after each facial sub-region is optimized;
the first processing module 1520 is configured to repeat the above three steps until the second shape base loss and the second expression base loss converge respectively.
In an alternative embodiment, the first obtaining module 1510 is configured to obtain a transition region in the adjacent face sub-regions after the fitting of the at least two face sub-regions, where the transition region is a region formed by common points between the adjacent face sub-regions;
the first processing module 1520 is configured to perform fusion processing on the transition region to obtain a three-dimensional face model corresponding to the target object.
In an optional embodiment, the first obtaining module 1510 is configured to obtain a first distance and a second distance, where the first distance is a distance between the common point and the first facial sub-region in the transition region, and the second distance is a distance between the common point and the second facial sub-region in the transition region;
the first calculation module 1550, configured to determine a first weight of the common point with respect to the first facial sub-region according to the first distance, and determine a second weight of the common point with respect to the second facial sub-region according to the second distance;
the first calculating module 1550 is configured to determine a boundary point according to the shape base coefficient and the expression base coefficient corresponding to the first facial sub-region, the first weight, the shape base coefficient and the expression base coefficient corresponding to the second facial sub-region, and the second weight;
the first processing module 1520, configured to replace the common points with the boundary points, where the boundary points form the region after the fusion processing;
the first processing module 1520 is configured to obtain a three-dimensional face model corresponding to the target object according to the region after the fusion processing.
In an alternative embodiment, the first processing module 1520 is configured to derive a set of three-dimensional expression bases corresponding to each facial sub-region according to the shape base coefficient and the expression base coefficient of each facial sub-region; and carrying out fusion processing on a group of three-dimensional expression bases corresponding to the at least two facial sub-regions to obtain a group of three-dimensional expression bases of the target object.
Fig. 16 is a schematic structural diagram illustrating a device for generating a three-dimensional object model according to an exemplary embodiment of the present application. The apparatus can be implemented as all or a part of a terminal by software, hardware or a combination of both, and includes:
a second obtaining module 1610, configured to obtain an input three-dimensional shape grid of the target object and a standard shape model corresponding to the standard object;
a second processing module 1620, configured to divide the three-dimensional shape mesh and the standard shape model into at least two shape sub-regions according to a corresponding relationship;
the second obtaining module 1610 is configured to obtain a pose parameter of each shape sub-region and a local three-dimensional shape mesh corresponding to the shape sub-region, where the local three-dimensional shape mesh is a part of the three-dimensional shape mesh;
the second calculating module 1650 is configured to calculate, according to the attitude parameter of each shape subregion and the local three-dimensional shape mesh, an error loss when each shape subregion is fitted to a shape subregion corresponding to the three-dimensional shape mesh, where the error loss includes a vertex loss;
a second fitting module 1630, configured to fit the shape sub-regions to corresponding shape sub-regions in the three-dimensional shape mesh when the vertex loss converges;
and a second fusion module 1640, configured to perform fusion processing on adjacent shape sub-regions after the at least two shape sub-regions are fitted, so as to obtain a three-dimensional object model corresponding to the target object.
Fig. 17 shows a block diagram of a terminal 1700 according to an embodiment of the present application. The terminal 1700 may be an electronic device such as a mobile phone, a tablet computer, a wearable device, a multimedia playing device, and a camera.
In general, terminal 1700 includes: a processor 1701 and a memory 1702.
The processor 1701 may include one or more processing cores, such as a 4-core processor, an 8-core processor, and the like. The processor 1701 may be implemented in at least one hardware form of a DSP (Digital Signal Processing), an FPGA (Field-Programmable Gate Array), and a PLA (Programmable Logic Array). The processor 1701 may also include a main processor and a coprocessor: the main processor is a processor for processing data in the awake state, also called a CPU (Central Processing Unit); the coprocessor is a low-power processor for processing data in the standby state. In some embodiments, the processor 1701 may be integrated with a GPU (Graphics Processing Unit), which is responsible for rendering and drawing the content that the display screen needs to display. In some embodiments, the processor 1701 may further include an AI (Artificial Intelligence) processor for processing computing operations related to machine learning.
The memory 1702 may include one or more computer-readable storage media, which may be non-transitory. The memory 1702 may also include high-speed random access memory, as well as non-volatile memory, such as one or more magnetic disk storage devices, flash memory storage devices. In some embodiments, a non-transitory computer readable storage medium in the memory 1702 is used to store at least one instruction, at least one program, code set, or set of instructions for execution by the processor 1701 to implement a method for generating a three-dimensional face model, or a method for speech interaction, or a method for generating a three-dimensional object model as provided by the method embodiments of the present application.
In some embodiments, terminal 1700 may also optionally include: a peripheral interface 1703 and at least one peripheral. The processor 1701, memory 1702 and peripheral interface 1703 may be connected by buses or signal lines. Various peripheral devices may be connected to peripheral interface 1703 by a bus, signal line, or circuit board. Specifically, the peripheral device may include: at least one of a radio frequency circuit 1704, a touch display screen 1705, a camera 1706, an audio circuit 1707, a positioning component 1708, and a power source 1709.
The camera 1706 may be a three-dimensional camera formed by a color camera and a depth camera.
Those skilled in the art will appreciate that the architecture shown in fig. 17 is not intended to be limiting with respect to terminal 1700, and may include more or fewer components than those shown, or some components may be combined, or a different arrangement of components may be employed.
In an exemplary embodiment, a computer-readable storage medium is further provided, in which at least one instruction, at least one program, a code set, or a set of instructions is stored, which, when executed by a processor of a computer device, implements the above-mentioned method for generating a three-dimensional face model or the method for generating a three-dimensional object model.
Optionally, the computer-readable storage medium may include: a Read Only Memory (ROM), a Random Access Memory (RAM), a Solid State Drive (SSD), or an optical disc, etc. The Random Access Memory may include a resistive Random Access Memory (ReRAM) and a Dynamic Random Access Memory (DRAM).
In an exemplary embodiment, a computer program product is also provided, which, when executed by a processor of a computer device, is configured to implement the above-mentioned three-dimensional face model generation method or three-dimensional object model generation method.
It should be understood that reference to "a plurality" herein means two or more. "and/or" describes the association relationship of the associated objects, meaning that there may be three relationships, e.g., a and/or B, which may mean: a exists alone, A and B exist simultaneously, and B exists alone. The character "/" generally indicates that the former and latter associated objects are in an "or" relationship.
It will be understood by those skilled in the art that all or part of the steps for implementing the above embodiments may be implemented by hardware, or may be implemented by a program instructing relevant hardware, where the program may be stored in a computer-readable storage medium, and the above-mentioned storage medium may be a read-only memory, a magnetic disk or an optical disk, etc.
The above description is only exemplary of the present application and should not be taken as limiting, as any modification, equivalent replacement, or improvement made within the spirit and principle of the present application should be included in the protection scope of the present application.

Claims (11)

1. A method for generating a three-dimensional face model, the method comprising:
acquiring an input three-dimensional face grid of a target object and a standard face model corresponding to a standard object;
dividing the three-dimensional face grid and the standard face model into at least two face subregions according to a corresponding relation;
acquiring a posture parameter of each face subregion and a local three-dimensional face grid corresponding to the face subregion, wherein the local three-dimensional face grid is a part of the three-dimensional face grid;
calculating error loss when each face subregion is fitted to a face subregion corresponding to the three-dimensional face grid according to the attitude parameter of each face subregion and the local three-dimensional face grid, wherein the error loss comprises vertex loss;
fitting the facial sub-regions to corresponding facial sub-regions in the three-dimensional face mesh when the vertex loss converges;
after the at least two face sub-regions are fitted, carrying out fusion processing on adjacent face sub-regions to obtain a three-dimensional face model corresponding to the target object;
deriving a group of three-dimensional expression bases corresponding to each facial sub-region according to the shape base coefficient and the expression base coefficient of each facial sub-region;
and performing the fusion processing on the group of three-dimensional expression bases corresponding to the at least two facial sub-regions to obtain a group of three-dimensional expression bases of the target object.
2. The method of claim 1, wherein the vertex loss comprises a first shape base loss and a first expression base loss;
the calculating, according to the pose parameter of each of the face subregions and the local three-dimensional face mesh, the vertex loss when each of the face subregions is fitted to the face subregion corresponding to the three-dimensional face mesh, includes:
calculating the first shape base loss and the first expression base loss for fitting each facial sub-region to the local three-dimensional face grid according to the attitude parameter of each facial sub-region, the shape base coefficient and the expression base coefficient corresponding to each facial sub-region and the local three-dimensional face grid;
optimizing the shape base coefficient according to the first shape base loss, and calculating to obtain a first shape base coefficient after each face subregion is optimized;
optimizing the expression base coefficients according to the first expression base loss, and calculating to obtain the optimized first expression base coefficient of each facial subregion;
and repeating the three steps related to calculating the first shape base loss and the first expression base loss, optimizing the first shape base coefficient and optimizing the first expression base coefficient until the first shape base loss and the first expression base loss respectively converge.
3. The method of claim 1, wherein the error loss comprises image loss;
for each face sub-region in the standard face model, calculating an error loss when fitting the face sub-region to a corresponding face sub-region in the three-dimensional face mesh, comprising:
acquiring n face images of the target object, wherein the n face images are used for generating the three-dimensional face grid, and n is a positive integer;
acquiring key point information in the face image, attitude parameters of the face image and key point index information in the standard face model;
and calculating the image loss when each face subregion is fitted to the corresponding face subregion in the three-dimensional face grid according to the key point information, the posture parameters and the key point index information.
4. The method of claim 3, wherein the image loss comprises a second shape-based loss and a second expression-based loss;
the calculating, according to the keypoint information, the pose parameter, and the keypoint index information, an image loss when each of the facial sub-regions is fitted to a corresponding facial sub-region in the three-dimensional face mesh, includes:
calculating the second shape base loss and the second expression base loss when each face sub-region is fitted to the corresponding face sub-region in the three-dimensional face grid according to the key point information, the posture parameters, the key point index information, and the shape base coefficient and the expression base coefficient corresponding to each face sub-region;
optimizing the shape base coefficient according to the second shape base loss, and calculating to obtain a second shape base coefficient after each face subregion is optimized;
optimizing the expression base coefficient according to the second expression base loss, and calculating to obtain a second expression base coefficient after each facial subregion is optimized;
and repeating the three steps related to calculating the second shape base loss and the second expression base loss, optimizing the second shape base coefficient and optimizing the second expression base coefficient until the second shape base loss and the second expression base loss are converged respectively.
5. The method of any of claims 1 to 4, further comprising:
after the at least two facial sub-regions are fitted, acquiring a transition region in the adjacent facial sub-regions, wherein the transition region is a region formed by common points between the adjacent facial sub-regions;
and carrying out the fusion processing on the transition region to obtain a three-dimensional face model corresponding to the target object.
6. The method according to claim 5, wherein the performing the fusion processing on the transition region to obtain the three-dimensional face model corresponding to the target object comprises:
acquiring a first distance and a second distance, wherein the first distance is the distance between a common point and a first face sub-region in the transition region, and the second distance is the distance between the common point and a second face sub-region in the transition region;
determining a first weight of the common point relative to the first facial sub-region as a function of the first distance, and a second weight of the common point relative to the second facial sub-region as a function of the second distance;
determining a boundary point according to the shape base coefficient and the expression base coefficient corresponding to the first face sub-region, the first weight, the shape base coefficient and the expression base coefficient corresponding to the second face sub-region, and the second weight;
replacing the common points with the boundary points, wherein the boundary points form an area subjected to the fusion processing;
and obtaining a three-dimensional face model corresponding to the target object according to the region after the fusion processing.
7. A method of generating a three-dimensional object model, the method comprising:
acquiring an input three-dimensional shape grid of a target object and a standard shape model corresponding to a standard object;
dividing the three-dimensional shape grid and the standard shape model into at least two shape subregions according to a corresponding relation;
acquiring a posture parameter of each shape subregion and a local three-dimensional outline grid corresponding to the shape subregion, wherein the local three-dimensional outline grid is a part of the three-dimensional outline grid;
calculating error loss when each shape subregion is fitted to the shape subregion corresponding to the three-dimensional shape mesh according to the attitude parameter of each shape subregion and the local three-dimensional shape mesh, wherein the error loss comprises vertex loss;
fitting the shape sub-regions to corresponding shape sub-regions in the three-dimensional outline mesh when the vertex loss converges;
after the at least two shape sub-regions are fitted, carrying out fusion processing on the adjacent shape sub-regions to obtain a three-dimensional object model corresponding to the target object;
deriving a group of three-dimensional expression bases corresponding to each shape subregion according to the shape base coefficient and the expression base coefficient of each shape subregion;
and performing the fusion processing on the group of three-dimensional expression bases corresponding to the at least two shape sub-regions to obtain a group of three-dimensional expression bases of the target object.
8. An apparatus for generating a three-dimensional face model, the apparatus comprising:
the first acquisition module is used for acquiring the three-dimensional face grid of the input target object and a standard face model corresponding to the standard object;
the first processing module is used for dividing the three-dimensional face grid and the standard face model into at least two face sub-regions according to a corresponding relation;
the first obtaining module is configured to obtain a pose parameter of each face subregion and a local three-dimensional face mesh corresponding to the face subregion, where the local three-dimensional face mesh is a part of the three-dimensional face mesh;
a first calculation module, configured to calculate, according to the pose parameter of each face sub-region and the local three-dimensional face mesh, an error loss when each face sub-region is fitted to a face sub-region corresponding to the three-dimensional face mesh, where the error loss includes a vertex loss;
a first fitting module, configured to fit the face sub-region to a corresponding face sub-region in the three-dimensional face mesh when the vertex loss converges;
the first fusion module is used for performing fusion processing on the adjacent face subregions after the at least two face subregions are fitted to obtain a three-dimensional face model corresponding to the target object;
the first processing module is used for deriving a group of three-dimensional expression bases corresponding to each facial sub-region according to the shape base coefficient and the expression base coefficient of each facial sub-region; and performing the fusion processing on the group of three-dimensional expression bases corresponding to the at least two facial sub-regions to obtain a group of three-dimensional expression bases of the target object.
9. An apparatus for generating a three-dimensional object model, the apparatus comprising:
the second acquisition module is used for acquiring the input three-dimensional appearance grid of the target object and a standard appearance model corresponding to the standard object;
the second processing module is used for dividing the three-dimensional shape grid and the standard shape model into at least two shape sub-regions according to a corresponding relation;
the second obtaining module is configured to obtain a pose parameter of each shape sub-region and a local three-dimensional outline mesh corresponding to the shape sub-region, where the local three-dimensional outline mesh is a part of the three-dimensional outline mesh;
a second calculation module, configured to calculate, according to the attitude parameter of each shape sub-region and the local three-dimensional shape mesh, an error loss when each shape sub-region is fitted to a shape sub-region corresponding to the three-dimensional shape mesh, where the error loss includes a vertex loss;
a second fitting module for fitting the shape sub-regions to corresponding shape sub-regions in the three-dimensional outline mesh when the vertex loss converges;
the second fusion module is used for performing fusion processing on the adjacent shape subregions after the at least two shape subregions are fitted to obtain a three-dimensional object model corresponding to the target object;
the second processing module is used for deriving a group of three-dimensional expression bases corresponding to each shape sub-region according to the shape base coefficient and the expression base coefficient of each shape sub-region; and performing the fusion processing on the group of three-dimensional expression bases corresponding to the at least two shape sub-regions to obtain a group of three-dimensional expression bases of the target object.
10. A computer device, characterized in that it comprises a processor and a memory, in which at least one instruction, at least one program, a set of codes or a set of instructions is stored, which is loaded and executed by the processor to implement the method of generating a three-dimensional face model according to any one of claims 1 to 6 or the method of generating a three-dimensional object model according to claim 7.
11. A computer-readable storage medium, having stored therein at least one instruction, at least one program, a set of codes, or a set of instructions, which is loaded and executed by a processor to implement the method of generating a three-dimensional face model according to any one of claims 1 to 6 or the method of generating a three-dimensional object model according to claim 7.