CN114373044A - Method, device, computing equipment and storage medium for generating three-dimensional face model

Method, device, computing equipment and storage medium for generating three-dimensional face model

Info

Publication number
CN114373044A
Authority
CN
China
Prior art keywords
face
dimensional model
standard
target
feature vector
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111547342.1A
Other languages
Chinese (zh)
Inventor
刁俊
尹义
彭佳彬
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen Intellifusion Technologies Co Ltd
Original Assignee
Shenzhen Intellifusion Technologies Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenzhen Intellifusion Technologies Co Ltd filed Critical Shenzhen Intellifusion Technologies Co Ltd
Priority to CN202111547342.1A
Publication of CN114373044A
Legal status: Pending

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T17/00 - Three dimensional [3D] modelling, e.g. data description of 3D objects
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2200/00 - Indexing scheme for image data processing or generation, in general
    • G06T2200/08 - Indexing scheme for image data processing or generation, in general involving all processing steps from image acquisition to 3D model generation

Abstract

The application provides a method, a device, a computing device and a storage medium for generating a three-dimensional face model. The method comprises the following steps: acquiring target image data, wherein the target image data comprises image characteristic information of a target face and face key point information; generating a basic three-dimensional model of the target face according to the image characteristic information; generating a current feature vector of the target face according to the face key point information; generating face texture data for the basic three-dimensional model based on the face texture data of at least one standard three-dimensional model according to the degree of correlation between the current feature vector and the standard feature vector of the at least one standard three-dimensional model; and fusing the face texture data of the basic three-dimensional model with the basic three-dimensional model to obtain a three-dimensional reconstruction model of the target face. With this arrangement, the face texture data of the three-dimensional face model can be generated accurately, enhancing the realism of the three-dimensional model.

Description

Method, device, computing equipment and storage medium for generating three-dimensional face model
Technical Field
The present application relates to the field of image processing technologies, and in particular, to a method, an apparatus, a computing device, and a storage medium for generating a three-dimensional model of a face.
Background
Current three-dimensional face modeling technology can form a three-dimensional model of a face from one or more pictures of that face. With the development of science and technology, more and more applications of three-dimensional face models have emerged, for example 3D (three-dimensional) face makeup trials, try-ons, beautification, AR interactions, and so on. Besides photos, a realistic three-dimensional model of a user's face can also be restored from face videos of the user shot by one or more cameras. The three-dimensional face model is kept consistent with the user's expression in the picture or video, so that the degree of restoration is higher. However, when the resolution of the two-dimensional picture or video is not ideal, the face texture of the three-dimensional face model generated by existing processing methods is poor and the sense of realism is weak, which greatly affects the user experience.
Disclosure of Invention
In view of this, embodiments of the present application provide a method, an apparatus, a computing device, and a storage medium for generating a three-dimensional face model, where the three-dimensional face model generated by the method can accurately show the texture details of a face, thereby improving the accuracy of face restoration.
A first aspect of an embodiment of the present application provides a method for generating a three-dimensional model of a face, including:
acquiring target image data, wherein the target image data comprises image characteristic information of a target face and face key point information;
generating a basic three-dimensional model of the target face according to the image characteristic information;
generating a current feature vector of the target face according to the face key point information;
obtaining at least one standard three-dimensional model, wherein each standard three-dimensional model comprises a corresponding standard feature vector, each standard feature vector is used for representing a basic expression, and each standard three-dimensional model comprises corresponding facial texture data;
generating face texture data of the base three-dimensional model based on the face texture data of the at least one standard three-dimensional model according to the degree of correlation of the current feature vector and a standard feature vector of the at least one standard three-dimensional model; and
and fusing the face texture data of the basic three-dimensional model with the basic three-dimensional model to obtain a three-dimensional reconstruction model of the target face.
According to the embodiment of the application, image data containing face image feature information and face key point information are firstly obtained, a basic three-dimensional model of a face is generated according to the face image feature information, face texture data of the basic three-dimensional model are generated according to the correlation degree of a current feature vector generated from the face key point information and a standard feature vector of at least one standard three-dimensional model, and finally the face texture data and the basic three-dimensional model are fused to obtain a three-dimensional reconstruction model of a target face. By generating face texture data to be used for the base three-dimensional model based on the correlation of the current feature vector and the standard feature vector, rather than using fixed face texture data, the generated face texture data can be made to more conform to the facial expression of the base three-dimensional model, thereby improving the realism of the face three-dimensional model.
In one embodiment, the obtaining of the at least one standard three-dimensional model comprises: taking the basic three-dimensional model and the current feature vector as inputs of a first type of function model, and obtaining the at least one standard three-dimensional model as an output of the first type of function model.
In one embodiment, the training samples of the first type of function model are a plurality of data samples, each data sample includes a three-dimensional model of the face and a corresponding current feature vector, and the index of each data sample is at least one type of standard three-dimensional model and a corresponding standard feature vector and corresponding facial texture data, wherein each type of standard three-dimensional model represents a basic expression.
In an embodiment, the generating face texture data of the base three-dimensional model based on the face texture data of the at least one standard three-dimensional model according to the degree of correlation of the current feature vector with the standard feature vector of the at least one standard three-dimensional model comprises:
decomposing the current feature vector into a weighted sum of standard feature vectors of the at least one standard three-dimensional model, wherein the standard feature vector of each standard three-dimensional model has a corresponding weighting coefficient;
the weighting coefficients are used as weighting coefficients for face texture data of the respective standard three-dimensional models, and a weighted sum of the face texture data of the at least one standard three-dimensional model is calculated as face texture data of the base three-dimensional model.
A second aspect of the embodiments of the present application provides an apparatus for generating a three-dimensional model of a face, including:
the target image acquisition module is used for acquiring target image data, and the target image data comprises image characteristic information of a target face and face key point information;
a basic three-dimensional model generation module for generating a basic three-dimensional model of the target face according to the image characteristic information;
the current feature vector generating module is used for generating a current feature vector of the target face according to the face key point information;
a standard three-dimensional model obtaining module for obtaining at least one standard three-dimensional model, wherein each standard three-dimensional model comprises a corresponding standard feature vector, each standard feature vector is used for representing one basic expression of the target face, and each standard three-dimensional model comprises corresponding facial texture data;
the face texture data generation module is used for generating face texture data of the basic three-dimensional model based on the face texture data of the at least one standard three-dimensional model according to the correlation degree of the current feature vector and the standard feature vector of the at least one standard three-dimensional model; and
and the fusion module is used for fusing the face texture data of the basic three-dimensional model with the basic three-dimensional model to obtain a three-dimensional reconstruction model of the target face.
A third aspect of embodiments of the present application provides a computing apparatus, including a memory, a processor, and a computer program stored in the memory and executable on the processor, the processor implementing the method for generating a three-dimensional model of a face as provided in the first aspect of embodiments of the present application when executing the computer program.
A fourth aspect of embodiments of the present application provides a computer-readable storage medium storing a computer program, which when executed by a processor implements the method for generating a three-dimensional model of a face as provided in the first aspect of embodiments of the present application.
A fifth aspect of the embodiments of the present application provides a computer program product, which, when running on a terminal device, causes the terminal device to execute the method for generating a three-dimensional model of a face according to the first aspect of the embodiments of the present application.
It is understood that the beneficial effects of the second aspect to the fifth aspect can be referred to the related description of the first aspect, and are not described herein again.
Drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present application, the drawings needed in the description of the embodiments or the prior art are briefly introduced below. Obviously, the drawings in the following description are only some embodiments of the present application, and other drawings can be obtained by those skilled in the art based on these drawings without inventive effort.
FIG. 1 is a flow chart of one embodiment of a method for generating a three-dimensional model of a face as set forth in an embodiment of the present application;
FIG. 2 is a schematic diagram of an image data acquisition and processing system according to an embodiment of the present disclosure;
FIG. 3 is a schematic diagram of several main face key points proposed by an embodiment of the present application;
FIG. 4 is a schematic diagram of a three-dimensional model of a face with different orientations synthesized by an embodiment of the application;
fig. 5 is a schematic view illustrating a usage flow of the AR interaction system according to an embodiment of the present application;
FIG. 6 is a block diagram of an embodiment of an apparatus for generating a three-dimensional model of a face according to an embodiment of the present application;
fig. 7 is a schematic diagram of a computing device according to an embodiment of the present application.
Detailed Description
In the following description, for purposes of explanation and not limitation, specific details are set forth, such as particular system structures, techniques, etc. in order to provide a thorough understanding of the embodiments of the present application. It will be apparent, however, to one skilled in the art that the present application may be practiced in other embodiments that depart from these specific details. In other instances, detailed descriptions of well-known systems, devices, circuits, and methods are omitted so as not to obscure the description of the present application with unnecessary detail. Furthermore, in the description of the present application and the appended claims, the terms "first," "second," "third," and the like are used for distinguishing between descriptions and not necessarily for describing or implying relative importance.
Generating a three-dimensional model of a user's face from an image (photo or video) of the user's face has wide application in scenarios such as makeup trials, beautification, AR interaction, animation, and the like. In some processing approaches, a basic three-dimensional model of the face is generated according to the image feature information of the user's face image (photo or video), and face texture data extracted from the photo or video is then fused onto the basic three-dimensional model to obtain the final three-dimensional face model. When usable face texture data cannot be extracted from the photo or video, general face texture data stored in advance is typically used. However, when users' facial expressions differ or are greatly exaggerated, their face texture data differ greatly; therefore, a three-dimensional face model formed by indiscriminately using general face texture data lacks realism.
In view of this, the present application provides a method, an apparatus, a computing device and a storage medium for generating a three-dimensional model of a face, where the three-dimensional model of a face generated by the method has facial texture data that better conforms to facial expressions, thereby improving the sense of realism of the three-dimensional model of a face. It should be understood that the subject matter of the embodiments of the methods disclosed herein can be implemented in various types of computing devices, including various terminal devices or servers, such as mobile phones, tablet computers, notebook computers, desktop computers, wearable devices, cloud servers, and so on.
Referring to fig. 1, a method for generating a three-dimensional model of a face according to an embodiment of the present application is shown, the method including:
101. acquiring target image data, wherein the target image data comprises image characteristic information of a target face and face key point information;
in the embodiment of the present application, the target face may generally be a face of a user who makes any expression, and the target image data of the target face making the expression may be captured by using a camera. The target image data contains, on the one hand, image feature information (e.g., RGB features, grayscale features, depth features, or the like) of the target face and, on the other hand, face key point information of the target face, e.g., position information of each designated face key point (eyebrow, nose, both eyes and mouth, teeth, or the like). The location information for each facial keypoint is one or more spatial coordinates that can characterize the location and/or morphology of that facial keypoint (e.g., the angle of the eyebrows, the shape of the eyes, the location of the nasal wings, the inclination of the corners of the mouth, the shape of the mouth, etc.). For the key point of the face, the position information may include the coordinates of the head and the tail of the eye, the coordinates of the middle of the upper eyelid, the coordinates of the middle of the lower eyelid, etc., or may include the coordinates of more points on the boundary of the eye. It should be noted that the face key point information may be obtained by detecting when the camera having the face key point detection function captures image data, or may be obtained after the server detects the acquired image data by using a face key point detection algorithm. After the target image data is captured by the camera, the target image data is transmitted to the server, and the server may perform subsequent processing based on the acquired target image data.
In an implementation manner of the embodiment of the present application, the acquiring target image data may include:
acquiring image data of different orientations of the target face synchronously acquired by at least two cameras, and determining the image data of different orientations of the target face as the target image data, wherein the at least two cameras have a face key point detection function and are arranged at different orientations around the target face.
Conventional face key point detection achieves good accuracy when the face is directed toward the camera and is not occluded; however, when the face is not directed toward the camera, the field of view of a single camera cannot cover all face key points, and the detection accuracy drops significantly. To address this problem, in the embodiments of the present application, at least two cameras are used to synchronously acquire image data of the target face; the at least two cameras have a face key point detection function and are arranged at different orientations around the target face. For example, if two cameras are used, one camera may be on the front left side of the face and the other on the front right side; if four cameras are used, one camera may face the front of the face, one the back of the head, one the left side and one the right side, and so on. To ensure higher face key point detection accuracy, more than four cameras can generally be arranged around the head of the target object, and the shooting range of each camera adjusted so that all shooting angles of the target face are completely covered, achieving 360-degree coverage without blind angles, so that all face key points can be captured even when the face moves.
In one embodiment, the acquiring image data of different orientations of the target face synchronously acquired by at least two cameras may include:
and receiving image data of different directions of the target face through a network switch, wherein the at least two cameras collect the synchronously acquired image data to the network switch.
Each camera and the server can be networked through a network switch, each camera has a fixed IP address, the server also has an IP address, each camera can collect the synchronously acquired image data to the network switch, and the server can acquire corresponding image data through the network switch. The embodiment of the present application does not limit the implementation manner of implementing synchronous acquisition for each camera, for example, one acquisition synchronization device may be configured to control each camera to start shooting simultaneously, and after image data acquired by each camera is acquired, synchronization processing may be performed according to timestamps included in the image data, and so on.
In order to facilitate understanding of the manner of acquiring target image data proposed in the embodiments of the present application, a practical application scenario is listed below.
As shown in fig. 2, a schematic diagram of an image data acquisition and processing system provided in the embodiment of the present application is shown, where the system includes cameras (4 or more), acquisition synchronization equipment, a network switch, a server, a touch display screen, a sound box, and other equipment.
In fig. 2, 4 cameras (which may be web cameras with face key point detection function) are uniformly arranged around the center of a circle with a target object as the center of the circle, the distance between each camera and the target object can be kept at about 5.5 meters, the installation height of the camera can be kept at about 2.8 meters, so as to ensure the best face key point detection view angle, and the shooting range can completely cover the target face by adjusting parameters such as the focal length of each camera. In actual operation, each camera can be internally provided with a TOF functional module and a real-time face key point detection algorithm, so that on one hand, image feature information such as RGB and depth of the face can be collected in real time, and on the other hand, face key points (including eyebrows, a nose, two ears, two eyes, a mouth, teeth and the like) can be detected according to the image feature information. FIG. 3 is a schematic diagram of several key points of the main face.
The 4 cameras can be networked with a network switch, and the network switch is connected to the server; each camera has a fixed IP address, the server also has a fixed IP address, and the network switch implements the networking and data exchange of all network devices. In addition, the system is connected to an acquisition synchronization device to solve the problem that images acquired by different cameras are not synchronized. Different cameras may start shooting at different times, so the images captured by different cameras may not correspond to the same moment, which is unfavorable for subsequent face key point detection; therefore, an acquisition synchronization device is added to control all cameras to shoot images synchronously. The acquisition synchronization device sends a time synchronization instruction to all cameras through the network switch, and each camera sends feedback information to the acquisition synchronization device after receiving the time synchronization instruction, indicating that it is ready. If the acquisition synchronization device does not receive feedback information from all cameras, it keeps sending the time synchronization instruction until feedback from all cameras is received. The acquisition synchronization device then sends a control instruction to all cameras to start shooting, controlling all cameras to acquire image data synchronously in real time, thereby obtaining image data of the target face in different orientations.
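For illustration only, the synchronization handshake described above may be sketched as follows; the message formats and the send/receive helpers are assumptions introduced here, not an actual implementation of the acquisition synchronization equipment.

```python
# Hedged sketch of the acquisition-synchronization handshake: broadcast a time-sync
# instruction, wait for READY feedback from every camera, retry until all acknowledge,
# then broadcast the start-capture instruction.
import time

def synchronize_and_start(camera_ips, send_to_camera, receive_ack, retry_interval=1.0):
    """Hypothetical helpers: send_to_camera(ip, message) transmits a message via the
    network switch; receive_ack(timeout) returns an acknowledgement dict or None."""
    ready = set()
    # Keep broadcasting the time-synchronization instruction until every camera acknowledges.
    while len(ready) < len(camera_ips):
        for ip in camera_ips:
            if ip not in ready:
                send_to_camera(ip, {"type": "TIME_SYNC", "timestamp": time.time()})
        deadline = time.time() + retry_interval
        while time.time() < deadline and len(ready) < len(camera_ips):
            ack = receive_ack(timeout=max(deadline - time.time(), 0.0))
            if ack is not None and ack.get("type") == "READY":
                ready.add(ack["camera_ip"])
    # All cameras are ready: broadcast the instruction to start shooting synchronously.
    for ip in camera_ips:
        send_to_camera(ip, {"type": "START_CAPTURE"})
```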
In practical operation, a three-dimensional point cloud map of the place where the target object is located may be constructed in advance by a three-dimensional scanning and positioning device (e.g., a laser radar or other mobile scanning system), so that a spatial coordinate system may be obtained. The three-dimensional space of the field can be obtained through reconstruction of the space coordinate system, and in addition, the space coordinate system can also be used for calibrating each camera with the TOF function. After the calibration of each camera is completed, the acquired image feature information of RGB, depth and the like of the target face can be represented in the same spatial coordinate system, and the position information of the detected key point of the face can also be represented in the same spatial coordinate system.
The image data of the target face collected by each camera is collected to a network switch and then transmitted to a server, and the server can realize the synthesis of the three-dimensional face model and the fusion processing of the three-dimensional face model and the face texture data based on the image data. In addition, the synthesized face three-dimensional model result can be displayed through a touch display screen connected with the server, and the server can be connected with a sound box and used for outputting corresponding prompts in a voice mode.
102. Generating a basic three-dimensional model of the target face according to the image characteristic information;
after the server acquires the target image data, the server can extract image characteristic information in the target image data, and then the image characteristic information is spliced and fused into a real-time facial basic three-dimensional model by adopting a real-time splicing algorithm.
In an implementation manner of the embodiment of the present application, the generating a basic three-dimensional model of the target face according to the image feature information may include:
(1) extracting a color feature image frame and a depth feature image frame from the target image data;
(2) performing image feature extraction processing on the color feature image frame and the depth feature image frame, and synthesizing a basic three-dimensional model of the target face based on the extracted image features.
The server extracts image features from the color feature image frames and the depth feature image frames respectively, and synthesizes a basic three-dimensional model of the target face based on the extracted image features (the color features extracted from the color feature image frames and the depth features extracted from the depth feature image frames). Compared with synthesizing a three-dimensional face model from color features alone, adding the depth features of the image can effectively improve the accuracy of three-dimensional model synthesis and face key point detection. In addition, the target image data may include video data of the target face in different orientations collected by a plurality of cameras at different orientations, and the server may use an image stitching algorithm to stitch and fuse the image features extracted from the image data of different orientations, thereby obtaining a basic three-dimensional model of the face with fine detail in every orientation, as shown in fig. 4.
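As a rough, hedged illustration of combining color and depth frames from several calibrated cameras, the following sketch back-projects each depth pixel into a colored 3D point in the shared spatial coordinate system; the actual stitching and fusion algorithm is not prescribed by the present application, so the final merge here is a simple concatenation.

```python
# Illustrative sketch (assumptions only): back-project RGB-D frames from several
# calibrated cameras into one colored point cloud as a crude "basic 3D model".
import numpy as np

def backproject(rgb, depth, K, cam_to_world):
    """rgb: (H, W, 3), depth: (H, W) in meters, K: 3x3 pinhole intrinsics,
    cam_to_world: 4x4 extrinsic pose of the camera in the shared coordinate system."""
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    z = depth.reshape(-1)
    valid = z > 0
    pix = np.stack([u.reshape(-1), v.reshape(-1), np.ones(h * w)], axis=0)
    rays = np.linalg.inv(K) @ pix                   # camera-space rays per pixel
    pts_cam = rays * z                              # scale by depth -> camera-space points
    pts_cam_h = np.vstack([pts_cam, np.ones(h * w)])
    pts_world = (cam_to_world @ pts_cam_h)[:3].T    # into the shared spatial coordinate system
    colors = rgb.reshape(-1, 3)
    return pts_world[valid], colors[valid]

def fuse_views(frames):
    """frames: list of (rgb, depth, K, cam_to_world). A real system would register and
    mesh these points; here the per-camera point clouds are simply concatenated."""
    pts, cols = zip(*(backproject(*f) for f in frames))
    return np.concatenate(pts), np.concatenate(cols)
```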
In an implementation manner of the embodiment of the present application, each of the image data of the target face in different orientations includes a unique identifier and position information of each face key point. As described above, the location information of each facial keypoint is one or more spatial coordinates that can characterize the location and/or morphology of that facial keypoint (e.g., the angle of the eyebrows, the shape of the eyes, the location of the nose wing, the inclination of the corners of the mouth, the shape of the mouth, etc.). Due to the problem of the shooting angle, each image data of different orientations may not include all the position information of a certain face key point, but all the position information of the face key point, that is, the target space coordinates thereof, may be found from a plurality of image data shot at different orientations.
By arranging a plurality of cameras at different orientations, each camera can capture image data of one orientation, and each piece of image data contains a unique identifier and spatial coordinates of each face key point. The unique identifier is used to distinguish different face key points; for example, the name of the face key point may be used as the unique identifier ("vertex", "left eye", "nose", ...), or a number may be used as the unique identifier (1 indicates the vertex, 2 the left eye, 3 the right eye, ...). It should be noted that, across different image data, the unique identifier for the same face key point should be the same, so that the actual position of that face key point can be calculated accurately later. Assuming that there are N cameras at different orientations, each face key point may have N groups of spatial coordinates (corresponding to the image data of the N different orientations). Because the spatial positions of the different cameras and the accuracy of the face key point detection algorithm differ, the corresponding coordinate values in the N groups are generally different and may be the position coordinates of closely spaced discrete points in space. A center point of these discrete points may be obtained by an algorithm (e.g., averaging), and this center point is finally used to represent the actual position of the detected face key point; the coordinates of the center point constitute the target spatial coordinate, or one of the one or more spatial coordinates included in the target spatial coordinates.
In an embodiment, the calculating of the target spatial coordinates of each face key point according to the spatial coordinates of that face key point included in each piece of image data may include:
for any target face key point among the face key points, calculating the average value of the spatial coordinates of that key point across the pieces of image data, and determining the average value as the target spatial coordinate of that key point. Here, the explanation takes as an example the case where the position information of the face key point includes one target spatial coordinate value. If the position information of the face key point includes a plurality of target spatial coordinate values (for example, for an eye, the position information may include the inner-corner coordinates, the outer-corner coordinates and the coordinates of the middle of the upper and lower eyelids), the above averaging is applied to each target spatial coordinate separately.
The coordinates of the center point may be determined by averaging the spatial coordinates. For example, for the face key point "vertex", N pieces of image data can be obtained from N cameras at different orientations, giving N different spatial coordinates. The average value of these N spatial coordinates is calculated to obtain the center point coordinate of the vertex, and this center point coordinate is determined as the target spatial coordinate of the vertex. By analogy, the target spatial coordinates of all face key points can be obtained.
In addition, it should be noted that, because of occlusion at some shooting angles, some cameras may not be able to detect all face key points; for example, a camera may only be able to detect 5 of a total of 10 face key points. Thus, although N pieces of image data are obtained from N cameras at different orientations, only N-2 spatial coordinates may exist for a certain face key point (two cameras failed to detect it); in this case, the center point of the N-2 spatial coordinates can still be calculated as the target spatial coordinate of that face key point. A sketch of this averaging is shown below.
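A minimal sketch of the averaging described above follows; the data layout is an assumption, and cameras that fail to detect a key point simply contribute nothing to that key point's average.

```python
# Hedged sketch: compute the target spatial coordinate of each face key point as the
# mean of the coordinates reported by the cameras that detected it.
import numpy as np

def target_keypoint_coordinates(detections):
    """detections: list of dicts, one per camera, mapping a key point's unique
    identifier to an (x, y, z) coordinate in the shared spatial coordinate system.
    A camera that could not detect a key point simply omits it from its dict."""
    per_keypoint = {}
    for cam_detections in detections:
        for name, coord in cam_detections.items():
            per_keypoint.setdefault(name, []).append(np.asarray(coord, dtype=float))
    # Average the available coordinates; missing detections are ignored, as described above.
    return {name: np.mean(coords, axis=0) for name, coords in per_keypoint.items()}

# Example: the "vertex" key point is seen by three cameras, the "chin" by only two.
cams = [
    {"vertex": (0.01, 1.70, 0.02), "chin": (0.00, 1.52, 0.05)},
    {"vertex": (0.02, 1.71, 0.01), "chin": (0.01, 1.53, 0.04)},
    {"vertex": (0.00, 1.69, 0.03)},
]
print(target_keypoint_coordinates(cams))
```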
103. And generating the current feature vector of the target face according to the face key point information.
After obtaining the face key point information from the target image data, the server may obtain the current feature vector of the target face by vectorizing the face key point information of each face key point. Various known information-vectorization methods may be used. The generated current feature vector of the target face characterizes the current expressive features and/or approximate appearance features of the target face through the morphology and position data of the facial features.
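As one possible (assumed) vectorization, the target spatial coordinates of the face key points can simply be concatenated in a fixed key point order:

```python
# Illustrative sketch: build the current feature vector by concatenating the target
# spatial coordinates of the face key points in a fixed, agreed-upon order.
import numpy as np

KEYPOINT_ORDER = ["vertex", "left_eye", "right_eye", "nose", "mouth"]  # assumed order

def current_feature_vector(target_coords):
    """target_coords: dict mapping key point identifier -> (x, y, z) target coordinate."""
    return np.concatenate([np.asarray(target_coords[name], dtype=float)
                           for name in KEYPOINT_ORDER])
```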
104. Obtaining at least one standard three-dimensional model, wherein each standard three-dimensional model comprises corresponding standard feature vectors, each standard feature vector is used for representing a basic expression, and each standard three-dimensional model comprises corresponding facial texture data.
In one embodiment, the at least one standard three-dimensional model may be one or more pre-stored representative three-dimensional models. For example, a plurality of standard three-dimensional face models are stored in advance, each standard three-dimensional model having one basic expression (e.g., smile, anger, surprise, laughter, sadness, frown, blankness, etc.). There may also be a plurality of standard three-dimensional models for one basic expression, each having different appearance characteristics, such as a different facial shape and/or different facial features. In one example, facial image data (including face key point information and face texture data) of multiple users may be collected in advance, with each user making the various basic expressions, which are collected separately. Each standard three-dimensional model is then generated from the collected facial image data, a standard feature vector is generated by vectorizing the face key point information, and the standard feature vector and the corresponding face texture data are stored in association with the standard three-dimensional model. In one example, when choosing acquisition subjects, users with distinct appearance characteristics may be chosen.
In another embodiment, the at least one standard three-dimensional model may be generated from the base three-dimensional model and the current feature vector. For example, the at least one standard three-dimensional model is obtained as an output of the first type of function model by taking the base three-dimensional model and the current feature vector as inputs of the first type of function model. The first type of function model may include a plurality of function models, each function model corresponds to a basic expression, and a standard three-dimensional model of a face with a corresponding basic expression, a standard feature vector of the standard three-dimensional model, and facial texture data may be obtained by inputting a basic three-dimensional model of the face and a current feature vector into each function model.
The first type of function model can be built by machine learning. In one example, the training samples of the first type of function model are a plurality of data samples, each data sample comprising a three-dimensional face model and the corresponding current feature vector, and each data sample is labeled with a standard three-dimensional face model having a basic expression together with its corresponding standard feature vector and corresponding face texture data. Facial image data (including face key point information and face texture data) of multiple users may be collected in advance. Each user makes the various basic expressions, which are collected separately to obtain facial image data 1, and also makes arbitrary non-basic expressions, which are collected separately to obtain facial image data 2; a training data sample is then obtained from facial image data 2, and the label (index) of that data sample from facial image data 1. The original model is trained with a plurality of such data samples to obtain the plurality of function models of the first type of function model.
For a plurality of function models of the trained first-class function model, respectively inputting a basic three-dimensional model and a current feature vector of the target face, and obtaining a plurality of standard three-dimensional models of the target face, wherein each standard three-dimensional model has a basic expression and comprises a corresponding standard feature vector and face texture data.
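For illustration only, the following sketch shows how the trained per-expression function models might be applied at inference time; the model interface and its predict method are assumptions, since no concrete model architecture is prescribed here.

```python
# Illustrative sketch only: apply one trained function model per basic expression to the
# target face's basic 3D model and current feature vector, collecting the predicted
# standard 3D model, standard feature vector and face texture data for each expression.
def generate_standard_models(expression_models, base_model, current_vector):
    """expression_models: dict mapping expression name -> trained model whose
    predict(base_model, current_vector) returns (standard_model, standard_vector, texture).
    The predict() interface is a hypothetical stand-in for the first type of function model."""
    results = {}
    for expression, model in expression_models.items():
        standard_model, standard_vector, texture = model.predict(base_model, current_vector)
        results[expression] = {
            "standard_model": standard_model,
            "standard_vector": standard_vector,
            "texture": texture,
        }
    return results
```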
105. And generating face texture data of the basic three-dimensional model based on the face texture data of the at least one standard three-dimensional model according to the correlation degree of the current feature vector and the standard feature vector of the at least one standard three-dimensional model.
In one embodiment, generating the face texture data of the base three-dimensional model based on the face texture data of the at least one standard three-dimensional model according to a degree of correlation of the current feature vector with a standard feature vector of the at least one standard three-dimensional model comprises:
determining, among the at least one standard three-dimensional model, the standard three-dimensional model whose standard feature vector has the difference vector with the smallest norm from the current feature vector; and
and determining the face texture data of the standard three-dimensional model as the face texture data of the basic three-dimensional model.
In this embodiment, the feature vector represents the expression and/or appearance features of the face. By comparing the current feature vector with the standard feature vectors and selecting the standard three-dimensional model corresponding to the standard feature vector whose difference vector from the current feature vector has the smallest norm, one effectively selects the standard three-dimensional model with the greatest similarity/correlation to the expression and/or appearance features of the target face in the target image data. The face texture data of that standard three-dimensional model is the most suitable for the target face in the target image data, so the fused three-dimensional face reconstruction model has a stronger sense of realism.
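A minimal sketch of this selection rule (data layout assumed):

```python
# Hedged sketch: pick the standard 3D model whose standard feature vector is closest
# (smallest norm of the difference vector) to the current feature vector, then reuse
# that model's face texture data for the basic 3D model.
import numpy as np

def select_texture(current_vector, standard_models):
    """standard_models: list of dicts with keys "standard_vector" and "texture"."""
    best = min(standard_models,
               key=lambda m: np.linalg.norm(np.asarray(m["standard_vector"]) - current_vector))
    return best["texture"]
```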
In another embodiment, generating face texture data for the base three-dimensional model based on the face texture data for the at least one standard three-dimensional model according to a degree of correlation of the current feature vector with a standard feature vector of the at least one standard three-dimensional model comprises:
decomposing the current feature vector into a weighted sum of standard feature vectors of the at least one standard three-dimensional model, wherein the standard feature vector of each standard three-dimensional model has a corresponding weighting coefficient;
the weighting coefficients are used as weighting coefficients for face texture data of the respective standard three-dimensional models, and a weighted sum of the face texture data of the at least one standard three-dimensional model is calculated as face texture data of the base three-dimensional model.
For example, assume that the current feature vector of the basic three-dimensional model is A = [a1, a2, a3, ..., an], where n > 1, and that there are M-1 standard three-dimensional models (M > 1) whose standard feature vectors are as follows:
B = [b1, b2, b3, ..., bn],
C = [c1, c2, c3, ..., cn],
...
M = [m1, m2, m3, ..., mn].
The current feature vector A can then be decomposed into a weighted sum of the M-1 standard feature vectors:
A = B0*B + C0*C + ... + M0*M
where B0, C0, ..., M0 are weighting coefficients. These coefficients also characterize the similarity/correlation between the current feature vector and each standard feature vector, and thus between the basic three-dimensional model and each standard three-dimensional model. Therefore, these weighting coefficients can be applied to the face texture data of the respective standard three-dimensional models, and the weighted sum of the face texture data of the standard three-dimensional models taken as the face texture data of the basic three-dimensional model. Assuming that the face texture data of the standard three-dimensional models are B1, C1, ..., and M1 respectively, the face texture data A1 of the basic three-dimensional model is A1 = B0*B1 + C0*C1 + ... + M0*M1.
In this embodiment, the weighting coefficients of the standard three-dimensional models are obtained by decomposing the current feature vector of the basic three-dimensional model of the target face into a weighted sum of the standard feature vectors; these weighting coefficients are then applied to the corresponding face texture data and the weighted sum is taken, yielding the face texture data of the basic three-dimensional model. Face texture data generated in this way conforms more closely to the basic three-dimensional model of the target face.
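A hedged sketch of this embodiment follows; the decomposition of the current feature vector into a weighted sum of standard feature vectors is solved here by least squares, which is one reasonable choice but an assumption, since the present application does not prescribe how the weighting coefficients are obtained.

```python
# Illustrative sketch: solve A ≈ B0*B + C0*C + ... + M0*M for the weights by least squares,
# then blend the standard models' face texture data with the same weights.
import numpy as np

def blended_texture(current_vector, standard_vectors, standard_textures):
    """current_vector: shape (n,). standard_vectors: list of shape-(n,) arrays.
    standard_textures: list of texture arrays, all with the same shape (e.g. H x W x 3)."""
    V = np.stack(standard_vectors, axis=1)                        # n x (M-1) matrix
    weights, *_ = np.linalg.lstsq(V, current_vector, rcond=None)  # B0, C0, ..., M0
    textures = np.stack([t.astype(float) for t in standard_textures], axis=0)
    # Weighted sum of the standard face textures -> face texture of the basic 3D model.
    return np.tensordot(weights, textures, axes=(0, 0))
```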
106. And fusing the face texture data of the basic three-dimensional model with the basic three-dimensional model to obtain a three-dimensional reconstruction model of the target face.
Although the basic three-dimensional model of the target face obtained by the processing in 102 has color and texture features, when the definition and resolution of the target image data are not ideal, those color and texture features may not be clear and realistic enough; in this case, the face texture data generated by the processing in 105 needs to be fused onto the basic three-dimensional model to obtain a stronger sense of realism. In addition, even if the target image data is sufficiently clear, when the user wishes to use regenerated face texture data, the face texture data generated by the processing in 105 can also be fused onto the basic three-dimensional model. The fusion can be implemented by texture mapping, i.e., covering the surface of the basic three-dimensional model with the face texture data. Of course, other texture coloring methods can also be used to achieve this fusion.
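For illustration only, the texture-mapping fusion can be sketched as per-vertex sampling of the generated texture image at UV coordinates; the mesh and UV layout are assumptions introduced here.

```python
# Hedged sketch: "cover" the basic 3D model with the generated face texture by sampling
# the texture image at each vertex's UV coordinate (nearest-neighbour sampling for brevity).
import numpy as np

def apply_texture(vertices, uvs, texture):
    """vertices: (V, 3) mesh vertices of the basic 3D model.
    uvs: (V, 2) texture coordinates in [0, 1], assumed to exist for the mesh.
    texture: (H, W, 3) face texture data generated in step 105.
    Returns per-vertex colors, yielding the textured 3D reconstruction model."""
    h, w, _ = texture.shape
    cols = np.clip((uvs[:, 0] * (w - 1)).round().astype(int), 0, w - 1)
    rows = np.clip(((1.0 - uvs[:, 1]) * (h - 1)).round().astype(int), 0, h - 1)
    colors = texture[rows, cols]
    return vertices, colors
```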
In order to more clearly illustrate the method for generating the three-dimensional model of the face according to the embodiment of the present application, a practical application scenario is listed below.
The method for generating a three-dimensional model of a face provided in the embodiment of the present application can be applied to an AR interaction system, each device included in the system can refer to fig. 2, a flow diagram of using the AR interaction system is shown in fig. 5, and each step shown in fig. 5 is described as follows:
s501, a user makes various head motions and various facial expressions, and image data of the user, such as video data, are collected in real time in multiple directions through a plurality of TOF cameras around the head of the user, wherein the video data comprise RGB (red, green and blue) features, depth features and facial key point position information of an image of the user;
s502, collecting video data collected by a plurality of TOF cameras to a network switch, and transmitting the video data to a server;
s503, the server decomposes the video frames in the video data from the different cameras to obtain several groups of key-expression data frames, where each group of key-expression data frames is a combination of the facial image features acquired from the different cameras and is represented as a facial three-dimensional model performing one of several key expressions;
s504, the server obtains the image feature information of the video data, identifies the face key point information in the video data, generates basic three-dimensional models corresponding one-to-one to the key expression frames, and fuses the face texture data generated in the manner described above onto the basic three-dimensional models, thereby obtaining a three-dimensional face reconstruction model; the three-dimensional face reconstruction model corresponds to each group of key expression frames, that is, when the user in the video makes head movements or facial expressions, the three-dimensional face reconstruction model changes correspondingly, thereby achieving interaction with the user.
In summary, in the embodiment of the present application, cameras are arranged in a plurality of orientations around the head of a target object, image feature information of different orientations of the target face of the target object and face key point information are collected in real time, and a corresponding face basic three-dimensional model is synthesized by image feature extraction and fusion, so that reconstruction of a face three-dimensional model is achieved. In addition, suitable face texture data are generated based on the face key point information and are fused on the basic three-dimensional model, so that the reality sense of the face three-dimensional model is stronger.
It should be understood that, the sequence numbers of the steps in the foregoing embodiments do not imply an execution sequence, and the execution sequence of each process should be determined by its function and inherent logic, and should not constitute any limitation to the implementation process of the embodiments of the present application.
The above mainly describes a method for generating a three-dimensional model of a face, and an apparatus for generating a three-dimensional model of a face will be described below.
Referring to fig. 6, an embodiment of an apparatus for generating a three-dimensional model of a face according to an embodiment of the present application includes:
a target image data obtaining module 601, configured to obtain target image data, where the target image data includes image feature information of a target face and face key point information;
a basic three-dimensional model generating module 602, configured to generate a basic three-dimensional model of the target face according to the image feature information;
a current feature vector generating module 603, configured to generate a current feature vector of the target face according to the face keypoint information;
a standard three-dimensional model obtaining module 604 for obtaining at least one standard three-dimensional model, wherein each standard three-dimensional model includes corresponding standard feature vectors, each standard feature vector is used for representing a basic expression of the target face, and each standard three-dimensional model includes corresponding facial texture data;
a face texture data generating module 605, configured to generate face texture data of the base three-dimensional model based on the face texture data of the at least one standard three-dimensional model according to a degree of correlation between the current feature vector and a standard feature vector of the at least one standard three-dimensional model; and
and a fusion module 606, configured to fuse the face texture data of the basic three-dimensional model with the basic three-dimensional model to obtain a three-dimensional reconstruction model of the target face.
In an implementation manner of the embodiment of the present application, the standard three-dimensional model obtaining module 604 includes a standard three-dimensional model generating module, configured to use the basic three-dimensional model and the current feature vector as inputs of a first-class function model, and obtain the at least one standard three-dimensional model as an output of the first-class function model. The training samples of the first type of function model are a plurality of data samples, each data sample comprises a three-dimensional face model and a corresponding current feature vector, the index of each data sample is at least one type of standard three-dimensional model, a corresponding standard feature vector and corresponding facial texture data, and each type of standard three-dimensional model represents one basic expression.
In an implementation manner of the embodiment of the present application, the face texture data generating module 605 includes:
a comparing unit, configured to determine, among the at least one standard three-dimensional model, the standard three-dimensional model whose standard feature vector has the difference vector with the smallest norm from the current feature vector; and
and the determining unit is used for determining the face texture data of the standard three-dimensional model as the face texture data of the basic three-dimensional model.
In another implementation manner of the embodiment of the present application, the facial texture data generating module 605 includes:
a decomposition unit for decomposing the current feature vector into a weighted sum of standard feature vectors of the at least one standard three-dimensional model, wherein the standard feature vector of each standard three-dimensional model has a corresponding weighting coefficient;
a generating unit for using the weighting coefficients as weighting coefficients of face texture data of the respective standard three-dimensional models and calculating a weighted sum of the face texture data of the at least one standard three-dimensional model as face texture data of the base three-dimensional model.
In another implementation manner of the embodiment of the present application, the target image data obtaining module 601 obtains the target image data by:
acquiring image data of different orientations of the target face synchronously acquired by at least two cameras, and determining the image data of different orientations of the target face as the target image data, wherein the at least two cameras have a face key point detection function and are arranged at different orientations around the target face.
In one example, the acquiring image data of different orientations of the target face synchronously acquired by at least two cameras includes:
and receiving image data of different directions of the target face through a network switch, wherein the at least two cameras collect the synchronously acquired image data to the network switch.
In one example, the image face key point information includes a unique identifier of each face key point and a target space coordinate of each face key point, wherein the target image data acquisition module 601 further includes:
and the target space coordinate determination unit is used for calculating the target space coordinate of each face key point according to the space coordinates of each face key point in the image data of the target face in different directions.
In one example, the target spatial coordinate determination unit implements the calculation of the target spatial coordinates by:
and calculating the average value of the spatial coordinates of any one face key point in the image data of the target face in different directions, and determining the average value as the target spatial coordinate of the face key point.
In another implementation manner of the embodiment of the present application, the basic three-dimensional model generation module 602 includes:
an extraction unit configured to extract a color feature image frame and a depth feature image frame from the target image data;
a synthesizing unit configured to perform image feature extraction processing on the color feature image frame and the depth feature image frame, and synthesize a three-dimensional model of the target face based on the extracted image features.
Embodiments of the present application further provide a computer-readable storage medium, which stores a computer program, and when the computer program is executed by a processor, the computer program implements any one of the methods for generating a three-dimensional model of a face as shown in fig. 1.
Embodiments of the present application further provide a computer program product, which when run on a terminal device, causes the terminal device to execute a method for implementing any one of the methods for generating a three-dimensional model of a face as shown in fig. 1.
Fig. 7 is a schematic diagram of a computing device according to an embodiment of the present application, which may be, for example, a server as described above, or other computing devices, such as a general computer, a tablet computer, a terminal device, and the like. As shown in fig. 7, the computing device 7 of this embodiment includes: a processor 70, a memory 71 and a computer program 72 stored in said memory 71 and executable on said processor 70. The processor 70, when executing the computer program 72, implements the steps in the various embodiments of the method of generating a three-dimensional model of a face described above, such as the steps 101 to 106 shown in fig. 1. Alternatively, the processor 70, when executing the computer program 72, implements the functions of each module/unit in each device embodiment described above, for example, the functions of the modules 601 to 606 shown in fig. 6.
The computer program 72 may be divided into one or more modules/units, which are stored in the memory 71 and executed by the processor 70 to accomplish the present application. The one or more modules/units may be a series of computer program instruction segments capable of performing specific functions, which are used to describe the execution of the computer program 72 in the computing device 7.
The Processor 70 may be a Central Processing Unit (CPU), other general purpose Processor, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field-Programmable Gate Array (FPGA) or other Programmable logic device, discrete Gate or transistor logic, discrete hardware components, etc. A general purpose processor may be a microprocessor or the processor may be any conventional processor or the like.
The memory 71 may be an internal storage unit of the computing device 7, such as a hard disk or a memory of the computing device 7. The memory 71 may also be an external storage device of the computing device 7, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) Card, a Flash memory Card (Flash Card), and the like, provided on the computing device 7. Further, the memory 71 may also include both an internal storage unit and an external storage device of the computing apparatus 7. The memory 71 is used for storing the computer program and other programs and data required by the server. The memory 71 may also be used to temporarily store data that has been output or is to be output.
It will be apparent to those skilled in the art that, for convenience and brevity of description, only the above-mentioned division of the functional units and modules is illustrated, and in practical applications, the above-mentioned function distribution may be performed by different functional units and modules according to needs, that is, the internal structure of the apparatus is divided into different functional units or modules to perform all or part of the above-mentioned functions. Each functional unit and module in the embodiments may be integrated in one processing unit, or each unit may exist alone physically, or two or more units are integrated in one unit, and the integrated unit may be implemented in a form of hardware, or in a form of software functional unit. In addition, specific names of the functional units and modules are only for convenience of distinguishing from each other, and are not used for limiting the protection scope of the present application. The specific working processes of the units and modules in the system may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again.
It is clear to those skilled in the art that, for convenience and brevity of description, the specific working processes of the above-described systems, apparatuses and units may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again.
In the above embodiments, the descriptions of the respective embodiments have respective emphasis, and reference may be made to the related descriptions of other embodiments for parts that are not described or illustrated in a certain embodiment.
Those of ordinary skill in the art will appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware or combinations of computer software and electronic hardware. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the implementation. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present application.
In the embodiments provided in the present application, it should be understood that the disclosed apparatus and method may be implemented in other ways. For example, the above-described system embodiments are merely illustrative, and for example, the division of the modules or units is only one logical division, and there may be other divisions when actually implemented, for example, a plurality of units or components may be combined or may be integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection through some interfaces, devices or units, and may be in an electrical, mechanical or other form.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, functional units in the embodiments of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, and can also be realized in a form of a software functional unit.
The integrated unit, if implemented in the form of a software functional unit and sold or used as a stand-alone product, may be stored in a computer-readable storage medium. Based on such understanding, all or part of the flow in the methods of the embodiments described above may be implemented by a computer program; the computer program may be stored in a computer-readable storage medium and, when executed by a processor, implements the steps of the method embodiments described above. The computer program comprises computer program code, which may be in the form of source code, object code, an executable file, some intermediate form, or the like. The computer-readable medium may include: any entity or apparatus capable of carrying the computer program code, a recording medium, a USB flash drive, a removable hard disk, a magnetic disk, an optical disk, a computer memory, a read-only memory (ROM), a random access memory (RAM), an electrical carrier signal, a telecommunications signal, a software distribution medium, and the like. It should be noted that the content contained in the computer-readable medium may be appropriately increased or decreased as required by legislation and patent practice in the relevant jurisdiction; for example, in some jurisdictions, computer-readable media do not include electrical carrier signals and telecommunications signals in accordance with legislation and patent practice.
The above-mentioned embodiments are only used for illustrating the technical solutions of the present application, and not for limiting the same; although the present application has been described in detail with reference to the foregoing embodiments, it should be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; such modifications and substitutions do not substantially depart from the spirit and scope of the embodiments of the present application and are intended to be included within the scope of the present application.

Claims (10)

1. A method of generating a three-dimensional model of a face, comprising:
acquiring target image data, wherein the target image data comprises image characteristic information of a target face and face key point information;
generating a basic three-dimensional model of the target face according to the image characteristic information;
generating a current feature vector of the target face according to the face key point information;
obtaining at least one standard three-dimensional model, wherein each standard three-dimensional model comprises a corresponding standard feature vector, each standard feature vector is used for representing a basic expression, and each standard three-dimensional model comprises corresponding facial texture data;
generating face texture data of the base three-dimensional model based on the face texture data of the at least one standard three-dimensional model according to the degree of correlation of the current feature vector and a standard feature vector of the at least one standard three-dimensional model; and
fusing the face texture data of the basic three-dimensional model with the basic three-dimensional model to obtain a three-dimensional reconstruction model of the target face.
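For illustration only, the following Python sketch outlines the method of claim 1 under simplifying assumptions: the basic three-dimensional model is reduced to a bare vertex array, the current feature vector is taken to be the flattened key point coordinates, and the degree of correlation is approximated by the Euclidean distance between feature vectors. All names and data shapes below are assumptions of this sketch, not taken from the application.

```python
import numpy as np

def reconstruct_target_face(keypoint_coords, standard_models, base_vertices):
    """Sketch of the claimed flow: build a current feature vector, pick the
    texture of the most correlated standard model, and attach it to the base mesh."""
    # Current feature vector derived from the face key point information
    # (here simply the flattened key point coordinates).
    current_vec = np.asarray(keypoint_coords, dtype=np.float32).ravel()

    # Correlation approximated by distance: the standard model whose feature
    # vector is closest to the current one is treated as the most correlated.
    best = min(standard_models,
               key=lambda m: np.linalg.norm(m["feature_vector"] - current_vec))

    # "Fusion" reduced to pairing the selected texture with the base mesh.
    return {"vertices": base_vertices, "texture": best["texture"]}

# Toy usage: two standard models, each representing one basic expression.
standard_models = [
    {"feature_vector": np.zeros(6, dtype=np.float32), "texture": "neutral.png"},
    {"feature_vector": np.ones(6, dtype=np.float32),  "texture": "smile.png"},
]
keypoints = [[0.9, 1.0, 1.1], [1.0, 0.9, 1.0]]        # two key points, (x, y, z)
base_vertices = np.zeros((468, 3), dtype=np.float32)  # placeholder base mesh
print(reconstruct_target_face(keypoints, standard_models, base_vertices)["texture"])
# -> smile.png
```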
2. The method of claim 1, wherein the generating face texture data of the basic three-dimensional model based on the face texture data of the at least one standard three-dimensional model according to the degree of correlation between the current feature vector and the standard feature vector of the at least one standard three-dimensional model comprises:
determining, among the at least one standard three-dimensional model, a standard three-dimensional model for which the difference vector between its standard feature vector and the current feature vector has the minimum norm; and
determining the face texture data of the determined standard three-dimensional model as the face texture data of the basic three-dimensional model.
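The selection in claim 2 can be read as a nearest-neighbour lookup in feature-vector space. Below is a minimal NumPy sketch, assuming the standard feature vectors are stacked row-wise and the minimum of the difference vector is interpreted as the smallest Euclidean norm; the returned index would identify the standard three-dimensional model whose face texture data is reused.

```python
import numpy as np

def nearest_standard_index(standard_vectors, current_vector):
    """Index of the standard feature vector whose difference from the
    current feature vector has the smallest Euclidean norm."""
    diffs = np.asarray(standard_vectors) - np.asarray(current_vector)   # (N, D)
    return int(np.argmin(np.linalg.norm(diffs, axis=1)))

standard_vectors = [[0.0, 0.0, 0.0],   # e.g. neutral expression
                    [1.0, 0.2, 0.1],   # e.g. smile
                    [0.1, 1.0, 0.3]]   # e.g. frown
current_vector = [0.9, 0.3, 0.0]
print(nearest_standard_index(standard_vectors, current_vector))   # -> 1
```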
3. The method of claim 1, wherein said acquiring target image data comprises:
acquiring image data of different orientations of the target face synchronously acquired by at least two cameras, and determining the image data of different orientations of the target face as the target image data, wherein the at least two cameras have a face key point detection function and are arranged at different orientations around the target face.
4. The method of claim 3, wherein said acquiring image data of different orientations of the target face synchronously acquired by at least two cameras comprises:
receiving the image data of different orientations of the target face through a network switch, wherein the at least two cameras send the synchronously acquired image data to the network switch.
5. The method of claim 3, wherein the face key point information comprises a unique identification of each face key point and target spatial coordinates of each face key point, and wherein the method further comprises:
calculating the target spatial coordinates of each face key point according to the spatial coordinates of the face key point in the image data of different orientations of the target face.
6. The method of claim 5, wherein said calculating the target spatial coordinates of each face key point according to the spatial coordinates of the face key point in the image data of different orientations of the target face comprises:
calculating, for any one of the face key points, the average value of the spatial coordinates of that face key point in the image data of different orientations of the target face, and determining the average value as the target spatial coordinates of that face key point.
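A small sketch of the averaging in claims 5 and 6, assuming each camera orientation reports its key points as a mapping from the unique identification to (x, y, z) spatial coordinates; the target spatial coordinates are then the per-identifier mean across the views. The data layout is an assumption made only for this example.

```python
import numpy as np

def target_coordinates(views):
    """views: list of {keypoint_id: (x, y, z)} dicts, one per camera orientation.
    Returns the target spatial coordinates as the per-identifier average."""
    ids = set().union(*(view.keys() for view in views))
    return {kp_id: np.mean([view[kp_id] for view in views if kp_id in view], axis=0)
            for kp_id in ids}

# Two orientations observing the same nose-tip key point (identifier 1).
views = [{1: (0.10, 0.02, 0.50)},
         {1: (0.12, 0.00, 0.52)}]
print(target_coordinates(views))   # {1: array([0.11, 0.01, 0.51])}
```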
7. The method of claim 1, wherein said generating a basic three-dimensional model of the target face according to the image feature information comprises:
extracting a color feature image frame and a depth feature image frame from the target image data;
performing image feature extraction processing on the color feature image frame and the depth feature image frame, and synthesizing the basic three-dimensional model of the target face based on the extracted image features.
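Claim 7 presupposes that the target image data carries both color and depth information. The sketch below shows only the first step, splitting an RGB-D frame into its color and depth components; the channel layout is an assumption, and the subsequent feature extraction and model synthesis are left as application-specific placeholders.

```python
import numpy as np

def split_rgbd(rgbd_frame):
    """Split an H x W x 4 array into a color frame (channels 0-2) and a
    depth frame (channel 3); this layout is assumed for illustration only."""
    color_frame = rgbd_frame[..., :3]
    depth_frame = rgbd_frame[..., 3]
    return color_frame, depth_frame

frame = np.zeros((480, 640, 4), dtype=np.float32)    # placeholder target image data
color_frame, depth_frame = split_rgbd(frame)
print(color_frame.shape, depth_frame.shape)           # (480, 640, 3) (480, 640)
```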
8. An apparatus for generating a three-dimensional model of a face, comprising:
the target image acquisition module is used for acquiring target image data, and the target image data comprises image characteristic information of a target face and face key point information;
a basic three-dimensional model generation module for generating a basic three-dimensional model of the target face according to the image characteristic information;
the current feature vector generating module is used for generating a current feature vector of the target face according to the face key point information;
a standard three-dimensional model obtaining module for obtaining at least one standard three-dimensional model, wherein each standard three-dimensional model comprises a corresponding standard feature vector, each standard feature vector is used for representing one basic expression of the target face, and each standard three-dimensional model comprises corresponding facial texture data;
the face texture data generation module is used for generating face texture data of the basic three-dimensional model based on the face texture data of the at least one standard three-dimensional model according to the correlation degree of the current feature vector and the standard feature vector of the at least one standard three-dimensional model; and
the fusion module is used for fusing the face texture data of the basic three-dimensional model with the basic three-dimensional model to obtain a three-dimensional reconstruction model of the target face.
9. A computing device comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor implements the method of generating a three-dimensional model of a face of any one of claims 1 to 7 when executing the computer program.
10. A computer-readable storage medium, in which a computer program is stored which, when being executed by a processor, carries out the method of generating a three-dimensional model of a face according to any one of claims 1 to 7.
CN202111547342.1A 2021-12-16 2021-12-16 Method, device, computing equipment and storage medium for generating three-dimensional face model Pending CN114373044A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111547342.1A CN114373044A (en) 2021-12-16 2021-12-16 Method, device, computing equipment and storage medium for generating three-dimensional face model

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111547342.1A CN114373044A (en) 2021-12-16 2021-12-16 Method, device, computing equipment and storage medium for generating three-dimensional face model

Publications (1)

Publication Number Publication Date
CN114373044A true CN114373044A (en) 2022-04-19

Family

ID=81140422

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111547342.1A Pending CN114373044A (en) 2021-12-16 2021-12-16 Method, device, computing equipment and storage medium for generating three-dimensional face model

Country Status (1)

Country Link
CN (1) CN114373044A (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115546366A (en) * 2022-11-23 2022-12-30 北京蔚领时代科技有限公司 Method and system for driving digital person based on different people
CN115546366B (en) * 2022-11-23 2023-02-28 北京蔚领时代科技有限公司 Method and system for driving digital person based on different people
CN116563432A (en) * 2023-05-15 2023-08-08 摩尔线程智能科技(北京)有限责任公司 Three-dimensional digital person generating method and device, electronic equipment and storage medium
CN116563432B (en) * 2023-05-15 2024-02-06 摩尔线程智能科技(北京)有限责任公司 Three-dimensional digital person generating method and device, electronic equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination