CN113628327A - Head three-dimensional reconstruction method and equipment - Google Patents

Head three-dimensional reconstruction method and equipment

Info

Publication number
CN113628327A
CN113628327A (application CN202110921998.9A)
Authority
CN
China
Prior art keywords
head
color
value
pixel point
face image
Prior art date
Legal status
Granted
Application number
CN202110921998.9A
Other languages
Chinese (zh)
Other versions
CN113628327B (en)
Inventor
刘帅
任子健
吴连朋
Current Assignee
Juhaokan Technology Co Ltd
Original Assignee
Juhaokan Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Juhaokan Technology Co Ltd filed Critical Juhaokan Technology Co Ltd
Priority to CN202110921998.9A priority Critical patent/CN113628327B/en
Publication of CN113628327A publication Critical patent/CN113628327A/en
Application granted granted Critical
Publication of CN113628327B publication Critical patent/CN113628327B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 17/00 - Three dimensional [3D] modelling, e.g. data description of 3D objects
    • G06T 15/00 - 3D [Three Dimensional] image rendering
    • G06T 15/04 - Texture mapping
    • Y - GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 - TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02T - CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T 10/00 - Road transport of goods or passengers
    • Y02T 10/10 - Internal combustion engine [ICE] based vehicles
    • Y02T 10/40 - Engine management systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Computer Graphics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Geometry (AREA)
  • Software Systems (AREA)
  • Image Analysis (AREA)
  • Image Processing (AREA)

Abstract

The application relates to the technical field of three-dimensional reconstruction, and provides a head three-dimensional reconstruction method and device. Specifically, a face image is obtained by performing face recognition on an original image; driving parameters are extracted from the face image, and a pre-constructed parameterized head model is driven to move by the extracted driving parameters to obtain a driven head three-dimensional geometric model; semantic segmentation is performed on the face image to obtain independent sub-regions; exploiting the symmetry of the human face and the consistency of skin color, the texture of each sub-region is completed according to the difference between the color values of its pixel points and the color values of their mirror pixel points; the completed sub-regions are fused to obtain the complete texture data of the face; and the head three-dimensional geometric model is rendered according to the complete texture data, thereby improving the realism of the reconstructed head dense surface model.

Description

Head three-dimensional reconstruction method and equipment
Technical Field
The present application relates to the field of three-dimensional reconstruction technologies, and in particular, to a method and an apparatus for three-dimensional reconstruction of a head.
Background
Head reconstruction is widely used in game character modeling, virtual reality applications, virtual fitting, personalized figurine customization and the like. Existing head reconstruction approaches include: 1) head modeling with professional modeling software (such as Maya or 3ds Max), which requires the user to know the software in depth and master related artistic skills, making personalized customization difficult; 2) reconstruction from head data collected by professional three-dimensional laser scanning equipment, which is hard to popularize because of the high cost of the scanning equipment; 3) reconstruction by driving the motion of a head model in a database (head database), which is easier to popularize and apply than the former two approaches.
By using a parameterized head model to generate a head database, a three-dimensional model of the human head can be reconstructed from a single picture. A parameterized head model is obtained by performing dimensionality-reduction analysis (such as principal component analysis or a network auto-encoder) on pre-acquired high-precision three-dimensional human head data to obtain a set of basis functions, and then blending this set of basis functions linearly or nonlinearly to generate different head models. The blending parameters of the basis functions serve as the parameterized representation of the human head.
Generally, when a parameterized head model is used to reconstruct a three-dimensional head model from a single picture, the head geometry maintains a complete topology, but the head texture data is limited by the shooting angle of the camera: the texture data of the face cannot be acquired completely, so the reconstructed three-dimensional head model (especially the side-face regions) is distorted, which degrades the three-dimensional presentation of the reconstructed model.
At present, there are two main technical solutions to the distortion of a three-dimensional head model. In the first, a single image is used to render face images, with occlusion artifacts and flaws, obtained by rotating a 3D head model from arbitrary angles to the current angle; these rendered images are paired with the original image to construct training data for a self-supervised deep learning model, and side-face generation and face completion are then performed with the trained model. The reconstruction process involves two face texture acquisitions, two three-dimensional rotations and two renderings; it preserves the original texture details and illumination, but the computation is complex, cannot yet run in real time, and is therefore poorly suited to real-time communication scenarios. In the second, multiple texture pictures acquired by multiple cameras are fused region by region in a certain manner to obtain complete face texture data; the fusion effect is limited by the arrangement of the acquisition equipment, the choice of acquisition scene and so on, and because texture fusion is performed only on images acquired by cameras arranged at multiple angles, texture deviation occurs when the face moves rapidly and the face cannot be completed.
Disclosure of Invention
The embodiment of the application provides a method and equipment for three-dimensional reconstruction of a head, which are used for completing texture data of a human face and improving the authenticity of a head reconstruction model.
In a first aspect, an embodiment of the present application provides a method for three-dimensional reconstruction of a head, including:
acquiring an original image acquired by a camera, and performing face recognition on the original image to obtain a face image;
extracting driving parameters from the face image, and driving a parameterized head model to move by using the extracted driving parameters to obtain a driven head three-dimensional geometric model, wherein the parameterized head model is constructed in advance based on the head parameters extracted from the initial face image;
performing semantic segmentation on the face image to obtain independent sub-regions;
respectively performing texture completion on each subarea according to the difference value between the color value of the pixel point of each subarea and the color value of the mirror image pixel point;
fusing each supplemented sub-area to obtain complete texture data of the face;
and rendering the head three-dimensional geometric model according to the complete texture data to obtain a reconstructed head dense surface model.
In a second aspect, an embodiment of the present application provides a reconstruction device, including a memory, a processor;
the memory configured to store computer program instructions and a preset parameterized head model;
the processor configured to perform the following operations in accordance with the computer program instructions:
acquiring an original image acquired by a camera, and performing face recognition on the original image to obtain a face image;
extracting driving parameters from the face image, and driving a parameterized head model to move by using the extracted driving parameters to obtain a driven head three-dimensional geometric model, wherein the parameterized head model is constructed in advance based on the head parameters extracted from the initial face image;
performing semantic segmentation on the face image to obtain independent sub-regions;
respectively performing texture completion on each subarea according to the difference value between the color value of the pixel point of each subarea and the color value of the mirror image pixel point;
fusing each supplemented sub-area to obtain complete texture data of the face;
and rendering the head three-dimensional geometric model according to the complete texture data to obtain a reconstructed head dense surface model.
In a third aspect, the present application provides a computer-readable storage medium storing computer-executable instructions for causing a computer to execute a head three-dimensional reconstruction method provided in an embodiment of the present application.
In the above embodiments of the application, a face image is obtained by recognizing the original image, and the parameterized head model pre-constructed from head parameters is driven to move by the driving parameters extracted from the face image, so as to obtain the head three-dimensional geometric model; the face image is semantically segmented into independent sub-regions. Considering that the deviation between the face angle and the camera angle causes texture data of the face (especially the side-face regions) to be lost to a certain extent, the symmetry and skin-color consistency of the human face are exploited: texture completion is performed on each sub-region according to the difference between the color values of its pixel points and the color values of their mirror pixel points, so as to obtain the complete texture data of the face. The head three-dimensional geometric model is rendered based on the complete texture data, which improves the realism of the reconstructed head dense surface model and further enhances the sense of immersion in remote three-dimensional interaction.
Drawings
To illustrate the embodiments of the present application or the technical solutions in the prior art more clearly, the drawings needed in the description of the embodiments or the prior art are briefly introduced below. The drawings in the following description show only some embodiments of the present application; those skilled in the art can obtain other drawings from them without inventive effort.
FIG. 1 illustrates a reconstruction system architecture diagram provided by embodiments of the present application;
fig. 2 is a flowchart illustrating a method for three-dimensional reconstruction of a head according to an embodiment of the present application;
FIG. 3 is a diagram illustrating the relationship between three head parameters and a head model provided by an embodiment of the present application;
FIG. 4a is a schematic diagram illustrating semantic segmentation of a human face provided by an embodiment of the present application;
FIG. 4b is a schematic diagram illustrating another human face semantic segmentation provided by an embodiment of the present application;
FIG. 5 is a schematic diagram illustrating a determination of a hole region according to an embodiment of the present application;
FIG. 6 is a schematic diagram illustrating another hole area determination provided by an embodiment of the present application;
FIG. 7 is a complete flow chart illustrating completion of texture data provided by an embodiment of the present application;
fig. 8 is a diagram illustrating a hardware structure of a reconstruction device according to an embodiment of the present application.
Detailed Description
To make the objects, embodiments and advantages of the present application clearer, exemplary embodiments of the present application are described below clearly and completely with reference to the accompanying drawings. It should be understood that the described exemplary embodiments are only a part of the embodiments of the present application, not all of them.
All other embodiments that a person skilled in the art can derive from the exemplary embodiments described herein without inventive step fall within the scope of the appended claims. In addition, although the disclosure is presented in terms of one or more exemplary examples, it should be appreciated that individual aspects of the disclosure may each constitute a complete embodiment on their own.
It should be noted that the brief descriptions of the terms in the present application are only for the convenience of understanding the embodiments described below, and are not intended to limit the embodiments of the present application. These terms should be understood in their ordinary and customary meaning unless otherwise indicated.
Furthermore, the terms "comprises" and "comprising," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a product or device that comprises a list of elements is not necessarily limited to those elements explicitly listed, but may include other elements not expressly listed or inherent to such product or device.
The term "module," as used herein, refers to any known or later developed hardware, software, firmware, artificial intelligence, fuzzy logic, or combination of hardware and/or software code that is capable of performing the functionality associated with that element.
The embodiments of the application provide a head three-dimensional reconstruction method and device. The face image is semantically segmented to distinguish sub-regions such as eyebrows, eyes, nose, lips, ears, cheeks, hair and accessories; the texture data missing in each sub-region is completed by exploiting the symmetry of the face and the consistency of skin color; the completed texture data of the sub-regions are fused into the complete texture data of the face; and, on the basis of the parameterized head model, the driven head three-dimensional geometric model is texture-rendered, so that the realism of the reconstructed model is improved without changing the topology of the head. Compared with deep-learning approaches to head distortion, the method needs no pre-constructed training samples or model training, has a simple computation process and strong real-time performance; compared with fusing multiple images shot by multiple cameras, it lowers the equipment requirements and is suitable for scenes with real-time motion.
Explanations are given below of terms in the embodiments of the present application.
Dichromatic (two-color) reflection model: used to describe the physical illumination phenomenon on the surface of an inhomogeneous object. Light reflected by the object surface contains a diffuse component and a specular component, and the spectral composition of the reflected light is determined by the spectral compositions of these two components. In the embodiments of the application, the human body is treated as an inhomogeneous object.
Diffuse reflection: light that enters the object surface and returns after multiple internal reflections and absorptions; its spectral composition is determined by the reflection characteristics of the object's material.
Specular reflection: the direct reflection of incident light on the object surface; it depends on the orientation and roughness of the object surface relative to the light source, and its spectral composition is close to that of the light source.
Embodiments of the present application are described in detail below with reference to the accompanying drawings.
FIG. 1 illustrates a reconstruction system architecture provided by an embodiment of the present application. As shown in fig. 1, the camera 100 acquires original images of the target object in real time during its movement and transmits them to the reconstruction device 200 in a wired or wireless manner; the reconstruction device 200 reconstructs a head dense surface model based on the received original images.
It should be noted that the reconstruction device 200 provided in the embodiment of the present application is only an example, and includes, but is not limited to, a display terminal with an interactive function, such as a laptop, a desktop, a tablet, a smart phone, and VR/AR glasses with an interactive function.
It should be noted that, for a reconstruction apparatus with a camera, an original image of the target object may also be acquired by the reconstruction apparatus.
Fig. 2 schematically shows a flowchart of a head three-dimensional reconstruction method provided by an embodiment of the application, and as shown in fig. 2, the flowchart is executed by a reconstruction device and mainly includes the following steps:
s201: and acquiring an original image acquired by a camera, and performing face recognition on the original image to obtain a face image.
In S201, to address the distortion of the head model, the original image needs to be segmented and recognized to obtain a face image. Methods for obtaining the face image include, but are not limited to, model-based detection (e.g., Hidden Markov Models (HMM), Support Vector Machines (SVM)), edge-feature detection (e.g., Canny or Sobel edge detection), statistical methods (e.g., Bayesian learning, K-means clustering), and the like. Considering the influence of factors such as illumination and pose during image acquisition, some embodiments of the present application also use a Convolutional Neural Network (CNN) (e.g., through open-source libraries such as dlib and libfacedetection) to perform face recognition and output the face image.
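By way of illustration of the open-source route mentioned above, the following is a minimal sketch of face detection and cropping with the dlib library; the input path, crop margin and output handling are illustrative assumptions rather than part of the embodiments.

```python
import cv2
import dlib

# Minimal face-detection sketch using dlib's frontal face detector.
detector = dlib.get_frontal_face_detector()

image = cv2.imread("original.jpg")                 # original image from the camera (assumed path)
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)

faces = detector(gray, 1)                          # upsample once to catch smaller faces
for i, rect in enumerate(faces):
    # Expand the detected box slightly so the whole head region is kept (illustrative margin).
    margin = int(0.2 * (rect.bottom() - rect.top()))
    top = max(rect.top() - margin, 0)
    bottom = min(rect.bottom() + margin, image.shape[0])
    left = max(rect.left() - margin, 0)
    right = min(rect.right() + margin, image.shape[1])

    face_image = image[top:bottom, left:right]
    cv2.imwrite(f"face_{i}.jpg", face_image)       # cropped face image used in the later steps
```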
In some embodiments, the image capture process is affected by the illumination of the capture scene: the illumination intensity observed on the human body surface is in fact a linear combination of diffusely reflected light and specularly reflected light, where the diffuse component carries the color of the body surface and the specular component carries the chromaticity of the light source. Because the materials of the body surface and the angles between each facial component and the light source differ, the brightness of different parts of the body differs. To reduce the influence of illumination on the texture data, highlight removal needs to be performed on the face image.
One optional approach is to estimate the light source chromaticity from the specular reflection, and then perform highlight removal on the face image by combining the dichromatic reflection model with the color information of the face image.
Another optional approach is to build a highlight distribution map data set of face images in advance, learn the relationship between specular reflection and diffuse reflection with a cycle-consistent generative adversarial network (CycleGAN), and then perform highlight removal on the face image based on that relationship.
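As a rough illustration of the first, dichromatic-model route, the sketch below uses the common approximation that, once the light source chromaticity has been normalized (here assumed white), the specular contribution of a pixel is bounded by its minimum RGB channel. This is a simplification for illustration only, not the specific estimator used in the embodiments.

```python
import numpy as np

def remove_highlights(face_bgr: np.ndarray, strength: float = 0.8) -> np.ndarray:
    """Rough specular-highlight suppression under the dichromatic reflection model.

    Assumes the light source chromaticity has already been normalized to white, so the
    specular component is roughly equal across channels and bounded by the per-pixel
    minimum channel value. `strength` controls how much of that estimate is removed
    (an illustrative choice).
    """
    img = face_bgr.astype(np.float32)
    specular = img.min(axis=2, keepdims=True)   # per-pixel estimate of the specular part
    diffuse = img - strength * specular         # keep mostly the diffuse (skin color) part
    return np.clip(diffuse, 0, 255).astype(np.uint8)
```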
S202: and extracting driving parameters from the face image, and driving the parameterized head model to move by using the extracted driving parameters to obtain a driven head three-dimensional geometric model.
In S202, the parameterized head model is pre-constructed based on head parameters extracted from an initial face image. Classical parameterized head models mainly include the 3D Morphable Face Model (3DMM), the FLAME model and the like. The head parameters mainly include head shape parameters, facial expression parameters and head pose parameters, and the shape of the face can be regarded as the joint result of these three sets of parameters.
The parameterized head model expresses a human head that deforms non-rigidly in real time through a small number of parameters; it can generate a head three-dimensional geometric model with consistent topology from a single picture and is not affected by the geometric deficiency of invisible regions.
In S202, a parameterized head model constructed based on the FLAME model is employed. The parameterized head model consists of two parts, standard Linear Blend Skinning (LBS) and blend shapes (Blend Shape); the standard mesh model used has N = 5023 mesh vertices and K = 4 joints (located at the neck, the jaw and the two eyeballs). The parameterized head model is formulated as:

M(β, θ, ψ) = W(T_p(β, θ, ψ), J(β), θ, ω)    (Formula 1)

T_p(β, θ, ψ) = T + B_s(β; s) + B_p(θ; p) + B_e(ψ; e)    (Formula 2)

where β represents the head shape parameters, θ represents the head pose parameters (including the motion parameters of the head skeleton), and ψ represents the facial expression parameters. M(β, θ, ψ) uniquely identifies a vertex coordinate of the head three-dimensional geometric model. W() is the linear blend skinning function used to transform the head model mesh T along the joints, J() is the function predicting the positions of the different head joint points, T is the head model mesh, B_s() is the influence function of the head shape parameters on the mesh T, B_p() is the influence function of the head pose parameters on the mesh T, B_e() is the influence function of the facial expression parameters on the mesh T, and T_p() is the mesh obtained by applying the shape, pose and expression offsets to T. s, p, e and ω denote the head shape weight, head pose weight, facial expression weight and skinning weight, respectively, and are obtained by training on pre-constructed head sample data.
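The sketch below illustrates, in numpy, how a model of this blend-shape-plus-LBS form can be evaluated: the template mesh is offset by linear shape, pose and expression bases (Formula 2) and would then be skinned along the joints (Formula 1). The basis matrices and parameter sizes here are random placeholders standing in for the trained s, p, e, ω weights; they do not reproduce the actual FLAME data.

```python
import numpy as np

# Illustrative sizes; the model described in the text uses N = 5023 vertices and K = 4 joints.
N_VERTS = 5023
N_SHAPE, N_POSE, N_EXPR = 100, 12, 50

rng = np.random.default_rng(0)
T_template = rng.standard_normal((N_VERTS, 3))          # mean head mesh T (placeholder)
B_shape = rng.standard_normal((N_SHAPE, N_VERTS, 3))    # shape basis (stands in for trained s)
B_pose = rng.standard_normal((N_POSE, N_VERTS, 3))      # pose-corrective basis (stands in for p)
B_expr = rng.standard_normal((N_EXPR, N_VERTS, 3))      # expression basis (stands in for e)

def blend_shaped_mesh(beta, theta_feat, psi):
    """Formula 2: T_p(beta, theta, psi) = T + B_s(beta) + B_p(theta) + B_e(psi)."""
    offsets = (np.tensordot(beta, B_shape, axes=1)
               + np.tensordot(theta_feat, B_pose, axes=1)
               + np.tensordot(psi, B_expr, axes=1))
    return T_template + offsets

# Formula 1 would then pass this mesh, the predicted joints J(beta) and the pose theta
# to a linear blend skinning function W(); the skinning step itself is omitted here.
beta = np.zeros(N_SHAPE)
theta_feat = np.zeros(N_POSE)
psi = np.zeros(N_EXPR)
vertices = blend_shaped_mesh(beta, theta_feat, psi)      # (5023, 3) driven mesh before skinning
```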
Fig. 3 exemplarily shows the relationship between the three head parameters and the head model provided by an embodiment of the present application, where part (a) shows the influence of the head shape parameters on the geometric model, part (b) the influence of the head pose parameters, and part (c) the influence of the facial expression parameters.
In S202, based on the pre-constructed parameterized head model, the driving parameters, including the head pose parameters θ and the facial expression parameters ψ, are extracted from the face image (RGB image) acquired in real time, and the parameterized head model is driven to move by the extracted driving parameters to obtain the driven head three-dimensional geometric model.
In some embodiments, when the camera used is an RGBD camera, the obtained initial face image is an RGBD face image with depth information, and the parameterized head model can be optimized according to that depth information. Specifically, the depth information of the face is extracted from the RGBD face image and mapped to the head parameters, and the head shape, facial expression and head pose parameters extracted from the RGB image are optimized with the mapped parameters. This compensates for the relatively coarse result of the parameterized head model, improves the geometric precision of the head and enhances the realism of the head three-dimensional geometric model.
S203: and performing semantic segmentation on the face image to obtain each independent sub-region.
In S203, face parsing is a special case of semantic image segmentation: given a face image, a pixel-level label map of its different semantic components is computed (such as hair, facial skin, eyes, eyebrows, nose, mouth, ears and cheeks, where organs such as the nose, eyes and mouth are regarded as internal components of the face, and hair, hat, facial skin and the like are regarded as external components).
In a specific implementation, Region-of-Interest Tanh-warping (RoI Tanh-warping) is used to warp the face image to a preset scale; the warped image is then fed into a trained parsing model for semantic segmentation of the face, yielding sub-regions such as hair, facial skin, eyes, eyebrows, nose, mouth and ears, and the segmented face image is warped back through the inverse of the RoI Tanh-warping. As shown in fig. 4a, part (a) is the original face image and part (b) is the segmented image.
In some embodiments, when the face image is semantically segmented, a sub-region of the head accessory (such as a hat, a hairpin, etc.) can be obtained in consideration of the global features of the face image, as shown in fig. 4 b.
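As an illustration of how the label map produced by such a parsing model can be turned into the independent sub-regions used below, the following sketch extracts one boolean mask per semantic label. The `parse_model` call and the label values are hypothetical placeholders, since the embodiments do not prescribe a specific parsing network.

```python
import numpy as np

# Hypothetical label values of a face-parsing model; the real mapping depends on the model used.
LABELS = {"hair": 1, "skin": 2, "left_eye": 3, "right_eye": 4, "eyebrows": 5,
          "nose": 6, "mouth": 7, "left_ear": 8, "right_ear": 9, "accessory": 10}

def split_into_subregions(label_map: np.ndarray) -> dict:
    """Turn a pixel-level label map (H, W) into one boolean mask per sub-region."""
    return {name: label_map == value for name, value in LABELS.items()}

# label_map = parse_model(face_image)      # hypothetical trained parsing model
# masks = split_into_subregions(label_map)
# masks["nose"] would then be the pixel set of the nose sub-region used in S204.
```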
S204: and respectively performing texture completion on each subarea according to the difference value of the color value of the pixel point of each subarea and the color value of the mirror image pixel point.
The loss of face texture data mainly results from an excessive deviation between the face angle and the camera angle, which leaves holes and streaks in the face texture (especially in the side-face areas of the ears and cheeks). Exploiting the consistency of facial skin color and the symmetry of the human face, texture completion can be performed on each sub-region separately, according to the difference between the color value of each of its pixel points and the color value of the corresponding mirror pixel point. Because the sub-regions are independent of each other, and the same sub-region in successive face-image frames can be texture-superimposed in the time domain (for example, texture weighting is performed on the nose sub-region of the first frame and the nose sub-region of the second frame), each sub-region can be completed on its own.
The texture completion process is described below with respect to any one of the sub-regions as an example.
Any pixel point is taken from the sub-region, and a first difference between the color value of that pixel point and the color value of its mirror pixel point is determined. The first difference is compared with a preset color threshold; if it is greater than the preset color threshold, the skin colors of the two symmetric pixel points differ strongly and skin-color consistency is not satisfied. In that case, a preset number of adjacent pixel points are selected from the target neighborhood of the pixel point, and a second difference between the color value of each adjacent pixel point and the color value of its mirror pixel point is determined. If the number of adjacent pixel points whose second difference exceeds the preset pixel threshold is greater than a preset value, the target neighborhood is determined to be a cavity region, and the texture data of the cavity region is filled to obtain the texture-completed sub-region.
Optionally, the mirror pixel point lies in the same sub-region. As shown in fig. 5, point Q is a pixel point in the nose sub-region and its mirror pixel point is Q'; the difference between the color values of Q and Q' is greater than the preset color threshold. The target neighborhood of Q is circled by the irregular solid line, and N (N = 6) adjacent pixel points are selected from it. The color difference between 5 of these adjacent pixel points (drawn as dotted circles in fig. 5) and their respective mirror pixel points exceeds the preset pixel threshold, which is more than the preset value of 4, so the target neighborhood is determined to be a cavity region requiring texture completion.
Optionally, the mirror pixel point lies in a different sub-region. As shown in fig. 6, point P is a pixel point in the left-ear sub-region and its mirror pixel point P' lies in the right-ear sub-region; the difference between the color values of P and P' is greater than the preset color threshold. The target neighborhood of P is circled by the irregular solid line, and N (N = 6) adjacent pixel points are selected from it. The color difference between 5 of these adjacent pixel points (drawn as dotted circles in fig. 6) and their respective mirror pixel points exceeds the preset pixel threshold, which is more than the preset value of 4, so the target neighborhood is determined to be a cavity region requiring texture completion.
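The following sketch illustrates this mirror-symmetry check for one pixel. The neighborhood size, the thresholds and the way mirror coordinates are obtained (a simple horizontal reflection about the face's symmetry axis) are illustrative assumptions, not the exact procedure of the embodiments; boundary handling is also omitted for brevity.

```python
import numpy as np

def is_cavity_neighborhood(image, face_mask, x, y, axis_x,
                           color_thresh=30.0, pixel_thresh=30.0,
                           n_neighbors=6, min_bad=4, radius=3):
    """Decide whether the neighborhood of pixel (y, x) is a cavity region.

    `face_mask` is a boolean mask of valid face pixels; `axis_x` is the column of the face
    symmetry axis, and the mirror pixel is taken by reflecting across it. All threshold
    values are placeholders for illustration.
    """
    def color_diff(p, q):
        return float(np.linalg.norm(image[p].astype(np.float32) - image[q].astype(np.float32)))

    mirror = (y, int(2 * axis_x - x))
    if not face_mask[mirror]:                           # no valid mirror pixel available
        return False
    if color_diff((y, x), mirror) <= color_thresh:      # first difference: symmetry satisfied
        return False

    # Second differences over a preset number of neighbors in the target neighborhood.
    ys, xs = np.where(face_mask[y - radius:y + radius + 1, x - radius:x + radius + 1])
    bad = 0
    for dy, dx in list(zip(ys, xs))[:n_neighbors]:
        ny, nx = y - radius + dy, x - radius + dx
        nm = (ny, int(2 * axis_x - nx))
        if face_mask[nm] and color_diff((ny, nx), nm) > pixel_thresh:
            bad += 1                                    # second difference exceeds the pixel threshold
    return bad > min_bad                                # enough bad neighbors -> cavity region
```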
Further, after the cavity area is determined, the texture data of the cavity area is completed according to a preset initial color value, a color mean value of the face image and a color weighted value of a sub-area in the multi-frame face image. The completion formula is as follows:
T_c = t + ε(x) + θ    (Formula 3)

where T_c is the completed color value of a pixel point in the cavity region, t represents the preset initial color value, ε(x) represents the color weighted value of the sub-region corresponding to pixel point x, and θ represents the color mean value of the face image.
The preset initial color value is set according to a face texture template. The face texture template is generated mainly by color-weighting a face image data set to obtain a texture template containing complete texture data, and the preset initial color value is the color mean of the weighted face image data set.
In the embodiments of the application, because the camera collects original images in real time during the interaction, a multi-frame sequence of face images is available, so the color values of the same sub-region across the multiple frames can be weighted. Specifically, taking the first frame as the starting image, the color values of the same sub-region in each frame are weighted, giving a weighted color value for each sub-region. The multiple frames may be consecutive or non-consecutive.
For example, the color values of the nose sub-region in the first to tenth frames are weighted to obtain the weighted color value of the nose sub-region, and the color values of the left-ear sub-region in the first to tenth frames are weighted to obtain the weighted color value of the left-ear sub-region.
It should be noted that, in the embodiment of the present application, there is no limitation on the selection of the starting image, and for example, an image with the largest range of the visible region of the face may be selected as the starting image.
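The sketch below ties Formula 3 to the multi-frame weighting described above: for pixels inside a detected cavity region, the completed color is built from the preset initial color t, the weighted sub-region color ε(x) and the image color mean θ. The exponential frame weights are an illustrative choice; the embodiments only require some weighting over the frames.

```python
import numpy as np

def weighted_subregion_color(frames, masks, decay=0.8):
    """Weighted color of one sub-region over several frames (epsilon(x) in Formula 3).

    `frames` is a list of HxWx3 images and `masks` the corresponding boolean sub-region
    masks. The exponential decay toward older frames is an illustrative assumption.
    """
    weights = np.array([decay ** i for i in range(len(frames))], dtype=np.float32)
    weights /= weights.sum()
    colors = [frame[mask].mean(axis=0) for frame, mask in zip(frames, masks)]
    return np.tensordot(weights, np.stack(colors), axes=1)

def complete_cavity(image, cavity_mask, t_init, eps_region, theta_mean):
    """Formula 3: T_c = t + eps(x) + theta, applied to all pixels of the cavity region."""
    out = image.astype(np.float32)
    out[cavity_mask] = np.clip(t_init + eps_region + theta_mean, 0, 255)
    return out.astype(np.uint8)
```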
In some embodiments, because of occlusion or the rotation angle, a pixel point acquired from the sub-region may be a single pixel point without a corresponding mirror pixel point; in that case its color value is set to the preset initial color value.
In other embodiments, when a pixel point acquired from the sub-region has a corresponding mirror pixel point and is an effective pixel point that does not lie in a cavity region, the color value of that effective pixel point is determined from the color mean value of the face image and the color weighted value of the sub-region in the multi-frame face images.
S205: and fusing the supplemented sub-regions to obtain complete texture data of the face.
In S205, the texture data of the sub-regions are fused using a Poisson fusion algorithm (Poisson blending) to obtain the complete texture data of the face.
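Poisson blending is available, for example, in OpenCV as seamlessClone; the snippet below shows how completed sub-regions could be blended onto a base texture this way. The choice of OpenCV and of the blend center is an illustrative assumption, not a requirement of the embodiments.

```python
import cv2
import numpy as np

def poisson_merge(base_texture, subregion_img, subregion_mask):
    """Blend one completed sub-region into the face texture via Poisson blending."""
    mask = subregion_mask.astype(np.uint8) * 255
    ys, xs = np.where(subregion_mask)
    center = (int(xs.mean()), int(ys.mean()))        # blend around the sub-region centroid
    return cv2.seamlessClone(subregion_img, base_texture, mask, center, cv2.NORMAL_CLONE)

# texture = poisson_merge(texture, completed_nose, nose_mask)
# texture = poisson_merge(texture, completed_left_ear, left_ear_mask)
```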
S206: and rendering the head three-dimensional geometric model according to the complete texture data to obtain a reconstructed head dense surface model.
In S206, the head three-dimensional geometric model is rendered according to the complete texture data, which improves the realism of the head dense surface model and enhances the sense of immersion during interaction.
The completeness of the face texture data directly influences the realism of the reconstructed model; in the embodiments of the application, texture completion is carried out by exploiting the symmetry and skin-color consistency of the face. Fig. 7 illustrates the complete flow of texture data completion provided by an embodiment of the present application; as shown in fig. 7, it mainly includes the following steps:
S701: determining the color mean value of the face image according to the color values of the pixel points in the face image.
S702: acquiring a preset texture template to obtain the preset initial color value.
S703: selecting a starting image, and color-weighting the same sub-region across the multi-frame face images to obtain the weighted color value of each sub-region.
In this step, the starting image may be the first frame, or the image in which the visible region of the face is largest. The sub-regions are the regions obtained by segmenting the face image and include hair, facial skin, eyes, eyebrows, nose, mouth, ears and the like.
S704 to S705: for any one of the sub-regions, acquiring any pixel point from the sub-region and determining whether the acquired pixel point is a single pixel point without a mirror pixel point; if so, executing S706, otherwise executing S707.
S706: setting the color value of the single pixel point to the preset initial color value so as to complete the texture data of the single pixel point.
S707 to S708: determining a first difference between the color value of the acquired pixel point and the color value of its mirror pixel point, and determining whether the first difference is greater than the preset color threshold; if so, executing S709, otherwise executing S712.
In this step, if the first difference is greater than the preset color threshold, the pixel point is, by the symmetry and skin-color consistency of the human face, an invalid cavity point.
S709: selecting a preset number of adjacent pixel points from the target neighborhood of the acquired pixel point, and determining a second difference between the color value of each adjacent pixel point and the color value of its mirror pixel point.
S710: determining whether the number of adjacent pixel points whose second difference is greater than the preset color threshold exceeds the preset value; if so, executing S711, otherwise the acquired pixel point is an effective pixel point and S712 is executed.
In this step, if the number of adjacent pixel points whose second difference exceeds the preset color threshold is larger than the preset value, there are many cavity points in the target neighborhood, the target neighborhood is likely a cavity region with serious texture loss, and texture completion needs to be performed.
S711: determining that the target neighborhood is a cavity region, and completing the texture data of the cavity region according to the preset initial color value, the color mean value of the face image and the color weighted value of the sub-region corresponding to the pixel point in the multi-frame face images.
S712: determining the color value of the effective pixel point according to the color mean value of the face image and the color weighted value of the sub-region corresponding to the pixel point in the multi-frame face images.
Based on the same technical concept, the embodiment of the present application provides a reconstruction apparatus, which can perform the head three-dimensional reconstruction method provided by the embodiment of the present application, and can achieve the same technical effect, which is not repeated here.
Referring to fig. 8, the reconstruction device comprises a memory 801 and a processor 802, the memory 801 being configured to store computer-executable instructions and a pre-constructed parameterized head model, the processor 802 being configured to perform the following operations in accordance with the computer program instructions stored by the memory 801:
acquiring an original image acquired by a camera, and performing face recognition on the original image to obtain a face image;
extracting driving parameters from the face image, and driving the parameterized head model to move by using the extracted driving parameters to obtain a driven head three-dimensional geometric model, wherein the parameterized head model is constructed in advance based on the head parameters extracted from the initial face image;
performing semantic segmentation on the face image to obtain independent sub-regions;
respectively performing texture completion on each subarea according to the difference value between the color value of the pixel point of each subarea and the color value of the mirror image pixel point;
fusing each supplemented sub-area to obtain complete texture data of the face;
and rendering the head three-dimensional geometric model according to the complete texture data to obtain a reconstructed head dense surface model.
Optionally, the processor 802 performs texture completion on each sub-region according to a difference between a color value of a pixel point of each sub-region and a color value of a mirror image pixel point, and is specifically configured to:
aiming at any one subregion in each subregion, acquiring any one pixel point from the subregion, and determining a first difference value between the color value of the acquired pixel point and the color value of the mirror image pixel point of the pixel point;
if the first difference is larger than a preset color threshold, selecting a preset number of adjacent pixels from the target neighborhood of the pixels, and respectively determining second differences between the color values of the adjacent pixels and the color values of the mirror image pixels;
and if the number of adjacent pixel points corresponding to the second difference value which is greater than the preset color threshold value in the second difference values is greater than the preset value, determining that the target neighborhood is a cavity region, and completing texture data of the cavity region to obtain a sub-region after texture completion.
Optionally, the processor 802 completes the texture data of the hole region, and is specifically configured to:
and completing texture data of the cavity area according to a preset initial color value, the color mean value of the face image and the color weighted value of the sub-area in the multi-frame face image.
Optionally, the processor 802 is further configured to:
setting the color value of a single pixel point as a preset initial color value aiming at the single pixel point without the corresponding mirror image pixel point in the sub-area; and
and determining the color value of the effective pixel point according to the color mean value of the face image and the color weighted value of the sub-region in the multi-frame face image aiming at the effective pixel point which has the corresponding mirror image pixel point but is not in the cavity region.
Optionally, when the camera is an RGBD camera, the initial face image is an RGBD face image, and the processor 802 pre-constructs a parameterized head model based on head parameters extracted from the initial face image, which is specifically configured as:
extracting face depth information from the RGBD face image;
optimizing head parameters according to the face depth information;
and pre-constructing a parameterized head model based on the optimized head parameters.
Optionally, the head parameters include a head shape parameter, a head pose parameter and a facial expression parameter, and the parameterized head model is formulated as:

M(β, θ, ψ) = W(T_p(β, θ, ψ), J(β), θ, ω)    (Formula 1)

T_p(β, θ, ψ) = T + B_s(β; s) + B_p(θ; p) + B_e(ψ; e)    (Formula 2)

where β represents the head shape parameters, θ represents the head pose parameters, ψ represents the facial expression parameters, W() is the linear blend skinning function, J() is the function predicting the positions of the head joint points, T is the head model mesh, B_s() is the influence function of the head shape parameters on the mesh T, B_p() is the influence function of the head pose parameters on the mesh T, B_e() is the influence function of the facial expression parameters on the mesh T, T_p() is the mesh obtained by applying the shape, pose and expression offsets to T, and s, p, e and ω denote the head shape weight, head pose weight, facial expression weight and skinning weight, respectively.
It should be noted that fig. 8 only shows the hardware necessary for implementing the embodiment of the present application, and in addition, the hardware may further include conventional hardware such as a display, a controller, a speaker, and the like.
It should be noted that the processor referred to in the embodiments of the present application may be a Central Processing Unit (CPU), a general purpose processor, a Digital Signal Processor (DSP), an application-specific integrated circuit (ASIC), a Field Programmable Gate Array (FPGA) or other programmable logic devices, a transistor logic device, a hardware component, or any combination thereof. Which may implement or perform the various illustrative logical blocks, modules, and circuits described in connection with the disclosure. A processor may also be a combination of computing functions, e.g., comprising one or more microprocessors, a DSP and a microprocessor, or the like. Wherein the memory may be integrated in the processor or may be provided separately from the processor.
The embodiment of the present application further provides a computer-readable storage medium, where computer-executable instructions are stored, and the computer-executable instructions are used to enable a computer to execute the method in the foregoing embodiment.
The embodiments of the present application also provide a computer program product for storing a computer program, where the computer program is used to execute the method of the foregoing embodiments.
Finally, it should be noted that: the above embodiments are only used for illustrating the technical solutions of the present application, and not for limiting the same; although the present application has been described in detail with reference to the foregoing embodiments, it should be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some or all of the technical features may be equivalently replaced; and the modifications or the substitutions do not make the essence of the corresponding technical solutions depart from the scope of the technical solutions of the embodiments of the present application.
The foregoing description, for purposes of explanation, has been presented in conjunction with specific embodiments. However, the illustrative discussions above are not intended to be exhaustive or to limit the embodiments to the precise forms disclosed above. Many modifications and variations are possible in light of the above teaching. The embodiments were chosen and described in order to best explain the principles and the practical application, to thereby enable others skilled in the art to best utilize the embodiments and various embodiments with various modifications as are suited to the particular use contemplated.

Claims (10)

1. A method of three-dimensional reconstruction of a head, comprising:
acquiring an original image acquired by a camera, and performing face recognition on the original image to obtain a face image;
extracting driving parameters from the face image, and driving a parameterized head model to move by using the extracted driving parameters to obtain a driven head three-dimensional geometric model, wherein the parameterized head model is constructed in advance based on the head parameters extracted from the initial face image;
performing semantic segmentation on the face image to obtain independent sub-regions;
respectively performing texture completion on each subarea according to the difference value between the color value of the pixel point of each subarea and the color value of the mirror image pixel point;
fusing each supplemented sub-area to obtain complete texture data of the face;
and rendering the head three-dimensional geometric model according to the complete texture data to obtain a reconstructed head dense surface model.
2. The method of claim 1, wherein the texture completion of each sub-region according to the difference between the color value of the pixel point of each sub-region and the color value of the mirror image pixel point comprises:
aiming at any one subregion in each subregion, acquiring any one pixel point from the subregion, and determining a first difference value between the color value of the acquired pixel point and the color value of the mirror image pixel point of the pixel point;
if the first difference is larger than a preset color threshold, selecting a preset number of adjacent pixels from the target neighborhood of the pixels, and respectively determining a second difference between the color value of each adjacent pixel and the color value of each mirror image pixel;
and if the number of adjacent pixel points corresponding to the second difference value which is greater than the preset color threshold value in the second difference values is greater than the preset value, determining that the target neighborhood is a cavity region, and completing texture data of the cavity region to obtain the sub-region after texture completion.
3. The method of claim 2, wherein the complementing the texture data of the hole region comprises:
and completing the texture data of the cavity area according to a preset initial color value, the color mean value of the face image and the color weighted value of the sub-area in the multi-frame face image.
4. The method of claim 2, wherein the method further comprises:
setting the color value of a single pixel point as a preset initial color value aiming at the single pixel point without the corresponding mirror image pixel point in the sub-area; and
and aiming at effective pixels which have corresponding mirror image pixels but are not in the cavity region, determining the color value of the effective pixels according to the color mean value of the face image and the color weighted value of the sub region in the multi-frame face image.
5. The method according to any one of claims 1-4, wherein when the camera is an RGBD camera, the initial face image is an RGBD face image, and the pre-constructing of the parameterized head model based on the head parameters extracted from the initial face image comprises:
extracting face depth information from the RGBD face image;
optimizing the head parameters according to the face depth information;
and pre-constructing a parameterized head model based on the optimized head parameters.
6. The method of any one of claims 1-4, wherein the head parameters include head shape parameters, head pose parameters and facial expression parameters, and the parameterized head model is formulated as:

M(β, θ, ψ) = W(T_p(β, θ, ψ), J(β), θ, ω)    (Formula 1)

T_p(β, θ, ψ) = T + B_s(β; s) + B_p(θ; p) + B_e(ψ; e)    (Formula 2)

wherein β represents the head shape parameters, θ represents the head pose parameters, ψ represents the facial expression parameters, W() is the linear blend skinning function, J() is the function predicting the positions of the head joint points, T is the head model mesh, B_s() is the influence function of the head shape parameters on the mesh T, B_p() is the influence function of the head pose parameters on the mesh T, B_e() is the influence function of the facial expression parameters on the mesh T, T_p() is the mesh obtained by applying the shape, pose and expression offsets to T, and s, p, e and ω denote the head shape weight, head pose weight, facial expression weight and skinning weight, respectively.
7. A reconstruction device comprising a memory, a processor;
the memory configured to store computer program instructions and a pre-built parameterized head model;
the processor configured to perform the following operations in accordance with the computer program instructions:
acquiring an original image acquired by a camera, and performing face recognition on the original image to obtain a face image;
extracting driving parameters from the face image, and driving a parameterized head model to move by using the extracted driving parameters to obtain a driven head three-dimensional geometric model, wherein the parameterized head model is constructed in advance based on the head parameters extracted from the initial face image;
performing semantic segmentation on the face image to obtain independent sub-regions;
respectively performing texture completion on each subarea according to the difference value between the color value of the pixel point of each subarea and the color value of the mirror image pixel point;
fusing each supplemented sub-area to obtain complete texture data of the face;
and rendering the head three-dimensional geometric model according to the complete texture data to obtain a reconstructed head dense surface model.
8. The reconstruction device of claim 7, wherein the processor performs texture completion on each of the sub-regions according to a difference between a color value of a pixel of each of the sub-regions and a color value of a pixel of a mirror image, and is specifically configured to:
aiming at any one subregion in each subregion, acquiring any one pixel point from the subregion, and determining a first difference value between the color value of the acquired pixel point and the color value of the mirror image pixel point of the pixel point;
if the first difference is larger than a preset color threshold, selecting a preset number of adjacent pixels from the target neighborhood of the pixels, and respectively determining a second difference between the color value of each adjacent pixel and the color value of each mirror image pixel;
and if the number of adjacent pixel points corresponding to the second difference value which is greater than the preset color threshold value in the second difference values is greater than the preset value, determining that the target neighborhood is a cavity region, and completing texture data of the cavity region to obtain the sub-region after texture completion.
9. The reconstruction device of claim 8, wherein the processor complements texture data of the hole region, specifically configured to:
and completing the texture data of the cavity area according to a preset initial color value, the color mean value of the face image and the color weighted value of the sub-area in the multi-frame face image.
10. The reconstruction device of claim 8, wherein the processor is further configured to:
setting the color value of a single pixel point as a preset initial color value aiming at the single pixel point without the corresponding mirror image pixel point in the sub-area; and
and aiming at effective pixels which have corresponding mirror image pixels but are not in the cavity region, determining the color value of the effective pixels according to the color mean value of the face image and the color weighted value of the sub region in the multi-frame face image.
CN202110921998.9A 2021-08-12 2021-08-12 Head three-dimensional reconstruction method and device Active CN113628327B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110921998.9A CN113628327B (en) 2021-08-12 2021-08-12 Head three-dimensional reconstruction method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110921998.9A CN113628327B (en) 2021-08-12 2021-08-12 Head three-dimensional reconstruction method and device

Publications (2)

Publication Number Publication Date
CN113628327A true CN113628327A (en) 2021-11-09
CN113628327B CN113628327B (en) 2023-07-25

Family

ID=78384685

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110921998.9A Active CN113628327B (en) 2021-08-12 2021-08-12 Head three-dimensional reconstruction method and device

Country Status (1)

Country Link
CN (1) CN113628327B (en)

Patent Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20040057138A1 (en) * 2002-09-25 2004-03-25 Minolta Co., Ltd. Optical system and display apparatus
CN108765550A (en) * 2018-05-09 2018-11-06 华南理工大学 A kind of three-dimensional facial reconstruction method based on single picture
CN109410133A (en) * 2018-09-30 2019-03-01 北京航空航天大学青岛研究院 A kind of face texture repairing method based on 3DMM
CN109377557A (en) * 2018-11-26 2019-02-22 中山大学 Real-time three-dimensional facial reconstruction method based on single frames facial image
CN111445582A (en) * 2019-01-16 2020-07-24 南京大学 Single-image human face three-dimensional reconstruction method based on illumination prior
CN110197462A (en) * 2019-04-16 2019-09-03 浙江理工大学 A kind of facial image beautifies in real time and texture synthesis method
CN111160136A (en) * 2019-12-12 2020-05-15 天目爱视(北京)科技有限公司 Standardized 3D information acquisition and measurement method and system
CN113066171A (en) * 2021-04-20 2021-07-02 南京大学 Face image generation method based on three-dimensional face deformation model

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
HENRIQUEZ P: "An unobtrusive intelligent multisensory mirror for well-being status self-assessment and visualization", IEEE Transactions on Multimedia, pp. 1467-1481 *

Cited By (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114049464A (en) * 2021-11-15 2022-02-15 聚好看科技股份有限公司 Reconstruction method and device of three-dimensional model
CN114140580A (en) * 2021-11-22 2022-03-04 聚好看科技股份有限公司 Texture adjusting method and equipment for hand three-dimensional model
CN114373043A (en) * 2021-12-16 2022-04-19 聚好看科技股份有限公司 Head three-dimensional reconstruction method and equipment
CN114339190A (en) * 2021-12-29 2022-04-12 中国电信股份有限公司 Communication method, device, equipment and storage medium
CN114339190B (en) * 2021-12-29 2023-06-23 中国电信股份有限公司 Communication method, device, equipment and storage medium
CN114648613A (en) * 2022-05-18 2022-06-21 杭州像衍科技有限公司 Three-dimensional head model reconstruction method and device based on deformable nerve radiation field
CN115049016A (en) * 2022-07-20 2022-09-13 聚好看科技股份有限公司 Model driving method and device based on emotion recognition
CN116246014A (en) * 2022-12-28 2023-06-09 支付宝(杭州)信息技术有限公司 Image generation method and device, storage medium and electronic equipment
CN116246014B (en) * 2022-12-28 2024-05-14 支付宝(杭州)信息技术有限公司 Image generation method and device, storage medium and electronic equipment

Also Published As

Publication number Publication date
CN113628327B (en) 2023-07-25

Similar Documents

Publication Publication Date Title
CN113628327B (en) Head three-dimensional reconstruction method and device
CN108305312B (en) Method and device for generating 3D virtual image
WO2022095721A1 (en) Parameter estimation model training method and apparatus, and device and storage medium
CN110634177A (en) Object modeling movement method, device and equipment
US20080309662A1 (en) Example Based 3D Reconstruction
KR20170008638A (en) Three dimensional content producing apparatus and three dimensional content producing method thereof
WO2022143645A1 (en) Three-dimensional face reconstruction method and apparatus, device, and storage medium
US11562536B2 (en) Methods and systems for personalized 3D head model deformation
CN111652123B (en) Image processing and image synthesizing method, device and storage medium
US11587288B2 (en) Methods and systems for constructing facial position map
KR102187143B1 (en) Three dimensional content producing apparatus and three dimensional content producing method thereof
WO2021078179A1 (en) Image display method and device
JP7462120B2 (en) Method, system and computer program for extracting color from two-dimensional (2D) facial images
WO2023066120A1 (en) Image processing method and apparatus, electronic device, and storage medium
CN115239861A (en) Face data enhancement method and device, computer equipment and storage medium
KR20230110787A (en) Methods and systems for forming personalized 3D head and face models
US11769309B2 (en) Method and system of rendering a 3D image for automated facial morphing with a learned generic head model
CN114373043A (en) Head three-dimensional reconstruction method and equipment
CN116912433B (en) Three-dimensional model skeleton binding method, device, equipment and storage medium
US20240078773A1 (en) Electronic device generating 3d model of human and its operation method
US20210074076A1 (en) Method and system of rendering a 3d image for automated facial morphing
US20220309733A1 (en) Surface texturing from multiple cameras
Zhao 3D Human Face Reconstruction and 2D Appearance Synthesis

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant