CN113628327B - Head three-dimensional reconstruction method and device - Google Patents

Head three-dimensional reconstruction method and device

Info

Publication number
CN113628327B
CN113628327B (application CN202110921998.9A)
Authority
CN
China
Prior art keywords
head
color
value
face image
model
Prior art date
Legal status
Active
Application number
CN202110921998.9A
Other languages
Chinese (zh)
Other versions
CN113628327A (en)
Inventor
刘帅
任子健
吴连朋
Current Assignee
Juhaokan Technology Co Ltd
Original Assignee
Juhaokan Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Juhaokan Technology Co Ltd filed Critical Juhaokan Technology Co Ltd
Priority to CN202110921998.9A
Publication of CN113628327A
Application granted
Publication of CN113628327B
Legal status: Active


Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T17/00Three dimensional [3D] modelling, e.g. data description of 3D objects
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T15/003D [Three Dimensional] image rendering
    • G06T15/04Texture mapping
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02TCLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00Road transport of goods or passengers
    • Y02T10/10Internal combustion engine [ICE] based vehicles
    • Y02T10/40Engine management systems

Abstract

The application relates to the technical field of three-dimensional reconstruction and provides a head three-dimensional reconstruction method and device. A face image is obtained by performing face recognition on an original image; driving parameters are extracted from the face image and used to drive a pre-constructed parameterized head model, obtaining a driven head three-dimensional geometric model; semantic segmentation is performed on the face image to obtain independent sub-regions; taking the symmetry of the face and the consistency of skin color into account, texture completion is performed on each sub-region according to the difference between the color value of each of its pixels and the color value of the corresponding mirror pixel; the completed sub-regions are fused to obtain the complete face texture data; and the head three-dimensional geometric model is rendered according to the complete texture data, which improves the realism of the reconstructed dense head surface model.

Description

Head three-dimensional reconstruction method and device
Technical Field
The present disclosure relates to the field of three-dimensional reconstruction technologies, and in particular, to a method and apparatus for three-dimensional reconstruction of a head.
Background
Head reconstruction is widely used in game character modeling, virtual reality applications, virtual fitting, personalized statue customization, and the like. Existing head reconstruction approaches include: 1) modeling the head with professional modeling software (such as Maya or 3ds Max), which requires the user to know the software in depth and to master related artistic skills, so personalized customization is not easy; 2) reconstructing from head data collected by professional three-dimensional laser scanning equipment, which is hard to popularize because of the high cost of such equipment; 3) driving the motion of head models in a database (a human head database), which, compared with the former two approaches, is more amenable to widespread application.
A parameterized head model can be used to reconstruct a three-dimensional model of the human head from a single picture and thus to generate a human head database. A parameterized head model is obtained by performing dimensionality-reduction analysis (such as principal component analysis or a network auto-encoder) on pre-acquired high-precision three-dimensional head data to obtain a group of basis functions, and different head models are generated by mixing these basis functions linearly or nonlinearly. The mixing parameters of the basis functions constitute the parameterization of the human head.
Generally, when a three-dimensional head model is reconstructed from a single picture with a parameterized head model, the head geometry keeps a complete topology, but the head texture data are limited by the shooting angle of the camera and the face texture cannot be acquired completely, so the reconstructed three-dimensional head model (particularly the side-face area) is distorted, which degrades the three-dimensional expression of the reconstructed model.
At present there are two main technical schemes for addressing the distortion of the three-dimensional head model. In the first, a single image is used to render face images with occlusion artifacts and flaws in which the 3D head model is rotated from an arbitrary angle to the current angle; these renderings and the original image form training data pairs for self-supervised training of a deep learning model, and side-face generation and face completion are then performed with the trained model. The reconstruction involves two face texture acquisitions, two three-dimensional rotations and two renderings; it preserves the original texture detail and illumination, but the computation is complex, cannot yet run in real time, and is therefore poorly suited to real-time communication scenarios. In the second, texture pictures collected by multiple cameras are fused region by region in a certain manner to obtain complete face texture data; however, the fusion quality is limited by the arrangement of the acquisition equipment and the choice of acquisition scene, and because the fusion relies only on images collected by cameras placed at multiple angles, texture deviations appear when the face moves quickly and the face cannot be completed.
Disclosure of Invention
The embodiments of the present application provide a head three-dimensional reconstruction method and device, which are used to complete the texture data of a human face and improve the realism of the reconstructed head model.
In a first aspect, an embodiment of the present application provides a method for three-dimensional reconstruction of a head, including:
acquiring an original image acquired by a camera, and carrying out face recognition on the original image to obtain a face image;
extracting driving parameters from the face image, and driving a parameterized head model to move by using the extracted driving parameters to obtain a three-dimensional geometrical model of the head after driving, wherein the parameterized head model is pre-constructed based on head parameters extracted from the initial face image;
carrying out semantic segmentation on the face image to obtain each independent subarea;
respectively carrying out texture complementation on each subarea according to the difference value between the color value of each pixel point of each subarea and the color value of the mirror image pixel point;
fusing the sub-regions after the completion to obtain the complete texture data of the human face;
and rendering the head three-dimensional geometric model according to the complete texture data to obtain a reconstructed head dense surface model.
In a second aspect, an embodiment of the present application provides a reconstruction device, including a memory, a processor;
the memory is configured to store computer program instructions and a preset parameterized header model;
the processor is configured to perform the following operations in accordance with the computer program instructions:
acquiring an original image acquired by a camera, and carrying out face recognition on the original image to obtain a face image;
extracting driving parameters from the face image, and driving a parameterized head model to move by using the extracted driving parameters to obtain a three-dimensional geometrical model of the head after driving, wherein the parameterized head model is pre-constructed based on head parameters extracted from the initial face image;
carrying out semantic segmentation on the face image to obtain each independent subarea;
respectively carrying out texture complementation on each subarea according to the difference value between the color value of each pixel point of each subarea and the color value of the mirror image pixel point;
fusing the sub-regions after the completion to obtain the complete texture data of the human face;
and rendering the head three-dimensional geometric model according to the complete texture data to obtain a reconstructed head dense surface model.
In a third aspect, the present application provides a computer-readable storage medium storing computer-executable instructions for causing a computer to perform the head three-dimensional reconstruction method provided by the embodiments of the present application.
In the above embodiments of the present application, a face image is obtained by recognition from the original image, and the driving parameters extracted from the face image are used to drive the parameterized head model pre-constructed from head parameters, obtaining the head three-dimensional geometric model. The face image is semantically segmented to obtain independent sub-regions. Considering that the deviation between the face angle and the camera angle causes the face texture data (particularly in the side-face region) to be partially missing, the symmetry of the face and the consistency of skin color are exploited: texture completion is performed on each sub-region according to the differences between the color values of its pixels and the color values of the mirror pixels, so as to obtain complete face texture data. The head three-dimensional geometric model is then rendered based on the complete texture data, which improves the realism of the reconstructed dense head surface model and further enhances the sense of immersion in remote three-dimensional interaction.
Drawings
In order to more clearly illustrate the embodiments of the present application or the technical solutions in the prior art, the drawings required in the description of the embodiments or the prior art are briefly introduced below. Obviously, the drawings described below are only some embodiments of the present application; other drawings may be obtained from them by a person skilled in the art without inventive effort.
FIG. 1 schematically illustrates a reconstruction system architecture diagram provided by an embodiment of the present application;
FIG. 2 illustrates a flow chart of a method for three-dimensional reconstruction of a head provided in an embodiment of the present application;
FIG. 3 schematically illustrates a relationship between three head parameters and a head model according to an embodiment of the present application;
fig. 4a illustrates a schematic view of face semantic segmentation provided in an embodiment of the present application;
FIG. 4b illustrates another face semantic segmentation schematic provided by an embodiment of the present application;
FIG. 5 schematically illustrates a hole area determination scheme provided in an embodiment of the present application;
FIG. 6 schematically illustrates another hole area determination scheme provided by embodiments of the present application;
FIG. 7 illustrates a complete flow chart of texture data completion provided by embodiments of the present application;
fig. 8 illustrates a hardware structure diagram of a reconstruction device provided in an embodiment of the present application.
Detailed Description
For clarity of the purposes, embodiments and advantages of the present application, the exemplary embodiments of the present application are described below clearly and completely with reference to the accompanying drawings. Obviously, the described exemplary embodiments are only some, not all, of the embodiments of the present application.
Based on the exemplary embodiments described herein, all other embodiments obtained by a person of ordinary skill in the art without inventive effort fall within the scope of the appended claims. Furthermore, while the disclosure is presented in terms of one or more exemplary embodiments, it should be appreciated that individual aspects of the disclosure may each constitute a complete embodiment on their own.
It should be noted that the brief description of the terms in the present application is only for convenience in understanding the embodiments described below, and is not intended to limit the embodiments of the present application. Unless otherwise indicated, these terms should be construed in their ordinary and customary meaning.
Furthermore, the terms "comprise" and "have," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a product or apparatus that comprises a list of elements is not necessarily limited to those elements expressly listed, but may include other elements not expressly listed or inherent to such product or apparatus.
The term "module" as used in this application refers to any known or later developed hardware, software, firmware, artificial intelligence, fuzzy logic, or combination of hardware and/or software code that is capable of performing the function associated with that element.
The embodiments of the present application provide a head three-dimensional reconstruction method and device. Semantic segmentation is performed on the face image to distinguish sub-regions such as eyebrows, eyes, nose, lips, ears, cheeks, hair and accessories; the symmetry of the face and the consistency of skin color are used to complete the missing texture data in each sub-region; the completed texture data of the sub-regions are fused to obtain the complete face texture data; and the driven head three-dimensional geometric model obtained from the parameterized head model is texture-rendered, so that the realism of the reconstructed model is improved while the head topology remains unchanged. Compared with the deep learning algorithm for solving head distortion, no training samples need to be constructed in advance for model training, the calculation process is simple, and real-time performance is strong; compared with fusing multiple images shot by multiple cameras, the equipment requirements are reduced and the method is suitable for scenes with real-time motion.
Terms used in the embodiments of the present application are explained below.
Dichromatic reflection model: describes the physical illumination of the surface of a non-homogeneous object. Light reflected by the object surface undergoes both diffuse reflection and specular reflection, and the spectral content of the reflected light is determined jointly by these two components. In the embodiments of the present application, the human body is treated as a non-homogeneous object.
Diffuse reflection: light that enters the object surface and, after multiple internal refractions, reflections and partial absorption, returns out of the surface; it is determined by the reflection characteristics of the object material.
Specular reflection: the direct reflection of the incident light at the object surface; it depends on the orientation of the surface relative to the light source and on the surface roughness, and its spectral composition approximates that of the light source.
Embodiments of the present application are described in detail below with reference to the accompanying drawings.
FIG. 1 schematically illustrates a reconstruction system architecture diagram provided by an embodiment of the present application; as shown in fig. 1, a camera 100 acquires an original image of a target object in a moving process in real time, and transmits the acquired original image to a reconstruction device 200 in a wired or wireless manner, and a dense surface model of the head is reconstructed by the reconstruction device 200 based on the received original image.
It should be noted that, the reconstruction device 200 provided in the embodiment of the present application is only an example, and includes, but is not limited to, a notebook computer, a desktop computer, a tablet, a smart phone, VR/AR glasses, and other display terminals with interactive functions.
It should be noted that, for the reconstruction device having the camera, the original image of the target object may also be acquired by the reconstruction device.
Fig. 2 schematically illustrates a flow chart of a method for three-dimensional reconstruction of a head according to an embodiment of the present application, as shown in fig. 2, where the flow is executed by a reconstruction device, and mainly includes the following steps:
s201: and acquiring an original image acquired by a camera, and carrying out face recognition on the original image to obtain a face image.
In S201, to solve the distortion of the head model, the original image needs to be segmented and identified to obtain a face image. Face image acquisition approaches include, but are not limited to, model-based detection methods (for example, hidden Markov models (Hidden Markov Model, HMM) and support vector machines (Support Vector Machine, SVM)), edge feature detection (for example, Canny edge detection, Sobel edge detection), statistical methods (for example, Bayesian learning, K-means clustering), and the like. Some embodiments of the present application consider the influence of factors such as illumination and pose during image acquisition, and therefore use convolutional neural networks (Convolutional Neural Networks, CNN) (e.g., the dlib or libfacedetection open-source libraries) to detect and output the face.
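By way of illustration only, the following sketch shows one possible way to crop the face region from a camera frame with the open-source dlib library mentioned above; the helper name and the choice of keeping the largest detected face are assumptions, not part of the claimed method.

```python
import cv2
import dlib

detector = dlib.get_frontal_face_detector()  # dlib's built-in frontal face detector

def crop_face(original_bgr):
    """Return the largest detected face sub-image from an original frame, or None."""
    gray = cv2.cvtColor(original_bgr, cv2.COLOR_BGR2GRAY)
    rects = detector(gray, 1)                # upsample once to catch smaller faces
    if len(rects) == 0:
        return None
    r = max(rects, key=lambda rc: rc.width() * rc.height())
    x0, y0 = max(r.left(), 0), max(r.top(), 0)
    x1 = min(r.right(), original_bgr.shape[1])
    y1 = min(r.bottom(), original_bgr.shape[0])
    return original_bgr[y0:y1, x0:x1]
```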
In some embodiments, the image acquisition process is affected by the illumination of the acquisition scene, and the illumination intensity at the human surface is effectively a combination of diffuse reflected light, representing the color of the human surface, and specular reflected light, representing the chromaticity of the light source. Because the materials of the human body surface differ and each organ forms a different angle with the light source, the brightness of different parts of the human body differs. In order to reduce the influence of illumination on the texture data, the face image needs to undergo a de-lighting (highlight removal) process.
Alternatively, after the chromaticity of the light source is estimated from the specular reflection, the specular reflection model is combined with the color information of the face image to remove highlights from the face image.
Alternatively, a highlight distribution dataset of face images is established in advance, the relation between specular reflection and diffuse reflection is learned with a cycle-consistent generative adversarial network (CycleGAN), and highlights are then removed from the face image based on that relation.
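As a minimal sketch only, the following code suppresses specular highlights under the dichromatic model, assuming the image has already been white-balanced so that the specular component is roughly achromatic; it is a crude stand-in for the light-source estimation or CycleGAN routes described above, and the scaling factor is an assumption.

```python
import numpy as np

def remove_highlights(face_rgb, k=0.9):
    """Rough specular suppression: treat the per-pixel minimum channel as the
    achromatic (specular) part and subtract a fraction of it."""
    img = face_rgb.astype(np.float32)
    spec = img.min(axis=2, keepdims=True)      # approximate specular contribution
    diffuse = np.clip(img - k * spec, 0, 255)  # keep a fraction to avoid over-darkening
    return diffuse.astype(np.uint8)
```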
S202: and extracting driving parameters from the face image, and driving the parameterized head model to move by using the extracted driving parameters to obtain the three-dimensional geometric model of the head after driving.
In S202, a parameterized head model is pre-constructed based on head parameters extracted from the initial face image. Classical parameterized head models mainly comprise three-dimensional deformable face models (3D Morphable Face Model,3DMM), FLAME models and the like, head parameters mainly comprise head shape parameters, facial expression parameters and head pose parameters, and the shape of the face can be regarded as the result of the combined action of the three parameters.
The parameterized head model expresses a human head model with real-time non-rigid deformation characteristics through a small amount of parameters, can generate a head three-dimensional geometric model with consistent topology based on a single picture, and is not influenced by geometric deletion of an invisible area.
In S202, a parameterized head model constructed on the basis of the FLAME model is employed. The parameterized head model is composed of standard linear blend skinning (Linear Blend Skinning, LBS) and blendshapes (BlendShape); the standard mesh model has N = 5023 mesh vertices and K = 4 joints (located at the neck, the jaw and the two eyeballs). The parameterized head model is formulated as:
M(β, θ, ψ) = W(T_P(β, θ, ψ), J(β), θ, ω)    (formula 1)
T_P(β, θ, ψ) = T + B_s(β; s) + B_p(θ; p) + B_e(ψ; e)    (formula 2)
where β represents the head shape parameters, θ represents the head pose parameters (including the motion parameters of the head skeleton), and ψ represents the facial expression parameters; M(β, θ, ψ) uniquely identifies the vertex coordinates of the head three-dimensional geometric model. W() denotes the linear skinning function that transforms the head model mesh T along the joints, J() denotes the function predicting the positions of the different head joints, T denotes the head model mesh, B_s() denotes the influence function of the head shape parameters on the head model mesh T, B_p() denotes the influence function of the head pose parameters on the head model mesh T, B_e() denotes the influence function of the facial expression parameters on the head model mesh T, and T_P() denotes the function that deforms the head model mesh T under the combined action of the head shape, head pose and facial expression parameters. s, p, e and ω denote the head shape weight, head pose weight, facial expression weight and skinning weight respectively, and are obtained by training on pre-constructed head sample data.
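For illustration only, the following numpy sketch mirrors the blendshape-plus-linear-blend-skinning structure of formulas 1 and 2 with the stated dimensions (5023 vertices, 4 joints); the joint regressor and pose-dependent correctives are simplified away, and all array names are assumptions rather than the actual FLAME implementation.

```python
import numpy as np

N_VERTS, N_JOINTS = 5023, 4

def deform_template(T_bar, shape_dirs, expr_dirs, beta, psi):
    """T_P ~ T + B_s(beta) + B_e(psi): add shape and expression offsets.
    shape_dirs: (N_VERTS, 3, len(beta)); expr_dirs: (N_VERTS, 3, len(psi))."""
    return T_bar + shape_dirs @ beta + expr_dirs @ psi

def linear_blend_skinning(verts, joint_transforms, skin_weights):
    """W(): blend per-joint rigid 4x4 transforms with the skinning weights omega.
    joint_transforms: (N_JOINTS, 4, 4); skin_weights: (N_VERTS, N_JOINTS)."""
    homo = np.concatenate([verts, np.ones((verts.shape[0], 1))], axis=1)   # (N, 4)
    posed = np.einsum('kij,nj->nki', joint_transforms, homo)               # (N, K, 4)
    blended = np.einsum('nk,nki->ni', skin_weights, posed)                 # (N, 4)
    return blended[:, :3]
```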
Fig. 3 schematically illustrates a relationship between three head parameters and a head model provided in an embodiment of the present application, where (a) part represents an influence of a head shape parameter on the geometric model, (b) part represents an influence of a head posture parameter on the geometric model, and (c) part represents an influence of a facial expression parameter on the geometric model.
In S202, the driving parameters, including the head pose parameters θ and the facial expression parameters ψ, are extracted from the face image (an RGB image) acquired in real time on the basis of the pre-constructed parameterized head model, and the parameterized head model is driven to move by the extracted driving parameters, obtaining the head three-dimensional geometric model after the driven motion.
In some embodiments, when the camera used is an RGBD camera, the obtained initial face image is an RGBD face image carrying depth information, and the parameterized head model may be optimized based on the depth information in the RGBD face image. Specifically, the depth information of the face is extracted from the RGBD face image, the extracted depth information is mapped into head parameters of the face, and the mapped head parameters are used to optimize the head shape parameters, facial expression parameters and head pose parameters extracted from the RGB image. This compensates for the relatively coarse result of the parameterized head model, improves the geometric accuracy of the head, and further enhances the realism of the head three-dimensional geometric model.
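A minimal sketch of such depth-based refinement is given below, assuming a set of facial landmark pixels, the camera intrinsics, and a helper model_landmarks_3d(beta) that returns the corresponding model vertices; all of these names are illustrative assumptions, not the specific mapping used in the embodiment.

```python
import numpy as np
from scipy.optimize import least_squares

def backproject(depth, pixels, fx, fy, cx, cy):
    """Lift 2D landmark pixels (u, v) with measured depth to camera-space 3D points."""
    u, v = pixels[:, 0], pixels[:, 1]
    z = depth[v.astype(int), u.astype(int)]
    return np.stack([(u - cx) * z / fx, (v - cy) * z / fy, z], axis=1)

def refine_shape(beta0, depth, landmark_pixels, model_landmarks_3d, intrinsics):
    """Adjust the shape parameters so model landmarks match depth-observed landmarks."""
    target = backproject(depth, landmark_pixels, *intrinsics)
    residuals = lambda beta: (model_landmarks_3d(beta) - target).ravel()
    return least_squares(residuals, beta0).x
```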
S203: and carrying out semantic segmentation on the face image to obtain each independent subarea.
In S203, face parsing is a special case of semantic image segmentation: given a single face image, a pixel-level label map of the different semantic components in the face image is computed (such as hair, facial skin, eyes, eyebrows, nose, mouth, ears and cheeks, where organs such as the nose, eyes and mouth are regarded as internal components of the face, while hair, hat, facial skin and the like are regarded as external components of the face).
In a specific implementation, the face image is first deformed to a preset scale by a region-of-interest tanh-warping method (Region of Interest Tanh-warping, RoI Tanh-warping), then input into a trained parsing model for semantic segmentation of the face, yielding sub-regions such as hair, facial skin, eyes, eyebrows, nose, mouth and ears; the segmented face image is then deformed back through the inverse of the RoI Tanh-warping. As shown in fig. 4a, part (a) is the original face image and part (b) is the segmented image.
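The sketch below illustrates the idea of RoI Tanh-warping in a simplified per-axis form: the face box stays roughly linear near its centre while the surrounding context is compressed by tanh into a fixed-size input for the parsing model. It is an assumption-laden simplification (no rotation, square output), not the exact formulation of the cited method.

```python
import cv2
import numpy as np

def roi_tanh_warp(image, roi, out_size=512):
    """Warp the full image into an out_size x out_size grid around the face box `roi`."""
    x0, y0, x1, y1 = roi
    cx, cy = (x0 + x1) / 2.0, (y0 + y1) / 2.0
    hx, hy = (x1 - x0) / 2.0, (y1 - y0) / 2.0
    t = np.linspace(-1 + 1e-3, 1 - 1e-3, out_size)          # normalized target coords
    tx, ty = np.meshgrid(t, t)
    map_x = (cx + hx * np.arctanh(tx)).astype(np.float32)   # inverse of the tanh warp
    map_y = (cy + hy * np.arctanh(ty)).astype(np.float32)
    return cv2.remap(image, map_x, map_y, cv2.INTER_LINEAR,
                     borderMode=cv2.BORDER_REPLICATE)
```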
In some embodiments, considering global features of the face image, when the face image is semantically segmented, a head accessory (such as a hat, a hairpin, etc.) sub-region may also be obtained, as shown in fig. 4 b.
S204: and respectively carrying out texture complementation on each subarea according to the difference value between the color value of each pixel point of each subarea and the color value of the mirror image pixel point.
The lack of face texture data mainly results from an excessive deviation of the face angle from the angle of the (single) camera, which to some extent produces holes and striped textures in the face (particularly in side-face areas such as the ears and cheeks). Exploiting the consistency of face skin color and the symmetry of the face, texture completion is performed on each sub-region according to the difference between the color value of each of its pixels and the color value of the mirror pixel. Since the sub-regions are independent of each other, and the same sub-region in successive frames of the face image can be texture-superimposed in the time domain (for example, the nose sub-region in the first frame and the nose sub-region in the second frame are texture-weighted), each sub-region can be completed separately.
The texture completion process is described below with respect to any one of the sub-regions as an example.
Any pixel is taken from the sub-region, and a first difference between the color value of that pixel and the color value of its mirror pixel is determined. The first difference is compared with a preset color threshold; if the first difference is larger than the preset color threshold, the skin color difference between the two symmetric pixels is too large and skin color consistency is not satisfied. A preset number of adjacent pixels is then selected from the target neighborhood of the pixel, and second differences between the color values of these adjacent pixels and the color values of their mirror pixels are determined respectively. If the number of adjacent pixels whose second difference exceeds a preset pixel threshold is larger than a preset value, the target neighborhood is determined to be a hole area, the texture data of the hole area are completed, and the texture-completed sub-region is obtained.
Optionally, the mirror pixel lies in the same sub-region. As shown in fig. 5, point Q is a pixel in the nose sub-region and the difference between the color values of Q and Q' exceeds the preset color threshold. The target neighborhood of Q is outlined by the irregular solid circle, and N (N = 6) adjacent pixels are selected from it; the color difference between 5 of these adjacent pixels (drawn as dotted circles in fig. 5) and their respective mirror pixels exceeds the preset pixel threshold, which is more than the preset value of 4, so the target neighborhood is determined to be a hole area requiring texture completion.
Optionally, the mirror pixel lies in a different sub-region. As shown in fig. 6, point P is a pixel in the left-ear sub-region, its mirror pixel P' is a pixel in the right-ear sub-region, and the difference between the color values of P and P' exceeds the preset color threshold. The target neighborhood of P is outlined by the irregular solid circle, and N (N = 6) adjacent pixels are selected from it; the color difference between 5 of these adjacent pixels (drawn as dotted circles in fig. 6) and their respective mirror pixels exceeds the preset pixel threshold, which is more than the preset value of 4, so the target neighborhood is determined to be a hole area requiring texture completion.
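A minimal sketch of this mirror-symmetry hole test is given below, assuming the sub-region image has already been resampled at the mirror positions (mirror_rgb), and that the thresholds, neighborhood radius and neighbor count are illustrative values rather than the preset ones of the embodiment.

```python
import numpy as np

def detect_holes(region_rgb, mirror_rgb, region_mask,
                 color_thresh=30.0, pixel_thresh=30.0,
                 min_bad_neighbors=4, radius=2):
    """Return a boolean mask of hole pixels inside one segmented sub-region."""
    diff = np.linalg.norm(region_rgb.astype(np.float32)
                          - mirror_rgb.astype(np.float32), axis=2)
    suspicious = (diff > color_thresh) & region_mask   # first-difference test
    bad = (diff > pixel_thresh) & region_mask          # per-neighbor mirror test
    holes = np.zeros_like(region_mask, dtype=bool)
    h, w = region_mask.shape
    for y, x in zip(*np.nonzero(suspicious)):
        y0, y1 = max(y - radius, 0), min(y + radius + 1, h)
        x0, x1 = max(x - radius, 0), min(x + radius + 1, w)
        # count disagreeing neighbors, excluding the centre pixel itself
        if bad[y0:y1, x0:x1].sum() - bad[y, x] > min_bad_neighbors:
            holes[y0:y1, x0:x1] = True                 # mark the whole neighborhood
    return holes
```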
Further, after the hole area is determined, its texture data are completed according to a preset initial color value, the color mean of the face image, and the weighted color value of the sub-region over multiple frames of face images. The completion formula is:
T_c = t + ε(x) + θ    (formula 3)
where t denotes the preset initial color value, ε(x) denotes the weighted color value of the sub-region corresponding to the pixel, and θ denotes the color mean of the face image.
The preset initial color value is set according to a face texture template. The face texture template is generated mainly by color-weighting a face image dataset to produce a template containing complete texture data, and the preset initial color value is the weighted color mean of that face image dataset.
In the embodiments of the present application, the camera acquires original images in real time during the interaction, so a multi-frame sequence of face images is obtained, and the color values of the same sub-region across the frames can be weighted. Specifically, taking the first frame of the face image as the starting image, the color values of the same sub-region in all frames are weighted to obtain the weighted color value of each sub-region. The multiple frames may be consecutive or non-consecutive.
For example, the color values of the nose sub-region in the first to tenth frames are weighted to obtain the weighted color value of the nose sub-region, and the color values of the left-ear sub-region in the first to tenth frames are weighted to obtain the weighted color value of the left-ear sub-region.
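The sketch below follows formula 3 literally: the temporally weighted sub-region color ε(x), the face-image mean θ and the template color t are summed (and clipped) to fill the detected hole pixels. The frame weights, and the assumption that the three terms are on a comparable scale, are illustrative choices.

```python
import numpy as np

def subregion_weighted_color(frames, masks, weights):
    """epsilon(x): weighted mean color of one sub-region across several frames."""
    acc = np.zeros(3, dtype=np.float64)
    for frame, mask, w in zip(frames, masks, weights):
        acc += w * frame[mask].mean(axis=0)
    return acc / sum(weights)

def fill_holes(face_rgb, hole_mask, template_color, eps_color):
    """Fill hole pixels with T_c = t + epsilon(x) + theta (formula 3)."""
    theta = face_rgb.reshape(-1, 3).mean(axis=0)     # theta: face-image mean color
    t_c = np.clip(template_color + eps_color + theta, 0, 255)
    out = face_rgb.copy()
    out[hole_mask] = t_c.astype(face_rgb.dtype)
    return out
```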
It should be noted that, in the embodiment of the present application, the selection of the starting image is not limited, and for example, an image with the largest visible area of the face may be selected as the starting image.
In some embodiments, owing to occlusion or the rotation angle, a pixel acquired from the sub-region may be a single pixel for which no corresponding mirror pixel exists; in that case the color value of the single pixel is set to the preset initial color value.
In other embodiments, when a pixel acquired from the sub-region has a corresponding mirror pixel and is a valid pixel outside the hole area, its color value is determined according to the color mean of the face image and the weighted color value of the sub-region over the multiple frames of face images.
S205: and fusing the all the sub-regions after the completion to obtain the complete texture data of the human face.
In S205, the texture data of the sub-regions are fused using the Poisson blending algorithm (Poisson Blending) to obtain the complete texture data of the face.
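For illustration, a completed sub-region could be blended back into the face texture with OpenCV's seamlessClone, which implements Poisson blending; the variable names and the use of the mask centroid as the blend centre are assumptions of this sketch.

```python
import cv2
import numpy as np

def fuse_subregion(face_texture_bgr, completed_region_bgr, region_mask_u8):
    """Poisson-blend one completed sub-region into the full face texture."""
    ys, xs = np.nonzero(region_mask_u8)
    center = (int(xs.mean()), int(ys.mean()))   # place the region where it belongs
    return cv2.seamlessClone(completed_region_bgr, face_texture_bgr,
                             region_mask_u8, center, cv2.NORMAL_CLONE)
```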
S206: and rendering the head three-dimensional geometric model according to the complete texture data to obtain a reconstructed head dense surface model.
In S206, the head three-dimensional geometric model is rendered according to the complete texture data, which improves the realism of the dense head surface model and thereby enhances the sense of immersion during interaction.
The completeness of the face texture data directly affects the realism of the reconstructed model; in the embodiments of the present application, texture completion exploits the symmetry of the face and the consistency of skin color. Fig. 7 is a complete flowchart of texture data completion provided in the embodiment of the present application; as shown in fig. 7, it mainly includes the following steps:
s701: and determining the color average value of the face image according to the color value of the pixel point in the face image.
S702: and acquiring a preset texture template to obtain a preset initial color value.
S703: and selecting an initial image, and carrying out color weighting on the same subareas in all subareas in the multi-frame face image to obtain respective color weighted values of all subareas.
In this step, the initial image may be the first frame image, or may be the image with the largest visible area of the face. Each subarea is a segmented area of the face image and comprises a plurality of subareas such as hair, facial skin, eyes, eyebrows, nose, mouth, ears and the like.
S704 to S705: and for any one sub-area in each sub-area, acquiring any one pixel point from the sub-area, determining whether the acquired pixel point is a single pixel point without mirror image pixel points, if so, executing S706, otherwise executing S707.
S706: the color value of the single pixel point is set to be a preset initial color value so as to complement the texture data of the single pixel point.
S707 to S708: and determining a first difference value between the acquired color value of the pixel point and the color value of the mirror image pixel point of the pixel point, and determining whether the first difference value is larger than a preset color threshold value, if so, executing S709, otherwise, executing S712.
In the step, if the first difference value is larger than a preset color threshold value, the pixel point is indicated to be an invalid cavity point according to the symmetry of the face and the consistency of the skin color.
S709: selecting a preset number of adjacent pixel points from the obtained target adjacent pixels of the pixel points, and respectively determining second difference values of the color values of the adjacent pixel points and the color values of the mirror image pixel points.
S710: and determining whether the number of adjacent pixel points corresponding to the second difference value larger than the preset color threshold value in each second difference value is larger than a preset value, if so, executing S711, otherwise, indicating that the acquired pixel point is an effective pixel point, and executing S712.
In this step, if the number of adjacent pixel points whose second difference value is greater than the preset color threshold exceeds the preset value, there are many hole points in the target neighborhood; the target neighborhood is likely to be a hole area with severe texture loss and needs to be completed.
S711: determining the target neighborhood as a hole area, and complementing texture data of the hole area according to a preset initial color value, a color average value of a face image and a color weighting value of a sub-area corresponding to the pixel point in the multi-frame face image.
S712: and determining the color value of the effective pixel point according to the color mean value of the face image and the color weighting value of the sub-region corresponding to the pixel point in the multi-frame face image.
Based on the same technical concept, the embodiments of the present application provide a reconstruction apparatus, which may perform the head three-dimensional reconstruction method provided by the embodiments of the present application, and may achieve the same technical effects, and is not repeated here.
Referring to fig. 8, the reconstruction device comprises a memory 801 and a processor 802. The memory 801 is configured to store computer program instructions and a pre-built parameterized head model, and the processor 802 is configured to perform the following operations in accordance with the computer program instructions stored by the memory 801:
acquiring an original image acquired by a camera, and carrying out face recognition on the original image to obtain a face image;
extracting driving parameters from the face image, and driving the parameterized head model to move by using the extracted driving parameters to obtain a three-dimensional geometrical model of the head after driving, wherein the parameterized head model is pre-constructed based on the head parameters extracted from the initial face image;
carrying out semantic segmentation on the face image to obtain each independent subarea;
respectively carrying out texture complementation on each subarea according to the difference value between the color value of each pixel point of each subarea and the color value of the mirror image pixel point;
fusing the sub-regions after the completion to obtain the complete texture data of the human face;
and rendering the head three-dimensional geometric model according to the complete texture data to obtain a reconstructed head dense surface model.
Optionally, the processor 802 is configured to perform texture complementation on each sub-region according to a difference between a color value of a pixel point of each sub-region and a color value of a mirror image pixel point, and specifically configured to:
for any one sub-area in each sub-area, any one pixel point is obtained from the sub-area, and a first difference value between the obtained color value of the pixel point and the color value of the mirror image pixel point of the pixel point is determined;
if the first difference value is larger than the preset color threshold value, selecting a preset number of adjacent pixel points from the target neighborhood of the pixel point, and respectively determining second difference values between the color values of the adjacent pixel points and the color values of the mirror image pixel points;
if the number of the adjacent pixel points corresponding to the second difference value larger than the preset color threshold value in each second difference value is larger than the preset value, determining the target neighborhood as a hole area, and complementing the texture data of the hole area to obtain a sub-area after the texture is complemented.
Optionally, the processor 802 complements texture data of the hole area, specifically configured to:
and supplementing texture data of the cavity area according to the preset initial color value, the color average value of the face image and the color weighted value of the sub-area in the multi-frame face image.
Optionally, the processor 802 is further configured to:
aiming at a single pixel point in which no corresponding mirror image pixel point exists in the subarea, setting the color value of the single pixel point as a preset initial color value; the method comprises the steps of,
and aiming at the effective pixel points which are corresponding to the mirror image pixel points but are not in the cavity area, determining the color value of the effective pixel points according to the color average value of the face image and the color weighting value of the sub-area in the multi-frame face image.
Optionally, when the camera is an RGBD camera, the initial face image is an RGBD face image, and the processor 802 constructs a parameterized head model in advance based on the head parameters extracted from the initial face image, specifically configured to:
face depth information is extracted from RGBD face images;
optimizing head parameters according to the face depth information;
and constructing a parameterized head model in advance based on the optimized head parameters.
Optionally, the head parameters include a head shape parameter, a head posture parameter, and a facial expression parameter, and the parameterized head model formula is:
M(β, θ, ψ) = W(T_P(β, θ, ψ), J(β), θ, ω)    (formula 1)
T_P(β, θ, ψ) = T + B_s(β; s) + B_p(θ; p) + B_e(ψ; e)    (formula 2)
where β represents the head shape parameters, θ represents the head pose parameters, and ψ represents the facial expression parameters; W() denotes the linear skinning function, J() denotes the function predicting the positions of the different head nodes, T denotes the head model mesh, B_s() denotes the influence function of the head shape parameters on the head model mesh T, B_p() denotes the influence function of the head pose parameters on the head model mesh T, B_e() denotes the influence function of the facial expression parameters on the head model mesh T, and T_P() denotes the function that deforms the head model mesh T under the combined action of the head shape, head pose and facial expression parameters; s, p, e and ω denote the head shape weight, head pose weight, facial expression weight and skinning weight respectively.
It should be noted that fig. 8 only shows the hardware necessary for implementing the embodiments of the present application, and may include conventional hardware such as a display, a controller, and a speaker.
It should be noted that the processor referred to in the embodiments of the present application may be a central processing unit (CPU), a general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field programmable gate array (FPGA) or other programmable logic device, a transistor logic device, a hardware component, or any combination thereof, which may implement or perform the various exemplary logic blocks, modules and circuits described in connection with this disclosure. A processor may also be a combination that performs computing functions, for example, a combination including one or more microprocessors, or a combination of a DSP and a microprocessor. The memory may be integrated into the processor or may be provided separately from the processor.
The present application also provides a computer-readable storage medium storing computer-executable instructions for causing a computer to perform the methods of the above embodiments.
The present application also provides a computer program product for storing a computer program for performing the method of the foregoing embodiments.
Finally, it should be noted that: the above embodiments are only for illustrating the technical solution of the present application, and not for limiting the same; although the present application has been described in detail with reference to the foregoing embodiments, it should be understood by those of ordinary skill in the art that: the technical scheme described in the foregoing embodiments can be modified or some or all of the technical features thereof can be replaced by equivalents; such modifications and substitutions do not depart from the spirit of the corresponding technical solutions from the scope of the technical solutions of the embodiments of the present application.
The foregoing description, for purposes of explanation, has been presented in conjunction with specific embodiments. However, the illustrative discussions above are not intended to be exhaustive or to limit the embodiments to the precise forms disclosed above. Many modifications and variations are possible in light of the above teaching. The embodiments were chosen and described in order to best explain the principles and the practical application, to thereby enable others skilled in the art to best utilize the embodiments and various embodiments with various modifications as are suited to the particular use contemplated.

Claims (8)

1. A method of three-dimensional reconstruction of a head, comprising:
acquiring an original image acquired by a camera, and carrying out face recognition on the original image to obtain a face image;
extracting driving parameters from the face image, and driving a parameterized head model to move by using the extracted driving parameters to obtain a three-dimensional geometrical model of the head after driving, wherein the parameterized head model is pre-constructed based on head parameters extracted from the initial face image;
carrying out semantic segmentation on the face image to obtain each independent subarea;
for any one sub-area in each sub-area, any one pixel point is obtained from the sub-area, and a first difference value between the obtained color value of the pixel point and the color value of the mirror image pixel point of the pixel point is determined;
if the first difference value is larger than a preset color threshold value, selecting a preset number of adjacent pixel points from the target adjacent areas of the pixel points, and respectively determining second difference values of the color values of the adjacent pixel points and the color values of the mirror image pixel points;
if the number of the adjacent pixel points corresponding to the second difference value larger than the preset color threshold value in each second difference value is larger than the preset value, determining the target neighborhood as a cavity area, and complementing the texture data of the cavity area to obtain the sub-area after the texture is complemented;
fusing the sub-regions after the completion to obtain the complete texture data of the human face;
and rendering the head three-dimensional geometric model according to the complete texture data to obtain a reconstructed head dense surface model.
2. The method of claim 1, wherein the complementing the texture data of the cavity area comprises:
and supplementing texture data of the cavity area according to a preset initial color value, a color average value of the face image and a color weighting value of the subarea in the multi-frame face image.
3. The method of claim 1, wherein the method further comprises:
setting the color value of a single pixel point which does not exist in the subarea and corresponds to the mirror image pixel point as a preset initial color value; and
and aiming at the effective pixel points which are corresponding to the mirror image pixel points but are not in the cavity area, determining the color value of the effective pixel points according to the color average value of the face image and the color weighting value of the subareas in the multi-frame face image.
4. A method according to any one of claims 1-3, wherein when the camera is an RGBD camera, the initial face image is an RGBD face image, and the pre-constructing a parameterized head model based on head parameters extracted from the initial face image comprises:
extracting face depth information from the RGBD face image;
optimizing the head parameters according to the face depth information;
and constructing a parameterized head model in advance based on the optimized head parameters.
5. A method according to any one of claims 1-3, wherein the head parameters include a head shape parameter, a head pose parameter, a facial expression parameter, and the parameterized head model formula is:
M(β, θ, ψ) = W(T_P(β, θ, ψ), J(β), θ, ω)
T_P(β, θ, ψ) = T + B_s(β; s) + B_p(θ; p) + B_e(ψ; e)
wherein β represents the head shape parameter, θ represents the head posture parameter, and ψ represents the facial expression parameter; W() denotes the linear skinning function, J() denotes the function predicting the positions of the different head nodes, T denotes the head model mesh, B_s() denotes the influence function of the head shape parameter on the head model mesh T, B_p() denotes the influence function of the head posture parameter on the head model mesh T, B_e() denotes the influence function of the facial expression parameter on the head model mesh T, and T_P() denotes the function that deforms the head model mesh T under the combined action of the head shape parameter, head posture parameter and facial expression parameter; s, p, e and ω respectively represent the head shape weight, head posture weight, facial expression weight and skinning weight.
6. A reconstruction device, comprising a memory, a processor;
the memory is configured to store computer program instructions and a pre-constructed parameterized header model;
the processor is configured to perform the following operations in accordance with the computer program instructions:
acquiring an original image acquired by a camera, and carrying out face recognition on the original image to obtain a face image;
extracting driving parameters from the face image, and driving a parameterized head model to move by using the extracted driving parameters to obtain a three-dimensional geometrical model of the head after driving, wherein the parameterized head model is pre-constructed based on head parameters extracted from the initial face image;
carrying out semantic segmentation on the face image to obtain each independent subarea;
for any one sub-area in each sub-area, any one pixel point is obtained from the sub-area, and a first difference value between the obtained color value of the pixel point and the color value of the mirror image pixel point of the pixel point is determined;
if the first difference value is larger than a preset color threshold value, selecting a preset number of adjacent pixel points from the target adjacent areas of the pixel points, and respectively determining second difference values of the color values of the adjacent pixel points and the color values of the mirror image pixel points;
if the number of the adjacent pixel points corresponding to the second difference value larger than the preset color threshold value in each second difference value is larger than the preset value, determining the target neighborhood as a cavity area, and complementing the texture data of the cavity area to obtain the sub-area after the texture is complemented;
fusing the sub-regions after the completion to obtain the complete texture data of the human face;
and rendering the head three-dimensional geometric model according to the complete texture data to obtain a reconstructed head dense surface model.
7. The reconstruction device of claim 6, wherein the processor complements texture data of the cavity area, and is specifically configured to:
and supplementing texture data of the cavity area according to a preset initial color value, a color average value of the face image and a color weighting value of the subarea in the multi-frame face image.
8. The reconstruction device of claim 6 wherein the processor is further configured to:
setting the color value of a single pixel point which does not exist in the subarea and corresponds to the mirror image pixel point as a preset initial color value; and
and aiming at the effective pixel points which are corresponding to the mirror image pixel points but are not in the cavity area, determining the color value of the effective pixel points according to the color average value of the face image and the color weighting value of the subareas in the multi-frame face image.
CN202110921998.9A 2021-08-12 2021-08-12 Head three-dimensional reconstruction method and device Active CN113628327B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110921998.9A CN113628327B (en) 2021-08-12 2021-08-12 Head three-dimensional reconstruction method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110921998.9A CN113628327B (en) 2021-08-12 2021-08-12 Head three-dimensional reconstruction method and device

Publications (2)

Publication Number Publication Date
CN113628327A CN113628327A (en) 2021-11-09
CN113628327B true CN113628327B (en) 2023-07-25

Family

ID=78384685

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110921998.9A Active CN113628327B (en) 2021-08-12 2021-08-12 Head three-dimensional reconstruction method and device

Country Status (1)

Country Link
CN (1) CN113628327B (en)

Families Citing this family (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114049464A (en) * 2021-11-15 2022-02-15 聚好看科技股份有限公司 Reconstruction method and device of three-dimensional model
CN114140580A (en) * 2021-11-22 2022-03-04 聚好看科技股份有限公司 Texture adjusting method and equipment for hand three-dimensional model
CN114373043A (en) * 2021-12-16 2022-04-19 聚好看科技股份有限公司 Head three-dimensional reconstruction method and equipment
CN114339190B (en) * 2021-12-29 2023-06-23 中国电信股份有限公司 Communication method, device, equipment and storage medium
CN114648613B (en) * 2022-05-18 2022-08-23 杭州像衍科技有限公司 Three-dimensional head model reconstruction method and device based on deformable nerve radiation field
CN115049016A (en) * 2022-07-20 2022-09-13 聚好看科技股份有限公司 Model driving method and device based on emotion recognition
CN116246014A (en) * 2022-12-28 2023-06-09 支付宝(杭州)信息技术有限公司 Image generation method and device, storage medium and electronic equipment

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108765550A (en) * 2018-05-09 2018-11-06 华南理工大学 A kind of three-dimensional facial reconstruction method based on single picture
CN109377557A (en) * 2018-11-26 2019-02-22 中山大学 Real-time three-dimensional facial reconstruction method based on single frames facial image
CN109410133A (en) * 2018-09-30 2019-03-01 北京航空航天大学青岛研究院 A kind of face texture repairing method based on 3DMM
CN110197462A (en) * 2019-04-16 2019-09-03 浙江理工大学 A kind of facial image beautifies in real time and texture synthesis method
CN111160136A (en) * 2019-12-12 2020-05-15 天目爱视(北京)科技有限公司 Standardized 3D information acquisition and measurement method and system
CN111445582A (en) * 2019-01-16 2020-07-24 南京大学 Single-image human face three-dimensional reconstruction method based on illumination prior
CN113066171A (en) * 2021-04-20 2021-07-02 南京大学 Face image generation method based on three-dimensional face deformation model

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP3994896B2 (en) * 2002-09-25 2007-10-24 コニカミノルタホールディングス株式会社 Video display device

Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108765550A (en) * 2018-05-09 2018-11-06 华南理工大学 A kind of three-dimensional facial reconstruction method based on single picture
CN109410133A (en) * 2018-09-30 2019-03-01 北京航空航天大学青岛研究院 A kind of face texture repairing method based on 3DMM
CN109377557A (en) * 2018-11-26 2019-02-22 中山大学 Real-time three-dimensional facial reconstruction method based on single frames facial image
CN111445582A (en) * 2019-01-16 2020-07-24 南京大学 Single-image human face three-dimensional reconstruction method based on illumination prior
CN110197462A (en) * 2019-04-16 2019-09-03 浙江理工大学 A kind of facial image beautifies in real time and texture synthesis method
CN111160136A (en) * 2019-12-12 2020-05-15 天目爱视(北京)科技有限公司 Standardized 3D information acquisition and measurement method and system
CN113066171A (en) * 2021-04-20 2021-07-02 南京大学 Face image generation method based on three-dimensional face deformation model

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Henriquez P. et al.; "An unobtrusive intelligent multisensory mirror for well-being status self-assessment and visualization"; IEEE Transactions on Multimedia; pp. 1467-1481 *

Also Published As

Publication number Publication date
CN113628327A (en) 2021-11-09

Similar Documents

Publication Publication Date Title
CN113628327B (en) Head three-dimensional reconstruction method and device
Grassal et al. Neural head avatars from monocular rgb videos
US10559111B2 (en) Systems and methods for generating computer ready animation models of a human head from captured data images
KR102616010B1 (en) System and method for photorealistic real-time human animation
Sharma et al. 3d face reconstruction in deep learning era: A survey
US10217275B2 (en) Methods and systems of performing eye reconstruction using a parametric model
WO2022143645A1 (en) Three-dimensional face reconstruction method and apparatus, device, and storage medium
US10217265B2 (en) Methods and systems of generating a parametric eye model
CN111652123B (en) Image processing and image synthesizing method, device and storage medium
US11562536B2 (en) Methods and systems for personalized 3D head model deformation
US11587288B2 (en) Methods and systems for constructing facial position map
JP7462120B2 (en) Method, system and computer program for extracting color from two-dimensional (2D) facial images
WO2023066120A1 (en) Image processing method and apparatus, electronic device, and storage medium
AU2022231680B2 (en) Techniques for re-aging faces in images and video frames
CN114450719A (en) Human body model reconstruction method, reconstruction system and storage medium
KR20230110787A (en) Methods and systems for forming personalized 3D head and face models
US11769309B2 (en) Method and system of rendering a 3D image for automated facial morphing with a learned generic head model
CN113822965A (en) Image rendering processing method, device and equipment and computer storage medium
CN114373043A (en) Head three-dimensional reconstruction method and equipment
US20210074076A1 (en) Method and system of rendering a 3d image for automated facial morphing

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant