CN113628327B - Head three-dimensional reconstruction method and device - Google Patents

Head three-dimensional reconstruction method and device

Info

Publication number
CN113628327B
CN113628327B (application CN202110921998.9A)
Authority
CN
China
Prior art keywords
head
color
value
face image
model
Prior art date
Legal status
Active
Application number
CN202110921998.9A
Other languages
Chinese (zh)
Other versions
CN113628327A (en)
Inventor
刘帅
任子健
吴连朋
Current Assignee
Juhaokan Technology Co Ltd
Original Assignee
Juhaokan Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Juhaokan Technology Co Ltd filed Critical Juhaokan Technology Co Ltd
Priority to CN202110921998.9A
Publication of CN113628327A
Application granted
Publication of CN113628327B
Legal status: Active


Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T17/00Three dimensional [3D] modelling, e.g. data description of 3D objects
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T15/003D [Three Dimensional] image rendering
    • G06T15/04Texture mapping
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02TCLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00Road transport of goods or passengers
    • Y02T10/10Internal combustion engine [ICE] based vehicles
    • Y02T10/40Engine management systems

Abstract

The application relates to the technical field of three-dimensional reconstruction and provides a head three-dimensional reconstruction method and device. A face image is obtained by performing face recognition on an original image; driving parameters are extracted from the face image and used to drive a pre-constructed parameterized head model, obtaining a driven head three-dimensional geometric model; semantic segmentation is performed on the face image to obtain independent sub-regions; taking the symmetry of the face and the consistency of skin color into account, texture completion is performed on each sub-region according to the difference between the color value of each of its pixels and the color value of the corresponding mirror pixel; the completed sub-regions are fused to obtain the complete face texture data; and the head three-dimensional geometric model is rendered according to the complete texture data, which improves the realism of the reconstructed dense head surface model.

Description

Head three-dimensional reconstruction method and device
Technical Field
The present disclosure relates to the field of three-dimensional reconstruction technologies, and in particular, to a method and apparatus for three-dimensional reconstruction of a head.
Background
Head reconstruction is widely used in game character modeling, virtual reality applications, virtual fitting, personalized statue customization, and the like. Existing head reconstruction approaches include: 1) modeling the head with professional modeling software (such as Maya or 3ds Max), which requires the user to know the software in depth and to master related artistic skills, so personalized customization is not easy; 2) reconstructing from head data collected by professional three-dimensional laser scanning equipment, which is hard to popularize because of the high cost of such equipment; 3) driving the motion of head models in a database (a human head database), which, compared with the former two approaches, is more amenable to widespread application.
A parameterized head model can be used to reconstruct a three-dimensional model of the human head from a single picture and thus to generate a human head database. A parameterized head model is obtained by performing dimensionality-reduction analysis (such as principal component analysis or a network auto-encoder) on pre-acquired high-precision three-dimensional head data to obtain a group of basis functions, and different head models are generated by mixing these basis functions linearly or nonlinearly. The mixing parameters of the basis functions constitute the parameterization of the human head.
Generally, when a three-dimensional head model is reconstructed from a single picture with a parameterized head model, the head geometry keeps a complete topology, but the head texture data are limited by the shooting angle of the camera and the face texture cannot be acquired completely, so the reconstructed three-dimensional head model (particularly the side-face area) is distorted, which degrades the three-dimensional expression of the reconstructed model.
At present there are two main technical schemes for addressing the distortion of the three-dimensional head model. In the first, a single image is used to render face images with occlusion artifacts and flaws in which the 3D head model is rotated from an arbitrary angle to the current angle; these renderings and the original image form training data pairs for self-supervised training of a deep learning model, and side-face generation and face completion are then performed with the trained model. The reconstruction involves two face texture acquisitions, two three-dimensional rotations and two renderings; it preserves the original texture detail and illumination, but the computation is complex, cannot yet run in real time, and is therefore poorly suited to real-time communication scenarios. In the second, texture pictures collected by multiple cameras are fused region by region in a certain manner to obtain complete face texture data; however, the fusion quality is limited by the arrangement of the acquisition equipment and the choice of acquisition scene, and because the fusion relies only on images collected by cameras placed at multiple angles, texture deviations appear when the face moves quickly and the face cannot be completed.
Disclosure of Invention
The embodiments of the present application provide a head three-dimensional reconstruction method and device, which are used to complete the texture data of a human face and improve the realism of the reconstructed head model.
In a first aspect, an embodiment of the present application provides a method for three-dimensional reconstruction of a head, including:
acquiring an original image acquired by a camera, and carrying out face recognition on the original image to obtain a face image;
extracting driving parameters from the face image, and driving a parameterized head model to move by using the extracted driving parameters to obtain a three-dimensional geometrical model of the head after driving, wherein the parameterized head model is pre-constructed based on head parameters extracted from the initial face image;
carrying out semantic segmentation on the face image to obtain each independent subarea;
respectively carrying out texture complementation on each subarea according to the difference value between the color value of each pixel point of each subarea and the color value of the mirror image pixel point;
fusing the sub-regions after the completion to obtain the complete texture data of the human face;
and rendering the head three-dimensional geometric model according to the complete texture data to obtain a reconstructed head dense surface model.
In a second aspect, an embodiment of the present application provides a reconstruction device, including a memory, a processor;
the memory is configured to store computer program instructions and a preset parameterized header model;
the processor is configured to perform the following operations in accordance with the computer program instructions:
acquiring an original image acquired by a camera, and carrying out face recognition on the original image to obtain a face image;
extracting driving parameters from the face image, and driving a parameterized head model to move by using the extracted driving parameters to obtain a three-dimensional geometrical model of the head after driving, wherein the parameterized head model is pre-constructed based on head parameters extracted from the initial face image;
carrying out semantic segmentation on the face image to obtain each independent subarea;
respectively carrying out texture complementation on each subarea according to the difference value between the color value of each pixel point of each subarea and the color value of the mirror image pixel point;
fusing the sub-regions after the completion to obtain the complete texture data of the human face;
and rendering the head three-dimensional geometric model according to the complete texture data to obtain a reconstructed head dense surface model.
In a third aspect, the present application provides a computer-readable storage medium storing computer-executable instructions for causing a computer to perform the head three-dimensional reconstruction method provided by the embodiments of the present application.
In the above embodiments of the present application, a face image is obtained by recognition from the original image, and the driving parameters extracted from the face image are used to drive the parameterized head model pre-constructed from head parameters, obtaining the head three-dimensional geometric model. The face image is semantically segmented to obtain independent sub-regions. Considering that the deviation between the face angle and the camera angle causes the face texture data (particularly in the side-face region) to be partially missing, the symmetry of the face and the consistency of skin color are exploited: texture completion is performed on each sub-region according to the differences between the color values of its pixels and the color values of the mirror pixels, so as to obtain complete face texture data. The head three-dimensional geometric model is then rendered based on the complete texture data, which improves the realism of the reconstructed dense head surface model and further enhances the sense of immersion in remote three-dimensional interaction.
Drawings
In order to more clearly illustrate the embodiments of the present application or the technical solutions in the prior art, the drawings required in the description of the embodiments or the prior art are briefly introduced below. Obviously, the drawings described below are only some embodiments of the present application; other drawings may be obtained from them by a person skilled in the art without inventive effort.
FIG. 1 schematically illustrates a reconstruction system architecture diagram provided by an embodiment of the present application;
FIG. 2 illustrates a flow chart of a method for three-dimensional reconstruction of a head provided in an embodiment of the present application;
FIG. 3 schematically illustrates a relationship between three head parameters and a head model according to an embodiment of the present application;
fig. 4a illustrates a schematic view of face semantic segmentation provided in an embodiment of the present application;
FIG. 4b illustrates another face semantic segmentation schematic provided by an embodiment of the present application;
FIG. 5 schematically illustrates a hole area determination scheme provided in an embodiment of the present application;
FIG. 6 schematically illustrates another hole area determination scheme provided by embodiments of the present application;
FIG. 7 illustrates a complete flow chart of texture data completion provided by embodiments of the present application;
fig. 8 illustrates a hardware structure diagram of a reconstruction device provided in an embodiment of the present application.
Detailed Description
For clarity of the purposes, embodiments and advantages of the present application, the exemplary embodiments of the present application are described below clearly and completely with reference to the accompanying drawings. Obviously, the described exemplary embodiments are only some, not all, of the embodiments of the present application.
Based on the exemplary embodiments described herein, all other embodiments obtained by a person of ordinary skill in the art without inventive effort fall within the scope of the appended claims. Furthermore, while the disclosure is presented in terms of one or more exemplary embodiments, it should be appreciated that individual aspects of the disclosure may each constitute a complete embodiment on their own.
It should be noted that the brief description of the terms in the present application is only for convenience in understanding the embodiments described below, and is not intended to limit the embodiments of the present application. Unless otherwise indicated, these terms should be construed in their ordinary and customary meaning.
Furthermore, the terms "comprise" and "have," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a product or apparatus that comprises a list of elements is not necessarily limited to those elements expressly listed, but may include other elements not expressly listed or inherent to such product or apparatus.
The term "module" as used in this application refers to any known or later developed hardware, software, firmware, artificial intelligence, fuzzy logic, or combination of hardware and/or software code that is capable of performing the function associated with that element.
The embodiments of the present application provide a head three-dimensional reconstruction method and device. Semantic segmentation is performed on the face image to distinguish sub-regions such as eyebrows, eyes, nose, lips, ears, cheeks, hair and accessories; the symmetry of the face and the consistency of skin color are used to complete the missing texture data in each sub-region; the completed texture data of the sub-regions are fused to obtain the complete face texture data; and the driven head three-dimensional geometric model obtained from the parameterized head model is texture-rendered, so that the realism of the reconstructed model is improved while the head topology remains unchanged. Compared with the deep learning algorithm for solving head distortion, no training samples need to be constructed in advance for model training, the calculation process is simple, and real-time performance is strong; compared with fusing multiple images shot by multiple cameras, the equipment requirements are reduced and the method is suitable for scenes with real-time motion.
Terms used in the embodiments of the present application are explained below.
Dichromatic reflection model: describes the physical illumination of the surface of a non-homogeneous object. Light reflected by the object surface undergoes both diffuse reflection and specular reflection, and the spectral content of the reflected light is determined jointly by these two components. In the embodiments of the present application, the human body is treated as a non-homogeneous object.
Diffuse reflection: light that enters the object surface and, after multiple internal refractions, reflections and partial absorption, returns out of the surface; it is determined by the reflection characteristics of the object material.
Specular reflection: the direct reflection of the incident light at the object surface; it depends on the orientation of the surface relative to the light source and on the surface roughness, and its spectral composition approximates that of the light source.
Embodiments of the present application are described in detail below with reference to the accompanying drawings.
FIG. 1 schematically illustrates a reconstruction system architecture diagram provided by an embodiment of the present application; as shown in fig. 1, a camera 100 acquires an original image of a target object in a moving process in real time, and transmits the acquired original image to a reconstruction device 200 in a wired or wireless manner, and a dense surface model of the head is reconstructed by the reconstruction device 200 based on the received original image.
It should be noted that, the reconstruction device 200 provided in the embodiment of the present application is only an example, and includes, but is not limited to, a notebook computer, a desktop computer, a tablet, a smart phone, VR/AR glasses, and other display terminals with interactive functions.
It should be noted that, for the reconstruction device having the camera, the original image of the target object may also be acquired by the reconstruction device.
Fig. 2 schematically illustrates a flow chart of a method for three-dimensional reconstruction of a head according to an embodiment of the present application, as shown in fig. 2, where the flow is executed by a reconstruction device, and mainly includes the following steps:
s201: and acquiring an original image acquired by a camera, and carrying out face recognition on the original image to obtain a face image.
In S201, to solve the distortion of the head model, the original image needs to be segmented and identified to obtain a face image. Face image acquisition approaches include, but are not limited to, model-based detection methods (for example, hidden Markov models (Hidden Markov Model, HMM) and support vector machines (Support Vector Machine, SVM)), edge feature detection (for example, Canny edge detection, Sobel edge detection), statistical methods (for example, Bayesian learning, K-means clustering), and the like. Some embodiments of the present application consider the influence of factors such as illumination and pose during image acquisition, and therefore use convolutional neural networks (Convolutional Neural Networks, CNN) (e.g., the dlib or libfacedetection open-source libraries) to detect and output the face.
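By way of illustration only, the following sketch shows one possible way to crop the face region from a camera frame with the open-source dlib library mentioned above; the helper name and the choice of keeping the largest detected face are assumptions, not part of the claimed method.

```python
import cv2
import dlib

detector = dlib.get_frontal_face_detector()  # dlib's built-in frontal face detector

def crop_face(original_bgr):
    """Return the largest detected face sub-image from an original frame, or None."""
    gray = cv2.cvtColor(original_bgr, cv2.COLOR_BGR2GRAY)
    rects = detector(gray, 1)                # upsample once to catch smaller faces
    if len(rects) == 0:
        return None
    r = max(rects, key=lambda rc: rc.width() * rc.height())
    x0, y0 = max(r.left(), 0), max(r.top(), 0)
    x1 = min(r.right(), original_bgr.shape[1])
    y1 = min(r.bottom(), original_bgr.shape[0])
    return original_bgr[y0:y1, x0:x1]
```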
In some embodiments, the image acquisition process is affected by the illumination of the acquisition scene, and the illumination intensity at the human surface is effectively a combination of diffuse reflected light, representing the color of the human surface, and specular reflected light, representing the chromaticity of the light source. Because the materials of the human body surface differ and each organ forms a different angle with the light source, the brightness of different parts of the human body differs. In order to reduce the influence of illumination on the texture data, the face image needs to undergo a de-lighting (highlight removal) process.
Alternatively, after the chromaticity of the light source is estimated from the specular reflection, the specular reflection model is combined with the color information of the face image to remove highlights from the face image.
Alternatively, a highlight distribution dataset of face images is established in advance, the relation between specular reflection and diffuse reflection is learned with a cycle-consistent generative adversarial network (CycleGAN), and highlights are then removed from the face image based on that relation.
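As a minimal sketch only, the following code suppresses specular highlights under the dichromatic model, assuming the image has already been white-balanced so that the specular component is roughly achromatic; it is a crude stand-in for the light-source estimation or CycleGAN routes described above, and the scaling factor is an assumption.

```python
import numpy as np

def remove_highlights(face_rgb, k=0.9):
    """Rough specular suppression: treat the per-pixel minimum channel as the
    achromatic (specular) part and subtract a fraction of it."""
    img = face_rgb.astype(np.float32)
    spec = img.min(axis=2, keepdims=True)      # approximate specular contribution
    diffuse = np.clip(img - k * spec, 0, 255)  # keep a fraction to avoid over-darkening
    return diffuse.astype(np.uint8)
```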
S202: and extracting driving parameters from the face image, and driving the parameterized head model to move by using the extracted driving parameters to obtain the three-dimensional geometric model of the head after driving.
In S202, a parameterized head model is pre-constructed based on head parameters extracted from the initial face image. Classical parameterized head models mainly comprise three-dimensional deformable face models (3D Morphable Face Model,3DMM), FLAME models and the like, head parameters mainly comprise head shape parameters, facial expression parameters and head pose parameters, and the shape of the face can be regarded as the result of the combined action of the three parameters.
The parameterized head model expresses a human head model with real-time non-rigid deformation characteristics through a small amount of parameters, can generate a head three-dimensional geometric model with consistent topology based on a single picture, and is not influenced by geometric deletion of an invisible area.
In S202, a parameterized head model constructed on the basis of the FLAME model is employed. The parameterized head model is composed of standard linear blend skinning (Linear Blend Skinning, LBS) and blendshapes (BlendShape); the standard mesh model has N = 5023 mesh vertices and K = 4 joints (located at the neck, the jaw and the two eyeballs). The parameterized head model is formulated as:
M(β, θ, ψ) = W(T_P(β, θ, ψ), J(β), θ, ω)    (formula 1)
T_P(β, θ, ψ) = T + B_s(β; s) + B_p(θ; p) + B_e(ψ; e)    (formula 2)
where β represents the head shape parameters, θ represents the head pose parameters (including the motion parameters of the head skeleton), and ψ represents the facial expression parameters; M(β, θ, ψ) uniquely identifies the vertex coordinates of the head three-dimensional geometric model. W() denotes the linear skinning function that transforms the head model mesh T along the joints, J() denotes the function predicting the positions of the different head joints, T denotes the head model mesh, B_s() denotes the influence function of the head shape parameters on the head model mesh T, B_p() denotes the influence function of the head pose parameters on the head model mesh T, B_e() denotes the influence function of the facial expression parameters on the head model mesh T, and T_P() denotes the function that deforms the head model mesh T under the combined action of the head shape, head pose and facial expression parameters. s, p, e and ω denote the head shape weight, head pose weight, facial expression weight and skinning weight respectively, and are obtained by training on pre-constructed head sample data.
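For illustration only, the following numpy sketch mirrors the blendshape-plus-linear-blend-skinning structure of formulas 1 and 2 with the stated dimensions (5023 vertices, 4 joints); the joint regressor and pose-dependent correctives are simplified away, and all array names are assumptions rather than the actual FLAME implementation.

```python
import numpy as np

N_VERTS, N_JOINTS = 5023, 4

def deform_template(T_bar, shape_dirs, expr_dirs, beta, psi):
    """T_P ~ T + B_s(beta) + B_e(psi): add shape and expression offsets.
    shape_dirs: (N_VERTS, 3, len(beta)); expr_dirs: (N_VERTS, 3, len(psi))."""
    return T_bar + shape_dirs @ beta + expr_dirs @ psi

def linear_blend_skinning(verts, joint_transforms, skin_weights):
    """W(): blend per-joint rigid 4x4 transforms with the skinning weights omega.
    joint_transforms: (N_JOINTS, 4, 4); skin_weights: (N_VERTS, N_JOINTS)."""
    homo = np.concatenate([verts, np.ones((verts.shape[0], 1))], axis=1)   # (N, 4)
    posed = np.einsum('kij,nj->nki', joint_transforms, homo)               # (N, K, 4)
    blended = np.einsum('nk,nki->ni', skin_weights, posed)                 # (N, 4)
    return blended[:, :3]
```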
Fig. 3 schematically illustrates a relationship between three head parameters and a head model provided in an embodiment of the present application, where (a) part represents an influence of a head shape parameter on the geometric model, (b) part represents an influence of a head posture parameter on the geometric model, and (c) part represents an influence of a facial expression parameter on the geometric model.
In S202, the driving parameters, including the head pose parameters θ and the facial expression parameters ψ, are extracted from the face image (an RGB image) acquired in real time on the basis of the pre-constructed parameterized head model, and the parameterized head model is driven to move by the extracted driving parameters, obtaining the head three-dimensional geometric model after the driven motion.
In some embodiments, when the camera used is an RGBD camera, the obtained initial face image is an RGBD face image carrying depth information, and the parameterized head model may be optimized based on the depth information in the RGBD face image. Specifically, the depth information of the face is extracted from the RGBD face image, the extracted depth information is mapped into head parameters of the face, and the mapped head parameters are used to optimize the head shape parameters, facial expression parameters and head pose parameters extracted from the RGB image. This compensates for the relatively coarse result of the parameterized head model, improves the geometric accuracy of the head, and further enhances the realism of the head three-dimensional geometric model.
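A minimal sketch of such depth-based refinement is given below, assuming a set of facial landmark pixels, the camera intrinsics, and a helper model_landmarks_3d(beta) that returns the corresponding model vertices; all of these names are illustrative assumptions, not the specific mapping used in the embodiment.

```python
import numpy as np
from scipy.optimize import least_squares

def backproject(depth, pixels, fx, fy, cx, cy):
    """Lift 2D landmark pixels (u, v) with measured depth to camera-space 3D points."""
    u, v = pixels[:, 0], pixels[:, 1]
    z = depth[v.astype(int), u.astype(int)]
    return np.stack([(u - cx) * z / fx, (v - cy) * z / fy, z], axis=1)

def refine_shape(beta0, depth, landmark_pixels, model_landmarks_3d, intrinsics):
    """Adjust the shape parameters so model landmarks match depth-observed landmarks."""
    target = backproject(depth, landmark_pixels, *intrinsics)
    residuals = lambda beta: (model_landmarks_3d(beta) - target).ravel()
    return least_squares(residuals, beta0).x
```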
S203: and carrying out semantic segmentation on the face image to obtain each independent subarea.
In S203, face parsing is a special case of semantic image segmentation: given a single face image, a pixel-level label map of the different semantic components in the face image is computed (such as hair, facial skin, eyes, eyebrows, nose, mouth, ears and cheeks, where organs such as the nose, eyes and mouth are regarded as internal components of the face, while hair, hat, facial skin and the like are regarded as external components of the face).
In a specific implementation, the face image is first deformed to a preset scale by a region-of-interest tanh-warping method (Region of Interest Tanh-warping, RoI Tanh-warping), then input into a trained parsing model for semantic segmentation of the face, yielding sub-regions such as hair, facial skin, eyes, eyebrows, nose, mouth and ears; the segmented face image is then deformed back through the inverse of the RoI Tanh-warping. As shown in fig. 4a, part (a) is the original face image and part (b) is the segmented image.
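The sketch below illustrates the idea of RoI Tanh-warping in a simplified per-axis form: the face box stays roughly linear near its centre while the surrounding context is compressed by tanh into a fixed-size input for the parsing model. It is an assumption-laden simplification (no rotation, square output), not the exact formulation of the cited method.

```python
import cv2
import numpy as np

def roi_tanh_warp(image, roi, out_size=512):
    """Warp the full image into an out_size x out_size grid around the face box `roi`."""
    x0, y0, x1, y1 = roi
    cx, cy = (x0 + x1) / 2.0, (y0 + y1) / 2.0
    hx, hy = (x1 - x0) / 2.0, (y1 - y0) / 2.0
    t = np.linspace(-1 + 1e-3, 1 - 1e-3, out_size)          # normalized target coords
    tx, ty = np.meshgrid(t, t)
    map_x = (cx + hx * np.arctanh(tx)).astype(np.float32)   # inverse of the tanh warp
    map_y = (cy + hy * np.arctanh(ty)).astype(np.float32)
    return cv2.remap(image, map_x, map_y, cv2.INTER_LINEAR,
                     borderMode=cv2.BORDER_REPLICATE)
```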
In some embodiments, considering global features of the face image, when the face image is semantically segmented, a head accessory (such as a hat, a hairpin, etc.) sub-region may also be obtained, as shown in fig. 4 b.
S204: and respectively carrying out texture complementation on each subarea according to the difference value between the color value of each pixel point of each subarea and the color value of the mirror image pixel point.
The lack of face texture data mainly results from an excessive deviation of the face angle from the angle of the (single) camera, which to some extent produces holes and striped textures in the face (particularly in side-face areas such as the ears and cheeks). Exploiting the consistency of face skin color and the symmetry of the face, texture completion is performed on each sub-region according to the difference between the color value of each of its pixels and the color value of the mirror pixel. Since the sub-regions are independent of each other, and the same sub-region in successive frames of the face image can be texture-superimposed in the time domain (for example, the nose sub-region in the first frame and the nose sub-region in the second frame are texture-weighted), each sub-region can be completed separately.
The texture completion process is described below with respect to any one of the sub-regions as an example.
Any pixel is taken from the sub-region, and a first difference between the color value of that pixel and the color value of its mirror pixel is determined. The first difference is compared with a preset color threshold; if the first difference is larger than the preset color threshold, the skin color difference between the two symmetric pixels is too large and skin color consistency is not satisfied. A preset number of adjacent pixels is then selected from the target neighborhood of the pixel, and second differences between the color values of these adjacent pixels and the color values of their mirror pixels are determined respectively. If the number of adjacent pixels whose second difference exceeds a preset pixel threshold is larger than a preset value, the target neighborhood is determined to be a hole area, the texture data of the hole area are completed, and the texture-completed sub-region is obtained.
Optionally, the mirror pixel lies in the same sub-region. As shown in fig. 5, point Q is a pixel in the nose sub-region and the difference between the color values of Q and Q' exceeds the preset color threshold. The target neighborhood of Q is outlined by the irregular solid circle, and N (N = 6) adjacent pixels are selected from it; the color difference between 5 of these adjacent pixels (drawn as dotted circles in fig. 5) and their respective mirror pixels exceeds the preset pixel threshold, which is more than the preset value of 4, so the target neighborhood is determined to be a hole area requiring texture completion.
Optionally, the mirror pixel lies in a different sub-region. As shown in fig. 6, point P is a pixel in the left-ear sub-region, its mirror pixel P' is a pixel in the right-ear sub-region, and the difference between the color values of P and P' exceeds the preset color threshold. The target neighborhood of P is outlined by the irregular solid circle, and N (N = 6) adjacent pixels are selected from it; the color difference between 5 of these adjacent pixels (drawn as dotted circles in fig. 6) and their respective mirror pixels exceeds the preset pixel threshold, which is more than the preset value of 4, so the target neighborhood is determined to be a hole area requiring texture completion.
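A minimal sketch of this mirror-symmetry hole test is given below, assuming the sub-region image has already been resampled at the mirror positions (mirror_rgb), and that the thresholds, neighborhood radius and neighbor count are illustrative values rather than the preset ones of the embodiment.

```python
import numpy as np

def detect_holes(region_rgb, mirror_rgb, region_mask,
                 color_thresh=30.0, pixel_thresh=30.0,
                 min_bad_neighbors=4, radius=2):
    """Return a boolean mask of hole pixels inside one segmented sub-region."""
    diff = np.linalg.norm(region_rgb.astype(np.float32)
                          - mirror_rgb.astype(np.float32), axis=2)
    suspicious = (diff > color_thresh) & region_mask   # first-difference test
    bad = (diff > pixel_thresh) & region_mask          # per-neighbor mirror test
    holes = np.zeros_like(region_mask, dtype=bool)
    h, w = region_mask.shape
    for y, x in zip(*np.nonzero(suspicious)):
        y0, y1 = max(y - radius, 0), min(y + radius + 1, h)
        x0, x1 = max(x - radius, 0), min(x + radius + 1, w)
        # count disagreeing neighbors, excluding the centre pixel itself
        if bad[y0:y1, x0:x1].sum() - bad[y, x] > min_bad_neighbors:
            holes[y0:y1, x0:x1] = True                 # mark the whole neighborhood
    return holes
```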
Further, after the hole area is determined, its texture data are completed according to a preset initial color value, the color mean of the face image, and the weighted color value of the sub-region over multiple frames of face images. The completion formula is:
T_c = t + ε(x) + θ    (formula 3)
where t denotes the preset initial color value, ε(x) denotes the weighted color value of the sub-region corresponding to the pixel, and θ denotes the color mean of the face image.
The preset initial color value is set according to a face texture template. The face texture template is generated mainly by color-weighting a face image dataset to produce a template containing complete texture data, and the preset initial color value is the weighted color mean of that face image dataset.
In the embodiments of the present application, the camera acquires original images in real time during the interaction, so a multi-frame sequence of face images is obtained, and the color values of the same sub-region across the frames can be weighted. Specifically, taking the first frame of the face image as the starting image, the color values of the same sub-region in all frames are weighted to obtain the weighted color value of each sub-region. The multiple frames may be consecutive or non-consecutive.
For example, the color values of the nose sub-region in the first to tenth frames are weighted to obtain the weighted color value of the nose sub-region, and the color values of the left-ear sub-region in the first to tenth frames are weighted to obtain the weighted color value of the left-ear sub-region.
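The sketch below follows formula 3 literally: the temporally weighted sub-region color ε(x), the face-image mean θ and the template color t are summed (and clipped) to fill the detected hole pixels. The frame weights, and the assumption that the three terms are on a comparable scale, are illustrative choices.

```python
import numpy as np

def subregion_weighted_color(frames, masks, weights):
    """epsilon(x): weighted mean color of one sub-region across several frames."""
    acc = np.zeros(3, dtype=np.float64)
    for frame, mask, w in zip(frames, masks, weights):
        acc += w * frame[mask].mean(axis=0)
    return acc / sum(weights)

def fill_holes(face_rgb, hole_mask, template_color, eps_color):
    """Fill hole pixels with T_c = t + epsilon(x) + theta (formula 3)."""
    theta = face_rgb.reshape(-1, 3).mean(axis=0)     # theta: face-image mean color
    t_c = np.clip(template_color + eps_color + theta, 0, 255)
    out = face_rgb.copy()
    out[hole_mask] = t_c.astype(face_rgb.dtype)
    return out
```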
It should be noted that, in the embodiment of the present application, the selection of the starting image is not limited, and for example, an image with the largest visible area of the face may be selected as the starting image.
In some embodiments, owing to occlusion or the rotation angle, a pixel acquired from the sub-region may be a single pixel for which no corresponding mirror pixel exists; in that case the color value of the single pixel is set to the preset initial color value.
In other embodiments, when a pixel acquired from the sub-region has a corresponding mirror pixel and is a valid pixel outside the hole area, its color value is determined according to the color mean of the face image and the weighted color value of the sub-region over the multiple frames of face images.
S205: and fusing the all the sub-regions after the completion to obtain the complete texture data of the human face.
In S205, the texture data of the sub-regions are fused using the Poisson blending algorithm (Poisson Blending) to obtain the complete texture data of the face.
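For illustration, a completed sub-region could be blended back into the face texture with OpenCV's seamlessClone, which implements Poisson blending; the variable names and the use of the mask centroid as the blend centre are assumptions of this sketch.

```python
import cv2
import numpy as np

def fuse_subregion(face_texture_bgr, completed_region_bgr, region_mask_u8):
    """Poisson-blend one completed sub-region into the full face texture."""
    ys, xs = np.nonzero(region_mask_u8)
    center = (int(xs.mean()), int(ys.mean()))   # place the region where it belongs
    return cv2.seamlessClone(completed_region_bgr, face_texture_bgr,
                             region_mask_u8, center, cv2.NORMAL_CLONE)
```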
S206: and rendering the head three-dimensional geometric model according to the complete texture data to obtain a reconstructed head dense surface model.
In S206, the head three-dimensional geometric model is rendered according to the complete texture data, which improves the realism of the dense head surface model and thereby enhances the sense of immersion during interaction.
The completeness of the face texture data directly affects the realism of the reconstructed model; in the embodiments of the present application, texture completion exploits the symmetry of the face and the consistency of skin color. Fig. 7 is a complete flowchart of texture data completion provided in the embodiment of the present application; as shown in fig. 7, it mainly includes the following steps:
s701: and determining the color average value of the face image according to the color value of the pixel point in the face image.
S702: and acquiring a preset texture template to obtain a preset initial color value.
S703: and selecting an initial image, and carrying out color weighting on the same subareas in all subareas in the multi-frame face image to obtain respective color weighted values of all subareas.
In this step, the initial image may be the first frame image, or may be the image with the largest visible area of the face. Each subarea is a segmented area of the face image and comprises a plurality of subareas such as hair, facial skin, eyes, eyebrows, nose, mouth, ears and the like.
S704 to S705: and for any one sub-area in each sub-area, acquiring any one pixel point from the sub-area, determining whether the acquired pixel point is a single pixel point without mirror image pixel points, if so, executing S706, otherwise executing S707.
S706: the color value of the single pixel point is set to be a preset initial color value so as to complement the texture data of the single pixel point.
S707 to S708: and determining a first difference value between the acquired color value of the pixel point and the color value of the mirror image pixel point of the pixel point, and determining whether the first difference value is larger than a preset color threshold value, if so, executing S709, otherwise, executing S712.
In the step, if the first difference value is larger than a preset color threshold value, the pixel point is indicated to be an invalid cavity point according to the symmetry of the face and the consistency of the skin color.
S709: selecting a preset number of adjacent pixel points from the obtained target adjacent pixels of the pixel points, and respectively determining second difference values of the color values of the adjacent pixel points and the color values of the mirror image pixel points.
S710: and determining whether the number of adjacent pixel points corresponding to the second difference value larger than the preset color threshold value in each second difference value is larger than a preset value, if so, executing S711, otherwise, indicating that the acquired pixel point is an effective pixel point, and executing S712.
In this step, if the number of adjacent pixel points whose second difference value is greater than the preset color threshold exceeds the preset value, there are many hole points in the target neighborhood; the target neighborhood is likely to be a hole area with severe texture loss and needs to be completed.
S711: determining the target neighborhood as a hole area, and complementing texture data of the hole area according to a preset initial color value, a color average value of a face image and a color weighting value of a sub-area corresponding to the pixel point in the multi-frame face image.
S712: and determining the color value of the effective pixel point according to the color mean value of the face image and the color weighting value of the sub-region corresponding to the pixel point in the multi-frame face image.
Based on the same technical concept, the embodiments of the present application provide a reconstruction apparatus, which may perform the head three-dimensional reconstruction method provided by the embodiments of the present application, and may achieve the same technical effects, and is not repeated here.
Referring to fig. 8, the reconstruction device comprises a memory 801 and a processor 802. The memory 801 is configured to store computer program instructions and a pre-built parameterized head model, and the processor 802 is configured to perform the following operations in accordance with the computer program instructions stored by the memory 801:
acquiring an original image acquired by a camera, and carrying out face recognition on the original image to obtain a face image;
extracting driving parameters from the face image, and driving the parameterized head model to move by using the extracted driving parameters to obtain a three-dimensional geometrical model of the head after driving, wherein the parameterized head model is pre-constructed based on the head parameters extracted from the initial face image;
carrying out semantic segmentation on the face image to obtain each independent subarea;
respectively carrying out texture complementation on each subarea according to the difference value between the color value of each pixel point of each subarea and the color value of the mirror image pixel point;
fusing the sub-regions after the completion to obtain the complete texture data of the human face;
and rendering the head three-dimensional geometric model according to the complete texture data to obtain a reconstructed head dense surface model.
Optionally, the processor 802 is configured to perform texture complementation on each sub-region according to a difference between a color value of a pixel point of each sub-region and a color value of a mirror image pixel point, and specifically configured to:
for any one sub-area in each sub-area, any one pixel point is obtained from the sub-area, and a first difference value between the obtained color value of the pixel point and the color value of the mirror image pixel point of the pixel point is determined;
if the first difference value is larger than the preset color threshold value, selecting a preset number of adjacent pixel points from the target neighborhood of the pixel point, and respectively determining second difference values between the color values of the adjacent pixel points and the color values of the mirror image pixel points;
if the number of the adjacent pixel points corresponding to the second difference value larger than the preset color threshold value in each second difference value is larger than the preset value, determining the target neighborhood as a hole area, and complementing the texture data of the hole area to obtain a sub-area after the texture is complemented.
Optionally, the processor 802 complements texture data of the hole area, specifically configured to:
and supplementing texture data of the cavity area according to the preset initial color value, the color average value of the face image and the color weighted value of the sub-area in the multi-frame face image.
Optionally, the processor 802 is further configured to:
aiming at a single pixel point in which no corresponding mirror image pixel point exists in the subarea, setting the color value of the single pixel point as a preset initial color value; the method comprises the steps of,
and aiming at the effective pixel points which are corresponding to the mirror image pixel points but are not in the cavity area, determining the color value of the effective pixel points according to the color average value of the face image and the color weighting value of the sub-area in the multi-frame face image.
Optionally, when the camera is an RGBD camera, the initial face image is an RGBD face image, and the processor 802 constructs a parameterized head model in advance based on the head parameters extracted from the initial face image, specifically configured to:
face depth information is extracted from RGBD face images;
optimizing head parameters according to the face depth information;
and constructing a parameterized head model in advance based on the optimized head parameters.
Optionally, the head parameters include a head shape parameter, a head posture parameter, and a facial expression parameter, and the parameterized head model formula is:
M(β, θ, ψ) = W(T_P(β, θ, ψ), J(β), θ, ω)    (formula 1)
T_P(β, θ, ψ) = T + B_s(β; s) + B_p(θ; p) + B_e(ψ; e)    (formula 2)
where β represents the head shape parameters, θ represents the head pose parameters, and ψ represents the facial expression parameters; W() denotes the linear skinning function, J() denotes the function predicting the positions of the different head nodes, T denotes the head model mesh, B_s() denotes the influence function of the head shape parameters on the head model mesh T, B_p() denotes the influence function of the head pose parameters on the head model mesh T, B_e() denotes the influence function of the facial expression parameters on the head model mesh T, and T_P() denotes the function that deforms the head model mesh T under the combined action of the head shape, head pose and facial expression parameters; s, p, e and ω denote the head shape weight, head pose weight, facial expression weight and skinning weight respectively.
It should be noted that fig. 8 only shows the hardware necessary for implementing the embodiments of the present application, and may include conventional hardware such as a display, a controller, and a speaker.
It should be noted that the processor referred to in the embodiments of the present application may be a central processing unit (CPU), a general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field programmable gate array (FPGA) or other programmable logic device, a transistor logic device, a hardware component, or any combination thereof, which may implement or perform the various exemplary logic blocks, modules and circuits described in connection with this disclosure. A processor may also be a combination that performs computing functions, for example, a combination including one or more microprocessors, or a combination of a DSP and a microprocessor. The memory may be integrated into the processor or may be provided separately from the processor.
The present application also provides a computer-readable storage medium storing computer-executable instructions for causing a computer to perform the methods of the above embodiments.
The present application also provides a computer program product for storing a computer program for performing the method of the foregoing embodiments.
Finally, it should be noted that: the above embodiments are only for illustrating the technical solution of the present application, and not for limiting the same; although the present application has been described in detail with reference to the foregoing embodiments, it should be understood by those of ordinary skill in the art that: the technical scheme described in the foregoing embodiments can be modified or some or all of the technical features thereof can be replaced by equivalents; such modifications and substitutions do not depart from the spirit of the corresponding technical solutions from the scope of the technical solutions of the embodiments of the present application.
The foregoing description, for purposes of explanation, has been presented in conjunction with specific embodiments. However, the illustrative discussions above are not intended to be exhaustive or to limit the embodiments to the precise forms disclosed above. Many modifications and variations are possible in light of the above teaching. The embodiments were chosen and described in order to best explain the principles and the practical application, to thereby enable others skilled in the art to best utilize the embodiments and various embodiments with various modifications as are suited to the particular use contemplated.

Claims (8)

1. A method of three-dimensional reconstruction of a head, comprising:
acquiring an original image acquired by a camera, and carrying out face recognition on the original image to obtain a face image;
extracting driving parameters from the face image, and driving a parameterized head model to move by using the extracted driving parameters to obtain a three-dimensional geometrical model of the head after driving, wherein the parameterized head model is pre-constructed based on head parameters extracted from the initial face image;
carrying out semantic segmentation on the face image to obtain each independent subarea;
for any one sub-area in each sub-area, any one pixel point is obtained from the sub-area, and a first difference value between the obtained color value of the pixel point and the color value of the mirror image pixel point of the pixel point is determined;
if the first difference value is larger than a preset color threshold value, selecting a preset number of adjacent pixel points from the target adjacent areas of the pixel points, and respectively determining second difference values of the color values of the adjacent pixel points and the color values of the mirror image pixel points;
if the number of the adjacent pixel points corresponding to the second difference value larger than the preset color threshold value in each second difference value is larger than the preset value, determining the target neighborhood as a cavity area, and complementing the texture data of the cavity area to obtain the sub-area after the texture is complemented;
fusing the sub-regions after the completion to obtain the complete texture data of the human face;
and rendering the head three-dimensional geometric model according to the complete texture data to obtain a reconstructed head dense surface model.
2. The method of claim 1, wherein the complementing the texture data of the cavity area comprises:
and supplementing texture data of the cavity area according to a preset initial color value, a color average value of the face image and a color weighting value of the subarea in the multi-frame face image.
3. The method of claim 1, wherein the method further comprises:
setting the color value of a single pixel point which does not exist in the subarea and corresponds to the mirror image pixel point as a preset initial color value; and
and aiming at the effective pixel points which are corresponding to the mirror image pixel points but are not in the cavity area, determining the color value of the effective pixel points according to the color average value of the face image and the color weighting value of the subareas in the multi-frame face image.
4. A method according to any one of claims 1-3, wherein when the camera is an RGBD camera, the initial face image is an RGBD face image, and the pre-constructing a parameterized head model based on head parameters extracted from the initial face image comprises:
extracting face depth information from the RGBD face image;
optimizing the head parameters according to the face depth information;
and constructing a parameterized head model in advance based on the optimized head parameters.
5. A method according to any one of claims 1-3, wherein the head parameters include a head shape parameter, a head pose parameter, a facial expression parameter, and the parameterized head model formula is:
M(β, θ, ψ) = W(T_P(β, θ, ψ), J(β), θ, ω)
T_P(β, θ, ψ) = T + B_s(β; s) + B_p(θ; p) + B_e(ψ; e)
wherein β represents the head shape parameter, θ represents the head posture parameter, and ψ represents the facial expression parameter; W() denotes the linear skinning function, J() denotes the function predicting the positions of the different head nodes, T denotes the head model mesh, B_s() denotes the influence function of the head shape parameter on the head model mesh T, B_p() denotes the influence function of the head posture parameter on the head model mesh T, B_e() denotes the influence function of the facial expression parameter on the head model mesh T, and T_P() denotes the function that deforms the head model mesh T under the combined action of the head shape parameter, head posture parameter and facial expression parameter; s, p, e and ω respectively represent the head shape weight, head posture weight, facial expression weight and skinning weight.
6. A reconstruction device, comprising a memory, a processor;
the memory is configured to store computer program instructions and a pre-constructed parameterized header model;
the processor is configured to perform the following operations in accordance with the computer program instructions:
acquiring an original image acquired by a camera, and carrying out face recognition on the original image to obtain a face image;
extracting driving parameters from the face image, and driving a parameterized head model to move by using the extracted driving parameters to obtain a three-dimensional geometrical model of the head after driving, wherein the parameterized head model is pre-constructed based on head parameters extracted from the initial face image;
carrying out semantic segmentation on the face image to obtain each independent subarea;
for any one sub-area in each sub-area, any one pixel point is obtained from the sub-area, and a first difference value between the obtained color value of the pixel point and the color value of the mirror image pixel point of the pixel point is determined;
if the first difference value is larger than a preset color threshold value, selecting a preset number of adjacent pixel points from the target adjacent areas of the pixel points, and respectively determining second difference values of the color values of the adjacent pixel points and the color values of the mirror image pixel points;
if the number of the adjacent pixel points corresponding to the second difference value larger than the preset color threshold value in each second difference value is larger than the preset value, determining the target neighborhood as a cavity area, and complementing the texture data of the cavity area to obtain the sub-area after the texture is complemented;
fusing the sub-regions after the completion to obtain the complete texture data of the human face;
and rendering the head three-dimensional geometric model according to the complete texture data to obtain a reconstructed head dense surface model.
7. The reconstruction device of claim 6, wherein the processor complements texture data of the cavity area, and is specifically configured to:
and supplementing texture data of the cavity area according to a preset initial color value, a color average value of the face image and a color weighting value of the subarea in the multi-frame face image.
8. The reconstruction device of claim 6 wherein the processor is further configured to:
setting the color value of a single pixel point which does not exist in the subarea and corresponds to the mirror image pixel point as a preset initial color value; and
and aiming at the effective pixel points which are corresponding to the mirror image pixel points but are not in the cavity area, determining the color value of the effective pixel points according to the color average value of the face image and the color weighting value of the subareas in the multi-frame face image.
CN202110921998.9A 2021-08-12 2021-08-12 Head three-dimensional reconstruction method and device Active CN113628327B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110921998.9A CN113628327B (en) 2021-08-12 2021-08-12 Head three-dimensional reconstruction method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110921998.9A CN113628327B (en) 2021-08-12 2021-08-12 Head three-dimensional reconstruction method and device

Publications (2)

Publication Number Publication Date
CN113628327A CN113628327A (en) 2021-11-09
CN113628327B true CN113628327B (en) 2023-07-25

Family

ID=78384685

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110921998.9A Active CN113628327B (en) 2021-08-12 2021-08-12 Head three-dimensional reconstruction method and device

Country Status (1)

Country Link
CN (1) CN113628327B (en)

Families Citing this family (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114049464A (en) * 2021-11-15 2022-02-15 聚好看科技股份有限公司 Reconstruction method and device of three-dimensional model
CN114140580A (en) * 2021-11-22 2022-03-04 聚好看科技股份有限公司 Texture adjusting method and equipment for hand three-dimensional model
CN114373043A (en) * 2021-12-16 2022-04-19 聚好看科技股份有限公司 Head three-dimensional reconstruction method and equipment
CN114339190B (en) * 2021-12-29 2023-06-23 中国电信股份有限公司 Communication method, device, equipment and storage medium
CN114648613B (en) * 2022-05-18 2022-08-23 杭州像衍科技有限公司 Three-dimensional head model reconstruction method and device based on deformable nerve radiation field
CN115049016A (en) * 2022-07-20 2022-09-13 聚好看科技股份有限公司 Model driving method and device based on emotion recognition
CN116246014A (en) * 2022-12-28 2023-06-09 支付宝(杭州)信息技术有限公司 Image generation method and device, storage medium and electronic equipment

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108765550A (en) * 2018-05-09 2018-11-06 华南理工大学 A kind of three-dimensional facial reconstruction method based on single picture
CN109377557A (en) * 2018-11-26 2019-02-22 中山大学 Real-time three-dimensional facial reconstruction method based on single frames facial image
CN109410133A (en) * 2018-09-30 2019-03-01 北京航空航天大学青岛研究院 A kind of face texture repairing method based on 3DMM
CN110197462A (en) * 2019-04-16 2019-09-03 浙江理工大学 A kind of facial image beautifies in real time and texture synthesis method
CN111160136A (en) * 2019-12-12 2020-05-15 天目爱视(北京)科技有限公司 Standardized 3D information acquisition and measurement method and system
CN111445582A (en) * 2019-01-16 2020-07-24 南京大学 Single-image human face three-dimensional reconstruction method based on illumination prior
CN113066171A (en) * 2021-04-20 2021-07-02 南京大学 Face image generation method based on three-dimensional face deformation model

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP3994896B2 (en) * 2002-09-25 2007-10-24 コニカミノルタホールディングス株式会社 Video display device

Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108765550A (en) * 2018-05-09 2018-11-06 华南理工大学 A kind of three-dimensional facial reconstruction method based on single picture
CN109410133A (en) * 2018-09-30 2019-03-01 北京航空航天大学青岛研究院 A kind of face texture repairing method based on 3DMM
CN109377557A (en) * 2018-11-26 2019-02-22 中山大学 Real-time three-dimensional facial reconstruction method based on single frames facial image
CN111445582A (en) * 2019-01-16 2020-07-24 南京大学 Single-image human face three-dimensional reconstruction method based on illumination prior
CN110197462A (en) * 2019-04-16 2019-09-03 浙江理工大学 A kind of facial image beautifies in real time and texture synthesis method
CN111160136A (en) * 2019-12-12 2020-05-15 天目爱视(北京)科技有限公司 Standardized 3D information acquisition and measurement method and system
CN113066171A (en) * 2021-04-20 2021-07-02 南京大学 Face image generation method based on three-dimensional face deformation model

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Henriquez P. et al.; "An unobtrusive intelligent multisensory mirror for well-being status self-assessment and visualization"; IEEE Transactions on Multimedia; pp. 1467-1481 *

Also Published As

Publication number Publication date
CN113628327A (en) 2021-11-09

Similar Documents

Publication Publication Date Title
CN113628327B (en) Head three-dimensional reconstruction method and device
Grassal et al. Neural head avatars from monocular rgb videos
US10559111B2 (en) Systems and methods for generating computer ready animation models of a human head from captured data images
KR102616010B1 (en) System and method for photorealistic real-time human animation
Sharma et al. 3d face reconstruction in deep learning era: A survey
US10217275B2 (en) Methods and systems of performing eye reconstruction using a parametric model
WO2022143645A1 (en) Three-dimensional face reconstruction method and apparatus, device, and storage medium
US10217265B2 (en) Methods and systems of generating a parametric eye model
CN111652123B (en) Image processing and image synthesizing method, device and storage medium
US11562536B2 (en) Methods and systems for personalized 3D head model deformation
US11587288B2 (en) Methods and systems for constructing facial position map
JP7462120B2 (en) Method, system and computer program for extracting color from two-dimensional (2D) facial images
WO2023066120A1 (en) Image processing method and apparatus, electronic device, and storage medium
AU2022231680B2 (en) Techniques for re-aging faces in images and video frames
CN114450719A (en) Human body model reconstruction method, reconstruction system and storage medium
KR20230110787A (en) Methods and systems for forming personalized 3D head and face models
US11769309B2 (en) Method and system of rendering a 3D image for automated facial morphing with a learned generic head model
CN113822965A (en) Image rendering processing method, device and equipment and computer storage medium
CN114373043A (en) Head three-dimensional reconstruction method and equipment
US20210074076A1 (en) Method and system of rendering a 3d image for automated facial morphing

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant