CN114387388A - Close-range three-dimensional face reconstruction device - Google Patents

Close-range three-dimensional face reconstruction device

Info

Publication number
CN114387388A
Authority
CN
China
Prior art keywords
image
face
human face
reconstruction
lower half
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111441281.0A
Other languages
Chinese (zh)
Inventor
王国伟
李宁
孙佳媛
张善秀
李鹂鹏
魏丽
聂芸
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
CETC 15 Research Institute
Original Assignee
CETC 15 Research Institute
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by CETC 15 Research Institute
Priority to CN202111441281.0A
Publication of CN114387388A
Legal status: Pending

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06T: IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T17/00: Three dimensional [3D] modelling, e.g. data description of 3D objects
    • G06T7/00: Image analysis
    • G06T7/10: Segmentation; Edge detection
    • G06T7/194: Segmentation; Edge detection involving foreground-background segmentation
    • G06T2207/00: Indexing scheme for image analysis or image enhancement
    • G06T2207/20: Special algorithmic details
    • G06T2207/20081: Training; Learning
    • G06T2207/20212: Image combination
    • G06T2207/20221: Image fusion; Image merging
    • G06T2207/30: Subject of image; Context of image processing
    • G06T2207/30196: Human being; Person
    • G06T2207/30201: Face

Abstract

The invention provides a close-range three-dimensional face reconstruction device. The device comprises: a first processing unit configured to invoke an image acquisition apparatus to simultaneously acquire a first image and a second image of a human face, wherein the first image is an image of the lower half of the left side of the face and the second image is an image of the lower half of the right side of the face; a second processing unit configured to preprocess the first image and the second image, the preprocessing comprising rectification and stitching, to obtain a fused image of the lower half of the face; a third processing unit configured to invoke a pre-trained feature extraction model to determine feature points of the face from the fused lower-half image and thereby segment the face from the background image; and a fourth processing unit configured to invoke a pre-trained face reconstruction model that takes the segmented face image and the feature points of the face as input and constructs a complete image of the face.

Description

Close-range three-dimensional face reconstruction device
Technical Field
The invention belongs to the technical field of facial image reconstruction, and particularly relates to a close-range three-dimensional facial reconstruction device.
Background
Augmented reality superimposes virtual objects on an observed real scene and uses them to enhance the real world; three-dimensional reconstruction is widely applied in augmented reality, virtual reality, and related fields. Three-dimensional face reconstruction aims to recover the true three-dimensional shape and depth information of a face from its two-dimensional images. It can be applied to virtual avatar generation, virtual courses, film special effects, game production, and similar fields: for example, reconstructing a three-dimensional face model from facial key points, head pose estimation and tracking, and expression capture can make short-video, live-streaming, and mobile applications more interactive and engaging. Currently there are three main categories of three-dimensional face reconstruction methods: reconstruction based on traditional methods, model-based reconstruction, and end-to-end reconstruction based on deep learning.
Three-dimensional reconstruction techniques for virtual reality/augmented reality scenes generally either capture RGB images of the object from multiple angles to obtain complete image information for reconstruction, or combine sensing devices such as strain sensors and superimpose partial image information by analyzing changes in the strain signal to reconstruct a complete three-dimensional model. However, these approaches introduce various losses and errors that seriously degrade the reconstruction accuracy.
A head-mounted display device in the prior art can drive a three-dimensional face animation from the wearer's performance in real time. The wearable system places a number of ultra-thin flexible electronic materials on the foam pad of the headset to measure the surface strain signals corresponding to upper-face expressions. The strain signals are combined with a head-mounted RGB-D camera to enhance tracking of the mouth region and to compensate for the imprecise position of the head-mounted display device. To map the input signals to a three-dimensional face model, a single offline training session is performed for each person; for reuse and accurate online operation, a brief calibration step re-adjusts the mapped Gaussian mixture distribution before each use. The resulting animation is visually comparable to state-of-the-art depth-sensor-driven facial expression capture systems and is therefore suitable for social interaction in virtual worlds. Fig. 1 is a framework diagram of this head-mounted display device, given as a comparative example of the present invention, for implementing face reconstruction: sensors embedded in the foam pad capture the expression of the player's upper face by sensing the twitching of facial muscles, while a front-protruding camera tracks the user's chin and mouth movements.
From a hardware perspective, although each user needs only one full training session to use the device, a short calibration is required before every use, which makes the device unsuitable for real-time three-dimensional face reconstruction. Because the pressure distribution varies with the placement of the head-mounted display and the head orientation, the strain signal measurements may drift or lose accuracy, and the strain gauges sometimes make loose contact with the face. The many external measuring devices also burden the existing hardware. Under these conditions it is difficult to accurately recover expressions around the eyebrows, and subtle expressions such as blinking or squinting are hard to capture with strain gauges sparsely placed on a foam pad. From a software perspective, the reconstruction involves detailed modeling of the mouth and eyes with many coefficients and parameters, and the reconstruction process varies from person to person, which makes it complicated.
Disclosure of Invention
In order to solve the above technical problem, the present application provides a close-range three-dimensional face reconstruction scheme: the lower half of the face is captured by two fisheye cameras near the nose pads, and the lower-half facial expression is analyzed by deep learning to restore the complete face. A complete face image can be reconstructed from the lower-half image alone, without acquiring face information from multiple viewpoints; the deep-learning reconstruction preserves the expressive details of the face, so its emotion can be accurately conveyed; and the whole reconstruction requires no additional sensor hardware, avoids the loose-contact problem of strain gauges, and keeps the equipment simple, light, and easy to implement.
The invention discloses a close-range three-dimensional face reconstruction method in a first aspect. The method comprises the following steps:
step S1, simultaneously acquiring a first image and a second image of a human face by using an image acquisition device, wherein the first image is an image of the lower half of the left side of the human face, and the second image is an image of the lower half of the right side of the human face;
step S2, preprocessing the first image and the second image, wherein the preprocessing comprises correction processing and splicing processing to obtain a lower half fusion image of the face;
step S3, determining feature points of the human face from the lower half fusion image by using a pre-trained feature extraction model so as to segment the human face from a background image;
and step S4, constructing a complete image of the face by using the pre-trained face reconstruction model and taking the segmented face image and the feature points of the face as input.
According to the method of the first aspect of the present invention, the image acquisition apparatus comprises a first fisheye image collector for capturing the first image and a second fisheye image collector for capturing the second image. The image acquisition apparatus is mounted on a wearable device such that, when the wearable device is worn, the apparatus sits at any one of the following positions on the human face: under the eyes, at the temples, at the eye corners, or at the nose pads, with the first fisheye image collector and the second fisheye image collector located on the left and right sides of that position, respectively.
According to the method of the first aspect of the present invention, in the step S2, the rectification of the first image and the second image comprises computing the corrected (undistorted) projection point P_0 with a camera model, based on the actual position point P in the real world and the imaging position point P′ at which distortion occurs during imaging. The camera model is:

r_d = f·θ_d = θ(1 + k_1θ² + k_2θ⁴ + k_3θ⁶ + k_4θ⁸)    (1)

where r_d = |OP′| denotes the distance from the imaging position point P′ to the image-plane center O, f is the focal length, θ is the incidence angle of point P, θ_d is the exit angle of the distorted imaging position point P′, and k_1, k_2, k_3, k_4 are the distortion parameters.
According to the method of the first aspect of the present invention, in the step S2, the stitching process includes:
extracting features from the rectified first image and the rectified second image and performing corner detection on them, to establish the geometric correspondence of the two rectified images in a common coordinate system;
based on the geometric correspondence, deleting the regions outside the overlap of the two rectified images using homography estimation, so that the two rectified images are fused over the overlapping region.
According to the method of the first aspect of the present invention, in said step S3:
the pre-training of the feature extraction model comprises: training the feature extraction model on an open-source data set, so that the feature extraction model is able to extract all feature points of a complete human face;
the feature points of the human face determined by the feature extraction model include: feature data extracted from the lower half of the face and feature data predicted for the upper half of the face.
According to the method of the first aspect of the present invention, in step S4, the training environment of the pre-trained face reconstruction model is Windows 10 + CUDA 10.2 + cuDNN 7.6.5 + PyTorch 1.6 + PyTorch3D 0.2, with 300 iterations and a batch size of 48, to complete the pre-training of the face reconstruction model.
According to the method of the first aspect of the present invention, in step S4, during the pre-training of the face reconstruction model, the coarse face reconstruction is measured by face key-point loss, eye-closure loss, photometric loss, shape-consistency loss, and contour regularization loss, while the detail face reconstruction is measured by photometric detail loss, soft-symmetry loss, and detail regularization loss.
The invention discloses a close-range three-dimensional face reconstruction device in a second aspect. The device comprises:
a first processing unit configured to invoke an image acquisition apparatus to simultaneously acquire a first image and a second image of a human face, wherein the first image is an image of the lower half of the left side of the face and the second image is an image of the lower half of the right side of the face;
the second processing unit is configured to perform preprocessing on the first image and the second image, wherein the preprocessing comprises rectification processing and splicing processing so as to obtain a lower half fusion image of the face;
a third processing unit, configured to invoke a pre-trained feature extraction model to determine feature points of the face from the lower half fused image, so as to segment the face from the background image;
and the fourth processing unit is configured to call a pre-trained face reconstruction model, and construct a complete image of the face by taking the segmented face image and the feature points of the face as input.
According to the device of the second aspect of the present invention, the image acquisition apparatus comprises a first fisheye image collector for capturing the first image and a second fisheye image collector for capturing the second image. The image acquisition apparatus is mounted on the wearable device such that, when the wearable device is worn, the apparatus sits at any one of the following positions on the human face: under the eyes, at the temples, at the eye corners, or at the nose pads, with the first fisheye image collector and the second fisheye image collector located on the left and right sides of that position, respectively.
According to the apparatus of the second aspect of the present invention, the second processing unit is specifically configured such that the rectification of the first image and the second image comprises computing the corrected (undistorted) projection point P_0 with a camera model, based on the actual position point P in the real world and the imaging position point P′ at which distortion occurs during imaging. The camera model is:

r_d = f·θ_d = θ(1 + k_1θ² + k_2θ⁴ + k_3θ⁶ + k_4θ⁸)    (1)

where r_d = |OP′| denotes the distance from the imaging position point P′ to the image-plane center O, f is the focal length, θ is the incidence angle of point P, θ_d is the exit angle of the distorted imaging position point P′, and k_1, k_2, k_3, k_4 are the distortion parameters.
According to the apparatus of the second aspect of the invention, the second processing unit is specifically configured such that the stitching process comprises:
extracting features from the rectified first image and the rectified second image and performing corner detection on them, to establish the geometric correspondence of the two rectified images in a common coordinate system;
based on the geometric correspondence, deleting the regions outside the overlap of the two rectified images using homography estimation, so that the two rectified images are fused over the overlapping region.
According to the apparatus of the second aspect of the present invention, the third processing unit is specifically configured to:
the pre-training of the feature extraction model comprises: training the feature extraction model on an open-source data set, so that the feature extraction model is able to extract all feature points of a complete human face;
the feature points of the human face determined by the feature extraction model include: feature data extracted from the lower half of the face and feature data predicted for the upper half of the face.
According to the apparatus of the second aspect of the present invention, the training environment of the pre-trained face reconstruction model is Windows 10 + CUDA 10.2 + cuDNN 7.6.5 + PyTorch 1.6 + PyTorch3D 0.2, with 300 iterations and a batch size of 48, to complete the pre-training of the face reconstruction model.
According to the device of the second aspect of the present invention, during the pre-training of the face reconstruction model, the coarse face reconstruction is measured by face key-point loss, eye-closure loss, photometric loss, shape-consistency loss, and contour regularization loss, while the detail face reconstruction is measured by photometric detail loss, soft-symmetry loss, and detail regularization loss.
A third aspect of the invention discloses an electronic device. The electronic device comprises a memory and a processor, the memory stores a computer program, and the processor implements the steps of the method for reconstructing a close-range three-dimensional face according to any one of the first aspect of the present disclosure when executing the computer program.
A fourth aspect of the invention discloses a computer-readable storage medium. The computer readable storage medium has stored thereon a computer program which, when executed by a processor, implements the steps of a method for reconstructing a close-range three-dimensional face according to any one of the first aspect of the present disclosure.
The technical solution provided by the invention effectively reduces the system's dependence on sensors: a user only needs to wear VR glasses fitted with the three-dimensional face reconstruction device described herein to achieve motion capture, three-dimensional face modeling, and data synchronization of the behavioral content. The device can be connected to AR/VR glasses of different brands or models through a communication interface and is therefore highly portable; it does not restrict the application scenarios of existing commercial hardware, eliminates the occlusion caused by external image acquisition equipment in open spaces, and is stable, attractive, and portable.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings needed in the description of the embodiments or the prior art are briefly introduced below. Obviously, the drawings described below show only some embodiments of the present invention, and those skilled in the art can obtain other drawings from them without creative effort.
Fig. 1 is a frame diagram of a head-mounted display device according to a comparative example of the present invention for implementing face reconstruction;
fig. 2a is a flowchart of a close-range three-dimensional face reconstruction method according to an embodiment of the present invention;
fig. 2b is a first schematic view of a wearable device and an image capturing apparatus according to an embodiment of the present invention;
fig. 2c is a second schematic view of the wearable device and the image capturing apparatus according to the embodiment of the invention;
FIG. 2d is a schematic illustration of distortion correction according to an embodiment of the present invention;
FIG. 2e is a schematic diagram of stitching fusion according to an embodiment of the present invention;
FIG. 2f is an example of an image of an open source data set in accordance with an embodiment of the present invention;
FIG. 2g is a flowchart of face reconstruction according to an embodiment of the present invention;
FIG. 3a is a flowchart of a specific example one according to an embodiment of the present invention;
FIG. 3b is a flowchart of a second specific example according to an embodiment of the present invention;
FIG. 3c is a flowchart of specific example three according to an embodiment of the present invention;
FIG. 4 is a block diagram of a close-range three-dimensional face reconstruction apparatus according to an embodiment of the present invention;
fig. 5 is a block diagram of an electronic device according to an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Fig. 2a is a flowchart of a close-range three-dimensional face reconstruction method according to an embodiment of the present invention; as shown in fig. 2a, the method comprises:
step S1, simultaneously acquiring a first image and a second image of a human face by using an image acquisition device, wherein the first image is an image of the lower half of the left side of the human face, and the second image is an image of the lower half of the right side of the human face;
step S2, preprocessing the first image and the second image, wherein the preprocessing comprises correction processing and splicing processing to obtain a lower half fusion image of the face;
step S3, determining feature points of the human face from the lower half fusion image by using a pre-trained feature extraction model so as to segment the human face from a background image;
and step S4, constructing a complete image of the face by using the pre-trained face reconstruction model and taking the segmented face image and the feature points of the face as input.
In step S1, an image capturing device is used to capture a first image and a second image of a human face at the same time, where the first image is the image of the lower left half of the human face and the second image is the image of the lower right half of the human face.
In some embodiments, the image acquisition apparatus comprises a first fisheye image collector for capturing the first image and a second fisheye image collector for capturing the second image, and is mounted on a wearable device such that, when the wearable device is worn, the apparatus sits at any one of the following positions on the human face: under the eyes, at the temples, at the eye corners, or at the nose pads, with the first and second fisheye image collectors on the left and right sides of that position, respectively. See FIG. 2b, a first schematic view of the wearable device and the image acquisition apparatus according to an embodiment of the present invention, and FIG. 2c, a second schematic view of the same.
In step S2, the first image and the second image are preprocessed, where the preprocessing includes rectification and stitching to obtain a lower half fused image of the face.
In some embodiments, in the step S2, the rectification of the first image and the second image comprises computing the corrected (undistorted) projection point P_0 with a camera model, based on the actual position point P in the real world and the imaging position point P′ at which distortion occurs during imaging. The camera model is:

r_d = f·θ_d = θ(1 + k_1θ² + k_2θ⁴ + k_3θ⁶ + k_4θ⁸)    (1)

where r_d = |OP′| denotes the distance from the imaging position point P′ to the image-plane center O, f is the focal length, θ is the incidence angle of point P, θ_d is the exit angle of the distorted imaging position point P′, and k_1, k_2, k_3, k_4 are the distortion parameters.
Specifically, two wide-angle camera feeds are connected to the device, distorted images of the left and right halves of the lower face are obtained respectively, and distortion correction is performed on them. Take a fisheye camera (capable of imaging close-range scenes at 15 mm to 45 mm) as an example: because fisheye images are distorted, an equidistant projection model is generally used to analyze the distortion correction, and the first five terms of the Taylor expansion are taken to approximate the actual projection function of the wide-angle camera:

r(θ) ≈ k_0θ + k_1θ³ + k_2θ⁵ + k_3θ⁷ + k_4θ⁹
FIG. 2d is a schematic diagram of distortion correction according to an embodiment of the present invention. Consider a point P(x, y, z) in the real world; if the pinhole projection introduced no distortion, its image point would be P_0(a, b). Assuming the focal length f is 1, the coordinates of P_0 and the incidence angle θ are obtained from:

a = x/z, b = y/z, r = √(a² + b²), θ = arctan(r)
Because of the distortion, the actual image point lies at P′(x′, y′); combined with the equidistant projection function, the final wide-angle camera model is:

r_d = f·θ_d = k_0θ + k_1θ³ + k_2θ⁵ + k_3θ⁷ + k_4θ⁹
Since f = 1, the first-order coefficient k_0 of θ can be approximated as 1, so the final fisheye camera model can be expressed as:

r_d = f·θ_d = θ(1 + k_1θ² + k_2θ⁴ + k_3θ⁶ + k_4θ⁸)
By the similar-triangle principle,

x′/a = y′/b = r_d/r,

the coordinates of the distorted point P′ are obtained as:

x′ = (r_d/r)·a, y′ = (r_d/r)·b
and finally, converting the points on the image plane into a pixel coordinate system by using camera internal parameters to obtain the points on the final image:
Figure RE-GDA0003512304050000104
the distortion correction process of the fisheye camera actually finds the actual incident angle theta from the known distorted image point position (x ', y'). The camera parameters are known, and θ can be obtained from (x ', y') and the camera focal lengthdSo the nature of the distortion correction is to solve a unitary high-order equation for θ:
θd=θ(1+k1θ2+k2θ4+k3θ6+k4θ8)
common methods for solving the unary high-order equation include dichotomy, stationary point iteration and Newton iteration. f (theta) ═ theta (1+ k)1θ2+k2θ4+k3θ6+k4θ8)-θdAnd (5) iterating until f (theta) is approximately equal to 0 or the iteration number reaches an upper limit.
This finally yields the coordinates of the undistorted projection point P_0:

r = tan(θ), a = (r/r_d)·x′, b = (r/r_d)·y′
finally, P is transformed by camera internal parameters0And (a, b) converting the pixel coordinate system to obtain undistorted pixel coordinates.
In some embodiments, in the step S2, the stitching process includes: extracting features from the rectified first image and the rectified second image and performing corner detection on them to establish their geometric correspondence in a common coordinate system; and, based on that correspondence, deleting the regions outside the overlap of the two rectified images using homography estimation, so that the two images are fused over the overlapping region.
Specifically, FIG. 2e is a schematic diagram of stitching and fusion according to an embodiment of the present invention. Because the corrected image 1 and image 2 have overlapping content, they are stitched to obtain a complete lower-half face image; the stitching outputs the union of the two images. The basic flow is: (1) Load the input images: the two photos captured by the cameras at the same point in time. (2) Feature extraction: detect feature points in all input images and perform corner detection with the SURF algorithm. (3) Image registration: establish the geometric correspondence between the images so that they can be transformed, compared, and analyzed in a common reference frame; in this embodiment the correspondence is determined by normalized cross-correlation. (4) Homography computation: during homography estimation, unneeded corners that do not belong to the overlapping region are deleted. (5) Warping and fusion: all input images are deformed and fused into one consistent output image; essentially, they are warped onto a single plane, the composite panoramic plane.
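As a sketch of steps (2) through (5), the following Python/OpenCV fragment matches features between the two rectified images, estimates a homography with RANSAC (which discards corners outside the overlap), and warps one image onto the other's plane. The embodiment names SURF and normalized cross-correlation; ORB with brute-force matching is substituted here because SURF requires a non-free opencv-contrib build, so this is an approximation of the described flow, not the patented implementation.

```python
import cv2
import numpy as np

def stitch_pair(img1, img2):
    # Detect keypoints and descriptors (ORB standing in for SURF).
    orb = cv2.ORB_create(nfeatures=2000)
    kp1, des1 = orb.detectAndCompute(img1, None)
    kp2, des2 = orb.detectAndCompute(img2, None)

    # Match descriptors and keep the best correspondences.
    matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
    matches = sorted(matcher.match(des1, des2), key=lambda m: m.distance)[:200]
    src = np.float32([kp1[m.queryIdx].pt for m in matches]).reshape(-1, 1, 2)
    dst = np.float32([kp2[m.trainIdx].pt for m in matches]).reshape(-1, 1, 2)

    # RANSAC homography discards outlier corners outside the overlap region.
    H, _ = cv2.findHomography(src, dst, cv2.RANSAC, 5.0)

    # Warp image 1 onto image 2's plane and paste image 2 over the overlap.
    h2, w2 = img2.shape[:2]
    pano = cv2.warpPerspective(img1, H, (w2 * 2, h2))
    pano[0:h2, 0:w2] = img2
    return pano
```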
In step S3, feature points of the face are determined from the lower half of the fused image by using a pre-trained feature extraction model, so as to segment the face from the background image.
In some embodiments, in said step S3:
the pre-training of the feature extraction model comprises: training the feature extraction model on an open-source data set, so that the feature extraction model is able to extract all feature points of a complete human face;
the feature points of the human face determined by the feature extraction model include: feature data extracted from the lower half of the face and feature data predicted for the upper half of the face.
Specifically, an open-source data set of 30,000 face images is assembled from the VGGFace2, BUPT-Balancedface, and VoxCeleb2 data sets; it contains a large number of occluded face images, as illustrated in FIG. 2f, an image example of the open-source data set according to an embodiment of the present invention. A deep convolutional neural network is trained on the multi-view face image data in this data set to obtain the pre-trained model. For the input user video stream, each frame first undergoes the rectification and stitching described above; then 68 2D face feature points are detected, and the face is segmented. In some embodiments, the three-dimensional face landmark points of each training picture are obtained with a face alignment algorithm, and the face segmentation result is obtained with a face segmentation algorithm.
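As an illustration of the landmarking step, the sketch below detects 68 2D face feature points per frame with dlib's off-the-shelf 68-point shape predictor. It is a stand-in for the patent's own pre-trained deep feature extraction model (which additionally predicts occluded upper-half points); the predictor file path is an assumption, the model must be downloaded separately, and face segmentation is left out of the sketch.

```python
import cv2
import dlib

# dlib's standard 68-point predictor stands in for the trained model here;
# "shape_predictor_68_face_landmarks.dat" must be obtained separately.
detector = dlib.get_frontal_face_detector()
predictor = dlib.shape_predictor("shape_predictor_68_face_landmarks.dat")

def detect_landmarks(frame_bgr):
    """Return a list of 68-point landmark sets, one per detected face."""
    gray = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2GRAY)
    results = []
    for rect in detector(gray):
        shape = predictor(gray, rect)
        results.append([(shape.part(i).x, shape.part(i).y) for i in range(68)])
    return results
```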
In step S4, a pre-trained face reconstruction model is used to construct a complete image of the face by using the segmented face image and the feature points of the face as input.
In some embodiments, in the step S4, the training environment of the pre-trained face reconstruction model is Windows 10 + CUDA 10.2 + cuDNN 7.6.5 + PyTorch 1.6 + PyTorch3D 0.2, with 300 iterations and a batch size of 48, to complete the pre-training of the face reconstruction model.
In some embodiments, in the step S4, during the pre-training of the face reconstruction model, the coarse face reconstruction is measured by face key-point loss, eye-closure loss, photometric loss, shape-consistency loss, and contour regularization loss, while the detail face reconstruction is measured by photometric detail loss, soft-symmetry loss, and detail regularization loss.
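The patent names these loss terms but not their weights. The following PyTorch sketch shows one plausible way to combine precomputed scalar loss tensors into the coarse-stage and detail-stage training objectives; the weight values are placeholders, not figures disclosed in the patent.

```python
import torch

# Illustrative stage weights (assumptions, not disclosed values).
COARSE_WEIGHTS = {"landmark": 1.0, "eye_closure": 1.0, "photometric": 2.0,
                  "shape_consistency": 0.5, "contour_reg": 1e-4}
DETAIL_WEIGHTS = {"photometric_detail": 2.0, "soft_symmetry": 0.5,
                  "detail_reg": 1e-4}

def total_loss(terms: dict, weights: dict) -> torch.Tensor:
    """Weighted sum of precomputed scalar loss tensors."""
    return sum(weights[name] * value for name, value in terms.items())

# Usage during pre-training (individual terms computed elsewhere):
#   loss = total_loss({"landmark": l_lmk, "eye_closure": l_eye,
#                      "photometric": l_pho, "shape_consistency": l_shape,
#                      "contour_reg": l_reg}, COARSE_WEIGHTS)
#   loss.backward()
```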
Specifically, in the reconstruction stage, the trained face reconstruction model takes the segmented face image and the facial feature points as input and outputs predicted parameters such as face details, shape, albedo, expression, pose, and illumination. These parameters are fed into the FLAME model (a lightweight yet expressive generic head model learned from over 33,000 accurately aligned 3D scans; it combines a linear identity shape space trained on head scans of 3,800 subjects with an articulated neck, jaw, and eyeballs, pose-dependent corrective blendshapes, and additional global expression blendshapes) to obtain the vertex data of the face, and finally a three-dimensional face model file, completing the reconstruction of the face. As shown in FIG. 2g, a deep convolutional neural network is trained on the multi-view face image data in the data set to obtain the trained model. The trained model outputs the predicted face details, shape, albedo, expression, pose, and illumination parameters; these are input to the FLAME model to obtain vertex data, triangular patches, and texture features, and the three-dimensional information is written into an .obj file to obtain the three-dimensional face model file.
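Writing the FLAME output to an .obj file is straightforward. The helper below is a minimal sketch (texture coordinates omitted) of serializing vertex data and triangular patches into the Wavefront format used for the three-dimensional face model file; the function name is illustrative.

```python
def write_obj(path, vertices, faces):
    """Write vertex and triangle data to a Wavefront .obj file.

    vertices: iterable of (x, y, z); faces: iterable of (i, j, k) with
    0-based vertex indices (OBJ indices are 1-based, hence the +1).
    """
    with open(path, "w") as f:
        f.write("# reconstructed face mesh\n")
        for x, y, z in vertices:
            f.write(f"v {x:.6f} {y:.6f} {z:.6f}\n")
        for i, j, k in faces:
            f.write(f"f {i + 1} {j + 1} {k + 1}\n")
```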
Specific examples
FIG. 3a is a flowchart of a specific example one according to an embodiment of the present invention; FIG. 3b is a flowchart of a second specific example according to an embodiment of the present invention; fig. 3c is a flowchart of specific example three according to the embodiment of the present invention.
Specifically, in example two:
starting up and opening a camera and a reconstruction system switch;
the angle of the fisheye camera is 180 mm, and the focal length is 15 mm;
carrying out fisheye camera distortion correction on the obtained face picture;
the splicing uses panoramic splicing;
the three-dimensional face reconstruction code training phase operating environment based on deep learning is as follows:
windows10+ cuda10.2+ cudann 7.6.5+ pytorch1.6+ pytorch3d0.2, 3 ten thousand pieces of face image data are used for training in the early stage, the iteration is carried out for 300 times, and the batch is 48;
and applying the trained model to model reasoning for face reconstruction. The operating environment of the inference model is windows10+ cuda10.2+ cudann 7.6.5+ pyrrch 1.6+ pyrrch 3d0.2;
three-dimensional face driving data is obtained, and the data can be used in a teleconference process.
Specifically, in example three:
starting up and opening a camera and a reconstruction system switch;
the fisheye camera angle is 180;
carrying out fisheye camera distortion correction on the obtained face picture;
the splicing uses panoramic splicing;
the three-dimensional face reconstruction code training phase operating environment based on deep learning is as follows:
windows10+ cuda10.2+ cudann 7.6.5+ pytorch1.6+ pytorch3d0.2, 3 ten thousand pieces of face image data are used for training in the early stage, the iteration is carried out for 300 times, and the batch is 48;
and applying the trained model to model reasoning for face reconstruction. The operating environment of the inference model is windows10+ cuda10.2+ cudann 7.6.5+ pyrrch 1.6+ pyrrch 3d0.2;
a three-dimensional face model of the actor is obtained.
The invention discloses a close-range three-dimensional face reconstruction device in a second aspect. FIG. 4 is a block diagram of a close-range three-dimensional face reconstruction apparatus according to an embodiment of the present invention; as shown in FIG. 4, the apparatus 400 comprises:
a first processing unit 401, configured to invoke an image acquisition device to simultaneously acquire a first image and a second image of a human face, where the first image is an image of a left lower half of the human face, and the second image is an image of a right lower half of the human face;
a second processing unit 402, configured to perform preprocessing on the first image and the second image, where the preprocessing includes rectification processing and stitching processing to obtain a lower half fusion image of the face;
a third processing unit 403, configured to invoke a pre-trained feature extraction model to determine feature points of the face from the lower half fused image, so as to segment the face from the background image;
a fourth processing unit 404, configured to invoke a pre-trained face reconstruction model, and construct a complete image of the face by using the segmented face image and the feature points of the face as input.
According to the device of the second aspect of the present invention, the image acquisition apparatus comprises a first fisheye image collector for capturing the first image and a second fisheye image collector for capturing the second image. The image acquisition apparatus is mounted on the wearable device such that, when the wearable device is worn, the apparatus sits at any one of the following positions on the human face: under the eyes, at the temples, at the eye corners, or at the nose pads, with the first fisheye image collector and the second fisheye image collector located on the left and right sides of that position, respectively.
According to the apparatus of the second aspect of the present invention, the second processing unit 402 is specifically configured such that the rectification of the first image and the second image comprises computing the corrected (undistorted) projection point P_0 with a camera model, based on the actual position point P in the real world and the imaging position point P′ at which distortion occurs during imaging. The camera model is:

r_d = f·θ_d = θ(1 + k_1θ² + k_2θ⁴ + k_3θ⁶ + k_4θ⁸)    (1)

where r_d = |OP′| denotes the distance from the imaging position point P′ to the image-plane center O, f is the focal length, θ is the incidence angle of point P, θ_d is the exit angle of the distorted imaging position point P′, and k_1, k_2, k_3, k_4 are the distortion parameters.
According to the apparatus of the second aspect of the present invention, the second processing unit 402 is specifically configured such that the stitching process includes:
extracting features from the rectified first image and the rectified second image and performing corner detection on them, to establish the geometric correspondence of the two rectified images in a common coordinate system;
based on the geometric correspondence, deleting the regions outside the overlap of the two rectified images using homography estimation, so that the two rectified images are fused over the overlapping region.
According to the apparatus of the second aspect of the present invention, the third processing unit 403 is specifically configured to:
the pre-training of the feature extraction model comprises: training the feature extraction model on an open-source data set, so that the feature extraction model is able to extract all feature points of a complete human face;
the feature points of the human face determined by the feature extraction model include: feature data extracted from the lower half of the face and feature data predicted for the upper half of the face.
According to the apparatus of the second aspect of the present invention, the training environment of the pre-trained face reconstruction model is Windows 10 + CUDA 10.2 + cuDNN 7.6.5 + PyTorch 1.6 + PyTorch3D 0.2, with 300 iterations and a batch size of 48, to complete the pre-training of the face reconstruction model.
According to the device of the second aspect of the present invention, during the pre-training of the face reconstruction model, the coarse face reconstruction is measured by face key-point loss, eye-closure loss, photometric loss, shape-consistency loss, and contour regularization loss, while the detail face reconstruction is measured by photometric detail loss, soft-symmetry loss, and detail regularization loss.
A third aspect of the invention discloses an electronic device. The electronic device comprises a memory and a processor, the memory stores a computer program, and the processor implements the steps of the method for reconstructing a close-range three-dimensional face according to any one of the first aspect of the present disclosure when executing the computer program.
Fig. 5 is a block diagram of an electronic device according to an embodiment of the present invention. As shown in fig. 5, the electronic device comprises a processor, a memory, a communication interface, a display screen, and an input device connected by a system bus. The processor provides computing and control capabilities. The memory comprises a non-volatile storage medium and an internal memory; the non-volatile storage medium stores an operating system and a computer program, and the internal memory provides an environment for running the operating system and the computer program. The communication interface performs wired or wireless communication with external terminals; wireless communication can be realized through WiFi, a carrier network, near-field communication (NFC), or other technologies. The display screen can be a liquid crystal display or an electronic-ink display, and the input device can be a touch layer covering the display screen, a key, trackball, or touchpad on the housing of the electronic device, or an external keyboard, touchpad, or mouse.
It will be understood by those skilled in the art that the structure shown in fig. 5 is only a partial block diagram related to the technical solution of the present disclosure, and does not constitute a limitation of the electronic device to which the solution of the present application is applied, and a specific electronic device may include more or less components than those shown in the drawings, or combine some components, or have a different arrangement of components.
A fourth aspect of the invention discloses a computer-readable storage medium. The computer readable storage medium has stored thereon a computer program which, when executed by a processor, implements the steps of a method for reconstructing a close-range three-dimensional face according to any one of the first aspect of the present disclosure.
To sum up, the invention captures the lower half of the face with two fisheye cameras near the nose pads and analyzes the lower-half facial expression with deep learning to restore the complete face. A complete face image can be reconstructed from the lower-half image alone, without acquiring face information from multiple viewpoints; the deep-learning reconstruction preserves the expressive details of the face, so its emotion can be accurately conveyed; and the whole reconstruction requires no additional sensor hardware, avoids the loose-contact problem of strain gauges, and keeps the equipment simple, light, and easy to implement.
The technical solution provided by the invention effectively reduces the system's dependence on sensors: a user only needs to wear VR glasses fitted with the three-dimensional face reconstruction device described herein to achieve motion capture, three-dimensional face modeling, and data synchronization of the behavioral content. The device can be connected to AR/VR glasses of different brands or models through a communication interface and is therefore highly portable; it does not restrict the application scenarios of existing commercial hardware, eliminates the occlusion caused by external image acquisition equipment in open spaces, and is stable, attractive, and portable.
It should be noted that the technical features of the above embodiments can be combined arbitrarily; for brevity, not all possible combinations are described, but any combination that involves no contradiction should be considered within the scope of this description. The above embodiments express only several implementations of the present application; their description is specific and detailed but should not be construed as limiting the scope of the invention. Those skilled in the art can make several variations and improvements without departing from the concept of the present application, and these fall within its scope of protection. Therefore, the protection scope of this patent shall be subject to the appended claims.

Claims (10)

1. A close-range three-dimensional face reconstruction method is characterized by comprising the following steps:
step S1, simultaneously acquiring a first image and a second image of a human face by using an image acquisition device, wherein the first image is an image of the lower half of the left side of the human face, and the second image is an image of the lower half of the right side of the human face;
step S2, preprocessing the first image and the second image, wherein the preprocessing comprises correction processing and splicing processing to obtain a lower half fusion image of the face;
step S3, determining feature points of the human face from the lower half fusion image by using a pre-trained feature extraction model so as to segment the human face from a background image;
and step S4, constructing a complete image of the face by using the pre-trained face reconstruction model and taking the segmented face image and the feature points of the face as input.
2. The close-range three-dimensional face reconstruction method according to claim 1, wherein the image acquisition apparatus comprises a first fisheye image collector for capturing the first image and a second fisheye image collector for capturing the second image, the image acquisition apparatus being mounted on a wearable device so that, when the wearable device is worn, the apparatus is located at any one of the following positions on the human face: under the eyes, the temples, the eye corners, or the nose pads, with the first fisheye image collector and the second fisheye image collector located on the left and right sides of that position, respectively.
3. The method according to claim 2, wherein in the step S2 the rectification of the first image and the second image comprises computing the corrected (undistorted) projection point P_0 with a camera model, based on the actual position point P in the real world and the imaging position point P′ at which distortion occurs during imaging, the camera model being:

r_d = f·θ_d = θ(1 + k_1θ² + k_2θ⁴ + k_3θ⁶ + k_4θ⁸)    (1)

where r_d = |OP′| denotes the distance from the imaging position point P′ to the image-plane center O, f is the focal length, θ is the incidence angle of point P, θ_d is the exit angle of the distorted imaging position point P′, and k_1, k_2, k_3, k_4 are the distortion parameters.
4. The close-range three-dimensional face reconstruction method according to claim 3, wherein in the step S2, the stitching process comprises:
extracting features from the rectified first image and the rectified second image and performing corner detection on them, to establish the geometric correspondence of the two rectified images in a common coordinate system;
based on the geometric correspondence, deleting the regions outside the overlap of the two rectified images using homography estimation, so that the two rectified images are fused over the overlapping region.
5. The close-range three-dimensional face reconstruction method according to claim 4, wherein in the step S3:
the pre-training of the feature extraction model comprises: training the feature extraction model on an open-source data set, so that the feature extraction model is able to extract all feature points of a complete human face;
the feature points of the human face determined by the feature extraction model include: feature data extracted from the lower half of the face and feature data predicted for the upper half of the face.
6. The close-range three-dimensional face reconstruction method according to claim 5, wherein in the step S4 the training environment of the pre-trained face reconstruction model is Windows 10 + CUDA 10.2 + cuDNN 7.6.5 + PyTorch 1.6 + PyTorch3D 0.2, with 300 iterations and a batch size of 48, to complete the pre-training of the face reconstruction model.
7. The method according to claim 6, wherein in the step S4, during the pre-training of the face reconstruction model, the coarse face reconstruction is measured by face key-point loss, eye-closure loss, photometric loss, shape-consistency loss, and contour regularization loss, and the detail face reconstruction is measured by photometric detail loss, soft-symmetry loss, and detail regularization loss.
8. A close-range three-dimensional face reconstruction apparatus, the apparatus comprising:
the image acquisition device comprises a first processing unit, a second processing unit and a display unit, wherein the first processing unit is configured to call the image acquisition device to simultaneously acquire a first image and a second image of a human face, the first image is an image of the lower half of the left side of the human face, and the second image is an image of the lower half of the right side of the human face;
the second processing unit is configured to perform preprocessing on the first image and the second image, wherein the preprocessing comprises rectification processing and splicing processing so as to obtain a lower half fusion image of the face;
a third processing unit, configured to invoke a pre-trained feature extraction model to determine feature points of the face from the lower half fused image, so as to segment the face from the background image;
and the fourth processing unit is configured to call a pre-trained face reconstruction model, and construct a complete image of the face by taking the segmented face image and the feature points of the face as input.
9. An electronic device, comprising a memory storing a computer program and a processor, wherein the processor implements the steps of the method for reconstructing a three-dimensional face from a close-range view according to any one of claims 1 to 7 when executing the computer program.
10. A computer-readable storage medium, having stored thereon a computer program which, when executed by a processor, implements the steps of a method for three-dimensional face reconstruction from close range according to any one of claims 1 to 7.
Application CN202111441281.0A, priority date 2021-11-30, filing date 2021-11-30: Close-range three-dimensional face reconstruction device; publication CN114387388A (pending).

Priority Applications (1)

Application Number: CN202111441281.0A; Priority Date: 2021-11-30; Filing Date: 2021-11-30; Title: Close-range three-dimensional face reconstruction device; Publication: CN114387388A (en)

Applications Claiming Priority (1)

Application Number: CN202111441281.0A; Priority Date: 2021-11-30; Filing Date: 2021-11-30; Title: Close-range three-dimensional face reconstruction device; Publication: CN114387388A (en)

Publications (1)

Publication Number: CN114387388A; Publication Date: 2022-04-22

Family

ID: 81196667

Family Applications (1)

Application Number: CN202111441281.0A; Title: Close-range three-dimensional face reconstruction device; Priority Date: 2021-11-30; Filing Date: 2021-11-30

Country Status (1)

Country: CN; Publication: CN114387388A (en)


Legal Events

PB01: Publication
SE01: Entry into force of request for substantive examination