CN114387388A - Close-range three-dimensional face reconstruction device - Google Patents
- Publication number
- CN114387388A (application CN202111441281.0A)
- Authority
- CN
- China
- Prior art keywords
- image
- face
- human face
- reconstruction
- lower half
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- G06T17/00 — Three dimensional [3D] modelling, e.g. data description of 3D objects
- G06T7/194 — Segmentation; Edge detection involving foreground-background segmentation
- G06T2207/20081 — Training; Learning
- G06T2207/20221 — Image fusion; Image merging
- G06T2207/30201 — Face

(All under G — Physics › G06 — Computing; calculating or counting › G06T — Image data processing or generation, in general.)
Abstract
The invention provides a close-range three-dimensional face reconstruction device. The device comprises: a first processing unit configured to call an image acquisition apparatus to simultaneously acquire a first image and a second image of a human face, where the first image covers the lower-left half of the face and the second image covers the lower-right half; a second processing unit configured to preprocess the first and second images, the preprocessing comprising rectification and stitching, to obtain a fused image of the lower half of the face; a third processing unit configured to invoke a pre-trained feature extraction model to determine feature points of the face from the lower-half fused image, so as to segment the face from the background; and a fourth processing unit configured to invoke a pre-trained face reconstruction model that takes the segmented face image and the feature points as input and constructs a complete image of the face.
Description
Technical Field
The invention belongs to the technical field of facial image reconstruction, and particularly relates to a close-range three-dimensional facial reconstruction device.
Background
Augmented reality superimposes virtual objects on an observed real scene, using them to enhance the real world; three-dimensional reconstruction is widely applied in augmented reality, virtual reality, and related fields. Three-dimensional face reconstruction aims to recover the true three-dimensional shape and depth information of a face from its two-dimensional image. It can be applied to virtual avatar generation, virtual courses, film special effects, game production, and similar fields: for example, reconstructing a three-dimensional face model from facial key points, head pose estimation and tracking, and expression capture can effectively enrich interaction in short video, live streaming, and mobile applications. Current three-dimensional face reconstruction methods fall into three main categories: traditional (geometry-based) reconstruction, model-based reconstruction, and end-to-end deep learning reconstruction.
Three-dimensional reconstruction for virtual/augmented reality scenes generally requires shooting RGB images of an object from multiple angles to obtain complete image information; alternatively, sensing devices such as strain sensors are combined with partial image information, analyzing changes in the strain signal to reconstruct a complete three-dimensional model. However, such methods introduce various losses and errors that seriously degrade reconstruction accuracy.
A head-mounted display device adopted in the prior art can drive three-dimensional facial animation from the wearer's performance in real time. That wearable system places several ultra-thin flexible electronic materials on the foam pad of the headset to measure surface strain signals corresponding to upper-face expressions. The strain signals are combined with a head-mounted RGB-D camera to enhance tracking of the mouth region and to compensate for inaccurate positioning of the head-mounted display. To map the input signals to a three-dimensional face model, a single offline training session is performed for each person; for reuse and accurate online operation, a brief calibration step readjusts the mapped Gaussian mixture distribution before each use. The resulting animation is visually comparable to state-of-the-art depth-sensor-driven facial expression capture systems and is therefore suitable for social interaction in a virtual world. Fig. 1 is a block diagram of a head-mounted display device, given as a comparative example of the present invention, that implements face reconstruction. Sensors embedded in the foam pad capture the expression of the upper face by sensing the twitching of facial muscles, and a forward-protruding camera tracks the user's chin and mouth movements.
From a hardware perspective, although each user needs only one full training session to use such a device, a short calibration is still required before every use, making it unsuitable for real-time three-dimensional face reconstruction. Because the pressure distribution varies with the placement of the head-mounted display and the head orientation, the strain signal measurement may drift or lose accuracy, and the strain gauges sometimes make loose contact with the face. The many external measuring devices also burden the existing hardware. Under these conditions it is difficult to accurately recover expressions around the eyebrows, and subtle expressions such as blinking or squinting are hard to capture with strain gauges sparsely placed on a foam pad. From a software perspective, reconstruction of mouth and eye details involves many coefficients and parameters, and the reconstruction process differs from person to person and is complicated.
Disclosure of Invention
To solve the above technical problem, the present application provides a close-range three-dimensional face reconstruction scheme: the lower half of the face is captured by two fisheye cameras near the nose pad, and the lower-half expression is analyzed by deep learning to restore the complete face. A complete face image can thus be reconstructed from the lower-half image alone, without acquiring face information from multiple viewing angles; deep learning preserves the expression details of the face, so the wearer's emotion can be expressed accurately; and the whole reconstruction needs no additional sensor hardware, avoids the loose-contact problem of strain gauges, and keeps the equipment simple, light, and easy to implement.
The invention discloses a close-range three-dimensional face reconstruction method in a first aspect. The method comprises the following steps:
step S1, simultaneously acquiring a first image and a second image of a human face by using an image acquisition device, wherein the first image is an image of the lower half of the left side of the human face, and the second image is an image of the lower half of the right side of the human face;
step S2, preprocessing the first image and the second image, wherein the preprocessing comprises correction processing and splicing processing to obtain a lower half fusion image of the face;
step S3, determining feature points of the human face from the lower half fusion image by using a pre-trained feature extraction model so as to segment the human face from a background image;
and step S4, constructing a complete image of the face by using the pre-trained face reconstruction model and taking the segmented face image and the feature points of the face as input.
According to the method of the first aspect of the present invention, the image acquisition apparatus includes a first fisheye camera for acquiring the first image and a second fisheye camera for acquiring the second image. The apparatus is mounted on a wearable device so that, when the wearable device is worn, the cameras sit at any of the following positions on the face: below the eyes, at the temples, at the eye corners, or at the nose pad, with the first and second fisheye cameras on the left and right sides of that position, respectively.
According to the method of the first aspect of the present invention, in step S2, the correction processing performed on the first image and the second image includes calculating a corrected (undistorted) projection point P0, using a camera model, from the actual position point P in the real world and the imaging position point P' at which distortion occurs during imaging. The camera model is:

r_d = f·θ_d = θ(1 + k1·θ² + k2·θ⁴ + k3·θ⁶ + k4·θ⁸)    (1)

where r_d = |OP'| is the distance from the imaging position point P' to the image-plane center O, f is the focal length, θ is the incident angle of point P, θ_d is the exit angle at the distorted imaging position point P', and k1, k2, k3, k4 are distortion parameters.
According to the method of the first aspect of the present invention, in the step S2, the splicing process includes:
extracting features from the corrected first image and the corrected second image, and performing corner detection on those features to establish a geometric correspondence between the two corrected images in the same coordinate system;
based on the geometric correspondence, deleting the regions outside the overlapping region of the two corrected images using homography matrix estimation, so as to fuse the corrected first and second images based on the overlapping region.
According to the method of the first aspect of the present invention, in said step S3:
the pre-training of the feature extraction model comprises: training the feature extraction model by utilizing an open source data set, so that the feature extraction model has the capability of extracting all feature points of a complete human face;
the feature points of the face determined by the feature extraction model include: feature data extracted from the lower half of the face, and feature data predicted for the upper half of the face.
According to the method of the first aspect of the present invention, in step S4, the training environment of the pre-trained face reconstruction model is Windows 10 + CUDA 10.2 + cuDNN 7.6.5 + PyTorch 1.6 + PyTorch3D 0.2; training runs for 300 iterations with a batch size of 48 to complete the pre-training of the face reconstruction model.
According to the method of the first aspect of the present invention, in step S4, during pre-training of the face reconstruction model, the coarse face reconstruction is measured by face key-point loss, eye-closure loss, photometric loss, shape-consistency loss, and contour-regularization loss, while the detail face reconstruction is measured by photometric detail loss, soft-symmetry loss, and detail-regularization loss.
The invention discloses a close-range three-dimensional face reconstruction device in a second aspect. The device comprises:
a first processing unit configured to call an image acquisition apparatus to simultaneously acquire a first image and a second image of a human face, where the first image covers the lower-left half of the face and the second image covers the lower-right half;
a second processing unit configured to preprocess the first image and the second image, the preprocessing comprising rectification and stitching, to obtain a fused image of the lower half of the face;
a third processing unit, configured to invoke a pre-trained feature extraction model to determine feature points of the face from the lower half fused image, so as to segment the face from the background image;
and a fourth processing unit configured to invoke a pre-trained face reconstruction model that takes the segmented face image and the feature points of the face as input and constructs a complete image of the face.
According to the device of the second aspect of the present invention, the image acquisition apparatus includes a first fisheye camera for acquiring the first image and a second fisheye camera for acquiring the second image. The apparatus is mounted on the wearable device so that, when the wearable device is worn, the cameras sit at any of the following positions on the face: below the eyes, at the temples, at the eye corners, or at the nose pad, with the first and second fisheye cameras on the left and right sides of that position, respectively.
According to the apparatus of the second aspect of the present invention, the second processing unit is specifically configured such that the correction processing performed on the first image and the second image includes calculating a corrected (undistorted) projection point P0, using a camera model, from the actual position point P in the real world and the imaging position point P' at which distortion occurs during imaging. The camera model is:

r_d = f·θ_d = θ(1 + k1·θ² + k2·θ⁴ + k3·θ⁶ + k4·θ⁸)    (1)

where r_d = |OP'| is the distance from the imaging position point P' to the image-plane center O, f is the focal length, θ is the incident angle of point P, θ_d is the exit angle at the distorted imaging position point P', and k1, k2, k3, k4 are distortion parameters.
According to the apparatus of the second aspect of the invention, the second processing unit is specifically configured such that the stitching process comprises:
extracting features from the corrected first image and the corrected second image, and performing corner detection on those features to establish a geometric correspondence between the two corrected images in the same coordinate system;
based on the geometric correspondence, deleting the regions outside the overlapping region of the two corrected images using homography matrix estimation, so as to fuse the corrected first and second images based on the overlapping region.
According to the apparatus of the second aspect of the present invention, the third processing unit is specifically configured to:
the pre-training of the feature extraction model comprises: training the feature extraction model by utilizing an open source data set, so that the feature extraction model has the capability of extracting all feature points of a complete human face;
the feature points of the face determined by the feature extraction model include: feature data extracted from the lower half of the face, and feature data predicted for the upper half of the face.
According to the apparatus of the second aspect of the present invention, the training environment of the pre-trained face reconstruction model is Windows 10 + CUDA 10.2 + cuDNN 7.6.5 + PyTorch 1.6 + PyTorch3D 0.2; training runs for 300 iterations with a batch size of 48 to complete the pre-training of the face reconstruction model.
According to the device of the second aspect of the present invention, during pre-training of the face reconstruction model, the coarse face reconstruction is measured by face key-point loss, eye-closure loss, photometric loss, shape-consistency loss, and contour-regularization loss, while the detail face reconstruction is measured by photometric detail loss, soft-symmetry loss, and detail-regularization loss.
A third aspect of the invention discloses an electronic device. The electronic device comprises a memory and a processor, the memory stores a computer program, and the processor implements the steps of the method for reconstructing a close-range three-dimensional face according to any one of the first aspect of the present disclosure when executing the computer program.
A fourth aspect of the invention discloses a computer-readable storage medium. The computer readable storage medium has stored thereon a computer program which, when executed by a processor, implements the steps of a method for reconstructing a close-range three-dimensional face according to any one of the first aspect of the present disclosure.
The technical scheme provided by the invention effectively reduces the system's dependence on sensors: action acquisition, three-dimensional face modeling, and data synchronization of behavior content can be achieved simply by wearing VR glasses together with the three-dimensional face reconstruction device described herein. The device can connect to AR/VR glasses of different brands or models through a communication interface, giving high portability; meanwhile it does not restrict the application scenarios of existing commercial hardware, eliminates the occlusion caused by external image acquisition equipment in open spaces, and is stable, attractive, and portable.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the embodiments or the description in the prior art will be briefly described below, and it is obvious that the drawings in the following description are some embodiments of the present invention, and other drawings can be obtained by those skilled in the art without creative efforts.
Fig. 1 is a frame diagram of a head-mounted display device according to a comparative example of the present invention for implementing face reconstruction;
fig. 2a is a flowchart of a close-range three-dimensional face reconstruction method according to an embodiment of the present invention;
fig. 2b is a first schematic view of a wearable device and an image capturing apparatus according to an embodiment of the present invention;
fig. 2c is a second schematic view of the wearable device and the image capturing apparatus according to the embodiment of the invention;
FIG. 2d is a schematic illustration of distortion correction according to an embodiment of the present invention;
FIG. 2e is a schematic diagram of stitching fusion according to an embodiment of the present invention;
FIG. 2f is an example of an image of an open source data set in accordance with an embodiment of the present invention;
FIG. 2g is a flowchart of face reconstruction according to an embodiment of the present invention;
FIG. 3a is a flowchart of a specific example one according to an embodiment of the present invention;
FIG. 3b is a flowchart of a second specific example according to an embodiment of the present invention;
FIG. 3c is a flowchart of specific example three according to an embodiment of the present invention;
FIG. 4 is a block diagram of a close-range three-dimensional face reconstruction device according to an embodiment of the present invention;
fig. 5 is a block diagram of an electronic device according to an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Fig. 2a is a flowchart of a close-range three-dimensional face reconstruction method according to an embodiment of the present invention; as shown in fig. 2a, the method comprises:
step S1, simultaneously acquiring a first image and a second image of a human face by using an image acquisition device, wherein the first image is an image of the lower half of the left side of the human face, and the second image is an image of the lower half of the right side of the human face;
step S2, preprocessing the first image and the second image, wherein the preprocessing comprises correction processing and splicing processing to obtain a lower half fusion image of the face;
step S3, determining feature points of the human face from the lower half fusion image by using a pre-trained feature extraction model so as to segment the human face from a background image;
and step S4, constructing a complete image of the face by using the pre-trained face reconstruction model and taking the segmented face image and the feature points of the face as input.
In step S1, an image capturing device is used to capture a first image and a second image of a human face at the same time, where the first image is the image of the lower left half of the human face and the second image is the image of the lower right half of the human face.
In some embodiments, the image acquisition apparatus includes a first fisheye camera for acquiring the first image and a second fisheye camera for acquiring the second image. The apparatus is mounted on a wearable device so that, when the wearable device is worn, the cameras sit below the eyes, at the temples, at the eye corners, or at the nose pad of the face, with the first and second fisheye cameras on the left and right sides of that position, respectively. Referring to fig. 2b and fig. 2c: fig. 2b is a first schematic view of the wearable device and the image acquisition apparatus according to an embodiment of the present invention, and fig. 2c is a second schematic view of the wearable device and the image acquisition apparatus according to the embodiment of the invention.
In step S2, the first image and the second image are preprocessed, where the preprocessing includes rectification and stitching to obtain a lower half fused image of the face.
In some embodiments, in step S2, the correction processing performed on the first image and the second image includes calculating a corrected (undistorted) projection point P0, using a camera model, from the actual position point P in the real world and the imaging position point P' at which distortion occurs during imaging. The camera model is:

r_d = f·θ_d = θ(1 + k1·θ² + k2·θ⁴ + k3·θ⁶ + k4·θ⁸)    (1)

where r_d = |OP'| is the distance from the imaging position point P' to the image-plane center O, f is the focal length, θ is the incident angle of point P, θ_d is the exit angle at the distorted imaging position point P', and k1, k2, k3, k4 are distortion parameters.
Specifically, two wide-angle cameras are connected to the device, the distorted images of the left and right lower halves of the face are obtained respectively, and distortion correction is applied to them. Taking a fisheye camera (capable of shooting close-range scenes at 15 mm–45 mm) as an example: because fisheye images are distorted, an equidistant projection model is generally used to analyze the distortion correction, and the first five terms of the Taylor expansion are taken to approximate the actual projection function of the wide-angle camera:

r(θ) ≈ k0·θ + k1·θ³ + k2·θ⁵ + k3·θ⁷ + k4·θ⁹
FIG. 2d is a schematic diagram of distortion correction according to an embodiment of the present invention. For a point P(x, y, z) in the real world, if there were no distortion the pinhole projection would give the image point P0(a, b). Assuming a focal length f = 1, the coordinates of P0 and the incident angle θ follow from the standard pinhole projection:

a = x/z,  b = y/z,  r = √(a² + b²),  θ = arctan(r)
due to the distortion, the position of the actual image point is P ' (x ', y '), and in combination with the equidistant projection function, the final wide-angle camera model is:
rd=fθd=k0θ+k1θ3+k2θ5+k3θ7+k4θ9
because f is 1, the first order coefficient k of theta0Can be approximated to 1, the final fish-eye camera model can be expressed as:
rd=fθd=θ(1+k1θ2+k2θ4+k3θ6+k4θ8)
By the similar-triangle principle, the distorted point P' lies along the same radial direction as P0, so its coordinates are:

x' = (r_d / r)·a,  y' = (r_d / r)·b

Finally, the camera intrinsics convert points on the image plane into the pixel coordinate system to obtain the point on the final image:

u = f_x·x' + c_x,  v = f_y·y' + c_y

where (c_x, c_y) is the principal point and f_x, f_y are the focal lengths in pixels.
the distortion correction process of the fisheye camera actually finds the actual incident angle theta from the known distorted image point position (x ', y'). The camera parameters are known, and θ can be obtained from (x ', y') and the camera focal lengthdSo the nature of the distortion correction is to solve a unitary high-order equation for θ:
θd=θ(1+k1θ2+k2θ4+k3θ6+k4θ8)
common methods for solving the unary high-order equation include dichotomy, stationary point iteration and Newton iteration. f (theta) ═ theta (1+ k)1θ2+k2θ4+k3θ6+k4θ8)-θdAnd (5) iterating until f (theta) is approximately equal to 0 or the iteration number reaches an upper limit.
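As a minimal sketch of the Newton-iteration variant (the distortion coefficients used in the usage example are illustrative placeholders, not values from a calibrated camera), the inverse mapping from θ_d back to θ can be written as:

```python
def undistort_theta(theta_d, k, max_iter=50, tol=1e-12):
    """Solve theta_d = theta*(1 + k1*theta^2 + k2*theta^4 + k3*theta^6 + k4*theta^8)
    for theta by Newton iteration, starting from the distorted angle itself."""
    k1, k2, k3, k4 = k
    theta = theta_d  # initial guess: distortion is small, so theta ~ theta_d
    for _ in range(max_iter):
        t2 = theta * theta
        poly = 1 + t2 * (k1 + t2 * (k2 + t2 * (k3 + t2 * k4)))
        f = theta * poly - theta_d  # f(theta) as defined above
        # derivative of theta*poly: 1 + 3*k1*t^2 + 5*k2*t^4 + 7*k3*t^6 + 9*k4*t^8
        df = 1 + t2 * (3 * k1 + t2 * (5 * k2 + t2 * (7 * k3 + t2 * 9 * k4)))
        step = f / df
        theta -= step
        if abs(step) < tol:  # converged: f(theta) ~ 0
            break
    return theta
```

Given the corrected θ, the undistorted point then follows from the pinhole projection along the same radial direction as (x', y').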
Finally, the coordinates of the undistorted projection point P0 are obtained; with f = 1 the undistorted radius is tan θ, so

(a, b) = (tan θ / r_d)·(x', y')

and P0(a, b) is then converted into the pixel coordinate system by the camera intrinsics to obtain the undistorted pixel coordinates.
In some embodiments, in step S2, the stitching process includes: extracting features from the corrected first and second images and performing corner detection on them to establish a geometric correspondence between the two images in the same coordinate system; then, based on that correspondence, using homography matrix estimation to delete the regions outside the overlap and fuse the two images based on the overlapping region.
Specifically, fig. 2e is a schematic diagram of stitching and fusion according to an embodiment of the present invention. Because the corrected image 1 and image 2 have overlapping content, they are stitched to obtain a complete lower-half face image; the stitching outputs the union of the two images. The basic flow is: (1) load the input images, i.e., the two photos captured by the cameras at the same point in time; (2) feature extraction: detect feature points in all input images and perform corner detection with the SURF algorithm; (3) image registration: establish geometric correspondences between the images so that they can be transformed, compared, and analyzed in a common reference frame — in this embodiment the correspondence is determined by normalized cross-correlation; (4) compute the homography matrix, deleting during estimation the unneeded corners that do not belong to the overlapping region; (5) warp and fuse all input images into one consistent output image — essentially, all input images are warped onto a single plane, the composite panoramic plane.
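The homography estimation step can be sketched in plain Python. This is a minimal direct-linear-transform (DLT) estimate from exactly four point correspondences, not the full SURF-plus-registration pipeline described above:

```python
def solve_linear(A, b):
    """Gaussian elimination with partial pivoting for a square system A x = b."""
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x

def homography_from_points(src, dst):
    """DLT: recover the 3x3 homography H (with h33 fixed to 1) mapping
    four source points onto four destination points."""
    A, b = [], []
    for (x, y), (u, v) in zip(src, dst):
        A.append([x, y, 1, 0, 0, 0, -u * x, -u * y]); b.append(u)
        A.append([0, 0, 0, x, y, 1, -v * x, -v * y]); b.append(v)
    h = solve_linear(A, b) + [1.0]
    return [h[0:3], h[3:6], h[6:9]]

def warp_point(H, p):
    """Apply homography H to a 2D point (homogeneous divide included)."""
    x, y = p
    w = H[2][0] * x + H[2][1] * y + H[2][2]
    return ((H[0][0] * x + H[0][1] * y + H[0][2]) / w,
            (H[1][0] * x + H[1][1] * y + H[1][2]) / w)
```

In practice the correspondences would come from the detected corners in the overlap region, and a robust estimator would discard outliers before warping image 2 onto the plane of image 1.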
In step S3, feature points of the face are determined from the lower half of the fused image by using a pre-trained feature extraction model, so as to segment the face from the background image.
In some embodiments, in said step S3:
the pre-training of the feature extraction model comprises: training the feature extraction model by utilizing an open source data set, so that the feature extraction model has the capability of extracting all feature points of a complete human face;
the feature points of the human face determined by the feature extraction model include: and extracting the feature data of the lower half part of the human face and predicting the feature data of the upper half part of the human face.
Specifically, the VGGFace2, BUPT-Balancedface, and VoxCeleb2 data sets together form an open-source data set of 30,000 face images, which contains a large number of occluded faces; fig. 2f shows example images from this open-source data set according to an embodiment of the present invention. A deep convolutional neural network is trained on the multi-view face image data in this data set to obtain the pre-trained model. For the input user video stream, after each frame passes through the correction and stitching described above, 68 2D face feature points are detected first, and then face segmentation is performed. In some embodiments, three-dimensional face landmark points for the training-set pictures are obtained with a face alignment algorithm, and the face segmentation result is obtained with a face segmentation algorithm.
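For illustration, assuming the 68 points follow the common iBUG layout (an assumption — the patent does not name the indexing convention), the split between lower-half points directly observed by the nose-pad cameras and upper-half points the model must predict can be sketched as:

```python
# iBUG 68-point layout (assumed): 0-16 jaw, 17-26 eyebrows, 27-35 nose,
# 36-47 eyes, 48-67 mouth.
JAW, BROWS, NOSE = range(0, 17), range(17, 27), range(27, 36)
EYES, MOUTH = range(36, 48), range(48, 68)

LOWER_HALF = sorted(set(JAW) | set(NOSE) | set(MOUTH))  # visible to the cameras
UPPER_HALF = sorted(set(BROWS) | set(EYES))             # must be predicted

def split_landmarks(points68):
    """Split a list of 68 (x, y) points into observed and predicted subsets."""
    assert len(points68) == 68
    observed = [points68[i] for i in LOWER_HALF]
    predicted = [points68[i] for i in UPPER_HALF]
    return observed, predicted
```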
In step S4, a pre-trained face reconstruction model is used to construct a complete image of the face by using the segmented face image and the feature points of the face as input.
In some embodiments, in the step S4, the training environment of the pre-trained face reconstruction model is: Windows 10 + CUDA 10.2 + cuDNN 7.6.5 + PyTorch 1.6 + PyTorch3D 0.2; the number of iterations is 300 and the batch size is 48, completing the pre-training of the face reconstruction model.
In some embodiments, in the step S4, during the pre-training of the face reconstruction model, the pre-training for coarse face reconstruction is measured by face key point loss, eye closing loss, photo-based loss, shape continuity loss, and contour regularization loss, and the pre-training for detail face reconstruction is measured by photo detail loss, soft symmetry loss, and detail regularization loss.
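The coarse-stage loss terms listed above are typically combined into one weighted training objective. The sketch below shows only that combination step; the per-term values, weights, and key names are placeholders, not figures from the patent:

```python
def total_coarse_loss(losses: dict, weights: dict) -> float:
    """Weighted sum of the coarse-reconstruction loss terms."""
    return sum(weights[name] * value for name, value in losses.items())

# Hypothetical per-term values and weights (illustrative only).
coarse_losses = {
    "key_point": 0.8,         # face key point loss
    "eye_closing": 0.1,       # eye closing loss
    "photo": 0.5,             # photo-based loss
    "shape_continuity": 0.3,  # shape continuity loss
    "contour_reg": 0.05,      # contour regularization loss
}
coarse_weights = {
    "key_point": 1.0,
    "eye_closing": 0.5,
    "photo": 2.0,
    "shape_continuity": 0.5,
    "contour_reg": 0.1,
}
print(total_coarse_loss(coarse_losses, coarse_weights))  # ≈ 2.005
```

The detail stage would follow the same pattern with its own three terms (photo detail, soft symmetry, detail regularization).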
Specifically, in the reconstruction stage, based on the segmented face image and the feature points of the face, the trained face reconstruction model outputs predicted parameters such as face details, shape, albedo, expression, pose, and illumination. These parameters are input into a FLAME model (a lightweight, expressive generic head model learned from more than 33,000 accurately aligned 3D scans; it combines a linear identity shape space, trained from head scans of 3,800 subjects, with an articulated neck, jaw, and eyeballs, pose-dependent corrective blendshapes, and additional global expression blendshapes) to obtain the vertex data of the face, and finally a three-dimensional face model file is obtained, completing the reconstruction of the face. As shown in fig. 2g, a deep convolutional neural network is trained on the multi-view face image data in the data set to obtain the trained model. The trained model outputs the predicted face details, shape, albedo, expression, pose, and illumination parameters; these are input into the FLAME model to obtain vertex data, triangular patches, and texture features, and the three-dimensional information is written into an obj file to obtain the three-dimensional face model file.
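The decoding step can be illustrated with a minimal linear blendshape model in the spirit of FLAME: vertices = template + shape basis · shape params + expression basis · expression params, followed by writing the result as a Wavefront .obj file. The tiny dimensions and random bases below are placeholders, not the real FLAME data (which has roughly 5,023 vertices and learned bases):

```python
import numpy as np

def decode_vertices(template, shape_basis, exp_basis, shape_params, exp_params):
    """Linear blendshape decoding: (V, 3) template plus weighted basis offsets."""
    offsets = shape_basis @ shape_params + exp_basis @ exp_params  # shape (V*3,)
    return template + offsets.reshape(-1, 3)

def write_obj(path, vertices, faces):
    """Write vertices and triangular faces in .obj format (1-based face indices)."""
    with open(path, "w") as fh:
        for v in vertices:
            fh.write(f"v {v[0]} {v[1]} {v[2]}\n")
        for f in faces:
            fh.write(f"f {f[0] + 1} {f[1] + 1} {f[2] + 1}\n")

rng = np.random.default_rng(0)
V = 4                                        # toy vertex count
template = rng.normal(size=(V, 3))
shape_basis = rng.normal(size=(V * 3, 2))    # 2 toy identity components
exp_basis = rng.normal(size=(V * 3, 2))      # 2 toy expression components
verts = decode_vertices(template, shape_basis, exp_basis,
                        np.array([0.1, -0.2]), np.array([0.05, 0.0]))
write_obj("toy_face.obj", verts, [(0, 1, 2), (1, 2, 3)])
print(verts.shape)  # (4, 3)
```

The real pipeline additionally applies FLAME's articulated pose (neck, jaw, eyeballs) and corrective blendshapes before the vertices are written out.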
Specific examples
FIG. 3a is a flowchart of a specific example one according to an embodiment of the present invention; FIG. 3b is a flowchart of a second specific example according to an embodiment of the present invention; fig. 3c is a flowchart of specific example three according to the embodiment of the present invention.
Specifically, in example two:
starting up and opening a camera and a reconstruction system switch;
the angle of view of the fisheye camera is 180°, and the focal length is 15 mm;
carrying out fisheye camera distortion correction on the obtained face picture;
the splicing uses panoramic splicing;
the three-dimensional face reconstruction code training phase operating environment based on deep learning is as follows:
Windows 10 + CUDA 10.2 + cuDNN 7.6.5 + PyTorch 1.6 + PyTorch3D 0.2; 30,000 face images are used for pre-training, with 300 iterations and a batch size of 48;
the trained model is then applied to model inference for face reconstruction; the operating environment of the inference model is Windows 10 + CUDA 10.2 + cuDNN 7.6.5 + PyTorch 1.6 + PyTorch3D 0.2;
three-dimensional face driving data is obtained, and the data can be used in a teleconference process.
Specifically, in example three:
starting up and opening a camera and a reconstruction system switch;
the angle of view of the fisheye camera is 180°;
carrying out fisheye camera distortion correction on the obtained face picture;
the splicing uses panoramic splicing;
the three-dimensional face reconstruction code training phase operating environment based on deep learning is as follows:
Windows 10 + CUDA 10.2 + cuDNN 7.6.5 + PyTorch 1.6 + PyTorch3D 0.2; 30,000 face images are used for pre-training, with 300 iterations and a batch size of 48;
the trained model is then applied to model inference for face reconstruction; the operating environment of the inference model is Windows 10 + CUDA 10.2 + cuDNN 7.6.5 + PyTorch 1.6 + PyTorch3D 0.2;
a three-dimensional face model of the actor is obtained.
A second aspect of the invention discloses a close-range three-dimensional face reconstruction apparatus. FIG. 4 is a block diagram of a close-range three-dimensional face reconstruction apparatus according to an embodiment of the present invention; as shown in fig. 4, the apparatus 400 includes:
a first processing unit 401, configured to invoke an image acquisition device to simultaneously acquire a first image and a second image of a human face, where the first image is an image of a left lower half of the human face, and the second image is an image of a right lower half of the human face;
a second processing unit 402, configured to perform preprocessing on the first image and the second image, where the preprocessing includes rectification processing and stitching processing to obtain a lower half fusion image of the face;
a third processing unit 403, configured to invoke a pre-trained feature extraction model to determine feature points of the face from the lower half fused image, so as to segment the face from the background image;
a fourth processing unit 404, configured to invoke a pre-trained face reconstruction model, and construct a complete image of the face by using the segmented face image and the feature points of the face as input.
According to the device of the second aspect of the present invention, the image capturing device includes a first fisheye image collector for capturing the first image and a second fisheye image collector for capturing the second image. The image capturing device is mounted on the wearable device so that, when the wearable device is worn, the image capturing device is located at any one of the under-eye area, temple, eye corner, or nose pad of the human face, with the first fisheye image collector and the second fisheye image collector located on the left and right sides of that position, respectively.
According to the apparatus of the second aspect of the present invention, the second processing unit 402 is specifically configured such that the correction processing performed on the first image and the second image includes calculating a corrected distorted projection point P₀, using a camera model, from an actual position point P in the real world and an imaging position point P′ where distortion is generated during imaging. The camera model is:

r_d = f·θ_d,  θ_d = θ(1 + k₁θ² + k₂θ⁴ + k₃θ⁶ + k₄θ⁸)   (1)

where r_d = |OP′| denotes the distance from the imaging position point P′ to the image-plane center O, f is the focal length, θ is the incident angle of point P, θ_d is the exit angle at the distorted imaging position point P′, and k₁, k₂, k₃, k₄ are the distortion parameters.
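The camera model of equation (1) can be evaluated numerically as below; the function name and the distortion values are illustrative assumptions, shown only to make the equation concrete (with all kᵢ = 0 it reduces to the ideal equidistant projection r = f·θ):

```python
import math

def fisheye_radius(theta, f, k1=0.0, k2=0.0, k3=0.0, k4=0.0):
    """Evaluate equation (1): theta_d = theta*(1 + k1*theta^2 + ...), r_d = f*theta_d."""
    theta_d = theta * (1 + k1 * theta**2 + k2 * theta**4
                         + k3 * theta**6 + k4 * theta**8)
    return f * theta_d

f = 15.0                    # focal length in mm, as in the example configuration
theta = math.radians(45)    # incident angle of point P
print(fisheye_radius(theta, f))            # ideal equidistant value f*theta
print(fisheye_radius(theta, f, k1=-0.01))  # negative k1 pulls the point inward
```

Correction inverts this mapping: given the measured radius r_d, the incident angle θ is recovered (e.g. by iteration) and the point is reprojected without distortion.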
According to the apparatus of the second aspect of the present invention, the second processing unit 402 is specifically configured to, the splicing process includes:
extracting features in the calibrated first image and the calibrated second image, and carrying out corner point detection on the features to establish a geometric corresponding relation of the calibrated first image and the calibrated second image in the same coordinate system;
based on the geometric correspondence, deleting regions outside an overlapping region of the calibrated first image and the calibrated second image using homography matrix estimation to fuse the calibrated first image and the calibrated second image based on the overlapping region.
According to the apparatus of the second aspect of the present invention, the third processing unit 403 is specifically configured to:
the pre-training of the feature extraction model comprises: training the feature extraction model by utilizing an open source data set, so that the feature extraction model has the capability of extracting all feature points of a complete human face;
the feature points of the human face determined by the feature extraction model include: extracting the feature data of the lower half of the human face, and predicting the feature data of the upper half of the human face.
According to the apparatus of the second aspect of the present invention, the training environment of the pre-trained face reconstruction model is: Windows 10 + CUDA 10.2 + cuDNN 7.6.5 + PyTorch 1.6 + PyTorch3D 0.2; the number of iterations is 300 and the batch size is 48, completing the pre-training of the face reconstruction model.
According to the device of the second aspect of the present invention, during the pre-training of the face reconstruction model, the pre-training for coarse face reconstruction uses face key point loss, eye closing loss, photo-based loss, shape continuity loss, and contour regularization loss as the measurement basis, and the pre-training for detail face reconstruction uses photo detail loss, soft symmetry loss, and detail regularization loss as the measurement basis.
A third aspect of the invention discloses an electronic device. The electronic device comprises a memory and a processor, the memory stores a computer program, and the processor implements the steps of the method for reconstructing a close-range three-dimensional face according to any one of the first aspect of the present disclosure when executing the computer program.
Fig. 5 is a block diagram of an electronic device according to an embodiment of the present invention, and as shown in fig. 5, the electronic device includes a processor, a memory, a communication interface, a display screen, and an input device, which are connected by a system bus. Wherein the processor of the electronic device is configured to provide computing and control capabilities. The memory of the electronic equipment comprises a nonvolatile storage medium and an internal memory. The non-volatile storage medium stores an operating system and a computer program. The internal memory provides an environment for the operation of an operating system and computer programs in the non-volatile storage medium. The communication interface of the electronic device is used for carrying out wired or wireless communication with an external terminal, and the wireless communication can be realized through WIFI, an operator network, Near Field Communication (NFC) or other technologies. The display screen of the electronic equipment can be a liquid crystal display screen or an electronic ink display screen, and the input device of the electronic equipment can be a touch layer covered on the display screen, a key, a track ball or a touch pad arranged on the shell of the electronic equipment, an external keyboard, a touch pad or a mouse and the like.
It will be understood by those skilled in the art that the structure shown in fig. 5 is only a partial block diagram related to the technical solution of the present disclosure and does not constitute a limitation on the electronic device to which the solution of the present application is applied; a specific electronic device may include more or fewer components than those shown in the drawings, combine some components, or have a different arrangement of components.
A fourth aspect of the invention discloses a computer-readable storage medium. The computer readable storage medium has stored thereon a computer program which, when executed by a processor, implements the steps of a method for reconstructing a close-range three-dimensional face according to any one of the first aspect of the present disclosure.
To sum up: the invention obtains the lower half of the face through two fisheye cameras near the nose pads, then analyzes the lower-half facial expression with deep learning to restore the complete face; reconstruction of the complete face image requires only the lower-half image of the face, with no need to acquire face information from multiple viewing angles; the deep-learning reconstruction preserves the expression details of the face, so the emotion of the face can be accurately expressed; and the whole reconstruction requires no additional sensor hardware, avoids the loose-contact problem of strain gauges, and the equipment is simple, light, and easy to implement.
The technical scheme provided by the invention effectively reduces the dependence of the system on sensors: a user need only wear VR glasses and the three-dimensional face reconstruction device described herein to realize action acquisition, three-dimensional face modeling, and data synchronization of behavior content. The device can be connected to AR/VR glasses of different brands or models through a communication interface, giving it high portability; meanwhile, it does not restrict the application scenarios of existing commercial hardware, eliminates the occlusion caused by external image acquisition equipment when facing an open space, offers strong stability, and is attractive and portable.
It should be noted that the technical features of the above embodiments can be combined arbitrarily. For brevity, not all possible combinations of these technical features are described; however, as long as a combination of technical features involves no contradiction, it should be considered to be within the scope of the present description. The above-mentioned embodiments express only several implementations of the present application, and although their description is specific and detailed, they are not to be construed as limiting the scope of the invention. For a person skilled in the art, several variations and modifications can be made without departing from the concept of the present application, and these fall within the protection scope of the present application. Therefore, the protection scope of this patent shall be subject to the appended claims.
Claims (10)
1. A close-range three-dimensional face reconstruction method is characterized by comprising the following steps:
step S1, simultaneously acquiring a first image and a second image of a human face by using an image acquisition device, wherein the first image is an image of the lower half of the left side of the human face, and the second image is an image of the lower half of the right side of the human face;
step S2, preprocessing the first image and the second image, wherein the preprocessing comprises correction processing and splicing processing to obtain a lower half fusion image of the face;
step S3, determining feature points of the human face from the lower half fusion image by using a pre-trained feature extraction model so as to segment the human face from a background image;
and step S4, constructing a complete image of the face by using the pre-trained face reconstruction model and taking the segmented face image and the feature points of the face as input.
2. The close-range three-dimensional human face reconstruction method according to claim 1, wherein the image acquisition device comprises a first fisheye image collector for collecting the first image and a second fisheye image collector for collecting the second image, the image acquisition device is mounted on a wearable device so that, when the wearable device is worn, the image acquisition device is located at any one of the under-eye area, temple, eye corner, or nose pad of the human face, and the first fisheye image collector and the second fisheye image collector are respectively located on the left and right sides of that position.
3. The method according to claim 2, wherein the correction processing performed on the first image and the second image in step S2 includes calculating a corrected distorted projection point P₀, using a camera model, from an actual position point P in the real world and an imaging position point P′ where distortion occurs during imaging, the camera model being:

r_d = f·θ_d,  θ_d = θ(1 + k₁θ² + k₂θ⁴ + k₃θ⁶ + k₄θ⁸)   (1)

where r_d = |OP′| denotes the distance from the imaging position point P′ to the image-plane center O, f is the focal length, θ is the incident angle of point P, θ_d is the exit angle at the distorted imaging position point P′, and k₁, k₂, k₃, k₄ are the distortion parameters.
4. The close-range three-dimensional face reconstruction method according to claim 3, wherein in the step S2, the stitching process comprises:
extracting features in the calibrated first image and the calibrated second image, and carrying out corner point detection on the features to establish a geometric corresponding relation of the calibrated first image and the calibrated second image in the same coordinate system;
based on the geometric correspondence, deleting regions outside an overlapping region of the calibrated first image and the calibrated second image using homography matrix estimation to fuse the calibrated first image and the calibrated second image based on the overlapping region.
5. The close-range three-dimensional face reconstruction method according to claim 4, wherein in the step S3:
the pre-training of the feature extraction model comprises: training the feature extraction model by utilizing an open source data set, so that the feature extraction model has the capability of extracting all feature points of a complete human face;
the feature points of the human face determined by the feature extraction model include: extracting the feature data of the lower half of the human face, and predicting the feature data of the upper half of the human face.
6. The method for reconstructing a close-range three-dimensional face as claimed in claim 5, wherein in said step S4, the training environment of the pre-trained face reconstruction model is: Windows 10 + CUDA 10.2 + cuDNN 7.6.5 + PyTorch 1.6 + PyTorch3D 0.2; the number of iterations is 300 and the batch size is 48, completing the pre-training of the face reconstruction model.
7. The method according to claim 6, wherein in the step S4, during the pre-training of the face reconstruction model, the pre-training for coarse face reconstruction is measured by face key point loss, eye closing loss, photo-based loss, shape continuity loss and contour regularization loss, and the pre-training for detail face reconstruction is measured by photo detail loss, soft symmetry loss and detail regularization loss.
8. A close-range three-dimensional face reconstruction apparatus, the apparatus comprising:
the image acquisition device comprises a first processing unit, a second processing unit and a display unit, wherein the first processing unit is configured to call the image acquisition device to simultaneously acquire a first image and a second image of a human face, the first image is an image of the lower half of the left side of the human face, and the second image is an image of the lower half of the right side of the human face;
the second processing unit is configured to perform preprocessing on the first image and the second image, wherein the preprocessing comprises rectification processing and splicing processing so as to obtain a lower half fusion image of the face;
a third processing unit, configured to invoke a pre-trained feature extraction model to determine feature points of the face from the lower half fused image, so as to segment the face from the background image;
and the fourth processing unit is configured to call a pre-trained face reconstruction model, and construct a complete image of the face by taking the segmented face image and the feature points of the face as input.
9. An electronic device, comprising a memory storing a computer program and a processor, wherein the processor implements the steps of the method for reconstructing a three-dimensional face from a close-range view according to any one of claims 1 to 7 when executing the computer program.
10. A computer-readable storage medium, having stored thereon a computer program which, when executed by a processor, implements the steps of a method for three-dimensional face reconstruction from close range according to any one of claims 1 to 7.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202111441281.0A CN114387388A (en) | 2021-11-30 | 2021-11-30 | Close-range three-dimensional face reconstruction device |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202111441281.0A CN114387388A (en) | 2021-11-30 | 2021-11-30 | Close-range three-dimensional face reconstruction device |
Publications (1)
Publication Number | Publication Date |
---|---|
CN114387388A true CN114387388A (en) | 2022-04-22 |
Family
ID=81196667
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202111441281.0A Pending CN114387388A (en) | 2021-11-30 | 2021-11-30 | Close-range three-dimensional face reconstruction device |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN114387388A (en) |
- 2021-11-30 CN CN202111441281.0A patent/CN114387388A/en active Pending
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US10679046B1 (en) | Machine learning systems and methods of estimating body shape from images | |
JP6074494B2 (en) | Shape recognition device, shape recognition program, and shape recognition method | |
CN110807364B (en) | Modeling and capturing method and system for three-dimensional face and eyeball motion | |
US20140254939A1 (en) | Apparatus and method for outputting information on facial expression | |
CN111932678B (en) | Multi-view real-time human motion, gesture, expression and texture reconstruction system | |
CN111710036B (en) | Method, device, equipment and storage medium for constructing three-dimensional face model | |
KR20170008638A (en) | Three dimensional content producing apparatus and three dimensional content producing method thereof | |
WO2019164498A1 (en) | Methods, devices and computer program products for global bundle adjustment of 3d images | |
WO2023109753A1 (en) | Animation generation method and apparatus for virtual character, and storage medium and terminal | |
CN108090463B (en) | Object control method, device, storage medium and computer equipment | |
JP2008102902A (en) | Visual line direction estimation device, visual line direction estimation method, and program for making computer execute visual line direction estimation method | |
WO2024007478A1 (en) | Three-dimensional human body modeling data collection and reconstruction method and system based on single mobile phone | |
WO2021052208A1 (en) | Auxiliary photographing device for movement disorder disease analysis, control method and apparatus | |
US11928778B2 (en) | Method for human body model reconstruction and reconstruction system | |
CN111553284A (en) | Face image processing method and device, computer equipment and storage medium | |
WO2022174594A1 (en) | Multi-camera-based bare hand tracking and display method and system, and apparatus | |
CN112348937A (en) | Face image processing method and electronic equipment | |
US11120624B2 (en) | Three-dimensional head portrait generating method and electronic device | |
CN108549484B (en) | Man-machine interaction method and device based on human body dynamic posture | |
CN115482359A (en) | Method for measuring size of object, electronic device and medium thereof | |
CN110675413B (en) | Three-dimensional face model construction method and device, computer equipment and storage medium | |
CN114387388A (en) | Close-range three-dimensional face reconstruction device | |
WO2022180575A1 (en) | Three-dimensional (3d) human modeling under specific body-fitting of clothes | |
CN107478227B (en) | Interactive large space positioning algorithm | |
CN115410242A (en) | Sight estimation method and device |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||