CN117974879A - Face three-dimensional reconstruction method, device, equipment, readable storage medium and vehicle

Info

Publication number
CN117974879A
Authority
CN
China
Prior art keywords
face
image
pixel point
original
pixel
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202211320289.6A
Other languages
Chinese (zh)
Inventor
韦涛
马皓
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Co Wheels Technology Co Ltd
Original Assignee
Beijing Co Wheels Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Co Wheels Technology Co Ltd filed Critical Beijing Co Wheels Technology Co Ltd
Priority to CN202211320289.6A
Publication of CN117974879A
Legal status: Pending

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 17/00 - Three dimensional [3D] modelling, e.g. data description of 3D objects
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 7/00 - Image analysis
    • G06T 7/80 - Analysis of captured images to determine intrinsic or extrinsic camera parameters, i.e. camera calibration
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 - Arrangements for image or video recognition or understanding
    • G06V 10/70 - Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V 10/74 - Image or video pattern matching; Proximity measures in feature spaces
    • G06V 10/75 - Organisation of the matching processes, e.g. simultaneous or sequential comparisons of image or video features; Coarse-fine approaches, e.g. multi-scale approaches; using context analysis; Selection of dictionaries
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 2207/00 - Indexing scheme for image analysis or image enhancement
    • G06T 2207/30 - Subject of image; Context of image processing
    • G06T 2207/30196 - Human being; Person
    • G06T 2207/30201 - Face

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • General Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • Artificial Intelligence (AREA)
  • Computing Systems (AREA)
  • Databases & Information Systems (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Medical Informatics (AREA)
  • Health & Medical Sciences (AREA)
  • Multimedia (AREA)
  • Computer Graphics (AREA)
  • Geometry (AREA)
  • Image Processing (AREA)
  • Image Analysis (AREA)

Abstract

The present disclosure relates to a face three-dimensional reconstruction method, apparatus, device, readable storage medium and vehicle. The method comprises: obtaining a first face image and a second face image from a first original image and a second original image according to a preset face detection model, wherein the first face image and the second face image correspond to the same target face; processing the first face image and the second face image according to a stereo matching algorithm to obtain a face disparity map; obtaining, based on the face disparity map, the three-dimensional coordinates of a plurality of pixel points in the face region corresponding to the first face image and the second face image; and performing three-dimensional reconstruction of the target face according to the three-dimensional coordinates of the plurality of pixel points in the face region. By introducing a face detection model, the matching range of the three-dimensional reconstruction is narrowed from the full image to the face region, which effectively improves both the reconstruction accuracy and the modeling speed in this scene.

Description

Face three-dimensional reconstruction method, device, equipment, readable storage medium and vehicle
Technical Field
The disclosure relates to the field of computer technology, and in particular relates to a face three-dimensional reconstruction method, device, equipment, readable storage medium and vehicle.
Background
In the in-vehicle space, because of the in-vehicle lighting and the appearance of the in-vehicle environment, face images of a user captured by a binocular camera system contain a large number of regions with weak illumination and similar textures. These regions severely interfere with the matching of the left and right images, causing pixel points to be matched incorrectly or not at all, so the accuracy of three-dimensional face reconstruction in the in-vehicle image acquisition scene is low.
Disclosure of Invention
To solve the above technical problem, the present disclosure provides a face three-dimensional reconstruction method, apparatus, device, readable storage medium and vehicle, so as to improve the accuracy of three-dimensional face modeling.
In a first aspect, an embodiment of the present disclosure provides a face three-dimensional reconstruction method, including:
acquiring a first face image and a second face image from a first original image and a second original image according to a preset face detection model, wherein the first face image and the second face image correspond to the same target face;
processing the first face image and the second face image according to a stereo matching algorithm to obtain a face disparity map;
acquiring, based on the face disparity map, the three-dimensional coordinates of a plurality of pixel points in the face region corresponding to the first face image and the second face image; and
performing three-dimensional face reconstruction of the target face according to the three-dimensional coordinates of the plurality of pixel points in the face region.
In some embodiments, processing the first face image and the second face image according to the stereo matching algorithm to obtain the face disparity map includes:
calculating an initial matching cost matrix between each pixel point in the first face image and each pixel point in the second face image;
performing cross-domain cost aggregation on the initial matching cost matrix of each pixel point in the first face image to obtain an aggregated cost matrix of each pixel point in the first face image;
generating an original disparity map based on the aggregated cost matrix of each pixel point; and
obtaining the face disparity map from the original disparity map.
In some embodiments, calculating the initial matching cost matrix between each pixel point in the first face image and each pixel point in the second face image includes:
calculating the color value difference between each pixel point in the first face image and each pixel point in the second face image;
calculating the Hamming distance between each pixel point in the first face image and each pixel point in the second face image; and
obtaining the initial matching cost matrix between each pixel point in the first face image and each pixel point in the second face image based on the color value difference and the Hamming distance.
In some embodiments, performing cross-domain cost aggregation on the initial matching cost matrix of each pixel point in the first face image to obtain the aggregated cost matrix of each pixel point in the first face image includes:
constructing a cross-shaped domain for each pixel point in the first face image; and
obtaining the aggregated cost matrix of each pixel point in the first face image according to the initial matching costs of the pixel points in the cross-shaped domain.
In some embodiments, generating the original disparity map based on the aggregated cost matrix of each pixel point includes:
determining, based on the aggregated cost matrix of each pixel point, the disparity value corresponding to the minimum aggregated cost as the target disparity value of that pixel point; and
generating the original disparity map according to the target disparity value of each pixel point.
In some embodiments, acquiring the first face image and the second face image from the first original image and the second original image according to the preset face detection model includes:
acquiring a first original image and a second original image, wherein the first original image and the second original image correspond to the same target face;
correcting the first original image and the second original image to obtain a corrected first original image and a corrected second original image; and
inputting the corrected first original image and the corrected second original image into the preset face detection model to obtain the first face image and the second face image.
In a second aspect, embodiments of the present disclosure provide a face three-dimensional reconstruction apparatus, including:
an acquisition module, configured to acquire a first face image and a second face image from a first original image and a second original image according to a preset face detection model, wherein the first face image and the second face image correspond to the same target face;
a processing module, configured to process the first face image and the second face image according to a stereo matching algorithm to obtain a face disparity map;
a calculation module, configured to acquire, based on the face disparity map, the three-dimensional coordinates of a plurality of pixel points in the face region corresponding to the first face image and the second face image; and
a reconstruction module, configured to perform three-dimensional face reconstruction of the target face according to the three-dimensional coordinates of the plurality of pixel points in the face region.
In a third aspect, an embodiment of the present disclosure provides an electronic device, including:
a memory;
a processor; and
a computer program;
wherein the computer program is stored in the memory and configured to be executed by the processor to implement the method according to the first aspect.
In a fourth aspect, embodiments of the present disclosure provide a computer-readable storage medium having stored thereon a computer program for execution by a processor to implement the method of the first aspect.
In a fifth aspect, the embodiments of the present disclosure further provide a vehicle including the face three-dimensional reconstruction apparatus described above; or the electronic device described above; or the computer-readable storage medium described above.
According to the face three-dimensional reconstruction method, apparatus, device, readable storage medium and vehicle provided by the embodiments of the present disclosure, introducing a face detection model narrows the matching range of the three-dimensional reconstruction from the full image to the face region. This eliminates the pixel mismatches and matching failures caused by the large number of weakly illuminated and similarly textured regions in the original images, avoids invalid computation outside the face region, and thus effectively improves both the reconstruction accuracy and the modeling speed in this scene.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the disclosure and together with the description, serve to explain the principles of the disclosure.
To illustrate the embodiments of the present disclosure or the prior-art solutions more clearly, the drawings required for describing the embodiments or the prior art are briefly introduced below. It will be obvious to those skilled in the art that other drawings can be derived from these drawings without inventive effort.
Fig. 1 is a flowchart of a face three-dimensional reconstruction method provided in an embodiment of the present disclosure;
Fig. 2 is a flowchart of a face three-dimensional reconstruction method according to another embodiment of the present disclosure;
Fig. 3 is a flowchart of a face three-dimensional reconstruction method according to another embodiment of the present disclosure;
Fig. 4 is a schematic view of a corrected first original image and a corrected second original image provided by an embodiment of the present disclosure;
Fig. 5 is a schematic structural diagram of a face three-dimensional reconstruction apparatus according to an embodiment of the present disclosure;
Fig. 6 is a schematic structural diagram of an electronic device according to an embodiment of the present disclosure.
Detailed Description
In order that the above objects, features and advantages of the present disclosure may be more clearly understood, a further description of aspects of the present disclosure will be provided below. It should be noted that, without conflict, the embodiments of the present disclosure and features in the embodiments may be combined with each other.
In the following description, numerous specific details are set forth in order to provide a thorough understanding of the present disclosure, but the present disclosure may be practiced otherwise than as described herein; it will be apparent that the embodiments in the specification are only some, but not all, embodiments of the disclosure.
The embodiment of the disclosure provides a face three-dimensional reconstruction method, and the method is described below with reference to specific embodiments.
Fig. 1 is a flowchart of a face three-dimensional reconstruction method according to an embodiment of the present disclosure. The method can be applied to any terminal device with a data-processing function, where the terminal device may be a vehicle head unit, a smartphone, a handheld computer, a tablet computer, a laptop computer, an all-in-one machine, an intelligent driving device, or the like. It can be appreciated that the face three-dimensional reconstruction method provided by the embodiments of the present disclosure may also be applied to other scenes.
The following describes a face three-dimensional reconstruction method shown in fig. 1, which includes the following specific steps:
s101, acquiring a face first image and a face second image according to a preset face detection model based on a first original image and a second original image, wherein the face first image and the face second image correspond to the same target face.
The first original image and the second original image may be obtained by image capturing of the target face from different angles by the image capturing device. The first original image and the second original image can be acquired simultaneously; or the first original image is acquired firstly, and then the second original image is acquired; or the second original image is acquired first and then the first original image is acquired, which is not limited in the embodiment of the present disclosure. Illustratively, the camera acquisition device comprises a binocular camera system.
A binocular camera system is constructed, and images of the same face are captured from different angles to obtain the first original image and the second original image. Before image acquisition, the binocular camera generally needs to be calibrated, that is, the intrinsic parameters of each camera and the relative parameters between the left and right cameras are obtained through a camera calibration method. This step is usually carried out with Zhang's calibration method: calibration data are collected with a calibration board, and once the data are available, calibration can be performed directly through the binocular calibration interfaces provided by commercial mathematical software (MATLAB) or the open-source computer vision library (OpenCV). Zhang's calibration method is a practical camera calibration technique based on a planar checkerboard, proposed by Dr. Zhengyou Zhang in the paper "Flexible Camera Calibration By Viewing a Plane From Unknown Orientations" published at the top international conference ICCV in 1999. The method sits between photogrammetric calibration and self-calibration: it avoids the high-precision three-dimensional calibration object required by photogrammetric calibration, while also overcoming the poor robustness of self-calibration.
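For illustration only, the following sketch shows how such a calibration could be run with OpenCV's checkerboard and stereo-calibration APIs. The checkerboard geometry, square size and file paths are assumptions, not values from this disclosure:

```python
# Minimal binocular calibration sketch (Zhang's method via OpenCV).
# PATTERN, SQUARE and the file paths are illustrative assumptions.
import glob
import cv2
import numpy as np

PATTERN = (9, 6)   # inner-corner grid of the checkerboard (assumed)
SQUARE = 0.025     # checkerboard square size in metres (assumed)

# 3-D coordinates of the board corners in the board's own plane (Z = 0).
objp = np.zeros((PATTERN[0] * PATTERN[1], 3), np.float32)
objp[:, :2] = np.mgrid[0:PATTERN[0], 0:PATTERN[1]].T.reshape(-1, 2) * SQUARE

obj_pts, left_pts, right_pts = [], [], []
pairs = zip(sorted(glob.glob("calib/left_*.png")),   # hypothetical paths
            sorted(glob.glob("calib/right_*.png")))
for lf, rf in pairs:
    gl = cv2.imread(lf, cv2.IMREAD_GRAYSCALE)
    gr = cv2.imread(rf, cv2.IMREAD_GRAYSCALE)
    ok_l, cl = cv2.findChessboardCorners(gl, PATTERN)
    ok_r, cr = cv2.findChessboardCorners(gr, PATTERN)
    if ok_l and ok_r:
        obj_pts.append(objp)
        left_pts.append(cl)
        right_pts.append(cr)

image_size = gl.shape[::-1]  # (width, height) of the last processed pair
# Intrinsics of each camera first, then the left-to-right rotation R and
# translation T between the two cameras.
_, K1, d1, _, _ = cv2.calibrateCamera(obj_pts, left_pts, image_size, None, None)
_, K2, d2, _, _ = cv2.calibrateCamera(obj_pts, right_pts, image_size, None, None)
_, K1, d1, K2, d2, R, T, E, F = cv2.stereoCalibrate(
    obj_pts, left_pts, right_pts, K1, d1, K2, d2, image_size,
    flags=cv2.CALIB_FIX_INTRINSIC)
```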
The face detection model may be a neural network model trained with a deep learning method. Specifically, an original image training set is prepared in advance, and each image in it is annotated with the face region to obtain an annotated image training set; each image in the original training set serves as the model input and the corresponding annotated image as the expected output, and the face detection model is trained on these pairs. Illustratively, the face detection model may be implemented with any suitable neural network, such as a conditional generative adversarial network (CGAN, Conditional Generative Adversarial Nets) or a pix2pix network.
The first original image is input into the preset face detection model, which identifies the face region in the first original image and yields the first face image; the first face image contains only the face region of the first original image. Likewise, the second original image is input into the preset face detection model, which identifies the face region in the second original image and yields the second face image; the second face image contains only the face region of the second original image.
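A minimal sketch of this cropping step is given below, with OpenCV's stock Haar-cascade detector standing in for the trained detection model described above (an assumption made purely for illustration). In practice the two crops should share the same row range so that corresponding points stay on the same rows after rectification:

```python
# Face cropping sketch; the Haar cascade is a stand-in for the patent's
# trained detection model, not the model itself.
import cv2

detector = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

def crop_face(gray):
    """Return the largest detected face region, or None if none is found."""
    boxes = detector.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
    if len(boxes) == 0:
        return None
    x, y, w, h = max(boxes, key=lambda b: b[2] * b[3])
    return gray[y:y + h, x:x + w]

face_first = crop_face(first_original)    # first_original / second_original:
face_second = crop_face(second_original)  # grayscale inputs (assumed loaded)
```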
S102: process the first face image and the second face image according to a stereo matching algorithm to obtain a face disparity map.
Disparity refers to the horizontal position deviation, between the left and right images, of the same spatial point in the real environment. Stereo matching is performed on the first face image and the second face image to determine, for a pixel point in the first face image, its corresponding point in the second face image; the two correspond to the same spatial point in the real environment, i.e. the same point of the face region, and the horizontal position deviation between them, i.e. the disparity, is computed. For example, if the first face image is used as the reference image, the face disparity map can be obtained from the disparity of each pixel point in the first face image; likewise, the second face image may be used as the reference image, and the face disparity map obtained from the disparity of each pixel point in the second face image.
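For orientation only, the short sketch below computes such a disparity map with OpenCV's off-the-shelf semi-global matcher; the disclosure's own matcher (steps S202 to S205 in the next embodiment) is sketched further below. The disparity range and block size are assumptions:

```python
# Off-the-shelf disparity for comparison; not the matcher of this disclosure.
import cv2

matcher = cv2.StereoSGBM_create(minDisparity=0, numDisparities=64, blockSize=5)
# face_first / face_second: rectified grayscale crops from S101 (assumed).
# compute() returns fixed-point disparities scaled by 16, hence the division.
face_disparity = matcher.compute(face_first, face_second).astype("float32") / 16.0
```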
S103: based on the face disparity map, acquire the three-dimensional coordinates of a plurality of pixel points in the face region corresponding to the first face image and the second face image.
S104: perform three-dimensional face reconstruction of the target face according to the three-dimensional coordinates of the plurality of pixel points in the face region.
According to the face disparity map, combined with the similar-triangle relation and the parameters of the binocular camera system, the distance between each position of the face region in the real environment and the binocular camera system, i.e. the depth value of each pixel point in the face region image, can be calculated. Specifically, the calculation follows:
d = b · f / x_{r-l}
where d is the depth value; b is the baseline of the binocular camera system, i.e. the horizontal offset between the optical centers of the left and right cameras; f is the focal length of the cameras; and x_{r-l} is the disparity.
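The formula translates directly into code. In the sketch below the baseline and focal length are placeholder values; in practice they come from the calibration step:

```python
# Depth from disparity: d = b * f / x_{r-l}; invalid disparities map to inf.
import numpy as np

def disparity_to_depth(disparity, baseline_m, focal_px):
    disp = disparity.astype(np.float32)
    depth = np.full_like(disp, np.inf)
    valid = disp > 0
    depth[valid] = baseline_m * focal_px / disp[valid]
    return depth

# face_disparity: disparity map from the matching step (assumed available).
# Placeholder calibration values: 6 cm baseline, 800 px focal length.
depth_map = disparity_to_depth(face_disparity, baseline_m=0.06, focal_px=800.0)
```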
The relative positions of the points of the face region in the real environment can be obtained through the first face image and the second face image. Combined with the distance between each position of the face region and the binocular camera system, the three-dimensional coordinates of each position of the face region in the real environment can then be obtained. Finally, converting these three-dimensional coordinates into a point cloud that represents the three-dimensional structure of the face completes the modeling of the face region from two-dimensional images to a three-dimensional structure.
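A sketch of this back-projection under the standard pinhole-camera model is shown below, assuming the principal point (cx, cy) and focal lengths are known from calibration (the symbols are ours, not the disclosure's):

```python
# Back-project per-pixel depth to a 3-D point cloud (pinhole model).
import numpy as np

def depth_to_point_cloud(depth, fx, fy, cx, cy):
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    x = (u - cx) * depth / fx          # lateral coordinate
    y = (v - cy) * depth / fy          # vertical coordinate
    pts = np.stack([x, y, depth], axis=-1).reshape(-1, 3)
    return pts[np.isfinite(pts).all(axis=1)]   # drop invalid (inf) depths
```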
In the embodiments of the present disclosure, a first face image and a second face image are obtained from a first original image and a second original image according to a preset face detection model; the first face image and the second face image are processed according to a stereo matching algorithm to obtain a face disparity map; based on the face disparity map, the three-dimensional coordinates of a plurality of pixel points in the face region corresponding to the first face image and the second face image are obtained; and three-dimensional face reconstruction of the target face is performed according to those coordinates. By introducing the face detection model, the matching range of the three-dimensional reconstruction is narrowed from the full image to the face region, which eliminates the pixel mismatches and matching failures caused by the large number of weakly illuminated and similarly textured regions in the original images, avoids invalid computation outside the face region, and effectively improves both the reconstruction accuracy and the modeling speed in this scene.
For example, in an in-vehicle face reconstruction scene, a binocular camera system is deployed in the vehicle to capture images of the target face. Because the in-vehicle environment is poorly lit, the imaging quality of the binocular camera is low, so stereo matching of the left and right full images produces a large number of wrongly matched or even unmatched pixel points, and the accuracy of three-dimensional face modeling suffers. With the face three-dimensional reconstruction method provided by the embodiments of the present disclosure, the face detection model reduces the stereo matching range of the left and right images from the full image to the face region, no computation is wasted on regions that are not of interest, and both the reconstruction accuracy and the running speed in the in-vehicle scene are effectively improved.
Fig. 2 is a flowchart of a face three-dimensional reconstruction method according to another embodiment of the present disclosure. As shown in Fig. 2, the method comprises the following steps:
S201: acquire a first face image and a second face image from a first original image and a second original image according to a preset face detection model, where the first face image and the second face image correspond to the same target face.
S202: calculate an initial matching cost matrix between each pixel point in the first face image and each pixel point in the second face image.
Specifically, the color value difference between each pixel point in the first face image and each pixel point in the second face image is calculated; the Hamming distance between each pixel point in the first face image and each pixel point in the second face image is calculated; and the initial matching cost matrix between each pixel point in the first face image and each pixel point in the second face image is obtained based on the color value difference and the Hamming distance.
The matching cost measures the correlation between a candidate pixel and the pixel to be matched: the smaller the cost, the stronger the correlation between the two pixel points and the higher the probability that they are homonymous points. Homonymous points are corresponding pixels in the first face image and the second face image that map to the same point in the real environment, here the same position of the face region.
Taking the first face image as the reference image as an example: for each pixel point in the first face image, its color value (RGB value) is obtained, the color values of the pixel points in the second face image are obtained at the same time, and the color value difference between the pixel point in the first face image and each candidate pixel point in the second face image is calculated as the mean of the absolute differences of the red, green and blue components of the pair. Alternatively, if the first face image and the second face image are grayscale images, the luminance difference of each pair of pixel points is calculated.
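A minimal sketch of this absolute-difference (AD) color cost for one candidate disparity follows; the disparity convention (right pixel at x - d) is an assumption of the sketch:

```python
# AD colour cost: mean absolute R, G, B difference at candidate disparity d.
import numpy as np

def ad_cost(left_rgb, right_rgb, d):
    """Cost of matching left pixel (y, x) against right pixel (y, x - d)."""
    shifted = np.zeros_like(right_rgb)
    if d == 0:
        shifted[:] = right_rgb
    else:
        shifted[:, d:] = right_rgb[:, :-d]   # columns x < d have no match
    diff = np.abs(left_rgb.astype(np.float32) - shifted.astype(np.float32))
    return diff.mean(axis=2)                 # (h, w) cost map at disparity d
```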
Further, for each pixel point in the first face image, the gray level of every pixel in a neighborhood window centered on that pixel point is compared with the gray level of the center pixel, and the resulting Boolean values are mapped into a bit string, i.e. a binary string that encodes the neighborhood comparison results. The same operation is applied to each pixel point in the second face image, yielding the bit strings of the second face image. The Hamming distance is the number of bit positions in which two bit strings differ; it is computed by applying an XOR operation to the two bit strings and counting the number of 1 bits in the result. The first face image and the second face image are traversed, and the Hamming distance between each pixel point in the first face image and each pixel point in the second face image is calculated.
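This is the classic census transform plus Hamming distance; a compact sketch is given below, assuming a 5x5 window (the disclosure does not fix the window size):

```python
# Census transform (neighbourhood comparisons packed into a bit string) and
# Hamming distance (XOR, then popcount). Window size 5x5 is an assumption.
import numpy as np

def census(gray, win=5):
    r = win // 2
    h, w = gray.shape
    codes = np.zeros((h, w), dtype=np.uint32)   # 24 bits fit in a uint32
    centre = gray[r:h - r, r:w - r]
    for dy in range(-r, r + 1):
        for dx in range(-r, r + 1):
            if dy == 0 and dx == 0:
                continue
            neigh = gray[r + dy:h - r + dy, r + dx:w - r + dx]
            bit = (neigh < centre).astype(np.uint32)
            codes[r:h - r, r:w - r] = (codes[r:h - r, r:w - r] << 1) | bit
    return codes

def hamming(a, b):
    x = a ^ b
    count = np.zeros_like(x)
    while np.any(x):        # count set bits, one bit position per iteration
        count += x & 1
        x >>= 1
    return count
```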
Because the color value difference and the Hamming distance have different scales (the color value difference lies in [0, 255], while the Hamming distance lies in [0, N], where N is the number of bits in the bit string), the two must be normalized to the same interval. The embodiments of the present disclosure normalize both to [0, 1] through a normalization function and then add the two normalized results, giving the initial matching cost between each pixel point in the first face image and each pixel point in the second face image, with range [0, 2]. At the same time, the disparity between each pixel point in the first face image and each pixel point in the second face image is calculated and combined with the initial matching costs into a three-dimensional initial matching cost matrix, which represents the initial matching cost of each pixel point at every disparity within the disparity range.
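Building on the ad_cost, census and hamming helpers sketched above, the cost volume of S202 could be assembled as follows. The plain division used for normalization and the disparity range are assumptions, since the disclosure only speaks of "a normalization function":

```python
# Initial matching cost volume: normalised AD cost + normalised census cost,
# one (h, w) slice per candidate disparity. Range of each entry: [0, 2].
import numpy as np

N_BITS = 24      # bits per 5x5 census string (assumed window)
MAX_DISP = 64    # disparity search range (assumed)

def cost_volume(left_rgb, right_rgb, left_gray, right_gray):
    h, w = left_gray.shape
    cl, cr = census(left_gray), census(right_gray)
    volume = np.full((h, w, MAX_DISP), 2.0, dtype=np.float32)  # worst cost
    for d in range(MAX_DISP):
        ad = ad_cost(left_rgb, right_rgb, d) / 255.0           # -> [0, 1]
        shifted = np.zeros_like(cr)
        if d == 0:
            shifted[:] = cr
        else:
            shifted[:, d:] = cr[:, :-d]
        ham = hamming(cl, shifted).astype(np.float32) / N_BITS  # -> [0, 1]
        volume[:, :, d] = ad + ham
    return volume
```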
S203: perform cross-domain cost aggregation on the initial matching cost matrix of each pixel point in the first face image to obtain the aggregated cost matrix of each pixel point in the first face image.
Specifically, a cross-shaped domain is constructed for each pixel point in the first face image, and the aggregated cost matrix of each pixel point is obtained according to the initial matching cost matrices of the pixel points within its cross-shaped domain.
Each pixel has a cross of four arms, and the color (brightness) values of all pixels on an arm are similar to that of the pixel itself. Based on this principle, a cross-shaped domain is constructed for each pixel point in the first face image: starting from the pixel point, the arms extend leftward, rightward, upward and downward, and each arm stops extending when it reaches a pixel whose color (brightness) differs too much, finally yielding the cross-shaped domain centered on the pixel. Cost aggregation is then carried out over the initial matching cost matrices of the pixel points in the cross-shaped domain. In some embodiments, the cost sums along the horizontal arms of all pixels may be computed and stored first, and then accumulated along each pixel's vertical arm to obtain the aggregated cost matrix of the pixel point. The aggregated cost matrix has the same shape as the initial matching cost matrix and represents the aggregated cost between each pixel point and each candidate pixel point in the other image at every disparity.
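A compact sketch of this cross-based aggregation is shown below. The color threshold and the arm-length cap are assumptions (the disclosure fixes neither), and plain Python loops are used for clarity rather than speed:

```python
# Cross-based cost aggregation: horizontal arm sums first (via prefix sums),
# then accumulation along each pixel's vertical arm. TAU / MAX_ARM assumed.
import numpy as np

TAU, MAX_ARM = 20, 17

def arm_length(gray, dy, dx):
    """How far each pixel's arm extends in direction (dy, dx)."""
    h, w = gray.shape
    arms = np.zeros((h, w), dtype=np.int32)
    for y in range(h):
        for x in range(w):
            n = 0
            while n < MAX_ARM:
                yy, xx = y + (n + 1) * dy, x + (n + 1) * dx
                if not (0 <= yy < h and 0 <= xx < w):
                    break
                if abs(int(gray[yy, xx]) - int(gray[y, x])) > TAU:
                    break
                n += 1
            arms[y, x] = n
    return arms

def aggregate(volume, gray):
    left, right = arm_length(gray, 0, -1), arm_length(gray, 0, 1)
    up, down = arm_length(gray, -1, 0), arm_length(gray, 1, 0)
    h, w, _ = volume.shape
    csum = np.cumsum(volume, axis=1)          # row-wise prefix sums
    horiz = np.empty_like(volume)
    for y in range(h):
        for x in range(w):
            lo, hi = x - left[y, x], x + right[y, x]
            horiz[y, x] = csum[y, hi] - (csum[y, lo - 1] if lo > 0 else 0)
    csum = np.cumsum(horiz, axis=0)           # column-wise prefix sums
    out = np.empty_like(volume)
    for y in range(h):
        for x in range(w):
            lo, hi = y - up[y, x], y + down[y, x]
            out[y, x] = csum[hi, x] - (csum[lo - 1, x] if lo > 0 else 0)
    return out
```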
S204: generate an original disparity map based on the aggregated cost matrix of each pixel point.
Specifically, based on the aggregated cost matrix of each pixel point, the disparity value corresponding to the minimum aggregated cost is determined as the target disparity value of that pixel point, and the original disparity map is generated from the target disparity values of all pixel points.
The aggregated cost matrix of a pixel point contains the aggregated cost and disparity between that pixel point and each candidate pixel point in the other image. The smaller the aggregated cost, the stronger the correlation between the two pixel points and the higher the probability that they are homonymous points, so the disparity value corresponding to the minimum aggregated cost is also the most reliable. Therefore, for each pixel point, the disparity value at the minimum of its aggregated cost matrix can be taken as its target disparity value, and the disparity map of the image is obtained from the target disparity values of all its pixel points. For example, with the first face image as the reference image, taking for every pixel the disparity value at the minimum aggregated cost yields the original disparity map corresponding to the first face image.
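This winner-take-all selection is a one-liner over the aggregated cost volume from the previous sketch:

```python
# Winner-take-all: pick, per pixel, the disparity of minimum aggregated cost.
import numpy as np

def winner_take_all(agg_volume):
    return np.argmin(agg_volume, axis=2).astype(np.float32)
```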
S205: obtain the face disparity map from the original disparity map.
Optionally, outlier detection is performed on the original disparity map. Outliers are pixels with large errors whose disparity values are wrong, caused by mismatches between pixels of the first face image and the second face image. For example, outlier detection can be performed with the left-right consistency check (L-R Check), which is based on the uniqueness constraint of disparity: each pixel has at most one correct disparity. Concretely, the roles of the left and right images are swapped (the left image becomes the right and vice versa) and the above steps are executed again to obtain another disparity map. Through the disparity map of the left image, the homonymous pixel of each pixel in the right image and its disparity value are found; if the difference between the two disparity values is below a preset threshold, the uniqueness constraint is satisfied and the pixel is kept, otherwise it is rejected. For example, if the original disparity map corresponding to the first face image was obtained above with the first face image as reference, then in this step the original disparity map corresponding to the second face image is obtained with the second face image as reference, outlier detection is performed across the two original disparity maps, and the points that violate the uniqueness constraint are removed to obtain the face disparity map.
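A sketch of this left-right check follows; the 1-pixel tolerance is an assumption (the disclosure only says "a preset threshold"), and invalid pixels are marked with -1:

```python
# Left-right consistency check: keep a left disparity only if the right
# image's own disparity map agrees within `thresh` pixels.
import numpy as np

def lr_check(disp_left, disp_right, thresh=1.0):
    h, w = disp_left.shape
    out = disp_left.copy()
    xs = np.arange(w)
    for y in range(h):
        xr = np.clip((xs - disp_left[y]).astype(int), 0, w - 1)
        ok = (xs - disp_left[y] >= 0) & \
             (np.abs(disp_left[y] - disp_right[y, xr]) <= thresh)
        out[y, ~ok] = -1    # mark outliers as invalid
    return out
```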
Optionally, further optimization such as iterative local voting, disparity filling, adjustment of disparity-discontinuous regions and sub-pixel refinement may be applied to the original disparity map, which the embodiments of the present disclosure do not limit.
S206: based on the face disparity map, acquire the three-dimensional coordinates of a plurality of pixel points in the face region corresponding to the first face image and the second face image.
S207: perform three-dimensional face reconstruction of the target face according to the three-dimensional coordinates of the plurality of pixel points in the face region.
The implementation and principles of S206 to S207 are the same as those of S103 to S104 and are not repeated here.
In the embodiments of the present disclosure, a first face image and a second face image corresponding to the same target face are obtained from a first original image and a second original image according to a preset face detection model; an initial matching cost matrix between each pixel point in the first face image and each pixel point in the second face image is calculated; cross-domain cost aggregation is performed on the initial matching cost matrix of each pixel point in the first face image to obtain its aggregated cost matrix; an original disparity map is generated based on the aggregated cost matrices; the face disparity map is obtained from the original disparity map; the three-dimensional coordinates of a plurality of pixel points in the face region are obtained based on the face disparity map; and three-dimensional face reconstruction of the target face is performed according to those coordinates. Computing the matching cost of a pixel point from both its color difference and its Hamming distance, and further aggregating costs over the cross-shaped domain, takes both the pixel point and its neighborhood into account. Pixel points are therefore matched more reliably, discontinuities in the disparity map are avoided, the accuracy of the disparity map is improved, and a more accurate data basis is provided for the subsequent face depth calculation and three-dimensional reconstruction.
Fig. 3 is a flowchart of a face three-dimensional reconstruction method according to another embodiment of the present disclosure. As shown in Fig. 3, the method comprises the following steps:
S301: acquire a first original image and a second original image, where the first original image and the second original image correspond to the same target face.
The first original image and the second original image of the target face may be obtained by an image capture device photographing the target face from different angles. For example, the images may be acquired by a depth camera, where the depth camera comprises a binocular camera system.
S302: correct the first original image and the second original image to obtain a corrected first original image and a corrected second original image.
Specifically, the corresponding point in the second original image of each pixel point in the first original image is determined, and the first original image and the second original image are rectified in the horizontal direction so that the position of a pixel point in the corrected first original image and the position of its corresponding point in the corrected second original image lie at the same height.
The first original image and the second original image are rectified based on the camera intrinsic parameters and the relative parameters between the left and right cameras obtained from the binocular calibration, and the camera distortion is removed, yielding the corrected first original image and the corrected second original image in which a pixel point of the first original image and its corresponding point in the second original image lie at the same height. Here, a pixel point in the first original image and its corresponding point in the second original image correspond to the same point in the real environment.
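A sketch of this rectification with OpenCV is shown below, reusing the calibration outputs (K1, d1, K2, d2, R, T, image_size) from the calibration sketch in S101; as before, this is an illustration rather than the disclosure's prescribed implementation:

```python
# Stereo rectification: after remapping, corresponding points of the two
# images lie on the same row, so matching reduces to a horizontal search.
import cv2

R1, R2, P1, P2, Q, _, _ = cv2.stereoRectify(K1, d1, K2, d2, image_size, R, T)
map1x, map1y = cv2.initUndistortRectifyMap(K1, d1, R1, P1, image_size, cv2.CV_32FC1)
map2x, map2y = cv2.initUndistortRectifyMap(K2, d2, R2, P2, image_size, cv2.CV_32FC1)
rect_first = cv2.remap(first_original, map1x, map1y, cv2.INTER_LINEAR)
rect_second = cv2.remap(second_original, map2x, map2y, cv2.INTER_LINEAR)
```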
Fig. 4 is a schematic diagram of a corrected first original image and a corrected second original image provided by an embodiment of the present disclosure. As shown in Fig. 4, the corrected first original image and the corrected second original image are at the same imaging height in the horizontal direction.
S303: input the corrected first original image and the corrected second original image into a preset face detection model to obtain a first face image and a second face image.
The face detection model may be a neural network model trained with a deep learning method. Specifically, an original image training set is prepared in advance, each image in it is annotated with the face region to obtain an annotated image training set, and the face detection model is trained with the original images as inputs and the annotated images as expected outputs.
The corrected first original image is input into the preset face detection model, which identifies its face region and yields the first face image; the first face image contains only the face region of the corrected first original image. Likewise, the corrected second original image is input into the preset face detection model, which identifies its face region and yields the second face image; the second face image contains only the face region of the corrected second original image.
S304: process the first face image and the second face image according to a stereo matching algorithm to obtain a face disparity map.
Specifically, an initial matching cost matrix between each pixel point in the first face image and each pixel point in the second face image is calculated; cross-domain cost aggregation is performed on the initial matching cost matrix of each pixel point in the first face image to obtain its aggregated cost matrix; an original disparity map is generated based on the aggregated cost matrices; and the face disparity map is obtained from the original disparity map.
S305: based on the face disparity map, acquire the three-dimensional coordinates of a plurality of pixel points in the face region corresponding to the first face image and the second face image.
S306: perform three-dimensional face reconstruction of the target face according to the three-dimensional coordinates of the plurality of pixel points in the face region.
According to the face disparity map, combined with the similar-triangle relation and the parameters of the binocular camera system, the distance between each position of the face region in the real environment and the binocular camera system, i.e. the depth value of each pixel point in the face region image, can be calculated. As before, the calculation follows:
d = b · f / x_{r-l}
where d is the depth value; b is the baseline of the binocular camera system, i.e. the horizontal offset between the optical centers of the left and right cameras; f is the focal length of the cameras; and x_{r-l} is the disparity.
The relative positions of the points of the face region in the real environment can be obtained through the first face image and the second face image; combined with the distance between each position of the face region and the binocular camera system, the three-dimensional coordinates of each position of the face region can be obtained; and converting these coordinates into a point cloud that represents the three-dimensional structure of the face completes the modeling of the face region from two-dimensional images to a three-dimensional structure.
In the embodiments of the present disclosure, a first original image and a second original image corresponding to the same target face are acquired; the first original image and the second original image are corrected to obtain a corrected first original image and a corrected second original image; the corrected images are input into a preset face detection model to obtain a first face image and a second face image; the first face image and the second face image are processed according to a stereo matching algorithm to obtain a face disparity map; the three-dimensional coordinates of a plurality of pixel points in the face region are obtained based on the face disparity map; and three-dimensional face reconstruction of the target face is performed according to those coordinates. Because the first original image and the second original image captured by the binocular camera are rectified so that they sit at the same imaging height in the horizontal direction, the matching range of the two pictures is reduced from the full picture to the horizontal direction, which lowers the time complexity of the matching algorithm, reduces the amount of data to be processed, and further improves the efficiency of the face three-dimensional reconstruction method.
Fig. 5 is a schematic structural diagram of a face three-dimensional reconstruction apparatus according to an embodiment of the present disclosure. The face three-dimensional reconstruction apparatus may be the terminal device described in the above embodiments, or a part or assembly of that terminal device. The apparatus provided in the embodiments of the present disclosure can execute the processing flow provided in the method embodiments. As shown in Fig. 5, the face three-dimensional reconstruction apparatus 50 includes: an acquisition module 51, a processing module 52, a calculation module 53 and a reconstruction module 54. The acquisition module 51 is configured to acquire a first face image and a second face image from a first original image and a second original image according to a preset face detection model, where the first face image and the second face image correspond to the same target face; the processing module 52 is configured to process the first face image and the second face image according to a stereo matching algorithm to obtain a face disparity map; the calculation module 53 is configured to acquire, based on the face disparity map, the three-dimensional coordinates of a plurality of pixel points in the face region corresponding to the first face image and the second face image; and the reconstruction module 54 is configured to perform three-dimensional face reconstruction of the target face according to the three-dimensional coordinates of the plurality of pixel points in the face region.
Optionally, the processing module 52 includes a calculation unit 521, an aggregation unit 522, a generation unit 523 and an optimization unit 524. The calculation unit 521 is configured to calculate the initial matching cost matrix between each pixel point in the first face image and each pixel point in the second face image; the aggregation unit 522 is configured to perform cross-domain cost aggregation on the initial matching cost matrix of each pixel point in the first face image to obtain the aggregated cost matrix of each pixel point in the first face image; the generation unit 523 is configured to generate an original disparity map based on the aggregated cost matrix of each pixel point; and the optimization unit 524 is configured to obtain the face disparity map from the original disparity map.
Optionally, the calculation unit 521 is configured to calculate the color value difference between each pixel point in the first face image and each pixel point in the second face image; calculate the Hamming distance between each pixel point in the first face image and each pixel point in the second face image; and obtain the initial matching cost matrix based on the color value difference and the Hamming distance.
Optionally, the aggregation unit 522 is configured to construct a cross-shaped domain for each pixel point in the first face image, and to obtain the aggregated cost matrix of each pixel point in the first face image according to the initial matching costs of the pixel points in the cross-shaped domain.
Optionally, the generation unit 523 is configured to determine, based on the aggregated cost matrix of each pixel point, the disparity value corresponding to the minimum aggregated cost as the target disparity value of that pixel point, and to generate the original disparity map according to the target disparity value of each pixel point.
Optionally, the acquisition module 51 includes an acquisition unit 511, a correction unit 512 and a detection unit 513. The acquisition unit 511 is configured to acquire a first original image and a second original image corresponding to the same target face; the correction unit 512 is configured to correct the first original image and the second original image to obtain a corrected first original image and a corrected second original image; and the detection unit 513 is configured to input the corrected first original image and the corrected second original image into a preset face detection model to obtain a first face image and a second face image.
The face three-dimensional reconstruction apparatus in the embodiment shown in Fig. 5 may be used to implement the technical solution of the above method embodiments; its implementation principle and technical effects are similar and are not repeated here.
In addition, the embodiments of the present disclosure also provide a vehicle comprising the face three-dimensional reconstruction apparatus according to the above embodiments.
Fig. 6 is a schematic structural diagram of an electronic device according to an embodiment of the present disclosure. The electronic device may be the terminal device described in the above embodiments. The electronic device provided in the embodiments of the present disclosure can execute the processing flow provided in the face three-dimensional reconstruction method embodiments. As shown in Fig. 6, the electronic device 60 includes a memory 61, a processor 62, a computer program and a communication interface 63, where the computer program is stored in the memory 61 and configured to be executed by the processor 62 to perform the face three-dimensional reconstruction method described above.
In addition, the embodiment of the present disclosure also provides a computer-readable storage medium having stored thereon a computer program that is executed by a processor to implement the face three-dimensional reconstruction method described in the above embodiment.
Furthermore, the disclosed embodiments also provide a computer program product comprising a computer program or instructions which, when executed by a processor, implement the face three-dimensional reconstruction method as described above.
Computer program code for carrying out operations of the present disclosure may be written in one or more programming languages, including, but not limited to, object-oriented programming languages such as Java, Smalltalk and C++, and conventional procedural programming languages such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on a remote computer or server. In the case of a remote computer, the remote computer may be connected to the user's computer through any kind of network, including a local area network (LAN) or a wide area network (WAN), or may be connected to an external computer (for example, through the Internet using an Internet service provider).
The flowcharts and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present disclosure. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
It should be noted that in this document, relational terms such as "first" and "second" are used solely to distinguish one entity or action from another, and do not necessarily require or imply any actual relationship or order between such entities or actions. Moreover, the terms "comprises", "comprising", or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article or apparatus that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such process, method, article or apparatus. Without further limitation, an element preceded by the phrase "comprising a …" does not exclude the presence of other identical elements in the process, method, article or apparatus that comprises the element.
The foregoing is merely a specific embodiment of the disclosure to enable one skilled in the art to understand or practice the disclosure. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the disclosure. Thus, the present disclosure is not intended to be limited to the embodiments shown and described herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims (10)

1. A method of three-dimensional face reconstruction, the method comprising:
acquiring a first face image and a second face image from a first original image and a second original image according to a preset face detection model, wherein the first face image and the second face image correspond to the same target face;
processing the first face image and the second face image according to a stereo matching algorithm to obtain a face disparity map;
acquiring, based on the face disparity map, three-dimensional coordinates of a plurality of pixel points in a face region corresponding to the first face image and the second face image; and
performing three-dimensional face reconstruction of the target face according to the three-dimensional coordinates of the plurality of pixel points in the face region.
2. The method of claim 1, wherein processing the first face image and the second face image according to the stereo matching algorithm to obtain the face disparity map comprises:
calculating an initial matching cost matrix between each pixel point in the first face image and each pixel point in the second face image;
performing cross-domain cost aggregation on the initial matching cost matrix of each pixel point in the first face image to obtain an aggregated cost matrix of each pixel point in the first face image;
generating an original disparity map based on the aggregated cost matrix of each pixel point; and
obtaining the face disparity map from the original disparity map.
3. The method of claim 2, wherein calculating the initial matching cost matrix between each pixel point in the first face image and each pixel point in the second face image comprises:
calculating a color value difference between each pixel point in the first face image and each pixel point in the second face image;
calculating a Hamming distance between each pixel point in the first face image and each pixel point in the second face image; and
obtaining the initial matching cost matrix between each pixel point in the first face image and each pixel point in the second face image based on the color value difference and the Hamming distance.
4. The method of claim 2, wherein performing cross-domain cost aggregation on the initial matching cost matrix of each pixel point in the first face image to obtain the aggregated cost matrix of each pixel point in the first face image comprises:
constructing a cross-shaped domain for each pixel point in the first face image; and
obtaining the aggregated cost matrix of each pixel point in the first face image according to the initial matching costs of the pixel points in the cross-shaped domain.
5. The method according to claim 2, wherein generating the original disparity map based on the aggregated cost matrix of each pixel point comprises:
determining, based on the aggregated cost matrix of each pixel point, the disparity value corresponding to the minimum aggregated cost as the target disparity value of that pixel point; and
generating the original disparity map according to the target disparity value of each pixel point.
6. The method according to claim 1, wherein acquiring the first face image and the second face image from the first original image and the second original image according to the preset face detection model comprises:
acquiring a first original image and a second original image, wherein the first original image and the second original image correspond to the same target face;
correcting the first original image and the second original image to obtain a corrected first original image and a corrected second original image; and
inputting the corrected first original image and the corrected second original image into the preset face detection model to obtain the first face image and the second face image.
7. A face three-dimensional reconstruction apparatus, the apparatus comprising:
an acquisition module, configured to acquire a first face image and a second face image from a first original image and a second original image according to a preset face detection model, wherein the first face image and the second face image correspond to the same target face;
a processing module, configured to process the first face image and the second face image according to a stereo matching algorithm to obtain a face disparity map;
a calculation module, configured to acquire, based on the face disparity map, three-dimensional coordinates of a plurality of pixel points in a face region corresponding to the first face image and the second face image; and
a reconstruction module, configured to perform three-dimensional face reconstruction of the target face according to the three-dimensional coordinates of the plurality of pixel points in the face region.
8. An electronic device, comprising:
a memory;
a processor; and
a computer program;
wherein the computer program is stored in the memory and configured to be executed by the processor to implement the method of any one of claims 1-6.
9. A computer readable storage medium, on which a computer program is stored, which computer program, when being executed by a processor, implements the method according to any of claims 1-6.
10. A vehicle, comprising: the face three-dimensional reconstruction apparatus as set forth in claim 7; or the electronic device as claimed in claim 8; or the computer readable storage medium as claimed in claim 9.
CN202211320289.6A (filed 2022-10-26; priority 2022-10-26) Face three-dimensional reconstruction method, device, equipment, readable storage medium and vehicle. Status: Pending. Published as CN117974879A (en).

Priority Applications (1)

Application Number: CN202211320289.6A | Priority Date: 2022-10-26 | Filing Date: 2022-10-26 | Title: Face three-dimensional reconstruction method, device, equipment, readable storage medium and vehicle | Publication: CN117974879A (en)

Applications Claiming Priority (1)

Application Number: CN202211320289.6A | Priority Date: 2022-10-26 | Filing Date: 2022-10-26 | Title: Face three-dimensional reconstruction method, device, equipment, readable storage medium and vehicle | Publication: CN117974879A (en)

Publications (1)

Publication Number: CN117974879A (en)

Family

ID=90850092

Family Applications (1)

Application Number: CN202211320289.6A | Title: Face three-dimensional reconstruction method, device, equipment, readable storage medium and vehicle | Publication: CN117974879A (en)

Country Status (1)

Country Link
CN (1) CN117974879A (en)

Legal Events

PB01 - Publication
SE01 - Entry into force of request for substantive examination