CN115861540A - Three-dimensional reconstruction method, device and equipment for two-dimensional face and storage medium - Google Patents


Info

Publication number
CN115861540A
Authority
CN
China
Legal status
Pending
Application number
CN202211583787.XA
Other languages
Chinese (zh)
Inventor
苏朋杨
陈泳桦
李伟
Current Assignee
Shanghai Jitu Technology Co ltd
Original Assignee
Shanghai Jitu Technology Co ltd
Application filed by Shanghai Jitu Technology Co ltd
Priority to CN202211583787.XA
Publication of CN115861540A
Legal status: Pending (current)

Landscapes

  • Image Analysis (AREA)

Abstract

The invention relates to the field of three-dimensional reconstruction and discloses a three-dimensional reconstruction method, apparatus, device and storage medium for a two-dimensional face. The method comprises the following steps: receiving a two-dimensional face image and performing mouth key point detection on it with a MediaPipe neural network to obtain a two-dimensional key point set; performing three-dimensional reconstruction on the two-dimensional face image with a pix2pix neural network to obtain a three-dimensional face image; performing mouth key point detection on the three-dimensional face image with the MediaPipe neural network to obtain a three-dimensional key point set; extracting N pairs of two-dimensional key points from the two-dimensional key point set and N pairs of three-dimensional key points from the three-dimensional key point set, where N is a positive integer; computing an absolute deviation according to a preset deviation algorithm; judging whether the absolute deviation is smaller than an error threshold; if it is, adding the two-dimensional face image to a reconstructed two-dimensional image training set; and inputting the images in the reconstructed two-dimensional image training set into the pix2pix neural network for training to generate a new pix2pix neural network.

Description

Three-dimensional reconstruction method, device and equipment for two-dimensional face and storage medium
Technical Field
The present invention relates to the field of three-dimensional reconstruction, and in particular, to a method, an apparatus, a device, and a storage medium for three-dimensional reconstruction of a two-dimensional face.
Background
Existing 3D reconstruction algorithms for 2D faces have a number of problems. One of them is that the lips are sometimes reconstructed inaccurately. For example, the mouth of a 2D face may be closed, yet after 3D reconstruction the lips of the 3D face are open. The present method uses the strong correspondence (translation) relationship of pix2pix to convert a 2D face picture accurately into a 3D face picture.
By exploiting this correspondence, the method solves the problem of inaccurate mouth closure when 2D face photos are reconstructed in 3D. The inaccuracy stems from how the reconstruction networks are trained. Their loss functions measure a global loss, so training optimizes the overall quality of the reconstructed 3D picture rather than the accuracy of mouth closure alone. Consequently, although many 3D reconstruction techniques exist, all of them without exception reproduce the degree of mouth closure insufficiently accurately. A new technique is therefore needed to solve the technical problem that the mouth region is not reconstructed accurately enough when a two-dimensional face image is reconstructed into a three-dimensional image.
Disclosure of Invention
The main object of the invention is to solve the technical problem that the mouth region is not reconstructed accurately enough when a two-dimensional face image is reconstructed into a three-dimensional image.
A first aspect of the invention provides a three-dimensional reconstruction method for a two-dimensional face, comprising:
receiving a two-dimensional face image, and performing mouth key point detection on the two-dimensional face image based on a preset MediaPipe neural network to obtain a two-dimensional key point set;
performing three-dimensional reconstruction on the two-dimensional face image based on a preset pix2pix neural network to obtain a three-dimensional face image;
performing mouth key point detection on the three-dimensional face image based on the preset MediaPipe neural network to obtain a three-dimensional key point set;
extracting N pairs of two-dimensional key points from the two-dimensional key point set, and extracting N pairs of three-dimensional key points from the three-dimensional key point set, where N is a positive integer;
performing a deviation calculation on the N pairs of two-dimensional key points and the N pairs of three-dimensional key points according to a preset deviation algorithm to obtain an absolute deviation;
judging whether the absolute deviation is smaller than a preset error threshold;
if the absolute deviation is smaller than the preset error threshold, adding the two-dimensional face image to a preset reconstructed two-dimensional image training set; and
inputting the images in the reconstructed two-dimensional image training set into the preset pix2pix neural network for training to generate a new pix2pix neural network.
Optionally, in a first implementation of the first aspect of the present invention, performing the deviation calculation on the N pairs of two-dimensional key points and the N pairs of three-dimensional key points according to the preset deviation algorithm to obtain the absolute deviation comprises:
calculating the average number of pixels separating the two key points of each of the N pairs of two-dimensional key points to obtain a two-dimensional closure value;
calculating the average number of pixels separating the two key points of each of the N pairs of three-dimensional key points to obtain a three-dimensional closure value; and
calculating the absolute value of the difference between the two-dimensional closure value and the three-dimensional closure value to obtain the absolute deviation.
Optionally, in a second implementation of the first aspect of the present invention, extracting N pairs of two-dimensional key points from the two-dimensional key point set and extracting N pairs of three-dimensional key points from the three-dimensional key point set comprises:
extracting N pairs of two-dimensional key points from the two-dimensional key point set based on the preset MediaPipe neural network; and
extracting, based on the preset MediaPipe neural network, N pairs of three-dimensional key points from the three-dimensional key point set according to the correspondence between the N pairs of two-dimensional key points and the three-dimensional face image.
Optionally, in a third implementation of the first aspect of the present invention, after inputting the images in the reconstructed two-dimensional image training set into the preset pix2pix neural network for training to generate a new pix2pix neural network, the method further comprises:
replacing the preset pix2pix neural network with the new pix2pix neural network.
Optionally, in a fourth implementation of the first aspect of the present invention, after judging whether the absolute deviation is smaller than the preset error threshold and before inputting the images in the reconstructed two-dimensional image training set into the preset pix2pix neural network for training to generate a new pix2pix neural network, the method further comprises:
if the absolute deviation is not less than the preset error threshold, adding the two-dimensional face image to a preset verification image set.
Optionally, in a fifth implementation of the first aspect of the present invention, after inputting the images in the reconstructed two-dimensional image training set into the preset pix2pix neural network for training to generate a new pix2pix neural network, the method further comprises:
performing three-dimensional reconstruction on each image of the verification image set based on the new pix2pix neural network to obtain a verification three-dimensional image set;
performing deviation analysis on the verification three-dimensional image set according to a preset verification algorithm to obtain an analysis result; and
when the analysis result is qualified, replacing the preset pix2pix neural network with the new pix2pix neural network.
Optionally, in a sixth implementation of the first aspect of the present invention, performing deviation analysis on the verification three-dimensional image set according to the preset verification algorithm to obtain the analysis result comprises:
extracting, based on the preset MediaPipe neural network, M pairs of two-dimensional key points from the verification image set and the corresponding M pairs of three-dimensional key points from the verification three-dimensional image set, where M is a positive integer;
performing a deviation calculation on the M pairs of two-dimensional key points and the M pairs of three-dimensional key points according to the preset deviation algorithm to obtain an absolute deviation;
when the absolute deviation is smaller than a preset check threshold, determining the analysis result to be qualified; and
when the absolute deviation is not less than the preset check threshold, determining the analysis result to be unqualified.
A second aspect of the present invention provides a three-dimensional reconstruction apparatus for a two-dimensional face, comprising:
a two-dimensional detection module, configured to receive a two-dimensional face image and perform mouth key point detection on the two-dimensional face image based on a preset MediaPipe neural network to obtain a two-dimensional key point set;
a three-dimensional reconstruction module, configured to perform three-dimensional reconstruction on the two-dimensional face image based on a preset pix2pix neural network to obtain a three-dimensional face image;
a three-dimensional detection module, configured to perform mouth key point detection on the three-dimensional face image based on the preset MediaPipe neural network to obtain a three-dimensional key point set;
an extraction module, configured to extract N pairs of two-dimensional key points from the two-dimensional key point set and N pairs of three-dimensional key points from the three-dimensional key point set, where N is a positive integer;
a deviation calculation module, configured to perform a deviation calculation on the N pairs of two-dimensional key points and the N pairs of three-dimensional key points according to a preset deviation algorithm to obtain an absolute deviation;
a judging module, configured to judge whether the absolute deviation is smaller than a preset error threshold;
a training set adding module, configured to add the two-dimensional face image to a preset reconstructed two-dimensional image training set if the absolute deviation is smaller than the preset error threshold; and
a training module, configured to input the images in the reconstructed two-dimensional image training set into the preset pix2pix neural network for training to generate a new pix2pix neural network.
A third aspect of the present invention provides a three-dimensional reconstruction device for a two-dimensional face, comprising a memory storing instructions and at least one processor, the memory and the at least one processor being interconnected by a line. The at least one processor invokes the instructions in the memory to cause the three-dimensional reconstruction device to perform the three-dimensional reconstruction method for a two-dimensional face described above.
A fourth aspect of the present invention provides a computer-readable storage medium having stored therein instructions, which when run on a computer, cause the computer to execute the above-described three-dimensional reconstruction method of a two-dimensional face.
In the embodiments of the invention, the generated 3D reconstructed face photo reproduces the degree of mouth closure more accurately. Generation is also very fast, making the 3D reconstruction of a 2D face photo very lightweight. An existing 3D reconstruction technique takes 0.05 s to generate a 3D face picture, whereas pix2pix needs only 0.01 s. 3D face reconstruction has a wide range of applications, such as making a single photo speak, realistic digital humans and video motion synchronization, all of which can use 3D reconstruction technology. Applying this method generates more accurate 3D faces, so these downstream applications achieve better results. This solves the technical problem that the mouth region is not reconstructed accurately enough when a two-dimensional face image is reconstructed into a three-dimensional image.
Drawings
FIG. 1 is a diagram of a method for three-dimensional reconstruction of a two-dimensional face according to an embodiment of the present invention;
FIG. 2 is a schematic diagram of a two-dimensional key point set of the MediaPipe neural network;
FIG. 3 is a schematic diagram of constructing a three-dimensional face image from a two-dimensional face image;
FIG. 4 is a schematic diagram of extracting three pairs of two-dimensional key points from a two-dimensional key point set;
FIG. 5 is a schematic diagram of an embodiment of a three-dimensional reconstruction apparatus for a two-dimensional face according to an embodiment of the present invention;
FIG. 6 is a schematic diagram of another embodiment of a three-dimensional reconstruction apparatus for a two-dimensional face according to an embodiment of the present invention;
FIG. 7 is a schematic diagram of an embodiment of a three-dimensional reconstruction device for a two-dimensional face according to an embodiment of the present invention.
Detailed Description
The embodiment of the invention provides a three-dimensional reconstruction method, a three-dimensional reconstruction device, three-dimensional reconstruction equipment and a storage medium for a two-dimensional face.
The terms "first," "second," "third," "fourth," and the like in the description and in the claims, as well as in the drawings, if any, are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It will be appreciated that the data so used may be interchanged under appropriate circumstances such that the embodiments described herein may be implemented in other sequences than those illustrated or described herein. Furthermore, the terms "comprises," "comprising," or "having," and any variations thereof, are intended to cover non-exclusive inclusions, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
For ease of understanding, the specific flow of an embodiment of the present invention is described below. Referring to FIG. 1, an embodiment of the three-dimensional reconstruction method for a two-dimensional face according to the present invention comprises:
101. Receiving a two-dimensional face image, and performing mouth key point detection on the two-dimensional face image based on a preset MediaPipe neural network to obtain a two-dimensional key point set;
In this embodiment, the MediaPipe neural network model, a face key point detection model, is used to measure the distance between the upper and lower lips of 3000 2D faces and their 3D counterparts. The MediaPipe model is fast and detects dense, accurate face key points. Calculating the degree of closure of the upper and lower lips requires accurate and dense face key points, which is why the MediaPipe neural network was chosen.
Referring to FIG. 2, which is a schematic diagram of a two-dimensional key point set of the MediaPipe neural network, the key points are concentrated in the mouth region of the face.
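Step 101 can be sketched with MediaPipe's Face Mesh solution. By default it returns 468 landmarks whose x and y coordinates are normalized to the image width and height. This is a minimal sketch and assumes the legacy `mediapipe.solutions.face_mesh` Python API, with OpenCV used for color conversion. The helper names `landmarks_to_pixels` and `detect_face_keypoints` are hypothetical, and the patent does not state how the detector is configured. The same function could be applied to the rendered three-dimensional face image in step 103.

```python
"""Sketch: detect Face Mesh landmarks with MediaPipe and convert them to pixels.

Assumes the legacy `mediapipe.solutions.face_mesh` API and OpenCV (`cv2`).
Both are imported lazily inside `detect_face_keypoints`, so the pure helper
`landmarks_to_pixels` can be used without either package installed.
"""
from typing import List, Optional, Sequence, Tuple

Point = Tuple[float, float]


def landmarks_to_pixels(landmarks: Sequence, width: int, height: int) -> List[Point]:
    """Convert landmarks with normalized .x/.y attributes to pixel coordinates."""
    return [(lm.x * width, lm.y * height) for lm in landmarks]


def detect_face_keypoints(image_bgr) -> Optional[List[Point]]:
    """Return pixel coordinates of all Face Mesh landmarks, or None if no face is found."""
    import cv2
    import mediapipe as mp

    height, width = image_bgr.shape[:2]
    # static_image_mode=True treats every call as an independent photo (no tracking).
    with mp.solutions.face_mesh.FaceMesh(static_image_mode=True, max_num_faces=1) as face_mesh:
        results = face_mesh.process(cv2.cvtColor(image_bgr, cv2.COLOR_BGR2RGB))
    if not results.multi_face_landmarks:
        return None
    return landmarks_to_pixels(results.multi_face_landmarks[0].landmark, width, height)
```

Converting to pixels early means the later closure values come out in pixels, which is the unit used for the error threshold.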
102. Performing three-dimensional reconstruction on the two-dimensional face image based on a preset pix2pix neural network to obtain a three-dimensional face image;
103. Performing mouth key point detection on the three-dimensional face image based on the preset MediaPipe neural network to obtain a three-dimensional key point set;
For steps 102 and 103, refer to FIG. 3, which is a schematic diagram of constructing a three-dimensional face image from a two-dimensional face image. In FIG. 3, the left image is the selected two-dimensional face image and the right image is the reconstructed three-dimensional face image. The three-dimensional reconstruction uses the pix2pix neural network. As in step 101, the MediaPipe neural network then detects the key points of the three-dimensional mouth to obtain the three-dimensional key point set.
104. Extracting N pairs of two-dimensional key points from the two-dimensional key point set, and extracting N pairs of three-dimensional key points from the three-dimensional key point set, wherein N is a positive integer;
In this embodiment, refer to FIG. 4, which is a schematic diagram of extracting three pairs of two-dimensional key points from the two-dimensional key point set of FIG. 2. The three pairs shown in FIG. 4 are selected from that set. Likewise, 3 pairs of three-dimensional key points are extracted from the three-dimensional key point set for the subsequent deviation analysis.
Further, step 104 comprises the following steps:
1041. Extracting N pairs of two-dimensional key points from the two-dimensional key point set based on the preset MediaPipe neural network;
1042. Extracting, based on the preset MediaPipe neural network, N pairs of three-dimensional key points from the three-dimensional key point set according to the correspondence between the N pairs of two-dimensional key points and the three-dimensional face image.
In steps 1041 and 1042, 3 pairs of two-dimensional key points and 3 pairs of three-dimensional key points are collected. The MediaPipe neural network performs the extraction based on the mapping established by the reconstruction. As a result, the 3 pairs of two-dimensional key points and the 3 pairs of three-dimensional key points are points that correspond to each other between the two-dimensional and three-dimensional images.
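Face Mesh uses a fixed landmark topology, so the same landmark index refers to the same facial point in the 2D image and in the rendered 3D image. This offers one way to obtain the mutually corresponding pairs of steps 1041 and 1042. The sketch below assumes this approach. The three inner-lip index pairs in `MOUTH_INDEX_PAIRS` (82/87, 13/14, 312/317) are an illustrative choice and are not given in the patent. The function names are hypothetical.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]
PointPair = Tuple[Point, Point]  # (upper-lip point, lower-lip point)

# Assumed (upper-lip, lower-lip) Face Mesh indices on the inner lip contour,
# left, center and right. The patent does not list the indices it uses.
MOUTH_INDEX_PAIRS: List[Tuple[int, int]] = [(82, 87), (13, 14), (312, 317)]


def extract_pairs(
    keypoints: Sequence[Point],
    index_pairs: Sequence[Tuple[int, int]] = MOUTH_INDEX_PAIRS,
) -> List[PointPair]:
    """Pick N (upper, lower) lip point pairs from a full landmark list by index."""
    return [(keypoints[upper], keypoints[lower]) for upper, lower in index_pairs]


def extract_corresponding_pairs(
    keypoints_2d: Sequence[Point],
    keypoints_3d: Sequence[Point],
    index_pairs: Sequence[Tuple[int, int]] = MOUTH_INDEX_PAIRS,
) -> Tuple[List[PointPair], List[PointPair]]:
    """Use identical indices for both images so pair i in 2D matches pair i in 3D."""
    return extract_pairs(keypoints_2d, index_pairs), extract_pairs(keypoints_3d, index_pairs)
```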
105. Performing a deviation calculation on the N pairs of two-dimensional key points and the N pairs of three-dimensional key points according to a preset deviation algorithm to obtain an absolute deviation;
In this embodiment, the deviation analysis quantifies how much the opening/closing distance of the 3 pairs of two-dimensional key points differs from that of the 3 pairs of three-dimensional key points. This difference gives the absolute deviation.
Further, step 105 may comprise the following steps:
1051. Calculating the average number of pixels separating the two key points of each of the N pairs of two-dimensional key points to obtain a two-dimensional closure value;
1052. Calculating the average number of pixels separating the two key points of each of the N pairs of three-dimensional key points to obtain a three-dimensional closure value;
1053. Calculating the absolute value of the difference between the two-dimensional closure value and the three-dimensional closure value to obtain the absolute deviation.
In steps 1051 to 1053, the degree of closure is defined as follows. Using the face key point detection described above, 3 points on the upper lip and 3 points on the lower lip are located. For example, suppose the first point of the upper lip has coordinates (50, 80) and the first point of the lower lip has coordinates (50, 89). The vertical distance between the two points is then 89 - 80 = 9 pixels. As shown by the key points in FIG. 4, the average vertical distance over the 3 pairs of upper and lower lip points measures the degree of mouth closure. For example, if the vertical distances of the first, second and third pairs are 10, 15 and 15 pixels, the degree of mouth closure is (10 + 15 + 15)/3 ≈ 13.3 pixels. Key point detection is performed on each 2D face photo with the MediaPipe network to calculate the degree of closure B of its mouth. It is likewise performed on each 3D face picture to calculate the degree of closure C of the 3D mouth.
The absolute deviation is defined as |B - C|, where |x| denotes the absolute value of x.
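Steps 1051 to 1053 reduce to a few lines of arithmetic. This minimal sketch assumes that each pair is given as (upper-lip point, lower-lip point) in pixel coordinates. The function names are hypothetical. The vertical gap is taken as an absolute value, so the result does not depend on whether the image y axis points up or down.

```python
from typing import Sequence, Tuple

Point = Tuple[float, float]
PointPair = Tuple[Point, Point]  # (upper-lip point, lower-lip point)


def closure_value(pairs: Sequence[PointPair]) -> float:
    """Mean vertical pixel gap between upper- and lower-lip points (steps 1051/1052)."""
    if not pairs:
        raise ValueError("need at least one key point pair (N >= 1)")
    gaps = [abs(lower[1] - upper[1]) for upper, lower in pairs]
    return sum(gaps) / len(gaps)


def absolute_deviation(pairs_2d: Sequence[PointPair], pairs_3d: Sequence[PointPair]) -> float:
    """|B - C|: absolute difference between the 2D and 3D closure values (step 1053)."""
    return abs(closure_value(pairs_2d) - closure_value(pairs_3d))
```

For the single pair in the example above, `closure_value([((50, 80), (50, 89))])` gives 9.0 pixels.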
106. Judging whether the absolute deviation is smaller than a preset error threshold;
In this embodiment, computing the error value selects the photo pairs whose 3D mouth closure stays close to the 2D mouth closure after 3D reconstruction. The closure error values of 3000 pairs of 2D and 3D face photos are calculated, and the paired 2D and 3D photos whose error value is smaller than 3 pixels are retained. If enough pictures are available, the threshold can be lowered to 1 to 2 pixels to make the final result more accurate. However, the lower the threshold, the fewer picture pairs are obtained.
Further, after step 106 and before step 108, the following step may also be performed:
1061. If the absolute deviation is not less than the preset error threshold, adding the two-dimensional face image to a preset verification image set.
In this embodiment, the verification image set consists of 2D images whose mouth closure is inaccurate after 3D reconstruction. For these images, the difference between the 3D and 2D mouth closure is 3 pixels or more, and the photos are not in the training set of the pix2pix model.
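Steps 106, 1061 and 107 amount to routing each image by its absolute deviation. This sketch assumes the deviations have already been computed, for example with the deviation algorithm of step 105. It uses the 3-pixel threshold from the example above as the default. `split_by_threshold` is a hypothetical name.

```python
from typing import Iterable, List, Tuple


def split_by_threshold(
    samples: Iterable[Tuple[str, float]], error_threshold: float = 3.0
) -> Tuple[List[str], List[str]]:
    """Route (image_id, absolute_deviation) samples to the training or verification set."""
    training_set: List[str] = []
    verification_set: List[str] = []
    for image_id, deviation in samples:
        if deviation < error_threshold:
            training_set.append(image_id)  # step 107: mouth closure reproduced well
        else:
            verification_set.append(image_id)  # step 1061: not less than the threshold
    return training_set, verification_set
```

Passing a smaller `error_threshold`, such as 1 or 2 pixels, moves more images into the verification set. This mirrors the trade-off described above: a stricter threshold yields fewer training pairs.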
107. If the absolute deviation is smaller than the preset error threshold, adding the two-dimensional face image to a preset reconstructed two-dimensional image training set;
108. Inputting the pictures in the reconstructed two-dimensional image training set into the preset pix2pix neural network for training to generate a new pix2pix neural network.
In steps 107 and 108, the 2D and 3D picture pairs whose deviation is smaller than the preset error threshold are used as a training set to train the pix2pix neural network model. In brief, when picture A and picture B are strongly correlated, a pix2pix neural network can convert picture A into picture B. Because our 3D faces are reconstructed from the 2D faces, each pair is very strongly correlated and corresponds extremely closely. Therefore, after training on the paired data set, the pix2pix neural network can accurately convert a 2D face picture into a 3D face picture.
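The patent does not say which pix2pix implementation is used. The public pytorch-CycleGAN-and-pix2pix code is one option. In its aligned dataset mode, each training sample is a single image, with the input (A, here the 2D face) in the left half and the target (B, here the 3D face) in the right half. This minimal NumPy sketch builds such a sample and assumes both images already have the same size. `make_aligned_pair` is a hypothetical helper.

```python
import numpy as np


def make_aligned_pair(image_2d: np.ndarray, image_3d: np.ndarray) -> np.ndarray:
    """Place the 2D face (A) on the left and its 3D reconstruction (B) on the right."""
    if image_2d.shape != image_3d.shape:
        raise ValueError(f"shape mismatch: {image_2d.shape} vs {image_3d.shape}")
    # axis=1 is the width axis for (height, width, channels) image arrays.
    return np.concatenate([image_2d, image_3d], axis=1)
```

Each combined array can then be written to disk with any image library and used as one paired training example.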
Further, after step 108, the following step is performed:
1081. Replacing the preset pix2pix neural network with the new pix2pix neural network.
In this embodiment, the newly trained pix2pix neural network replaces the original pix2pix neural network. The new network reproduces the mouth state more accurately when constructing the three-dimensional image.
Further, on the basis of step 1061, the following steps may be performed after step 108:
1082. Performing three-dimensional reconstruction on each image of the verification image set based on the new pix2pix neural network to obtain a verification three-dimensional image set;
1083. Performing deviation analysis on the verification three-dimensional image set according to a preset verification algorithm to obtain an analysis result;
1084. When the analysis result is qualified, replacing the preset pix2pix neural network with the new pix2pix neural network.
In steps 1082 to 1084, the original images that were not reconstructed successfully are reconstructed in 3D again, and the deviation of each resulting three-dimensional image is analyzed. The trained pix2pix neural network then replaces the original pix2pix neural network once the required value is met.
Further, step 1083 may comprise the following steps:
10831. Extracting, based on the preset MediaPipe neural network, M pairs of two-dimensional key points from the verification image set and the corresponding M pairs of three-dimensional key points from the verification three-dimensional image set, where M is a positive integer;
10832. Performing a deviation calculation on the M pairs of two-dimensional key points and the M pairs of three-dimensional key points according to the preset deviation algorithm to obtain an absolute deviation;
10833. When the absolute deviation is smaller than a preset check threshold, determining the analysis result to be qualified;
10834. When the absolute deviation is not less than the preset check threshold, determining the analysis result to be unqualified.
In steps 10831 to 10834, we avoid chance errors from any single picture by taking the 2D pictures among the 3000 whose mouth closure was inaccurate after 3D reconstruction and using them as input to the pix2pix neural network. The mouth-closure error of the 3D faces produced after pix2pix optimization is found to be generally smaller than that of the 3D reconstruction method. For each approach, we sum the error values over the 3000 photos.
The total error of the 3D reconstruction method is 5371.2, whereas the total error after pix2pix optimization is 4156.4. Since the check threshold is 4500, the analysis result is determined to be qualified.
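The check in steps 10831 to 10834 and the replacement in step 1084 can be sketched as follows. Following the worked example above, the sketch compares the summed absolute deviations over the verification set with the check threshold. The per-image comparison literally described in steps 10833 and 10834 is the special case of a one-element list. The function names are hypothetical.

```python
from typing import Sequence, TypeVar

Model = TypeVar("Model")


def analysis_result(deviations: Sequence[float], check_threshold: float) -> bool:
    """Qualified (True) when the summed |B - C| values are below the check threshold."""
    return sum(deviations) < check_threshold


def select_model(current: Model, candidate: Model, qualified: bool) -> Model:
    """Step 1084: replace the preset network only when the analysis result is qualified."""
    return candidate if qualified else current
```

With the totals reported above, `analysis_result([4156.4], 4500.0)` is qualified, while the 5371.2 total of the original reconstruction method would not be.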
With reference to FIG. 5, an embodiment of the three-dimensional reconstruction apparatus for a two-dimensional face according to the present invention comprises:
a two-dimensional detection module 501, configured to receive a two-dimensional face image and perform mouth key point detection on the two-dimensional face image based on a preset MediaPipe neural network to obtain a two-dimensional key point set;
a three-dimensional reconstruction module 502, configured to perform three-dimensional reconstruction on the two-dimensional face image based on a preset pix2pix neural network to obtain a three-dimensional face image;
a three-dimensional detection module 503, configured to perform mouth key point detection on the three-dimensional face image based on the preset MediaPipe neural network to obtain a three-dimensional key point set;
an extraction module 504, configured to extract N pairs of two-dimensional key points from the two-dimensional key point set and N pairs of three-dimensional key points from the three-dimensional key point set, where N is a positive integer;
a deviation calculation module 505, configured to perform a deviation calculation on the N pairs of two-dimensional key points and the N pairs of three-dimensional key points according to a preset deviation algorithm to obtain an absolute deviation;
a judging module 506, configured to judge whether the absolute deviation is smaller than a preset error threshold;
a training set adding module 507, configured to add the two-dimensional face image to a preset reconstructed two-dimensional image training set if the absolute deviation is smaller than the preset error threshold; and
a training module 508, configured to input the images in the reconstructed two-dimensional image training set into the preset pix2pix neural network for training to generate a new pix2pix neural network.
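Modules 501 to 508 can be wired together as one object. This minimal sketch makes four assumptions. The detector (for example, MediaPipe Face Mesh) and the pix2pix generator are supplied as plain callables. Key points are indexed lists of (x, y) pixel coordinates. The lip index pairs are chosen by the caller. The class name `FaceReconstructionApparatus` and its parameter names are hypothetical. Images whose deviation reaches the threshold are also collected for the verification set of step 1061.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, List, Sequence, Tuple

Point = Tuple[float, float]
PointPair = Tuple[Point, Point]  # (upper-lip point, lower-lip point)


def _closure(pairs: Sequence[PointPair]) -> float:
    """Mean vertical pixel gap over the lip pairs (same rule as steps 1051/1052)."""
    if not pairs:
        raise ValueError("need at least one key point pair")
    return sum(abs(lo[1] - up[1]) for up, lo in pairs) / len(pairs)


@dataclass
class FaceReconstructionApparatus:
    detect: Callable[[Any], Sequence[Point]]  # modules 501 and 503: key point detector
    reconstruct: Callable[[Any], Any]  # module 502: pix2pix generator
    index_pairs: Sequence[Tuple[int, int]]  # module 504: (upper, lower) landmark indices
    error_threshold: float = 3.0  # module 506: preset error threshold
    training_pairs: List[Tuple[Any, Any]] = field(default_factory=list)  # module 507
    verification_images: List[Any] = field(default_factory=list)  # step 1061

    def _pairs(self, keypoints: Sequence[Point]) -> List[PointPair]:
        return [(keypoints[up], keypoints[lo]) for up, lo in self.index_pairs]

    def process(self, image_2d: Any) -> float:
        """Run modules 501 to 507 on one image and return its absolute deviation."""
        image_3d = self.reconstruct(image_2d)
        pairs_2d = self._pairs(self.detect(image_2d))
        pairs_3d = self._pairs(self.detect(image_3d))
        deviation = abs(_closure(pairs_2d) - _closure(pairs_3d))  # module 505
        if deviation < self.error_threshold:
            # Keep the 2D/3D pair so pix2pix can later be trained on it.
            self.training_pairs.append((image_2d, image_3d))
        else:
            self.verification_images.append(image_2d)
        return deviation

    def train(self, trainer: Callable[[List[Tuple[Any, Any]]], Any]) -> Any:
        """Module 508: hand the screened pairs to a pix2pix training routine."""
        return trainer(self.training_pairs)
```

Injecting the detector and generator as callables keeps the screening logic testable without loading any neural network.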
Referring to fig. 6, in another embodiment of the three-dimensional reconstruction apparatus for a two-dimensional face according to the present invention, the three-dimensional reconstruction apparatus for a two-dimensional face includes:
the two-dimensional detection module 501, configured to receive a two-dimensional face image and perform mouth key point detection on it based on a preset MediaPipe neural network to obtain a two-dimensional key point set;
a three-dimensional reconstruction module 502, configured to perform three-dimensional reconstruction processing on the two-dimensional face image based on a preset pix2pix neural network to obtain a three-dimensional face image;
the three-dimensional detection module 503, configured to perform mouth key point detection on the three-dimensional face image based on the preset MediaPipe neural network to obtain a three-dimensional key point set;
an extracting module 504, configured to extract N pairs of two-dimensional key points from the two-dimensional key point set, and extract N pairs of three-dimensional key points from the three-dimensional key point set, where N is a positive integer;
a deviation operation module 505, configured to perform deviation value operation on the N pairs of two-dimensional key points and the N pairs of three-dimensional key points according to a preset deviation algorithm, so as to obtain a deviation absolute value;
a judging module 506, configured to judge whether the absolute value of the deviation is smaller than a preset error threshold;
a training set adding module 507, configured to add the two-dimensional face image to a preset reconstructed two-dimensional image training set if the absolute value of the deviation is smaller than the preset error threshold;
and a training module 508, configured to input the images in the reconstructed two-dimensional image training set into the preset pix2pix neural network for training, so as to generate a new pix2pix neural network.
The deviation operation module 505 is specifically configured to:
calculating the average, over the N pairs of two-dimensional key points, of the number of pixels separating the two key points of each pair, to obtain a two-dimensional closure value;
calculating the average, over the N pairs of three-dimensional key points, of the number of pixels separating the two key points of each pair, to obtain a three-dimensional closure value;
and calculating the absolute value of the difference between the two-dimensional closure value and the three-dimensional closure value to obtain the absolute value of the deviation.
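The deviation algorithm of module 505 can be sketched in Python. This minimal sketch assumes that the "number of pixels separating" the two points of a pair is the Euclidean distance between them in image pixel coordinates. The text does not say whether a vertical-only or a Euclidean distance is meant. `closure_value` and `deviation_abs` are illustrative names.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, ...]
Pair = Tuple[Point, Point]


def closure_value(pairs: Sequence[Pair]) -> float:
    """Mean pixel distance between the two key points of each pair (assumed Euclidean)."""
    if not pairs:
        raise ValueError("at least one key-point pair is required (N >= 1)")
    return sum(math.dist(a, b) for a, b in pairs) / len(pairs)


def deviation_abs(pairs_2d: Sequence[Pair], pairs_3d: Sequence[Pair]) -> float:
    """Absolute difference between the 2D and 3D closure values."""
    if len(pairs_2d) != len(pairs_3d):
        raise ValueError("the 2D and 3D key-point sets must contain the same N pairs")
    return abs(closure_value(pairs_2d) - closure_value(pairs_3d))
```

For example, 2D pair distances of 10 and 6 pixels give a closure value of 8, and 3D distances of 7 and 5 give 6. The deviation absolute value is then 2, which is compared with the error threshold.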
Wherein the extracting module 504 is specifically configured to:
extracting N pairs of two-dimensional key points from the two-dimensional key point set based on a preset MediaPipe neural network;
and extracting, based on the preset MediaPipe neural network, N pairs of three-dimensional key points from the three-dimensional key point set according to the correspondence between the N pairs of two-dimensional key points and the three-dimensional face image.
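The 3D face image is itself an image passed through the same key-point network. One simple way to realise the correspondence used by module 504 is therefore to select the same landmark indices from both key-point sets. The sketch assumes the detector is MediaPipe Face Mesh and uses the inner-lip index pairs (82, 87), (13, 14) and (312, 317) as an example choice of N = 3. Those indices are an assumption to be checked against the Face Mesh topology, not values given in the source. `extract_pairs` and `extract_corresponding_pairs` are hypothetical names.

```python
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, ...]
Landmarks = Dict[int, Point]  # landmark index -> image coordinates

# Assumed (upper inner lip, lower inner lip) index pairs in the MediaPipe
# Face Mesh topology; verify against the model actually used.
ASSUMED_MOUTH_PAIRS: Tuple[Tuple[int, int], ...] = ((82, 87), (13, 14), (312, 317))


def extract_pairs(
    landmarks: Landmarks, index_pairs: Sequence[Tuple[int, int]] = ASSUMED_MOUTH_PAIRS
) -> List[Tuple[Point, Point]]:
    """Pick the N (upper, lower) point pairs from one key-point set."""
    missing = [i for pair in index_pairs for i in pair if i not in landmarks]
    if missing:
        raise KeyError(f"landmarks missing for indices {missing}")
    return [(landmarks[upper], landmarks[lower]) for upper, lower in index_pairs]


def extract_corresponding_pairs(
    kp_2d: Landmarks,
    kp_3d: Landmarks,
    index_pairs: Sequence[Tuple[int, int]] = ASSUMED_MOUTH_PAIRS,
) -> Tuple[List[Tuple[Point, Point]], List[Tuple[Point, Point]]]:
    """Using the same index pairs on both sets makes the i-th 2D and 3D pairs correspond."""
    return extract_pairs(kp_2d, index_pairs), extract_pairs(kp_3d, index_pairs)
```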
The three-dimensional reconstruction apparatus for a two-dimensional face further includes a replacement module 509, where the replacement module 509 is specifically configured to:
and replacing the preset pix2pix neural network with the new pix2pix neural network.
The three-dimensional reconstruction apparatus for a two-dimensional face further includes a verification set adding module 510, where the verification set adding module 510 is specifically configured to:
if the absolute value of the deviation is not less than the preset error threshold, adding the two-dimensional face image to a preset verification image set.
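Together with module 506, the verification set adding module splits incoming photos by the same threshold. Photos the current generator already reconstructs well go to the training set, and the rest are held back for verification. The following is a minimal sketch with `route_images` and `deviation_of` as hypothetical names and the deviation computation injected as a callable.

```python
from typing import Callable, Iterable, List, Tuple, TypeVar

Image = TypeVar("Image")


def route_images(
    images: Iterable[Image],
    deviation_of: Callable[[Image], float],
    error_threshold: float,
) -> Tuple[List[Image], List[Image]]:
    """Split photos into (training set, verification set) by the error threshold."""
    training: List[Image] = []
    verification: List[Image] = []
    for image in images:
        if deviation_of(image) < error_threshold:
            training.append(image)       # module 507: current model already accurate
        else:
            verification.append(image)   # module 510: "not less than" the threshold
    return training, verification
```

Holding back the hard cases means the later verification step measures the new generator on exactly the photos the old generator reconstructed poorly. A deviation equal to the threshold goes to the verification set, matching "not less than".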
The three-dimensional reconstruction apparatus for a two-dimensional face further includes a verification module 511, where the verification module 511 is specifically configured to:
based on the new pix2pix neural network, performing three-dimensional reconstruction processing on each image of the verification image set to obtain a verification three-dimensional image set;
performing deviation analysis on the verification three-dimensional image set according to a preset verification algorithm to obtain an analysis result;
and when the analysis result is qualified, replacing the preset pix2pix neural network with the new pix2pix neural network.
The verification module 511 may further specifically be configured to:
extracting, based on the preset MediaPipe neural network, M pairs of two-dimensional key points from the verification image set and the corresponding M pairs of three-dimensional key points from the verification three-dimensional image set, where M is a positive integer;
performing deviation value operation processing on the M pairs of two-dimensional key points and the M pairs of three-dimensional key points according to a preset deviation algorithm to obtain a deviation absolute value;
when the absolute value of the deviation is smaller than a preset check threshold value, determining an analysis result as a qualified result;
and when the absolute value of the deviation is not less than a preset check threshold value, determining the analysis result as an unqualified result.
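The verification module's decision can be sketched as follows. The source leaves open how the per-image deviations of the verification set combine into one analysis result. This sketch assumes their mean is compared with the check threshold. `verify_and_select` and `deviation_for` are hypothetical names. `deviation_for(model, image)` stands for reconstructing `image` with `model` and then applying the M-pair deviation algorithm.

```python
from statistics import fmean
from typing import Callable, Sequence, Tuple

Model = Callable[[object], object]


def verify_and_select(
    current_model: Model,
    new_model: Model,
    verification_set: Sequence[object],
    deviation_for: Callable[[Model, object], float],
    check_threshold: float,
) -> Tuple[Model, bool]:
    """Return (model to keep, whether the new model qualified).

    Assumption: per-image deviations are averaged over the verification set.
    """
    if not verification_set:
        # Assumption: with nothing to verify against, keep the current model.
        return current_model, False
    mean_deviation = fmean(deviation_for(new_model, image) for image in verification_set)
    qualified = mean_deviation < check_threshold  # "not less than" is unqualified
    return (new_model if qualified else current_model), qualified
```

Only a qualified result replaces the preset generator. Otherwise the current pix2pix network stays in service.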
Figs. 5 and 6 above describe the three-dimensional reconstruction apparatus for a two-dimensional face in the embodiments of the present invention in detail from the perspective of modular functional entities. The following describes the three-dimensional reconstruction device for a two-dimensional face in the embodiments of the present invention in detail from the perspective of hardware processing.
Fig. 7 is a schematic structural diagram of a three-dimensional reconstruction device 700 for a two-dimensional face according to an embodiment of the present invention. The device 700 may vary considerably with configuration or performance. It may include one or more central processing units (CPUs) 710, a memory 720, and one or more storage media 730 (e.g., one or more mass storage devices) storing applications 733 or data 732. The memory 720 and the storage medium 730 may provide transient or persistent storage. A program stored on the storage medium 730 may include one or more modules (not shown), each of which may include a series of instruction operations for the device 700. Further, the processor 710 may be configured to communicate with the storage medium 730 and execute, on the device 700, the series of instruction operations in the storage medium 730.
The device 700 may further include one or more power supplies 740, one or more wired or wireless network interfaces 750, one or more input/output interfaces 760, and/or one or more operating systems 731, such as Windows Server, Mac OS X, Unix, Linux or FreeBSD. Those skilled in the art will understand that the structure shown in Fig. 7 does not limit the three-dimensional reconstruction device for a two-dimensional face. The device may include more or fewer components than shown, combine some components, or arrange components differently.
The present invention also provides a computer-readable storage medium, which may be a non-volatile computer-readable storage medium, and which may also be a volatile computer-readable storage medium, having stored therein instructions, which, when run on a computer, cause the computer to perform the steps of the method for three-dimensional reconstruction of two-dimensional faces.
It is clear to those skilled in the art that, for convenience and brevity of description, the specific working processes of the above-described systems, apparatuses, and units may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again.
If the integrated unit is implemented as a software functional unit and sold or used as an independent product, it may be stored in a computer-readable storage medium. Based on such an understanding, the technical solution of the present invention may be embodied in the form of a software product. This applies to the technical solution in essence, to the part of it that contributes over the prior art, or to all or part of the technical solution. The software product is stored in a storage medium and includes instructions that cause a computer device (which may be a personal computer, a server, a network device or the like) to perform all or part of the steps of the methods in the embodiments of the present invention. The storage medium includes any medium capable of storing program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk or an optical disc.
The above-mentioned embodiments are only used for illustrating the technical solutions of the present invention, and not for limiting the same; although the present invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions of the embodiments of the present invention.

Claims (10)

1. A three-dimensional reconstruction method of a two-dimensional face is characterized by comprising the following steps:
receiving a two-dimensional face image, and performing mouth key point detection on the two-dimensional face image based on a preset MediaPipe neural network to obtain a two-dimensional key point set;
performing three-dimensional reconstruction processing on the two-dimensional face image based on a preset pix2pix neural network to obtain a three-dimensional face image;
performing, based on the preset MediaPipe neural network, mouth key point detection on the three-dimensional face image to obtain a three-dimensional key point set;
extracting N pairs of two-dimensional key points from the two-dimensional key point set, and extracting N pairs of three-dimensional key points from the three-dimensional key point set, wherein N is a positive integer;
performing deviation value operation processing on the N pairs of two-dimensional key points and the N pairs of three-dimensional key points according to a preset deviation algorithm to obtain a deviation absolute value;
judging whether the absolute value of the deviation is smaller than a preset error threshold value or not;
if the absolute value of the deviation is smaller than the preset error threshold, adding the two-dimensional face image to a preset reconstructed two-dimensional image training set;
and inputting the images in the reconstructed two-dimensional image training set into the preset pix2pix neural network for training to generate a new pix2pix neural network.
2. The method for three-dimensional reconstruction of two-dimensional face according to claim 1, wherein said performing deviation value operation on said N pairs of two-dimensional key points and said N pairs of three-dimensional key points according to a preset deviation algorithm to obtain a deviation absolute value comprises:
calculating the average, over the N pairs of two-dimensional key points, of the number of pixels separating the two key points of each pair, to obtain a two-dimensional closure value;
calculating the average, over the N pairs of three-dimensional key points, of the number of pixels separating the two key points of each pair, to obtain a three-dimensional closure value;
and calculating the absolute value of the difference between the two-dimensional closure value and the three-dimensional closure value to obtain the absolute value of the deviation.
3. The method for three-dimensional reconstruction of a two-dimensional face according to claim 1, wherein said extracting N pairs of two-dimensional keypoints from the two-dimensional keypoint set and extracting N pairs of three-dimensional keypoints from the three-dimensional keypoint set comprises:
extracting N pairs of two-dimensional key points from the two-dimensional key point set based on a preset MediaPipe neural network;
and extracting, based on the preset MediaPipe neural network, N pairs of three-dimensional key points from the three-dimensional key point set according to the correspondence between the N pairs of two-dimensional key points and the three-dimensional face image.
4. The method for three-dimensional reconstruction of a two-dimensional face according to claim 1, wherein, after the images in the reconstructed two-dimensional image training set are input into the preset pix2pix neural network for training to generate a new pix2pix neural network, the method further comprises:
and replacing the preset pix2pix neural network with the new pix2pix neural network.
5. The method for three-dimensional reconstruction of a two-dimensional face according to claim 1, wherein after determining whether the absolute value of the deviation is smaller than a preset error threshold, before inputting the pictures in the reconstructed two-dimensional map training set to a preset pix2pix neural network for training processing and generating a new pix2pix neural network, the method further comprises:
if the absolute value of the deviation is not less than the preset error threshold, adding the two-dimensional face image to a preset verification image set.
6. The method for three-dimensional reconstruction of a two-dimensional face according to claim 5, wherein, after the images in the reconstructed two-dimensional image training set are input into the preset pix2pix neural network for training to generate a new pix2pix neural network, the method further comprises:
based on the new pix2pix neural network, performing three-dimensional reconstruction processing on each image of the verification image set to obtain a verification three-dimensional image set;
performing deviation analysis on the verification three-dimensional image set according to a preset verification algorithm to obtain an analysis result;
and when the analysis result is qualified, replacing the preset pix2pix neural network with the new pix2pix neural network.
7. The method for three-dimensional reconstruction of a two-dimensional face according to claim 6, wherein performing deviation analysis on the verification three-dimensional image set according to a preset verification algorithm to obtain an analysis result comprises:
extracting, based on the preset MediaPipe neural network, M pairs of two-dimensional key points from the verification image set and the corresponding M pairs of three-dimensional key points from the verification three-dimensional image set, wherein M is a positive integer;
performing deviation value operation processing on the M pairs of two-dimensional key points and the M pairs of three-dimensional key points according to a preset deviation algorithm to obtain a deviation absolute value;
when the absolute value of the deviation is smaller than a preset check threshold value, determining an analysis result as a qualified result;
and when the absolute value of the deviation is not less than a preset check threshold value, determining the analysis result as an unqualified result.
8. A three-dimensional reconstruction apparatus for a two-dimensional face, comprising:
the two-dimensional detection module is configured to receive a two-dimensional face image and perform mouth key point detection on it based on a preset MediaPipe neural network to obtain a two-dimensional key point set;
the three-dimensional reconstruction module is used for performing three-dimensional reconstruction processing on the two-dimensional face image based on a preset pix2pix neural network to obtain a three-dimensional face image;
the three-dimensional detection module is configured to perform mouth key point detection on the three-dimensional face image based on the preset MediaPipe neural network to obtain a three-dimensional key point set;
the extraction module is used for extracting N pairs of two-dimensional key points from the two-dimensional key point set and extracting N pairs of three-dimensional key points from the three-dimensional key point set, wherein N is a positive integer;
the deviation operation module is used for performing deviation value operation processing on the N pairs of two-dimensional key points and the N pairs of three-dimensional key points according to a preset deviation algorithm to obtain a deviation absolute value;
the judging module is used for judging whether the absolute value of the deviation is smaller than a preset error threshold value or not;
a training set adding module, configured to add the two-dimensional face image to a preset reconstructed two-dimensional image training set if the absolute value of the deviation is smaller than the preset error threshold;
and the training module is used for inputting the pictures in the reconstructed two-dimensional image training set into a preset pix2pix neural network for training processing to generate a new pix2pix neural network.
9. A three-dimensional reconstruction device of a two-dimensional face, characterized in that it comprises: a memory having instructions stored therein and at least one processor, the memory and the at least one processor interconnected by a line;
the at least one processor invokes the instructions in the memory to cause the three-dimensional reconstruction device of the two-dimensional face to perform the three-dimensional reconstruction method of the two-dimensional face as recited in any one of claims 1-7.
10. A computer-readable storage medium, on which a computer program is stored which, when being executed by a processor, carries out a method of three-dimensional reconstruction of a two-dimensional face as claimed in any one of claims 1 to 7.
CN202211583787.XA 2022-12-09 2022-12-09 Three-dimensional reconstruction method, device and equipment for two-dimensional face and storage medium Pending CN115861540A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211583787.XA CN115861540A (en) 2022-12-09 2022-12-09 Three-dimensional reconstruction method, device and equipment for two-dimensional face and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211583787.XA CN115861540A (en) 2022-12-09 2022-12-09 Three-dimensional reconstruction method, device and equipment for two-dimensional face and storage medium

Publications (1)

Publication Number Publication Date
CN115861540A true CN115861540A (en) 2023-03-28

Family

ID=85671806

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211583787.XA Pending CN115861540A (en) 2022-12-09 2022-12-09 Three-dimensional reconstruction method, device and equipment for two-dimensional face and storage medium

Country Status (1)

Country Link
CN (1) CN115861540A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116580099A (en) * 2023-07-14 2023-08-11 山东艺术学院 Forest land target positioning method based on fusion of video and three-dimensional model

Similar Documents

Publication Publication Date Title
Ruiz et al. Fine-grained head pose estimation without keypoints
CN110909651B (en) Method, device and equipment for identifying video main body characters and readable storage medium
US11928800B2 (en) Image coordinate system transformation method and apparatus, device, and storage medium
CN107481279B (en) Monocular video depth map calculation method
Zach et al. Disambiguating visual relations using loop constraints
EP3151160B1 (en) Visual attention detector and visual attention detection method
CN111241989A (en) Image recognition method and device and electronic equipment
CN110598019B (en) Repeated image identification method and device
CN110827312B (en) Learning method based on cooperative visual attention neural network
Tekin et al. Fusing 2d uncertainty and 3d cues for monocular body pose estimation
CN112183456A (en) Multi-scene moving object detection method and device based on sample generation and domain adaptation
CN116229019A (en) Digital twinning-oriented large-scene fusion three-dimensional reconstruction method and system
US11651581B2 (en) System and method for correspondence map determination
WO2019127102A1 (en) Information processing method and apparatus, cloud processing device, and computer program product
KR20190125029A (en) Methods and apparatuses for generating text to video based on time series adversarial neural network
CN111144215A (en) Image processing method, image processing device, electronic equipment and storage medium
KR101959436B1 (en) The object tracking system using recognition of background
CN115861540A (en) Three-dimensional reconstruction method, device and equipment for two-dimensional face and storage medium
CN114332473A (en) Object detection method, object detection device, computer equipment, storage medium and program product
CN110717593B (en) Method and device for neural network training, mobile information measurement and key frame detection
CN111402156A (en) Restoration method and device for smear image, storage medium and terminal equipment
CN112633100B (en) Behavior recognition method, behavior recognition device, electronic equipment and storage medium
CN116958267B (en) Pose processing method and device, electronic equipment and storage medium
Arrigoni et al. Robust global motion estimation with matrix completion
Guo et al. Deep network with spatial and channel attention for person re-identification

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination