CN112069923B - 3D face point cloud reconstruction method and system - Google Patents


Info

Publication number
CN112069923B
Authority
CN
China
Prior art keywords
point cloud
face
module
point
initial
Prior art date
Legal status
Active
Application number
CN202010834329.3A
Other languages
Chinese (zh)
Other versions
CN112069923A (en)
Inventor
顾一新
Current Assignee
Guangdong Zhengyang Sensor Technology Co ltd
Original Assignee
Guangdong Zhengyang Sensor Technology Co ltd
Application filed by Guangdong Zhengyang Sensor Technology Co ltd filed Critical Guangdong Zhengyang Sensor Technology Co ltd
Priority to CN202010834329.3A priority Critical patent/CN112069923B/en
Publication of CN112069923A publication Critical patent/CN112069923A/en
Application granted granted Critical
Publication of CN112069923B publication Critical patent/CN112069923B/en


Abstract

The invention discloses a 3D face point cloud reconstruction method and system, wherein the method comprises four stages: initial point cloud establishment, point cloud registration, point cloud resampling and point cloud enhancement. In the point cloud registration stage, the coefficients required for registration are generated dynamically through neural network learning, so that the generated point cloud data are more accurate and do not depend too heavily on point-pair correspondences; meanwhile, the feature matrices of two adjacent frames of point clouds are generated through neural network down-sampling and the Euclidean distance between the two feature matrices is computed, which reduces the amount of point cloud computation to the point where the clouds can essentially be processed in real time; in addition, point cloud resampling further reduces the noise of the reconstructed point cloud, and point cloud enhancement improves its robustness, so that the result is better suited to the subsequent curved surface reconstruction.

Description

3D face point cloud reconstruction method and system
Technical Field
The invention relates to the technical field of 3D face reconstruction, in particular to a 3D face point cloud reconstruction method and system.
Background
With the development of China's economy, the level of urbanization has further improved, and automatic driver-monitoring systems have become an important direction in the development of modern vehicle-mounted intelligent systems. As an important component of such systems, face recognition is evolving from the traditional 2D image recognition mode to 3D deep learning modes. In 3D face recognition and related tasks, reconstructing a point cloud from the 3D face data is an important step in the pre-processing pipeline. The key technologies of point cloud reconstruction from 3D data have therefore become a research hotspot in related fields at home and abroad.
As a front-end task for 3D face recognition and related tasks, 3D point cloud reconstruction largely avoids the over-dependence of 2D face recognition on imaging quality, and improves both the accuracy of the face recognition task and its robustness to the scene environment. However, conventional point cloud reconstruction methods are computationally expensive and time-consuming, and in the reconstruction process they rely too heavily on the ICP algorithm, that is, on the correspondence between point pairs, so they cannot handle point clouds with missing correspondences or only partial visibility and therefore cannot be applied in real scenes.
Disclosure of Invention
The invention aims to provide a 3D face point cloud reconstruction method that overcomes the above technical defects, improving the accuracy of 3D face recognition while effectively reducing the amount of calculation.
Another object of the present invention is to provide a 3D face point cloud reconstruction system that likewise improves the accuracy of 3D face recognition and effectively reduces the amount of calculation.
In order to achieve the above purpose, the invention discloses a 3D face point cloud reconstruction method, which comprises the following steps:
1) Selecting two adjacent frames of face images from a video stream comprising multiple frames of face images, and processing them to obtain initial point clouds S_T and S_{T+1} corresponding to the two frames of face images respectively;
2) Transforming the initial point cloud S_T through a predefined initial transformation matrix {R_i, T_i} to obtain S'_T, splicing and fusing S'_T and S_{T+1}, and sending the fused result into a coefficient prediction network based on a convolutional neural network and comprising convolution layers and a maximum pooling layer, so as to obtain a group of iteration coefficient matrices (α, β);
3) Respectively sending S'_T and S_{T+1} into a first feature extraction network based on a convolutional neural network, wherein the first feature extraction network performs network down-sampling on S'_T and S_{T+1} so as to obtain two groups of feature matrices F_T and F_{T+1};
4) Obtaining an initial registration matrix M_0 through a calculation model, wherein the input parameters of the calculation model are the group of iteration coefficient matrices (α, β) and the feature matrices F_T and F_{T+1};
5) Normalizing the initial registration matrix M_0 to obtain a final registration matrix M_T;
6) Performing singular value decomposition on M_T by means of a decomposition algorithm to obtain a transformation matrix {R_T, T_T}, and updating {R_i, T_i} with {R_T, T_T} as the initial value of the next frame's point cloud transformation. A sketch of this registration loop follows.
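For illustration only, the following is a minimal sketch of how steps 1) to 6) could be chained for one pair of adjacent frames. The helpers coefficient_net, feature_net and build_registration_matrix are assumed stand-ins for the coefficient prediction network, the first feature extraction network and the calculation model (whose explicit formula is not stated in the text), and svd_transform_from_registration is sketched later in the detailed description; none of these names come from the patent itself.
```python
# Illustrative sketch only; not the patent's reference implementation.
import numpy as np

def register_frame_pair(S_T, S_T1, R_i, T_i,
                        coefficient_net, feature_net, build_registration_matrix):
    """One registration pass over adjacent clouds S_T, S_{T+1} (each N x 3)."""
    # Step 2): transform S_T by the current initial transform {R_i, T_i}.
    S_T_prime = S_T @ R_i.T + T_i
    # Fuse the two clouds and predict the iteration coefficients (alpha, beta).
    fused = np.concatenate([S_T_prime, S_T1], axis=0)
    alpha, beta = coefficient_net(fused)
    # Step 3): down-sample both clouds into feature matrices F_T, F_{T+1}.
    F_T, F_T1 = feature_net(S_T_prime), feature_net(S_T1)
    # Step 4): initial registration matrix from (alpha, beta) and the features.
    M_0 = build_registration_matrix(alpha, beta, F_T, F_T1)
    # Step 5): normalize M_0 (row-wise here) into the final matrix M_T.
    M_T = M_0 / (M_0.sum(axis=1, keepdims=True) + 1e-8)
    # Step 6): recover {R_T, T_T} from M_T by SVD and return it as the
    # initial value for the next frame's transformation.
    R_T, T_T = svd_transform_from_registration(M_T, S_T_prime, S_T1)
    return R_T, T_T
```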
Compared with the prior art, the 3D face point cloud reconstruction method comprises initial point cloud establishment and point cloud registration. After the initial point cloud is established and given a preliminary coordinate transformation, the point clouds of adjacent frames are selected and fused, and the fused point cloud is sent into a convolutional neural network to obtain a group of coefficient matrices (α, β); meanwhile, the selected adjacent-frame point clouds are respectively sent into another convolutional neural network for network down-sampling, reducing the point cloud data to obtain two groups of feature matrices; the obtained coefficient matrices and feature matrices are then combined in a calculation to produce an initial registration matrix, which is optimized to obtain the final registration matrix; a transformation matrix is then derived from the final registration matrix and used to update the initial transformation matrix, which serves as the initial value for the next frame's point cloud transformation, until the face image data of all frames have been processed. Therefore, in the 3D face point cloud reconstruction process, the required iteration coefficients are not preset manually but are learned by the neural network itself, which effectively improves the accuracy of the final point cloud reconstruction; in addition, the initial point cloud is network-down-sampled by the first feature extraction network, which effectively reduces the amount of calculation in point cloud reconstruction and curved surface reconstruction.
Preferably, the initial registration matrix M_0 is computed by the calculation model from the iteration coefficient matrices (α, β) and Δ_F, where Δ_F is the Euclidean distance between F_T and F_{T+1}.
Preferably, in step 1), the method for obtaining the initial point clouds from the two adjacent frames of face images comprises:
Performing preliminary cropping on the face images, wherein each face image comprises an RGB image and a depth image: inputting the selected two adjacent frames of face images into a face detection network based on a convolutional neural network so as to obtain the corresponding 2D face frames;
Based on the 2D face frames, respectively aligning the RGB image and depth map coordinate systems of the two selected face images, so as to obtain the initial point clouds S_T and S_{T+1} corresponding to the two face images.
Preferably, in step 1), after the initial point clouds S_T and S_{T+1} are established, the method further comprises a step of cropping the face images again:
Taking the center of the 2D face frame as the midpoint, cropping the initial point clouds S_T and S_{T+1} according to a certain depth threshold; the depth threshold may be a preset prior value or the average value of the depth data in the preliminarily cropped face image.
Preferably, when the face images are preliminarily cropped, key points on the 2D face frame can also be obtained through the face detection network; the 3D face point cloud reconstruction method further comprises a step of resampling the registered point cloud data:
7) Randomly selecting a point from the key points in the 2D face frame as an initial point P_0(x_0, y_0, z_0), and, taking the point P_0 as the center, applying farthest point sampling to the points within a certain threshold range to obtain a sampling set S_1 = {P_i | i ∈ (0, 1, 2, …, N)}, wherein N is the number of samples;
8) Repeating step 7) for the remaining key points to obtain a sampling set S = {S_j | j ∈ (0, 1, 2, …, M)}, wherein M is the number of key points;
9) Clustering the sample points outside the set S by k-means to obtain k groups of sets, wherein k > 1;
10) Taking the set S as the initial point set and sampling the k groups of sets respectively, to obtain the final sampling point set S' = {S_t | t ∈ (0, 1, 2, …, M+k)}.
Preferably, the method further comprises a step of performing point cloud enhancement on the resampled point cloud data:
Performing point-by-point feature extraction on the sampling point set S' through a second feature extraction network based on a convolutional neural network, and adding a weight factor μ in the feature extraction process to obtain feature-enhanced point cloud data f_t, where μ takes a preset value greater than 1 when the extracted feature data point is key point data and takes the value 1 when it is non-key-point data;
Sending the point cloud data f_t respectively into two different sub-networks of the same convolutional neural network, and obtaining two different groups of parameters T_S and D_P through linear-layer regression of the convolutional neural network;
Updating f_t according to the following formula:
f_t = T_S × f_t + D_P
The invention also discloses a 3D face point cloud reconstruction system, which comprises an initial point cloud establishment unit and a point cloud registration unit; the initial point cloud establishment unit comprises an initial point cloud establishing module; the registration unit comprises a coordinate transformation module, an iteration coefficient generation module, a first feature extraction module, a calculation module, an optimization module and a first updating module;
The initial point cloud establishing module is used for processing two adjacent frames of face images selected from the video stream to obtain initial point clouds S_T and S_{T+1} corresponding to the two frames of face images respectively;
The coordinate transformation module is used for transforming the initial point cloud S_T through a predefined initial transformation matrix {R_i, T_i} so as to obtain S'_T;
The iteration coefficient generation module is used for splicing and fusing S'_T and S_{T+1} and then sending the fused result into a coefficient prediction network based on a convolutional neural network and comprising convolution layers and a maximum pooling layer, so as to obtain a group of iteration coefficient matrices (α, β);
The first feature extraction module is configured to respectively send S'_T and S_{T+1} to a first feature extraction network based on a convolutional neural network, where the first feature extraction network performs network down-sampling on S'_T and S_{T+1} to obtain two sets of feature matrices F_T and F_{T+1};
The calculation module is used for obtaining an initial registration matrix M_0 through a calculation model, wherein the input parameters of the calculation model are the group of iteration coefficient matrices (α, β) and the feature matrices F_T and F_{T+1};
The optimization module is used for normalizing M_0 through an optimization algorithm so as to obtain a final registration matrix M_T;
The first updating module is configured to perform singular value decomposition on M_T using a decomposition algorithm to obtain a transformation matrix {R_T, T_T}, and to update {R_i, T_i} with {R_T, T_T} as the initial value of the next frame's point cloud transformation.
Preferably, the initial point cloud establishment unit further includes a preliminary cropping module, wherein the face image comprises an RGB image and a depth image, and the preliminary cropping module is configured to perform face region detection on the two selected adjacent frames of face images through a face detection network based on a convolutional neural network, so as to obtain a 2D face frame; the initial point cloud establishing module may align the RGB images of the two selected face images with the depth map coordinate systems based on the 2D face frames obtained by the preliminary cropping module, so as to obtain the initial point clouds S_T and S_{T+1} corresponding to the two face images.
Preferably, the initial point cloud establishment unit further includes a re-cropping module; after the initial point clouds S_T and S_{T+1} are established, the re-cropping module is configured to crop the initial point clouds S_T and S_{T+1} according to a certain depth threshold with the center of the 2D face frame as the midpoint; the depth threshold may be a preset prior value or the average value of the depth data in the preliminarily cropped face image.
Preferably, the system further comprises a point cloud resampling unit, wherein the point cloud resampling unit comprises a first sampling module, a grouping module and a second sampling module;
The first sampling module is configured to randomly select a point from the key points in the 2D face frame as an initial point P_0(x_0, y_0, z_0) and, taking the point P_0 as the center, obtain a sampling set S_1 = {P_i | i ∈ (0, 1, 2, …, N)} by farthest point sampling of the points within a certain threshold range, thereby obtaining the sampling set S = {S_j | j ∈ (0, 1, 2, …, M)}; wherein M is the number of key points, and the key points are obtained through the face detection network;
The grouping module is used for grouping the sample points outside the set S through k-means clustering to obtain k groups of sets, wherein k > 1;
The second sampling module is configured to sample the k groups of sets, with the set S as the initial point set, so as to obtain the final sampling point set S' = {S_t | t ∈ (0, 1, 2, …, M+k)}.
Preferably, the system further comprises a point cloud enhancement unit, wherein the point cloud enhancement unit comprises a second feature extraction module, a parameter generation module and a second updating module;
The second feature extraction module is configured to perform point-by-point feature extraction on the sampling point set S' through a second feature extraction network based on a convolutional neural network, and to add a weight factor μ in the feature extraction process to obtain feature-enhanced point cloud data f_t, where μ takes a preset value greater than 1 when the extracted feature data point is key point data and takes the value 1 when it is non-key-point data;
The parameter generation module is configured to send the point cloud data f_t generated by the second feature extraction module into two different sub-networks of the same convolutional neural network, and to obtain two different groups of parameters T_S and D_P through linear-layer regression of the convolutional neural network;
The second updating module is used for updating f_t according to the following formula:
f_t = T_S × f_t + D_P
The invention also discloses a 3D face point cloud reconstruction system, which comprises one or more processors, a memory and one or more programs, wherein the one or more programs are stored in the memory and are configured to be executed by the one or more processors, and the programs comprise instructions for executing the 3D face point cloud reconstruction method.
In addition, the invention also discloses a computer readable storage medium, which comprises a computer program for testing, wherein the computer program can be executed by a processor to complete the 3D face point cloud reconstruction method.
Drawings
Fig. 1 is a flow chart of a method for reconstructing a 3D face point cloud according to an embodiment of the present invention.
Fig. 2 is a schematic flow chart of initial point cloud establishment and point cloud registration in an embodiment of the invention.
Fig. 3 is a schematic flow chart of point cloud resampling in an embodiment of the invention.
Fig. 4 is a schematic structural diagram of a 3D face point cloud reconstruction system according to an embodiment of the present invention.
Detailed Description
In order to describe the technical content, structural features, objects and effects of the present invention in detail, the following description is given in conjunction with the embodiments and the accompanying drawings.
As shown in fig. 1 and fig. 2, the invention discloses a 3D face point cloud reconstruction method, which comprises two stages of initial point cloud establishment and point cloud registration.
1. The initial point cloud establishment includes the steps of:
S1) Acquiring a video stream comprising a plurality of frames of face images through 3D video equipment (such as a TOF camera), randomly selecting two adjacent frames of face images, and processing the two frames respectively to obtain the initial point clouds S_T and S_{T+1} corresponding to the two frames of face images.
2. Point cloud registration is the process of integrating point cloud data from different viewing angles into a specified common coordinate system through rigid transformations such as rotation and translation. This transformation requires a rotation matrix R and a translation matrix T, and the source point cloud is brought into coincidence with the target point cloud through the R, T transformation. Specifically, referring to fig. 1 and 2, the point cloud registration in the present embodiment includes the following steps:
S2) Transforming the initial point cloud S_T through a predefined initial transformation matrix {R_i, T_i} to obtain S'_T, then splicing and fusing S'_T and S_{T+1}, and sending the fused result into a coefficient prediction network based on a convolutional neural network and comprising convolution layers and a maximum pooling layer, so as to obtain a group of iteration coefficient matrices (α, β). It should be noted that the roles of the coefficient matrices (α, β) are common general knowledge in the art: α is an iteration stop-condition parameter and β is an iteration attenuation parameter. In the prior art these two parameters are manually preset values that are simply looked up when needed, whereas in this embodiment they are generated dynamically by a neural network (the coefficient prediction network) learning from the point cloud data, so the coefficient values differ in each iteration;
S3) Respectively sending S'_T and S_{T+1} into a first feature extraction network based on a convolutional neural network, wherein the first feature extraction network performs network down-sampling on S'_T and S_{T+1} so as to obtain two groups of feature matrices F_T and F_{T+1} (both this network and the coefficient prediction network are sketched after this step list). Regarding network down-sampling, it should be noted that a convolutional neural network processes data by up-sampling and down-sampling: down-sampling, also called sub-sampling, mainly makes an image conform to the size of the display area and generates a thumbnail of the corresponding image, while up-sampling, also called image interpolation, is mainly used to enlarge images;
S4) Obtaining an initial registration matrix M_0 through a calculation model, wherein the input parameters of the calculation model are the group of iteration coefficient matrices (α, β) obtained in step S2) and the feature matrices F_T and F_{T+1} obtained in step S3);
S5) Normalizing M_0 to obtain a final registration matrix M_T;
S6) Performing singular value decomposition on M_T using a decomposition algorithm (such as the SVD algorithm) to obtain a transformation matrix {R_T, T_T}, and updating {R_i, T_i} with {R_T, T_T} as the initial value of the next frame's point cloud transformation, until the face images of all frames have been processed. One standard way to recover {R_T, T_T} from M_T is also sketched after this step list.
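As a non-authoritative sketch, the two networks named in steps S2) and S3) might look as follows in PyTorch; all channel widths, layer depths and the output size of the down-sampling are assumptions, since the text only specifies convolution layers, maximum pooling and network down-sampling:
```python
import torch
import torch.nn as nn

class CoefficientPredictionNet(nn.Module):
    """Maps a fused cloud (B, 3, N) to one iteration coefficient pair (alpha, beta)."""
    def __init__(self):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv1d(3, 64, 1), nn.ReLU(),
            nn.Conv1d(64, 128, 1), nn.ReLU(),
        )
        self.head = nn.Linear(128, 2)   # regresses (alpha, beta)

    def forward(self, fused):
        x = self.conv(fused)            # (B, 128, N) per-point features
        x = torch.max(x, dim=2).values  # maximum pooling over all points
        return self.head(x)             # (B, 2)

class FeatureExtractionNet(nn.Module):
    """Network down-sampling: reduces a cloud (B, 3, N) to a compact feature matrix."""
    def __init__(self, out_points=256):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv1d(3, 64, 1), nn.ReLU(),
            nn.Conv1d(64, 64, 1), nn.ReLU(),
        )
        self.pool = nn.AdaptiveMaxPool1d(out_points)

    def forward(self, cloud):
        return self.pool(self.conv(cloud))  # feature matrix F (B, 64, out_points)
```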
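The patent does not spell out how {R_T, T_T} is extracted from M_T; one standard possibility, assuming the normalized M_T acts as a matrix of soft point correspondences, is the classical weighted Procrustes (Kabsch) solution via SVD, sketched below:
```python
import numpy as np

def svd_transform_from_registration(M_T, src, dst):
    """Recover a rigid transform {R_T, T_T} aligning src (N x 3) to dst (K x 3),
    treating the normalized registration matrix M_T (N x K) as soft correspondences."""
    targets = M_T @ dst                       # soft target for each source point
    w = M_T.sum(axis=1, keepdims=True)        # per-point correspondence weight
    src_c = (w * src).sum(axis=0) / w.sum()   # weighted centroids of both sides
    dst_c = (w * targets).sum(axis=0) / w.sum()
    # Weighted cross-covariance and its SVD (classical Kabsch solution).
    H = ((src - src_c) * w).T @ (targets - dst_c)
    U, _, Vt = np.linalg.svd(H)
    D = np.diag([1.0, 1.0, np.sign(np.linalg.det(Vt.T @ U.T))])  # avoid reflections
    R_T = Vt.T @ D @ U.T
    T_T = dst_c - R_T @ src_c
    return R_T, T_T
```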
Compared with the prior art, the present method differs in two major ways in the face reconstruction process. First, the iteration coefficients (α, β) required in point cloud reconstruction are obtained through network self-learning, which effectively improves the accuracy of the final point cloud reconstruction and of the subsequent curved surface reconstruction, avoids over-reliance on the correspondence between the point pairs of the two face images, and lowers the quality requirements on the initial point cloud. Second, the initial point cloud is network-down-sampled by a convolutional neural network (the first feature extraction network), which effectively reduces the amount of calculation in point cloud reconstruction and curved surface reconstruction and thus reduces the hardware load.
Preferably, the initial registration matrix M_0 is computed by the calculation model from the iteration coefficient matrices (α, β) and Δ_F, where Δ_F is the Euclidean distance between F_T and F_{T+1}.
In the above embodiment, each face image acquired by the TOF camera includes an RGB image and a depth image. The RGB map is a planar color image; the depth map is an image or image channel containing information about the distance of scene-object surfaces from the viewpoint. A depth map resembles a gray-scale image, except that each pixel value is the actual distance from the camera's photosensitive sensor to the object. The RGB map and depth map acquired by a TOF camera are typically paired, so their pixel points are in one-to-one correspondence. Since the RGB image contains some background information, step S1) further includes a preliminary cropping step before aligning the RGB image and depth image coordinate systems of the two frames of face images:
Inputting the selected two adjacent frames of face images into a face detection network based on a convolutional neural network, so as to obtain the corresponding 2D face frames;
Based on the 2D face frames, respectively aligning the RGB image and depth map coordinate systems of the two selected face images to obtain the initial point clouds S_T and S_{T+1} corresponding to the two face images, thereby cutting off some of the background information.
Through this preliminary cropping, points beyond the face region are treated as background by default and the background region is cropped away, so the initial point cloud is built only within the face region.
Since the detection result based on the RGB map carries no depth information, a portion of background information is still included in the initial point cloud. Therefore, in step S1), after the initial point clouds S_T and S_{T+1} are established, the face images may also be cropped again, specifically:
Cropping the initial point clouds S_T and S_{T+1} according to a certain depth threshold, taking the center of the 2D face frame as the midpoint. The depth threshold may be a preset prior value or the average value of the depth data in the preliminarily cropped face image. A sketch of the back-projection and the two cropping steps follows.
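As an illustration of the back-projection and cropping (the patent gives no formulas for either), the sketch below assumes pinhole intrinsics fx, fy, cx, cy from the TOF camera's calibration and a face box (x0, y0, x1, y1) from the detector; both are assumptions introduced here for concreteness:
```python
import numpy as np

def initial_point_cloud(depth, box, fx, fy, cx, cy, depth_thresh=None):
    """Back-project the depth pixels inside a 2D face box into an initial point cloud."""
    x0, y0, x1, y1 = box
    z = depth[y0:y1, x0:x1].astype(np.float64)   # preliminary crop to the face box
    v, u = np.mgrid[y0:y1, x0:x1]
    pts = np.stack([(u - cx) * z / fx, (v - cy) * z / fy, z], axis=-1)
    pts = pts[z > 0]                              # drop pixels with no depth return
    # Re-cropping: keep points whose depth lies within a threshold of the
    # face-box center depth; the threshold is a prior value or, if none is
    # given, the mean depth of the preliminarily cropped region.
    center_z = depth[(y0 + y1) // 2, (x0 + x1) // 2]
    if depth_thresh is None:
        depth_thresh = z[z > 0].mean()
    return pts[np.abs(pts[:, 2] - center_z) <= depth_thresh]
```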
After registration, the reconstructed point cloud still contains noise, so it can be further optimized by resampling, as shown in fig. 1 and 3; that is, the 3D face point cloud reconstruction method further comprises a point cloud resampling stage, which specifically includes the following steps:
S7) Randomly selecting a point from the key points in the 2D face frame as an initial point P_0(x_0, y_0, z_0), and, taking the point P_0 as the center, applying farthest point sampling to the points within a certain threshold range to obtain a sampling set S_1 = {P_i | i ∈ (0, 1, 2, …, N)}. The key points in the face frame are the convex and concave points on the face, such as the nose tip, mouth and eyes, and they can be output by the face detection network during the preliminary cropping of the face images;
S8) Repeating step S7) for the remaining key points to obtain a sampling set S = {S_j | j ∈ (0, 1, 2, …, M)}, wherein M is the number of key points;
S9) Clustering the sample points outside the set S by k-means to obtain k groups of sets, wherein k > 1;
S10) Taking the set S as the initial point set and sampling the k groups of sets respectively, to obtain the final sampling point set S' = {S_t | t ∈ (0, 1, 2, …, M+k)}.
In the resampling process, preliminary sampling is performed by farthest point sampling centered on the key points of the face; the preliminarily sampled data are then taken as the initial point set, and the remaining points are sampled a second time after k-means clustering. This removes noise data from the point cloud to the greatest extent while preserving the accuracy of the sampled point cloud data, as in the sketch below.
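A sketch of this two-stage resampling is given below; the neighborhood radius, per-key-point sample count and cluster count k are illustrative values only, and the greedy farthest point sampler is a common textbook formulation rather than the patent's own:
```python
import numpy as np
from sklearn.cluster import KMeans

def farthest_point_sample(points, n):
    """Greedy farthest point sampling of up to n points from an (m, 3) array."""
    chosen = [0]
    dist = np.linalg.norm(points - points[0], axis=1)
    for _ in range(min(n, len(points)) - 1):
        chosen.append(int(dist.argmax()))
        dist = np.minimum(dist, np.linalg.norm(points - points[chosen[-1]], axis=1))
    return points[chosen]

def resample(cloud, keypoints, radius=0.03, n_per_key=64, k=8):
    # Steps S7)-S8): farthest point sampling around every face key point.
    sets = []
    for kp in keypoints:
        near = cloud[np.linalg.norm(cloud - kp, axis=1) < radius]
        if len(near):
            sets.append(farthest_point_sample(near, n_per_key))
    S = np.concatenate(sets, axis=0)
    # Step S9): k-means clustering of the points outside S (k > 1).
    rest = cloud[~(cloud[:, None] == S[None]).all(-1).any(-1)]
    labels = KMeans(n_clusters=k, n_init=10).fit_predict(rest)
    # Step S10): sample each cluster, starting from S as the initial point set.
    groups = [farthest_point_sample(rest[labels == g], n_per_key) for g in range(k)]
    return np.concatenate([S] + groups, axis=0)   # final sampling point set S'
```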
In addition, in order to further improve the robustness of the reconstructed point cloud, the 3D face point cloud reconstruction method further comprises a step of performing point cloud enhancement on the resampled point cloud data:
Performing point-by-point feature extraction on the sampling point set S' through a second feature extraction network based on a convolutional neural network, and adding a weight factor μ in the feature extraction process to obtain feature-enhanced point cloud data f_t, where μ takes a preset value greater than 1 when the extracted feature data point P_i (P_i ∈ S') is key point data and takes the value 1 when P_i is non-key-point data, namely f_t ← μ × P_i;
Sending the point cloud data f_t respectively into two different sub-networks of the same convolutional neural network, and obtaining two different groups of parameters T_S and D_P through linear-layer regression of the convolutional neural network;
Updating f_t according to the following formula:
f_t = T_S × f_t + D_P
It should be added that the parameters T_S and D_P in this embodiment are not bound to any particular physical meaning; they are simply one way of processing the point cloud data f_t, and when f_t is updated, f_t may instead be multiplied by D_P, that is, f_t = f_t × D_P + T_S.
In the above point cloud enhancement method, the point cloud data are updated in a self-learning manner, while key-point data and non-key-point data are treated differently by assigning them different weights, which effectively improves the robustness of the finally formed point cloud data set {f_t}. A sketch of this stage follows.
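A sketch of the enhancement stage is shown below; the layer sizes, the value μ = 1.5 and the elementwise application of T_S and D_P are assumptions consistent with, but not dictated by, the text:
```python
import torch
import torch.nn as nn

class PointCloudEnhancer(nn.Module):
    def __init__(self, mu=1.5, width=64):
        super().__init__()
        self.mu = mu                          # preset weight > 1 for key points
        self.feature_net = nn.Sequential(     # second feature extraction network
            nn.Conv1d(3, width, 1), nn.ReLU(),
            nn.Conv1d(width, 3, 1),
        )
        self.head_T = nn.Linear(3, 3)         # sub-network regressing T_S
        self.head_D = nn.Linear(3, 3)         # sub-network regressing D_P

    def forward(self, points, is_keypoint):
        """points: (B, N, 3); is_keypoint: (B, N) boolean key-point mask."""
        f_t = self.feature_net(points.transpose(1, 2)).transpose(1, 2)
        # f_t <- mu * P_i: weight key-point features more heavily than the rest.
        w = 1.0 + (self.mu - 1.0) * is_keypoint.float().unsqueeze(-1)
        f_t = w * f_t
        # Two different sub-networks regress the parameter groups T_S and D_P.
        T_S, D_P = self.head_T(f_t), self.head_D(f_t)
        return T_S * f_t + D_P                # f_t = T_S x f_t + D_P
```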
In summary, the 3D face point cloud reconstruction method disclosed by the invention comprises four stages: initial point cloud establishment, point cloud registration, point cloud resampling and point cloud enhancement. In the point cloud registration stage, the coefficients required for registration are generated dynamically through neural network learning, so that the generated point cloud data are more accurate and do not depend too heavily on point-pair correspondences; meanwhile, the feature matrices of two adjacent frames of point clouds are generated through neural network down-sampling and the Euclidean distance between the two feature matrices is computed, which reduces the amount of point cloud computation to the point where the clouds can essentially be processed in real time. In addition, point cloud resampling further reduces the noise of the reconstructed point cloud, and point cloud enhancement improves its robustness, making the result better suited to the subsequent curved surface reconstruction.
The invention also discloses a 3D face point cloud reconstruction system, as shown in fig. 4, which comprises an initial point cloud establishment unit, a point cloud registration unit, a point cloud resampling unit and a point cloud enhancement unit.
The initial point cloud establishing unit comprises a basic data acquiring module and an initial point cloud establishing module. The registration unit comprises a coordinate transformation module, an iteration coefficient generation module, a first feature extraction module, a calculation module, an optimization module and a first updating module.
The initial point cloud establishing module is used for processing two adjacent frames of face images selected from the video stream to obtain the initial point clouds S_T and S_{T+1} corresponding to the two frames of face images respectively.
The coordinate transformation module is used for transforming the initial point cloud S_T through a predefined initial transformation matrix {R_i, T_i} to obtain S'_T.
The iteration coefficient generation module is used for splicing and fusing S'_T and S_{T+1} and then sending the fused result into a coefficient prediction network based on a convolutional neural network and comprising convolution layers and a maximum pooling layer, so as to obtain a group of iteration coefficient matrices (α, β).
The first feature extraction module is used for respectively sending S'_T and S_{T+1} into a first feature extraction network based on a convolutional neural network, where the first feature extraction network performs network down-sampling on S'_T and S_{T+1} to obtain two groups of feature matrices F_T and F_{T+1}.
The calculation module is used for obtaining an initial registration matrix M_0 through a calculation model, wherein the input parameters of the calculation model are the group of iteration coefficient matrices (α, β) and the feature matrices F_T and F_{T+1}.
The optimization module is used for normalizing M_0 through an optimization algorithm so as to obtain a final registration matrix M_T.
The first updating module is used for performing singular value decomposition on M_T using a decomposition algorithm to obtain a transformation matrix {R_T, T_T}, and for updating {R_i, T_i} with {R_T, T_T} as the initial value of the next frame's point cloud transformation.
Preferably, the initial point cloud establishment unit further includes a preliminary cropping module, wherein the face image comprises an RGB image and a depth image, and the preliminary cropping module is configured to perform face region detection on the two selected adjacent frames of face images through a face detection network based on a convolutional neural network, so as to obtain a 2D face frame; the initial point cloud establishing module may align the RGB images of the two selected face images with the depth map coordinate systems based on the 2D face frames obtained by the preliminary cropping module, so as to obtain the initial point clouds S_T and S_{T+1} corresponding to the two face images.
Further, the initial point cloud establishment unit also includes a re-cropping module; after the initial point clouds S_T and S_{T+1} are established, the re-cropping module is configured to crop the initial point clouds S_T and S_{T+1} according to a certain depth threshold with the center of the 2D face frame as the midpoint; the depth threshold may be a preset prior value or the average value of the depth data in the preliminarily cropped face image.
The point cloud resampling unit comprises a first sampling module, a grouping module and a second sampling module.
The first sampling module is used for randomly selecting a point from the key points in the 2D face frame as an initial point P_0(x_0, y_0, z_0) and, taking the point P_0 as the center, obtaining a sampling set S_1 = {P_i | i ∈ (0, 1, 2, …, N)} by farthest point sampling of the points within a certain threshold range, thereby obtaining the sampling set S = {S_j | j ∈ (0, 1, 2, …, M)} over all the key points; wherein M is the number of key points, and the key points are obtained through the face detection network.
The grouping module is used for grouping the sample points outside the set S through k-means clustering to obtain k groups of sets, wherein k > 1.
The second sampling module is used for sampling the k groups of sets, with the set S as the initial point set, to obtain the final sampling point set S' = {S_t | t ∈ (0, 1, 2, …, M+k)}.
The point cloud enhancement unit comprises a second feature extraction module, a parameter generation module and a second updating module.
The second feature extraction module is configured to perform point-by-point feature extraction on the sampling point set S' through a second feature extraction network based on a convolutional neural network, and to add a weight factor μ in the feature extraction process to obtain feature-enhanced point cloud data f_t, where μ takes a preset value greater than 1 when the extracted feature data point is key point data and takes the value 1 when it is non-key-point data.
The parameter generation module is used for sending the point cloud data f_t generated by the second feature extraction module respectively into two different sub-networks of the same convolutional neural network, and obtaining two different groups of parameters T_S and D_P through linear-layer regression of the convolutional neural network.
The second updating module is used for updating f_t according to the following formula:
f_t = T_S × f_t + D_P
For the working principle and workflow of the 3D face point cloud reconstruction system, reference is made to the detailed description of the 3D face point cloud reconstruction method above; they are not repeated here.
In addition, the invention also discloses another 3D face point cloud reconstruction system, which comprises one or more processors, a memory and one or more programs, wherein the one or more programs are stored in the memory and are configured to be executed by the one or more processors, and the programs comprise instructions for executing the 3D face point cloud reconstruction method.
In addition, the invention also discloses a computer readable storage medium, which comprises a computer program for testing, wherein the computer program can be executed by a processor to complete the 3D face point cloud reconstruction method.
The foregoing describes preferred embodiments of the present invention and is not intended to limit the scope of the invention, which is defined by the claims that follow.

Claims (12)

1. The 3D face point cloud reconstruction method is characterized by comprising the following steps of:
1) Selecting two adjacent frames of face images from a video stream comprising multiple frames of face images, and processing them to obtain initial point clouds S_T and S_{T+1} corresponding to the two frames of face images respectively;
2) Transforming the initial point cloud S_T through a predefined initial transformation matrix {R_i, T_i} to obtain S'_T, splicing and fusing S'_T and S_{T+1}, and sending the fused result into a coefficient prediction network based on a convolutional neural network and comprising convolution layers and a maximum pooling layer, so as to obtain a group of iteration coefficient matrices (α, β);
3) Respectively sending S'_T and S_{T+1} into a first feature extraction network based on a convolutional neural network, wherein the first feature extraction network performs network down-sampling on S'_T and S_{T+1} so as to obtain two groups of feature matrices F_T and F_{T+1};
4) Obtaining an initial registration matrix M_0 through a calculation model, wherein the input parameters of the calculation model are the group of iteration coefficient matrices (α, β) and the feature matrices F_T and F_{T+1};
5) Normalizing the initial registration matrix M_0 to obtain a final registration matrix M_T;
6) Performing singular value decomposition on M_T by means of a decomposition algorithm to obtain a transformation matrix {R_T, T_T}, and updating {R_i, T_i} with {R_T, T_T} as the initial value of the next frame's point cloud transformation;
wherein the initial registration matrix M_0 is computed by the calculation model from the iteration coefficient matrices (α, β) and Δ_F, where Δ_F is the Euclidean distance between F_T and F_{T+1}.
2. The method of reconstructing a 3D face point cloud according to claim 1, wherein in step 1), the method of obtaining the initial point clouds from the two adjacent frames of face images comprises:
Performing preliminary cropping on the face images, wherein each face image comprises an RGB image and a depth image: inputting the selected two adjacent frames of face images into a face detection network based on a convolutional neural network so as to obtain the corresponding 2D face frames;
Based on the 2D face frames, respectively aligning the RGB image and depth map coordinate systems of the two selected face images, so as to obtain the initial point clouds S_T and S_{T+1} corresponding to the two face images.
3. The method of reconstructing a 3D face point cloud according to claim 2, wherein in step 1), after the initial point clouds S_T and S_{T+1} are established, the method further comprises a step of cropping the face images again:
Taking the center of the 2D face frame as the midpoint, cropping the initial point clouds S_T and S_{T+1} according to a certain depth threshold; the depth threshold may be a preset prior value or the average value of the depth data in the preliminarily cropped face image.
4. The 3D face point cloud reconstruction method according to claim 2, wherein key points on the 2D face frame are also obtained through the face detection network when the face images are preliminarily cropped; the 3D face point cloud reconstruction method further comprises a step of resampling the registered point cloud data:
7) Randomly selecting a point from the key points in the 2D face frame as an initial point P_0(x_0, y_0, z_0), and, taking the point P_0 as the center, applying farthest point sampling to the points within a certain threshold range to obtain a sampling set S_1 = {P_i | i ∈ (0, 1, 2, …, N)}, wherein N is the number of samples;
8) Repeating step 7) for the remaining key points to obtain a sampling set S = {S_j | j ∈ (0, 1, 2, …, M)}, wherein M is the number of key points;
9) Clustering the sample points outside the set S by k-means to obtain k groups of sets, wherein k > 1;
10) Taking the set S as the initial point set and sampling the k groups of sets respectively, to obtain the final sampling point set S' = {S_t | t ∈ (0, 1, 2, …, M+k)}.
5. The method for reconstructing a 3D face point cloud of claim 4, further comprising a step of performing point cloud enhancement on the resampled point cloud data:
Performing point-by-point feature extraction on the sampling point set S' through a second feature extraction network based on a convolutional neural network, and adding a weight factor μ in the feature extraction process to obtain feature-enhanced point cloud data f_t, where μ takes a preset value greater than 1 when the extracted feature data point is key point data and takes the value 1 when it is non-key-point data;
Sending the point cloud data f_t respectively into two different sub-networks of the same convolutional neural network, and obtaining two different groups of parameters T_S and D_P through linear-layer regression of the convolutional neural network;
Updating f_t according to the following formula:
f_t = T_S × f_t + D_P
6. A 3D face point cloud reconstruction system, characterized by comprising an initial point cloud establishment unit and a point cloud registration unit; the initial point cloud establishment unit comprises an initial point cloud establishing module; the registration unit comprises a coordinate transformation module, an iteration coefficient generation module, a first feature extraction module, a calculation module, an optimization module and a first updating module;
The initial point cloud establishing module is used for processing two adjacent frames of face images selected from the video stream to obtain initial point clouds S_T and S_{T+1} corresponding to the two frames of face images respectively;
The coordinate transformation module is used for transforming the initial point cloud S_T through a predefined initial transformation matrix {R_i, T_i} so as to obtain S'_T;
The iteration coefficient generation module is used for splicing and fusing S'_T and S_{T+1} and then sending the fused result into a coefficient prediction network based on a convolutional neural network and comprising convolution layers and a maximum pooling layer, so as to obtain a group of iteration coefficient matrices (α, β);
The first feature extraction module is configured to respectively send S'_T and S_{T+1} to a first feature extraction network based on a convolutional neural network, where the first feature extraction network performs network down-sampling on S'_T and S_{T+1} to obtain two sets of feature matrices F_T and F_{T+1};
The calculation module is used for obtaining an initial registration matrix M_0 through a calculation model, wherein the input parameters of the calculation model are the group of iteration coefficient matrices (α, β) and the feature matrices F_T and F_{T+1};
The optimization module is used for normalizing M_0 through an optimization algorithm so as to obtain a final registration matrix M_T;
The first updating module is configured to perform singular value decomposition on M_T using a decomposition algorithm to obtain a transformation matrix {R_T, T_T}, and to update {R_i, T_i} with {R_T, T_T} as the initial value of the next frame's point cloud transformation.
7. The 3D face point cloud reconstruction system according to claim 6, wherein the initial point cloud establishment unit further includes a preliminary cropping module, the face image comprises an RGB image and a depth image, and the preliminary cropping module is configured to perform face region detection on the two selected adjacent frames of face images through a face detection network based on a convolutional neural network, so as to obtain a 2D face frame; the initial point cloud establishing module may align the RGB images of the two selected face images with the depth map coordinate systems based on the 2D face frames obtained by the preliminary cropping module, so as to obtain the initial point clouds S_T and S_{T+1} corresponding to the two face images.
8. The 3D face point cloud reconstruction system according to claim 7, wherein the initial point cloud establishment unit further includes a re-cropping module; after the initial point clouds S_T and S_{T+1} are established, the re-cropping module is configured to crop the initial point clouds S_T and S_{T+1} according to a certain depth threshold with the center of the 2D face frame as the midpoint; the depth threshold may be a preset prior value or the average value of the depth data in the preliminarily cropped face image.
9. The 3D face point cloud reconstruction system of claim 7, further comprising a point cloud resampling unit comprising a first sampling module, a grouping module, and a second sampling module;
The first sampling module is configured to randomly select a point from the key points in the 2D face frame as an initial point P_0(x_0, y_0, z_0) and, taking the point P_0 as the center, obtain a sampling set S_1 = {P_i | i ∈ (0, 1, 2, …, N)} by farthest point sampling, thereby obtaining the sampling set S = {S_j | j ∈ (0, 1, 2, …, M)}; wherein N is the number of samples, M is the number of key points, and the key points are obtained through the face detection network;
The grouping module is used for grouping the sample points outside the set S through k-means clustering to obtain k groups of sets, wherein k > 1;
The second sampling module is configured to sample the k groups of sets, with the set S as the initial point set, so as to obtain the final sampling point set S' = {S_t | t ∈ (0, 1, 2, …, M+k)}.
10. The 3D face point cloud reconstruction system of claim 9, further comprising a point cloud enhancement unit comprising a second feature extraction module, a parameter generation module, and a second updating module;
The second feature extraction module is configured to perform point-by-point feature extraction on the sampling point set S' through a second feature extraction network based on a convolutional neural network, and to add a weight factor μ in the feature extraction process to obtain feature-enhanced point cloud data f_t, where μ takes a preset value greater than 1 when the extracted feature data point is key point data and takes the value 1 when it is non-key-point data;
The parameter generation module is configured to send the point cloud data f_t generated by the second feature extraction module into two different sub-networks of the same convolutional neural network, and to obtain two different groups of parameters T_S and D_P through linear-layer regression of the convolutional neural network;
The second updating module is used for updating f_t according to the following formula:
f_t = T_S × f_t + D_P
11. A 3D face point cloud reconstruction system, comprising:
One or more processors;
A memory;
and one or more programs, wherein the one or more programs are stored in the memory and configured to be executed by the one or more processors, the programs comprising instructions for performing the 3D face point cloud reconstruction method of any of claims 1 to 5.
12. A computer readable storage medium comprising a computer program for testing, the computer program being executable by a processor to perform the 3D face point cloud reconstruction method of any of claims 1 to 5.
CN202010834329.3A 2020-08-18 3D face point cloud reconstruction method and system Active CN112069923B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010834329.3A CN112069923B (en) 2020-08-18 3D face point cloud reconstruction method and system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010834329.3A CN112069923B (en) 2020-08-18 3D face point cloud reconstruction method and system

Publications (2)

Publication Number Publication Date
CN112069923A CN112069923A (en) 2020-12-11
CN112069923B true CN112069923B (en) 2024-07-12


Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109242951A (en) * 2018-08-06 2019-01-18 宁波盈芯信息科技有限公司 A kind of face's real-time three-dimensional method for reconstructing
CN109360267A (en) * 2018-09-29 2019-02-19 杭州蓝芯科技有限公司 A kind of thin objects quick three-dimensional reconstructing method


Similar Documents

Publication Publication Date Title
CN112233038B (en) True image denoising method based on multi-scale fusion and edge enhancement
KR102319177B1 (en) Method and apparatus, equipment, and storage medium for determining object pose in an image
CN110348330B (en) Face pose virtual view generation method based on VAE-ACGAN
CN109410127B (en) Image denoising method based on deep learning and multi-scale image enhancement
CN110427968B (en) Binocular stereo matching method based on detail enhancement
WO2021129569A1 (en) Human action recognition method
CN109035172B (en) Non-local mean ultrasonic image denoising method based on deep learning
CN110070517B (en) Blurred image synthesis method based on degradation imaging mechanism and generation countermeasure mechanism
CN111932577B (en) Text detection method, electronic device and computer readable medium
CN112288788A (en) Monocular image depth estimation method
CN117392496A (en) Target detection method and system based on infrared and visible light image fusion
CN110390724B (en) SLAM method with instance segmentation
CN113706407B (en) Infrared and visible light image fusion method based on separation characterization
WO2020087434A1 (en) Method and device for evaluating resolution of face image
CN112069923B (en) 3D face point cloud reconstruction method and system
CN115330874B (en) Monocular depth estimation method based on superpixel processing shielding
CN116309213A (en) High-real-time multi-source image fusion method based on generation countermeasure network
CN113781368B (en) Infrared imaging device based on local information entropy
CN114529455A (en) Task decoupling-based parameter image super-resolution method and system
CN114155406A (en) Pose estimation method based on region-level feature fusion
CN112069923A (en) 3D face point cloud reconstruction method and system
CN114066760A (en) Image denoising method, network model training method, device, medium, and apparatus
CN113240589A (en) Image defogging method and system based on multi-scale feature fusion
CN111985535A (en) Method and device for optimizing human body depth map through neural network
CN110189272B (en) Method, apparatus, device and storage medium for processing image

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
CB02 Change of applicant information

Address after: Room 101, No. 1, East Ring 3rd Street, Jitiagang, Huangjiang Town, Dongguan City, Guangdong Province, 523000

Applicant after: Guangdong Zhengyang Sensor Technology Co.,Ltd.

Address before: 523000 Jitigang Village, Huangjiang Town, Dongguan City, Guangdong Province

Applicant before: DONGGUAN ZHENGYANG ELECTRONIC MECHANICAL Co.,Ltd.

GR01 Patent grant