CN113837053A - Biological face alignment model training method, biological face alignment method and device - Google Patents


Info

Publication number
CN113837053A
Authority
CN
China
Prior art keywords
point cloud
biological
face
network
biological face
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202111097168.5A
Other languages
Chinese (zh)
Other versions
CN113837053B (en)
Inventor
涂弘德
张为义
罗士杰
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Fujian Cook Intelligent Technology Co ltd
Original Assignee
Fujian Cook Intelligent Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Fujian Cook Intelligent Technology Co ltd
Priority to CN202111097168.5A
Publication of CN113837053A
Application granted
Publication of CN113837053B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2415 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T5/00 Image enhancement or restoration
    • G06T5/50 Image enhancement or restoration by the use of more than one image, e.g. averaging, subtraction
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/10 Image acquisition modality
    • G06T2207/10028 Range image; Depth image; 3D point clouds

Abstract

A biological face alignment model training method, a biological face alignment method and a biological face alignment device are provided. The biological face alignment method comprises the following steps: acquiring a first 3D point cloud, wherein the first 3D point cloud is a 3D point cloud of a single non-frontal biological face; performing correction processing on the first 3D point cloud by using a 3D graph convolution network to obtain a second 3D point cloud; performing symmetric filling processing on the second 3D point cloud to obtain a third 3D point cloud; and performing hole filling processing on the third 3D point cloud by using a 3D graph convolution generative adversarial network to obtain a fourth 3D point cloud, wherein the fourth 3D point cloud is a hole-free frontal biological face 3D point cloud. The biological face alignment method can obtain, from a large-angle side-face 3D point cloud, a hole-free frontal biological face 3D point cloud whose features closely resemble those of the original side-face point cloud and which is free of deformation, thereby improving the recognition accuracy of subsequent applications that use the hole-free frontal 3D point cloud.

Description

Biological face alignment model training method, biological face alignment method and device
Technical Field
The present application relates to the field of biometric feature detection technologies, and in particular, to a biometric face alignment model training method, a biometric face alignment method, and a biometric face alignment apparatus.
Background
Currently, biological face images are used in many fields, especially in applications centered on the human face; many applications exploit the commercial value of human face images, such as face recognition, expression recognition and emotion recognition. Face recognition is a biometric detection and recognition technology that performs identity recognition based on the facial feature information of a person. Images or video streams containing human faces are collected with a camera, the faces in the images are automatically detected and tracked, and a series of related technologies such as image preprocessing, image feature extraction, matching and recognition are then applied to the detected faces; these related technologies are usually referred to as portrait recognition or facial recognition. With the rapid development of computer and network technologies, face recognition has been widely applied in industries and fields such as intelligent access control, mobile terminals, public security, entertainment and the military.
Face recognition, expression recognition and emotion recognition mostly require a frontal face image for subsequent processing, yet the human face can appear in many poses. For example, if the face image to be recognized is a non-frontal face image and it is used directly for recognition, the recognition result is adversely affected.
Therefore, how to obtain a frontal image from a non-frontal image, so as to improve the accuracy of subsequent biological face recognition, is a problem that needs to be solved urgently.
Disclosure of Invention
The embodiments of the present application provide a biological face alignment model training method, a biological face alignment method and a biological face alignment device, which can obtain a frontal face image from a non-frontal face image so as to improve the accuracy of subsequent biological face recognition.
In a first aspect, a method for biological face alignment is provided, including: acquiring a first 3D point cloud, wherein the first 3D point cloud is a 3D point cloud of a single non-frontal biological face; performing correction processing on the first 3D point cloud by using a 3D graph convolution network to obtain a second 3D point cloud; performing symmetric filling processing on the second 3D point cloud to obtain a third 3D point cloud; and performing hole filling processing on the third 3D point cloud by using a 3D graph convolution generative adversarial network to obtain a fourth 3D point cloud, wherein the fourth 3D point cloud is a hole-free frontal biological face 3D point cloud.
In this technical solution, the large-angle biological face 3D point cloud is corrected, and a frontal biological face 3D point cloud that still contains a number of holes is obtained from the corrected point cloud by preliminary symmetric filling. The preliminarily filled frontal 3D point cloud is used as a reference point cloud, so that the generated hole-free frontal 3D point cloud is similar in features to the reference point cloud, which improves the recognition accuracy of subsequent applications that use the hole-free frontal 3D point cloud.
In one possible embodiment, performing the correction processing on the first 3D point cloud by using a 3D graph convolution network to obtain a second 3D point cloud includes: processing the first 3D point cloud with the 3D graph convolution network to obtain a first rotation parameter, wherein the first rotation parameter is used for correcting the first 3D point cloud.
In this technical solution, the rotation parameter corresponding to the face 3D point cloud to be corrected is obtained through the trained 3D graph convolution network, and the position information of the first 3D point cloud is rotated according to that rotation parameter. Because the rotation is a rigid transformation, the corrected second 3D point cloud is not deformed, in contrast to an affine transformation, which improves the authenticity and reliability of the corrected face 3D point cloud.
In one possible embodiment, performing the correction processing on the first 3D point cloud to obtain a second 3D point cloud includes: correcting the position information of the first 3D point cloud through the first rotation parameter, wherein the second 3D point cloud includes the corrected position information of the first 3D point cloud.
In one possible embodiment, the color information of the first 3D point cloud remains unchanged, and the second 3D point cloud includes the corrected position information of the first 3D point cloud and the color information of the first 3D point cloud corresponding to that position information.
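As an illustration of this correction step, the sketch below applies a rigid rotation to the position information of a point cloud while passing its color information through unchanged. The parameterization of the first rotation parameter as three Euler angles, and the helper name itself, are assumptions made here; the patent does not fix them.

```python
import numpy as np

def rectify_point_cloud(points_xyz, colors_rgb, yaw, pitch, roll):
    """Rigidly rotate the positions of a 3D point cloud; colors are untouched
    (hypothetical helper, Euler-angle parameterization assumed)."""
    cy, sy = np.cos(yaw), np.sin(yaw)
    cp, sp = np.cos(pitch), np.sin(pitch)
    cr, sr = np.cos(roll), np.sin(roll)
    Rz = np.array([[cy, -sy, 0], [sy, cy, 0], [0, 0, 1]])
    Ry = np.array([[cp, 0, sp], [0, 1, 0], [-sp, 0, cp]])
    Rx = np.array([[1, 0, 0], [0, cr, -sr], [0, sr, cr]])
    R = Rz @ Ry @ Rx                   # composite rigid rotation matrix
    rectified_xyz = points_xyz @ R.T   # rotate position information only
    return rectified_xyz, colors_rgb   # color information is preserved as-is
```

Because the transformation is rigid, distances between points are preserved, which is why the corrected point cloud is not deformed.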
In one possible embodiment, symmetrically filling the second 3D point cloud to obtain a third 3D point cloud includes: extracting a partial point cloud of the second 3D point cloud; and performing symmetry processing on the partial point cloud of the second 3D point cloud to obtain the third 3D point cloud.
In this technical solution, the filled biological face 3D point cloud is obtained by symmetric filling, which guarantees the overall coordination and authenticity of the symmetrically filled 3D point cloud and, in turn, the similarity between the subsequent hole-free biological face 3D point cloud and the non-frontal biological face 3D point cloud. Compared with inferring the invisible area from the visible area based on prior knowledge, that is, guessing a complete frontal biological face from a non-frontal biological face image, the technical solution of the present application is more accurate.
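A minimal sketch of such symmetric filling is given below, assuming the corrected face is roughly centered so that its sagittal symmetry plane is x = 0; that choice of plane and the half-selection heuristic are illustrative assumptions, not requirements of the patent.

```python
import numpy as np

def symmetric_fill(points_xyz, colors_rgb):
    """Mirror the visible partial point cloud across the assumed symmetry
    plane x = 0 to obtain a preliminarily filled (third) point cloud."""
    left = points_xyz[:, 0] <= 0
    visible = left if left.sum() >= (~left).sum() else ~left   # keep the denser half
    part_xyz, part_rgb = points_xyz[visible], colors_rgb[visible]
    mirrored_xyz = part_xyz * np.array([-1.0, 1.0, 1.0])       # reflect across x = 0
    filled_xyz = np.vstack([part_xyz, mirrored_xyz])
    filled_rgb = np.vstack([part_rgb, part_rgb])
    return filled_xyz, filled_rgb   # may still contain holes near the symmetry plane
```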
In one possible embodiment, performing hole filling processing on the third 3D point cloud by using the 3D graph convolution generative adversarial network to obtain a fourth 3D point cloud includes: acquiring target hidden data; generating a target synthesized 3D point cloud according to the target hidden data; and judging whether the target synthesized 3D point cloud and the third 3D point cloud are similar, so as to obtain the fourth 3D point cloud.
In this technical solution, the hole-free biological face 3D point cloud is determined by comparing the similarity between the target synthesized 3D point cloud and the preliminarily filled 3D point cloud, which guarantees the similarity between the output 3D point cloud and the non-frontal biological face 3D point cloud and improves the recognition accuracy of subsequent applications.
In one possible embodiment, generating a target synthesized 3D point cloud from the target hidden data includes: taking the target hidden data as the input of a 3D graph convolution generation network and outputting the target synthesized 3D point cloud, wherein the 3D graph convolution generative adversarial network includes the 3D graph convolution generation network, and the graph convolution layers of the 3D graph convolution generation network extract the features of the target hidden data.
In one possible embodiment, judging whether the target synthesized 3D point cloud and the third 3D point cloud are similar to obtain the fourth 3D point cloud includes: judging whether they are similar by checking whether the loss value of the loss function between the target synthesized 3D point cloud and the third 3D point cloud is less than or equal to the similarity threshold corresponding to the loss function.
In one possible embodiment, the loss function includes at least one of: an absolute distance error between the target synthesized 3D point cloud and the third 3D point cloud; a similarity error between the target synthesized 3D point cloud and the third 3D point cloud at the same layer of a biological face recognition model; a similarity error between the target synthesized 3D point cloud and the third 3D point cloud over layer features of a point set neural network; and an error between the target synthesized 3D point cloud and the third 3D point cloud at the key point locations of the biological face.
In the technical solution of the present application, these four loss functions judge the similarity between the two point clouds from the overall distribution, the contour appearance and the key facial positions of the target synthesized 3D point cloud and the third 3D point cloud, so that the similarity between the output fourth 3D point cloud and the input third 3D point cloud is effectively guaranteed in several respects and the fourth 3D point cloud is more realistic.
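For concreteness, the sketch below shows one plausible instantiation of two of these terms: the absolute distance error realized as a symmetric (Chamfer-style) L1 distance, and the key point error computed over pre-indexed facial landmarks. Both formulas are assumptions for illustration; the patent does not specify exact expressions.

```python
import torch

def chamfer_l1(p, q):
    """Symmetric nearest-neighbor distance between point clouds.
    p: (N, 3) target synthesized point cloud, q: (M, 3) third point cloud."""
    d = torch.cdist(p, q)                  # (N, M) pairwise Euclidean distances
    return d.min(dim=1).values.mean() + d.min(dim=0).values.mean()

def keypoint_error(p_keypoints, q_keypoints):
    """Mean absolute error at corresponding facial key points (assumed to be
    already extracted from both clouds, e.g. eye corners and nose tip)."""
    return torch.abs(p_keypoints - q_keypoints).mean()
```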
In one possible embodiment, judging whether the target synthesized 3D point cloud and the third 3D point cloud are similar includes: if the loss value of the loss function is greater than the similarity threshold, the target synthesized 3D point cloud is not similar to the third 3D point cloud, and the target hidden data is updated; or, if the loss value of the loss function is less than or equal to the similarity threshold, the target synthesized 3D point cloud is similar to the third 3D point cloud, the target synthesized 3D point cloud is taken as the fourth 3D point cloud, and the fourth 3D point cloud is output.
In one possible embodiment, updating the target hidden data includes: updating the target hidden data by a gradient descent method.
In this technical solution, the target hidden data expresses the hidden dimensions and hidden relationships of the data, so the target synthesized 3D point cloud generated from the target hidden data reflects the attributes of the biological face. By iteratively updating the target hidden data, the target synthesized 3D point cloud generated from the updated target hidden data becomes similar to the biological face 3D point cloud that contains holes, thereby improving the accuracy of subsequent biological face recognition applications.
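The loop below sketches this iterative update: the target hidden data z is optimized by gradient descent, with the pretrained generation network held fixed, until the synthesized cloud is similar enough to the third point cloud. The function names, the latent_dim attribute, the optimizer choice and the hyperparameters are all illustrative assumptions.

```python
import torch

def fill_holes(generator, third_cloud, loss_fn, steps=500, lr=1e-2, threshold=0.05):
    """Latent-optimization sketch of the hole filling step (assumptions noted
    in the lead-in). Returns the fourth, hole-free frontal 3D point cloud."""
    z = torch.randn(1, generator.latent_dim, requires_grad=True)  # target hidden data
    opt = torch.optim.Adam([z], lr=lr)
    for _ in range(steps):
        synth = generator(z)                  # target synthesized 3D point cloud
        loss = loss_fn(synth, third_cloud)    # e.g. weighted sum of the losses above
        if loss.item() <= threshold:          # similar enough: stop updating
            break
        opt.zero_grad()
        loss.backward()                       # only z is updated; generator stays fixed
        opt.step()
    return generator(z).detach()              # fourth 3D point cloud
```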
In a second aspect, a method for training a biological face alignment model is provided, the method including: acquiring first training data, wherein the first training data includes position information of a plurality of biological face 3D point clouds and a rotation parameter corresponding to each biological face 3D point cloud; and training a 3D graph convolution network with the first training data to obtain a first biological face alignment model, wherein the output of the first biological face alignment model is the first rotation parameter, and the first rotation parameter is used for performing correction processing on a non-frontal biological face 3D point cloud.
In the technical solution of the present application, the biological face alignment model obtained by training the 3D graph convolution network can output the first rotation parameter, so that the non-frontal biological face is corrected through a rigid transformation; the corrected non-frontal biological face is therefore not deformed, which guarantees the authenticity and reliability of the 3D point cloud.
In one possible embodiment, training the 3D graph convolution network with the first training data to obtain the first biological face alignment model includes: extracting features of the first training data through the 3D graph convolution layers of the 3D graph convolution network.
In this technical solution, because the 3D point cloud data structure is unordered, feature extraction of the 3D point cloud data can be achieved by extracting features of the training data through graph convolution layers. The depth information in the 3D point cloud data is taken into account, so that the hole-free face 3D point cloud obtained subsequently carries depth information whose authenticity is guaranteed. In addition, the amount of computation required to extract 3D point cloud features with a graph convolution layer is much smaller than with a fully connected layer.
In one possible implementation, the first training data is the normalized position information of the plurality of biological face 3D point clouds and the rotation parameter corresponding to each biological face 3D point cloud.
In this technical solution, using the normalized first training data as the model training data improves the convergence rate of the 3D graph convolution network and thus the efficiency of model training.
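One simple normalization scheme consistent with this description is to center each face point cloud at the origin and scale it into the unit sphere before training; this particular scheme is an illustrative assumption, since the patent does not specify one.

```python
import numpy as np

def normalize_cloud(points_xyz):
    """Center a face 3D point cloud at the origin and scale it into the unit
    sphere (one common normalization; other schemes would also qualify)."""
    centered = points_xyz - points_xyz.mean(axis=0)
    scale = np.linalg.norm(centered, axis=1).max()
    return centered / (scale + 1e-8)
```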
In a third aspect, a method for training a biological face alignment model is provided, the method including: acquiring second training data, wherein the second training data includes a plurality of groups of biological face feature vectors and real biological face 3D point clouds; and training a 3D graph convolution generative adversarial network with the second training data to obtain a second biological face alignment model, wherein the 3D graph convolution generative adversarial network includes a 3D graph convolution generation network and a 3D graph convolution discrimination network, and the 3D graph convolution generation network in the second biological face alignment model is used for generating the target synthesized 3D point cloud.
In the technical solution of the present application, the 3D graph convolution generative adversarial network includes a 3D graph convolution generation network and a 3D graph convolution discrimination network, and the features of the second training data are extracted by graph convolution. Compared with extracting features with a fully connected layer, the amount of computation drops significantly; compared with extracting features with an ordinary convolution layer, a graph convolution layer is better suited to extracting features of a 3D point cloud, which improves the reliability of the target synthesized 3D point cloud.
In one possible embodiment, training the second training data with the 3D graph convolution generative adversarial network includes: inputting the plurality of groups of biological face feature vectors into the 3D graph convolution generation network to obtain 3D point clouds to be verified; and inputting the real biological face 3D point clouds and the 3D point clouds to be verified into the 3D graph convolution discrimination network to obtain a first probability and a second probability respectively, wherein the first probability is the probability that a 3D point cloud to be verified is real, the second probability is the probability that a real biological face 3D point cloud is real, the first probability and the second probability are used to obtain the loss function for training the 3D graph convolution discrimination network, and the first probability is used to obtain the loss function for training the 3D graph convolution generation network.
In one possible implementation, inputting the plurality of groups of biological face feature vectors into the 3D graph convolution generation network to obtain the 3D point clouds to be verified includes: extracting features of the plurality of groups of biological face feature vectors through the 3D graph convolution layers of the 3D graph convolution generation network to obtain the 3D point clouds to be verified. Inputting the real biological face 3D point clouds and the 3D point clouds to be verified into the 3D graph convolution discrimination network to obtain the first probability and the second probability respectively includes: extracting features of the real biological face 3D point clouds and the 3D point clouds to be verified through the 3D graph convolution layers of the 3D graph convolution discrimination network to obtain the first probability and the second probability.
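A sketch of one adversarial training step consistent with this description is shown below, using the standard GAN losses; the exact loss form, module interfaces and optimizer setup are assumptions made for illustration.

```python
import torch

def gan_training_step(gen, disc, feat_vectors, real_clouds, opt_g, opt_d):
    """One training step of the 3D graph convolution generative adversarial
    network (gen: generation network, disc: discrimination network)."""
    fake_clouds = gen(feat_vectors)               # 3D point clouds to be verified

    # Discrimination network: raise the second probability (real clouds judged
    # real) and lower the first probability (generated clouds judged real).
    p_fake = disc(fake_clouds.detach())           # first probability
    p_real = disc(real_clouds)                    # second probability
    loss_d = -(torch.log(p_real + 1e-8).mean() + torch.log(1 - p_fake + 1e-8).mean())
    opt_d.zero_grad(); loss_d.backward(); opt_d.step()

    # Generation network: push the first probability toward "real".
    p_fake = disc(fake_clouds)
    loss_g = -torch.log(p_fake + 1e-8).mean()
    opt_g.zero_grad(); loss_g.backward(); opt_g.step()
    return loss_g.item(), loss_d.item()
```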
In a fourth aspect, a biological face alignment device is provided. The device includes an acquisition unit and a processing unit, and the processing unit includes a biological face correction module, a biological face symmetry module and a biological face hole filling module. The acquisition unit is used for acquiring a first 3D point cloud, wherein the first 3D point cloud is a 3D point cloud of a single non-frontal biological face; the biological face correction module is used for performing correction processing on the first 3D point cloud by using a 3D graph convolution network to obtain a second 3D point cloud; the biological face symmetry module is used for performing symmetric filling processing on the second 3D point cloud to obtain a third 3D point cloud; and the biological face hole filling module performs hole filling processing on the third 3D point cloud by using a 3D graph convolution generative adversarial network to obtain a fourth 3D point cloud, wherein the fourth 3D point cloud is a hole-free frontal biological face 3D point cloud.
In this technical solution, the large-angle biological face 3D point cloud is corrected, and a frontal biological face 3D point cloud that still contains a number of holes is obtained from the corrected point cloud by preliminary symmetric filling. The preliminarily filled frontal 3D point cloud is used as a reference point cloud, so that the generated hole-free frontal 3D point cloud is similar in features to the reference point cloud, which improves the recognition accuracy of subsequent applications that use the hole-free frontal 3D point cloud.
In one possible embodiment, the biological face correction module performing correction processing on the first 3D point cloud by using the 3D graph convolution network to obtain the second 3D point cloud includes: the biological face correction module processes the first 3D point cloud with the 3D graph convolution network to obtain a first rotation parameter, and the first rotation parameter is used for correcting the first 3D point cloud.
In this technical solution, the rotation parameter corresponding to the face 3D point cloud to be corrected is obtained through the trained 3D graph convolution network, and the position information of the first 3D point cloud is rotated according to that rotation parameter. Because the rotation is a rigid transformation, the corrected second 3D point cloud is not deformed, in contrast to an affine transformation, which improves the authenticity and reliability of the corrected face 3D point cloud.
In one possible embodiment, performing the correction processing on the first 3D point cloud to obtain the second 3D point cloud includes: the biological face correction module corrects the position information of the first 3D point cloud through the first rotation parameter, and the second 3D point cloud includes the corrected position information of the first 3D point cloud.
In a possible implementation, the color information of the first 3D point cloud remains unchanged, and the second 3D point cloud includes the corrected position information of the first 3D point cloud and the color information of the first 3D point cloud corresponding to that position information.
In one possible embodiment, the biological face symmetry module symmetrically filling the second 3D point cloud to obtain the third 3D point cloud includes: the biological face symmetry module extracts a partial point cloud of the second 3D point cloud and performs symmetry processing on the partial point cloud of the second 3D point cloud to obtain the third 3D point cloud.
In this technical solution, the filled biological face 3D point cloud is obtained by symmetric filling, which guarantees the overall coordination and authenticity of the symmetrically filled 3D point cloud and, in turn, the similarity between the subsequent hole-free biological face 3D point cloud and the non-frontal biological face 3D point cloud. Compared with inferring the invisible area from the visible area based on prior knowledge, that is, guessing a complete frontal biological face from a non-frontal biological face image, the technical solution of the present application is more accurate.
In one possible implementation, the biological face hole filling module performing hole filling processing on the third 3D point cloud by using the 3D graph convolution generative adversarial network to obtain the fourth 3D point cloud includes: the acquisition unit acquires target hidden data; and the biological face hole filling module generates a target synthesized 3D point cloud according to the target hidden data and judges whether the target synthesized 3D point cloud and the third 3D point cloud are similar, so as to obtain the fourth 3D point cloud.
In this technical solution, the hole-free biological face 3D point cloud is determined by comparing the similarity between the target synthesized 3D point cloud and the preliminarily filled 3D point cloud, which guarantees the similarity between the output 3D point cloud and the non-frontal biological face 3D point cloud and improves the recognition accuracy of subsequent applications.
In one possible implementation, the biological face hole filling module generating the target synthesized 3D point cloud according to the target hidden data includes: the biological face hole filling module takes the target hidden data as the input of a 3D graph convolution generation network and outputs the target synthesized 3D point cloud, wherein the 3D graph convolution generative adversarial network includes the 3D graph convolution generation network, and the graph convolution layers of the 3D graph convolution generation network extract the features of the target hidden data.
In one possible embodiment, the biological face hole filling module judging whether the target synthesized 3D point cloud and the third 3D point cloud are similar to obtain the fourth 3D point cloud includes: the biological face hole filling module judges whether they are similar by checking whether the loss value of the loss function between the target synthesized 3D point cloud and the third 3D point cloud is less than or equal to the similarity threshold corresponding to the loss function.
In one possible embodiment, the loss function includes at least one of: an absolute distance error between the target synthesized 3D point cloud and the third 3D point cloud; a similarity error between the target synthesized 3D point cloud and the third 3D point cloud at the same layer of a biological face recognition model; a similarity error between the target synthesized 3D point cloud and the third 3D point cloud over layer features of a point set neural network; and an error between the target synthesized 3D point cloud and the third 3D point cloud at the key point locations of the biological face.
In the technical solution of the present application, these four loss functions judge the similarity between the two point clouds from the overall distribution, the contour appearance and the key facial positions of the target synthesized 3D point cloud and the third 3D point cloud, so that the similarity between the output fourth 3D point cloud and the input third 3D point cloud is effectively guaranteed in several respects and the fourth 3D point cloud is more realistic.
In one possible embodiment, the biological face hole filling module judging whether the target synthesized 3D point cloud and the third 3D point cloud are similar includes: if the loss value of the loss function is greater than the similarity threshold, the target synthesized 3D point cloud is not similar to the third 3D point cloud, and the biological face hole filling module updates the target hidden data; or, if the loss value of the loss function is less than or equal to the similarity threshold, the target synthesized 3D point cloud is similar to the third 3D point cloud, the target synthesized 3D point cloud is taken as the fourth 3D point cloud, and the output unit outputs the fourth 3D point cloud.
In one possible embodiment, updating the target hidden data includes: the biological face hole filling module updates the target hidden data by a gradient descent method.
In this technical solution, the target hidden data expresses the hidden dimensions and hidden relationships of the data, so the target synthesized 3D point cloud generated from the target hidden data reflects the attributes of the biological face. By iteratively updating the target hidden data, the target synthesized 3D point cloud generated from the updated target hidden data becomes similar to the biological face 3D point cloud that contains holes, thereby improving the accuracy of subsequent biological face recognition applications.
In a fifth aspect, a device for training a biological face alignment model is provided. The device includes an acquisition unit, a processing unit and a display unit, wherein the acquisition unit is used for acquiring first training data, the first training data including position information of a plurality of biological face 3D point clouds and a rotation parameter corresponding to each biological face 3D point cloud; and the processing unit is used for training a 3D graph convolution network with the first training data to obtain a first biological face alignment model, wherein the output of the first biological face alignment model is the first rotation parameter, and the first rotation parameter is used for performing correction processing on a non-frontal biological face 3D point cloud.
In the technical solution of the present application, the biological face alignment model obtained by training the 3D graph convolution network can output the first rotation parameter, so that the non-frontal biological face is corrected through a rigid transformation; the corrected non-frontal biological face is therefore not deformed, which guarantees the authenticity and reliability of the 3D point cloud.
In one possible embodiment, the processing unit training the 3D graph convolution network with the first training data to obtain the first biological face alignment model includes: the processing unit extracts features of the first training data through the 3D graph convolution layers of the 3D graph convolution network.
In this technical solution, because the 3D point cloud data structure is unordered, feature extraction of the 3D point cloud data can be achieved by extracting features of the training data through graph convolution layers. The depth information in the 3D point cloud data is taken into account, so that the hole-free face 3D point cloud obtained subsequently carries depth information whose authenticity is guaranteed. In addition, the amount of computation required to extract 3D point cloud features with a graph convolution layer is much smaller than with a fully connected layer.
In one possible implementation, the first training data is the normalized position information of the plurality of biological face 3D point clouds and the rotation parameter corresponding to each biological face 3D point cloud.
In this technical solution, using the normalized first training data as the model training data improves the convergence rate of the 3D graph convolution network and thus the efficiency of model training.
In a sixth aspect, a device for training a biological face alignment model is provided. The device includes: an acquisition unit, used for acquiring second training data, wherein the second training data includes a plurality of groups of biological face feature vectors and real biological face 3D point clouds; and a processing unit, used for training a 3D graph convolution generative adversarial network with the second training data to obtain a second biological face alignment model, wherein the 3D graph convolution generative adversarial network includes a 3D graph convolution generation network and a 3D graph convolution discrimination network, and the 3D graph convolution generation network in the second biological face alignment model is used for generating the target synthesized 3D point cloud.
In the technical solution of the present application, the 3D graph convolution generative adversarial network includes a 3D graph convolution generation network and a 3D graph convolution discrimination network, and the features of the second training data are extracted by graph convolution. Compared with extracting features with a fully connected layer, the amount of computation drops significantly; compared with extracting features with an ordinary convolution layer, a graph convolution layer is better suited to extracting features of a 3D point cloud, which improves the reliability of the target synthesized 3D point cloud.
In one possible embodiment, the processing unit training the second training data with the 3D graph convolution generative adversarial network includes: the processing unit inputs the plurality of groups of biological face feature vectors into the 3D graph convolution generation network to obtain 3D point clouds to be verified; and the processing unit further inputs the real biological face 3D point clouds and the 3D point clouds to be verified into the 3D graph convolution discrimination network to obtain a first probability and a second probability respectively, wherein the first probability is the probability that a 3D point cloud to be verified is real, the second probability is the probability that a real biological face 3D point cloud is real, the first probability and the second probability are used to obtain the loss function for training the 3D graph convolution discrimination network, and the first probability is used to obtain the loss function for training the 3D graph convolution generation network.
In one possible embodiment, the processing unit inputting the plurality of groups of biological face feature vectors into the 3D graph convolution generation network to obtain the 3D point clouds to be verified includes: the processing unit extracts features of the plurality of groups of biological face feature vectors through the 3D graph convolution layers of the 3D graph convolution generation network to obtain the 3D point clouds to be verified. The processing unit inputting the real biological face 3D point clouds and the 3D point clouds to be verified into the 3D graph convolution discrimination network to obtain the first probability and the second probability respectively includes: the processing unit extracts features of the real biological face 3D point clouds and the 3D point clouds to be verified through the 3D graph convolution layers of the 3D graph convolution discrimination network to obtain the first probability and the second probability.
In a seventh aspect, a computer-readable storage medium is provided that stores program instructions. When the program instructions are executed by a computer, the computer performs the biological face alignment method in the first aspect or any possible implementation of the first aspect, or the biological face alignment model training method in the second aspect or any possible implementation of the second aspect, or the biological face alignment model training method in the third aspect or any possible implementation of the third aspect.
In an eighth aspect, a computer program product comprising instructions is provided. When the instructions are executed by a computer, the computer performs the biological face alignment method in the first aspect or any possible implementation of the first aspect, or the biological face alignment model training method in the second aspect or any possible implementation of the second aspect, or the biological face alignment model training method in the third aspect or any possible implementation of the third aspect.
Drawings
FIG. 1 is a schematic diagram of the architecture of the system provided herein;
FIG. 2 is a schematic diagram of the architecture of a GAN provided herein;
FIG. 3 is a schematic diagram of a GAN-based two-dimensional face filling result provided herein;
FIG. 4 is a schematic flow chart of a biological face alignment method provided herein;
FIG. 5 is a schematic diagram of a large-angle side-face angle provided herein;
FIG. 6 is a schematic flow chart of a biological face alignment model training method provided herein;
FIG. 7 is a schematic flow chart of a biological face alignment method provided herein;
FIG. 8 is a schematic flow chart of another biological face alignment method provided herein;
FIG. 9 is a schematic flow chart of an alternative biological face symmetric filling provided herein;
FIG. 10 is a schematic flow chart of another biological face alignment model training method provided herein;
FIG. 11 is a schematic diagram of the network structure of a 3D graph convolution generative adversarial network provided herein;
FIG. 12 is a schematic flow chart of another face alignment method provided herein;
FIG. 13 is a schematic block diagram of a device provided herein;
FIG. 14 is a schematic block diagram of a processing unit provided herein;
FIG. 15 is a schematic diagram of the hardware structure of a device provided herein.
Detailed Description
The technical solutions in the embodiments of the present application will be described below with reference to the accompanying drawings.
Embodiments of the present application are applicable to biological face alignment systems, including but not limited to products based on optical biological face imaging. The biological face alignment system can be applied to various electronic devices equipped with an image acquisition apparatus (such as a camera). The electronic device may be a personal computer, a computer workstation, a smartphone, a tablet computer, a smart camera, a media consumption device, a wearable device, a set-top box, a game console, an augmented reality (AR)/virtual reality (VR) device, a vehicle-mounted terminal and the like, which is not limited by the embodiments disclosed in the present application.
It should be understood that the specific examples are provided herein only to assist those skilled in the art in better understanding the embodiments of the present application and are not intended to limit the scope of the embodiments of the present application.
It should also be understood that, in the various embodiments of the present application, the sequence numbers of the processes do not mean the execution sequence, and the execution sequence of the processes should be determined by the functions and the inherent logic of the processes, and should not constitute any limitation to the implementation process of the embodiments of the present application.
It should also be understood that the various embodiments described in this specification can be implemented individually or in combination, and the examples in this application are not limited thereto.
Unless otherwise defined, all technical and scientific terms used in the examples of this application have the same meaning as commonly understood by one of ordinary skill in the art to which this application belongs. The terminology used in the present application is for the purpose of describing particular embodiments only and is not intended to limit the scope of the present application. As used herein, the term "and/or" includes any and all combinations of one or more of the associated listed items.
For better understanding of the solution of the embodiment of the present application, a brief description is given below to a possible application scenario of the embodiment of the present application with reference to fig. 1.
As shown in fig. 1, the present embodiment provides a system architecture 100. In fig. 1, a data acquisition device 160 is used to acquire training data. For the method of biological face alignment of the embodiments of the present application, the training data may include training images or training videos.
After the training data is collected, data collection device 160 stores the training data in database 130, and training device 120 trains target model/rule 101 based on the training data maintained in database 130.
The above target model/rule 101 can be used to implement the method of biological face alignment of the embodiment of the present application. The target model/rule 101 in the embodiment of the present application may specifically be a neural network. It should be noted that, in practical applications, the training data maintained in the database 130 may not necessarily all come from the acquisition of the data acquisition device 160, and may also be received from other devices. It should be noted that, the training device 120 does not necessarily perform the training of the target model/rule 101 based on the training data maintained by the database 130, and may also obtain the training data from the cloud or other places for performing the model training.
The target model/rule 101 obtained by training with the training device 120 may be applied to different systems or devices, for example, the execution device 110 shown in fig. 1. The execution device 110 may be a terminal, such as a mobile phone terminal, a tablet computer or a notebook computer, and may also be a server or a cloud. In fig. 1, the execution device 110 is configured with an input/output (I/O) interface 112 for data interaction with an external device, and a user may input data to the I/O interface 112 through the client device 140; the input data may include a to-be-processed video or a to-be-processed image input by the client device 140.
In some embodiments, the client device 140 may be the same device as the execution device 110, for example, the client device 140 may be a terminal device as the execution device 110.
In other embodiments, the client device 140 and the execution device 110 may be different devices; for example, the client device 140 is a terminal device while the execution device 110 is a cloud, a server or the like. The client device 140 may interact with the execution device 110 through a communication network of any communication mechanism or communication standard, and the communication network may be a wide area network, a local area network, a peer-to-peer connection or the like, or any combination thereof.
The computing module 111 of the execution device 110 is configured to process according to input data (e.g., an image to be processed) received by the I/O interface 112. During the process of executing the calculation and the like by the calculation module 111 of the execution device 110, the execution device 110 may call data, codes and the like in the data storage system 150 for corresponding processing, and may also store data, instructions and the like obtained by corresponding processing in the data storage system 150, for example, store the result of the biological face alignment in the data storage system 150 for subsequent biological face recognition, expression recognition, emotion recognition and the like.
Finally, the I/O interface 112 returns the processing result, such as the biological face-aligned result obtained as described above, to the client device 140, thereby providing it to the user.
It should be noted that the training device 120 may generate corresponding target models/rules 101 for different targets or different tasks based on different training data, and the corresponding target models/rules 101 may be used to achieve the targets or complete the tasks, so as to provide the user with the required results.
In the case shown in fig. 1, the user may manually give the input data, which may be operated through an interface provided by the I/O interface 112. Alternatively, the client device 140 may automatically send the input data to the I/O interface 112, and if the client device 140 is required to automatically send the input data to obtain authorization from the user, the user may set the corresponding permissions in the client device 140. The user can view the result output by the execution device 110 at the client device 140, and the specific presentation form can be display, sound, action, and the like. The client device 140 may also serve as a data collection terminal, collecting input data of the input I/O interface 112 and output results of the output I/O interface 112 as new sample data, and storing the new sample data in the database 130. Of course, the input data inputted to the I/O interface 112 and the output result outputted from the I/O interface 112 as shown in the figure may be directly stored in the database 130 as new sample data by the I/O interface 112 without being collected by the client device 140.
It should be noted that fig. 1 is only a schematic diagram of a system architecture provided by an embodiment of the present application, and the position relationship between the devices, modules, and the like shown in the diagram does not constitute any limitation, for example, in fig. 1, the data storage system 150 is an external memory with respect to the execution device 110, and in other cases, the data storage system 150 may also be disposed in the execution device 110.
As shown in fig. 1, a target model/rule 101 is obtained by training according to a training device 120, where the target model/rule 101 may be a neural network in this embodiment, specifically, the neural network in this embodiment may be a 3D graph convolution network (3D-GCN) or other types of neural networks, and this application is not limited in this respect.
In addition, the trained target model/rule 101 in the present application may be composed of a plurality of individually trained target models/rules, and is not limited to a neural network. For example, the input of the neural network a is the input data of the client device 140, the output of the neural network a may be the input of the neural network B, and finally the output of the neural network B is finally transmitted as the output result to the client device 140 through the I/O interface 112, or stored as the input data of the biometric facial recognition in the data storage system 150.
It should be noted that, in the embodiments of the present application, the biological face alignment model training method and the biological face alignment method are described mainly by taking the face alignment model training method and the face alignment method as an example; the model training method and the face alignment flow can also be used for other biological faces, which is not limited by the embodiments of the present application.
In practice, a collected face image may be a frontal face image or a non-frontal face image, that is, a side-face image. When the non-frontal image is a large-angle side-face image, a large area of the face is invisible; if face recognition is performed directly on such an image, the loss of facial feature information may affect the recognition result. Therefore, how to convert a large-angle non-frontal face image into a frontal face image usable for face recognition has become a research hotspot.
Face alignment technology corrects a non-frontal face image by locating key points with semantic features in the face, so as to cope with pose differences such as large-angle rotation or side faces. At present, most common face alignment techniques apply an affine transformation to the non-frontal two-dimensional face image and align it to a self-defined two-dimensional template face, in which the facial positions are already defined, thereby achieving face alignment.
An affine transformation is a linear mapping from two-dimensional coordinates to two-dimensional coordinates that preserves the straightness of the two-dimensional image (straight lines remain straight after the transformation) and its parallelism (the relative positions within the two-dimensional image remain unchanged). When an affine transformation is used for face alignment, the invisible area of the face cannot be automatically completed, because the affine transformation only transforms the coordinates of the visible two-dimensional face image; this causes the non-frontal face image to be deformed after the transformation. The larger the invisible area in the face image, the more severely the obtained frontal face image is deformed, which in turn degrades subsequent applications; in face recognition, for example, a severely deformed frontal face image seriously affects the recognition result.
Therefore, the two-dimensional face alignment method based on affine transformation cannot automatically complete the invisible area, so that for a large-angle side face the face alignment result is seriously deformed, which affects subsequent applications.
To fill the invisible area of the face, there is currently a face alignment method based on a generative adversarial network (GAN). A GAN is a deep learning model; although it is an unsupervised learning model, it can also be used for semi-supervised learning, fully supervised learning and reinforcement learning.
Fig. 2 is a schematic diagram of the architecture of the GAN provided in the present application. As shown in fig. 2, a GAN produces good outputs through the mutual game learning of a generator and a discriminator. Taking picture generation as an example, the generator receives random noise z and generates from it a fake picture close to a real picture, denoted G(z); the discriminator must judge whether this fake picture is real. The discriminator receives the real picture x and the fake picture G(z) and outputs a judgment result D(x), where D(x) represents the probability that the input picture is real. The purpose of the generator is to generate fake pictures close to reality in order to deceive the discriminator, while the discriminator tries to distinguish the fake pictures generated by the generator from the real pictures as well as possible. The two are continuously optimized in a dynamic game, finally yielding a generator capable of generating images similar to real ones.
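The dynamic game described above corresponds to the standard GAN minimax objective (the usual formulation, given here for reference; the patent describes the game only informally):

$$\min_G \max_D \; \mathbb{E}_{x \sim p_{\text{data}}}\left[\log D(x)\right] + \mathbb{E}_{z \sim p_z}\left[\log\left(1 - D(G(z))\right)\right]$$

where the discriminator D is trained to maximize the objective and the generator G is trained to minimize it.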
In the GAN-based two-dimensional face completion method, a non-frontal face image is taken as input and a frontal face image is output by the generator of a trained GAN. This can indeed fill the invisible area of the face to a certain extent to obtain a frontal face image. However, for inputs outside the distribution it was trained on, a GAN does not produce stable outputs; that is, a generator facing data unlike its training data does not necessarily produce a good output, and the features of the filled area may differ too much from the original features, thereby affecting subsequent applications.
Fig. 3 is a schematic diagram of a GAN-based two-dimensional face filling result provided in the present application. The input image is the left half of a face image, as shown in fig. 3 (a). The generator in the GAN must predict the right half-face information from the left half-face information to complete the filling, but in the generated frontal face image the right half-face is not necessarily similar enough to the left half-face, as shown in fig. 3 (b).
Therefore, the frontal face image obtained by the GAN-based two-dimensional face filling method suffers from inconsistent filling.
There is also a three-dimensional face alignment method, which generally uses a three-dimensional deformable face model (3D morphable face model, 3DMM) as a prior knowledge model: a non-frontal two-dimensional face image is regressed into a reconstructed 3D face, the rotation parameters corresponding to the reconstructed 3D face are then used to rotate it into a frontal three-dimensional face, and finally the frontal three-dimensional face is projected into two dimensions to achieve face alignment. Because the three-dimensional face is estimated from a two-dimensional face image and depth information is lacking, the depth information of the reconstructed three-dimensional face is inaccurate, so the parameters of the reconstructed 3D face converge to a local optimum.
In order to solve the above problems, the present application provides a training method for a biological face alignment model, a biological face alignment method and an apparatus.
First, the following explanation is made on terms of art to which the present application relates.
1. A 3D point cloud is a set of vectors in a three-dimensional coordinate system; each point includes three-dimensional coordinates and may also include color information or reflection intensity information. The 3D point cloud in the embodiments of the present application may include only three-dimensional coordinates, or three-dimensional coordinates and color information, or three-dimensional coordinates, color information and reflection intensity information, which is not limited by the embodiments of the present application.
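As an illustration, a point cloud with color information can be stored as an N x 6 array, one point per row; this layout is only one common convention, not something mandated by the patent.

```python
import numpy as np

# Each row is one point: x, y, z coordinates followed by r, g, b color values.
point_cloud = np.array([
    [0.12, -0.05, 0.43, 0.80, 0.62, 0.55],
    [0.10, -0.02, 0.45, 0.78, 0.60, 0.54],
], dtype=np.float32)
xyz, rgb = point_cloud[:, :3], point_cloud[:, 3:]   # position and color information
```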
2. A 3D graph convolution network (3D-GCN) is a deep learning model for processing and learning the structural information of 3D point clouds; it observes and extracts the structural information of unordered 3D point clouds of arbitrary shape and size. Because 3D point cloud data is an unordered set, a convolution kernel of fixed size cannot be used. Therefore, a learnable kernel is used as the convolution kernel of the graph convolution layer in the 3D graph convolution network to extract features, and a pooling layer then retains the important features by max pooling, so that deep features of the 3D point cloud are extracted and the model size can be reduced.
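The sketch below illustrates the idea of one graph convolution layer over an unordered point cloud, followed by max pooling. It is a simplification made for illustration: the actual 3D-GCN uses learnable, deformable kernels defined over local graphs, whereas here the kernel is reduced to an ordinary weight matrix applied to k-nearest-neighbor features.

```python
import torch

def graph_conv_layer(points, feats, kernel, k=16):
    """Simplified graph convolution over a point cloud.
    points: (N, 3) coordinates, feats: (N, C_in) features, kernel: (C_in, C_out)."""
    d = torch.cdist(points, points)             # (N, N) pairwise distances
    idx = d.topk(k, largest=False).indices      # indices of the k nearest neighbors
    neigh = feats[idx]                          # (N, k, C_in) neighborhood features
    out = neigh @ kernel                        # apply the learnable kernel
    return out.max(dim=1).values                # max pooling keeps salient features
```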
A method for training the biological face alignment model and a method for biological face alignment will be described in detail with reference to fig. 4 to 12.
Fig. 4 is a schematic flow chart of a biological face alignment method provided in the present application.
As shown in fig. 4, the biological face alignment method in the embodiment of the present application may include the following steps.
S410, acquiring a first 3D point cloud, wherein the first 3D point cloud is a single 3D point cloud of the non-frontal biological face.
As a possible implementation, taking a human face as the biological face, the first 3D point cloud may be a preprocessed 3D point cloud, i.e. a single non-frontal biological face 3D point cloud. There are many possible preprocessing methods, and the embodiments of the present application do not limit them.
For example, the preprocessing may obtain the two-dimensional face image corresponding to the 3D point cloud to be preprocessed, select the coordinates of the non-frontal faces in the two-dimensional image one by one with a face detection classifier, and then obtain the corresponding face 3D point clouds one by one according to those coordinates, i.e. the first 3D point cloud, so that the first 3D point cloud contains only one non-frontal face 3D point cloud.
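As an illustration of this preprocessing, the sketch below assumes an organized point cloud aligned pixel-for-pixel with the two-dimensional image and a bounding box from an off-the-shelf face detector; the function name and box format are placeholders, not part of the application.

# Hypothetical preprocessing sketch: crop a single face's 3D points using a 2D detection box.
# Assumes the point cloud is "organized" (an H x W x 3 array aligned with the RGB image).
import numpy as np

def crop_face_point_cloud(organized_points, bbox):
    """organized_points: (H, W, 3) array; bbox: (x, y, w, h) from a 2D face detector."""
    x, y, w, h = bbox
    patch = organized_points[y:y + h, x:x + w].reshape(-1, 3)
    valid = np.isfinite(patch).all(axis=1) & (patch[:, 2] > 0)  # drop points with missing depth
    return patch[valid]          # first 3D point cloud: a single non-frontal face only

# usage with a placeholder detector output
points = np.random.rand(480, 640, 3).astype(np.float32)
first_3d_point_cloud = crop_face_point_cloud(points, (200, 120, 180, 220))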
It should be understood that, in the embodiments of the present application, a non-frontal face 3D point cloud is a large-angle side-face 3D point cloud, i.e. the included angle between the face orientation and the direction of the face acquisition device lies in [90°, 180°). This is shown in fig. 5, which is a schematic diagram of the large-angle side-face angle provided in the present application.
Fig. 5 (a) shows the spatial coordinate system, in which the x-axis points to the right of the face, the y-axis points upward, and the z-axis points forward. Fig. 5 (b) is a top view of the face and the acquisition device. The face points forward, consistent with the z-axis direction. When the face 3D point cloud acquisition device is at position a, the included angle between the direction of the acquisition device and the face orientation is 90°, and only the left half of the face can be acquired. When the acquisition device is at position b, the included angle is between 90° and 180°, and the left half of the face plus part of the right half can be acquired. When the acquisition device is at position c, the included angle is 180°, i.e. the acquired face 3D point cloud is a frontal face 3D point cloud, which is not the scenario considered in this application; face 3D point clouds acquired at position c are therefore not considered.
It should be noted that the present application does not limit whether the large-angle side face is a left face or a right face. That the face acquired at an included angle of 90° in fig. 5 is a left face is only an example; the face acquired at a 90° included angle may also be a right face.
And S420, performing correction processing on the first 3D point cloud by using a 3D graph convolution network to obtain a second 3D point cloud.
It should be understood that the second 3D point cloud is the corrected 3D point cloud of the biological face side, obtained by rotating the position information of the first 3D point cloud.
And S430, performing symmetrical filling processing on the second 3D point cloud to obtain a third 3D point cloud.
It should be understood that the symmetrically filled third 3D point cloud is only a preliminarily filled point cloud; it may still contain holes, which may be point cloud data originally missing from the first 3D point cloud or data lost during the symmetric filling, and this is not limited in the embodiments of the present application.
And S440, performing hole filling processing on the third 3D point cloud by using a 3D graph convolution generative adversarial network to obtain a fourth 3D point cloud, wherein the fourth 3D point cloud is a 3D point cloud of the frontal biological face without broken holes.
It should be understood that the 3D graph convolution generative adversarial network includes a 3D graph convolution generation network and a 3D graph convolution decision network. Performing hole filling processing on the third 3D point cloud means generating, through the 3D graph convolution generation network, a target synthetic point cloud that is sufficiently similar to the third 3D point cloud; the fourth 3D point cloud is this target synthetic point cloud.
The large-angle side-face 3D point cloud of the biological face is corrected, and the corrected 3D point cloud is preliminarily and symmetrically filled to obtain a frontal biological face 3D point cloud that still contains a number of broken holes. The preliminarily filled frontal 3D point cloud is then used as a comparison point cloud, so that the generated hole-free frontal 3D point cloud is similar to it in its features, which improves the recognition accuracy of subsequent applications that use the hole-free frontal 3D point cloud.
The biological face alignment process in fig. 4 will be described in detail with reference to fig. 6 to 12, where fig. 6, 10 and 11 show the specific training processes of the biological face alignment models used in the biological face alignment process of the present application.
Fig. 6 is a schematic flow chart of a training method of a biological face alignment model provided in the present application, where the first biological face alignment model is used to estimate rotation parameters required for 3D point cloud correction of different biological face poses, and a biological face is described by taking a human face as an example.
S601, acquiring first training data, wherein the first training data comprises position information of a plurality of biological face 3D point clouds and rotation parameters corresponding to each biological face 3D point cloud.
It should be understood that the position information of the 3D point cloud is the position information of the different poses of the biological face 3D point clouds in the three-dimensional direction, that is, the position information of each point cloud in the x, y, and z directions of the spatial coordinates.
It should be understood that the rotation parameter corresponding to each biological face 3D point cloud may be the Euler angles required to rotate each differently posed 3D point cloud to the front, namely the rotation about the x-axis (pitch angle), the rotation about the y-axis (yaw angle), and the rotation about the z-axis (roll angle).
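For reference, the three Euler-angle parameters can be composed into a single rotation matrix as the product of three axis rotations. The sketch below uses an assumed x-then-y-then-z rotation order; the application itself does not prescribe a particular convention.

# Illustrative composition of the pitch/yaw/roll rotation parameters into one rotation matrix.
# The rotation order (x, then y, then z) is an assumed convention, not mandated by the application.
import numpy as np

def rotation_from_euler(pitch, yaw, roll):
    cx, sx = np.cos(pitch), np.sin(pitch)   # rotation about the x axis (pitch)
    cy, sy = np.cos(yaw), np.sin(yaw)       # rotation about the y axis (yaw)
    cz, sz = np.cos(roll), np.sin(roll)     # rotation about the z axis (roll)
    rx = np.array([[1, 0, 0], [0, cx, -sx], [0, sx, cx]])
    ry = np.array([[cy, 0, sy], [0, 1, 0], [-sy, 0, cy]])
    rz = np.array([[cz, -sz, 0], [sz, cz, 0], [0, 0, 1]])
    return rz @ ry @ rx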
As a possible implementation manner, the plurality of point cloud location information in the first training data may be a plurality of point cloud location information generated in the 3DMM, and each point cloud location information generated by the 3DMM has a corresponding rotation parameter. For example, the 3D point clouds of the face images corresponding to different angles in different face poses are corrected, and each 3D point cloud has its own rotation parameter.
In the solution of the present application, the 3DMM provides three-dimensional point cloud data reconstructed from two-dimensional images, so first training data generated by the 3DMM can be large in volume and easy to obtain. In addition, its correctness can be guaranteed, especially the correctness of the rotation parameter corresponding to each point cloud. Therefore, using first training data generated by the 3DMM as the training data makes the resulting first face alignment model more accurate.
As a possible implementation manner, the plurality of point cloud location information and the rotation parameter corresponding to each point cloud location information may also be acquired by a human face image data acquisition device (which may be the data acquisition device 160 in fig. 1), where the human face image data acquisition device may be a ToF camera, an RGB-D depth camera, or another depth camera, which is not limited herein. The present application may also have other manners of obtaining the first training data, which is not limited in this embodiment of the present application.
Although acquisition with the face image data acquisition device is complex and time-consuming, the first training data acquired in this way is more realistic, and a first face alignment model trained on it is therefore more faithful to real data.
S602, training first training data by using a 3D graph convolution network to obtain a first biological face alignment model, wherein the output of the first biological face alignment model is a first rotation parameter, and the first rotation parameter is used for performing correction processing on non-frontal biological face 3D point cloud.
As a possible implementation, the first training data is normalized, limiting its values to [0, 1] or [-1, 1].
In the embodiment of the application, the normalized first training data is used as the model training data, so that the convergence speed of the 3D graph convolution network can be increased, and the efficiency of model training is further improved.
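A minimal sketch of such normalization is given below; centering on the centroid and scaling by the largest extent so that coordinates fall in [-1, 1] is one common choice and is only an assumption here.

# Minimal sketch of normalizing point-cloud coordinates to [-1, 1] before training.
# Centering on the centroid and scaling by the largest extent is an assumed choice.
import numpy as np

def normalize_point_cloud(points):
    """points: (N, 3) array of xyz coordinates."""
    centered = points - points.mean(axis=0)
    scale = np.abs(centered).max()
    return centered / scale if scale > 0 else centered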
As a possible implementation, the first training data is trained using a 3D graph convolution network, in which the graph convolution layers extract features from the first training data, so as to obtain a first biological face alignment model whose output is the rotation parameters.
In the embodiments of the present application, because the 3D point cloud data structure is unordered, feature extraction from the training data is performed by graph convolution layers. This handles 3D point cloud data directly and takes the depth information into account, so that the subsequently obtained hole-free frontal face 3D point cloud carries depth information whose reliability is guaranteed. In addition, the amount of computation required to extract 3D point cloud features with graph convolution layers is much smaller than with fully connected layers.
It should be noted that, although the above training method mainly aims at the large-angle human face side face, the method can also be applied to the face of other living beings or other objects that need to be corrected, and at this time, the first training data can be changed into the 3D point cloud of the face of other living beings and the corresponding rotation parameters thereof. The embodiments of the present application do not limit this.
S420 is executed according to the trained first biological face alignment model, i.e. the trained 3D graph convolution network. Fig. 7 is a schematic flow chart of a biological face alignment method provided in the present application; specifically, it expands S420 in detail, and the biological face is described by taking a human face as an example.
S421, processing the first 3D point cloud by using the 3D graph convolution network to obtain a first rotation parameter.
It should be understood that the 3D graph convolution network is a 3D graph convolution network trained according to the above-mentioned method, and when a non-frontal face 3D point cloud is input, that is, when a first 3D point cloud is input, a first rotation parameter is output, and the first rotation parameter is used to perform rotation transformation on the position information of the first 3D point cloud to obtain the position information of the corrected 3D point cloud.
S422, according to the first rotation parameter, obtaining the position information of the corrected first 3D point cloud, and further obtaining a second 3D point cloud.
As a possible implementation manner, when the first 3D point cloud only includes the position information, the position information of the first 3D point cloud obtained by performing rotation transformation according to the first rotation parameter is the second 3D point cloud. I.e., the second 3D point cloud includes only the three-dimensional position information for each point cloud.
As a possible implementation manner, when the first 3D point cloud includes the position information and other information, for example, the first 3D point cloud includes each position information and color information, the position information of the first 3D point cloud is subjected to rotation transformation by the first rotation parameter, and the color information of the first 3D point cloud remains unchanged. The second 3D point cloud comprises the corrected position information of the first 3D point cloud and the color information of the first 3D point cloud corresponding to the position information.
It should be noted that the position information of each point cloud in the first 3D point cloud corresponds to other information one to one, and the other information in this embodiment may also be reflection intensity information or acquired infrared information, and the type of the other information is not limited in this application.
According to the solution of the embodiments of the present application, the rotation parameters corresponding to a face 3D point cloud that needs correction can be obtained through the trained 3D graph convolution network, and the position information of the first 3D point cloud is rotated according to these parameters. Because this rotation is a rigid transformation, unlike an affine transformation, the corrected second 3D point cloud is not deformed, which improves the reliability of the corrected face 3D point cloud.
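The following sketch illustrates S421-S422 under the assumption that the first rotation parameter is given as three Euler angles: only the xyz columns are rotated rigidly, and any color columns are carried over unchanged. The use of SciPy's rotation helper and the "xyz" angle order are assumptions of this sketch.

# Hedged sketch of S421-S422: apply the predicted rotation parameters as a rigid transform.
# Only the xyz columns are rotated; any per-point color columns are carried over unchanged.
import numpy as np
from scipy.spatial.transform import Rotation

def rectify_point_cloud(cloud, pitch, yaw, roll):
    """cloud: (N, 3) xyz or (N, 6) xyz + rgb; angles in radians ('xyz' order is assumed)."""
    rot = Rotation.from_euler("xyz", [pitch, yaw, roll]).as_matrix()
    out = cloud.copy()
    out[:, :3] = cloud[:, :3] @ rot.T     # rigid rotation: no shear, no scaling, no deformation
    return out                            # columns 3 onward (e.g. color) are unchanged

# usage: rotate a side-face cloud by predicted first rotation parameters
second = rectify_point_cloud(np.random.rand(5000, 6), 0.0, np.deg2rad(45.0), 0.0)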
S430 will be described in detail with reference to fig. 8 and 9, where fig. 8 is a schematic flow chart of another biological face alignment method provided in the present application, and a biological face is described as an example.
S431, extracting a partial point cloud of the second 3D point cloud.
As a possible implementation, a plane x = N associated with the three-dimensional coordinate system of the second 3D point cloud is obtained. The value of N depends on where the coordinate origin of the second 3D point cloud lies: for example, when the origin lies at the center of the second 3D point cloud, N = 0. When it does not, the value of N may be determined from the relative distance in the x direction between the origin and the center position; alternatively, the coordinates may be converted so that the center of the cloud coincides with the origin, giving a coordinate-converted second 3D point cloud with N = 0.
As a possible implementation (mode 1), the partial point cloud of the second 3D point cloud is whichever of the two parts separated by the plane x = N contains more points, i.e. the side of the face with complete facial features.
In this embodiment, extracting the partial point cloud of the second 3D point cloud discards the incomplete side of the face, which ensures the symmetry and harmony of the face after the subsequent symmetric filling.
As a possible implementation (mode 2), the two end faces of the second 3D point cloud in the x direction are obtained: the left end face is x = x1 and the right end face is x = x2, with x2 > x1. The plane through the coordinate center of the second 3D point cloud is x = N, where x1 <= N < x2 or x1 < N <= x2. The partial point cloud of the second 3D point cloud is then the points between x = 2N - x1 and x = x2, or between x = x1 and x = 2N - x2.
In this embodiment, the acquired partial point cloud of the second 3D point cloud trims away the incomplete portion of the left or right half face while retaining the original 3D point cloud of the remaining half face, which helps ensure the authenticity of the subsequently completed symmetric face.
S432, performing symmetric processing on the partial point cloud of the second 3D point cloud to obtain a third 3D point cloud.
As a possible implementation, the partial point cloud of the second 3D point cloud extracted in S431 is mirrored about the plane x = N to obtain the preliminarily filled face 3D point cloud, i.e. the third 3D point cloud.
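A minimal sketch of mode 1 of this symmetric filling is shown below: the half with more points relative to the plane x = N is kept and mirrored across that plane. The plane position and array layout are assumptions of the sketch.

# Hypothetical sketch of mode 1 symmetric filling: keep the half with more points
# relative to the plane x = N and mirror it across that plane.
import numpy as np

def symmetric_fill(points, n=0.0):
    """points: (N, 3) rectified cloud (second 3D point cloud); n: x position of the symmetry plane."""
    left, right = points[points[:, 0] < n], points[points[:, 0] >= n]
    kept = left if len(left) >= len(right) else right       # side with complete facial features
    mirrored = kept.copy()
    mirrored[:, 0] = 2 * n - mirrored[:, 0]                  # reflect across the plane x = n
    return np.vstack([kept, mirrored])                       # preliminarily filled third 3D point cloud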
The symmetric face filling method is described in detail below with reference to fig. 9, which is a schematic flow chart of another biological face symmetric filling method of the present application, with a human face as the example of a biological face. It should be noted that fig. 9 illustrates the symmetric completion flow with two-dimensional images; the present application actually uses face 3D point clouds rather than two-dimensional images, and the two-dimensional images are used only to make the symmetric completion method easy to visualize.
Fig. 9 (a), (b), and (c) are schematic diagrams of the symmetric filling flow corresponding to mode 1. Fig. 9 (a) is the corrected first 3D point cloud, i.e. the second 3D point cloud. The plane x = N through the center of the second 3D point cloud is obtained; the extracted part of the second 3D point cloud is the right half face shown in fig. 9 (b), and the incomplete left half face is discarded. The right half face in fig. 9 (b) is then mirrored about the plane x = N to obtain the preliminarily filled face 3D point cloud shown in fig. 9 (c).
Fig. 9 (e), (f), and (g) are schematic diagrams of the symmetric filling flow corresponding to mode 2, where fig. 9 (e), like fig. 9 (a), is the second 3D point cloud. The plane x = N through the center, the left end face x = x1, and the right end face x = x2 of the second 3D point cloud are obtained, and the section containing the partial point cloud of the second 3D point cloud, lying between x = 2N - x1 and x = x2, is extracted, as shown in fig. 9 (f). This part is then mirrored about the plane x = N to obtain the preliminarily filled face 3D point cloud, as shown in fig. 9 (g).
It should be noted that the preliminarily filled face 3D point cloud obtained here is not yet fully usable by subsequent applications: 3D points may still be missing at some key positions, so the third 3D point cloud must subsequently undergo hole filling.
The hole filling of the preliminarily filled 3D point cloud is described in detail below with reference to fig. 10 to 12, where fig. 10 and 11 describe the specific training procedure of another biological face alignment model, and the biological face is described by taking a human face as an example.
Fig. 10 is a flowchart illustrating a training method of another biological face alignment model provided in the present application, where a second biological face alignment model is used to generate a target synthetic 3D point cloud.
S1001, second training data are obtained, and the second training data comprise a plurality of groups of biological facial feature vectors and real human face 3D point clouds.
As a possible implementation, with a human face as the biological face, the multiple sets of face feature vectors may be multiple sets of multi-dimensional face feature values. These feature values can control various attributes of the face output by the generation network in the second face alignment model, such as face size, skin color, and gender, which is not limited in this application.
As a possible implementation, the real face 3D point cloud may be obtained from a Basel face model (BFM). The BFM generates many realistic faces based on the 3DMM technique; M face 3D point clouds are randomly selected from the BFM, a new face 3D point cloud is obtained by weighted averaging, and this new point cloud is used as the real face 3D point cloud in the second training data.
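The sketch below illustrates one possible form of this weighted averaging, assuming that all BFM samples share the same vertex ordering, as 3DMM-derived meshes normally do; the random convex weights are an assumption, not a requirement of the application.

# Illustrative sketch of building one "real" face cloud as a weighted average of M BFM samples.
# Assumes all samples share the same vertex ordering (corresponding points).
import numpy as np

def weighted_average_face(bfm_samples, weights=None):
    """bfm_samples: (M, N, 3) array of M face point clouds with corresponding vertices."""
    m = len(bfm_samples)
    if weights is None:
        weights = np.random.dirichlet(np.ones(m))        # random convex weights summing to 1
    return np.tensordot(weights, bfm_samples, axes=1)    # (N, 3) real face 3D point cloud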
S1002, training the second training data according to a 3D graph convolution generative adversarial network to obtain a second biological face alignment model, wherein the 3D graph convolution generative adversarial network includes a 3D graph convolution generation network and a 3D graph convolution decision network, and the 3D graph convolution generation network in the second biological face alignment model is used to generate a target synthetic 3D point cloud.
It should be appreciated that the primary purpose of training the 3D graph convolution generative adversarial network (3D GCN-GAN) is to obtain a 3D graph convolution generation network for generating the target synthetic 3D point cloud.
It should be noted that, although the above training method mainly generates a 3D point cloud of a human face, the same method can be applied to generate a 3D point cloud of the face of another living being, and at this time, the second training data may be changed into feature vectors of the face of another living being and a real 3D point cloud of another living being. The embodiments of the present application do not limit this.
Fig. 11 is a schematic network structure diagram of the 3D graph convolution generative adversarial network provided in the present application, described by taking a human face as an example.
As shown in fig. 11, multiple sets of face feature vectors are first used as the input of the 3D graph convolution generation network, and features are extracted through its graph convolution layers, thereby obtaining a face 3D point cloud G(z)' to be verified.
Then, the face 3D point cloud G(z)' to be verified and the real face 3D point cloud are used as the input of the 3D graph convolution decision network, which judges whether they are similar and outputs a first probability and a second probability, where the first probability D(G(z)') is the probability that the face 3D point cloud to be verified is real and the second probability D(x') is the probability that the real face 3D point cloud is real. The 3D graph convolution decision network also extracts features through graph convolution layers.
As a possible implementation, the 3D graph convolution decision network is trained according to a discriminator loss function, which is related to the first probability and the second probability; the 3D graph convolution generation network is trained according to a generator loss function, which is related to the first probability; and the 3D graph convolution decision network and the 3D graph convolution generation network are cross-trained.
Illustratively, the parameters of the 3D graph convolution decision network are optimized according to the discriminator loss function to train a first 3D graph convolution decision network; the parameters of the 3D graph convolution generation network are then optimized according to the generator loss function related to the updated first probability to train a first 3D graph convolution generation network. The final trained 3D graph convolution generative adversarial network is obtained by this cross-training.
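The sketch below illustrates such cross-training using the standard binary cross-entropy GAN losses as an assumed concrete form of the generator and discriminator loss functions; gen, disc and the two optimizers are assumed externally defined modules, and the discriminator is assumed to output a probability.

# Hedged sketch of the cross-training in fig. 11; binary cross-entropy losses are an assumption.
import torch
import torch.nn.functional as F

def train_step(gen, disc, opt_g, opt_d, real_cloud, z):
    # --- discriminator step: D(x') should be close to 1, D(G(z)') close to 0 ---
    d_real = disc(real_cloud)                      # second probability D(x')
    d_fake = disc(gen(z).detach())                 # first probability D(G(z)'), generator frozen
    d_loss = (F.binary_cross_entropy(d_real, torch.ones_like(d_real))
              + F.binary_cross_entropy(d_fake, torch.zeros_like(d_fake)))
    opt_d.zero_grad()
    d_loss.backward()
    opt_d.step()

    # --- generator step: push D(G(z)') towards 1 ---
    d_fake = disc(gen(z))
    g_loss = F.binary_cross_entropy(d_fake, torch.ones_like(d_fake))
    opt_g.zero_grad()
    g_loss.backward()
    opt_g.step()
    return d_loss.item(), g_loss.item()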
It should be understood that the trained 3D graph convolution generative adversarial network is the second biological face alignment model, and its 3D graph convolution generation network model can be used to generate a biological face 3D point cloud; for example, the trained generation network model can generate a face 3D point cloud from face feature vectors.
Extracting the features of the second training data by graph convolution significantly reduces the amount of computation compared with feature extraction by fully connected layers, and graph convolution layers are also better suited than ordinary convolution layers to extracting features of 3D point clouds.
S440 is performed according to the trained second biological face alignment model, i.e. the trained 3D graph convolution generation network in the 3D graph convolution generative adversarial network. Fig. 12 is a schematic flow chart of another face alignment method provided in the present application; specifically, it describes S440 in detail, and the biological face is described by taking a human face as an example.
S441, obtaining target hidden data, and generating a target synthetic 3D point cloud by using the 3D graph convolution generation network model.
It should be understood that, in the embodiments of the present application, the target hidden data (a vector code) is multiple sets of face feature vectors, and the target synthetic 3D point cloud is generated from the target hidden data by the trained 3D graph convolution generation network model. The target hidden data expresses hidden dimensions and controls the latent relations of the data, so the target synthetic 3D point cloud generated from it can reflect the attributes of the face. The target hidden data may also be referred to as a target latent variable or a target latent code.
As a possible implementation, the target hidden data is randomly initialized. The initialization may simply generate random data with the same dimensions as the target hidden data, or it may screen out better target hidden data through a preliminary estimate, so as to reduce the number of optimization iterations and more quickly obtain the final target hidden data used to generate a target synthetic 3D point cloud similar to the third 3D point cloud.
For example, the target hidden data may be 18 sets of 512-dimensional face feature vectors for generating the target synthesized 3D point cloud, and the embodiment of the present application does not limit the amount of the face feature vectors.
S442, determining whether the target synthesized 3D point cloud is similar to the third 3D point cloud to obtain a fourth 3D point cloud.
As a possible implementation, whether the target synthetic 3D point cloud and the third 3D point cloud are similar is determined by checking whether the loss function value between them is less than or equal to a similarity threshold; a target synthetic 3D point cloud that is similar to the third 3D point cloud is output as the fourth 3D point cloud, i.e. the fourth 3D point cloud and the third 3D point cloud are similar point clouds.
It should be noted that different loss functions may use different similarity thresholds. The similarity threshold can be chosen according to how similar the target synthetic 3D point cloud and the third 3D point cloud are required to be: the smaller the threshold, the more similar the target synthetic 3D point cloud taken as the fourth 3D point cloud is to the third 3D point cloud. The similarity threshold may be a specific value or an interval, which is not limited in the embodiments of the present application.
As a possible implementation, the loss function between the target synthesized 3D point cloud and the third 3D point cloud may be at least one of the following loss functions: the first loss function is an absolute error of a distance between the target synthesized 3D point cloud and the third 3D point cloud; the second loss function is a similarity error between the target synthesized 3D point cloud and the third 3D point cloud in each layer of the same face recognition model; the third loss function is a similarity error between the target synthesized 3D point cloud and the layer characteristics of the third 3D point cloud in the point set neural network; and the fourth loss function is the error of the target synthesized 3D point cloud and the third 3D point cloud at the key point of the human face.
It should be noted that the first loss function is used to limit the distribution similarity between the target synthesized 3D point cloud and the third 3D point cloud, that is, the overall distribution between the two point clouds is controlled by the first loss function.
It should be noted that the second loss function is used to limit the contour similarity between the target synthetic 3D point cloud and the third 3D point cloud. Specifically, the target synthetic 3D point cloud and the third 3D point cloud are first fed into the same face recognition model. The feature F1 of the target synthetic 3D point cloud at the i-th layer of the face recognition model and the feature F2 of the third 3D point cloud at the same layer are then extracted. Finally, the similarity error between F1 and F2 is used to judge whether the two point clouds are similar and decide the next step. It should be understood that the embodiments of the present application may compare features of a single layer or of multiple layers, which is not limited here.
It should be noted that the third loss function plays the same role as the second loss function; the difference is that it is obtained by feeding the target synthetic 3D point cloud and the third 3D point cloud into the same point set neural network, for example the deep learning model PointNet++, and taking the similarity error between the features of the two 3D point clouds at the same layer of PointNet++. PointNet++ is mainly used for point cloud classification, point cloud segmentation, and similar tasks.
It should be noted that the fourth loss function is used to limit an error between the target synthesized 3D point cloud and the key position of the face of the third 3D point cloud, that is, the fourth loss function controls the error between the key positions of the face of the target synthesized 3D point cloud and the key position of the face of the third 3D point cloud to be within a certain range.
It should be understood that the above-mentioned loss function may be combined in any way to determine whether the target synthesized 3D point cloud and the third 3D point cloud are similar, which is not limited in this application.
According to the solution of the embodiments of the present application, the similarity between the target synthetic 3D point cloud and the third 3D point cloud is judged by the above four loss functions in terms of overall distribution, contour appearance, and key facial positions, which effectively ensures in several respects that the output fourth 3D point cloud is similar to the input third 3D point cloud and is therefore more realistic.
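The sketch below illustrates the similarity test using only the first loss (absolute distance error) and the fourth loss (key-point error); the two feature-based losses are omitted for brevity, and the corresponding point order between the two clouds, the weights, and the threshold value are assumptions of the sketch.

# Hedged sketch of the similarity test in S442 using the first and fourth loss functions only.
import numpy as np

def similarity_loss(target_cloud, third_cloud, key_idx, w_dist=1.0, w_key=1.0):
    """Both clouds are (N, 3) with corresponding point order (an assumption of this sketch)."""
    dist_err = np.abs(target_cloud - third_cloud).mean()              # first loss: distance absolute error
    key_err = np.linalg.norm(
        target_cloud[key_idx] - third_cloud[key_idx], axis=1).mean()  # fourth loss: key-point error
    return w_dist * dist_err + w_key * key_err

def is_similar(target_cloud, third_cloud, key_idx, threshold=0.05):
    # S443/S444 decision: similar if the combined loss is at or below the similarity threshold
    return similarity_loss(target_cloud, third_cloud, key_idx) <= threshold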
S443, if the loss value corresponding to the loss function is less than or equal to the similarity threshold, the target synthetic 3D point cloud is similar to the third 3D point cloud, and the fourth 3D point cloud is the target synthetic 3D point cloud generated from the target hidden data.
It should be understood that the loss function may be a combination of any of the above loss functions, and each loss function may have a respective similarity threshold, which is not limited in the embodiments of the present application.
And S444, if the loss value corresponding to the loss function is larger than the similarity threshold value, the target synthesized 3D point cloud is not similar to the third 3D point cloud, and the target hidden data is updated.
As a possible implementation, the target hidden data is updated by gradient descent, so that the similarity error between the target synthetic 3D point cloud generated from the updated target hidden data and the third 3D point cloud decreases.
As a possible implementation manner, the updated target hidden data is input into the 3D graph convolution generation network model as new target hidden data to obtain a new target synthesized 3D point cloud (S441), and then the similarity between the new target synthesized 3D point cloud and the third 3D point cloud is continuously compared (S442). When the target synthesized 3D point cloud and the third 3D point cloud are judged to be dissimilar, the target hidden data is continuously updated (S444). S441, S442, and S444 are cyclically executed until the updated target synthesized 3D point cloud and the third 3D point cloud are judged to be similar, S443 is executed to obtain a fourth 3D point cloud, and the fourth 3D point cloud is output.
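The loop of S441, S442 and S444 can be sketched as follows, with the generator frozen and only the target hidden data updated by gradient descent; gen, loss_fn, the optimizer choice and the threshold are assumptions of the sketch.

# Hedged sketch of the S441/S442/S444 loop: the generator is frozen and only the
# target hidden data z is updated by gradient descent until the loss meets the threshold.
import torch

def optimize_hidden_data(gen, loss_fn, third_cloud, z_init, threshold=0.05, steps=500, lr=0.01):
    z = z_init.clone().requires_grad_(True)            # e.g. 18 sets of 512-dimensional vectors
    opt = torch.optim.Adam([z], lr=lr)
    for _ in range(steps):
        loss = loss_fn(gen(z), third_cloud)            # S442: compare target synthetic cloud with third cloud
        if loss.item() <= threshold:                   # S443: similar enough, stop
            break
        opt.zero_grad()
        loss.backward()
        opt.step()                                     # S444: update the target hidden data
    return gen(z).detach()                             # fourth 3D point cloud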
In the solution of the embodiments of the present application, the target hidden data is updated iteratively under the control of the similarity between the target synthetic 3D point cloud and the broken-hole frontal biological face 3D point cloud, so that the target synthetic 3D point cloud obtained from the updated target hidden data becomes sufficiently similar to the frontal biological face 3D point cloud that needs hole filling, and the final target synthetic 3D point cloud is output as the hole-filled frontal biological face 3D point cloud. In this way, the output frontal biological face 3D point cloud is guaranteed to be free of broken holes and to be harmonious overall, which facilitates its subsequent applications.
The embodiments of the biological face alignment model training method and the biological face alignment method in the present application are described in detail above with reference to fig. 4 to 12, and the embodiments of the biological face alignment model training apparatus and the biological face alignment apparatus are described below with reference to fig. 13 to 15. It should be understood that the apparatus embodiments and the method embodiments correspond to each other, and in order to avoid redundancy, the detailed description may refer to the method embodiments and will not be repeated herein.
Fig. 13 is a schematic block diagram of an apparatus provided herein. The apparatus 1300 shown in fig. 13 includes an obtaining unit 1310 and a processing unit 1320, and optionally, the apparatus may further include an output unit 1330.
In one implementation, the apparatus 1300 may serve as a biological face alignment apparatus, in which case the apparatus 1300 may include an acquisition unit 1310, a processing unit 1320, and an output unit 1330.
As shown in fig. 13, the biological face alignment apparatus includes: an obtaining unit 1310 configured to obtain a first 3D point cloud, where the first 3D point cloud is a single non-frontal biological face 3D point cloud; and a processing unit 1320 configured to obtain a fourth 3D point cloud according to the first 3D point cloud, where the fourth 3D point cloud is a 3D point cloud of the frontal biological face without broken holes. Optionally, the output unit 1330 outputs the fourth 3D point cloud.
Specifically, as shown in fig. 14, which is a schematic structural block diagram of the processing unit provided in the present application, the processing unit 1320 may include a biological face correction module 1321, a biological face symmetry module 1322, and a biological face hole filling module 1323. Specifically: the biological face correction module 1321 performs correction processing on the first 3D point cloud by using the 3D graph convolution network to obtain the second 3D point cloud; the biological face symmetry module 1322 performs symmetric filling processing on the second 3D point cloud to obtain the third 3D point cloud; and the biological face hole filling module 1323 performs hole filling processing on the third 3D point cloud by using the 3D graph convolution generative adversarial network to obtain the fourth 3D point cloud, which is the 3D point cloud of the frontal biological face without broken holes. For the detailed process, reference may be made to the method embodiments, which are not repeated here.
The modules are divided logically from functional points of view, and the modules are not limited to being independent hardware units.
In the technical scheme, the 3D point cloud of the large-angle biological face is corrected, and according to the corrected 3D point cloud, the 3D point cloud of the front of the biological face with a plurality of broken holes is obtained through preliminary symmetrical filling. And taking the preliminarily supplemented 3D point cloud of the front face as a comparison point cloud, so that the generated 3D point cloud of the non-broken-hole front face image is similar to the comparison point cloud in characteristics, and the identification accuracy of the subsequent application of the non-broken-hole front face 3D point cloud is improved.
In one implementation, the apparatus 1300 may be a biological face alignment model training apparatus, and in this case, the apparatus 1300 may include an obtaining unit 1310 and a processing unit 1320.
As shown in fig. 13, the training apparatus for a biological face alignment model includes: an obtaining unit 1310 configured to obtain first training data, where the first training data includes position information of a plurality of biological face 3D point clouds and a rotation parameter corresponding to each biological face 3D point cloud; the processing unit 1320, using the 3D graph convolution network to train the first training data to obtain a first biological face alignment model, where an output of the first biological face alignment model is a first rotation parameter, and the first rotation parameter is used to perform a correction process on the non-frontal biological face 3D point cloud.
According to the above technical solution, the first biological face alignment model obtained through 3D graph convolution network training can output the first rotation parameter, so that the non-frontal biological face is corrected through a rigid transformation; the corrected non-frontal biological face is therefore free of deformation, which guarantees the reliability of the 3D point cloud.
In one implementation, the apparatus 1300 may serve as another biological face alignment model training apparatus, and in this case, the apparatus 1300 may include an obtaining unit 1310 and a processing unit 1320.
As shown in fig. 13, the training apparatus for a biological face alignment model includes: an obtaining unit 1310 configured to obtain second training data, where the second training data includes multiple sets of biological facial feature vectors and a real biological face 3D point cloud; and the processing unit 1320, which trains the second training data according to a 3D graph convolution generative adversarial network to obtain a second biological face alignment model, where the 3D graph convolution generative adversarial network includes a 3D graph convolution generation network and a 3D graph convolution decision network, and the 3D graph convolution generation network in the second biological face alignment model is used to generate a target synthetic 3D point cloud.
In the technical solution of the present application, the 3D graph convolution generative adversarial network includes a 3D graph convolution generation network and a 3D graph convolution decision network. Extracting the features of the second training data by graph convolution significantly reduces the amount of computation compared with fully connected layers, and graph convolution layers are better suited than ordinary convolution layers to extracting features of 3D point clouds, thereby improving the reliability of the target synthetic 3D point cloud.
Fig. 15 is a schematic hardware structure diagram of an apparatus provided in the present application. The apparatus 1500 shown in fig. 15 (which apparatus 1500 may specifically be a computer device) includes a memory 1510, a processor 1520, a communication interface 1530, and a bus 1540. The memory 1510, the processor 1520, and the communication interface 1530 are communicatively connected to each other via a bus 1540.
The memory 1510 may be a read-only memory (ROM), a static storage device, a dynamic storage device, or a random access memory (RAM). The memory 1510 may store a program, and when the program stored in the memory 1510 is executed by the processor 1520, the processor 1520 and the communication interface 1530 are used to perform the steps of the biological face alignment model training method and the biological face alignment method of the embodiments of the present application.
The processor 1520 may adopt a general-purpose Central Processing Unit (CPU), a microprocessor, an Application Specific Integrated Circuit (ASIC), a Graphics Processing Unit (GPU) or one or more integrated circuits, and is configured to execute related programs to implement functions required to be executed by modules in the biological face alignment apparatus and the biological face alignment model training apparatus according to the embodiment of the present application, or to execute the biological face alignment method and the biological face alignment model training method according to the embodiment of the present application.
The processor 1520 may also be an integrated circuit chip having signal processing capabilities. In implementation, the steps of the biological face alignment method of the present application may be accomplished by hardware integrated logic circuits or software instructions in the processor 1520. The processor 1520 may also be a general purpose processor, a digital signal processor (DSP), an application specific integrated circuit (ASIC), a field programmable gate array (FPGA) or other programmable logic device, discrete gate or transistor logic, or discrete hardware components, and may implement or perform the methods, steps, and logic blocks disclosed in the embodiments of the present application. A general purpose processor may be a microprocessor, or the processor may be any conventional processor. The steps of the methods disclosed in connection with the embodiments of the present application may be implemented directly by a hardware decoding processor, or by a combination of hardware and software modules in the decoding processor. The software module may be located in a storage medium well known in the art, such as RAM, flash memory, ROM, PROM or EPROM, or registers. The storage medium is located in the memory 1510, and the processor 1520 reads the information in the memory 1510 and, in combination with its hardware, performs the functions required of the modules in the biological face alignment apparatus and the biological face alignment model training apparatus of the embodiments of the present application, or performs the biological face alignment method and the biological face alignment model training method of the embodiments of the present application.
Communication interface 1530 enables communication between the apparatus 1500 and other devices or communication networks using transceiver equipment such as, but not limited to, a transceiver. For example, input data may be acquired through the communication interface 1530.
Bus 1540 may include a pathway to transfer information between components of apparatus 1500 (e.g., memory 1510, processor 1520, communication interface 1530).
It should be noted that although the apparatus 1500 shown in fig. 15 shows only the memory 1510, the processor 1520, the communication interface 1530 and the bus 1540, in a specific implementation, those skilled in the art will appreciate that the apparatus 1500 also includes other devices necessary for normal operation. Also, those skilled in the art will appreciate that the apparatus 1500 may also include hardware components for performing other additional functions, according to particular needs. Furthermore, those skilled in the art will appreciate that apparatus 1500 may also include only those components necessary to implement embodiments of the present application, and need not include all of the components shown in FIG. 15.
It is to be understood that the apparatus 1500 may correspond to the apparatus 1300 in fig. 13 described above, that the functions of the processing unit 1320 in the apparatus 1300 may be implemented by the processor 1520, and that the functions of the obtaining unit 1310 and the output unit 1330 may be implemented by the communication interface 1530. To avoid repetition, detailed description is appropriately omitted here.
The embodiment of the application also provides a processing device, which comprises a processor and an interface; the processor is used for executing the method for biological face alignment and the training method for the biological face alignment model in any one of the method embodiments.
It should be understood that the processing means may be a chip. For example, the processing device may be a field-programmable gate array (FPGA), an application-specific integrated circuit (ASIC), a system on chip (SoC), a Central Processing Unit (CPU), a Network Processor (NP), a digital signal processing circuit (DSP), a Microcontroller (MCU), a Programmable Logic Device (PLD), or other integrated chips.
The embodiment of the present application further provides a platform system, which includes the aforementioned biological face alignment apparatus and a biological face alignment model training apparatus.
The embodiments of the present application also provide a computer-readable medium, on which a computer program is stored, which, when executed by a computer, implements the method of any of the above-mentioned method embodiments.
The embodiment of the present application further provides a computer program product, and the computer program product implements the method of any one of the above method embodiments when executed by a computer.
The embodiment of the application also provides electronic equipment which can comprise the biological face alignment device of the embodiment of the application.
For example, the electronic device is a smart door lock, a mobile phone, a computer, an access control system, or other equipment that requires face recognition. The biological face alignment apparatus comprises the software and hardware used for biological face alignment in the electronic device.
Optionally, the electronic device may further include a depth map acquisition device, such as a ToF camera or an RGB-D depth camera.
Those of ordinary skill in the art will appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware or combinations of computer software and electronic hardware. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the implementation. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present application.
As used in this specification, the terms "unit," "module," "system," and the like are intended to refer to a computer-related entity, either hardware, firmware, a combination of hardware and software, or software in execution. For example, a component may be, but is not limited to being, a process running on a processor, an object, an executable, a thread of execution, a program, and/or a computer. By way of illustration, both an application running on a computing device and the computing device can be a component. One or more components can reside within a process and/or thread of execution and a component can be localized on one computer and/or distributed between 2 or more computers. In addition, these components can execute from various computer readable media having various data structures stored thereon. The components may communicate by way of local and/or remote processes such as in accordance with a signal having one or more data packets (e.g., data from two components interacting with another component in a local system, distributed system, and/or across a network such as the internet with other systems by way of the signal).
It is clear to those skilled in the art that, for convenience and brevity of description, the specific working processes of the above-described systems, apparatuses and units may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again.
In the several embodiments provided in the present application, it should be understood that the disclosed system, apparatus and method may be implemented in other ways. For example, the above-described apparatus embodiments are merely illustrative, and for example, the division of the units is only one logical division, and other divisions may be realized in practice, for example, a plurality of units or components may be combined or integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection through some interfaces, devices or units, and may be in an electrical, mechanical or other form.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, functional units in the embodiments of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit.
The functions, if implemented in the form of software functional units and sold or used as a stand-alone product, may be stored in a computer readable storage medium. Based on such understanding, the technical solutions of the present application may be embodied in the form of a software product, which is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the methods described in the embodiments of the present application. And the aforementioned storage medium includes: various media capable of storing program codes, such as a usb disk, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, or an optical disk.
The above description is only for the specific embodiments of the present application, but the scope of the present application is not limited thereto, and any person skilled in the art can easily conceive of the changes or substitutions within the technical scope of the present application, and shall be covered by the scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.

Claims (36)

1. A method of biological facial alignment, comprising:
acquiring a first 3D point cloud, wherein the first 3D point cloud is a single non-frontal biological face 3D point cloud;
performing correction processing on the first 3D point cloud by using a 3D graph convolution network to obtain a second 3D point cloud;
performing symmetrical filling processing on the second 3D point cloud to obtain a third 3D point cloud;
and performing hole filling processing on the third 3D point cloud by using a 3D graph convolution generative adversarial network to obtain a fourth 3D point cloud, wherein the fourth 3D point cloud is a 3D point cloud of the frontal biological face without broken holes.
2. The method of claim 1, wherein performing correction processing on the first 3D point cloud by using the 3D graph convolution network to obtain the second 3D point cloud comprises:
and processing the first 3D point cloud by utilizing the 3D graph convolution network to obtain a first rotation parameter, wherein the first rotation parameter is used for correcting the first 3D point cloud.
3. The method of claim 2, wherein performing correction processing on the first 3D point cloud to obtain the second 3D point cloud comprises:
correcting the position information of the first 3D point cloud through the first rotation parameter;
the second 3D point cloud includes location information of the first 3D point cloud after rectification.
4. The method of claim 3, further comprising:
the color information of the first 3D point cloud is kept unchanged, and the second 3D point cloud comprises the corrected position information of the first 3D point cloud and the color information of the first 3D point cloud corresponding to the position information.
5. The method of any of claims 3 or 4, wherein performing symmetric filling processing on the second 3D point cloud to obtain the third 3D point cloud comprises:
extracting a partial point cloud of the second 3D point cloud;
and carrying out symmetrical processing on part of the second 3D point cloud to obtain a third 3D point cloud.
6. The method according to any one of claims 1 to 5, wherein performing hole filling processing on the third 3D point cloud by using the 3D graph convolution generative adversarial network to obtain the fourth 3D point cloud comprises:
acquiring target hidden data;
generating a target synthesized 3D point cloud according to the target hidden data;
and judging whether the target synthesized 3D point cloud is similar to the third 3D point cloud or not to obtain the fourth 3D point cloud.
7. The method of claim 6, wherein generating a target synthesized 3D point cloud from the target hidden data comprises:
taking the target hidden data as the input of a 3D graph convolution generation network and outputting the target synthetic 3D point cloud, wherein the 3D graph convolution generative adversarial network comprises the 3D graph convolution generation network;
and generating a graph convolution layer of the network by utilizing the 3D graph convolution, and extracting characteristics of the target hidden data.
8. The method of claim 6 or 7, wherein determining whether the target synthesized 3D point cloud is similar to the third 3D point cloud to obtain the fourth 3D point cloud comprises:
determining whether the target synthesized 3D point cloud is similar to the third 3D point cloud by determining whether a loss function value of a loss function between the target synthesized 3D point cloud and the third 3D point cloud is less than or equal to a similarity threshold corresponding to the loss function.
9. The method of claim 8, wherein the loss function comprises at least one of:
a distance absolute error between the target synthesized 3D point cloud and the third 3D point cloud;
a similarity error between layer features of the target synthesized 3D point cloud and the third 3D point cloud in a same biological face recognition model;
a similarity error between layer features of the target synthesized 3D point cloud and the third 3D point cloud in a point set neural network;
an error between the target synthesized 3D point cloud and the third 3D point cloud at biological face key points.
10. The method of claim 9, wherein determining whether the target synthesized 3D point cloud is similar to the third 3D point cloud comprises:
if the loss function value corresponding to the loss function is greater than the similarity threshold, the target synthesized 3D point cloud is not similar to the third 3D point cloud, and the target hidden data is updated; or
if the loss function value corresponding to the loss function is less than or equal to the similarity threshold, the target synthesized 3D point cloud is similar to the third 3D point cloud, the target synthesized 3D point cloud is the fourth 3D point cloud, and the fourth 3D point cloud is output.
11. The method of claim 10, wherein updating the target hidden data comprises: updating the target hidden data according to a gradient descent method.
12. A biological face alignment model training method, comprising:
acquiring first training data, wherein the first training data comprises position information of a plurality of biological face 3D point clouds and rotation parameters corresponding to each biological face 3D point cloud;
and training the first training data by using a 3D graph convolution network to obtain a first biological face alignment model, wherein an output of the first biological face alignment model is a first rotation parameter, and the first rotation parameter is used for performing correction processing on a non-frontal biological face 3D point cloud.
13. The method of claim 12, wherein training the first training data by using the 3D graph convolution network to obtain the first biological face alignment model comprises:
extracting features of the first training data through graph convolution layers corresponding to the 3D graph convolution network.
14. The method of claim 12 or 13, wherein the first training data comprises normalized position information of the plurality of biological face 3D point clouds and the rotation parameters corresponding to each biological face 3D point cloud.
15. A biological face alignment model training method, comprising:
acquiring second training data, wherein the second training data comprises a plurality of groups of biological facial feature vectors and real biological facial 3D point clouds;
and training the second training data according to a 3D graph convolution generative adversarial network to obtain a second biological face alignment model, wherein the 3D graph convolution generative adversarial network comprises a 3D graph convolution generation network and a 3D graph convolution discrimination network, and the 3D graph convolution generation network in the second biological face alignment model is used for generating a target synthesized 3D point cloud.
16. The method of claim 15, wherein training the second training data according to the 3D graph convolution generative adversarial network comprises:
inputting the plurality of groups of biological facial feature vectors into the 3D graph convolution generation network to obtain a 3D point cloud to be verified;
inputting the real biological face 3D point cloud and the 3D point cloud to be verified into the 3D graph convolution discrimination network to obtain a first probability and a second probability respectively, wherein the first probability is the probability that the 3D point cloud to be verified is true, the second probability is the probability that the real biological face 3D point cloud is true, and the first probability and the second probability are used for obtaining a loss function for training the 3D graph convolution discrimination network; the first probability is used for obtaining a loss function for training the 3D graph convolution generation network.
17. The method of claim 16, wherein inputting the plurality of groups of biological facial feature vectors into the 3D graph convolution generation network to obtain the 3D point cloud to be verified comprises:
extracting features of the plurality of groups of biological facial feature vectors through graph convolution layers corresponding to the 3D graph convolution generation network to obtain the 3D point cloud to be verified;
and wherein inputting the real biological face 3D point cloud and the 3D point cloud to be verified into the 3D graph convolution discrimination network to obtain the first probability and the second probability respectively comprises:
extracting features of the real biological face 3D point cloud and the 3D point cloud to be verified through graph convolution layers corresponding to the 3D graph convolution discrimination network to obtain the first probability and the second probability.
18. An apparatus for aligning a biological face, the apparatus comprising an acquisition unit and a processing unit, the processing unit comprising a biological face correction module, a biological face symmetry module and a biological face hole filling module:
the acquisition unit is used for acquiring a first 3D point cloud, wherein the first 3D point cloud is a single non-frontal biological face 3D point cloud;
the biological face correction module is used for performing correction processing on the first 3D point cloud by using a 3D graph convolution network to obtain a second 3D point cloud;
the biological face symmetry module is used for carrying out symmetrical filling processing on the second 3D point cloud so as to obtain a third 3D point cloud;
and the biological face hole filling module is used for performing hole filling processing on the third 3D point cloud by using a 3D graph convolution generative adversarial network to obtain a fourth 3D point cloud, wherein the fourth 3D point cloud is a hole-free frontal biological face 3D point cloud.
19. The apparatus of claim 18, wherein the biological face correction module performing correction processing on the first 3D point cloud by using the 3D graph convolution network to obtain the second 3D point cloud comprises:
the biological face correction module processes the first 3D point cloud by using the 3D graph convolution network to obtain a first rotation parameter, wherein the first rotation parameter is used for correcting the first 3D point cloud.
20. The apparatus of claim 19, wherein correcting the first 3D point cloud to obtain the second 3D point cloud comprises:
the biological face correction module is used for correcting the position information of the first 3D point cloud through the first rotation parameter;
wherein the second 3D point cloud comprises the corrected position information of the first 3D point cloud.
21. The apparatus of claim 20, further comprising:
the color information of the first 3D point cloud is kept unchanged, and the second 3D point cloud comprises the corrected position information of the first 3D point cloud and the color information of the first 3D point cloud corresponding to the position information.
22. The apparatus of claim 20 or 21, wherein the biological face symmetry module performing symmetrical filling processing on the second 3D point cloud to obtain the third 3D point cloud comprises:
the biological face symmetry module is used for extracting partial point clouds of the second 3D point cloud;
the biological face symmetry module is used for carrying out symmetry processing on part of the point cloud of the second 3D point cloud so as to obtain the third 3D point cloud.
23. The apparatus of any one of claims 18 to 22, wherein the biological face hole filling module performing hole filling processing on the third 3D point cloud by using the 3D graph convolution generative adversarial network to obtain the fourth 3D point cloud, the fourth 3D point cloud being a hole-free frontal biological face 3D point cloud, comprises:
the acquisition unit is used for acquiring target hidden data;
the biological face hole filling module generates a target synthesized 3D point cloud according to the target hidden data;
and the biological face hole filling module is used for determining whether the target synthesized 3D point cloud is similar to the third 3D point cloud, so as to obtain the fourth 3D point cloud.
24. The apparatus of claim 23, wherein the biological face hole filling module generating the target synthesized 3D point cloud according to the target hidden data comprises:
the biological face hole filling module is used for taking the target hidden data as an input of a 3D graph convolution generation network and outputting the target synthesized 3D point cloud, wherein the 3D graph convolution generative adversarial network comprises the 3D graph convolution generation network;
and features of the target hidden data are extracted by using graph convolution layers of the 3D graph convolution generation network.
25. The apparatus of claim 23 or 24, wherein the biological face hole filling module determining whether the target synthesized 3D point cloud is similar to the third 3D point cloud to obtain the fourth 3D point cloud comprises:
the biological face hole filling module determines whether the target synthesized 3D point cloud is similar to the third 3D point cloud by determining whether a loss function value of a loss function between the target synthesized 3D point cloud and the third 3D point cloud is less than or equal to a similarity threshold corresponding to the loss function.
26. The apparatus of claim 25, wherein the loss function comprises at least one of:
a distance absolute error between the target synthesized 3D point cloud and the third 3D point cloud;
a similarity error between layer features of the target synthesized 3D point cloud and the third 3D point cloud in a same biological face recognition model;
a similarity error between layer features of the target synthesized 3D point cloud and the third 3D point cloud in a point set neural network;
an error between the target synthesized 3D point cloud and the third 3D point cloud at biological face key points.
27. The apparatus of claim 26, further comprising an output unit, wherein determining whether the target synthesized 3D point cloud is similar to the third 3D point cloud comprises:
if the loss function value corresponding to the loss function is greater than the similarity threshold, the target synthesized 3D point cloud is not similar to the third 3D point cloud, and the biological face hole filling module is used for updating the target hidden data; or
if the loss function value corresponding to the loss function is less than or equal to the similarity threshold, the target synthesized 3D point cloud is similar to the third 3D point cloud, the target synthesized 3D point cloud is the fourth 3D point cloud, and the output unit is used for outputting the fourth 3D point cloud.
28. The apparatus of claim 27, wherein the biological face hole filling module updating the target hidden data comprises: the biological face hole filling module updates the target hidden data according to a gradient descent method.
29. An apparatus for training a biological face alignment model, comprising:
an acquisition unit, a processing unit and a display unit, wherein the acquisition unit is used for acquiring first training data, and the first training data comprises position information of a plurality of biological face 3D point clouds and rotation parameters corresponding to each biological face 3D point cloud;
and the processing unit is used for training the first training data by using a 3D graph convolution network to obtain a first biological face alignment model, wherein an output of the first biological face alignment model is a first rotation parameter, and the first rotation parameter is used for performing correction processing on a non-frontal biological face 3D point cloud.
30. The apparatus of claim 29, wherein the processing unit training the first training data by using the 3D graph convolution network to obtain the first biological face alignment model comprises:
the processing unit extracts features of the first training data through graph convolution layers corresponding to the 3D graph convolution network.
31. The apparatus of claim 29 or 30, wherein the first training data comprises normalized position information of the plurality of biological face 3D point clouds and the rotation parameters corresponding to each biological face 3D point cloud.
32. An apparatus for training a biological face alignment model, comprising:
an acquisition unit, used for acquiring second training data, wherein the second training data comprises a plurality of groups of biological facial feature vectors and real biological face 3D point clouds;
and a processing unit, used for training the second training data according to a 3D graph convolution generative adversarial network to obtain a second biological face alignment model, wherein the 3D graph convolution generative adversarial network comprises a 3D graph convolution generation network and a 3D graph convolution discrimination network, and the 3D graph convolution generation network in the second biological face alignment model is used for generating a target synthesized 3D point cloud.
33. The apparatus of claim 32, wherein the processing unit training the second training data according to the 3D graph convolution generative adversarial network comprises:
the processing unit is used for inputting the plurality of groups of biological facial feature vectors into the 3D graph convolution generation network to obtain a 3D point cloud to be verified;
and the processing unit is further used for inputting the real biological face 3D point cloud and the 3D point cloud to be verified into the 3D graph convolution discrimination network to obtain a first probability and a second probability respectively, wherein the first probability is the probability that the 3D point cloud to be verified is true, the second probability is the probability that the real biological face 3D point cloud is true, and the first probability and the second probability are used for obtaining a loss function for training the 3D graph convolution discrimination network; the first probability is used for obtaining a loss function for training the 3D graph convolution generation network.
34. The apparatus of claim 33, wherein the processing unit inputting the plurality of groups of biological facial feature vectors into the 3D graph convolution generation network to obtain the 3D point cloud to be verified comprises:
the processing unit is used for extracting features of the plurality of groups of biological facial feature vectors through graph convolution layers corresponding to the 3D graph convolution generation network to obtain the 3D point cloud to be verified;
and wherein the processing unit inputting the real biological face 3D point cloud and the 3D point cloud to be verified into the 3D graph convolution discrimination network to obtain the first probability and the second probability respectively comprises:
the processing unit is further used for extracting features of the real biological face 3D point cloud and the 3D point cloud to be verified through graph convolution layers corresponding to the 3D graph convolution discrimination network to obtain the first probability and the second probability.
35. A computer-readable storage medium storing program instructions which, when executed by a computer, cause the computer to perform the biological face alignment method of any one of claims 1 to 11, the biological face alignment model training method of any one of claims 12 to 14, or the biological face alignment model training method of any one of claims 15 to 17.
36. A computer program product containing instructions which, when executed by a computer, cause the computer to perform the biological face alignment method of any one of claims 1 to 11, the biological face alignment model training method of any one of claims 12 to 14, or the biological face alignment model training method of any one of claims 15 to 17.
CN202111097168.5A 2021-09-18 2021-09-18 Biological face alignment model training method, biological face alignment method and device Active CN113837053B (en)

Priority Applications (1)

Application Number: CN202111097168.5A (granted as CN113837053B) | Priority Date: 2021-09-18 | Filing Date: 2021-09-18 | Title: Biological face alignment model training method, biological face alignment method and device

Publications (2)

Publication Number | Publication Date
CN113837053A | 2021-12-24
CN113837053B | 2024-03-15

Family

ID=78959828

Family Applications (1)

Application Number: CN202111097168.5A (Active, CN113837053B) | Priority Date: 2021-09-18 | Filing Date: 2021-09-18 | Title: Biological face alignment model training method, biological face alignment method and device

Country Status (1)

Country | Link
CN (1) | CN113837053B (en)

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number | Priority date | Publication date | Assignee | Title
US20210256776A1 * | 2018-09-12 | 2021-08-19 | Sony Interactive Entertainment Inc. | Method and system for generating a 3d reconstruction of a human
CN111652827A * | 2020-04-24 | 2020-09-11 | Shandong University | Front face synthesis method and system based on generation countermeasure network
CN111899353A * | 2020-08-11 | 2020-11-06 | Changchun University of Technology | Three-dimensional scanning point cloud hole filling method based on generation countermeasure network
CN112037320A * | 2020-09-01 | 2020-12-04 | Tencent Technology (Shenzhen) Co., Ltd. | Image processing method, device, equipment and computer readable storage medium
CN112200905A * | 2020-10-15 | 2021-01-08 | Gedian Technology (Shenzhen) Co., Ltd. | Three-dimensional face completion method
CN112767554A * | 2021-04-12 | 2021-05-07 | Tencent Technology (Shenzhen) Co., Ltd. | Point cloud completion method, device, equipment and storage medium

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
张加强: "基于生成模型的三维重建算法研究及实现", 中国优秀硕士学位论文全文数据库 信息科技辑 *

Also Published As

Publication number | Publication date
CN113837053B (en) | 2024-03-15

Similar Documents

Publication | Publication Date | Title
CN108875522B (en) Face clustering method, device and system and storage medium
US9818023B2 (en) Enhanced face detection using depth information
US11232286B2 (en) Method and apparatus for generating face rotation image
US9747493B2 (en) Face pose rectification method and apparatus
Chen et al. Human action recognition using star skeleton
Cohen et al. Inference of human postures by classification of 3D human body shape
EP3928248A1 (en) Neural network for skeletons from input images
CN111291885A (en) Near-infrared image generation method, network generation training method and device
CN108596193B (en) Method and system for building deep learning network structure aiming at human ear recognition
CN108182397B (en) Multi-pose multi-scale human face verification method
KR102476016B1 (en) Apparatus and method for determining position of eyes
WO2015195301A1 (en) Obtaining structural information from images
Zheng et al. Attention-based spatial-temporal multi-scale network for face anti-spoofing
KR20170092533A (en) A face pose rectification method and apparatus
Núnez et al. Real-time human body tracking based on data fusion from multiple RGB-D sensors
JP2021136012A (en) Method and apparatus for detecting liveness based on phase difference
Azis et al. Weighted averaging fusion for multi‐view skeletal data and its application in action recognition
US20160182769A1 (en) Apparatus and method for generating motion effects by analyzing motions of objects
KR20190018274A (en) Method and apparatus for recognizing a subject existed in an image based on temporal movement or spatial movement of a feature point of the image
Gheitasi et al. Estimation of hand skeletal postures by using deep convolutional neural networks
US11430146B2 (en) Two-stage depth estimation machine learning algorithm and spherical warping layer for EQUI-rectangular projection stereo matching
Kanaujia et al. Part segmentation of visual hull for 3d human pose estimation
Wang et al. Handling occlusion and large displacement through improved RGB-D scene flow estimation
Huang et al. Multi‐class obstacle detection and classification using stereovision and improved active contour models
CN111339973A (en) Object identification method, device, equipment and storage medium

Legal Events

Code | Event
PB01 | Publication
SE01 | Entry into force of request for substantive examination
GR01 | Patent grant