CN111739168B - Large-scale three-dimensional face synthesis method with suppressed sample similarity - Google Patents

Large-scale three-dimensional face synthesis method with suppressed sample similarity

Info

Publication number
CN111739168B
CN111739168B (application CN202010610545.XA)
Authority
CN
China
Prior art keywords
model
dimensional face
face model
dimensional
similarity
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202010610545.XA
Other languages
Chinese (zh)
Other versions
CN111739168A (en)
Inventor
罗国亮
肖乾
杨辉
陈梦成
曹义亲
朱志亮
童杨
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
East China Jiaotong University
Original Assignee
East China Jiaotong University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by East China Jiaotong University filed Critical East China Jiaotong University
Priority to CN202010610545.XA priority Critical patent/CN111739168B/en
Publication of CN111739168A publication Critical patent/CN111739168A/en
Application granted granted Critical
Publication of CN111739168B publication Critical patent/CN111739168B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Images

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T17/00Three dimensional [3D] modelling, e.g. data description of 3D objects
    • G06T17/20Finite element generation, e.g. wire-frame surface description, tesselation
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/22Matching criteria, e.g. proximity measures
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/50Depth or shape recovery
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/10Image acquisition modality
    • G06T2207/10028Range image; Depth image; 3D point clouds
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20081Training; Learning
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20084Artificial neural networks [ANN]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/30Subject of image; Context of image processing
    • G06T2207/30196Human being; Person
    • G06T2207/30201Face

Abstract

The invention discloses a sample-similarity-suppressed large-scale three-dimensional face synthesis method comprising the following steps: acquiring a first three-dimensional face model set; constructing a large-scale three-dimensional face synthesis network model and synthesizing three-dimensional face models from the first three-dimensional face model set to obtain synthesized three-dimensional face models; and constructing a sample similarity suppression network model that performs similarity detection between the synthesized three-dimensional face models and the acquired three-dimensional face model set, so as to output only the synthesized three-dimensional face models that pass the similarity check. Beyond designing a three-dimensional face synthesis model for synthesizing three-dimensional face models, the invention fully considers the data privacy issues inherent in three-dimensional face model synthesis technology: a similarity inspection model suppresses the similarity between synthesized models and actually acquired face models, so that the output three-dimensional face synthesis results can satisfy the special requirements of face data privacy.

Description

Large-scale three-dimensional face synthesis method with suppressed sample similarity
Technical Field
The invention relates to the technical field of three-dimensional face reconstruction, in particular to a large-scale three-dimensional face synthesis method for sample similarity suppression.
Background
At the World Conference on VR Industry in October 2018, Professor Huang Tiejun of Peking University pointed out that traditional two-dimensional images and videos are not the true visual media through which the human visual nervous system observes the world; research oriented toward three-dimensional models is therefore better suited to how humans observe objectively existing objects. Among common kinds of three-dimensional model data, the three-dimensional face model, by virtue of representing individual identity and its special role as personal image and interaction portal, is widely applied in fields such as virtual-human simulation in games, film and television, and simulation systems. For example, the widely known Apple Face ID technology, which detects three-dimensional face depth information, has already been successfully industrialized. With the rapid development of three-dimensional scanning technology and of vision- and depth-map-based three-dimensional modeling, the ways of acquiring three-dimensional face models have diversified. However, although three-dimensional reconstruction technology has gradually matured in both hardware and algorithms, a reconstructed three-dimensional face model is not an artificially synthesized model and cannot meet the broad demand for anonymized virtual face models. Therefore, alongside the rapid development of three-dimensional reconstruction of real faces, research on artificial synthesis methods for three-dimensional face models will gradually become one of the main research topics in the field of computer graphics.
The Generative Adversarial Network (GAN) model proposed by Goodfellow et al. in 2014 opened another door for data synthesis methods. To date, GAN-based image synthesis methods have been widely discussed and studied by researchers for performance optimization. Because the GAN model tends to guarantee the quality of synthesized results, it is almost the default tool for research on face synthesis methods. However, training the basic GAN model still has many deficiencies: the problems of stochastic gradient descent (SGD) and of the generator's gradients vanishing as the discriminator becomes increasingly accurate mean that GANs require further research on both stability and computational optimization for large-scale input data. To improve the stability of the GAN model, researchers proposed the McGAN model, which further dynamically optimizes stability through mean and covariance feature matching; to improve the large-scale data processing capability of the GAN model, and because the discriminator converges faster than the generator, scholars have proposed training the generator and the discriminator on different schedules to improve data processing efficiency.
However, analysis of relevant research at home and abroad shows that, to date, research on generative adversarial network models has focused mainly on improving the models' computational power and stability and on applications for automatic synthesis of data such as images and videos, while research on automatic synthesis methods for three-dimensional models, and especially three-dimensional face models, remains seriously insufficient and cannot meet the demand for automatic synthesis of three-dimensional face models. In 2018, Professor Zhou Kun of the State Key Laboratory of CAD&CG at Zhejiang University also put forward the concept of intelligent graphics on important occasions such as the China Graphics Society conference (ChinaGraph 2018) and ACM SIGGRAPH Asia 2018, advocating the broad application of artificial intelligence methods in computer graphics research. "The Age of Artificial Intelligence: Establishing a Human-Centered European Strategy", published by the European Commission in March 2018, puts forward human-oriented artificial intelligence as the development direction. Against this background, face data constitutes highly sensitive privacy and identity information, and data privacy protection is a key consideration for researchers and application industries in artificial intelligence. It can be seen that, driven by the many special requirements of "anonymity" and copyright limitation (for example, for virtual humans), paying attention to data privacy while developing synthetic three-dimensional face models is an indispensable requirement for industry development.
In addition, driven by broad demand from industries such as film and television, games, and virtual reality, the three-dimensional face model has become a research hotspot for experts and scholars in the field of computer graphics at home and abroad, and analysis and processing technologies for three-dimensional face models have developed accordingly. Regarding feature descriptors for three-dimensional models, descriptors for three-dimensional face models can be roughly divided into three classes of methods: those based on feature points, on feature curves, and on local surface statistics. However, these methods are mainly used for face identity or expression recognition; most extract facial geometric feature information in an irreversible way and cannot be applied directly to three-dimensional face reconstruction. That is, reversely restoring a three-dimensional model from curvature data without coordinate information is an unachievable task, so existing three-dimensional face model description methods have certain defects.
Disclosure of Invention
The invention aims to solve the problem that existing three-dimensional face synthesis models in the prior art focus only on optimizing the model and do not consider the special face-data-privacy requirement of three-dimensional face synthesis technology, and provides a large-scale three-dimensional face synthesis method with sample similarity suppression.
In order to achieve the above purpose, the invention provides the following technical scheme:
a large-scale three-dimensional face synthesis method for sample similarity suppression comprises the following steps:
step 1, acquiring a three-dimensional face model, and preprocessing the acquired three-dimensional face model to obtain a first three-dimensional face model set;
step 2, constructing a large-scale three-dimensional face synthesis network model, and inputting the first three-dimensional face model set into the large-scale three-dimensional face synthesis network model to synthesize three-dimensional face models, obtaining synthesized three-dimensional face models; the first three-dimensional face model set is downsampled to obtain a second three-dimensional face model set;
step 3, constructing a sample similarity suppression network model, and detecting the similarity between the synthesized three-dimensional face models and the second three-dimensional face model set using the sample similarity suppression network model, so as to output the synthesized three-dimensional face models that meet the similarity requirement.
Preferably, the sample similarity suppression network model is constructed based on a trained dropout deep network model. Preferably, the sample similarity suppression network model calculates the similarity between the synthesized three-dimensional face model and the second three-dimensional face model set by feature extraction, and outputs only those synthesized three-dimensional face models whose similarity is below a threshold, thereby suppressing the similarity between the synthesized three-dimensional face models and the second three-dimensional face model set.
Preferably, the dropout deep network model is:
p_o = f_q(m_o) = f_q(w_i * m_i * p_i + b)
where m_o is the scalar output of all mesh nodes of the face model encoded by the dropout network model, f_q is a normalized probability check function applied to the output, i = 1, ..., |z|, |z| is the number of nodes m_i in the structured face mesh representation, and f_q is an activation function or a Gaussian function.
Preferably, the step 1 comprises:
step 101, acquiring a three-dimensional face model and point cloud data thereof, and performing maximum boundary linear normalization processing on the point cloud data of the acquired three-dimensional face model to obtain a depth map corresponding to the three-dimensional face model;
and 102, carrying out gridding surface fitting on the obtained depth map to obtain a grid depth map matrix corresponding to the three-dimensional face model, wherein the obtained grid depth map matrix is the first three-dimensional face model set.
Preferably, the obtained depth map is fitted with a gridded surface by a gradient-constrained smoothing method and a ridge regression solution.
Further, the large-scale three-dimensional face synthesis network model is constructed based on a trained generative adversarial network model fused with a self-attention network.
Preferably, the training stability of the generative adversarial network model fused with the self-attention network is improved by introducing a spectral norm.
Preferably, the training time of the generative adversarial network model fused with the self-attention network is reduced by introducing a dual time-scale update rule.
Preferably, an embedded Gaussian function is adopted as the calculation function of the self-attention network, and the output layer of the self-attention network is constructed in residual form based on a non-local operator.
Compared with the prior art, the invention has the beneficial effects that:
1. The invention provides a three-dimensional face synthesis method based on detection and suppression of three-dimensional face sample similarity. On top of designing a three-dimensional face synthesis model for synthesizing three-dimensional face models, it fully considers the data privacy issues involved in three-dimensional face model synthesis technology: a similarity-checking model suppresses the similarity between synthesized models and really acquired face models, so that the similarity between each automatically synthesized three-dimensional face model and real faces is below a threshold. This guarantees the difference between synthesized face models and the samples (real face models), satisfies the special requirement of face data privacy, and is safer and more reliable.
2. Addressing the problem that prior-art three-dimensional face feature descriptors based on facial geometric information are irreversible and cannot be applied directly to three-dimensional face reconstruction, the invention provides a (mesh) structured representation method for three-dimensional face models. It makes full use of highly reversible three-dimensional face point cloud data and structures the point cloud model, so that a three-dimensional face model can be fed directly into a deep neural network model, just like an image, opening up more possibilities for intelligent processing of three-dimensional face models.
3. The invention provides a data-driven method for large-scale artificial synthesis of three-dimensional faces, which improves the ability of the generative adversarial model in conventional synthesis technology to process large-scale data by fusing a self-attention model.
4. The stability of the designed three-dimensional face synthesis model during large-scale data training is improved by introducing methods such as the spectral norm and the dual time-scale rule, yielding an efficient, convenient, reliable, and high-precision large-scale three-dimensional face synthesis model.
Description of the drawings:
fig. 1 is a schematic block diagram of a large-scale three-dimensional face synthesis method with sample similarity suppression according to an exemplary embodiment of the present invention.
Fig. 2 is a schematic diagram of an acquired real three-dimensional face model according to an exemplary embodiment of the present invention.
Fig. 3 is a schematic diagram of a three-dimensional face model obtained after downsampling according to an exemplary embodiment of the present invention.
Fig. 4 is a schematic diagram of a synthesized three-dimensional face model obtained by synthesizing a model according to an exemplary embodiment of the present invention.
Fig. 5 is a schematic diagram of a three-dimensional face model satisfying similarity requirements output by a joint suppression model according to an exemplary embodiment of the present invention.
Fig. 6 is a design diagram of a three-dimensional face structural representation method according to an exemplary embodiment of the present invention.
Fig. 7 is a schematic diagram of the effect of representing a three-dimensional face model with the non-local self-attention model according to an exemplary embodiment of the present invention.
Fig. 8 is a schematic diagram of a three-dimensional face self-attention network model and an output result thereof according to an exemplary embodiment of the invention.
Fig. 9 is a schematic diagram of a dropout network model according to an exemplary embodiment of the present invention.
Fig. 10 is a schematic diagram of a sample similarity checking method fusing a self-attention network and a dropout network model according to an exemplary embodiment of the present invention.
Fig. 11 is a schematic structural diagram of a large-scale three-dimensional face synthesis system with sample similarity suppression according to an exemplary embodiment of the present invention.
Fig. 12 is a diagram illustrating a scanning result of a three-dimensional scanning device according to an exemplary embodiment of the present invention.
Detailed Description
The present invention will be described in further detail with reference to test examples and specific embodiments. It should be understood that the scope of the above-described subject matter is not limited to the following examples, and any techniques implemented based on the disclosure of the present invention are within the scope of the present invention.
Example 1
Fig. 1 shows a technical route of a sample similarity suppressed large-scale three-dimensional face synthesis method according to an exemplary embodiment of the present invention. The method comprises the following steps:
step 1, acquiring a three-dimensional face model, and preprocessing the acquired three-dimensional face model to obtain a first three-dimensional face model set (denoted as FaceDataSet1) shown in fig. 2;
step 2, down-sampling the first three-dimensional face model set to obtain a second three-dimensional face model set (also called FaceDataSet2) as shown in fig. 3;
constructing a large-scale three-dimensional face synthesis network model, inputting the first three-dimensional face model set into the large-scale three-dimensional face synthesis network model for synthesizing the face three-dimensional model, and obtaining a plurality of synthesized three-dimensional face models (fig. 4 shows a three-dimensional face model synthesized by the embodiment);
step 3, constructing a sample similarity suppression network model, using it to compare the similarity of the synthesized three-dimensional face models obtained in step 2 against the downsampled second three-dimensional face model set, and screening out and outputting, from the multiple synthesized models of step 2, the synthesized three-dimensional face models whose similarity satisfies the requirement (is below a threshold) after suppression. Fig. 5 shows a synthesized three-dimensional face model that satisfies the similarity requirement and is output by the suppression network model in this embodiment; as can be seen, Fig. 5 is less similar to a real human face than the example of Fig. 4. In this embodiment, the invention provides a three-dimensional face synthesis method based on detection and suppression of three-dimensional face sample similarity: on top of designing a three-dimensional face synthesis model for synthesizing three-dimensional face models, it fully considers the data privacy issues of three-dimensional face model synthesis technology and, by designing a similarity detection model, suppresses the similarity between synthesized models and really acquired face models, so that the automatic synthesis results can meet the special requirement of face data privacy and are safer and more reliable. "Large-scale" in the invention refers to the richness of facial expression features (diversity in three dimensions) rather than to a simple static image; in this field the method may also be called multi-scale, comprehensive modeling analysis.
Example 2
Specifically, the invention is driven by a real three-dimensional face data set and, on the basis of generating large-scale three-dimensional faces by fusing deep neural network model techniques, further adopts a dropout deep network model to detect and suppress the sample similarity of the artificially synthesized faces. The specific scheme is as follows:
(1) three-dimensional face model structured representation
For any three-dimensional face model, according to the special attributes of the face, the invention designs a gridded Depth map (Depth map), thereby realizing the structural representation method of the three-dimensional face model.
First, for any three-dimensional face point cloud model F_o^i, i = 1, ..., N (where N is the total number of face models in the data set and F_o^i denotes the point cloud set of the i-th face point cloud model), existing mature tools can be used to align and transpose the face model to the XOY plane, align the face vertically along the Y axis, detect the nose-tip vertex, and translate the face model so that it is centered at the coordinate origin O, as shown in Fig. 6 (top left). The projection boundary of the point cloud model in the XOY plane is [[X_min, X_max], [Y_min, Y_max]]; then, using the single scaling factor max(|X_min|, |X_max|, |Y_min|, |Y_max|), maximum-boundary linear normalization linearly scales all model space points to [-1, 1] in the X, Y, and Z dimensions, as shown in Fig. 6 (left).
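To make the preprocessing above concrete, the following Python sketch (an illustrative reading, not the patent's reference implementation) centers a face point cloud at the origin and applies the maximum-boundary linear normalization; the nose-tip heuristic (highest Z value) and the stand-in data are assumptions introduced only for illustration, and the alignment to the XOY plane is assumed to have been done already by the mature registration tools mentioned above.

import numpy as np

def normalize_face_point_cloud(points: np.ndarray) -> np.ndarray:
    """Maximum-boundary linear normalization of an (N, 3) face point cloud.

    Assumes the model has already been rotated so the face lies in the XOY
    plane and is upright along the Y axis; the nose tip is approximated here
    as the point with the largest Z coordinate (an illustrative heuristic).
    """
    # Translate the nose-tip vertex to the coordinate origin O.
    nose_tip = points[np.argmax(points[:, 2])]
    centered = points - nose_tip

    # Projection boundary in the XOY plane: [[X_min, X_max], [Y_min, Y_max]].
    x_min, x_max = centered[:, 0].min(), centered[:, 0].max()
    y_min, y_max = centered[:, 1].min(), centered[:, 1].max()

    # Single scaling factor max(|X_min|, |X_max|, |Y_min|, |Y_max|): all three
    # coordinates are scaled into [-1, 1] by the same linear map.
    scale = max(abs(x_min), abs(x_max), abs(y_min), abs(y_max))
    return centered / scale

if __name__ == "__main__":
    cloud = np.random.randn(5000, 3) * [40.0, 55.0, 20.0]   # stand-in face scan (mm)
    normalized = normalize_face_point_cloud(cloud)
    print(normalized.min(axis=0), normalized.max(axis=0))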
Further, to realize the structured representation of the face model, the XOY plane is gridded with spacing d, and the depth-map surface is represented as F = {z(x, y)}, x = -1 : d : 1, y = -1 : d : 1, where z(x, y) is the Z-axis coordinate value of the corresponding surface point at XOY-plane coordinates (x, y). The mesh surface F_i is then further triangulated, as shown in Fig. 6 (right). If F is the gridded fitted surface of the given original face point cloud model F_o, then all spatial points contained in F_o lie on the surface F; that is, microscopically, each point of F_o lies on a corresponding triangular patch of F.
Fig. 6 is a schematic design diagram of the three-dimensional face structured representation method. Left: the three-dimensional face after transposition and its depth-map representation. Right: mesh-surface fitting of the face point cloud model (yellow denotes mesh points, red denotes original points of the point cloud model, and depth denotes the weight of the corresponding point).
As shown in Fig. 6 (right), all spatial points are first divided into two types, v_t and v_b, according to whether they fall in an upper or lower triangle. Taking v_b as an example, and for convenience of description, the lower triangle Δv_{i-1,j} v_{i,j} v_{i,j+1} containing v_b is written in simplified form as Δv_1 v_2 v_3. Then, letting z_1, z_2, z_3 be the depth values of the surface F at v_1, v_2, v_3 and w_1, w_2, w_3 the corresponding weights, the following is obtained from the barycentric-coordinates principle:
w_1 × z_1 + w_2 × z_2 + w_3 × z_3 = z(v_b)    (1)
where w_1, w_2, w_3 are the weights of point v_b, inversely proportional to its distance to the corresponding vertices, and z(v_b) is the depth value on the surface F at the XOY-plane point v_b. Because the triangle Δv_1 v_2 v_3 is an isosceles right triangle, the weights can be calculated as follows:
[Equation (2): the weights w_1, w_2, w_3 of the lower-triangle point v_b, computed via the modulo operation on x(v_b) and y(v_b); given as an image in the original publication.]
where x(v_b) and y(v_b) are the coordinates of the XOY-plane point v_b and % is the modulo operation. In fact, the modulo operation in equation (2) above is used to estimate the slope. Similarly, writing the upper triangle Δv_{i-1,j} v_{i-1,j+1} v_{i,j+1} as Δv_1 v_2 v_3 in formula (1), with formula (1) unchanged in form, the weights corresponding to a point v_t can be further calculated as follows:
[Equation (3): the weights w_1, w_2, w_3 of the upper-triangle point v_t, computed analogously to equation (2); given as an image in the original publication.]
Obviously, for the original face point cloud model F_o, formulas (1)-(3) can be expressed collectively as:
W × z(F) = z(F_o)    (4)
Since the weight matrix W can be calculated from formulas (2) and (3), by the basic rules of matrix operations,
z(F) = W \ z(F_o)
In fact, formula (4) can also be solved by the ridge regression estimation method, as follows:
z(F) = (W^T W)^(-1) W^T z(F_o)    (5)
Further, smoothness of the fitted surface F can be guaranteed by adding constraints when solving equation (4). Taking Fig. 6 (right) as an example, the gradient is first constrained in the X-axis direction as follows:
z(v_{i,j-1}) - 2·z(v_{i,j}) + z(v_{i,j+1}) = 0    (6)
Similarly, the gradient is constrained in the Y-axis direction as follows:
z(v_{i-1,j}) - 2·z(v_{i,j}) + z(v_{i+1,j}) = 0    (7)
Equations (6) and (7) above ensure local monotonicity in the X-axis and Y-axis directions while equation (4) provides the globally optimal fit. Further, collected over all grid nodes, equations (6) and (7) can be expressed respectively as:
W_x × z(F) = 0    (8)
W_y × z(F) = 0    (9)
Equations (4), (8), and (9) can then be combined into a single stacked linear system:
[ W ; W_x ; W_y ] × z(F) = [ z(F_o) ; 0 ; 0 ]    (10)
where the three blocks on each side are stacked vertically.
the above equation (10) can be solved by referring to the ridge regression optimization method shown in equation (5).
Therefore, any three-dimensional face point cloud model F_o^i can be expressed, through the grid depth-map matrix z(x, y), as a surface F_i, which is a structured representation analogous to a grayscale image. Deep neural networks can thus be applied directly to analyze and process three-dimensional face models, opening up more possibilities for applied research on such models. This structured representation method for three-dimensional face models provides a new route for analyzing and processing these data with deep neural networks, and it is essential to the design of both the large-scale three-dimensional face synthesis method and the face sample similarity suppression method.
(2) Large-scale three-dimensional face synthesis method
Large-scale three-dimensional face model global feature representation
The face models addressed by the invention are precision-sensitive, detail-oriented data; if a generated model has asymmetric left and right face halves, the quality of the synthesized data is seriously affected. Traditional convolutional neural networks have difficulty capturing data features over large ranges: for example, when convolving over the left-eye region they cannot be influenced by the right eye, so a synthesized face model easily lacks the basic characteristics of facial structure. We note that the self-attention mechanism, as a non-local operation, has gradually become widely used in computer vision: the response at a certain pixel is represented as a weighted average over all pixel points in the picture of interest. This is very effective at quantifying, through weights, the correlation between non-local samples. Fig. 7 is a schematic diagram of the effect of representing the three-dimensional face model with the non-local self-attention model. Right: the blue arrows represent the non-local attention operations of the corresponding face curvature points. Bottom left: the original model. Top left: the three-dimensional face model represented by self-attention.
First, following the definition of non-local operations in computer vision, a non-local operation in deep learning can be expressed as:
y_i = (1 / C(z)) Σ_j f_p(z_i, z_j) g(z_j)    (11)
where i is one of the positions of the output face mesh representation and j indexes all possible positions. In a local convolution operator, generally i-1 ≤ j ≤ i+1, whereas here j ranges over all nodes of the three-dimensional face mesh representation. z is the input three-dimensional face representation, y is the synthesized face, and C(z) is a normalization function ensuring that the overall information is unchanged before and after the transformation. g is a unary input function that directly transforms the information of the input z; for computing the weighted sum it is usually a convolution function g(z_j) = W_s z_j. f_p is a pairwise function that computes the correlation between position i and all other positions. Among its choices, the Gaussian function, embedded Gaussian, dot product, and concatenation are the four most common, and results show that the choice of the pairwise function f_p has little effect on the non-local model. For ease of understanding, the invention takes the commonly used embedded Gaussian as an example, namely:
f_p(z_i, z_j) = exp(θ(z_i)^T φ(z_j))    (12)
Fig. 8 is a schematic diagram of the three-dimensional face self-attention network model and its output result according to an exemplary embodiment of the present invention. Left: the self-attention network model. Bottom right: the three-dimensional face model represented using the self-attention model, where the arrow points to the red point shown as the mainly relevant non-local point in the original model at the top right.
The characteristic of the embedded Gaussian is that the Gaussian distance between z_i and z_j is computed in the embedding spaces θ and φ, where θ(z_i) = W_θ z_i and φ(z_j) = W_φ z_j, and W_θ and W_φ are the corresponding weight matrices, i.e., convolution functions. Further, the normalization function is computed as C(z) = Σ_j f_p(z_i, z_j).
Equation (11) can then be re-expressed in softmax form:
y = softmax(z^T W_θ^T W_φ z) · g(z)    (13)
Finally, the output y_i of equation (11) is multiplied by a scaling parameter γ and added back to the input feature map z_i, so the final output of the face mesh representation at position i is:
z'_i = γ · y_i + z_i    (14)
It can be seen that equation (14) constructs the output in residual form based on the non-local operator. The advantage of this approach is that the block can be embedded into any pre-trained network at will: as long as γ is initialized to 0 it has no effect, and new weights are then learned during transfer learning. In this way, introducing the new module does not render the pre-trained weights unusable.
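To make equations (11)-(14) concrete, the following PyTorch sketch implements an embedded-Gaussian non-local (self-attention) block in the residual form of equation (14), with γ initialized to zero so the block can be dropped into a pre-trained network without disturbing it. The 1×1-convolution embeddings and the channel reduction follow the common non-local-block construction and are assumptions, not the patent's exact layer configuration.

import torch
import torch.nn as nn
import torch.nn.functional as F

class NonLocalSelfAttention2d(nn.Module):
    """Embedded-Gaussian non-local block (eq. (11)-(14)) for a structured
    face depth-map tensor of shape (batch, channels, height, width)."""

    def __init__(self, channels: int, reduction: int = 2):
        super().__init__()
        inner = max(channels // reduction, 1)
        self.theta = nn.Conv2d(channels, inner, kernel_size=1)   # θ(z_i) = W_θ z_i
        self.phi = nn.Conv2d(channels, inner, kernel_size=1)     # φ(z_j) = W_φ z_j
        self.g = nn.Conv2d(channels, inner, kernel_size=1)       # g(z_j) = W_s z_j
        self.out = nn.Conv2d(inner, channels, kernel_size=1)
        self.gamma = nn.Parameter(torch.zeros(1))                 # γ = 0 at start, eq. (14)

    def forward(self, z: torch.Tensor) -> torch.Tensor:
        b, c, h, w = z.shape
        theta = self.theta(z).flatten(2).transpose(1, 2)    # (b, hw, inner)
        phi = self.phi(z).flatten(2)                         # (b, inner, hw)
        g = self.g(z).flatten(2).transpose(1, 2)             # (b, hw, inner)

        # Softmax form of eq. (13): attention over all positions j for each i.
        attn = F.softmax(theta @ phi, dim=-1)                 # (b, hw, hw)
        y = (attn @ g).transpose(1, 2).reshape(b, -1, h, w)   # eq. (11)
        return self.gamma * self.out(y) + z                   # eq. (14): z' = γ·y + z

if __name__ == "__main__":
    block = NonLocalSelfAttention2d(channels=32)
    depth_maps = torch.randn(2, 32, 64, 64)    # stand-in structured face grids
    print(block(depth_maps).shape)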
Self-attention generative adversarial network model
A conventional generative adversarial network model consists of two deep network models: a Generator (G) and a Discriminator (D). Its working mechanism resembles a zero-sum game and is expressed as follows:
min_G max_D V(D, G) = E_{x~p_data(x)}[log D(x)] + E_{s~p_s(s)}[log(1 - D(G(s)))]    (15)
in the above formula, the generation model G captures the distribution of sample data, and generates a sample similar to real training data with noise s obeying a certain distribution (uniform distribution, gaussian distribution, etc.), and the pursuit effect is better as the real sample is; the discriminant model D is a two-classifier that estimates the probability that a sample is from training data (rather than from the generated data), and if the sample is from real training data, D outputs a large probability, otherwise, D outputs a small probability. In the training process, one side is fixed, the network weight of the other side is updated, and iteration is performed alternately, and in the process, both sides optimize own networks to the utmost extent, so that competitive confrontation is formed until both sides reach a dynamic balance (Nash equilibrium). It can be seen that, when the network G is generated fixedly, the optimization of the discrimination network D can be understood as follows: the input is from real data, the D optimization network structure enables the D optimization network structure to output 1, the input is from generated data, and the D optimization network structure enables the D optimization network structure to output 0; when the discrimination network D is fixed, G optimizes the network to output the sample as much as the real data, and the generated sample is discriminated by D to output high probability by D.
Looking back at the basic non-local operator in equation (14), γ is initialized to 0 and then gradually learned, assigning more and more weight to the non-local features: a simple task is learned first and the complexity of the task is then gradually increased. The self-attention network and the generative adversarial network are fused by applying the self-attention module to both the generator G and the discriminator D, and the loss function expressions of G and D are designed as follows:
[Loss function L_G of the generator and loss function L_D of the discriminator; the expressions are given as images in the original publication.]
These two formulas, combined with equation (15), form the generative adversarial network model; the generator G and the discriminator D are trained alternately by minimizing the adversarial losses, finally realizing a three-dimensional face synthesis model that fuses the self-attention model with the generative adversarial network model.
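The alternating training implied by equation (15) can be sketched as follows. Since the generator and discriminator loss expressions appear only as images in the original, the standard non-saturating GAN losses are used here as stand-ins, and the tiny G and D stubs and learning rates are illustrative assumptions rather than the patent's self-attention networks.

import torch
import torch.nn as nn

def train_step(G, D, opt_G, opt_D, real, noise_dim=128, device="cpu"):
    """One alternating update of discriminator D and generator G, following eq. (15)."""
    bce = nn.BCEWithLogitsLoss()
    batch = real.size(0)
    ones = torch.ones(batch, 1, device=device)
    zeros = torch.zeros(batch, 1, device=device)

    # Discriminator step: push D(real) toward 1 and D(G(s)) toward 0.
    s = torch.randn(batch, noise_dim, device=device)
    fake = G(s).detach()
    loss_D = bce(D(real), ones) + bce(D(fake), zeros)
    opt_D.zero_grad(); loss_D.backward(); opt_D.step()

    # Generator step: push D(G(s)) toward 1 (non-saturating stand-in loss).
    s = torch.randn(batch, noise_dim, device=device)
    loss_G = bce(D(G(s)), ones)
    opt_G.zero_grad(); loss_G.backward(); opt_G.step()
    return loss_D.item(), loss_G.item()

if __name__ == "__main__":
    G = nn.Sequential(nn.Linear(128, 256), nn.ReLU(),
                      nn.Linear(256, 64 * 64), nn.Unflatten(1, (1, 64, 64)))
    D = nn.Sequential(nn.Flatten(), nn.Linear(64 * 64, 256),
                      nn.LeakyReLU(0.2), nn.Linear(256, 1))
    opt_G = torch.optim.Adam(G.parameters(), lr=1e-4)
    opt_D = torch.optim.Adam(D.parameters(), lr=4e-4)          # TTUR-style split
    real_grids = torch.randn(8, 1, 64, 64)                     # stand-in face grids
    print(train_step(G, D, opt_G, opt_D, real_grids))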
(3) Three-dimensional human face sample similarity suppression model
The purpose of the three-dimensional face model synthesis method provided by the invention is to generate artificially synthesized face models of high fidelity that do not give rise to infringement in their various applications. In general, the attention-based large-scale generative adversarial network model described above can synthesize realistic faces; the similarity between a synthesized model and real faces is related to the precision of the network model, and in this embodiment the precision of synthesized models (their similarity to real faces) is generally about 50% to 75%. However, to further guarantee the difference between generated face models and the training samples, the invention provides a sample similarity suppression module based on a dropout autoencoder to screen and discriminate the generated models. This dropout-autoencoder-based sample similarity suppression network model calculates the similarity between a synthesized three-dimensional face model and the second three-dimensional face model set through feature extraction (feature comparison and similarity calculation use the facial geometric information extraction common in three-dimensional face reconstruction technology) and outputs only the synthesized three-dimensional face models whose similarity is below a threshold (in this embodiment the threshold may be taken between 52% and 57%), thereby suppressing the similarity between the synthesized three-dimensional face models and the real three-dimensional face model set. Thus, on top of designing a three-dimensional face synthesis model for synthesizing three-dimensional face models, the data privacy issues of three-dimensional face model synthesis technology are fully considered; by designing a similarity inspection model, the similarity between synthesized models and really acquired face models is suppressed, so that the automatic synthesis results can meet the special requirement of face data privacy and are safer and more reliable. Fig. 9 shows the dropout network model architecture of an exemplary embodiment of the invention.
As the name implies, dropout in the autoencoder means that during training the hidden-layer nodes of the neural network appear randomly with preset probability p_i (i = 1, ..., |z|, where |z| is the number of nodes m_i in the structured face mesh representation), so no two hidden nodes are guaranteed to appear together every time, as shown in Fig. 9. The weight updates therefore do not depend on the joint action of hidden nodes with fixed relationships, which prevents situations in which some features are effective only in the presence of other specific features. Following Fig. 9, a probability factor is attached to each connection unit of the training network, and the neural network model is expressed as follows:
p_o = f_q(m_o) = f_q(w_i * m_i * p_i + b)    (16)
where m_o is the scalar output of all mesh nodes of the face model encoded by the dropout network model, f_q is a normalized probability check function applied to the output, w_i is the weight of node i, and b is the bias term. In practice, f_q can be chosen and optimized among the activation function, the Gaussian function, and the like.
Finally, the probability p_o output by the check function f_q is binarized with a threshold to verify that the similarity between the generated data and the training samples is not excessive, finally realizing sample similarity suppression, as shown in Fig. 9.
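A minimal sketch of the suppression step around equation (16): a similarity score between a synthesized face grid and each downsampled real face is computed with a Gaussian check function (one of the f_q options named above), the maximum score p_o is binarized with a threshold, and only sufficiently dissimilar synthesized models pass. The flattened-grid features, the Gaussian bandwidth, and the 0.55 threshold (within the 52%-57% range mentioned in this embodiment) are illustrative assumptions, not the patent's exact procedure.

import numpy as np

def similarity_score(synth: np.ndarray, real: np.ndarray, sigma: float = 1.0) -> float:
    """Gaussian check function f_q on the feature distance between a synthesized
    face grid and one real (downsampled) face grid; the result lies in (0, 1]."""
    d = np.linalg.norm(synth.ravel() - real.ravel()) / np.sqrt(synth.size)
    return float(np.exp(-(d ** 2) / (2.0 * sigma ** 2)))

def passes_similarity_suppression(synth, real_set, threshold: float = 0.55) -> bool:
    """Binarize the output probability p_o with a threshold: keep the synthesized
    model only if its similarity to every real sample stays below the threshold."""
    p_o = max(similarity_score(synth, real) for real in real_set)
    return p_o < threshold

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    real_set = [rng.normal(size=(32, 32)) for _ in range(5)]   # FaceDataSet2 stand-ins
    candidate = rng.normal(size=(32, 32))                      # synthesized face grid
    print(passes_similarity_suppression(candidate, real_set))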
(4) Optimization discussion of stability and large-scale data processing performance of three-dimensional face synthesis method
It should be noted that the characteristic advantage of the generative adversarial network model is that, provided the training process is stable, G can be guaranteed to eventually output synthetic data similar to the samples. However, since using a generative adversarial network requires training both the G and D deep network models simultaneously, the stability and large-scale data processing capability of training on large-scale face sample data still need to be addressed. The invention adopts the following strategies in the optimization of the sample-similarity-suppressed three-dimensional face synthesis method:
sampling the training face model
The large-scale face model synthesis method provided by the invention is based on a self-attention network model. Because adjacent face mesh nodes carry redundant information, the invention uniformly samples the training face models, thereby further improving the training efficiency of the deep network model without affecting training quality, as illustrated in the sketch below.
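One plausible reading of this uniform sampling is strided subsampling of the structured depth-map matrix, as sketched here; the stride of 2 and the grid size are assumptions introduced for illustration.

import numpy as np

def downsample_face_grid(depth_grid: np.ndarray, stride: int = 2) -> np.ndarray:
    """Uniformly subsample a structured face depth-map matrix z(x, y).

    Adjacent mesh nodes carry redundant information, so keeping every
    `stride`-th row and column preserves the overall facial shape while
    shrinking the input fed to the deep network (and, later, the
    similarity-check module working on FaceDataSet2).
    """
    return depth_grid[::stride, ::stride]

if __name__ == "__main__":
    grid = np.random.rand(128, 128)             # stand-in structured face grid
    print(downsample_face_grid(grid).shape)     # (64, 64)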
Spectral norm Normalization (Spectral Normalization)
The self-attention generative adversarial network applies spectral norm normalization to both the discriminator and the generator: the network parameters of each layer are divided by the spectral norm (the largest singular value) of that layer's parameter matrix. The discriminator then satisfies the 1-Lipschitz constraint, which markedly reduces the solution search space, lowers the amount of computation, and alleviates the gradient collapse problem; at the same time, it avoids gradient anomalies caused by excessively large generator parameters, making the whole training more stable and efficient.
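In a PyTorch realization, this constraint can be applied layer by layer with the built-in spectral normalization utility, which divides each weight matrix by its largest singular value estimated via power iteration; the small discriminator below is only a stand-in for the patent's network, and its layer sizes are assumptions.

import torch.nn as nn
from torch.nn.utils import spectral_norm

# Each learnable layer is wrapped so that its parameters are divided by the
# spectral norm (largest singular value) of that layer's weight matrix.
discriminator = nn.Sequential(
    spectral_norm(nn.Conv2d(1, 32, kernel_size=4, stride=2, padding=1)),
    nn.LeakyReLU(0.2),
    spectral_norm(nn.Conv2d(32, 64, kernel_size=4, stride=2, padding=1)),
    nn.LeakyReLU(0.2),
    nn.Flatten(),
    spectral_norm(nn.Linear(64 * 16 * 16, 1)),   # assumes 64x64 input face grids
)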
Dual timescale update rule (two-timescale update rule, TTUR)
In previous work on generative adversarial network models, regularization of the discriminator generally slows down the learning process. In practice, methods using regularized discriminators typically require multiple discriminator update steps per generator step during training. The invention considers a training method that uses separate learning rates for the generator and the discriminator, i.e., the dual time-scale update rule (TTUR). TTUR is thus used specifically to compensate for the slow learning of the regularized discriminator, making it possible to use fewer discriminator steps per generator step. With this method, better results can be produced in the same unit of time.
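In practice TTUR amounts to two optimizers with different learning rates, the discriminator's being larger; the 1e-4/4e-4 pair below follows the commonly cited TTUR setting and, like the stand-in networks, is an assumption rather than a value taken from the patent.

import torch
import torch.nn as nn

generator = nn.Sequential(nn.Linear(128, 64 * 64), nn.Tanh())   # stand-in G
discriminator = nn.Sequential(nn.Linear(64 * 64, 1))            # stand-in D

# Dual time-scale update rule: the regularized discriminator gets the larger
# learning rate, so a single D step per G step suffices.
opt_G = torch.optim.Adam(generator.parameters(), lr=1e-4, betas=(0.0, 0.9))
opt_D = torch.optim.Adam(discriminator.parameters(), lr=4e-4, betas=(0.0, 0.9))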
Sample similarity checking method fusing the self-attention model and the dropout network
When checking the sample similarity of the synthesized face data, large-scale input data would add extra computational burden, so the invention downsamples all face models in the training sample data set (as shown in Fig. 3). This downsampling is used in the data similarity calculation and has been verified not to affect the data classification results. Meanwhile, since the face representation mesh has topological correlation among adjacent nodes, the invention further performs self-attention encoding, based on the self-attention model (Fig. 8), on all mesh nodes m_i shown in Fig. 9, as shown in Fig. 10. Notably, the downsampled input data loses part of the neighborhood information, which makes the fusion of node information encodings with the self-attention model shown in Fig. 10 especially necessary.
With the diversified development of three-dimensional face model production and scanning technologies, online three-dimensional face data sets have gradually grown in recent years, providing the necessary data accumulation for researching and designing data-driven three-dimensional face synthesis algorithms. Combining the applicant's research accumulation with an investigation of three-dimensional face data sets, the data sets for developing and testing the algorithm model proposed by the invention are listed as follows. The three-dimensional facial expression database FaceWarehouse, established by Professor Zhou Kun at Zhejiang University, collects face models of 150 people of different ethnicities aged between 7 and 80 and constructs 150 × 47 three-dimensional mesh models with the same topological connection relationship (corresponding to 47 basic expressions of 150 persons). The BU-3DFE three-dimensional face data set, created by Lijun Yin et al. at Binghamton University, New York, contains 700 face models covering 100 individuals (56 women, 44 men) aged 18-70 from six different skin-color ethnicities around the world, each presenting 7 different expressions in turn. Similarly, King Juan Carlos University in Spain (Universidad Rey Juan Carlos) established a data set of 549 three-dimensional face models from 51 acquisition subjects. These data sets are significant for researching three-dimensional face synthesis oriented to different ethnicities and different expressions. The BJUT-3D Chinese Face Database, established by professors at Beijing University of Technology, contains 500 expressionless three-dimensional Chinese face models with non-face parts cropped, ensuring the quality of the face models.
In a further embodiment of the present invention, as shown in fig. 11, there is also provided a large-scale three-dimensional face synthesis system with sample similarity suppression, including:
the three-dimensional scanning system apparatus (handheld three-dimensional scanners Go SCAN 20 and Go SCAN 50) as shown in fig. 12 is used for scanning to obtain a three-dimensional face model and outputting the three-dimensional face model to the first processing module;
The first processing module receives the three-dimensional face model and structures it to obtain a first three-dimensional face model set, which it outputs to the second and third processing modules. The second processing module receives the first three-dimensional face model set, configures a large-scale three-dimensional face synthesis network model, synthesizes three-dimensional face models based on the first set to obtain synthesized three-dimensional face models, and outputs them to the third processing module. The third processing module downsamples the first three-dimensional face model set to obtain a second three-dimensional face model set and configures a sample similarity suppression network model, which performs similarity detection between the synthesized three-dimensional face models and the second set so as to output the synthesized three-dimensional face models that pass the similarity check.
In conclusion, the scheme provides a method for detecting and suppressing the similarity of three-dimensional face samples in order to verify the three-dimensional face model synthesis results and obtain more reliable outputs. The invention proposes a data-driven method for large-scale artificial synthesis of three-dimensional faces, integrates deep neural network techniques, and designs a sample-similarity-suppressed large-scale three-dimensional face synthesis method. Although training a deep neural network usually requires processing large training samples and a large amount of computation, once training is complete and all network parameters are obtained, large-scale artificial three-dimensional face models can be synthesized rapidly using only the generator network of the trained model. In the long run, the data-driven, sample-similarity-suppressed, large-scale three-dimensional face synthesis method is therefore more convenient, efficient, and reliable. The (mesh) structured representation of three-dimensional face models designed in this scheme allows a three-dimensional face model to be used directly as input to a deep neural network model, just like an image, opening up more possibilities for intelligent processing of three-dimensional face models; fusing the self-attention model improves the ability of the traditional generative adversarial model to process large-scale data; and introducing methods such as the spectral norm and the dual time-scale rule improves the stability of the generative adversarial network model during large-scale data training, so that artificially synthesized large-scale three-dimensional face models can finally be output effectively.

Claims (8)

1. A large-scale three-dimensional face synthesis method with sample similarity suppression, characterized by comprising the following steps:
step 1, acquiring a three-dimensional face model, and preprocessing the acquired three-dimensional face model to obtain a first three-dimensional face model set;
step 2, constructing a large-scale three-dimensional face synthesis network model, and inputting the first three-dimensional face model set into the large-scale three-dimensional face synthesis network model to synthesize three-dimensional face models, obtaining synthesized three-dimensional face models; the first three-dimensional face model set is downsampled to obtain a second three-dimensional face model set;
step 3, constructing a sample similarity suppression network model based on a trained dropout deep network model; the sample similarity suppression network model calculates the similarity between the synthesized three-dimensional face models and the second three-dimensional face model set by feature extraction, and outputs the synthesized three-dimensional face models whose similarity is below a threshold, so as to suppress the similarity between the synthesized three-dimensional face models and the second three-dimensional face model set.
2. The method of claim 1, wherein the dropout deep network model is:
p_o = f_q(m_o) = f_q(w_i * m_i * p_i + b)
where m_o is the scalar output of all mesh nodes of the face model encoded by the dropout network model, f_q is a normalized probability check function applied to the output, i = 1, ..., |z|, |z| is the number of nodes m_i in the structured face mesh representation, f_q is an activation function or a Gaussian function, p_i is a preset probability, w_i is the weight of node i, and b is the bias term.
3. The method of claim 1, wherein step 1 comprises:
step 101, acquiring a three-dimensional face model and point cloud data thereof, and performing maximum boundary linear normalization processing on the point cloud data of the acquired three-dimensional face model to obtain a depth map corresponding to the three-dimensional face model;
and 102, carrying out gridding surface fitting on the obtained depth map to obtain a grid depth map matrix corresponding to the three-dimensional face model, wherein the obtained grid depth map matrix is the first three-dimensional face model set.
4. The method of claim 3, wherein the obtained depth map is fitted with a gridded surface by a gradient constrained smoothing method and a ridge regression solution.
5. The method of any one of claims 1-4, wherein the large-scale three-dimensional face synthesis network model is constructed based on a trained generative adversarial network model fused with a self-attention network.
6. The method of claim 5, wherein the training stability of the generative adversarial network model fused with the self-attention network is improved by introducing a spectral norm.
7. The method of claim 6, wherein the training time of the generative adversarial network model fused with the self-attention network is reduced by introducing a dual time-scale update rule.
8. The method of claim 5, wherein an embedded Gaussian function is employed as a computational function of the self-attention network, and an output layer of the self-attention network is constructed in residual form based on a non-local operator.
CN202010610545.XA 2020-06-30 2020-06-30 Large-scale three-dimensional face synthesis method with suppressed sample similarity Active CN111739168B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010610545.XA CN111739168B (en) 2020-06-30 2020-06-30 Large-scale three-dimensional face synthesis method with suppressed sample similarity

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010610545.XA CN111739168B (en) 2020-06-30 2020-06-30 Large-scale three-dimensional face synthesis method with suppressed sample similarity

Publications (2)

Publication Number Publication Date
CN111739168A CN111739168A (en) 2020-10-02
CN111739168B true CN111739168B (en) 2021-01-29

Family

ID=72653648

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010610545.XA Active CN111739168B (en) 2020-06-30 2020-06-30 Large-scale three-dimensional face synthesis method with suppressed sample similarity

Country Status (1)

Country Link
CN (1) CN111739168B (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109242966A (en) * 2018-08-07 2019-01-18 北京道亨时代科技有限公司 A kind of 3D panorama model modeling method based on laser point cloud data
WO2019199902A1 (en) * 2018-04-10 2019-10-17 Facebook, Inc. Automated decisions based on descriptive models
CN110443885A (en) * 2019-07-18 2019-11-12 西北工业大学 Three-dimensional number of people face model reconstruction method based on random facial image
CN110909189A (en) * 2019-12-03 2020-03-24 支付宝(杭州)信息技术有限公司 Method and device for processing face picture
CN111223175A (en) * 2018-11-27 2020-06-02 财团法人交大思源基金会 Three-dimensional face reconstruction method

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10430642B2 (en) * 2017-12-07 2019-10-01 Apple Inc. Generating animated three-dimensional models from captured images
CN109934195A (en) * 2019-03-21 2019-06-25 东北大学 A kind of anti-spoofing three-dimensional face identification method based on information fusion
CN111242837B (en) * 2020-01-03 2023-05-12 杭州电子科技大学 Face anonymity privacy protection method based on generation countermeasure network

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2019199902A1 (en) * 2018-04-10 2019-10-17 Facebook, Inc. Automated decisions based on descriptive models
CN109242966A (en) * 2018-08-07 2019-01-18 北京道亨时代科技有限公司 A kind of 3D panorama model modeling method based on laser point cloud data
CN111223175A (en) * 2018-11-27 2020-06-02 财团法人交大思源基金会 Three-dimensional face reconstruction method
CN110443885A (en) * 2019-07-18 2019-11-12 西北工业大学 Three-dimensional number of people face model reconstruction method based on random facial image
CN110909189A (en) * 2019-12-03 2020-03-24 支付宝(杭州)信息技术有限公司 Method and device for processing face picture

Also Published As

Publication number Publication date
CN111739168A (en) 2020-10-02

Similar Documents

Publication Publication Date Title
CN111754637B (en) Large-scale three-dimensional face synthesis system with suppressed sample similarity
CN108537743B (en) Face image enhancement method based on generation countermeasure network
LU102496B1 (en) Facial expression recognition method based on attention mechanism
CN112766160B (en) Face replacement method based on multi-stage attribute encoder and attention mechanism
Dabouei et al. Fast geometrically-perturbed adversarial faces
CN103632132B (en) Face detection and recognition method based on skin color segmentation and template matching
CN112784736B (en) Character interaction behavior recognition method based on multi-modal feature fusion
CN109684969B (en) Gaze position estimation method, computer device, and storage medium
Zheng et al. Attention-based spatial-temporal multi-scale network for face anti-spoofing
CN109711283A (en) A kind of joint doubledictionary and error matrix block Expression Recognition algorithm
JP2000030065A (en) Pattern recognizing device and its method
Paul et al. Extraction of facial feature points using cumulative histogram
CN115661246A (en) Attitude estimation method based on self-supervision learning
CN107784284B (en) Face recognition method and system
Yin et al. G2Grad-CAMRL: an object detection and interpretation model based on gradient-weighted class activation mapping and reinforcement learning in remote sensing images
CN114494594A (en) Astronaut operating equipment state identification method based on deep learning
Jiang et al. Explainable face recognition based on accurate facial compositions
CN112233017A (en) Sick face data enhancement method based on generation countermeasure network
CN111739168B (en) Large-scale three-dimensional face synthesis method with suppressed sample similarity
Yin et al. 3D face recognition based on high-resolution 3D face modeling from frontal and profile views
CN114821632A (en) Method for re-identifying blocked pedestrians
CN114332623A (en) Method and system for generating countermeasure sample by utilizing spatial transformation
Nawaz et al. Faceswap based deepfakes detection.
Wang et al. Aided Evaluation of Motion Action Based on Attitude Recognition
Zhang et al. Frt-pad: Effective presentation attack detection driven by face related task

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant