CN112613460A - Face generation model establishing method and face generation method

Info

Publication number
CN112613460A
Authority
CN
China
Prior art keywords
face image
artifact
image
side face
front face
Prior art date
Legal status
Granted
Application number
CN202011604398.1A
Other languages
Chinese (zh)
Other versions
CN112613460B (en)
Inventor
宁欣
张少林
南方哲
许少辉
Current Assignee
Shenzhen Weifuyou Technology Co ltd
Original Assignee
Shenzhen Weifuyou Technology Co ltd
Priority date
Filing date
Publication date
Application filed by Shenzhen Weifuyou Technology Co ltd
Priority to CN202011604398.1A
Publication of CN112613460A
Application granted
Publication of CN112613460B
Current legal status: Active
Anticipated expiration

Classifications

    • G06V40/161 Human faces: detection; localisation; normalisation
    • G06V40/168 Human faces: feature extraction; face representation
    • G06F18/214 Pattern recognition: generating training patterns; bootstrap methods, e.g. bagging or boosting
    • G06N3/063 Neural networks: physical realisation using electronic means
    • G06T15/005 3D image rendering: general purpose rendering architectures
    • G06T5/30 Image enhancement or restoration: erosion or dilatation, e.g. thinning
    • G06T5/80 Image enhancement or restoration: geometric correction
    • G06T2207/20081 Image analysis indexing scheme: training; learning
    • G06T2207/20084 Image analysis indexing scheme: artificial neural networks [ANN]
    • G06T2207/30201 Image subject: face


Abstract

The embodiment of the invention discloses a method for establishing a face generation model and a face generation method, wherein the method for establishing the face generation model comprises the following steps: acquiring an original side face image with a single visual angle, and acquiring a three-dimensional vertex matrix and first posture information of a face from the original side face image; obtaining first texture information through the three-dimensional vertex matrix and the first posture information; obtaining a front face and a side face with artifacts based on the three-dimensional vertex matrix, the first posture information and the first texture information; inputting the two images into a generation network to obtain a front face and a side face without artifacts; the original side face and the side face without the artifact are input into an identification network, and a loss function value is obtained to update a generation network and the identification network, so that a face generation model is obtained. The technical scheme disclosed by the invention solves the problem that the generation of the front face image is limited by the scale and range of the data source, and can generate the face with higher precision.

Description

Face generation model establishing method and face generation method
Technical Field
The invention relates to the field of image generation, in particular to a method for establishing a human face generation model and a human face generation method.
Background
The front face image has wide application in face recognition, video monitoring, identity verification, and the like. The rise of machine learning and deep learning has greatly expanded the application of face recognition and improved its accuracy. As a result, generating a frontal image from a non-frontal face image is gradually attracting attention.
However, front face image generation methods based on deep learning depend heavily on data sets containing multi-view images of the same person, so the generation result is limited by the scale and range of the data source; it is also difficult for such methods to take the pose information of the person in the image into account, so the generated front face is not realistic enough.
Disclosure of Invention
To address the problems that the generated front image is limited by the data source and that the generated result is not realistic, the invention provides a method for establishing a face generation model and a face generation method.
In a first aspect, a first embodiment of the present invention provides a method for building a face generation model, including:
acquiring an original side face image with a single visual angle in a training set;
acquiring a three-dimensional vertex matrix and first posture information of a face according to an original side face image;
mapping the three-dimensional vertex matrix and the first posture information to a two-dimensional space to obtain first texture information;
obtaining a front face image with an artifact and a side face image with the artifact through the three-dimensional vertex matrix, the first posture information and the first texture information;
inputting the front face image with the artifact and the side face image with the artifact into a generation network to generate a front face image without the artifact and a side face image without the artifact;
inputting the original side face image and the side face image without the artifact into an identification network to obtain a loss function value;
and training a generating network and an identifying network by taking the minimum loss function value as a target to obtain a face generating model.
Further, the method for establishing the face generation model further includes:
inputting the three-dimensional vertex matrix, the first posture information and the first texture information into a neural mesh renderer to obtain a rendered side face image;
performing image erosion on the rendered side face image, and extracting second texture information of the front face image subjected to the image erosion;
executing a preset first rotation operation on the first attitude information to obtain second attitude information;
and inputting the three-dimensional vertex matrix, the second posture information and the second texture information into a neural mesh renderer to obtain a front face image with an artifact, and obtaining a side face image with the artifact through the front face image with the artifact.
Further, the method for establishing the face generation model further includes:
performing image erosion on the front face image with the artifact, and extracting third texture information of the front face image with the artifact, which is subjected to the image erosion;
executing a preset second rotation operation on the second attitude information to obtain third attitude information;
and inputting the three-dimensional vertex matrix, the third posture information and the third texture information into a neural mesh renderer to obtain a side face image with an artifact.
Further, the method for establishing the face generation model further includes:
inputting a front face image with an artifact and a side face image with the artifact into a downsampling layer and a feature extraction layer which are sequentially connected in a generation network to obtain a first feature vector and a second feature vector;
the down-sampling layer at least comprises a 7 × 7 convolution layer and four 3 × 3 convolution layers which are connected in sequence, the feature extraction layer comprises nine feature extraction blocks which are connected in sequence, and each feature extraction block at least comprises an attention layer, a 3 × 3 convolution layer and a SPADE layer;
inputting the first feature vector and the second feature vector into an up-sampling layer in a generation network, and generating a front face image without an artifact and a side face image without the artifact;
wherein the upsampling layer at least comprises four 3 × 3 convolution layers and one 7 × 7 convolution layer which are connected in sequence.
Further, the method for establishing the face generation model further includes:
acquiring first illumination data of an original side face image and second illumination data of a front face image without an artifact, and obtaining a first illumination loss value through the first illumination data and the second illumination data;
optimizing the front face image without the artifact to obtain a first image by taking the minimized first illumination loss value as a target;
inputting an original side face image serving as an input image and a first image serving as a guide image into a guide filter, and updating the first image to obtain a second image;
acquiring third illumination data of the second image, and obtaining a second illumination loss value through the first illumination data and the third illumination data;
the second image is again optimized with the goal of minimizing the second illumination loss value.
In a second aspect, a second embodiment of the present invention provides a face generation method, including:
acquiring a side face image with a single visual angle;
acquiring a three-dimensional vertex matrix and posture information of the face according to the side face image;
mapping the three-dimensional vertex matrix and the attitude information to a two-dimensional space to obtain texture information of the side face image;
obtaining a front face image with an artifact through the three-dimensional vertex matrix, the attitude information and the texture information;
and inputting the front face image into a face generation model to generate a front face image without artifacts, wherein the face generation model is obtained by the method in the first aspect.
In a third aspect, a third embodiment of the present invention provides an apparatus for creating a face generation model, including:
the system comprises a sample acquisition module, a training set acquisition module and a training set acquisition module, wherein the sample acquisition module is used for acquiring an original side face image with a single visual angle in the training set;
the first extraction module is used for acquiring a three-dimensional vertex matrix and first posture information of the face according to the original side face image;
the second extraction module is used for mapping the three-dimensional vertex matrix and the first attitude information to a two-dimensional space to obtain first texture information;
the artifact image acquisition module is used for acquiring a front face image with an artifact and a side face image with the artifact through the three-dimensional vertex matrix, the first posture information and the first texture information;
the generating module is used for inputting the front face image with the artifact and the side face image with the artifact into a generating network to generate a front face image without the artifact and a side face image without the artifact;
the identification module is used for inputting the original side face image and the side face image without the artifact into an identification network to obtain a loss function value;
and the training module is used for training the generation network and the identification network by taking the minimum loss function value as a target to obtain the face generation model.
In a fourth aspect, a fourth embodiment of the present invention provides a face generation apparatus, including:
the acquisition module is used for acquiring a side face image with a single visual angle;
the first acquisition module is used for acquiring a three-dimensional vertex matrix and posture information of the face according to the side face image;
the second acquisition module is used for mapping the three-dimensional vertex matrix and the posture information to a two-dimensional space to obtain texture information of the side face image;
the artifact image generation module is used for obtaining a front face image with an artifact through the three-dimensional vertex matrix, the posture information and the texture information;
and the artifact eliminating module is used for inputting the front face image into a face generation model and generating the front face image without the artifact, wherein the face generation model is obtained by the method for establishing any one of the face generation models in the first aspect.
In a fifth aspect, a fifth embodiment of the present invention provides a computer device, comprising a memory and a processor, the memory storing a computer program, the computer program, when running on the processor, performing the method for building a face generation model as in the first aspect or the method for generating a face as in the second aspect.
In a sixth aspect, a sixth embodiment of the present invention provides a computer-readable storage medium having stored thereon a computer program which, when run on a processor, performs the method of building a face generation model as in the first aspect or the method of generating a face as in the second aspect.
The method for establishing the face generation model comprises: acquiring an original side face image with a single viewing angle from a training set, and acquiring a three-dimensional vertex matrix and first posture information of the face from the original side face image; mapping the three-dimensional vertex matrix and the first posture information to a two-dimensional space to obtain first texture information; obtaining a front face image with artifacts and a side face image with artifacts based on the three-dimensional vertex matrix, the first posture information and the first texture information; inputting the two images into a generation network to remove the artifacts, that is, generating an artifact-free front face image and an artifact-free side face image; inputting the original side face image and the artifact-free side face image into an identification network to judge the quality of the artifact-free side face image and thereby indirectly measure the quality of the generated front face image; and obtaining a loss function value from the artifact-free side face image and the original side face, then updating the generation network and the identification network by minimizing this loss function value, so as to obtain a trained face generation model. The method uses an original side face image at a single viewing angle to generate a front face with artifacts and a side face with artifacts, uses an adversarial network to eliminate the artifacts in both faces, and indirectly judges the generation effect of the front face through the quality of the side face. Based on this scheme, the invention solves the problem that front face image generation is limited by the scale and range of the data source; the posture information of the person in the image is also considered when generating the image, ensuring the authenticity of the generated image; and the quality of the front face is judged indirectly through the quality of the side face, further improving the image generation effect.
Drawings
In order to more clearly illustrate the technical solution of the present invention, the drawings required to be used in the embodiments will be briefly described below, and it should be understood that the following drawings only illustrate some embodiments of the present invention, and therefore should not be considered as limiting the scope of the present invention. Like components are numbered similarly in the various figures.
FIG. 1 is a flow chart of a method for building a face generation model according to the present invention;
FIG. 2 shows a schematic diagram of a generated face with artifacts;
FIG. 3 is a schematic flow chart of S140 in the method for building a face generation model according to the present invention;
FIG. 4 is a schematic view showing another flow of step S140 in the method for building a face generation model according to the present invention;
FIG. 5 is a flow chart illustrating the generation of a front face without artifacts from a front face with artifacts;
FIG. 6 is a schematic flow chart of another method for building a face generation model according to the present invention;
FIG. 7 is a flow chart illustrating a face generation method of the present invention;
FIG. 8 is a schematic structural diagram of an apparatus for constructing a face generation model according to the present invention;
fig. 9 is a schematic structural diagram of the face generation device of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments.
The components of embodiments of the present invention generally described and illustrated in the figures herein may be arranged and designed in a wide variety of different configurations. Thus, the following detailed description of the embodiments of the present invention, presented in the figures, is not intended to limit the scope of the invention, as claimed, but is merely representative of selected embodiments of the invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments of the present invention without making any creative effort, shall fall within the protection scope of the present invention.
Hereinafter, the terms "including", "having", and their derivatives, which may be used in various embodiments of the present invention, are only intended to indicate specific features, numbers, steps, operations, elements, components, or combinations of the foregoing, and should not be construed as excluding the existence or addition of one or more other features, numbers, steps, operations, elements, components, or combinations of the foregoing.
Furthermore, the terms "first," "second," "third," and the like are used solely to distinguish one from another and are not to be construed as indicating or implying relative importance.
Unless otherwise defined, all terms (including technical and scientific terms) used herein have the same meaning as commonly understood by one of ordinary skill in the art to which various embodiments of the present invention belong. The terms (such as those defined in commonly used dictionaries) should be interpreted as having a meaning that is consistent with their contextual meaning in the relevant art and will not be interpreted in an idealized or overly formal sense unless expressly so defined herein in various embodiments of the present invention.
Example 1
In this embodiment, referring to fig. 1, a method for establishing a face generation model is shown, which includes the following steps:
s110, acquiring the original side face image with a single visual angle in the training set.
Specifically, the training set for training the neural network comprises a plurality of side face images at different viewing angles and under different illumination; however, only the side face images under the same viewing angle are extracted for training.
And S120, acquiring a three-dimensional vertex matrix and first posture information of the face according to the original side face image.
Specifically, in the present embodiment, the 3DDFA-V2 (3D Dense Face Alignment, Version 2) algorithm is used to extract the three-dimensional vertex matrix and posture information from the original side face image.
The 3DDFA-V2 algorithm is a 3D-aided short-video-synthesis method that can simulate in-plane and out-of-plane face movement, so that a still image is converted into a short video; from the still image, it obtains the three-dimensional vertex coordinates of the face and estimates the person's posture.
The pose information, i.e., the azimuthal orientation of the three-dimensional target object in the image, is represented in the form of a matrix. In the embodiment, the 3DDFA-V2 completes the posture estimation of the person on the side face image, so as to obtain the posture information of the person, namely the information of the body orientation, the sight line direction and the like of the person.
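For concreteness, the sketch below shows how the vertex matrix and pose might be extracted with the public 3DDFA-V2 implementation. The TDDFA/FaceBoxes call pattern follows that repository, but the patent does not prescribe a specific API, so treat the names as illustrative assumptions.

```python
import numpy as np

def extract_vertices_and_pose(side_face_bgr, tddfa, face_boxes):
    """Return the dense 3D vertex matrix V (3 x k) and the 3 x 4 pose matrix P."""
    boxes = face_boxes(side_face_bgr)                     # detect the face region
    param_lst, roi_box_lst = tddfa(side_face_bgr, boxes)  # 3DMM parameters per face
    # Dense reconstruction: one 3D coordinate per model vertex.
    vertices = tddfa.recon_vers(param_lst, roi_box_lst, dense_flag=True)[0]
    # In 3DDFA-V2 the first 12 parameters encode the 3 x 4 camera/pose matrix.
    pose = param_lst[0][:12].reshape(3, 4)
    return np.asarray(vertices), pose
```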
S130, mapping the three-dimensional vertex matrix and the first posture information to a two-dimensional space to obtain first texture information.
Specifically, the face has corresponding texture information at each vertex in the three-dimensional space and in the RGB (Red-Green-Blue) space. In the embodiment, each vertex in the three-dimensional image is mapped into the two-dimensional space through orthogonal projection, so that the texture features of each vertex in the face image are obtained.
Further, when the vertex mapping is completed through orthographic projection, the projection result, that is, the first texture information, is independent of the z-axis coordinate of the three-dimensional vertices. As a consequence, when the image is taken back from the two-dimensional space to the three-dimensional space, multiple vertices project to the same position, and their texture features would be identical. Therefore, in this embodiment, the texture information in the three-dimensional space is mapped only to the vertex with the largest z-axis value in the three-dimensional vertex matrix of the face. This ensures the rationality of the texture information in the subsequent steps and improves the generation quality of the image.
Illustratively, the acquisition of the first texture information can be described by the following formula:

$$K_j = \arg\max_k\left([0,0,1] \cdot v_k\right)$$

where $[0,0,1] \cdot v_k$ is the z-axis coordinate of vertex $v_k$; $\arg\max_k\left([0,0,1] \cdot v_k\right)$ selects the vertex with the largest z coordinate among the $k$ vertices sharing the same horizontal and vertical coordinates; and $K_j$ indicates that the $j$-th vertex has the largest z-axis coordinate among those $k$ vertices.
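The following NumPy sketch illustrates this max-z texture assignment. The vertex layout and pixel rounding are assumptions; the patent only specifies the argmax rule above.

```python
import numpy as np

def extract_texture(image_rgb, vertices):
    """image_rgb: H x W x 3 array; vertices: 3 x k matrix in image coordinates."""
    h, w = image_rgb.shape[:2]
    k = vertices.shape[1]
    xs = np.clip(np.round(vertices[0]).astype(int), 0, w - 1)
    ys = np.clip(np.round(vertices[1]).astype(int), 0, h - 1)
    zs = vertices[2]
    zbuf = np.full((h, w), -np.inf)   # largest z seen at each pixel so far
    winner = np.full((h, w), -1)      # index K_j of the max-z vertex at that pixel
    for j in range(k):
        if zs[j] > zbuf[ys[j], xs[j]]:
            zbuf[ys[j], xs[j]] = zs[j]
            winner[ys[j], xs[j]] = j
    texture = np.zeros((k, 3), dtype=image_rgb.dtype)
    js = winner[winner >= 0]                  # only visible (max-z) vertices
    texture[js] = image_rgb[ys[js], xs[js]]   # assign the 2D image colour to them
    return texture
```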
And S140, obtaining a front face image with an artifact and a side face image with the artifact through the three-dimensional vertex matrix, the first posture information and the first texture information.
Specifically, a front face image and a side face image are obtained after operations such as rotation, rendering and erosion are applied to the three-dimensional vertex matrix, the first posture information and the first texture information. Since the extracted texture features are the textures of the original side face image, the texture of the other side of the generated face is uncertain. Therefore, the front face image and the side face image obtained through rotation, rendering, erosion and similar operations contain artifacts.
S150, inputting the front face image with the artifact and the side face image with the artifact into a generation network to generate the front face image without the artifact and the side face image without the artifact.
Exemplarily, referring to fig. 2, fig. 2 shows a schematic diagram of a human face with artifacts, wherein 151 is an artifact.
And S160, inputting the original side face image and the side face image without the artifact into an identification network to obtain a loss function value.
Specifically, in this embodiment, the adversarial network is used to eliminate the artifacts in the front face image and the side face image; the original side face image and the artifact-free side face image output by the generation network are then input into the identification network, which judges the quality of the generated artifact-free side face image and thereby indirectly measures the quality of the generated artifact-free front face image.
Preferably, the identification network may be a multi-level discriminator based on pix2pixHD; the loss function values include the adversarial loss, the feature matching loss, and the perceptual loss.
Specifically, the adversarial loss is obtained by the following formula:

$$L_{GAN}(G,D) = \mathbb{E}_{I}\left[\log D(I_a)\right] + \mathbb{E}_{Rd}\left[\log\left(1 - D(G(F_a))\right)\right]$$

where $G$ denotes the generation network and $D$ the identification network; $D(I_a)$ is the predicted value for a real image, i.e., the original side face image; $\mathbb{E}_I$ denotes the distribution of real samples, i.e., of the original side face images; $D(G(F_a))$ is the predicted value for a sample $F_a$ passed through the generation network $G$; $\mathbb{E}_{Rd}$ denotes the distribution of generated samples, i.e., the classification probability of the artifact-free face image; and $L_{GAN}(G,D)$ is the adversarial loss.
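A minimal PyTorch sketch of this adversarial term, written in the equivalent binary cross-entropy form (an implementation choice, not specified by the patent):

```python
import torch
import torch.nn.functional as F

def d_adversarial_loss(d_real_logits, d_fake_logits):
    """BCE form of L_GAN: minimizing this maximizes E[log D(I_a)] + E[log(1 - D(G(F_a)))]."""
    real = F.binary_cross_entropy_with_logits(
        d_real_logits, torch.ones_like(d_real_logits))
    fake = F.binary_cross_entropy_with_logits(
        d_fake_logits, torch.zeros_like(d_fake_logits))
    return real + fake
```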
The feature matching loss is computed from the distances between the features of the original side face image and those of the generated artifact-free side face image extracted at each level of the discriminator, as follows:

$$L_{FM}(G,D) = \mathbb{E}\left[\sum_{i=1}^{N_D} \frac{1}{N_i} \left\| D^{(i)}(I_a) - D^{(i)}(G(F_a)) \right\|_1\right]$$

where $N_D$ is the number of layers in the identification network; $D^{(i)}(I_a)$ is the feature information extracted from the original side face image by the $i$-th layer of the identification network; $D^{(i)}(G(F_a))$ is the feature information extracted from the generated side face image by the $i$-th layer; $N_i$ is the number of elements in the $i$-th layer's features; and $L_{FM}(G,D)$ is the feature matching loss.
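A corresponding sketch of the feature matching term. The per-layer feature lists are assumed to come from a discriminator helper, which the patent does not specify:

```python
import torch

def feature_matching_loss(real_feats, fake_feats):
    """real_feats / fake_feats: lists of D^(i) feature maps, one per layer i = 1..N_D."""
    loss = 0.0
    for fr, ff in zip(real_feats, fake_feats):
        # L1 distance per layer, averaged over that layer's elements (the 1/N_i factor);
        # real features are detached so only the generator receives this gradient.
        loss = loss + torch.mean(torch.abs(fr.detach() - ff))
    return loss
```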
The perceptual loss is computed with a pre-trained VGG network: the features of the original side face image and of the generated artifact-free side face image are extracted by the VGG network, and the loss is computed from the features of the two images, as follows:

$$L_{VGG}(G) = \mathbb{E}\left[\sum_{i=1}^{N_V} \frac{1}{M_i} \left\| \phi^{(i)}(I_a) - \phi^{(i)}(G(F_a)) \right\|_1\right]$$

where $\phi^{(i)}$ denotes the features extracted by the $i$-th VGG layer, $N_V$ the number of VGG layers used, and $M_i$ the number of elements in the $i$-th layer's features. Like the feature matching loss, the perceptual loss compares layer-wise features of the original side face image and the generated artifact-free side face image; in this embodiment it is used to constrain the generated result and preserve identity.
In summary, the loss function value in this embodiment is obtained by the following formula:

$$L_{total} = L_{GAN} + \lambda_1 L_{FM} + \lambda_2 L_{VGG}$$

where $\lambda_1$ and $\lambda_2$ are preset weighting parameters that can be adjusted according to the actual scene.
This way of computing the loss function captures the difference between the generated image and the original image more accurately, so that the network updates are more reasonable and the training speed and efficiency are improved.
And S170, training a generating network and an identifying network by taking the minimum loss function value as a target to obtain a face generating model.
The identification network obtains a loss function value based on the difference between the original side face image and the artifact-free side face image; the generation network and the identification network are then updated backward by minimizing this loss function value. When the F1-score is large enough or the loss value stabilizes at a minimum, the generation network and the identification network are considered well trained; in other words, when the images generated by the generation network are realistic enough that the identification network can hardly distinguish them from real images, the face generation model of this embodiment is obtained.
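Putting the three terms together, one alternating update step could look like the following sketch. `d_adversarial_loss`, `feature_matching_loss` and the VGG loss are the functions sketched above; `D.features` is an assumed helper returning per-layer discriminator features, and the default weights are illustrative:

```python
import torch
import torch.nn.functional as F

def train_step(G, D, front_art, side_art, side_orig, opt_g, opt_d,
               vgg_loss, lambda1=10.0, lambda2=10.0):
    """One update of D then G; lambda1/lambda2 are the preset weights."""
    front_clean = G(front_art)  # artifact-free front face (used by the later illumination paths)
    side_clean = G(side_art)    # artifact-free side face fed to the discriminator

    # Discriminator step: real = original side face, fake = generated side face.
    opt_d.zero_grad()
    d_loss = d_adversarial_loss(D(side_orig), D(side_clean.detach()))
    d_loss.backward()
    opt_d.step()

    # Generator step: adversarial + feature matching + perceptual terms.
    opt_g.zero_grad()
    fake_logits = D(side_clean)
    adv = F.binary_cross_entropy_with_logits(
        fake_logits, torch.ones_like(fake_logits))
    fm = feature_matching_loss(D.features(side_orig), D.features(side_clean))
    g_loss = adv + lambda1 * fm + lambda2 * vgg_loss(side_orig, side_clean)
    g_loss.backward()
    opt_g.step()
    return d_loss.item(), g_loss.item()
```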
Based on the scheme, the method for establishing the face generation model solves the problem that the generation of the face image on the front side is limited by the scale and the range of a data source, and ensures the authenticity of the generated image by considering the posture information of people in the image when the image is generated; and the quality of the face on the front side is indirectly judged through the quality of the face on the side surface, so that the image generation effect is further improved.
Further, referring to fig. 3, fig. 3 shows a schematic flow chart of S140 in the method for establishing a face generation model according to the present invention, that is, the above S140 includes the following steps:
and S141, inputting the three-dimensional vertex matrix, the first posture information and the first texture information into a neural mesh renderer to obtain a rendered side face image.
Optionally, this embodiment completes the rendering of the image with a Neural Mesh Renderer model.
And S142, carrying out image corrosion on the rendered side face image, and extracting second texture information of the front face image subjected to image corrosion.
Specifically, in this embodiment the three-dimensional vertex matrix and the posture information are acquired through 3DDFA-V2, and the three-dimensional vertex coordinate matrix and the posture information are then mapped to the two-dimensional plane to acquire the texture information. When 3DDFA-V2 acquires the three-dimensional vertex matrix, incorrect vertices may be obtained, and the projections of these incorrect vertices may fall at abnormal positions in the image, so that the texture information of the original image is assigned to incorrect positions during rendering and rotation.
Therefore, in order to ensure the authenticity of the front face image and the side face image, this embodiment performs image erosion (erode) on the rendered image to remove the texture information at abnormal positions. After the wrong texture information is eliminated, features are extracted from the eroded image to obtain the texture information of the side face.
Optionally, the image erosion operation may be performed according to an average value of texture information of edges and all vertices of a human face obtained after the three-dimensional vertices are mapped to the two-dimensional space.
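A minimal OpenCV sketch of the erosion step. The kernel size is an illustrative choice, since the patent specifies the erosion criterion but not a kernel:

```python
import cv2
import numpy as np

def erode_rendered_face(rendered_bgr, kernel_size=5):
    """Shrink the rendered face region to discard texture projected onto wrong positions."""
    kernel = np.ones((kernel_size, kernel_size), np.uint8)
    return cv2.erode(rendered_bgr, kernel, iterations=1)
```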
S143, executing a preset first rotation operation on the first posture information to obtain second posture information;
optionally, the first pose information, that is, the pose matrix is multiplied by a rotation matrix, so that the pose of the person in the current direction is rotated to obtain the face pose in the frontal perspective.
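A sketch of this rotation, assuming the pose is a 3 × 4 [R | t] matrix (as produced by 3DDFA-V2) and that frontalization is a rotation about the vertical (yaw) axis; the axis and sign conventions are assumptions:

```python
import numpy as np

def rotate_pose(pose, yaw_rad):
    """pose: 3 x 4 matrix [R | t]; returns the pose turned by yaw_rad about the y-axis."""
    c, s = np.cos(yaw_rad), np.sin(yaw_rad)
    r_yaw = np.array([[ c, 0, s],
                      [ 0, 1, 0],
                      [-s, 0, c]])
    rotated = pose.copy()
    rotated[:, :3] = r_yaw @ pose[:, :3]   # compose the yaw with the rotation part
    return rotated
```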
And S144, inputting the three-dimensional vertex matrix, the second posture information and the second texture information into a neural mesh renderer to obtain a front face image with an artifact, and obtaining a side face image with the artifact through the front face image with the artifact.
Illustratively, the rendering of the front face image with artifacts can be described by the following formula:

$$Rd(x_j, y_j) = T(K_j), \quad j = 1, \dots, N$$

where $Rd(x_j, y_j)$ denotes the rendered value at each vertex position, $T$ denotes the texture information, $K_j$ denotes the vertex with the largest z-axis coordinate among the vertices sharing the same two-dimensional coordinates $(x_j, y_j)$, and $N$ is the number of such max-z vertices.
The rendering of the entire front image can be understood by the following formula:
Rd=Render({V,P,T})
wherein V represents a three-dimensional vertex matrix, P represents pose information, and T represents texture information.
Based on the rendering formula, the three-dimensional vertex matrix, the second texture information and the second posture information are input into the neural mesh renderer to obtain the front face image with artifacts; after the front face image with artifacts is obtained, it is further processed, for example by rotation and rendering, to obtain the side face image.
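A sketch of the Render({V, P, T}) call using the PyTorch neural_renderer port; applying the pose to the vertices before rasterization is an assumption about how P enters the renderer, and exact call signatures vary across forks:

```python
import torch
import neural_renderer as nr

def render_face(vertices, faces, textures, pose, image_size=256):
    """vertices: 1 x k x 3; faces: 1 x f x 3 (int); textures: 1 x f x t x t x t x 3."""
    r = torch.as_tensor(pose[:, :3], dtype=vertices.dtype)
    t = torch.as_tensor(pose[:, 3], dtype=vertices.dtype)
    posed = vertices @ r.T + t                 # apply the pose to the vertex matrix
    renderer = nr.Renderer(image_size=image_size, camera_mode='look_at')
    images, _, _ = renderer(posed, faces, textures)  # RGB, depth, silhouette
    return images                              # 1 x 3 x H x W rendered face image
```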
In a possible embodiment, referring to fig. 4, fig. 4 shows another flow chart of S140 in the method for building a face generation model according to the present invention, that is, after the step of obtaining the front face image with the artifact in S144, the step of obtaining the side face image with the artifact by using the front face image with the artifact includes:
S145, performing image erosion on the front face image with the artifact, and extracting third texture information of the front face image with the artifact subjected to the image erosion.
Specifically, rotation and rendering are performed when obtaining the front face with artifacts, and erroneous textures are inevitably assigned to erroneous vertices. Therefore, to further improve the quality of the generated image, image erosion is performed on the front face image with artifacts, and texture features, that is, the third texture information, are extracted from the eroded face image.
And S146, executing a preset second rotation operation on the second posture information to obtain third posture information.
Specifically, the second pose, i.e., the front pose, is rotated back to the original angle to obtain third pose information, i.e., pose information at the same angle as the original side face.
And S147, inputting the three-dimensional vertex matrix, the third posture information and the third texture information into a neural mesh renderer to obtain a side face image with an artifact.
Similarly, as in the case of obtaining a front face image with an artifact, the three-dimensional vertex matrix, the third texture information, and the third pose information are input to a neural mesh renderer, so as to obtain a side face image with an artifact.
Further, referring to fig. 5, fig. 5 shows a schematic flow chart of generating a front face without artifacts by using the front face with artifacts, that is, the above S150 may be performed by:
inputting a front face image with an artifact and a side face image with the artifact into a downsampling layer 152 and a feature extraction layer 153 which are sequentially connected in a generation network to obtain a first feature vector and a second feature vector;
inputting the first feature vector and the second feature vector into an upsampling layer 154 in a generation network, and generating a front face image without an artifact and a side face image without an artifact;
the down-sampling layer 152 at least comprises a 7 × 7 convolutional layer and four 3 × 3 convolutional layers which are connected in sequence, the feature extraction layer 153 at least comprises nine feature extraction blocks which are connected in sequence, and each feature extraction block at least comprises an attention layer, a 3 × 3 convolutional layer and a SPADE (spatial-Adaptive Normalization) layer; the upsampling layer 154 includes at least four 3 x 3 convolutional layers and one 7 x 7 convolutional layer connected in series.
Preferably, the front face image with artifacts and the side face image with artifacts are downsampled to extract image features and obtain the feature vectors, whose expressive power is then further improved by the nine residual blocks, namely the nine feature extraction blocks.
Preferably, a CBAM (Convolutional Block Attention Module) and a SPADE layer are introduced into each feature extraction block.
CBAM is an attention module that combines spatial and channel attention. In this embodiment, the CBAM mechanism is introduced into the feature extraction block: the feature vector, i.e., the input feature, is fed through the sequentially connected channel attention module and spatial attention module in CBAM, so that the network attends to features of important regions and suppresses features of unimportant regions along both the channel and spatial dimensions, thereby effectively emphasizing or compressing the extracted intermediate features.
SPADE is a modification of the BN (batch normalization) layer: the γ and β parameters of BN, instead of being obtained through network training, are computed from a semantic image. In this embodiment, SPADE is added to the feature extraction blocks of the generation network, so that the network pays more attention to the semantic information of the image during training, information loss is avoided, and the extracted feature information is richer.
The method comprises the steps of inputting a front face image with an artifact and a side face image with the artifact into a downsampling layer and a feature extraction layer in a generation network to obtain a first feature vector and a second feature vector, wherein the first feature vector corresponds to the front face image with the artifact, and the second feature vector corresponds to the side face image with the artifact. And inputting the first characteristic vector and the second characteristic vector into an upper sampling layer, so that the characteristic vectors are restored into corresponding images, and a front face image without an artifact and a side face image without the artifact are obtained.
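The following condensed PyTorch sketch mirrors the architecture described above: a 7 × 7 convolution plus four stride-2 3 × 3 convolutions down, nine feature extraction blocks (attention layer, 3 × 3 convolution, SPADE layer), then four 3 × 3 up-convolutions plus a 7 × 7 convolution. Channel widths, the use of transposed convolutions, the residual combination, and the simplified CBAM/SPADE stand-ins are all assumptions:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CBAMStub(nn.Module):
    """Simplified stand-in for CBAM (channel attention only, for brevity)."""
    def __init__(self, ch):
        super().__init__()
        self.gate = nn.Sequential(nn.AdaptiveAvgPool2d(1),
                                  nn.Conv2d(ch, ch, 1), nn.Sigmoid())

    def forward(self, x):
        return x * self.gate(x)

class SPADEStub(nn.Module):
    """Simplified stand-in for SPADE: gamma and beta computed from a semantic image."""
    def __init__(self, ch):
        super().__init__()
        self.norm = nn.InstanceNorm2d(ch, affine=False)
        self.gamma = nn.Conv2d(3, ch, 3, padding=1)
        self.beta = nn.Conv2d(3, ch, 3, padding=1)

    def forward(self, x, seg):
        seg = F.interpolate(seg, size=x.shape[2:], mode='nearest')
        return self.norm(x) * (1 + self.gamma(seg)) + self.beta(seg)

class FeatureBlock(nn.Module):
    """One of the nine blocks: attention layer, 3x3 conv, SPADE layer."""
    def __init__(self, ch):
        super().__init__()
        self.cbam = CBAMStub(ch)
        self.conv = nn.Conv2d(ch, ch, 3, padding=1)
        self.spade = SPADEStub(ch)

    def forward(self, x, seg):
        return x + self.spade(self.conv(self.cbam(x)), seg)  # residual combination

class Generator(nn.Module):
    def __init__(self, ch=64):
        super().__init__()
        downs = [nn.Conv2d(3, ch, 7, padding=3), nn.ReLU(True)]
        for _ in range(4):                      # four 3x3 downsampling convolutions
            downs += [nn.Conv2d(ch, ch * 2, 3, stride=2, padding=1), nn.ReLU(True)]
            ch *= 2
        self.down = nn.Sequential(*downs)
        self.blocks = nn.ModuleList(FeatureBlock(ch) for _ in range(9))
        ups = []
        for _ in range(4):                      # four 3x3 upsampling convolutions
            ups += [nn.ConvTranspose2d(ch, ch // 2, 3, stride=2,
                                       padding=1, output_padding=1), nn.ReLU(True)]
            ch //= 2
        ups += [nn.Conv2d(ch, 3, 7, padding=3), nn.Tanh()]
        self.up = nn.Sequential(*ups)

    def forward(self, x, seg=None):
        seg = x if seg is None else seg         # assumption: input face as semantic map
        h = self.down(x)
        for blk in self.blocks:
            h = blk(h, seg)
        return self.up(h)
```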
Preferably, referring to fig. 6, fig. 6 shows another flow chart of the method for building a face generation model according to the present invention, after generating a front face image without an artifact and a side face image without an artifact, before inputting the original side face image and the side face image without an artifact to an authentication network, i.e. between S150 and S160, the method further includes:
s170, acquiring first illumination data of the original side face image and second illumination data of the front face image without the artifact, and obtaining a first illumination loss value through the first illumination data and the second illumination data;
s171, optimizing the front face image without the artifact to obtain the first image with the objective of minimizing the first light loss value.
The illumination of the generated image relative to the original image is one of the main factors affecting image quality, yet it is difficult to learn a reliable, independent illumination representation from a face image: illumination is affected by varied lighting conditions, and those conditions are hard to quantify into labels. In this embodiment, the illumination constraint is therefore applied directly in image space, and an illumination maintaining path is added between the original side face image and the artifact-free front face image; that is, data such as the illumination intensity and brightness of the artifact-free front face image are adjusted directly, so that the artifact-free front face image and the original side face image maintain illumination consistency.
Specifically, illumination information of the two images, such as characteristic information about illumination intensity and brightness, is collected, and a loss value between the two images, namely the first illumination loss value, is calculated based on this illumination-related characteristic information. The illumination of the artifact-free front face image is then adjusted according to the first illumination loss value.
Alternatively, the first illumination loss value may be calculated by an L2 loss function.
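A sketch of such an L2 illumination loss. Using channel-wise mean brightness as the "illumination data" is an assumption, since the patent only names intensity/brightness-like features:

```python
import torch

def illumination_loss(side_orig, front_clean):
    """side_orig / front_clean: B x 3 x H x W image tensors."""
    light_a = side_orig.mean(dim=(2, 3))    # per-channel brightness of the original
    light_b = front_clean.mean(dim=(2, 3))  # per-channel brightness of the generated face
    return torch.mean((light_a - light_b) ** 2)
```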
S172, inputting the original side face image serving as an input image and the first image serving as a guide image into a guide filter, and updating the first image to obtain a second image;
s173, acquiring third illumination data of the second image, and obtaining a second illumination loss value through the first illumination data and the third illumination data;
s174, the second image is optimized again with the goal of minimizing the second illumination loss value.
However, because the posture of the person differs between the artifact-free front face image and the original side face image, the illumination intensities of the two images differ markedly; if the posture difference between the front and side images is too large, directly adjusting factors such as brightness and illumination intensity would make the details of the generated front face deviate greatly from the original face. Therefore, this embodiment also introduces an illumination keeping path: a guided filter is used to transfer the illumination of the image, so that the facial details of the front face image are optimized and the generated front face image is closer to the original side face image.
Specifically, in the guided filter, the filter radius is one quarter of the image resolution. The second illumination loss may be a pixel loss, a perceptual loss, or the like.
Since the guided filter has no trainable parameters, introducing it directly at the start of model training may cause the model to fall into local minima. Therefore, the guided filter is introduced only after a number of iterations, ensuring a stable and robust initialization of the model.
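A sketch of the guided-filter update using OpenCV's contrib module: the original side face is the input image, the illumination-adjusted first image is the guide, and the radius is a quarter of the image resolution as stated above. The eps value is illustrative:

```python
import cv2

def guided_update(side_orig, first_image, eps=1e-2):
    """Returns the second image; inputs are float images scaled to [0, 1]."""
    radius = side_orig.shape[0] // 4  # filter radius = resolution / 4
    # Requires opencv-contrib-python (the cv2.ximgproc module).
    return cv2.ximgproc.guidedFilter(guide=first_image, src=side_orig,
                                     radius=radius, eps=eps)
```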
Based on the illumination keeping path and the illumination adapting path, the generated front face image without the artifact is continuously optimized, so that the generated front face and the original side face keep illumination consistency, and the generated front face image is more real.
Based on the technical scheme in the embodiment 1, the problem that the generation of the front face image is limited by the scale and the range of a data source is solved, and the posture information of people in the image is also considered when the image is generated, so that the authenticity of the generated image is ensured; and the quality of the face on the front side is indirectly judged through the quality of the face on the side surface, so that the image generation effect is further improved.
Example 2
In this embodiment, referring to fig. 7, fig. 7 shows a schematic flow chart of the face generation method of the present invention, and the face generation method provided in this embodiment includes:
s210, acquiring a side face image with a single visual angle.
Optionally, a plurality of different side face images at different viewing angles are acquired, and one face corresponds to only one viewing angle.
S220, acquiring a three-dimensional vertex matrix and posture information of the face according to the side face image;
s230, mapping the three-dimensional vertex matrix and the posture information to a two-dimensional space to obtain texture information of the side face image;
s240, obtaining a front face image with artifacts through the three-dimensional vertex matrix, the attitude information and the texture information;
and S250, inputting the front face image into a face generation model to generate the front face image without the artifact.
Optionally, when the input image is a plurality of different side faces, the generated front face image corresponds to the faces in the side face images one to one.
The face generation model is obtained by the method for establishing the face generation model in the embodiment 1.
The face generation method in Embodiment 2 can generate the corresponding front face image from a side face image at a single viewing angle alone. This solves the problem that current front face image generation is limited by the scale and range of the data source; the posture information of the person in the image is also considered when generating the image, ensuring the authenticity of the generated image; and the front face is then generated by the trained face generation model, ensuring consistency between the input image and the generated image.
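Tying Embodiment 2 together, the inference path could be sketched as follows, reusing the helper sketches from Embodiment 1. All names are illustrative; `textures_fn` is a hypothetical conversion from per-vertex texture to the renderer's per-face format:

```python
import torch

def generate_front_face(side_face_img, tddfa, face_boxes, faces, textures_fn,
                        face_model, yaw_to_front):
    """End-to-end: single-view side face in, artifact-free front face out (S210-S250)."""
    v, p = extract_vertices_and_pose(side_face_img, tddfa, face_boxes)   # S220
    tex = extract_texture(side_face_img, v)                              # S230
    front_pose = rotate_pose(p, yaw_to_front)                            # S240
    verts = torch.as_tensor(v.T, dtype=torch.float32)[None]             # 1 x k x 3
    front_art = render_face(verts, faces, textures_fn(tex), front_pose) # S240
    with torch.no_grad():
        return face_model(front_art)                                     # S250
```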
Example 3
In this embodiment, referring to fig. 8, an apparatus 300 for building a face generation model is shown, which includes a sample obtaining module 310, a first extracting module 320, a second extracting module 330, an artifact image obtaining module 340, a generating module 350, a discriminating module 360, and a training module 370, wherein:
a sample acquiring module 310, configured to acquire an original side face image of a single viewing angle in a training set;
the first extraction module 320 is configured to obtain a three-dimensional vertex matrix and first pose information of a face according to an original side face image;
the second extraction module 330 is configured to map the three-dimensional vertex matrix and the first pose information to a two-dimensional space, so as to obtain first texture information;
the artifact image obtaining module 340 is configured to obtain a front face image with an artifact and a side face image with an artifact through the three-dimensional vertex matrix, the first pose information, and the first texture information;
a generating module 350, configured to input the front face image with the artifact and the side face image with the artifact into a generating network, and generate a front face image without the artifact and a side face image without the artifact;
the identification module 360 is used for inputting the original side face image and the side face image without the artifact into an identification network to obtain a loss function value;
and the training module 370 is used for training the generation network and the identification network to obtain a face generation model by taking the minimum loss function value as a target.
It should be understood that, in the technical solution of this embodiment, the above functional modules cooperate to implement the method for establishing the face generation model in embodiment 1, and the implementation and beneficial effects related to embodiment 1 are also applicable to this embodiment, and are not described herein again.
Example 4
In this embodiment, referring to fig. 9, a face generation apparatus 400 is shown, which includes an acquisition module 410, a first acquisition module 420, a second acquisition module 430, an artifact image generation module 440, and an artifact removal module 450, where:
an obtaining module 410, configured to obtain a side face image with a single viewing angle;
the first acquisition module 420 is configured to acquire a three-dimensional vertex matrix and posture information of a face according to a side face image;
the second acquisition module 430 is configured to map the three-dimensional vertex matrix and the posture information to a two-dimensional space, so as to obtain texture information of the side face image;
the artifact image generation module 440 is configured to obtain a front face image with an artifact according to the three-dimensional vertex matrix, the posture information, and the texture information;
the artifact removing module 450 is configured to input the front face image into a face generation model, and generate a front face image without an artifact, where the face generation model is obtained by the method for establishing the face generation model in embodiment 1.
It should be understood that, in the technical solution of this embodiment, the above functional modules cooperate to execute the face generation method of the above embodiment 2, and the implementation and beneficial effects related to the embodiment 2 are also applicable to this embodiment, and are not described herein again.
In this embodiment, the present invention further relates to a computer device, which includes a memory and a processor, wherein the memory is used for storing a computer program, and the processor runs the computer program so as to enable the computer device to execute the method of Embodiment 1 or Embodiment 2.
In the present embodiment, the present invention also relates to a readable storage medium, which stores a computer program, and when the computer program runs on a processor, the computer program executes the method of embodiment 1 or embodiment 2.
In the embodiments provided in the present application, it should be understood that the disclosed apparatus and method can be implemented in other ways. The apparatus embodiments described above are merely illustrative and, for example, the flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of apparatus, methods and computer program products according to various embodiments of the present invention. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
In addition, each functional module or unit in each embodiment of the present invention may be integrated together to form an independent part, or each module may exist separately, or two or more modules may be integrated to form an independent part.
The functions, if implemented in the form of software functional modules and sold or used as a stand-alone product, may be stored in a computer readable storage medium. Based on such understanding, the technical solution of the present invention or a part of the technical solution that contributes to the prior art in essence can be embodied in the form of a software product, which is stored in a storage medium and includes instructions for causing a computer device (which may be a smart phone, a personal computer, a server, or a network device, etc.) to execute all or part of the steps of the method according to the embodiments of the present invention. And the aforementioned storage medium includes: a U-disk, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk or an optical disk, and other various media capable of storing program codes.
The above description is only for the specific embodiments of the present invention, but the scope of the present invention is not limited thereto, and any person skilled in the art can easily conceive of the changes or substitutions within the technical scope of the present invention, and all the changes or substitutions should be covered within the scope of the present invention.

Claims (10)

1. A method for establishing a human face generation model is characterized by comprising the following steps:
acquiring an original side face image with a single visual angle in a training set;
acquiring a three-dimensional vertex matrix and first posture information of the face according to the original side face image;
mapping the three-dimensional vertex matrix and the first attitude information to a two-dimensional space to obtain first texture information;
obtaining a front face image with an artifact and a side face image with the artifact through the three-dimensional vertex matrix, the first posture information and the first texture information;
inputting the front face image with the artifact and the side face image with the artifact into a generation network to generate a front face image without the artifact and a side face image without the artifact;
inputting the original side face image and the side face image without the artifact into an identification network to obtain a loss function value;
and training the generating network and the identifying network to obtain a face generating model by taking the minimized loss function value as a target.
2. The method of claim 1, wherein obtaining a front face image with artifacts and a side face image with artifacts from the three-dimensional vertex matrix, the first pose information and the first texture information comprises:
inputting the three-dimensional vertex matrix, the first pose information and the first texture information into a neural mesh renderer to obtain a rendered side face image;
performing image erosion on the rendered side face image, and extracting second texture information from the eroded image;
performing a preset first rotation operation on the first pose information to obtain second pose information;
and inputting the three-dimensional vertex matrix, the second pose information and the second texture information into the neural mesh renderer to obtain a front face image with artifacts, and obtaining a side face image with artifacts from the front face image with artifacts.
3. The method of claim 2, wherein obtaining the side face image with artifacts from the front face image with artifacts comprises:
performing image erosion on the front face image with artifacts, and extracting third texture information from the eroded front face image;
performing a preset second rotation operation on the second pose information to obtain third pose information;
and inputting the three-dimensional vertex matrix, the third pose information and the third texture information into the neural mesh renderer to obtain a side face image with artifacts.
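Claims 2 and 3 together describe a render-erode-rotate cycle that manufactures the artifact-bearing training pair. The sketch below traces that data flow under stated assumptions: `render` stands in for the neural mesh renderer, `extract_texture` and `rotate_pose` are hypothetical helpers, and the 3 × 3 structuring element and ±30° rotation angles are arbitrary illustrative choices; only OpenCV's `cv2.erode` is a real library call.

```python
import cv2
import numpy as np

def make_artifact_images(vertices, pose, texture, render,
                         extract_texture, rotate_pose):
    """Illustrative data flow of claims 2-3 (all helpers assumed).

    render(vertices, pose, texture) -> HxWx3 uint8 image
    extract_texture(image, vertices, pose) -> texture map
    rotate_pose(pose, angle) -> rotated pose information
    """
    kernel = np.ones((3, 3), np.uint8)      # erosion structuring element

    # Claim 2: render the side view, erode it, re-extract texture.
    side_rendered = render(vertices, pose, texture)
    side_eroded = cv2.erode(side_rendered, kernel)
    texture2 = extract_texture(side_eroded, vertices, pose)

    # First rotation turns the pose toward the frontal view.
    pose2 = rotate_pose(pose, angle=30.0)   # angle is illustrative
    front_art = render(vertices, pose2, texture2)

    # Claim 3: erode the artifact-bearing front face, rotate back.
    front_eroded = cv2.erode(front_art, kernel)
    texture3 = extract_texture(front_eroded, vertices, pose2)
    pose3 = rotate_pose(pose2, angle=-30.0)
    side_art = render(vertices, pose3, texture3)

    return front_art, side_art
```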
4. The method of claim 1, wherein inputting the front face image with artifacts and the side face image with artifacts into a generation network to generate a front face image without artifacts and a side face image without artifacts comprises:
inputting the front face image with artifacts and the side face image with artifacts into a downsampling layer and a feature extraction layer that are connected in sequence in the generation network, to obtain a first feature vector and a second feature vector; wherein the downsampling layer comprises at least one 7 × 7 convolutional layer and four 3 × 3 convolutional layers connected in sequence, the feature extraction layer comprises nine feature extraction blocks connected in sequence, and each feature extraction block comprises at least an attention layer, a 3 × 3 convolutional layer and a SPADE layer;
and inputting the first feature vector and the second feature vector into an upsampling layer in the generation network to generate a front face image without artifacts and a side face image without artifacts; wherein the upsampling layer comprises at least four 3 × 3 convolutional layers and one 7 × 7 convolutional layer connected in sequence.
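Claim 4 pins down the generator topology. The following PyTorch sketch respects the stated layer counts: one 7 × 7 convolution plus four 3 × 3 convolutions on the way down, nine feature extraction blocks each containing an attention layer, a 3 × 3 convolution and a SPADE layer, and a mirrored upsampling path. Channel widths, activations, the multi-head form of the attention, and the use of the input image as the SPADE conditioning signal are assumptions the claim leaves open; the network is shown applied to one image at a time, with the same weights reused for the front and side inputs.

```python
import torch
import torch.nn as nn

class SPADE(nn.Module):
    """Minimal SPADE layer: normalizes x, then modulates it with a scale
    and bias predicted from a conditioning map (sketch only)."""
    def __init__(self, channels, cond_channels):
        super().__init__()
        self.norm = nn.InstanceNorm2d(channels, affine=False)
        self.shared = nn.Sequential(
            nn.Conv2d(cond_channels, 64, 3, padding=1), nn.ReLU())
        self.gamma = nn.Conv2d(64, channels, 3, padding=1)
        self.beta = nn.Conv2d(64, channels, 3, padding=1)

    def forward(self, x, cond):
        cond = nn.functional.interpolate(cond, size=x.shape[2:], mode="nearest")
        h = self.shared(cond)
        return self.norm(x) * (1 + self.gamma(h)) + self.beta(h)

class FeatureBlock(nn.Module):
    """Attention + 3x3 conv + SPADE, as enumerated in claim 4."""
    def __init__(self, channels):
        super().__init__()
        self.attn = nn.MultiheadAttention(channels, num_heads=4, batch_first=True)
        self.conv = nn.Conv2d(channels, channels, 3, padding=1)
        self.spade = SPADE(channels, cond_channels=3)

    def forward(self, x, cond):
        b, c, h, w = x.shape
        seq = x.flatten(2).transpose(1, 2)          # (B, H*W, C)
        attn_out, _ = self.attn(seq, seq, seq)
        x = x + attn_out.transpose(1, 2).reshape(b, c, h, w)
        return self.spade(self.conv(x), cond)

class Generator(nn.Module):
    def __init__(self):
        super().__init__()
        # Downsampling: one 7x7 conv followed by four stride-2 3x3 convs.
        chans = [64, 128, 256, 512, 512]
        down = [nn.Conv2d(3, 64, 7, padding=3), nn.ReLU()]
        for cin, cout in zip(chans[:-1], chans[1:]):
            down += [nn.Conv2d(cin, cout, 3, stride=2, padding=1), nn.ReLU()]
        self.down = nn.Sequential(*down)
        # Nine feature extraction blocks connected in sequence.
        self.blocks = nn.ModuleList([FeatureBlock(512) for _ in range(9)])
        # Upsampling: four 3x3 convs (each after 2x upsampling), then 7x7.
        up = []
        for cin, cout in zip(chans[::-1][:-1], chans[::-1][1:]):
            up += [nn.Upsample(scale_factor=2),
                   nn.Conv2d(cin, cout, 3, padding=1), nn.ReLU()]
        up += [nn.Conv2d(64, 3, 7, padding=3), nn.Tanh()]
        self.up = nn.Sequential(*up)

    def forward(self, img):
        feat = self.down(img)
        for blk in self.blocks:
            feat = blk(feat, img)   # input image as SPADE condition (assumed)
        return self.up(feat)
```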
5. The method of claim 1, wherein after generating the front face image without artifacts and the side face image without artifacts, and before inputting the original side face image and the side face image without artifacts into the discrimination network, the method further comprises:
acquiring first illumination data of the original side face image and second illumination data of the front face image without artifacts, and obtaining a first illumination loss value from the first illumination data and the second illumination data;
optimizing the front face image without artifacts with the goal of minimizing the first illumination loss value, to obtain a first image;
inputting the original side face image as an input image and the first image as a guide image into a guided filter, and updating the first image to obtain a second image;
acquiring third illumination data of the second image, and obtaining a second illumination loss value from the first illumination data and the third illumination data;
and optimizing the second image again with the goal of minimizing the second illumination loss value.
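Claim 5's refinement alternates an illumination-consistency loss with guided filtering. In the sketch below, "illumination data" is approximated by a heavily blurred luminance channel, which is purely an assumption, and the guided filter is OpenCV's `cv2.ximgproc.guidedFilter` from opencv-contrib; the two optimization steps are left as placeholders since the claim does not specify an optimizer.

```python
import cv2
import numpy as np

def illumination(img_bgr, ksize=31):
    """Crude illumination estimate: heavily blurred luminance channel.
    (The claim does not define 'illumination data'; this is a stand-in.)"""
    gray = cv2.cvtColor(img_bgr, cv2.COLOR_BGR2GRAY).astype(np.float32)
    return cv2.GaussianBlur(gray, (ksize, ksize), 0)

def illumination_loss(img_a, img_b):
    """Mean squared difference between two illumination estimates."""
    return float(np.mean((illumination(img_a) - illumination(img_b)) ** 2))

def refine(side_orig, front_clean, radius=8, eps=1e-2):
    """Claim 5 data flow: loss -> guided filter -> loss (sketch only)."""
    loss1 = illumination_loss(side_orig, front_clean)
    # ... minimize loss1 over the generated image to obtain first_image ...
    first_image = front_clean   # placeholder for the optimized result

    # Guided filtering: original side face as input, first image as guide.
    # Requires opencv-contrib-python for the cv2.ximgproc module.
    second_image = cv2.ximgproc.guidedFilter(
        guide=first_image, src=side_orig, radius=radius, eps=eps)

    loss2 = illumination_loss(side_orig, second_image)
    # ... minimize loss2 to obtain the final optimized image ...
    return second_image, loss1, loss2
```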
6. A face generation method, characterized by comprising:
acquiring a single-view side face image;
acquiring a three-dimensional vertex matrix and pose information of the face from the side face image;
mapping the three-dimensional vertex matrix and the pose information to a two-dimensional space to obtain texture information of the side face image;
obtaining a front face image with artifacts from the three-dimensional vertex matrix, the pose information and the texture information;
and inputting the front face image into a face generation model to generate a front face image without artifacts, wherein the face generation model is obtained by the method for establishing a face generation model according to any one of claims 1 to 5.
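At inference time (claim 6), the pipeline is the training front end followed by a single generator pass. A compact sketch, with all fitting and rendering helpers assumed rather than specified, and tensor conversion/normalization omitted:

```python
import torch

def generate_front_face(side_img, model, fit_3d_face, extract_texture,
                        rotate_pose, render):
    """Claim 6 data flow (illustrative; every helper is an assumption)."""
    vertices, pose = fit_3d_face(side_img)        # 3D vertex matrix + pose
    texture = extract_texture(side_img, vertices, pose)
    front_art = render(vertices, rotate_pose(pose, angle=30.0), texture)

    with torch.no_grad():                         # single generator pass
        front_clean = model(front_art)
    return front_clean
```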
7. An apparatus for establishing a face generation model, characterized by comprising:
a sample acquisition module, configured to acquire an original single-view side face image from a training set;
a first extraction module, configured to acquire a three-dimensional vertex matrix and first pose information of the face from the original side face image;
a second extraction module, configured to map the three-dimensional vertex matrix and the first pose information to a two-dimensional space to obtain first texture information;
an artifact image acquisition module, configured to obtain a front face image with artifacts and a side face image with artifacts from the three-dimensional vertex matrix, the first pose information and the first texture information;
a generation module, configured to input the front face image with artifacts and the side face image with artifacts into a generation network to generate a front face image without artifacts and a side face image without artifacts;
a discrimination module, configured to input the original side face image and the side face image without artifacts into a discrimination network to obtain a loss function value;
and a training module, configured to train the generation network and the discrimination network with the goal of minimizing the loss function value, to obtain a face generation model.
8. A face generation apparatus, characterized by comprising:
an acquisition module, configured to acquire a single-view side face image;
a first acquisition module, configured to acquire a three-dimensional vertex matrix and pose information of the face from the side face image;
a second acquisition module, configured to map the three-dimensional vertex matrix and the pose information to a two-dimensional space to obtain texture information of the side face image;
an artifact image generation module, configured to obtain a front face image with artifacts from the three-dimensional vertex matrix, the pose information and the texture information;
and a second image module, configured to input the front face image into a face generation model to generate a front face image without artifacts, wherein the face generation model is obtained by the method according to any one of claims 1 to 5.
9. A computer device, comprising a memory and a processor, the memory storing a computer program which, when run on the processor, performs the method for establishing a face generation model according to any one of claims 1 to 5 or the face generation method according to claim 6.
10. A computer-readable storage medium having stored thereon a computer program which, when run on a processor, performs the method for establishing a face generation model according to any one of claims 1 to 5 or the face generation method according to claim 6.
CN202011604398.1A 2020-12-30 2020-12-30 Face generation model building method and face generation method Active CN112613460B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011604398.1A CN112613460B (en) 2020-12-30 2020-12-30 Face generation model building method and face generation method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011604398.1A CN112613460B (en) 2020-12-30 2020-12-30 Face generation model building method and face generation method

Publications (2)

Publication Number Publication Date
CN112613460A true CN112613460A (en) 2021-04-06
CN112613460B CN112613460B (en) 2024-08-23

Family

ID=75249211

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011604398.1A Active CN112613460B (en) 2020-12-30 2020-12-30 Face generation model building method and face generation method

Country Status (1)

Country Link
CN (1) CN112613460B (en)

Patent Citations (16)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2007299070A (en) * 2006-04-27 2007-11-15 Toshiba Corp Face shape model generation apparatus and method
US20160267339A1 (en) * 2015-03-11 2016-09-15 Canon Kabushiki Kaisha Image processing apparatus and method of generating face image
KR20170000748A (en) * 2015-06-24 2017-01-03 삼성전자주식회사 Method and apparatus for face recognition
US20190287301A1 (en) * 2017-06-27 2019-09-19 Mad Street Den, Inc. Systems and Methods for Synthesizing Images of Apparel Ensembles on Models
US20190066369A1 (en) * 2017-08-31 2019-02-28 Ulsee Inc. Method and System for Quickly Generating a Number of Face Images Under Complex Illumination
CN108537743A (en) * 2018-03-13 2018-09-14 杭州电子科技大学 A kind of face-image Enhancement Method based on generation confrontation network
WO2020216033A1 (en) * 2019-04-26 2020-10-29 腾讯科技(深圳)有限公司 Data processing method and device for facial image generation, and medium
WO2020242718A1 (en) * 2019-05-24 2020-12-03 Colbert Marcus C Systems and methods for synthesizing images of apparel ensembles on models
CN110751098A (en) * 2019-10-22 2020-02-04 中山大学 Face recognition method for generating confrontation network based on illumination and posture
CN110889370A (en) * 2019-11-26 2020-03-17 上海大学 System and method for generating end-to-end side face synthesis front face of countermeasure network based on conditions
CN111062340A (en) * 2019-12-20 2020-04-24 湖南师范大学 Abnormal gait behavior identification method based on virtual posture sample synthesis
CN111340943A (en) * 2020-02-26 2020-06-26 北京市商汤科技开发有限公司 Image processing method, device, equipment and storage medium
CN111652827A (en) * 2020-04-24 2020-09-11 山东大学 Front face synthesis method and system based on generation countermeasure network
CN111680566A (en) * 2020-05-11 2020-09-18 东南大学 Hand sample face recognition method based on sliding block generation countermeasure network
CN111652828A (en) * 2020-05-27 2020-09-11 北京百度网讯科技有限公司 Face image generation method, device, equipment and medium
CN111710036A (en) * 2020-07-16 2020-09-25 腾讯科技(深圳)有限公司 Method, device and equipment for constructing three-dimensional face model and storage medium

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113486944A (en) * 2021-07-01 2021-10-08 深圳市英威诺科技有限公司 Face fusion method, device, equipment and storage medium
CN113807251A (en) * 2021-09-17 2021-12-17 哈尔滨理工大学 Sight estimation method based on appearance
CN114266693A (en) * 2021-12-16 2022-04-01 阿里巴巴(中国)有限公司 Image processing method, model generation method and equipment

Also Published As

Publication number Publication date
CN112613460B (en) 2024-08-23

Similar Documents

Publication Publication Date Title
US11900628B2 (en) Stereo matching method and apparatus, image processing apparatus, and training method therefor
CN109859296B (en) Training method of SMPL parameter prediction model, server and storage medium
CN112613460B (en) Face generation model building method and face generation method
CN111488865B (en) Image optimization method and device, computer storage medium and electronic equipment
EP4207079A1 (en) Parameter estimation model training method and apparatus, and device and storage medium
US8385630B2 (en) System and method of processing stereo images
EP3489898A1 (en) Method and apparatus for estimating disparity
CN107369131B (en) Conspicuousness detection method, device, storage medium and the processor of image
CN110909693A (en) 3D face living body detection method and device, computer equipment and storage medium
EP3905194A1 (en) Pose estimation method and apparatus
EP3352137A1 (en) Method and apparatus for processing a 3d scene
WO2021220688A1 (en) Reinforcement learning model for labeling spatial relationships between images
Sormann et al. Bp-mvsnet: Belief-propagation-layers for multi-view-stereo
CN112329662B (en) Multi-view saliency estimation method based on unsupervised learning
EP3945497A1 (en) Method and apparatus with image depth estimation
CN114638767A (en) Laparoscope image smoke removal method based on generation of countermeasure network
CN115239861A (en) Face data enhancement method and device, computer equipment and storage medium
CN114266894A (en) Image segmentation method and device, electronic equipment and storage medium
Rara et al. Model-based 3D shape recovery from single images of unknown pose and illumination using a small number of feature points
Hirner et al. FC-DCNN: A densely connected neural network for stereo estimation
CN112184731A (en) Multi-view stereo depth estimation method based on antagonism training
CN117953151A (en) Sparse reconstruction method and device based on three-dimensional scene
CN111723688B (en) Human body action recognition result evaluation method and device and electronic equipment
CN110514140B (en) Three-dimensional imaging method, device, equipment and storage medium
CN113034675B (en) Scene model construction method, intelligent terminal and computer readable storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant