CN111652049A - Face image processing model training method and device, electronic equipment and storage medium - Google Patents


Info

Publication number
CN111652049A
CN111652049A (application CN202010308040.8A)
Authority
CN
China
Prior art keywords
face image
sub
feature vector
age
image processing
Prior art date
Legal status
Pending
Application number
CN202010308040.8A
Other languages
Chinese (zh)
Inventor
柴振华
赖申其
李佩佩
赫然
Current Assignee
Beijing Sankuai Online Technology Co Ltd
Original Assignee
Beijing Sankuai Online Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Beijing Sankuai Online Technology Co Ltd
Priority to CN202010308040.8A
Publication of CN111652049A
Legal status: Pending


Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00: Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10: Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16: Human faces, e.g. facial parts, sketches or expressions
    • G06V40/168: Feature extraction; Face representation
    • G06V40/172: Classification, e.g. identification

Abstract

The application discloses a face image processing model training method, a face image processing model training device, electronic equipment and a storage medium, wherein the method comprises the following steps: encoding a first face image by using an encoder of the face image processing model to obtain a first feature vector of the first face image; splitting the first feature vector into a plurality of sub-feature vectors, the sub-feature vectors including at least an identity sub-feature vector and an age sub-feature vector; determining a vector to be decoded according to each sub-feature vector, and decoding the vector to be decoded by using a decoder of the face image processing model to obtain a second face image; and optimizing parameters of the face image processing model according to the regression loss values of the first face image and the second face image. The face image processing model obtained through training is less dependent on the training data distribution, is more robust to long-tail data with unbalanced age coverage, and can generate higher-quality face aging images.

Description

Face image processing model training method and device, electronic equipment and storage medium
Technical Field
The application relates to the technical field of image processing, in particular to a face image processing model training method and device, electronic equipment and a storage medium.
Background
Age editing of face images means automatically simulating, from an input face image and a corresponding target age, the face's appearance at that age. Related algorithms have many potential applications in entertainment and medical-beauty scenarios, such as predicting a user's future appearance, guessing what the user looked like as a child, searching for missing children, and identity verification and comparison. In addition, such a model can generate additional face data sets to assist the training of related intelligent face analysis tasks. At present, this field mainly adopts schemes such as generative adversarial networks and variational auto-encoding. Two relatively well-known methods are "Learning Face Age Progression: A Pyramid Architecture of GANs", presented at the 2018 CVPR (IEEE Conference on Computer Vision and Pattern Recognition), and "Global and Local Consistent Age Generative Adversarial Networks", presented at the 2018 International Conference on Pattern Recognition (ICPR). The former adopts a pyramid structure and the latter discriminates global features and local features separately, improving the rendering of image details.
The inventor found that models trained with existing face image age editing algorithms are unsatisfactory: the generated face images are of poor quality at certain ages. Although face image age editing algorithms based on generative adversarial networks (GANs) have made great progress, they cannot synthesize face images for ages that are scarce in the training data.
Disclosure of Invention
In view of the above, the present application is proposed to provide a face image processing model training method, apparatus, electronic device and storage medium that overcome or at least partially solve the above problems.
According to a first aspect of the present application, there is provided a face image processing model training method, including:
encoding a first face image by using an encoder of the face image processing model to obtain a first feature vector of the first face image;
splitting the first feature vector into a plurality of sub-feature vectors, the sub-feature vectors including at least an identity sub-feature vector and an age sub-feature vector;
determining a vector to be decoded according to each sub-feature vector, and decoding the vector to be decoded by using a decoder of the face image processing model to obtain a second face image;
and optimizing parameters of the face image processing model according to the regression loss values of the first face image and the second face image.
Optionally, the determining a vector to be decoded according to each sub-feature vector includes:
acquiring an identity reference characteristic vector of the first face image by using a face recognition model;
and fitting the identity sub-feature vector according to the identity reference feature vector, and taking the fitted identity sub-feature vector as the vector to be decoded.
Optionally, the determining a vector to be decoded according to each sub-feature vector includes:
and performing knowledge distillation on the encoder according to the identity reference feature vector and the identity sub-feature vector.
Optionally, the determining a vector to be decoded according to each sub-feature vector includes:
fitting the age sub-feature vector with a Gaussian distribution model, and taking the age sub-feature vector subjected to fitting as the vector to be decoded;
and carrying out Gaussian prior constraint on the encoder according to the age sub-feature vector before fitting processing and the age sub-feature vector after fitting processing.
Optionally, the decoding, by the decoder using the face image processing model, the vector to be decoded to obtain a second face image includes:
carrying out batch normalization processing on the vector to be decoded determined according to the identity sub-feature vector to obtain a low-level feature vector, and inputting the low-level feature vector into the decoder to obtain a first feature input result;
and carrying out batch normalization processing on the vector to be decoded determined according to the age sub-feature vector to obtain a high-level feature vector, and inputting the high-level feature vector into the decoder based on the first feature input result to obtain the second face image.
Optionally, the sub-feature vector further comprises a supplementary sub-feature vector, the method further comprising:
inputting the supplementary sub-feature vector into the decoder after the low-level feature vector is input into the decoder to obtain the first feature input result and before the high-level feature vector is input into the decoder.
Optionally, the method further comprises:
for the second face image, carrying out face image processing for N times by using the face image processing model to obtain corresponding N groups of sub-feature vectors, wherein N is a positive integer;
and calculating a repeated training loss value according to the similarity of the N groups of sub-feature vectors, and optimizing the parameters of the face image processing model according to the repeated training loss value.
Optionally, the sub-feature vector comprises a supplementary sub-feature vector, the method further comprising:
and inputting the supplementary sub-feature vectors in the N groups of sub-feature vectors into a discriminator so as to optimize the parameters of the face image processing model according to the output result of the discriminator.
According to a second aspect of the present application, there is provided a method for generating a face image, including:
acquiring an original face image;
generating a sub-feature vector set of the original face image by using a face image processing model, wherein the face image processing model is obtained by training based on the face image processing model training method;
acquiring an age characteristic vector of a target age, and replacing an age sub-characteristic vector in the sub-characteristic vector set with the age characteristic vector to obtain a replaced sub-characteristic vector set;
and generating a face age change image corresponding to the original face image by using a face image processing model and the replaced sub-feature vector set.
According to a third aspect of the present application, there is provided a face age estimation method, including:
acquiring a face image;
inputting the face image into an encoder of a face image processing model to obtain an age sub-feature vector of the face image, wherein the face image processing model is obtained by training based on the face image processing model training method;
and determining an age estimation result according to the age sub-feature vector.
According to a fourth aspect of the present application, there is provided a face image processing model training apparatus, comprising:
the encoding unit is used for encoding a first face image by using an encoder of the face image processing model to obtain a first feature vector of the first face image;
a splitting unit, configured to split the first feature vector into a plurality of sub-feature vectors, where the sub-feature vectors at least include an identity sub-feature vector and an age sub-feature vector;
the decoding unit is used for determining a vector to be decoded according to each sub-feature vector and decoding the vector to be decoded by using a decoder of the face image processing model to obtain a second face image;
and the first optimization unit is used for optimizing the parameters of the face image processing model according to the regression loss values of the first face image and the second face image.
Optionally, the decoding unit is further configured to:
acquiring an identity reference characteristic vector of the first face image by using a face recognition model;
and fitting the identity sub-feature vector according to the identity reference feature vector, and taking the fitted identity sub-feature vector as the vector to be decoded.
Optionally, the decoding unit is further configured to:
and performing knowledge distillation on the encoder according to the identity reference feature vector and the identity sub-feature vector.
Optionally, the decoding unit is further configured to:
fitting the age sub-feature vector with a Gaussian distribution model, and taking the age sub-feature vector subjected to fitting as the vector to be decoded;
and carrying out Gaussian prior constraint on the encoder according to the age sub-feature vector before fitting processing and the age sub-feature vector after fitting processing.
Optionally, the decoding unit is further configured to:
carrying out batch normalization processing on the vector to be decoded determined according to the identity sub-feature vector to obtain a low-level feature vector, and inputting the low-level feature vector into the decoder to obtain a first feature input result;
and carrying out batch normalization processing on the vector to be decoded determined according to the age sub-feature vector to obtain a high-level feature vector, and inputting the high-level feature vector into the decoder based on the first feature input result to obtain the second face image.
Optionally, the decoding unit is further configured to:
inputting the supplementary sub-feature vector into the decoder after the low-level feature vector is input into the decoder to obtain the first feature input result and before the high-level feature vector is input into the decoder.
Optionally, the apparatus further comprises:
the processing unit is used for carrying out face image processing on the second face image for N times by using the face image processing model to obtain corresponding N groups of sub-feature vectors, wherein N is a positive integer;
and the second optimization unit is used for calculating a repeated training loss value according to the similarity of the N groups of sub-feature vectors and optimizing the parameters of the face image processing model according to the repeated training loss value.
Optionally, the sub-feature vector comprises a supplementary sub-feature vector, the apparatus further comprising:
and the third optimization unit is used for inputting the supplementary sub-feature vectors in the N groups of sub-feature vectors into a discriminator so as to optimize the parameters of the face image processing model according to the output result of the discriminator.
According to a fifth aspect of the present application, there is provided a face image generation apparatus, including:
the first acquisition unit is used for acquiring an original face image;
the first generating unit is used for generating a sub-feature vector set of the original face image by using a face image processing model, wherein the face image processing model is obtained by training based on the face image processing model training device;
the replacing unit is used for obtaining an age characteristic vector of a target age, and replacing the age sub-characteristic vector in the sub-characteristic vector set with the age characteristic vector to obtain a replaced sub-characteristic vector set;
and the second generation unit is used for generating a face age change image corresponding to the original face image by using the face image processing model and the replaced sub-feature vector set.
According to a sixth aspect of the present application, there is provided a face age estimation device, comprising:
the second acquisition unit is used for acquiring a face image;
the input unit is used for inputting the face image into an encoder of a face image processing model to obtain an age sub-feature vector of the face image, wherein the face image processing model is obtained by training based on the face image processing model training device;
and the determining unit is used for determining an age estimation result according to the age sub-feature vector.
According to a seventh aspect of the present application, there is provided an electronic apparatus comprising: a processor; and a memory arranged to store computer executable instructions that, when executed, cause the processor to perform a face image processing model training method as described above, or cause the processor to perform a face image generation method as described above, or cause the processor to perform a face age estimation method as described above.
According to an eighth aspect of the present application, there is provided a computer-readable storage medium, wherein the computer-readable storage medium stores one or more programs which, when executed by a processor, implement the face image processing model training method as described above, or implement the face image generation method as described above, or implement the face age estimation method as described above.
According to the technical scheme, the encoder of the face image processing model is used for encoding the first face image to obtain the first feature vector of the first face image; the first feature vector is split into a plurality of sub-feature vectors, the sub-feature vectors including at least an identity sub-feature vector and an age sub-feature vector; a vector to be decoded is determined according to each sub-feature vector, and the vector to be decoded is decoded by the decoder of the face image processing model to obtain a second face image; and the parameters of the face image processing model are optimized according to the regression loss values of the first face image and the second face image. This solves the technical problem in the related art that models trained with existing face image age editing algorithms are unsatisfactory. The face image processing model obtained through training of the application is less dependent on the data distribution, is more robust to long-tail data with unbalanced age coverage, and can generate higher-quality face aging images.
The foregoing description is only an overview of the technical solutions of the present application, and the present application can be implemented according to the content of the description in order to make the technical means of the present application more clearly understood, and the following detailed description of the present application is given in order to make the above and other objects, features, and advantages of the present application more clearly understandable.
Drawings
Various other advantages and benefits will become apparent to those of ordinary skill in the art upon reading the following detailed description of the preferred embodiments. The drawings are only for purposes of illustrating the preferred embodiments and are not to be construed as limiting the application. Also, like reference numerals are used to refer to like parts throughout the drawings. In the drawings:
FIG. 1 is a flow diagram illustrating a method for training a face image processing model according to an embodiment of the present application;
FIG. 2 is a schematic diagram illustrating a training process of a face image processing model according to an embodiment of the present application;
FIG. 3 is a flow chart diagram illustrating a method for generating a face image according to an embodiment of the present application;
FIG. 4 is a schematic diagram illustrating the face aging effect of a face image according to an embodiment of the present application;
FIG. 5 illustrates an age exchange effect diagram of a face image according to one embodiment of the present application;
FIG. 6 is a diagram illustrating the effect of generating age images that are not present in a training set according to one embodiment of the present application;
FIG. 7 is a flow chart diagram illustrating a method for estimating age of a human face according to an embodiment of the present application;
FIG. 8 is a schematic structural diagram of a face image processing model training apparatus according to an embodiment of the present application;
fig. 9 is a schematic structural diagram of a face image generation apparatus according to an embodiment of the present application;
fig. 10 is a schematic structural diagram of a face age estimation device according to an embodiment of the present application;
FIG. 11 shows a schematic structural diagram of an electronic device according to an embodiment of the present application;
FIG. 12 shows a schematic structural diagram of a computer-readable storage medium according to an embodiment of the present application.
Detailed Description
Exemplary embodiments of the present application will be described in more detail below with reference to the accompanying drawings. While exemplary embodiments of the present application are shown in the drawings, it should be understood that the present application may be embodied in various forms and should not be limited to the embodiments set forth herein. Rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the scope of the disclosure to those skilled in the art.
Existing face image age editing algorithms depend on specific data sets, which usually follow a long-tail distribution: the number of images differs greatly across ages, so the trained model is unsatisfactory and the generated face images are of poor quality at certain ages. Although face image age editing algorithms based on generative adversarial networks are widely used, they likewise depend on the data distribution of the data set and therefore cannot synthesize face images for under-represented ages.
Based on this, an embodiment of the present application provides a method for training a face image processing model, as shown in fig. 1, the method includes the following steps S110 to S140:
and step S110, encoding the first face image by using the encoder of the face image processing model to obtain a first feature vector of the first face image.
A Variational Auto-Encoder (VAE) is a deep latent-space generative model. Its principle is to encode the original data into latent variables that obey a specific distribution, and then to restore an approximation of the original data's probability distribution from the generated latent-variable distribution. A VAE is an unsupervised model that can generate output data similar to its input, and is therefore widely used in the field of image generation. VAEs generate images by sampling from the distributions they have learned, allow potentially complex priors to be specified, and learn powerful latent representations.
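As a minimal sketch of the VAE encode-sample-decode cycle described above (not the patent's implementation): the single linear maps, dimensions, and function names below are placeholders standing in for the real encoder and decoder networks.

```python
import numpy as np

rng = np.random.default_rng(0)

# Placeholder "networks": single linear maps standing in for encoder/decoder.
D_IN, D_Z = 8, 4
W_mu = rng.normal(size=(D_IN, D_Z))
W_logvar = rng.normal(size=(D_IN, D_Z))
W_dec = rng.normal(size=(D_Z, D_IN))

def encode(x):
    """Map input to the mean and log-variance of a Gaussian latent."""
    return x @ W_mu, x @ W_logvar

def reparameterize(mu, logvar):
    """Sample z = mu + sigma * eps, keeping the sampling step differentiable."""
    eps = rng.normal(size=mu.shape)
    return mu + np.exp(0.5 * logvar) * eps

def decode(z):
    """Map the latent sample back to data space."""
    return z @ W_dec

x = rng.normal(size=(2, D_IN))   # a toy batch of two "images"
mu, logvar = encode(x)
z = reparameterize(mu, logvar)
x_rec = decode(z)
```

In a real VAE both maps are deep networks and training minimizes a reconstruction loss plus a prior constraint on the latent distribution.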
The face image processing model of the embodiment of the application is based on a model frame of a variational self-encoder and also comprises two connected networks: an encoder and a decoder. The purpose of the encoder is to generate a feature mapping Z by obtaining an input X, and in application scenes such as face recognition, image generation and the like, a face image to be processed can be used as the input of the encoder, so that a feature vector corresponding to the face image is obtained.
Step S120, splitting the first feature vector into a plurality of sub-feature vectors, where the sub-feature vectors at least include an identity sub-feature vector and an age sub-feature vector.
Splitting the obtained first feature vector is equivalent to a feature decoupling process. The essence of decoupling is to model the key factors that determine the form of the data, so that changing one key factor changes the data only along one feature while leaving the other features unaffected. For example, if a face can be successfully decoupled and represented, changing the corresponding key factor (which may be one dimension of a low-dimensional latent variable) changes the skin color of the face, while the hair style, facial features and other attributes remain unchanged.
In the scenario of age image generation, the key features that determine a face image mainly include identity features and age features, and the age feature vector is the quantity that varies. Besides making the generated age image as close as possible to a real image, what matters more is generating images of the same person at different ages while basic attribute features such as identity remain unchanged. Therefore, the feature vector extracted from the face image is decoupled into an identity sub-feature vector and an age sub-feature vector, which are trained and learned separately, so that the model fully learns both the identity representation and the age representation in the face image. Then, when the age feature vector of the face image is changed, face images of the same person at other ages can be obtained without changing the identity features.
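The split-and-swap idea can be illustrated with a toy numpy sketch; the 96/32 dimension split and the function names are illustrative assumptions, not taken from the patent.

```python
import numpy as np

# Hypothetical sizes: a 128-d feature split into identity (96) and age (32).
ID_DIM, AGE_DIM = 96, 32

def split_feature(feat):
    """Decouple the encoder output into identity and age sub-feature vectors."""
    return feat[:ID_DIM], feat[ID_DIM:ID_DIM + AGE_DIM]

def swap_age(feat, new_age):
    """Keep the identity part, replace the age part (the basis of age editing)."""
    identity, _ = split_feature(feat)
    return np.concatenate([identity, new_age])

feat = np.arange(ID_DIM + AGE_DIM, dtype=float)
identity, age = split_feature(feat)
edited = swap_age(feat, np.zeros(AGE_DIM))  # same person, new target age code
```

Swapping only the age slice leaves the identity slice untouched, which is exactly the invariance the decoupled training is meant to guarantee.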
Step S130, determining a vector to be decoded according to each sub-feature vector, and decoding the vector to be decoded by using a decoder of the face image processing model to obtain a second face image.
If the encoder is neither constrained nor trained, the feature vector it produces after encoding an image is usually not the feature vector we want, so the sub-feature vectors obtained preliminarily by the encoder can be processed in several ways to obtain a vector to be decoded that can be fed into the decoder. Specifically, after each decoupled sub-feature vector is obtained, the sub-feature vectors are constrained with prior data and the like, and the vector to be decoded is then determined. For example, a Gaussian distribution model can be used to fit the age feature vector so that the trained model can generate as many new images as possible, and knowledge distillation can be used to process the identity feature vector; this simplifies the model structure while still guaranteeing a good model effect.
And step S140, optimizing parameters of the facial image processing model according to the regression loss values of the first facial image and the second facial image.
The purpose of training the face image processing model is to make the newly generated image as close as possible to the real image, so that the model has strong image generation and generalization capability. Therefore, after the second face image is obtained, how close it is to the first face image can be judged in several ways. In the embodiment of the application, to ensure that the newly generated image is as close as possible to the real image, the regression loss value between the first face image and the second face image can be calculated with an L1 regression loss function, and the parameters of the face image processing model are optimized by reducing this loss as much as possible, yielding a better-performing face image processing model.
The L1 regression loss (L1 Loss), also called the Mean Absolute Error (MAE), measures the average distance between the predicted value f(x) and the true value y:

$$\mathrm{MAE} = \frac{1}{n}\sum_{i=1}^{n}\left|f(x_i) - y_i\right|$$

where n is the number of samples, f(x_i) is the model prediction (in this embodiment, the second face image), y_i is the ground truth (in this embodiment, the first face image), and i is an integer not less than 1.
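The MAE is straightforward to compute; a minimal numpy sketch on toy 2x2 "images" (the values are made up for illustration):

```python
import numpy as np

def l1_regression_loss(pred, target):
    """Mean absolute error: average per-element distance between two images."""
    return np.mean(np.abs(pred - target))

first = np.array([[0.0, 1.0], [2.0, 3.0]])   # "real" first face image
second = np.array([[0.5, 1.0], [2.0, 2.0]])  # "reconstructed" second face image
loss = l1_regression_loss(second, first)     # (0.5 + 0 + 0 + 1) / 4 = 0.375
```

Compared with the squared (L2) error, L1 penalizes large reconstruction errors less aggressively, which tends to give sharper image reconstructions.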
It should be noted that the encoder and the decoder in the embodiment of the application may adopt any model; for example, neural network models may be used as the encoder and the decoder. The input data is compressed into a code by one neural network, then decoded by another neural network into generated data similar to the original input, and the parameters of the encoder and decoder are trained by comparing the two and minimizing the difference between them.
In an embodiment of the present application, the determining a vector to be decoded according to each of the sub-feature vectors includes: acquiring an identity reference characteristic vector of the first face image by using a face recognition model; and fitting the identity sub-feature vector according to the identity reference feature vector, and taking the fitted identity sub-feature vector as the vector to be decoded.
The identity sub-feature vector preliminarily obtained by encoding the image usually carries no particular meaning, so in order to obtain identity feature information that uniquely represents the person in the face image, the identity sub-feature vector preliminarily output by the encoder can be further processed. Specifically, the first face image input to the encoder can also be fed into a trained face recognition network model, and the identity reference feature vector corresponding to the first face image is extracted by that network; here the face recognition model is a network model capable of recognizing and outputting the identity feature vector of a face image. The vector to be decoded is then determined from the difference between the identity reference feature vector output by the face recognition model and the identity sub-feature vector output by the encoder: for example, if the difference is smaller than a preset value, the identity sub-feature vector output by the encoder can be used as the vector to be decoded; if the difference is larger than the preset value, the identity reference feature vector output by the face recognition model is used as the vector to be decoded.
In an embodiment of the present application, the determining a vector to be decoded according to each of the sub-feature vectors includes: and performing knowledge distillation on the encoder according to the identity reference feature vector and the identity sub-feature vector.
To enable the encoder in the embodiment of the application to automatically output an identity feature vector that represents the person in the face image, and to improve training efficiency, the encoder can be trained via knowledge distillation. Knowledge Distillation (KD) refers to transferring the knowledge learned by a complex model or an ensemble of models (the teacher model) to another lightweight model (the student model), so that the model stays light and easy to deploy while losing as little performance as possible. In the embodiment of the application, the face recognition model acts as the teacher and the encoder as the student. So that the encoder learns the output of the face recognition model as closely as possible, a loss value is computed from the difference between the identity reference feature vector output by the face recognition model and the identity sub-feature vector output by the encoder, and the encoder is trained by continually reducing this loss until its identity-feature generation capability is essentially the same as that of the face recognition network.
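The patent does not specify the exact distillation loss; one common choice, shown here purely as an assumption, is the mean squared error between the teacher's and student's identity vectors:

```python
import numpy as np

def distillation_loss(student_id, teacher_id):
    """Penalize the encoder (student) for deviating from the face recognition
    model's (teacher) identity embedding. MSE is one common distillation loss;
    the patent does not fix the exact form."""
    return np.mean((student_id - teacher_id) ** 2)

teacher = np.array([1.0, 0.0, 1.0, 0.0])  # identity reference feature vector
student = np.array([0.5, 0.0, 1.0, 0.5])  # encoder's identity sub-feature vector
loss = distillation_loss(student, teacher)  # (0.25 + 0 + 0 + 0.25) / 4 = 0.125
```

Minimizing this term drives the encoder's identity sub-feature vector toward the teacher's embedding, so a separate face recognition network is no longer needed at inference time.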
In an embodiment of the present application, the determining a vector to be decoded according to each of the sub-feature vectors includes: fitting the age sub-feature vector with a Gaussian distribution model, and taking the age sub-feature vector subjected to fitting as the vector to be decoded; and carrying out Gaussian prior constraint on the encoder according to the age sub-feature vector before fitting processing and the age sub-feature vector after fitting processing.
The Gaussian distribution is also called the normal distribution. In a variational auto-encoder, a constraint term is added so that the data generated by the network obeys a Gaussian distribution; relevant data can then be sampled arbitrarily according to the mean and variance of that Gaussian distribution, and new samples are generated by the decoder.
In the face image generation scenario, the age sub-feature vector is the main variable for generating face images of different ages, so increasing the randomness of the age sub-feature vector helps generate face images of as many different ages as possible. Specifically, the age sub-feature vector preliminarily output by the encoder is fitted with a Gaussian distribution model: if its distribution already conforms to the Gaussian distribution, it may be used directly as the vector to be decoded; if not, it may be adjusted to conform to the Gaussian distribution, and the adjusted age sub-feature vector is used as the vector to be decoded. Similarly, to enable the encoder to automatically output an age sub-feature vector conforming to the Gaussian distribution and to increase the randomness of data sampling, a loss value can be calculated from the difference between the age sub-feature vector before and after the fitting processing, and the encoder trained by continuously reducing this loss, so that the age sub-feature vector output by the trained encoder satisfies the Gaussian prior constraint.
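In VAE-style models, the Gaussian prior constraint described above is commonly realized as a KL-divergence loss toward the standard normal, with the reparameterization trick used for sampling. The following sketch assumes the encoder outputs a mean and log-variance for the age sub-feature vector; this parameterization is an assumption, as the original does not specify the exact form:

```python
import numpy as np

def gaussian_prior_loss(mu, logvar):
    # KL( N(mu, sigma^2) || N(0, 1) ), summed over feature dimensions;
    # zero exactly when the distribution is already standard normal
    return float(-0.5 * np.sum(1.0 + logvar - mu ** 2 - np.exp(logvar)))

def sample_age_vector(mu, logvar, rng):
    # reparameterized sample z = mu + sigma * eps, with eps ~ N(0, I)
    eps = rng.standard_normal(mu.shape)
    return mu + np.exp(0.5 * logvar) * eps

mu = np.zeros(4)
logvar = np.zeros(4)
kl = gaussian_prior_loss(mu, logvar)              # 0: already standard normal
z_age = sample_age_vector(mu, logvar, np.random.default_rng(0))
```

Driving `kl` toward zero during training is what makes the trained encoder's age sub-feature vectors obey the Gaussian prior, so new ages can be generated by sampling.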
The object to which the Gaussian prior constraint is applied in the embodiment of the present application is mainly the age feature vector. In a face image generation scenario, the identity attribute of a person is relatively fixed and basic, whereas constraining the distribution of the age feature vector to a Gaussian distribution allows face images at as many different ages as possible to be generated.
In an embodiment of the present application, decoding the vector to be decoded by the decoder of the face image processing model to obtain a second face image includes: performing batch normalization processing on the vector to be decoded determined from the identity sub-feature vector to obtain a low-level feature vector, and inputting the low-level feature vector into the decoder to obtain a first feature input result; and performing batch normalization processing on the vector to be decoded determined from the age sub-feature vector to obtain a high-level feature vector, and inputting the high-level feature vector into the decoder based on the first feature input result to obtain the second face image.
After the vector to be decoded is obtained, the identity sub-feature vector and the age sub-feature vector in it can be input to the decoder in a certain order. Specifically, in a face image generation scenario, the identity sub-feature vector is a relatively basic feature: once the identity features of a person are determined, face images of different ages can be generated by changing the age features. The identity sub-feature vector can therefore be input as the decoder's low-level feature vector, with the age sub-feature vector input as the decoder's high-level feature vector on that basis.
In addition, in prior-art neural network training, normalization is generally applied only to the input-layer data, not in the intermediate layers. Note that even when the input data is normalized, its data distribution is likely to change after a series of matrix multiplications and nonlinear operations, and the change accumulates through the many layers of a deep network. Performing normalization inside the network can therefore improve its training. This method of normalizing the intermediate layers of a neural network to improve the training effect is Batch Normalization (BN). BN accelerates training and improves model training accuracy. In the embodiment of the present application, the age sub-feature vector and the identity sub-feature vector are relatively important feature vectors; to avoid the slowdown in model training (and the resulting poor training effect) caused by shifting data distributions after these vectors enter the network, batch normalization is applied to the identity sub-feature vector and the age sub-feature vector respectively before the vectors to be decoded are input to the decoder, improving the training accuracy of the face image processing model.
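The normalization step described above can be sketched as follows: each feature dimension is normalized over the batch to zero mean and unit variance. Real BN layers also carry learnable scale/shift parameters and running statistics for inference, which are omitted here for brevity; the batch values are illustrative:

```python
import numpy as np

def batch_norm(x, eps=1e-5):
    # normalize each feature dimension over the batch (axis 0);
    # eps guards against division by zero for constant features
    mean = x.mean(axis=0)
    var = x.var(axis=0)
    return (x - mean) / np.sqrt(var + eps)

batch = np.array([[1.0, 10.0],
                  [3.0, 30.0],
                  [5.0, 50.0]])   # e.g. identity sub-feature vectors of one batch
normed = batch_norm(batch)
```

After this step the vectors fed to the decoder have a stable distribution regardless of the scale of the raw sub-feature vectors.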
In one embodiment of the present application, the sub-feature vectors further comprise a supplementary sub-feature vector, and the method further comprises: inputting the supplementary sub-feature vector into the decoder after the low-level feature vector is input into the decoder to obtain the first feature input result, and before the high-level feature vector is input into the decoder.
In scenarios such as face image generation, attribute feature vectors other than the identity and age sub-feature vectors may also be involved, such as hair, expression, and image background; these other attribute feature vectors are treated as supplementary sub-feature vectors. By increasing the encoder's ability to encode the supplementary sub-feature vectors and the decoder's ability to decode them, the face image processing model's learning of facial detail features can be strengthened, so that face images with a higher degree of realism can be generated.
When determining the supplementary sub-feature vector that can be input to the decoder, a Gaussian prior constraint can be added to the encoder output, just as for the age sub-feature vector, so that the supplementary sub-feature vector output by the encoder conforms to a Gaussian distribution and the model can capture and learn more facial detail features. The supplementary sub-feature vector may be input to the decoder after the identity sub-feature vector and before the age sub-feature vector, or simultaneously with the identity sub-feature vector; its specific input order can be set flexibly according to the actual situation and is not enumerated here.
In one embodiment of the present application, the method further comprises: for the second face image, carrying out face image processing for N times by using the face image processing model to obtain corresponding N groups of sub-feature vectors, wherein N is a positive integer; and calculating a repeated training loss value according to the similarity of the N groups of sub-feature vectors, and optimizing the parameters of the face image processing model according to the repeated training loss value.
To further improve the performance of the model, the second face image generated by the decoder can be fed into the encoder again to obtain a new feature vector; the new feature vector is fed into the decoder as described above to generate a new image. Repeating this process N times yields N corresponding groups of sub-feature vectors. By constraining the identity sub-feature vectors and age sub-feature vectors obtained across the N passes to be the same, the model parameters are continuously optimized, so that the image generated by the face image processing model after each pass maintains high consistency. The specific number of iterations can be set flexibly by those skilled in the art according to the actual situation and is not specifically limited herein.
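The repeated-training constraint above can be sketched as a consistency loss over the N groups of sub-feature vectors. The original does not specify the similarity measure; penalizing mean squared deviation from the first pass is an illustrative choice:

```python
import numpy as np

def repeated_training_loss(id_vectors, age_vectors):
    # id_vectors / age_vectors: lists of the identity and age sub-feature
    # vectors produced by the N passes; penalize deviation from pass 1
    loss = 0.0
    for seq in (id_vectors, age_vectors):
        ref = seq[0]
        for v in seq[1:]:
            loss += float(np.mean((v - ref) ** 2))
    return loss

ids = [np.array([1.0, 2.0])] * 3                     # identical across 3 passes
ages = [np.array([0.5]), np.array([0.5]), np.array([0.6])]  # last pass drifted
loss = repeated_training_loss(ids, ages)
```

Driving this loss toward zero is what keeps the identity and age representations consistent across repeated encode-decode cycles.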
In one embodiment of the present application, the sub-feature vector comprises a supplementary sub-feature vector, the method further comprising: and inputting the supplementary sub-feature vectors in the N groups of sub-feature vectors into a discriminator so as to optimize the parameters of the human face image processing model according to the output result of the discriminator.
The discriminator here is the discriminator of a Generative Adversarial Network (GAN). A generative adversarial network consists of two basic neural networks, a generator network and a discriminator network: one generates content, and the other judges the generated content. The generator produces synthetic data from given noise (typically a uniform or normal distribution), and the discriminator distinguishes the generator's output from real data. The former tries to produce data closer to the real data; correspondingly, the latter tries to distinguish real data from generated data more perfectly.
In the repeated training of the face image generation model of the embodiment of the present application, a discriminator can be added after the decoder network so that the model captures and learns more detailed features of the face image and the realism of the generated images improves. For example, if the supplementary sub-feature vector obtained by the face image processing model is an expression sub-feature vector whose corresponding expression is a smile, while the expression of the person in the real image is sadness, the difference between the supplementary sub-feature vector obtained by the model and that of the real image can be judged by inputting the two feature vectors into the discriminator network. The parameters of the face image processing model are continuously optimized through the discriminator's output, so that the model can generate face images with a higher degree of realism.
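As a sketch of the adversarial objective implied above, the standard GAN losses can be written as follows. The discriminator output is assumed to be a probability of "real"; the specific loss form and the toy values are illustrative, since the original does not state which GAN objective is used:

```python
import numpy as np

def discriminator_loss(d_real, d_fake):
    # -log D(real) - log(1 - D(fake)), averaged over the batch:
    # the discriminator learns to score real supplementary vectors high
    # and generated ones low
    return float(-np.mean(np.log(d_real) + np.log(1.0 - d_fake)))

def generator_loss(d_fake):
    # non-saturating generator objective: -log D(fake);
    # the generator side learns to fool the discriminator
    return float(-np.mean(np.log(d_fake)))

d_loss = discriminator_loss(np.array([0.9]), np.array([0.1]))
g_loss = generator_loss(np.array([0.1]))
```

In the training loop, the discriminator and the face image processing model would be updated alternately with these two objectives.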
As shown in fig. 2, a schematic diagram of the training process of the face image processing model is provided. First, a first face image X is input to the encoder E for encoding to obtain the feature vector Z corresponding to X. The feature vector is split into several sub-feature vectors, including an identity sub-feature vector, an age sub-feature vector and a supplementary sub-feature vector, and the vector to be decoded is determined from these sub-feature vectors: the identity sub-feature vector Z_I in the vector to be decoded is obtained through knowledge distillation (KD), while the age sub-feature vector Z_A and the supplementary sub-feature vector Z_E are obtained by adding a Gaussian prior distribution (the standard normal distribution). The identity sub-feature vector Z_I and the age sub-feature vector Z_A are then each subjected to batch normalization (BN), and the vectors to be decoded are input to the decoder G in the order of identity sub-feature vector first and age sub-feature vector last. Finally, the decoder generates a second face image Xr, and the face image processing model is optimized by calculating the regression loss between the second face image Xr and the first face image X.
To further improve the generative capability of the model, the second face image Xr output by the decoder can be input to the encoder E to generate a new feature vector Z'; after processing, Z' is input to the decoder to generate a new face image Xs. This training process is repeated N times, and the identity sub-feature vectors and age sub-feature vectors generated by each pass are constrained to be the same. In addition, to make the generated images more realistic, the supplementary sub-feature vectors Z_E' and Z_E'' generated in each pass are separately input to the discriminator and compared with the real supplementary sub-feature vector Z_E, and the model parameters are continuously optimized so that the supplementary sub-feature vector output by the model each time is as close to the real data as possible, improving the quality of the generated images.
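The training pipeline above combines several objectives: L1 reconstruction, knowledge distillation on the identity vector, Gaussian priors on the age and supplementary vectors, the repeated-training consistency loss, and the adversarial loss. A weighted sum is one common way to combine them; both the weighted-sum form and the unit weights below are assumptions for illustration, as the original does not specify how the losses are combined:

```python
def total_training_loss(l1_recon, kd_loss, kl_age, kl_sup, cycle_loss, adv_loss,
                        weights=(1.0, 1.0, 1.0, 1.0, 1.0, 1.0)):
    # hypothetical weighted sum of the individual objectives described
    # in the training pipeline; weights would be tuned in practice
    terms = (l1_recon, kd_loss, kl_age, kl_sup, cycle_loss, adv_loss)
    return sum(w * t for w, t in zip(weights, terms))

loss = total_training_loss(0.5, 0.1, 0.2, 0.2, 0.05, 0.3)
```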
The present application further provides a method for generating a face image, as shown in fig. 3, the method includes the following steps S310 to S340:
step S310, an original face image is acquired.
The original face image is the prerequisite for generating a new face age image, so in the face age image generation scenario, the original face image of the user is acquired first.
Step S320, generating a sub-feature vector set of the original face image by using a face image processing model, wherein the face image processing model is obtained by training based on the face image processing model training method.
The face image processing model in the embodiment of the application is obtained by training based on the face image processing model training method. Specifically, a first face image is encoded by using an encoder of a face image processing model to obtain a first feature vector of the first face image; splitting the first feature vector into a plurality of sub-feature vectors, the sub-feature vectors including at least an identity sub-feature vector and an age sub-feature vector; determining a vector to be decoded according to each sub-feature vector, and decoding the vector to be decoded by using a decoder of the face image processing model to obtain a second face image; and optimizing parameters of the face image processing model according to the regression loss values of the first face image and the second face image.
The obtained original face image is input to the encoder of the face image processing model to extract feature vectors, finally yielding a sub-feature vector set corresponding to the original face image; this set may include an identity sub-feature vector, an age sub-feature vector, a supplementary sub-feature vector, and so on.
Step S330, acquiring an age characteristic vector of the target age, and replacing the age sub-characteristic vector in the sub-characteristic vector set with the age characteristic vector to obtain a replaced sub-characteristic vector set.
To generate a new face age image, the age feature vector corresponding to the target age the user wants can also be obtained. For example, if the original face image is of a 20-year-old face and the user wants to generate a 40-year-old face image, the target age of 40 is converted into its corresponding age feature vector, and the age sub-feature vector in the sub-feature vector set of the original face image is replaced with it; that is, the age feature vector corresponding to the target age is combined with the identity sub-feature vector and the supplementary sub-feature vector in the sub-feature vector set of the original face image to form a new sub-feature vector set.
Step S340, generating a face age change image corresponding to the original face image by using the face image processing model and the replaced sub-feature vector set.
The recombined sub-feature vector set is used as the input of the decoder of the face image processing model, and the decoder decodes it to obtain the face age change image corresponding to the original face image.
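The replacement step in S330-S340 can be sketched as follows: the age sub-feature vector extracted from the original image is swapped for the target-age feature vector, while the identity and supplementary vectors are kept. Representing the set as a dict keyed by sub-feature name is an illustrative choice, not the original's data layout:

```python
import numpy as np

def replace_age_vector(sub_features, target_age_vec):
    new_features = dict(sub_features)        # keep identity/supplementary parts
    new_features["age"] = target_age_vec     # swap in the target-age vector
    return new_features

original = {"identity": np.array([0.3, 0.7]),
            "age": np.array([0.2]),          # toy encoding of "20 years old"
            "supplementary": np.array([0.1, 0.9])}
swapped = replace_age_vector(original, np.array([0.4]))  # toy "40 years old"
# `swapped` would then be fed to the decoder to produce the aged face image
```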
As shown in fig. 4, a schematic diagram of the face aging effect on a face image is provided: by replacing the age feature vector obtained by the encoder, an input image can be transformed into face images of various ages, with good generation results across different ages and genders.
As shown in fig. 5, an age exchange effect diagram of face images is provided: the age sub-feature vector of image a is combined with the identity sub-feature vector and supplementary sub-feature vector of image b to obtain a new image of b at the age of a. The two images exchange their age sub-feature vectors, so that the ages of the resulting face images are swapped.
As shown in Table 1, giving the face verification results of the embodiment of the present application: face verification is performed between the images generated by the face image processing model, namely AG1 (ages 30-40), AG2 (ages 40-50) and AG3 (ages 50+), and the real image (Testing Face). The face verification results (Ours) obtained with the face image processing model of the embodiment all exceed 98%, demonstrating that the identity characteristics of the face are well preserved during image generation.
TABLE 1
[Table 1 is provided as an image in the original document]
The embodiment of the present application can also generate age images that do not exist in the training set. As shown in fig. 6, a schematic diagram of this effect is provided: the age range of the Morph data set is 16-77 years, yet the generation method of the embodiment can predict a 13-year-old face image. (The first row in FIG. 6 shows the input images; the second row shows the 13-year-old face images predicted by the model.)
The present application further provides a method for estimating a face age, as shown in fig. 7, the method includes the following steps S710 to S730:
step S710, a face image is acquired.
The face image is the prerequisite for estimating the face age, so in the face age estimation scenario, the face image of the user is acquired first.
Step S720, inputting the face image into an encoder of a face image processing model to obtain an age sub-feature vector of the face image, wherein the face image processing model is obtained by training based on the face image processing model training method.
And inputting the obtained face image into an encoder of a face image processing model, and encoding through the encoder to obtain an age sub-feature vector corresponding to the face image. The face image processing model in the embodiment of the application is obtained by training based on the face image processing model training method. Specifically, a first face image is encoded by using an encoder of a face image processing model to obtain a first feature vector of the first face image; splitting the first feature vector into a plurality of sub-feature vectors, the sub-feature vectors including at least an identity sub-feature vector and an age sub-feature vector; determining a vector to be decoded according to each sub-feature vector, and decoding the vector to be decoded by using a decoder of the face image processing model to obtain a second face image; and optimizing parameters of the face image processing model according to the regression loss values of the first face image and the second face image.
Step S730, determining an age estimation result according to the age sub-feature vector.
The age sub-feature vector obtained after encoding by the encoder is converted into an actual age, which is output as the final age estimation result.
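The original does not specify how the age sub-feature vector is converted into an actual age; a hypothetical sketch is to treat the vector as logits over evenly spaced age bins and take the expected age. The bin range 16-77 below matches the Morph data set mentioned in this document, but both the binning scheme and the function are illustrative assumptions:

```python
import numpy as np

def estimate_age(age_sub_vec, min_age=16.0, max_age=77.0):
    # hypothetical decoding: softmax over age bins, then expectation
    logits = age_sub_vec - age_sub_vec.max()          # numerical stability
    probs = np.exp(logits) / np.exp(logits).sum()
    ages = np.linspace(min_age, max_age, len(age_sub_vec))
    return float((probs * ages).sum())

age = estimate_age(np.zeros(8))   # uniform logits give the midpoint of the range
```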
As shown in Table 2, comparing the age estimation performance of the embodiment of the present application: the embodiment directly predicts the age corresponding to a face image using the encoder of the face image processing model. In Table 2, ThinAgeNet and SSR-Net are age estimation methods from papers at IJCAI 2018 (International Joint Conference on Artificial Intelligence), and M-V Loss is an age estimation method from a paper at CVPR 2018 (IEEE Conference on Computer Vision and Pattern Recognition). The evaluation index in Table 2 is the MAE (mean absolute error) on the Morph data set, that is, the average of the absolute differences between the predicted age and the actual age; it can be seen that the encoder obtained in the embodiment of the present application performs very well on the age estimation task.
TABLE 2
[Table 2 is provided as an image in the original document]
As shown in Table 3, giving the age estimation results for images generated by the embodiment of the present application: age estimation is performed on the AG1 (ages 30-40), AG2 (ages 40-50) and AG3 (ages 50+) data generated by the embodiment, using the age estimation tool provided by Face++, and the final age estimates are averaged. The face age estimation results (Ours) of the embodiment are found to be closer to the real data (Real Data), showing that the face age estimation method of the embodiment can generate images with more accurate ages.
TABLE 3
[Table 3 is provided as an image in the original document]
The present application further provides a facial image processing model training apparatus 800, as shown in fig. 8, the apparatus 800 includes an encoding unit 810, a splitting unit 820, a decoding unit 830, and a first optimizing unit 840:
the encoding unit 810 of the embodiment of the present application is configured to encode a first face image by using an encoder of the face image processing model, so as to obtain a first feature vector of the first face image.
A Variational Auto-Encoder (VAE) is a deep latent-space generative model. Its principle is to encode the original data into hidden variables that obey a specific distribution, and then restore the approximate probability distribution of the original data from the generated hidden-variable distribution. The VAE is an unsupervised model that can generate output data similar to its input, and is therefore widely used in the field of image generation. A VAE generates images from the distribution it has learned, allows potentially complex priors to be set, and learns powerful latent representations.
The face image processing model of the embodiment of the application is based on a model frame of a variational self-encoder and also comprises two connected networks: an encoder and a decoder. The purpose of the encoder is to generate a feature mapping Z by obtaining an input X, and in application scenes such as face recognition, image generation and the like, a face image to be processed can be used as the input of the encoder, so that a feature vector corresponding to the face image is obtained.
The splitting unit 820 of the embodiment of the present application is configured to split the first feature vector into a plurality of sub-feature vectors, where the sub-feature vectors at least include an identity sub-feature vector and an age sub-feature vector.
The vector splitting of the obtained first feature vector is equivalent to a feature decoupling process, and the decoupling process is essentially to model key factors (factors) influencing the data form, so that the change of a certain key factor only causes the change of data on a certain feature, and other features are not influenced. For example, if the human face can be successfully decoupled and represented, the skin color of a human face can be changed by changing the corresponding key factor (which may be a certain dimension of a low-dimensional hidden variable), while the hair style, the five sense organs and other features of the human face remain unchanged.
In the age image generation scenario, the key features affecting a face image mainly include identity features and age features, where the age feature vector acts as the variable quantity: besides making the generated age image as close as possible to a real image, what matters more is generating images of the same person at different ages while basic attribute features such as identity remain unchanged. Therefore, the feature vector extracted from the face image is decoupled into an identity sub-feature vector and an age sub-feature vector that are trained and learned separately; the model can then fully learn both the identity feature representation and the age feature representation in the face image, so that when the age feature vector of the face image is changed, face images of the same person at other ages can be obtained without changing the identity features.
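The splitting performed by the splitting unit can be sketched as slicing fixed dimension ranges out of the encoder's feature vector. The dimension sizes below are illustrative assumptions; the original does not specify them:

```python
import numpy as np

def split_feature_vector(z, id_dim=4, age_dim=2):
    # slice the first feature vector into identity, age, and
    # supplementary sub-feature vectors by fixed dimension ranges
    z_id = z[:id_dim]
    z_age = z[id_dim:id_dim + age_dim]
    z_sup = z[id_dim + age_dim:]           # everything else is supplementary
    return z_id, z_age, z_sup

z = np.arange(8.0)                          # toy encoder output
z_id, z_age, z_sup = split_feature_vector(z)
```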
The decoding unit 830 of the embodiment of the application is configured to determine a vector to be decoded according to each sub-feature vector, and decode the vector to be decoded by using the decoder of the face image processing model to obtain a second face image.
If the encoder is neither constrained nor trained, the feature vector it produces by encoding an image is usually not the desired one, so the sub-feature vectors preliminarily obtained by the encoder can be processed in certain ways to obtain a vector to be decoded that can be input to the decoder for decoding. Specifically, after the decoupled sub-feature vectors are obtained, they are constrained in combination with prior data and the like, and the vector to be decoded is then determined. For example, the age feature vector is fitted with a Gaussian distribution model so that the trained model can generate as many new images as possible, and the identity feature vector can be processed with a knowledge distillation model; this simplifies the model structure while still ensuring a good model effect.
The first optimization unit 840 of the embodiment of the application is configured to optimize parameters of the facial image processing model according to regression loss values of the first facial image and the second facial image.
The purpose of training the face image processing model is to make the newly generated image as close as possible to the real image, so that the model has strong image generation and generalization capability. Therefore, after the second face image is obtained, its closeness to the first face image can be judged in some way. In the embodiment of the present application, to ensure that the newly generated image is as close as possible to the real image, the regression loss value between the first face image and the second face image can be calculated with an L1 regression loss function, and the parameters of the face image processing model optimized by reducing this loss as much as possible, yielding a face image processing model with better performance.
The L1 regression loss (L1 Loss), also known as the Mean Absolute Error (MAE), measures the average distance between the predicted value f(x) and the true value y. Its formula is as follows:

MAE = (1/n) * Σ_{i=1}^{n} |f(x_i) - y_i|

where n is the number of samples, f(x_i) is the model prediction value (in the embodiment of the present application, the second face image), y_i is the model true value (in the embodiment of the present application, the first face image), and i is an integer not less than 1.
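As a minimal sketch, the L1 regression loss can be computed over images represented as arrays; the pixel values below are illustrative:

```python
import numpy as np

def l1_regression_loss(pred, target):
    # mean absolute error between the generated image and the real image
    return float(np.mean(np.abs(pred - target)))

second_face = np.array([[0.1, 0.5], [0.9, 0.3]])   # generated image (toy)
first_face = np.array([[0.2, 0.5], [0.7, 0.3]])    # real input image (toy)
loss = l1_regression_loss(second_face, first_face)
```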
In an embodiment of the present application, the decoding unit 830 is further configured to: acquiring an identity reference characteristic vector of the first face image by using a face recognition model; and fitting the identity sub-feature vector according to the identity reference feature vector, and taking the fitted identity sub-feature vector as the vector to be decoded.
In an embodiment of the present application, the decoding unit 830 is further configured to: and performing knowledge distillation on the encoder according to the identity reference feature vector and the identity sub-feature vector.
In an embodiment of the present application, the decoding unit 830 is further configured to: fitting the age sub-feature vector with a Gaussian distribution model, and taking the age sub-feature vector subjected to fitting as the vector to be decoded; and carrying out Gaussian prior constraint on the encoder according to the age sub-feature vector before fitting processing and the age sub-feature vector after fitting processing.
In an embodiment of the present application, the decoding unit 830 is further configured to: carrying out batch normalization processing on the vector to be decoded determined according to the identity sub-feature vector to obtain a low-level feature vector, and inputting the low-level feature vector into the decoder to obtain a first feature input result; and carrying out batch normalization processing on the vector to be decoded determined according to the age sub-feature vector to obtain a high-level feature vector, and inputting the high-level feature vector into the decoder based on the first feature input result to obtain the second face image.
In an embodiment of the present application, the decoding unit 830 is further configured to: inputting the supplementary sub-feature vector into the decoder after inputting the lower-layer feature vector into the decoder to obtain a first feature input result and before inputting the higher-layer feature vector into the decoder.
In one embodiment of the present application, the apparatus 800 further comprises: the processing unit is used for carrying out face image processing on the second face image for N times by using the face image processing model to obtain corresponding N groups of sub-feature vectors, wherein N is a positive integer; and the second optimization unit is used for calculating a repeated training loss value according to the similarity of the N groups of sub-feature vectors and optimizing the parameters of the face image processing model according to the repeated training loss value.
In one embodiment of the present application, the sub-feature vector comprises a supplementary sub-feature vector, and the apparatus 800 further comprises: and the third optimization unit is used for inputting the supplementary sub-feature vectors in the N groups of sub-feature vectors into a discriminator so as to optimize the parameters of the face image processing model according to the output result of the discriminator.
The present application further provides a device 900 for generating a face image, as shown in fig. 9, the device 900 includes a first obtaining unit 910, a first generating unit 920, a replacing unit 930, and a second generating unit 940:
the first obtaining unit 910 of the embodiment of the present application is configured to obtain an original face image.
The original face image is the prerequisite for generating a new face age image, so in the face age image generation scenario, the original face image of the user is acquired first.
The first generating unit 920 of the embodiment of the application is configured to generate a sub-feature vector set of the original face image by using a face image processing model, where the face image processing model is obtained by training based on the face image processing model training apparatus as described above.
The face image processing model in the embodiment of the application is obtained by training based on the face image processing model training device. Specifically, a first face image is encoded by using an encoder of a face image processing model to obtain a first feature vector of the first face image; splitting the first feature vector into a plurality of sub-feature vectors, the sub-feature vectors including at least an identity sub-feature vector and an age sub-feature vector; determining a vector to be decoded according to each sub-feature vector, and decoding the vector to be decoded by using a decoder of the face image processing model to obtain a second face image; and optimizing parameters of the face image processing model according to the regression loss values of the first face image and the second face image.
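The encode-split-decode-loss training step just described can be sketched end to end. This is a toy linear model: the layer sizes, the split points (6/4/2 for identity/age/supplementary), and the L2 form of the regression loss are illustrative assumptions, not details given by the text.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy linear "encoder" and "decoder" weight matrices (illustrative sizes).
DIM_IMG, DIM_FEAT = 16, 12
W_enc = rng.standard_normal((DIM_IMG, DIM_FEAT)) * 0.1
W_dec = rng.standard_normal((DIM_FEAT, DIM_IMG)) * 0.1

def training_step(first_face_image):
    feat = first_face_image @ W_enc                    # encode -> first feature vector
    identity, age, supp = np.split(feat, [6, 10])      # split into sub-feature vectors
    to_decode = np.concatenate([identity, age, supp])  # determine vector to be decoded
    second_face_image = to_decode @ W_dec              # decode -> second face image
    regression_loss = np.mean((first_face_image - second_face_image) ** 2)
    return second_face_image, regression_loss

first = rng.standard_normal(DIM_IMG)
second, loss = training_step(first)
```

In a real model the loss would then be backpropagated to update the encoder and decoder parameters, which the sketch omits.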
The obtained original face image is input into the encoder of the face image processing model to extract feature vectors, and a sub-feature vector set corresponding to the original face image is finally obtained, where the sub-feature vector set may include an identity sub-feature vector, an age sub-feature vector, a supplementary sub-feature vector, and the like.
The replacing unit 930 in the embodiment of the present application is configured to obtain an age feature vector of a target age, replace an age sub-feature vector in the sub-feature vector set with the age feature vector, and obtain a replaced sub-feature vector set.
To generate a new face age image, an age feature vector corresponding to the target age that the user wants to generate is also obtained. For example, if the original face image is a face image at 20 years old and the user wants to generate a face image at 40 years old, the target age of 40 is converted into the corresponding age feature vector, and the age sub-feature vector in the sub-feature vector set of the original face image is replaced with this age feature vector. That is, the age feature vector of the target age is combined with the identity sub-feature vector and the supplementary sub-feature vector of the original face image to form a new sub-feature vector set.
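The replacement step amounts to swapping one entry of the sub-feature vector set while leaving the rest untouched. In the sketch below the dict layout, key names, and 2-dimensional toy vectors are assumptions for illustration only.

```python
import numpy as np

def replace_age_vector(sub_features, target_age_vec):
    """Return a new sub-feature vector set in which only the age
    sub-feature vector is swapped for the target-age vector; identity and
    supplementary vectors are kept, and the input dict is not mutated."""
    swapped = dict(sub_features)
    swapped["age"] = np.asarray(target_age_vec)
    return swapped

original = {
    "identity": np.array([0.2, 0.8]),       # who the person is (kept)
    "age": np.array([1.0, 0.0]),            # e.g. encodes "20 years old"
    "supplementary": np.array([0.5, 0.5]),  # remaining appearance info (kept)
}
target_40 = np.array([0.0, 1.0])            # hypothetical vector for age 40
recombined = replace_age_vector(original, target_40)
```

The recombined set is then what gets handed to the decoder to produce the age-changed image.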
The second generating unit 940 in the embodiment of the application is configured to generate a face age change image corresponding to the original face image by using a face image processing model and the replaced sub-feature vector set.
The recombined sub-feature vector set is taken as the input of the decoder of the face image processing model, and the decoder decodes it to obtain a face age change image corresponding to the original face image.
The present application further provides a face age estimation apparatus 1000, as shown in fig. 10, the apparatus 1000 includes a second obtaining unit 1010, an input unit 1020, and a determining unit 1030:
the second obtaining unit 1010 of the embodiment of the application is configured to obtain a face image.
The face image is the prerequisite for estimating the face age, so in a face age estimation scenario, the face image of the user is acquired first.
The input unit 1020 of the embodiment of the present application is configured to input the face image into an encoder of a face image processing model, so as to obtain an age sub-feature vector of the face image, where the face image processing model is obtained by training based on the face image processing model training apparatus as described above.
The obtained face image is input into the encoder of the face image processing model, and the encoder encodes it to obtain the age sub-feature vector corresponding to the face image. The face image processing model in the embodiment of the application is obtained by training based on the face image processing model training apparatus. Specifically, a first face image is encoded by using an encoder of a face image processing model to obtain a first feature vector of the first face image; the first feature vector is split into a plurality of sub-feature vectors, the sub-feature vectors including at least an identity sub-feature vector and an age sub-feature vector; a vector to be decoded is determined according to each sub-feature vector, and the vector to be decoded is decoded by using a decoder of the face image processing model to obtain a second face image; and parameters of the face image processing model are optimized according to the regression loss values of the first face image and the second face image.
The determining unit 1030 according to this embodiment of the present application is configured to determine an age estimation result according to the age sub-feature vector.
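One plausible readout for the determining unit, which the text does not specify, is to score the age sub-feature vector against a learned prototype vector per candidate age and return the softmax-weighted expected age. Everything below (the prototypes, the 1-D toy features, the softmax form) is an assumption for illustration.

```python
import numpy as np

def estimate_age(age_sub_vec, age_prototypes):
    """Hypothetical age readout: dot the age sub-feature vector with one
    prototype vector per candidate age, then return the expected age
    under a softmax over the scores."""
    ages = np.arange(len(age_prototypes))
    scores = np.asarray(age_prototypes) @ np.asarray(age_sub_vec)
    p = np.exp(scores - scores.max())  # stable softmax
    p /= p.sum()
    return float(ages @ p)

# Toy prototypes for ages 0..99: a vector strongly aligned with the
# prototype for age 40 should yield an estimate near 40.
protos = np.eye(100)
estimated = estimate_age(protos[40] * 10.0, protos)
```

A regression head on the age sub-feature vector would be an equally valid alternative; the point is only that the disentangled age representation suffices for estimation on its own.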
The age sub-feature vector output by the encoder carries the age information of the face image, so the age estimation result is determined according to the age sub-feature vector.
It should be noted that, for the specific implementation of each apparatus embodiment, reference may be made to the specific implementation of the corresponding method embodiment, which is not described herein again.
In summary, according to the technical solution of the present application, a first face image is encoded by using an encoder of the face image processing model to obtain a first feature vector of the first face image; the first feature vector is split into a plurality of sub-feature vectors, which include at least an identity sub-feature vector and an age sub-feature vector; a vector to be decoded is determined according to each sub-feature vector, and the vector to be decoded is decoded by using a decoder of the face image processing model to obtain a second face image; and parameters of the face image processing model are optimized according to the regression loss values of the first face image and the second face image. This solves the technical problem that models trained with face image age editing algorithms in the related art perform poorly. The face image processing model trained by the present application depends less on the data distribution, is more robust to long-tail data with unbalanced ages, can generate better face aging images, and achieves better results on the age estimation task.
It should be noted that:
the algorithms and displays presented herein are not inherently related to any particular computer, virtual machine, or other apparatus. Various general purpose devices may also be used with the teachings herein. The required structure for constructing such a device will be apparent from the description above. In addition, the present application is not directed to any particular programming language. It will be appreciated that a variety of programming languages may be used to implement the teachings of the present application as described herein, and the above description of specific languages is provided to disclose the best mode of the present application.
In the description provided herein, numerous specific details are set forth. However, it is understood that embodiments of the application may be practiced without these specific details. In some instances, well-known methods, structures and techniques have not been shown in detail in order not to obscure an understanding of this description.
Similarly, it should be appreciated that in the foregoing description of exemplary embodiments of the application, various features of the application are sometimes grouped together in a single embodiment, figure, or description thereof for the purpose of streamlining the application and aiding in the understanding of one or more of the various inventive aspects. This method of disclosure, however, is not to be interpreted as reflecting an intention that the claimed application requires more features than are expressly recited in each claim. Rather, as the following claims reflect, inventive aspects lie in less than all features of a single foregoing disclosed embodiment. Thus, the claims following the detailed description are hereby expressly incorporated into this detailed description, with each claim standing on its own as a separate embodiment of this application.
Those skilled in the art will appreciate that the modules in the device in an embodiment may be adaptively changed and disposed in one or more devices different from the embodiment. The modules or units or components of the embodiments may be combined into one module or unit or component, and furthermore they may be divided into a plurality of sub-modules or sub-units or sub-components. All of the features disclosed in this specification (including any accompanying claims, abstract and drawings), and all of the processes or elements of any method or apparatus so disclosed, may be combined in any combination, except combinations where at least some of such features and/or processes or elements are mutually exclusive. Each feature disclosed in this specification (including any accompanying claims, abstract and drawings) may be replaced by alternative features serving the same, equivalent or similar purpose, unless expressly stated otherwise.
Furthermore, those skilled in the art will appreciate that while some embodiments described herein include some but not other features included in other embodiments, combinations of features of different embodiments are meant to be within the scope of the application and form different embodiments. For example, in the following claims, any of the claimed embodiments may be used in any combination.
The various component embodiments of the present application may be implemented in hardware, or in software modules running on one or more processors, or in a combination thereof. It will be appreciated by those skilled in the art that a microprocessor or Digital Signal Processor (DSP) may be used in practice to implement some or all of the functions of some or all of the components of the facial image processing model training apparatus, the facial image generation apparatus, and the facial age estimation apparatus according to the embodiments of the present application. The present application may also be embodied as apparatus or device programs (e.g., computer programs and computer program products) for performing a portion or all of the methods described herein. Such programs implementing the present application may be stored on a computer readable medium or may be in the form of one or more signals. Such a signal may be downloaded from an internet website or provided on a carrier signal or in any other form.
For example, fig. 11 shows a schematic structural diagram of an electronic device according to an embodiment of the present application. The electronic device 1100 comprises a processor 1110 and a memory 1120 arranged to store computer executable instructions (computer readable program code). The memory 1120 may be an electronic memory such as a flash memory, an EEPROM (electrically erasable programmable read only memory), an EPROM, a hard disk, or a ROM. The memory 1120 has a storage space 1130 storing computer readable program code 1131 for performing any of the method steps described above. For example, the storage space 1130 may include respective pieces of computer readable program code 1131 for implementing the various steps of the above methods. The computer readable program code 1131 may also be stored in one or more computer program products. These computer program products comprise a program code carrier such as a hard disk, a compact disc (CD), a memory card or a floppy disk. Such a computer program product is typically a computer readable storage medium such as the one described in fig. 12. Fig. 12 shows a schematic diagram of a computer-readable storage medium according to an embodiment of the present application. The computer readable storage medium 1200 stores computer readable program code 1131 for performing the steps of the method according to the present application, which is readable by the processor 1110 of the electronic device 1100. When the computer readable program code 1131 is executed by the electronic device 1100, the electronic device 1100 is caused to perform the steps of the method described above; in particular, the computer readable program code 1131 stored in the computer readable storage medium may perform the method shown in any of the embodiments described above. The computer readable program code 1131 may be compressed in a suitable form.
It should be noted that the above-mentioned embodiments illustrate rather than limit the application, and that those skilled in the art will be able to design alternative embodiments without departing from the scope of the appended claims. In the claims, any reference signs placed between parentheses shall not be construed as limiting the claim. The word "comprising" does not exclude the presence of elements or steps not listed in a claim. The word "a" or "an" preceding an element does not exclude the presence of a plurality of such elements. The application may be implemented by means of hardware comprising several distinct elements, and by means of a suitably programmed computer. In a unit claim enumerating several means, several of these means may be embodied by one and the same item of hardware. The usage of the words first, second, third, etc. does not indicate any ordering. These words may be interpreted as names.

Claims (15)

1. A face image processing model training method is characterized by comprising the following steps:
encoding a first face image by using an encoder of the face image processing model to obtain a first feature vector of the first face image;
splitting the first feature vector into a plurality of sub-feature vectors, the sub-feature vectors including at least an identity sub-feature vector and an age sub-feature vector;
determining a vector to be decoded according to each sub-feature vector, and decoding the vector to be decoded by using a decoder of the face image processing model to obtain a second face image;
and optimizing parameters of the face image processing model according to the regression loss values of the first face image and the second face image.
2. The training method of a facial image processing model according to claim 1, wherein the determining a vector to be decoded according to each sub-feature vector comprises:
acquiring an identity reference characteristic vector of the first face image by using a face recognition model;
and fitting the identity sub-feature vector according to the identity reference feature vector, and taking the fitted identity sub-feature vector as the vector to be decoded.
3. The training method of the facial image processing model according to claim 2, wherein the determining the vector to be decoded according to each sub-feature vector comprises:
and performing knowledge distillation on the encoder according to the identity reference feature vector and the identity sub-feature vector.
4. The training method of a facial image processing model according to claim 1, wherein the determining a vector to be decoded according to each sub-feature vector comprises:
fitting the age sub-feature vector with a Gaussian distribution model, and taking the age sub-feature vector subjected to fitting as the vector to be decoded;
and carrying out Gaussian prior constraint on the encoder according to the age sub-feature vector before fitting processing and the age sub-feature vector after fitting processing.
5. The method for training a face image processing model according to claim 1, wherein the decoding the vector to be decoded by using the decoder of the face image processing model to obtain a second face image comprises:
carrying out batch normalization processing on the vector to be decoded determined according to the identity sub-feature vector to obtain a low-level feature vector, and inputting the low-level feature vector into the decoder to obtain a first feature input result;
and carrying out batch normalization processing on the vector to be decoded determined according to the age sub-feature vector to obtain a high-level feature vector, and inputting the high-level feature vector into the decoder based on the first feature input result to obtain the second face image.
6. The method of claim 5, wherein the sub-feature vectors further comprise supplementary sub-feature vectors, the method further comprising:
inputting the supplementary sub-feature vector into the decoder after inputting the low-level feature vector into the decoder to obtain the first feature input result and before inputting the high-level feature vector into the decoder.
7. The facial image processing model training method of claim 1, the method further comprising:
performing face image processing on the second face image N times by using the face image processing model to obtain N corresponding groups of sub-feature vectors, wherein N is a positive integer;
and calculating a repeated training loss value according to the similarity of the N groups of sub-feature vectors, and optimizing the parameters of the face image processing model according to the repeated training loss value.
8. The method of claim 7, wherein the sub-feature vectors comprise supplementary sub-feature vectors, the method further comprising:
and inputting the supplementary sub-feature vectors in the N groups of sub-feature vectors into a discriminator, so as to optimize the parameters of the face image processing model according to the output result of the discriminator.
9. A method for generating a face image is characterized by comprising the following steps:
acquiring an original face image;
generating a sub-feature vector set of the original face image by using a face image processing model, wherein the face image processing model is obtained by training based on the face image processing model training method of any one of claims 1 to 8;
acquiring an age characteristic vector of a target age, and replacing an age sub-characteristic vector in the sub-characteristic vector set with the age characteristic vector to obtain a replaced sub-characteristic vector set;
and generating a face age change image corresponding to the original face image by using a face image processing model and the replaced sub-feature vector set.
10. A face age estimation method is characterized by comprising the following steps:
acquiring a face image;
inputting the face image into an encoder of a face image processing model to obtain an age sub-feature vector of the face image, wherein the face image processing model is obtained by training based on the face image processing model training method of any one of claims 1 to 8;
and determining an age estimation result according to the age sub-feature vector.
11. A facial image processing model training device is characterized by comprising:
the encoding unit is used for encoding a first face image by using an encoder of the face image processing model to obtain a first feature vector of the first face image;
a splitting unit, configured to split the first feature vector into a plurality of sub-feature vectors, where the sub-feature vectors at least include an identity sub-feature vector and an age sub-feature vector;
the decoding unit is used for determining a vector to be decoded according to each sub-feature vector and decoding the vector to be decoded by using a decoder of the face image processing model to obtain a second face image;
and the first optimization unit is used for optimizing the parameters of the face image processing model according to the regression loss values of the first face image and the second face image.
12. An apparatus for generating a face image, comprising:
the first acquisition unit is used for acquiring an original face image;
a first generating unit, configured to generate a sub-feature vector set of the original face image by using a face image processing model, where the face image processing model is trained based on the face image processing model training apparatus of claim 11;
the replacing unit is used for obtaining an age characteristic vector of a target age, and replacing the age sub-characteristic vector in the sub-characteristic vector set with the age characteristic vector to obtain a replaced sub-characteristic vector set;
and the second generation unit is used for generating a face age change image corresponding to the original face image by using the face image processing model and the replaced sub-feature vector set.
13. A face age estimation device, comprising:
the second acquisition unit is used for acquiring a face image;
an input unit, configured to input the face image into an encoder of a face image processing model, so as to obtain an age sub-feature vector of the face image, where the face image processing model is trained based on the face image processing model training apparatus according to claim 11;
and the determining unit is used for determining an age estimation result according to the age sub-feature vector.
14. An electronic device, wherein the electronic device comprises: a processor; and a memory arranged to store computer executable instructions that when executed cause the processor to perform a facial image processing model training method as claimed in any one of claims 1 to 8, or cause the processor to perform a generation method of a facial image as claimed in claim 9, or cause the processor to perform a facial age estimation method as claimed in claim 10.
15. A computer-readable storage medium, wherein the computer-readable storage medium stores one or more programs which, when executed by a processor, implement the facial image processing model training method of any one of claims 1 to 8, or implement the generation method of a facial image of claim 9, or implement the facial age estimation method of claim 10.
CN202010308040.8A 2020-04-17 2020-04-17 Face image processing model training method and device, electronic equipment and storage medium Pending CN111652049A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010308040.8A CN111652049A (en) 2020-04-17 2020-04-17 Face image processing model training method and device, electronic equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010308040.8A CN111652049A (en) 2020-04-17 2020-04-17 Face image processing model training method and device, electronic equipment and storage medium

Publications (1)

Publication Number Publication Date
CN111652049A true CN111652049A (en) 2020-09-11

Family

ID=72346048

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010308040.8A Pending CN111652049A (en) 2020-04-17 2020-04-17 Face image processing model training method and device, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN111652049A (en)


Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103544486A (en) * 2013-10-31 2014-01-29 东南大学 Human age estimation method based on self-adaptation sign distribution
WO2015078183A1 (en) * 2013-11-29 2015-06-04 华为技术有限公司 Image identity recognition method and related device, and identity recognition system
CN106650653A (en) * 2016-12-14 2017-05-10 广东顺德中山大学卡内基梅隆大学国际联合研究院 Method for building deep learning based face recognition and age synthesis joint model
CN107247949A (en) * 2017-08-02 2017-10-13 北京智慧眼科技股份有限公司 Face identification method, device and electronic equipment based on deep learning
CN108446676A (en) * 2018-05-03 2018-08-24 南京信息工程大学 Facial image age method of discrimination based on orderly coding and multilayer accidental projection
CN109636867A (en) * 2018-10-31 2019-04-16 百度在线网络技术(北京)有限公司 Image processing method, device and electronic equipment
CN110781724A (en) * 2018-09-11 2020-02-11 开放智能机器(上海)有限公司 Face recognition neural network, method, device, equipment and storage medium


Cited By (15)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112133311A (en) * 2020-09-18 2020-12-25 科大讯飞股份有限公司 Speaker recognition method, related device and readable storage medium
CN112487971A (en) * 2020-11-30 2021-03-12 南京信息工程大学 Method for synthesizing face age for weak label data
CN112633191A (en) * 2020-12-28 2021-04-09 百果园技术(新加坡)有限公司 Method, device and equipment for reconstructing three-dimensional face and storage medium
CN112907725A (en) * 2021-01-22 2021-06-04 北京达佳互联信息技术有限公司 Image generation method, image processing model training method, image processing device, and image processing program
CN112907725B (en) * 2021-01-22 2023-09-26 北京达佳互联信息技术有限公司 Image generation, training of image processing model and image processing method and device
CN113361325A (en) * 2021-04-28 2021-09-07 星宏网络科技有限公司 Method, device and equipment for decoding face feature vector into image
CN112991160A (en) * 2021-05-07 2021-06-18 腾讯科技(深圳)有限公司 Image processing method, image processing device, computer equipment and storage medium
CN113221794A (en) * 2021-05-24 2021-08-06 厦门美图之家科技有限公司 Training data set generation method, device, equipment and storage medium
CN113221794B (en) * 2021-05-24 2024-05-03 厦门美图之家科技有限公司 Training data set generation method, device, equipment and storage medium
CN114120412A (en) * 2021-11-29 2022-03-01 北京百度网讯科技有限公司 Image processing method and device
CN114120412B (en) * 2021-11-29 2022-12-09 北京百度网讯科技有限公司 Image processing method and device
CN114360007A (en) * 2021-12-22 2022-04-15 浙江大华技术股份有限公司 Face recognition model training method, face recognition device, face recognition equipment and medium
CN114821752A (en) * 2022-06-28 2022-07-29 杭州登虹科技有限公司 Age estimation method based on mixed characteristics, electronic equipment and storage medium
CN117576518A (en) * 2024-01-15 2024-02-20 第六镜科技(成都)有限公司 Image distillation method, apparatus, electronic device, and computer-readable storage medium
CN117576518B (en) * 2024-01-15 2024-04-23 第六镜科技(成都)有限公司 Image distillation method, apparatus, electronic device, and computer-readable storage medium

Similar Documents

Publication Publication Date Title
CN111652049A (en) Face image processing model training method and device, electronic equipment and storage medium
Dong et al. Peco: Perceptual codebook for bert pre-training of vision transformers
CN109492662B (en) Zero sample image classification method based on confrontation self-encoder model
Huang et al. PFA-GAN: Progressive face aging with generative adversarial network
CN109635774B (en) Face synthesis method based on generation of confrontation network
CN110084193B (en) Data processing method, apparatus, and medium for face image generation
CN111861945B (en) Text-guided image restoration method and system
CN110728219A (en) 3D face generation method based on multi-column multi-scale graph convolution neural network
CN111816156A (en) Many-to-many voice conversion method and system based on speaker style feature modeling
CN113901894A (en) Video generation method, device, server and storage medium
CN111127146A (en) Information recommendation method and system based on convolutional neural network and noise reduction self-encoder
Hu et al. Disentangled spectrum variations networks for NIR–VIS face recognition
CN110706303A (en) Face image generation method based on GANs
Li et al. From here to there: Video inbetweening using direct 3d convolutions
CN110688897A (en) Pedestrian re-identification method and device based on joint judgment and generation learning
CN110415261B (en) Expression animation conversion method and system for regional training
Huang et al. Multi-density sketch-to-image translation network
Li et al. Psaq-vit v2: Toward accurate and general data-free quantization for vision transformers
CN109190505A (en) The image-recognizing method that view-based access control model understands
Teng et al. Unimodal face classification with multimodal training
CN114494387A (en) Data set network generation model and fog map generation method
Talafha et al. Attentional adversarial variational video generation via decomposing motion and content
Xu et al. Generalized zero-shot learning based on manifold alignment
CN111275778A (en) Face sketch generating method and device
CN112766157B (en) Cross-age face image recognition method based on disentanglement representation learning

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
WD01 Invention patent application deemed withdrawn after publication
WD01 Invention patent application deemed withdrawn after publication

Application publication date: 20200911