CN111860380A

CN111860380A - Face image generation method, device, server and storage medium

Info

Publication number: CN111860380A
Application number: CN202010731169.XA
Authority: CN
Inventors: 曹辰捷; 徐国强
Original assignee: Ping An Technology Shenzhen Co Ltd
Current assignee: Ping An Technology Shenzhen Co Ltd
Priority date: 2020-07-27
Filing date: 2020-07-27
Publication date: 2020-10-30
Also published as: WO2022022043A1

Abstract

The embodiment of the application discloses a face image generation method, a face image generation device, a server and a storage medium, wherein the method comprises the following steps: acquiring a face image of a target person and an expression label corresponding to the face image; carrying out face detection on the face image to obtain a standard face image of the face image; performing expression synthesis by using an expression generation model according to the standard facial image and the expression label to obtain a first synthesized facial image; performing face synthesis according to the standard face image and the first synthesized face image to obtain a second synthesized face image; a face image including the second synthesized face image is generated. By the aid of the method and the device, the generated expression is more stable, and the editing requirements of a user on the expression, particularly the micro expression can be met. The application also relates to a block chain technology, which can write index information of a face image comprising a second synthetic face image into the block chain, and also relates to an image processing technology in the field of artificial intelligence.

Description

Face image generation method, device, server and storage medium

Technical Field

The present application relates to the field of computer technologies, and in particular, to a method and an apparatus for generating a face image, a server, and a storage medium.

Background

Deep learning is a popular technical research field in recent years, and related applications are also endless. For example, applications for generating a countermeasure network (GAN) generally include generating images of different styles, complementing images, generating high definition data for data enhancement of other machine learning models, and so on.

The wide and interesting application of GAN and the high technological threshold make GAN the focus of research in various high-tech companies and schools. Here we focus on the application of GAN to the aspect of expression generation. The traditional method for locally adjusting the character image through the GAN model has the defects that the generated expression is unstable, and is particularly unstable in a field scene. In addition, the number of expressions generated by the traditional method is very limited, and generally only a few expressions are supported, so that the requirements of users on the expressions, especially on the editing of micro-expression, cannot be met.

Disclosure of Invention

The embodiment of the application provides a face image generation method, a face image generation device, a server and a storage medium, which not only can enable the generated expression to be more stable, but also can meet the requirements of a user on the expression, particularly on the editing of micro-expression.

In a first aspect, an embodiment of the present application provides a face image generation method, including:

acquiring a face image of a target person and an expression label corresponding to the face image;

carrying out face detection on the face image to obtain a standard face image of the face image;

performing expression synthesis by using an expression generation model according to the standard facial image and the expression label to obtain a first synthesized facial image; the expression corresponding to the first synthesized facial image is the expression indicated by the expression label;

performing face synthesis according to the standard face image and the first synthesized face image to obtain a second synthesized face image; the expression corresponding to the second synthesized facial image is the expression indicated by the expression label;

generating a face image including the second synthesized face image.

Optionally, after generating the face image including the second synthesized face image, the method further includes:

utilizing an enhanced super-resolution generation confrontation network model to perform image adjustment on the face image comprising the second synthetic face image to obtain the adjusted face image comprising the second synthetic face image;

and outputting the adjusted face image comprising the second synthetic face image.

Optionally, the performing face detection on the face image to obtain a standard face image of the face image includes:

calling an image detection library to carry out face detection on the face image to obtain an original face image of the face image;

and carrying out face alignment on the original face image to obtain a standard face image of the face image.

Optionally, before performing expression synthesis by using the expression generation model according to the standard facial image and the expression label to obtain a first synthesized facial image, the method further includes:

acquiring a training data set, wherein the training data set comprises a plurality of facial images, and the facial images carry corresponding expression labels;

carrying out face detection on each face image in the training data set to obtain a face image set, wherein the face image set comprises standard face images corresponding to the face images;

training a countermeasure network model by using each standard facial image in the facial image set and an expression label carried by a facial image corresponding to the standard facial image, and obtaining the trained countermeasure network model as an expression generation model.

Optionally, the method further comprises:

calling a face recognition tool library to label each face image in the training data set to obtain first label data corresponding to each face image;

and according to the first labeling data corresponding to each face image, obtaining second labeling data corresponding to each face image as the expression label corresponding to each face image.

Optionally, the obtaining, according to the first annotation data corresponding to each face image, second annotation data corresponding to each face image as an expression label corresponding to each face image includes:

and normalizing the first labeling data corresponding to each face image to obtain second labeling data corresponding to each face image, wherein the second labeling data are used as expression labels corresponding to each face image.

Optionally, after performing expression synthesis by using the expression generation model according to the standard facial image and the expression label, the method further includes:

obtaining a characteristic parameter corresponding to the first synthetic face image;

the performing face synthesis according to the standard face image and the first synthesized face image to obtain a second synthesized face image includes:

and carrying out face synthesis according to the standard face image, the first synthesized face image and the characteristic parameters corresponding to the first synthesized face image to obtain a second synthesized face image.

In a second aspect, an embodiment of the present application provides a face image generating apparatus, including:

the system comprises an acquisition module, a display module and a display module, wherein the acquisition module is used for acquiring a face image of a target person and an expression label corresponding to the face image;

the processing module is used for carrying out face detection on the face image to obtain a standard face image of the face image;

the synthesis module is used for performing expression synthesis according to the standard facial image and the expression label by using an expression generation model to obtain a first synthesized facial image; the expression corresponding to the first synthesized facial image is the expression indicated by the expression label;

the synthesis module is further used for carrying out face synthesis according to the standard face image and the first synthesized face image to obtain a second synthesized face image; the expression corresponding to the second synthesized facial image is the expression indicated by the expression label;

the processing module is further configured to generate a face image including the second synthesized face image.

In a third aspect, an embodiment of the present application provides a server, including a processor and a memory, where the processor and the memory are connected to each other, where the memory is used to store a computer program, and the computer program includes program instructions, and the processor is configured to call the program instructions to execute the method according to the first aspect.

In a fourth aspect, the present application provides a computer-readable storage medium, which stores a computer program, where the computer program is executed by a processor to implement the method according to the first aspect.

In summary, the server may perform image processing on the face image of the target person to obtain a standard face image of the face image, and perform expression synthesis according to the standard face image and the expression label corresponding to the face image by using the expression generation model to obtain a first synthesized face image, so as to perform face synthesis according to the standard face image and the first synthesized face image to obtain a second synthesized face image and generate the face image including the second synthesized face image.

Drawings

In order to more clearly illustrate the embodiments of the present application or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, it is obvious that the drawings in the following description are only some embodiments of the present application, and for those skilled in the art, other drawings can be obtained according to the drawings without creative efforts.

Fig. 1A is a schematic flowchart of a face image generation method according to an embodiment of the present application;

fig. 1B is a schematic diagram of an image processing process for a face image according to an embodiment of the present disclosure;

FIG. 1C is a schematic diagram of a face synthesis process provided in an embodiment of the present application;

fig. 2A is a schematic flow chart of another face image generation method according to an embodiment of the present application;

fig. 2B is a schematic diagram of an image adjustment process for a face image according to an embodiment of the present disclosure;

fig. 3 is a schematic structural diagram of a face image generation apparatus according to an embodiment of the present application;

fig. 4 is a schematic structural diagram of a server according to an embodiment of the present application.

Detailed Description

The technical solutions in the embodiments of the present application will be described below with reference to the drawings in the embodiments of the present application.

Fig. 1A is a schematic flow chart of a face image generation method according to an embodiment of the present application. The method may be applied to a server, which may be a server or a cluster of servers, and in particular, the method may comprise the steps of:

s101, obtaining a face image of a target person and an expression label corresponding to the face image.

The expression label may be a feature vector formed by a value of each expression unit in the at least one expression unit, that is, the expression label may include a value of each expression unit in the at least one expression unit. The expression unit can be used to describe an expression. In one embodiment, the expression Units may be referred to as Action Units (AU). At least one of the references may be one or more. Wherein, the at least one expression unit includes but is not limited to at least one of the following 17 expression units: raising inner eyebrow, raising outer eyebrow, frowning, raising eyelid, squinting, locking eye curtain, frowning nose, raising upper lip, raising mouth angle, pulling cheek muscle, pulling mouth angle, raising chin, pressing lip, locking lip, opening mouth, dropping jaw and closing eye. The expression units include, but are not limited to, the expression units listed above.

In one embodiment, the server may provide a plurality of emoticons to the user terminal. The user can upload the facial image of the target person and the target expression image in the plurality of expression images to the server by using the user terminal. The target expression image can be any expression image in the plurality of expression images, the server can receive the facial image of the target person and the target expression image uploaded by the user terminal, and the expression label corresponding to the target expression image is determined to serve as the expression label corresponding to the facial image according to the preset corresponding relation between the expression image and the expression label. In an application scenario, a user may click an image composition button, a user terminal may send an image composition instruction to a server in response to a click operation on the composition button, the image composition instruction may carry a face image of the target person and a target expression image, and the server may obtain the face image of the target person and the target expression image carried by the image composition instruction when receiving the image composition instruction.

In one embodiment, the server may provide a plurality of emoticon information to the user terminal. Each expression identification information may be used to identify an expression. The user can upload the facial image of the target person and the target expression identification information in the plurality of expression identification information to the server by using the user terminal. The target expression identification information may be any expression identification information in the plurality of expression identification information. The server can receive a facial image of a target character and target expression identification information (which can be target expression names such as sadness, anger and happiness) uploaded by a user, and can determine an expression label corresponding to the target expression identification information as an expression label corresponding to the facial image according to a preset corresponding relationship between the expression image and the expression identification information. In an application scenario, a user may click an image synthesis button, a user terminal may send an image synthesis instruction to a server in response to a click operation on the synthesis button, where the image synthesis instruction carries a face image of the target person and target expression identification information, and the server may obtain the face image of the target person and the target expression identification information carried by the image synthesis instruction when receiving the image synthesis instruction.

The two modes can realize facial expression synthesis by setting expression related data, and besides the two modes, the embodiment of the application can also achieve the process of fine editing of the expression by setting the value of the expression unit.

In one embodiment, the server may provide the user terminal with a setting item for each of a plurality of emoticons. The user may set a value of each expression unit based on the setting item of each expression unit, and may upload the face image of the target person and the value of each expression unit to the server using the user terminal. The server can receive the facial image of the target person uploaded by the user and the value of each expression unit, and can construct an expression label corresponding to the facial image according to the value of each expression unit. In an application scenario, a user may click an image composition button, the user terminal may send an image composition instruction to the server in response to a click operation on the composition button, where the image composition instruction carries a face image of the target person and a value of each expression unit, and the server may obtain the face image of the target person and the value of each expression unit carried by the image composition instruction when receiving the image composition instruction. Here, according to different actual application scenarios, a user may set a value of each expression unit in the above manner, and may also set values of some expression units in a plurality of expression units.

S102, carrying out face detection on the face image to obtain a standard face image of the face image.

In order to obtain a relatively standard face image, the server may perform face detection on the face image to obtain a standard face image of the face image. The face image is a face image of a target person. The standard face image described here is a standard face image of a target person.

In one embodiment, the server may call an image detection library, such as a dlib library, to perform face detection on the face image, so as to obtain a standard face image of the face image.

In an embodiment, the server may specifically invoke an image detection library to perform face detection on the face image to obtain an original face image of the face image, and perform face alignment on the original face image to obtain a standard face image of the face image. Namely, the server can intercept and correct the face image through the image detection library, so as to obtain a standard face image corresponding to the face image. The original face image is obtained after the face detection of the face image is carried out by calling an image detection library.

In one embodiment, the server performs face alignment on the original face image to obtain a standard face image of the face image in the following specific manner: the server determines the coordinates of a plurality of key points included in the original face image, and performs rigid transformation on the original face image based on the coordinates of the key points and the coordinates of a reference key point corresponding to each key point in the key points to obtain a transformed face image as a standard face image corresponding to the face image. Where rigid transformations may be referred to as global transformations. Rigid transformations may include translation, rotation, scaling, and the like. The rigid transformation is just a transformation of position and orientation, and does not change the shape of the face.

For example, referring to fig. 1B, when the plurality of key points are 5 key points shown as S1 in fig. 1B, the effect graph using the image adjustment process can be referred to as S2 in fig. 1B.

S103, performing expression synthesis by using an expression generation model according to the standard facial image and the expression label to obtain a first synthesized facial image.

In this embodiment of the application, the server may use the standard facial image and the expression tag as input data of an expression generation model, and perform expression synthesis through the expression generation model to obtain a first synthesized facial image. The expression corresponding to the first composite facial image may be the expression indicated by the emoji label. Also, the first synthesized face image may have facial features of the standard face image. The standard face image is a standard face image corresponding to a face image of a target person. The expression label is an expression label corresponding to the face image of the target person.

In one embodiment, the expression generation model may be obtained by training a generative countermeasure network model. When the expression generation model is obtained by training an opposing network model, the server can specifically perform expression synthesis through a generator of the expression generation model to obtain a first synthesized facial image.

In one embodiment, when the expression generation model is obtained by training an opposing network model, the expression generation model may be obtained specifically by: the method comprises the steps that a server obtains a training data set, wherein the training data set comprises a plurality of face images, and the face images carry corresponding expression labels; the server carries out face detection on each face image in the training data set to obtain a face image set, wherein the face image set comprises standard face images corresponding to the face images; the server trains the confrontation network model by using each standard facial image in the facial image set and the expression label carried by the facial image corresponding to the standard facial image, and the trained confrontation network model is obtained and used as an expression generation model.

The plurality of facial images mentioned above may be divided into at least one facial image set, for example, a first facial image set, a second facial image set, and a third facial image set, and so on. The first face image set may be, for example, a first high-quality face image set, such as FFHQ, the second face image set may be, for example, a second high-quality face image set, such as celebA-HQ, and the third face image set may be, for example, a face expression image set, such as Emotion Net. Alternatively, the first face image data set may be a data set formed from face images selected from the first high-quality face image set. The second facial image data set may be a data set constructed from facial images screened from the second high quality facial image set. The third facial image data set may be a data set formed from facial images screened from the facial expression image set.

In one embodiment, the corresponding emoji label of the face image can be obtained by the following steps: and the server carries out algorithm annotation on the face image to obtain an expression label corresponding to the face image. Specifically, the server may call the face recognition tool library to label each face image in the training data set, to obtain first label data corresponding to each face image, and obtain second label data corresponding to each face image according to the first label data corresponding to each face image, as an expression label corresponding to each face image. The Face recognition tool library may be an Open source Face multifunctional tool library, such as an Open Face Open source Face multifunctional tool library. Above-mentioned process can realize automatic expression label mark, has promoted the mark efficiency to the expression label.

In an embodiment, the process of obtaining the first annotation data corresponding to each face image by the server calling the face recognition tool library to label each face image in the training data set may be that the server obtains coordinates of each key point in a plurality of key points included in each face image in the training data set through a Graphics Processing Unit (GPU), inputs the coordinates of each key point in the plurality of key points included in each face image and each face image into the face recognition tool library, and obtains the first annotation data corresponding to each face image by labeling the face recognition tool library according to the coordinates of each key point in the plurality of key points included in each face image and each face image.

In an embodiment, the process that the server obtains, according to the first annotation data corresponding to each of the face images, the second annotation data corresponding to each of the face images as the expression label corresponding to each of the face images is specifically that the server determines the first annotation data corresponding to each of the face images as the second annotation data corresponding to each of the face images, and uses the second annotation data corresponding to each of the face images as the expression label corresponding to each of the face images.

In an embodiment, the process that the server obtains, according to the first annotation data corresponding to each of the face images, the second annotation data corresponding to each of the face images as the expression label corresponding to each of the face images may further be a process that the server performs normalization processing on the first annotation data corresponding to each of the face images to obtain the second annotation data corresponding to each of the face images as the expression label corresponding to each of the face images. Because the value range of the value of the expression unit included in the first label data corresponding to each face image is large, generally 0-5, and an excessively large value may result in a face image with an exaggerated expression, in order to control the degree of the exaggeration of the expression, the normalization processing may be adopted to reduce the value of the expression unit in the first label data, so that the value range is controlled in a smaller interval, for example, 0-1.

In an embodiment, the process of performing face detection on each face image in the training data set by the server to obtain the face image set may be to call an image detection library for the server to perform face detection on each face image in the training data set to obtain an original face image corresponding to each face image, perform face alignment on the original face image corresponding to each face image to obtain a standard face image corresponding to each face image, and generate the face image set including the standard face image corresponding to each face image. The manner of performing face alignment on the original face image corresponding to each face image to obtain the standard face image corresponding to each face image may refer to the manner of performing face alignment on the original face image of the face image (the face image of the target person) to obtain the standard face image of the face image, which is not described herein in detail in this embodiment of the present application. Because the generation of the confrontation network model is generally sensitive, the face image is corrected in the training stage in such a way so as to improve the stability of the model.

In an embodiment, the server trains the confrontation network model by using each standard facial image in the facial image set and the expression label carried by the facial image corresponding to the standard facial image, and the rough flow of obtaining the trained confrontation network model as the expression generation model may be as follows: the server randomly selects two standard face images such as face _ a and face _ b from the face image set each time, acquires the value of an expression unit of the face _ a, namely an AU coefficient such as AU _ a, and acquires the value of an expression unit of the face _ b, namely an AU coefficient such as AU _ b (the face _ a and the face _ b may not be the same person); the server generates the fake _ b according to au _ b and fake _ a by using a generator included in the generation of the countermeasure network model; the server trains according to the face _ b and the face _ a through a discriminator included in the generation of the confrontation network model; the server restores the fake _ b into rec _ a through the generator by using au _ a, and calculates restoring losses of rec _ a and face _ a, namely L1-loss; and the server performs model training by using L1-loss until the model converges, and the trained confrontation network generation model is obtained and used as the expression generation model.

In one embodiment, in addition to obtaining the fake _ b, a feature parameter corresponding to the fake _ b, such as a mask parameter, may also be obtained. The value range of the mask parameter can be 0-1. The size of the mask parameter can reflect the importance of the region.

And S104, carrying out face synthesis according to the standard face image and the first synthesized face image to obtain a second synthesized face image.

In order to make the synthesized face image more realistic, the server may further perform face synthesis based on the standard face image and the first synthesized face image to obtain a second synthesized face image. The expression corresponding to the second composite facial image may be the expression indicated by the expression label. Also, the second composite face image may have facial features of the standard face image.

In an embodiment, after the server performs expression synthesis according to the standard facial image and the expression label by using the expression generation model, the server may further obtain feature parameters corresponding to the first synthesized facial image. The server may perform face synthesis according to the standard face image, the first synthesized face image, and the feature parameters corresponding to the first synthesized face image to obtain a second synthesized face image.

In one embodiment, if the feature parameter corresponding to the first synthesized face image is a mask parameter, the face synthesis process may be as follows:

target map + map 2 (1-mask) formula 1.1;

here, the first synthesized face image may be shown in fig. 1 in formula 1.1, the standard face image may be shown in fig. 2 in formula 1.1, and the second synthesized face image may be a target image in formula 1.1.

Taking the face synthesis process diagram of fig. 1C as an example, the top left image of fig. 1C is a standard face image, the bottom left image of fig. 1C is a first synthesized face image, and the right image of fig. 1C is a second synthesized face image.

According to the method for synthesizing the face, the generated micro-expression and the face image are synthesized according to a reasonable proportion through an image attention mechanism, and the generated result is more stable. And the method can edit the facial expression in the natural environment without considering factors such as background and the like.

And S105, generating a face image comprising the second synthetic face image.

In the embodiment of the present application, the server may generate a face image including the second synthesized face image from the face image of the target person. This process, in brief, corresponds to the server replacing the original face image of the target person with the second synthesized face image to obtain the face image including the second synthesized face image, that is, the server restoring the face image including the second synthesized face image to the position of the original face image of the target person.

In one embodiment, the server may output a face image including the second composite face image, such as may be sent to a user terminal for viewing by a user. For the user, it is possible to see the face image after the expression of the target person is changed.

In one embodiment, at least one of the following improvements may be added in training the generative countermeasure network model:

1. spectral normalization spectral _ norm was introduced to limit the Lipschitz constraint (L constraint for short) of the model. In particular, the parameter w of the network of discriminators can be replaced by

So that the generated confrontation network model meets the L constraint, thereby improving the generalization capability and stability of the model. Wherein | w | purple₂Is the spectral norm of w. The penalty function used by the arbiter may be change-loss, which may be used for the classifier's binary classification process.

2. The generator joins the residual network. The residual error network is introduced, so that the generated face image can better restore the details of the face. The generator can perform convolution operation on the input features by using the residual error network of each convolution layer to obtain output features. That is, the residual network may be used to perform convolution operations based on the input features, resulting in output features.

3. The generator may refer to the styleGAN, introduce an Adaptive instantiation Normalization (AdaIN) module and uniform random noise after each convolution layer performs convolution operation, and perform AdaIN processing on output features obtained after the convolution operation, so as to improve the quality of the generated image. The AdaIN processing procedure can be seen in the following formula:

where x denotes the input feature and y denotes the value of the expression unit, i.e. the AU coefficient.

4. The generator introduces exponential decay, and the decay coefficient may be 0.9995. That is, each iteration of the generator, the parameters of the generator are updated, and each step is: old parameter 0.9995+ new parameter 0.0005. Introducing exponential decay makes the generator update more slow and stable.

As can be seen, in the embodiment shown in fig. 1A, the server may perform image processing on the face image of the target person to obtain a standard face image of the face image, perform expression synthesis according to the standard face image and the expression label corresponding to the face image by using the expression generation model to obtain a first synthesized face image, perform face synthesis according to the standard face image and the first synthesized face image to obtain a second synthesized face image, and generate the face image including the second synthesized face image.

Please refer to fig. 2A, which is a flowchart illustrating another method for generating a face image according to an embodiment of the present application. The method may be applied to a server, which may be a server or a cluster of servers, and may comprise the steps of:

s201, obtaining a face image of a target person and an expression label corresponding to the face image.

S202, carrying out face detection on the face image to obtain a standard face image of the face image.

S203, performing expression synthesis by using an expression generation model according to the standard facial image and the expression label to obtain a first synthesized facial image.

And S204, carrying out face synthesis according to the standard face image and the first synthesized face image to obtain a second synthesized face image.

And S205, generating a face image comprising the second synthetic face image.

Steps S201 to S205 may refer to steps S101 to S105, which are not described herein again in this embodiment of the present application.

S206, carrying out image adjustment on the face image comprising the second synthetic face image by utilizing the enhanced super-resolution generation confrontation network model to obtain the adjusted face image comprising the second synthetic face image.

And S207, outputting the adjusted face image comprising the second synthetic face image.

In steps S206-S207, the server may process the face image including the second synthesized face image through an enhanced super-Resolution generation countermeasure network (ESRGAN) model, and then output the face image with a higher Resolution, thereby achieving the purpose of optimizing the face image. The generated face image is subjected to high-definition amplification by adopting a super-resolution technology, and the resolution of the generated expression is improved on the basis of not influencing the training to generate the confrontation network model. By practice, the resolution of the face image can be improved from 128x128 to 512x512 by adopting the process on the premise of effectively keeping the definition of the face.

In this embodiment, the server may specifically use the face image including the second synthesized face image as input data of the ESRGAN model, and adjust the face image including the second synthesized face image through the ESRGAN model to obtain the adjusted face image including the second synthesized face image. In one embodiment, the ESRGAN model can be obtained by artificially disturbing some high-quality face image training.

For example, referring to fig. 2B, the left image of fig. 2B shows a face image including the second synthesized face image, and the right image of fig. 2B shows an adjusted face image including the second synthesized face image. That is, the server may use the face image shown in the left image of fig. 2B as input data of the ESRGAN model, and adjust the face image shown in the left image of fig. 2B through the ESRGAN model to obtain the face image shown in the right image of fig. 2B.

In one embodiment, the server may generate first index information of a face image including the second synthesized face image, and write the first index information in the block chain. The first index information is written into the block chain, so that the corresponding face image can be conveniently indexed, and the corresponding face image can be effectively prevented from being used for illegal purposes. The first index information is index information of a face image including the second synthesized face image. The first index information may be, for example, a hash value obtained by performing hash calculation on the face image by the server or signature information obtained by performing signature processing on the face image by the server.

In one embodiment, the server may generate second index information of the adjusted face image including the second synthesized face image, and write the second index information into the block chain. And writing the second index information into the block chain. The second index information refers to the adjusted index information of the face image including the second synthesized face image. The second index information may be, for example, a hash value obtained by performing hash calculation on the face image by the server or signature information obtained by performing signature processing on the face image by the server.

Similarly, the method and the device can also be used for the construction of smart cities. With the development of the deep learning technology, the face images generated by the deep learning technology are more and more vivid, which undoubtedly brings challenges to the face recognition process, so that in the face recognition process, in order to avoid the situation that the face recognition is performed in a false and spurious manner by using the synthesized face images or related videos, the face images can be collected in the face recognition process, the collected face images are compared with a plurality of synthesized face images stored by a server, and then whether warning processing is performed or not is judged according to the comparison result.

It can be seen that, in the embodiment shown in fig. 2A, after obtaining the face image including the second synthesized face image, the server may further adjust the face image including the second synthesized face image through the enhanced super-resolution generation confrontation network model, so as to achieve the purpose of optimizing the face image, so that the image quality of the output face image is higher.

Please refer to fig. 3, which is a schematic structural diagram of a face generation apparatus according to an embodiment of the present application. The apparatus may be applied to a server. The apparatus may include:

the obtaining module 301 is configured to obtain a face image of a target person and an expression label corresponding to the face image.

The processing module 302 is configured to perform face detection on the face image to obtain a standard face image of the face image.

A synthesis module 303, configured to perform expression synthesis according to the standard facial image and the expression label by using an expression generation model to obtain a first synthesized facial image; the expression corresponding to the first synthesized facial image is the expression indicated by the expression label.

A synthesizing module 303, configured to perform face synthesis according to the standard face image and the first synthesized face image to obtain a second synthesized face image; and the expression corresponding to the second synthesized facial image is the expression indicated by the expression label.

The processing module 302 is further configured to generate a face image including the second synthetic face image.

In an alternative embodiment, the processing module 302 is further configured to, after generating the face image including the second synthetic face image, perform image adjustment on the face image including the second synthetic face image by using an enhanced super-resolution generation countermeasure network model to obtain an adjusted face image including the second synthetic face image, and output the adjusted face image including the second synthetic face image.

In an optional implementation manner, the processing module 302 performs face detection on the face image to obtain a standard face image of the face image, specifically, calls an image detection library to perform face detection on the face image to obtain an original face image of the face image; and carrying out face alignment on the original face image to obtain a standard face image of the face image.

In an optional implementation manner, the processing module 302 is further configured to obtain a training data set before performing expression synthesis by using an expression generation model according to the standard facial image and the expression labels to obtain a first synthesized facial image, where the training data set includes a plurality of facial images, and the facial images carry corresponding expression labels; carrying out face detection on each face image in the training data set to obtain a face image set, wherein the face image set comprises standard face images corresponding to the face images; training a countermeasure network model by using each standard facial image in the facial image set and an expression label carried by a facial image corresponding to the standard facial image, and obtaining the trained countermeasure network model as an expression generation model.

In an optional implementation manner, the processing module 302 is further configured to invoke a face recognition tool library to label each face image in the training data set, so as to obtain first label data corresponding to each face image; and according to the first labeling data corresponding to each face image, obtaining second labeling data corresponding to each face image as the expression label corresponding to each face image.

In an optional implementation manner, the processing module 302 obtains, according to the first annotation data corresponding to each face image, second annotation data corresponding to each face image as an expression label corresponding to each face image, specifically, performs normalization processing on the first annotation data corresponding to each face image, to obtain second annotation data corresponding to each face image as an expression label corresponding to each face image.

In an optional implementation manner, the synthesizing module 303 is further configured to obtain a feature parameter corresponding to the first synthesized facial image after performing expression synthesis according to the standard facial image and the expression label by using an expression generation model.

In an optional implementation manner, the synthesis module 303 performs face synthesis according to the standard face image and the first synthesized face image to obtain a second synthesized face image, specifically performs face synthesis according to the standard face image, the first synthesized face image and feature parameters corresponding to the first synthesized face image to obtain a second synthesized face image.

It can be seen that, in the embodiment shown in fig. 3, the facial image generation apparatus may perform image processing on a facial image of a target person to obtain a standard facial image of the facial image, and perform expression synthesis according to the standard facial image and an expression label corresponding to the facial image by using an expression generation model to obtain a first synthesized facial image, so as to perform facial synthesis according to the standard facial image and the first synthesized facial image to obtain a second synthesized facial image, and generate a facial image including the second synthesized facial image.

Please refer to fig. 4, which is a schematic structural diagram of a server according to an embodiment of the present disclosure. The server described in this embodiment may include: one or more processors 100, one or more input devices 200, one or more output devices 300, and memory 400. The processor 100, the input device 200, the output device 300, and the memory 400 may be connected by a bus. The input device 200 and the output device 300 are optional devices in the embodiment of the present application. The input device 200, output device 300 may be a standard wired or wireless communication interface.

The Processor 100 may be a Central Processing Unit (CPU), and may be other general purpose Processor, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), an off-the-shelf Programmable Gate Array (FPGA) or other Programmable logic device, discrete Gate or transistor logic, discrete hardware components, etc. A general purpose processor may be a microprocessor or the processor may be any conventional processor or the like.

The memory 400 may be a high-speed RAM memory or a non-volatile memory (e.g., a disk memory). The memory 400 is used to store a set of program codes, which the input device 200, the output device 300 and the processor 100 can call, and which are stored in the memory 400. Specifically, the method comprises the following steps:

the processor 100 is configured to obtain a face image of a target person and an expression label corresponding to the face image; carrying out face detection on the face image to obtain a standard face image of the face image; performing expression synthesis by using an expression generation model according to the standard facial image and the expression label to obtain a first synthesized facial image; the expression corresponding to the first synthesized facial image is the expression indicated by the expression label; performing face synthesis according to the standard face image and the first synthesized face image to obtain a second synthesized face image; the expression corresponding to the second synthesized facial image is the expression indicated by the expression label; generating a face image including the second synthesized face image.

In one embodiment, the processor 100 is further configured to, after generating the face image including the second synthetic face image, perform image adjustment on the face image including the second synthetic face image by using an enhanced super-resolution generation countermeasure network model to obtain an adjusted face image including the second synthetic face image; outputting the adjusted face image including the second synthesized face image through the output device 300.

In an embodiment, the processor 100 performs face detection on the face image to obtain a standard face image of the face image, specifically, invokes an image detection library to perform face detection on the face image to obtain an original face image of the face image; and carrying out face alignment on the original face image to obtain a standard face image of the face image.

In an embodiment, the processor 100 is further configured to obtain a training data set before performing expression synthesis by using an expression generation model according to the standard facial image and the expression labels to obtain a first synthesized facial image, where the training data set includes a plurality of facial images, and the facial images carry corresponding expression labels; carrying out face detection on each face image in the training data set to obtain a face image set, wherein the face image set comprises standard face images corresponding to the face images; training a countermeasure network model by using each standard facial image in the facial image set and an expression label carried by a facial image corresponding to the standard facial image, and obtaining the trained countermeasure network model as an expression generation model.

In an optional embodiment, the processor 100 is further configured to invoke a face recognition tool library to label each face image in the training data set, so as to obtain first label data corresponding to each face image; and according to the first labeling data corresponding to each face image, obtaining second labeling data corresponding to each face image as the expression label corresponding to each face image.

In an embodiment, the processor 100 obtains, according to the first annotation data corresponding to each face image, second annotation data corresponding to each face image as an expression label corresponding to each face image, specifically, performs normalization processing on the first annotation data corresponding to each face image, and obtains the second annotation data corresponding to each face image as an expression label corresponding to each face image.

In an embodiment, the processor 100 is further configured to obtain a feature parameter corresponding to the first synthesized facial image after performing expression synthesis according to the standard facial image and the expression label by using an expression generation model.

In an embodiment, the processor 100 performs face synthesis according to the standard face image and the first synthesized face image to obtain a second synthesized face image, specifically performs face synthesis according to the standard face image, the first synthesized face image and feature parameters corresponding to the first synthesized face image to obtain a second synthesized face image.

In a specific implementation, the processor 100, the input device 200, and the output device 300 described in this embodiment of the present application may execute the implementation described in the embodiment of fig. 1A and the embodiment of fig. 2A, and may also execute the implementation described in this embodiment of the present application, which is not described herein again.

The functional modules in the embodiments of the present application may be integrated into one processing module, or each module may exist alone physically, or two or more modules are integrated into one module. The integrated module can be realized in a form of sampling hardware, and can also be realized in a form of sampling software functional modules.

It will be understood by those skilled in the art that all or part of the processes of the methods of the embodiments described above can be implemented by a computer program, which can be stored in a computer-readable storage medium, and when executed, can include the processes of the embodiments of the methods described above. The computer readable storage medium may be volatile or nonvolatile. For example, the computer storage medium may be a magnetic disk, an optical disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), or the like. The computer-readable storage medium may mainly include a storage program area and a storage data area, wherein the storage program area may store an operating system, an application program required for at least one function, and the like; the storage data area may store data created according to the use of the blockchain node, and the like.

The block chain is a novel application mode of computer technologies such as distributed data storage, point-to-point transmission, a consensus mechanism and an encryption algorithm. A block chain (Blockchain), which is essentially a decentralized database, is a series of data blocks associated by using a cryptographic method, and each data block contains information of a batch of network transactions, so as to verify the validity (anti-counterfeiting) of the information and generate a next block. The blockchain may include a blockchain underlying platform, a platform product service layer, an application service layer, and the like.

While the invention has been described with reference to a preferred embodiment, it will be understood by those skilled in the art that various changes in form and detail may be made therein without departing from the spirit and scope of the invention as defined by the appended claims.

Claims

1. A face image generation method is characterized by comprising the following steps:

generating a face image including the second synthesized face image.

2. The method of claim 1, wherein after generating the face image comprising the second composite face image, the method further comprises:

3. The method according to claim 1, wherein the performing face detection on the face image to obtain a standard face image of the face image comprises:

4. The method according to any one of claims 1 to 3, wherein before performing expression synthesis by using the expression generation model according to the standard facial image and the expression label to obtain a first synthesized facial image, the method further comprises:

5. The method of claim 4, further comprising:

6. The method according to claim 5, wherein the obtaining, according to the first annotation data corresponding to each face image, the second annotation data corresponding to each face image as the expression label corresponding to each face image comprises:

7. The method of claim 1, wherein after performing expression synthesis from the standard facial image and the expression labels using an expression generation model, the method further comprises:

8. A face image generation apparatus, comprising:

9. A server, comprising a processor and a memory, the processor and the memory being interconnected, wherein the memory is configured to store a computer program comprising program instructions, the processor being configured to invoke the program instructions to perform the method of any one of claims 1-7.

10. A computer-readable storage medium, characterized in that the computer-readable storage medium stores a computer program which is executed by a processor to implement the method according to any one of claims 1-7.