AU2019430859A1 - Generative adversarial mechanism and attention mechanism-based standard face generation method - Google Patents

Generative adversarial mechanism and attention mechanism-based standard face generation method

Info

Publication number
AU2019430859A1
Authority
AU
Australia
Prior art keywords
image
face
network
model
sub
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
AU2019430859A
Other versions
AU2019430859B2 (en)
Inventor
Chunwen PAN
Weilin Wu
Wei Xie
Xiaoyuan Yu
Langwen ZHANG
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
South China University of Technology SCUT
Original Assignee
South China University of Technology SCUT
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by South China University of Technology SCUT filed Critical South China University of Technology SCUT
Publication of AU2019430859A1 publication Critical patent/AU2019430859A1/en
Application granted granted Critical
Publication of AU2019430859B2 publication Critical patent/AU2019430859B2/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Engineering & Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • General Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • Computing Systems (AREA)
  • Computational Linguistics (AREA)
  • Software Systems (AREA)
  • Mathematical Physics (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Molecular Biology (AREA)
  • General Health & Medical Sciences (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Image Analysis (AREA)
  • Image Processing (AREA)

Abstract

A generative adversarial mechanism and attention mechanism-based standard face generation method, comprising: a dataset design step, constructing, according to database-related annotation data, face code having a plurality of non-limiting factors for a face image, and taking the code and the face image as inputs of a model; a model design and training step, using a generative adversarial mechanism and an attention mechanism to design a corresponding network structure, and using the constructed data pair to perform model training, so as to obtain a network model weight; and a model prediction step, predicting the acquired face image by means of the model. The present invention applies deep learning network technology to standard face generation to generate a colour, front-facing, and standard face image under normal light illumination. The method using a deep learning network is capable of obtaining an accurate standard face photograph, reducing the difficulty of matching with data in a single-sample database, and laying a solid foundation for subsequent face feature extraction and single-sample facial recognition.

Description

A Standard Face Generation Method Based on a Generative Adversarial Mechanism and an Attention Mechanism
Technical Field

The invention relates to the technical field of deep learning applications, in particular to a standard face generation method based on a generative adversarial mechanism and an attention mechanism.
Technical Background

In recent years, video surveillance has been popularized in large and medium cities across the country, has been widely used in the construction of social security prevention and control systems, and has become a powerful technical means for public security agencies to investigate and solve cases. Especially in mass incidents, major cases and robberies, the evidential clues obtained from surveillance video play a key role in the rapid solving of cases. At present, domestic public security agencies mainly use surveillance video to find after-crime clues and evidence, locking in a suspect's identity by comparing the face information of key suspects with personal information in the public security bureau's database. However, the face information of a suspect in surveillance video is affected by many restrictive factors, such as expression interference, posture interference or shooting illumination interference. Since most of the face images in the public security bureau's database are single-sample ID photos, when face images affected by the above-mentioned restrictive factors are subjected to recognition processing, the success rate is greatly restricted, and missed detections and wrong detections easily occur.
In recent years, the field of artificial intelligence has been listed among the country's key development priorities. This indicates that the combination of artificial intelligence and related industries is an inevitable trend of the country's development towards intelligence, and it is of great significance in promoting industries towards intelligence and automation. The central task in the field of artificial intelligence is to design corresponding deep learning network models for different industry tasks. With the increase of computer computing power, the difficulty of network training has been greatly reduced, and the accuracy of network prediction has continuously improved. The basic characteristics of deep learning networks are strong model fitting ability, a large amount of information and high precision, which can meet the different needs of different industries. For the face recognition problem with multiple non-limiting factors, the key issue is how to generate a standard front-facing face image that meets the needs of subsequent face image feature extraction and recognition. It is therefore urgent to design a corresponding and reasonable deep learning network framework, use high-performance computer processing capabilities to train the network, generate more standard front-facing face images, improve the accuracy of face matching and reduce the occurrence of false detections during face recognition.
Summary of the Invention

The object of the present invention is to overcome the above-mentioned shortcomings in the prior art by providing a standard face generation method based on a generative adversarial mechanism and an attention mechanism, using a deep learning network framework to design the related models, thereby obtaining a more standard front-facing face image and laying a solid foundation for subsequent face feature extraction and single-sample face recognition.
The object of the present invention may be achieved by adopting the following technical solutions:
A standard face generation method based on a generative adversarial mechanism and an attention mechanism. The generation method comprises data set design steps, model design and training steps, and model prediction steps. The data set design steps are mainly based on the current mainstream RaFD data set and IAIR face data set: according to the relevant annotated data of the database, a face code with various non-limiting factors is constructed for each face image, comprising face expression factors, face posture factors and shooting illumination factors, and the code and face image are used as inputs to the model. The model design and training steps mainly use the related principles of the generative adversarial mechanism and the attention mechanism to design the corresponding network structure, and use the constructed data pairs for model training to obtain the network model weights. The model prediction step mainly predicts the result after model processing is performed on a face image acquired in reality.
Specifically, the operation steps are as follows:
S1. Data construction: collecting face data from the RaFD face data set and the IAIR face data set, constructing a face code with multiple non-limiting factors for each face image, then classifying the face data; wherein the non-limiting factors comprise face expression factors, face posture factors and shooting illumination factors; an encoded face image forms an information unit U = {L_n, E_e, A_m}, comprising an 8-bit illumination code L_n, an 8-bit expression code E_e and a 19-bit posture code A_m;
S2. Model establishment: a network model based on the generative adversarial mechanism and the attention mechanism is established; the network model comprises three sub-networks, namely an image generator sub-network for generating a standard face, a model discriminator sub-network for discriminating the generated results, and an image restoration sub-network for restoring the generated results; first, the image generator sub-network and the attention mechanism are used to generate a standard face from an input face image; then, the model discriminator sub-network is used to discriminate the generated image; finally, the image restoration sub-network is constructed to restore the generated image, and the restoration result is compared with the input image to optimize the constraints of the network model;
S3. Model training: using the image units generated in step S1, taking images with multiple non-limiting factors as inputs to optimize the outputs of the image generator sub-network, the model discriminator sub-network and the image restoration sub-network against the labelled similarities, so as to achieve convergence of the network model based on the generative adversarial mechanism and the attention mechanism;
S4. Model prediction: extracting a face from an actual image as the input of the model, and finally obtaining a more standard front face image output by controlling a unified information unit.
Further, in step S1, the face information in the face data set is correspondingly encoded and divided into two types: non-limiting factor face images and standard front natural face images.
The process of step S1 is as follows:
S11. Face information coding. For the different face data in the data set, a face code with multiple non-limiting factors is constructed for each face image, wherein the non-limiting factors comprise, but are not limited to, face expression factors, face posture factors and shooting illumination factors.
The specific rules for coding face images are as follows:
A) the face expression factors are divided into eight situations, namely happy, angry, sad, contemptuous, disappointed, scared, surprised and natural; a face expression is encoded as E_e = (E_1, E_2, ..., E_8), where E_j represents the j-th expression, j = 1, 2, ..., 8, each component takes a value in [0,1], and E_e = (0,0,...,1) means a natural expression;
B) the face illumination factors are divided into eight situations, based on front illumination, left illumination, right illumination and combinations of these three, namely front illumination, left illumination, right illumination, front-left illumination, front-right illumination, left-right illumination, no illumination, and full illumination; the illumination information of the face is encoded as L_n = (L_1, L_2, ..., L_8), where L_n represents the n-th illumination situation, n = 1, 2, ..., 8, each component takes a value in [0,1], and L_n = (0,0,...,1) represents front illumination image information;
C) the face posture factors are divided into 19 situations, comprising 9 poses of the left face at 10° intervals, 9 poses of the right face at 10° intervals, and the front face posture, that is left 90°, left 80°, left 70°, left 60°, left 50°, left 40°, left 30°, left 20°, left 10°, front face, right 10°, right 20°, right 30°, right 40°, right 50°, right 60°, right 70°, right 80°, right 90°; the posture information of the face is encoded as A_m = (A_1, A_2, ..., A_19), where A_m represents the m-th face pose, m = 1, 2, ..., 19, each component takes a value in [0,1], and A_m = (0,0,...,1) represents front posture information. Finally, the face information code is integrated into the unified information code U = {L_n, E_e, A_m}, which is a 35-bit one-dimensional code.
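As a concrete illustration of this coding scheme, the following is a minimal sketch; the patent fixes only the bit widths (8 + 8 + 19 = 35) and the one-hot convention, so the slot orderings (standard situation in the last slot, matching (0,0,...,1) above) and the helper names are assumptions.

```python
import numpy as np

# Slot orderings are assumptions; each list places the "standard" category
# last so that its one-hot code is (0, 0, ..., 1), as described above.
EXPRESSIONS = ["happy", "angry", "sad", "contemptuous",
               "disappointed", "scared", "surprised", "natural"]
ILLUMINATIONS = ["left", "right", "front-left", "front-right",
                 "left-right", "none", "full", "front"]
POSES = ["left %d" % d for d in range(90, 0, -10)] + \
        ["right %d" % d for d in range(10, 100, 10)] + \
        ["front"]                               # 9 + 9 + 1 = 19 poses

def one_hot(index, length):
    """Return a one-hot vector with a 1 at the given index."""
    v = np.zeros(length, dtype=np.float32)
    v[index] = 1.0
    return v

def unified_code(illumination, expression, pose):
    """Concatenate the 8-bit illumination code L_n, the 8-bit expression
    code E_e and the 19-bit posture code A_m into one 35-bit code U."""
    L = one_hot(ILLUMINATIONS.index(illumination), 8)
    E = one_hot(EXPRESSIONS.index(expression), 8)
    A = one_hot(POSES.index(pose), 19)
    return np.concatenate([L, E, A])            # shape (35,)

# Example: a face lit from the left, scared, turned 30° to the right.
u = unified_code("left", "scared", "right 30")
```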
S12. Classifying face data: classifying the encoded face data into non-limiting factor face images and standard front natural clear face images, specifically:
face images with the unified code information U_0 = (L_n = (0,0,...,1), E_e = (0,0,...,1), A_m = (0,0,...,1)) are taken as the standard frontal natural clear face images and used as target images of the model; the remaining face images are taken as the non-limiting factor face images and used as input images of the model.
Further, in step S2, assume that the input image is Y and its corresponding original unified information code is U_Y; the generated standard face image is I_o, which corresponds to the unified information code U_0; and the corresponding standard face image in the database is I, whose unified information code is likewise U_0.
In the image generator sub-network, the inputs are the image Y and the unified information code U_0. The invention designs two codec networks Gc and Gf, which generate a colour information mask C and an attention mask F in combination with the attention mechanism; the standard face is then generated through the following synthesis mechanism:

C = Gc(Y, U_0), F = Gf(Y, U_0)
I_o = (1 − F) ⊙ C + F ⊙ Y

wherein ⊙ represents element-wise multiplication of matrices.
Therefore, the codec network Gc mainly focuses on the colour information and texture information of the face, and the codec network Gf mainly focuses on the areas of the face that need to be changed;
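For illustration, a minimal TensorFlow sketch of this synthesis step; Gc and Gf are assumed to be two-input Keras models, and the tensor shapes are assumptions:

```python
import tensorflow as tf

# A sketch of the attention-based synthesis I_o = (1 - F) ⊙ C + F ⊙ Y,
# assuming Gc and Gf are Keras models taking (image, code) inputs.
def synthesize(Gc, Gf, Y, U0):
    C = Gc([Y, U0])              # colour information mask, (batch, H, W, 3)
    F = Gf([Y, U0])              # attention mask, (batch, H, W, 1), in [0, 1]
    I_o = (1.0 - F) * C + F * Y  # element-wise products, F broadcast over channels
    return I_o
```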
In the model discriminator sub-network, the input is the image I_o generated by the image generator sub-network. Similarly, the invention designs two deep convolution networks, an image discrimination sub-network D_I and an information code discrimination sub-network D_U, to respectively distinguish the difference between the generated standard face image I_o and the corresponding standard face image I in the database, and the difference between the unified information code corresponding to the generated standard face image I_o and the unified information code U_0 corresponding to the standard face image I in the database;
In the image restoration sub-network, the inputs are the generated standard face image I_o and the original unified information code U_Y corresponding to the input image Y. The restoration sub-network is consistent with the image generator sub-network, and its restoration result is Y'. By comparing the restoration result Y' with the input image Y of the overall network, loop optimization of the network result is achieved.
Further, the processing flow of the network model based on the generative adversarial mechanism and the attention mechanism is as follows:
First, the input image Y and the unified information code U_0 corresponding to the standard face image I are input into the image generator sub-network, which incorporates the attention mechanism, to generate the standard face image I_o;
Then, in order to distinguish between real images and generated images, the generated standard face image I_o and the corresponding standard face image I in the database (that is, the real image) are sent to the image discrimination sub-network D_I in the model discriminator sub-network for discrimination; at the same time, the unified information code corresponding to the generated standard face image I_o and the unified information code U_0 corresponding to the standard face image I in the database are sent to the information code discrimination sub-network D_U in the model discriminator sub-network for discrimination; through continuous loop optimization, the image generator sub-network and the model discriminator sub-network progress together;
Finally, in order to loop-optimize the network model, the present invention designs an image restoration sub-network: the generated standard face image I_o is restored according to the original unified information code U_Y corresponding to the original input image Y, and the restoration result is compared with the input image Y. The entire network realizes the convergence of the overall network model by continuously optimizing the corresponding loss function, finally realizing the removal of non-limiting environmental factors from the face image.
Further, in step S3, the model training achieves the convergence of the model by optimizing a loss function, wherein the design process of the loss function is specifically as follows:
1) optimizing the difference between the generated standard face image I_o and the corresponding standard face image I in the database by discrimination: an image loss function is set as

L_I = (1/(H×W))·‖D_I(I_o) − D_I(I)‖²

where H and W are the height and width of the output face image, and D_I(I_o) and D_I(I) are the evaluation results of the images I_o and I by the image discrimination sub-network; then, considering the effectiveness of a gradient loss, a gradient-based penalty is added to the image loss function, which may improve the efficiency of convergence and the quality of image generation; that is, the image loss function is designed as

L_I = (1/(H×W))·‖D_I(I_o) − D_I(I)‖² + λ_I·(1/(H×W))·‖∇D_I(I_o) − 1‖²

where ∇(·) represents a gradient operation and λ_I is the weight of the penalty;
2) optimizing the difference of the conditional unified information code: a conditional expression loss function is set, that is, distinguishing the difference between the generated standard face image I_o and the corresponding standard face image I in the database, each corresponding to the unified information code U_0; the conditional expression loss function is therefore first designed as

L_U = (1/N)·‖D_U(I_o) − U_0‖²

where N is the length of the output unified information code; then, a mapping relationship between the input image Y and its corresponding original unified information code U_Y is added to the conditional expression loss function, which may improve the discriminating ability of the discriminator; therefore, the conditional expression loss function is designed as

L_U = (1/N)·(‖D_U(I_o) − U_0‖² + ‖D_U(Y) − U_Y‖²)

where U_Y is the original unified information code corresponding to the input image Y, U_0 is the unified information code corresponding to the standard face image I, and D_U(I_o) and D_U(Y) are the discrimination results of the information code discrimination sub-network on the images I_o and Y respectively;
3) optimizing the difference between the result of the image restoration sub-network and the original input image: the image I_o produced by the generator is restored with the original unified information code U_Y and then compared with the original input image Y; therefore, the restoration loss function is designed as

L_r = (1/(h×w))·‖G(G(Y, U_0), U_Y) − Y‖₁

where h and w represent the height and width of the image and G represents the image generator sub-network.
Therefore, the loss function of the entire network model is: L = L_I + L_U + L_r.
By optimizing the loss function, the convergence of the network model is achieved, and a generator structure and weights for generating standard faces are obtained.
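As a concrete illustration, a hedged TensorFlow sketch of the three loss terms; G, D_I and D_U are assumed two-input Keras models, the reading of ∇D_I(I_o) as the gradient of the discriminator output with respect to the generated image follows the formula above, and λ_I = 10 is an assumed value, not one fixed by the invention:

```python
import tensorflow as tf

# A sketch of L = L_I + L_U + L_r as defined above, under the assumptions
# in the lead-in; images are (batch, H, W, 3), codes are (batch, 35).
def total_loss(G, D_I, D_U, Y, I, U0, UY, lambda_I=10.0):
    _, H, W, _ = I.shape
    N = float(U0.shape[-1])                 # length of the unified code (35)
    I_o = G([Y, U0])                        # generated standard face

    # Image loss with the gradient-based penalty on D_I at the generated image.
    with tf.GradientTape() as tape:
        tape.watch(I_o)
        d_fake = D_I(I_o)
    grads = tape.gradient(d_fake, I_o)
    L_image = (tf.reduce_sum(tf.square(d_fake - D_I(I))) +
               lambda_I * tf.reduce_sum(tf.square(grads - 1.0))) / (H * W)

    # Conditional code loss: D_U should recover U0 from I_o and UY from Y.
    L_code = (tf.reduce_sum(tf.square(D_U(I_o) - U0)) +
              tf.reduce_sum(tf.square(D_U(Y) - UY))) / N

    # Restoration (cycle) loss: regenerate Y from I_o using the original code UY.
    L_restore = tf.reduce_sum(tf.abs(G([I_o, UY]) - Y)) / (H * W)

    return L_image + L_code + L_restore
```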
Further, for the generation of the actual face image in step S4: first, a face positioning method based on the face HOG image is used to obtain the face image in the actual image; then, the generator trained by the model and a manually set unified information code are used to realize rapid standard face generation for the face in the actual image. In addition, it is foreseeable that by setting different unified information codes it is possible to change other structures of the face, for example controlling other expressions or further changing the face posture.
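For illustration, a sketch of the face-extraction step, assuming dlib's HOG-based frontal face detector stands in for the face-HOG positioning method; the 128×128 crop size is an assumption:

```python
import cv2
import dlib

# dlib's frontal face detector is a HOG + linear-SVM detector, used here
# as a stand-in for the face-HOG positioning method described above.
detector = dlib.get_frontal_face_detector()

def extract_faces(image_path, size=128):
    img = cv2.cvtColor(cv2.imread(image_path), cv2.COLOR_BGR2RGB)
    faces = []
    for rect in detector(img, 1):            # 1 = upsample the image once
        top, left = max(rect.top(), 0), max(rect.left(), 0)
        crop = img[top:rect.bottom(), left:rect.right()]
        faces.append(cv2.resize(crop, (size, size)))
    return faces
```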
Compared with the prior art, the present invention has the following advantages and effects:
The present invention applies deep learning network technology to the standard face generation task to generate colour, front-facing standard face images under normal illumination; using the deep learning network method, accurate standard front face photos may be obtained, the difficulty of matching with data in a single-sample database is reduced, and a solid foundation is laid for subsequent face feature extraction and single-sample face recognition.
Brief Description of the Figures

Figure 1 is a flowchart of a model training and a model application in an embodiment of the present invention;
Figure 2 is a flowchart of a data construction of a database in an embodiment of the present invention;
Figure 3 is an overall design diagram of a network model in an embodiment of the present invention;
Figure 4 is a specific structure diagram of an image generation network in an embodiment of the present invention;
Figure 5 is a specific structure diagram of an image discrimination network in an embodiment of the present invention.
Description

In order to better clarify the objectives, technical solutions, and advantages of the embodiments of the present invention, the technical solutions of the embodiments of the present invention will be described clearly and completely in conjunction with the accompanying figures of the embodiments of the present invention. Obviously, the described embodiments are part of the embodiments of the present invention, rather than all of the embodiments. Based on the embodiments of the present invention, all other embodiments obtained by those of ordinary skill in the art without creative work shall fall within the protection scope of the present invention.
Embodiments
This embodiment discloses a standard face generation method based on a generative adversarial mechanism and an attention mechanism, which mainly involves the following technologies: 1) design of training data: using existing data sets to design unified information codes; 2) design of the network model structure: taking the generative adversarial network framework and the loop optimization network method as the basic network structure; 3) standard face generation method: adding an attention mechanism to the generator to constrain the accuracy of standard face generation.
This embodiment is based on the TensorFlow framework and the PyCharm development environment. The TensorFlow framework is a development framework based on the Python language, which can build a reasonable deep learning network conveniently and quickly, and has good cross-platform interaction capabilities. TensorFlow provides interfaces for many packaged functions and various image processing functions in the deep learning architecture, including image processing functions related to OpenCV. The TensorFlow framework can also use the GPU to train and verify the model, which improves the efficiency of calculation.
The PyCharm development environment under the Windows platform or Linux platform serves as the integrated development environment (IDE), and is currently one of the first choices for deep learning network design and development. PyCharm provides users with project templates, design tools, testing and debugging tools, and an interface to directly call remote servers.
The present embodiment discloses a standard face generation method based on a generative adversarial mechanism and an attention mechanism. The main process comprises two stages, model training and model application.
In the model training stage: first, the existing face data sets are processed, and a data set that meets the model training requirements is generated by designing a unified information code mechanism; then, a cloud server with high computing power is used to train the network model, optimizing the loss function and adjusting the network model parameters until the network model converges, so as to obtain the generator structure and weights for generating standard faces.
In the model application stage: first, the HOG face image processing method is used to extract the face from the actual picture to obtain the actual face image; then, the trained network model is called, using a face image with non-limiting factors and the designed unified information code as inputs, to perform standard face generation, finally obtaining a colour, front-facing face image.
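Putting the stage together, a sketch assuming a trained two-input `generator` and the illustrative `extract_faces` / `unified_code` helpers sketched earlier; the preprocessing details are assumptions:

```python
import numpy as np

# Application-stage sketch: detect faces, then generate standard faces
# under the designed unified information code U_0.
U0 = unified_code("front", "natural", "front")      # designed standard code
for face in extract_faces("surveillance_frame.jpg"):
    x = face[np.newaxis].astype("float32") / 255.0  # (1, 128, 128, 3)
    standard_face = generator([x, U0[np.newaxis]])  # colour, front-facing output
```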
Figure 1 is a flowchart of a standard face generation method based on a generative adversarial mechanism and an attention mechanism disclosed in this embodiment. Specific steps are as follows:
Step 1. Since current face databases mainly focus on recognition tasks, there is no face image database with the unified information code required by the present invention. Therefore, it is necessary to integrate existing databases to construct a suitable database.
Figure 2 shows a construction process of a face image and a unified information code in the database.
Step 2. Figure 3 is an illustrative diagram of the overall architecture of the network model. The entire model framework mainly comprises three sub-networks, which correspond to an image generator sub-network for generating standard faces, a model discriminator sub-network for discriminating the generated results, and an image restoration sub-network for restoring the generated results, wherein parameters are shared between the image generator sub-network and the image restoration sub-network, and the image generator sub-network mainly combines the attention mechanism to generate face images. Figure 4 shows the specific network structure of the image generator sub-network, and Figure 5 shows the specific network structure of the model discriminator sub-network.
The main parameters are as follows:
1) The image generator sub-network has the same parameters as the image restoration sub-network, and each comprises two generators, namely the colour information generator and the attention mask generator, specifically as follows:
The colour information generator comprises 8 convolution layers and 7 deconvolution layers. The convolution kernel size of all convolution layers is 5 and the step size is 1, finally generating a 3-channel colour information image;
The attention mask generator comprises 8 convolution layers and 7 deconvolution layers. The convolution kernel size of all convolution layers is 5 and the step size is 1, finally generating a 1-channel attention mask.
2) The model discriminator sub-network comprises two parts, namely an information code discrimination sub-network and an image discrimination sub-network, specifically as follows: the information code discrimination sub-network comprises 6 convolution layers and 1 fully connected layer, the convolution kernel size of each convolution layer is 5 and the step size is 1, finally generating a one-dimensional unified information code of length N; the image discrimination sub-network comprises 6 convolution layers, with a convolution kernel size of 5 and a step size of 1.
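A hedged Keras sketch of the attention mask generator just described (8 convolution layers, 7 deconvolution layers, kernel size 5, stride 1, 1-channel output); the 64-channel width and the sigmoid output activation are assumptions, and the concatenation of the input image with the broadcast 35-bit code is assumed to happen before this stack:

```python
import tensorflow as tf
from tensorflow.keras import layers

def attention_mask_generator():
    """Sketch of Gf: 8 conv + 7 deconv layers, kernel 5, stride 1, ending
    in a 1-channel mask. The colour information generator Gc is analogous
    with a 3-channel output."""
    model = tf.keras.Sequential(name="Gf")
    for _ in range(8):                       # 8 convolution layers
        model.add(layers.Conv2D(64, kernel_size=5, strides=1,
                                padding="same", activation="relu"))
    for _ in range(6):                       # first 6 deconvolution layers
        model.add(layers.Conv2DTranspose(64, kernel_size=5, strides=1,
                                         padding="same", activation="relu"))
    model.add(layers.Conv2DTranspose(1, kernel_size=5, strides=1,
                                     padding="same", activation="sigmoid"))
    return model                             # 7th deconvolution: 1-channel mask
```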
Step 3. The model training is carried out on a high-performance GPU. The specific training parameters are designed as follows: the Adam optimizer can be used, with its parameters set to 0.9/0.999; the learning rate is set to 0.0001; the number of training epochs is set to 100; the batch size for training depends on the training samples of the data.
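The stated configuration in code; the batch size of 16 is an assumption, since the patent only says it depends on the training samples:

```python
import tensorflow as tf

# Adam with beta_1 = 0.9 and beta_2 = 0.999, learning rate 0.0001, 100 epochs.
optimizer = tf.keras.optimizers.Adam(learning_rate=1e-4,
                                     beta_1=0.9, beta_2=0.999)
EPOCHS = 100
BATCH_SIZE = 16   # assumed; choose according to the data set size
```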
Step 4. Model prediction: extracting the face in an actual image as the input of the model and, by controlling the unified information unit, finally obtaining a more standard front-facing face image output.
The above-mentioned embodiments are preferred embodiments of the present invention, but the embodiments of the present invention are not limited thereto. Any other changes, modifications, substitutions, combinations and simplifications made without departing from the spirit and principle of the present invention should be regarded as equivalent replacements, and are all included in the protection scope of the present invention.

Claims (8)

  1. A standard face generation method based on a generative adversarial mechanism and an attention mechanism, characterized in that the generation method comprises the following steps:
     S1. data construction: collecting face data, constructing a face code with multiple non-limiting factors for each face image, then classifying the face data; wherein the non-limiting factors comprise face expression factors, face posture factors and shooting illumination factors; an encoded face image forms an information unit U = {L_n, E_e, A_m}, comprising an 8-bit illumination code L_n, an 8-bit expression code E_e and a 19-bit posture code A_m;
     S2. establishing a network model based on the generative adversarial mechanism and the attention mechanism; the network model comprises three sub-networks, namely an image generator sub-network for generating a standard face, a model discriminator sub-network for discriminating the generated results, and an image restoration sub-network for restoring the generated results; first, using the image generator sub-network and the attention mechanism to generate a standard face from an input face image; then, using the model discriminator sub-network to discriminate the generated image; finally, constructing the image restoration sub-network, restoring the generated image, and comparing the restoration result with the input image to optimize the constraints of the network model;
     S3. model training: using the information unit U = {L_n, E_e, A_m} as an input to optimize the outputs of the image generator sub-network, the model discriminator sub-network and the image restoration sub-network against the labelled similarities, so as to achieve convergence of the network model based on the generative adversarial mechanism and the attention mechanism;
     S4. model prediction: extracting a face image from an actual image as an input of the network model, finally obtaining a standard front face image output by controlling the information unit U.
  2. The standard face generation method based on a generative adversarial mechanism and an attention mechanism according to claim 1, characterized in that the face expression factors are divided into eight situations, namely happy, angry, sad, contemptuous, disappointed, scared, surprised and natural; a face expression is encoded as E_e = (E_1, E_2, ..., E_8), where E_j represents the j-th expression, j = 1, 2, ..., 8, each component takes a value in [0,1], and E_e = (0,0,...,1) means a natural expression; the face illumination factors are divided into eight situations, namely front illumination, left illumination, right illumination, front-left illumination, front-right illumination, left-right illumination, no illumination, and full illumination; the illumination information of the face is encoded as L_n = (L_1, L_2, ..., L_8), where L_n represents the n-th illumination situation, each component takes a value in [0,1], and L_n = (0,0,...,1) represents full illumination image information; the face posture factors are divided into 19 situations, namely left 90°, left 80°, left 70°, left 60°, left 50°, left 40°, left 30°, left 20°, left 10°, front face, right 10°, right 20°, right 30°, right 40°, right 50°, right 60°, right 70°, right 80°, right 90°; the posture information of the face is encoded as A_m = (A_1, A_2, ..., A_19), where A_m represents the m-th face pose, m = 1, 2, ..., 19, each component takes a value in [0,1], and A_m = (0,0,...,1) represents front posture information.
  3. The standard face generation method based on a generative adversarial mechanism and an attention mechanism according to claim 2, characterized in that the process of classifying face data in step S1 is as follows: classifying the encoded face data into non-limiting factor face images and standard front natural clear face images, wherein face images with the unified code information U_0 = (L_n = (0,0,...,1), E_e = (0,0,...,1), A_m = (0,0,...,1)) are taken as the standard frontal natural clear face images and used as target images of the model, and the remaining face images are taken as the non-limiting factor face images and used as input images of the model.
  4. The standard face generation method based on a generative adversarial mechanism and an attention mechanism according to claim 1, characterized in that the inputs of the image generator sub-network are an image Y and a standard face unified information code U_0; the image generator sub-network comprises two codec networks Gc and Gf, wherein the codec network Gc focuses on face colour information and texture information, and the codec network Gf focuses on the areas of the face that need to be changed, generating a colour information mask C and an attention mask F in combination with the attention mechanism, and then generating the standard face through the following synthesis mechanism:
     C = Gc(Y, U_0), F = Gf(Y, U_0)
     I_o = (1 − F) ⊙ C + F ⊙ Y
     wherein ⊙ represents element-wise multiplication of matrices;
     in the model discriminator sub-network, the input is the image I_o generated by the image generator sub-network; the model discriminator sub-network comprises two deep convolution networks, an image discrimination sub-network D_I and an information code discrimination sub-network D_U, to respectively distinguish the difference between the generated standard face image I_o and the corresponding standard face image I in a database, and the difference between the unified information code corresponding to the generated standard face image I_o and the unified information code U_0 corresponding to the standard face image I in the database;
     the inputs of the image restoration sub-network are the original unified information code U_Y corresponding to the input image Y and the generated standard face image I_o, and the output of the image restoration sub-network is a restoration result Y'; by comparing the restoration result Y' with the input image Y of the overall network, loop optimization of the network result is realized.
  5. The standard face generation method based on a generative adversarial mechanism and an attention mechanism according to claim 4, characterized in that the process of step S2 is as follows: first, inputting the input image Y and the unified information code U_0 corresponding to the standard face image I into the image generator sub-network incorporating the attention mechanism to generate the standard face image I_o;
     then, sending the generated standard face image I_o and the corresponding standard face image I in the database to the deep convolution network D_I in the model discriminator sub-network for discrimination, and at the same time, sending the unified information code corresponding to the generated standard face image I_o and the unified information code U_0 corresponding to the standard face image I in the database to the deep convolution network D_U in the model discriminator sub-network for discrimination, so that the image generator sub-network and the model discriminator sub-network are optimized simultaneously;
     finally, inputting the generated standard face image I_o into the image restoration sub-network, restoring it based on the original unified information code U_Y corresponding to the original input image Y, comparing the restoration result Y' with the input image Y, and continuously optimizing the corresponding loss function to achieve the convergence of the network model based on the generative adversarial mechanism and the attention mechanism.
  6. The standard face generation method based on a generative adversarial mechanism and an attention mechanism according to claim 1, characterized in that, in step S3, the model training achieves the convergence of the model by optimizing a loss function, wherein the design process of the loss function is as follows:
     optimizing the difference between the generated standard face image I_o and the corresponding standard face image I in the database by discrimination: an image loss function is set as L_I = (1/(H×W))·‖D_I(I_o) − D_I(I)‖², where H and W are the height and width of the output face image, and D_I(I_o) and D_I(I) are the evaluation results of the images I_o and I by the image discrimination sub-network; then, considering the effectiveness of a gradient loss, a gradient-based penalty is added to the image loss function, that is, the image loss function is designed as L_I = (1/(H×W))·‖D_I(I_o) − D_I(I)‖² + λ_I·(1/(H×W))·‖∇D_I(I_o) − 1‖², where ∇(·) represents a gradient operation and λ_I is the weight of the penalty;
     optimizing the difference of the conditional unified information code: a conditional expression loss function is set, that is, distinguishing the difference between the generated standard face image I_o and the corresponding standard face image I in the database, each corresponding to the unified information code U_0; the conditional expression loss function is first designed as L_U = (1/N)·‖D_U(I_o) − U_0‖², where N is the length of the output unified information code; then, a mapping relationship between the input image Y and its corresponding original unified information code U_Y is added to the conditional expression loss function; therefore, the conditional expression loss function is designed as L_U = (1/N)·(‖D_U(I_o) − U_0‖² + ‖D_U(Y) − U_Y‖²), where U_Y is the original unified information code corresponding to the input image Y, U_0 is the unified information code corresponding to the standard face image I, and D_U(I_o) and D_U(Y) are the discrimination results of the information code discrimination sub-network on the images I_o and Y respectively;
     optimizing the difference between the result of the image restoration sub-network and the original input image: the image I_o produced by the generator is restored with the original unified information code U_Y and then compared with the original input image Y; therefore, the restoration loss function is designed as L_r = (1/(h×w))·‖G(G(Y, U_0), U_Y) − Y‖₁, where h and w represent the height and width of the image and G represents the image generator sub-network;
     the loss function of the entire network model is: L = L_I + L_U + L_r.
  7. The standard face generation method based on a generative adversarial mechanism and an attention mechanism according to claim 1, characterized in that the process of step S4 is as follows: first, using a face positioning method based on the face HOG image to obtain a face image in the actual image; then, using the generator trained by the network model and a manually set unified information code to realize rapid standard face generation for the face in the actual image.
  8. The standard face generation method based on a generative adversarial mechanism and an attention mechanism according to claim 1, characterized in that, in step S1, the face data are collected from a RaFD face data set and an IAIR face data set.
AU2019430859A 2019-02-19 2019-10-18 Generative adversarial mechanism and attention mechanism-based standard face generation method Active AU2019430859B2 (en)

Applications Claiming Priority (3)

Application Number Priority Date Filing Date Title
CN201910121233.X 2019-02-19
CN201910121233.XA CN109934116B (en) 2019-02-19 2019-02-19 Standard face generation method based on confrontation generation mechanism and attention generation mechanism
PCT/CN2019/112045 WO2020168731A1 (en) 2019-02-19 2019-10-18 Generative adversarial mechanism and attention mechanism-based standard face generation method

Publications (2)

Publication Number Publication Date
AU2019430859A1 true AU2019430859A1 (en) 2021-04-29
AU2019430859B2 AU2019430859B2 (en) 2022-12-08

Family

ID=66985683

Family Applications (1)

Application Number Title Priority Date Filing Date
AU2019430859A Active AU2019430859B2 (en) 2019-02-19 2019-10-18 Generative adversarial mechanism and attention mechanism-based standard face generation method

Country Status (3)

Country Link
CN (1) CN109934116B (en)
AU (1) AU2019430859B2 (en)
WO (1) WO2020168731A1 (en)

Families Citing this family (42)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109934116B (en) * 2019-02-19 2020-11-24 华南理工大学 Standard face generation method based on confrontation generation mechanism and attention generation mechanism
CN110633655A (en) * 2019-08-29 2019-12-31 河南中原大数据研究院有限公司 Attention-attack face recognition attack algorithm
CN110619315B (en) * 2019-09-24 2020-10-30 重庆紫光华山智安科技有限公司 Training method and device of face recognition model and electronic equipment
CN110796111B (en) * 2019-11-05 2020-11-10 腾讯科技(深圳)有限公司 Image processing method, device, equipment and storage medium
CN111144314B (en) * 2019-12-27 2020-09-18 北京中科研究院 Method for detecting tampered face video
CN111242078A (en) * 2020-01-20 2020-06-05 重庆邮电大学 Face-righting generation method based on self-attention mechanism
CN111325319B (en) * 2020-02-02 2023-11-28 腾讯云计算(北京)有限责任公司 Neural network model detection method, device, equipment and storage medium
CN111325809B (en) * 2020-02-07 2021-03-12 广东工业大学 Appearance image generation method based on double-impedance network
CN111275613A (en) * 2020-02-27 2020-06-12 辽宁工程技术大学 Editing method for generating confrontation network face attribute by introducing attention mechanism
CN111400531B (en) * 2020-03-13 2024-04-05 广州文远知行科技有限公司 Target labeling method, device, equipment and computer readable storage medium
CN112036281B (en) * 2020-07-29 2023-06-09 重庆工商大学 Facial expression recognition method based on improved capsule network
CN112199637B (en) * 2020-09-21 2024-04-12 浙江大学 Regression modeling method for generating contrast network data enhancement based on regression attention
CN112258402A (en) * 2020-09-30 2021-01-22 北京理工大学 Dense residual generation countermeasure network capable of rapidly removing rain
CN112200055B (en) * 2020-09-30 2024-04-30 深圳市信义科技有限公司 Pedestrian attribute identification method, system and device of combined countermeasure generation network
CN112508800A (en) * 2020-10-20 2021-03-16 杭州电子科技大学 Attention mechanism-based highlight removing method for surface of metal part with single gray image
CN112580011B (en) * 2020-12-25 2022-05-24 华南理工大学 Portrait encryption and decryption system facing biological feature privacy protection
CN112686817B (en) * 2020-12-25 2023-04-07 天津中科智能识别产业技术研究院有限公司 Image completion method based on uncertainty estimation
CN112802160B (en) * 2021-01-12 2023-10-17 西北大学 U-GAT-IT-based improved method for migrating cartoon style of Qin cavity character
CN112766160B (en) * 2021-01-20 2023-07-28 西安电子科技大学 Face replacement method based on multi-stage attribute encoder and attention mechanism
CN112800937B (en) * 2021-01-26 2023-09-05 华南理工大学 Intelligent face recognition method
CN112818850B (en) * 2021-02-01 2023-02-10 华南理工大学 Cross-posture face recognition method and system based on progressive neural network and attention mechanism
CN112950661B (en) * 2021-03-23 2023-07-25 大连民族大学 Attention-based generation method for generating network face cartoon
CN113688857A (en) * 2021-04-26 2021-11-23 贵州电网有限责任公司 Method for detecting foreign matters in power inspection image based on generation countermeasure network
CN113255738A (en) * 2021-05-06 2021-08-13 武汉象点科技有限公司 Abnormal image detection method based on self-attention generation countermeasure network
CN113255788B (en) * 2021-05-31 2023-04-07 西安电子科技大学 Method and system for generating confrontation network face correction based on two-stage mask guidance
CN113239867B (en) * 2021-05-31 2023-08-11 西安电子科技大学 Mask area self-adaptive enhancement-based illumination change face recognition method
CN113239870B (en) * 2021-05-31 2023-08-11 西安电子科技大学 Identity constraint-based face correction method and system for generating countermeasure network
CN113255530B (en) * 2021-05-31 2024-03-29 合肥工业大学 Attention-based multichannel data fusion network architecture and data processing method
CN113837953B (en) * 2021-06-11 2024-04-12 西安工业大学 Image restoration method based on generation countermeasure network
CN113361489B (en) * 2021-07-09 2022-09-16 重庆理工大学 Decoupling representation-based face orthogonalization model construction method and training method
CN113239914B (en) * 2021-07-13 2022-02-25 北京邮电大学 Classroom student expression recognition and classroom state evaluation method and device
CN113658040A (en) * 2021-07-14 2021-11-16 西安理工大学 Face super-resolution method based on prior information and attention fusion mechanism
CN113705400B (en) * 2021-08-18 2023-08-15 中山大学 Single-mode face living body detection method based on multi-mode face training
CN113743284A (en) * 2021-08-30 2021-12-03 杭州海康威视数字技术股份有限公司 Image recognition method, device, equipment, camera and access control equipment
CN114022930B (en) * 2021-10-28 2024-04-16 天津大学 Automatic generation method of portrait credentials
CN114399431A (en) * 2021-12-06 2022-04-26 北京理工大学 Dim light image enhancement method based on attention mechanism
CN114359034B (en) * 2021-12-24 2023-08-08 北京航空航天大学 Face picture generation method and system based on hand drawing
CN114331904B (en) * 2021-12-31 2023-08-08 电子科技大学 Face shielding recognition method
CN114663539B (en) * 2022-03-09 2023-03-14 东南大学 2D face restoration technology under mask based on audio drive
CN114943585B (en) * 2022-05-27 2023-05-05 天翼爱音乐文化科技有限公司 Service recommendation method and system based on generation of countermeasure network
CN115546848B (en) * 2022-10-26 2024-02-02 南京航空航天大学 Challenge generation network training method, cross-equipment palmprint recognition method and system
CN116486464B (en) * 2023-06-20 2023-09-01 齐鲁工业大学(山东省科学院) Attention mechanism-based face counterfeiting detection method for convolution countermeasure network

Family Cites Families (17)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP5371083B2 (en) * 2008-09-16 2013-12-18 Kddi株式会社 Face identification feature value registration apparatus, face identification feature value registration method, face identification feature value registration program, and recording medium
JP5310506B2 (en) * 2009-03-26 2013-10-09 ヤマハ株式会社 Audio mixer
CN101777116B (en) * 2009-12-23 2012-07-25 中国科学院自动化研究所 Method for analyzing facial expressions on basis of motion tracking
CN102496174A (en) * 2011-12-08 2012-06-13 中国科学院苏州纳米技术与纳米仿生研究所 Method for generating face sketch index for security monitoring
CN102938065B (en) * 2012-11-28 2017-10-20 北京旷视科技有限公司 Face feature extraction method and face identification method based on large-scale image data
CN103186774B (en) * 2013-03-21 2016-03-09 北京工业大学 A kind of multi-pose Face expression recognition method based on semi-supervised learning
CN104361328B (en) * 2014-11-21 2018-11-02 重庆中科云丛科技有限公司 A kind of facial image normalization method based on adaptive multiple row depth model
US10275113B2 (en) * 2014-12-19 2019-04-30 Hewlett-Packard Development Company, L.P. 3D visualization
GB201613138D0 (en) * 2016-07-29 2016-09-14 Unifai Holdings Ltd Computer vision systems
US10475174B2 (en) * 2017-04-06 2019-11-12 General Electric Company Visual anomaly detection system
CN107292813B (en) * 2017-05-17 2019-10-22 浙江大学 A kind of multi-pose Face generation method based on generation confrontation network
CN107506770A (en) * 2017-08-17 2017-12-22 湖州师范学院 Diabetic retinopathy eye-ground photography standard picture generation method
CN107909061B (en) * 2017-12-07 2021-03-30 电子科技大学 Head posture tracking device and method based on incomplete features
CN108510061B (en) * 2018-03-19 2022-03-29 华南理工大学 Method for synthesizing face by multiple monitoring videos based on condition generation countermeasure network
CN108564119B (en) * 2018-04-04 2020-06-05 华中科技大学 Pedestrian image generation method in any posture
CN108520503B (en) * 2018-04-13 2020-12-22 湘潭大学 Face defect image restoration method based on self-encoder and generation countermeasure network
CN109934116B (en) * 2019-02-19 2020-11-24 华南理工大学 Standard face generation method based on confrontation generation mechanism and attention generation mechanism

Also Published As

Publication number Publication date
CN109934116A (en) 2019-06-25
AU2019430859B2 (en) 2022-12-08
WO2020168731A1 (en) 2020-08-27
CN109934116B (en) 2020-11-24

Similar Documents

Publication Publication Date Title
AU2019430859A1 (en) Generative adversarial mechanism and attention mechanism-based standard face generation method
Wan et al. Region-aware reflection removal with unified content and gradient priors
WO2021129466A1 (en) Watermark detection method, device, terminal and storage medium
Yin et al. Yes," Attention Is All You Need", for Exemplar based Colorization
CN114550223B (en) Person interaction detection method and device and electronic equipment
Lee et al. Visual question answering over scene graph
Wu et al. Visual transformers: Where do transformers really belong in vision models?
JP2014164656A (en) Image processing method and program
CN114694185B (en) Cross-modal target re-identification method, device, equipment and medium
Yuan et al. Contextualized spatio-temporal contrastive learning with self-supervision
CN116049397A (en) Sensitive information discovery and automatic classification method based on multi-mode fusion
CN113657272B (en) Micro video classification method and system based on missing data completion
Liu et al. Loctex: Learning data-efficient visual representations from localized textual supervision
JP2022505320A (en) Search method, search device, storage medium
Zheng et al. La-net: Layout-aware dense network for monocular depth estimation
Mengiste et al. Transfer-Learning and Texture Features for Recognition of the Conditions of Construction Materials with Small Data Sets
Tan et al. Blind face restoration for under-display camera via dictionary guided transformer
CN110942463B (en) Video target segmentation method based on generation countermeasure network
Chen et al. Multi-stage degradation homogenization for super-resolution of face images with extreme degradations
Liu et al. BGRDNet: RGB-D salient object detection with a bidirectional gated recurrent decoding network
CN111638926A (en) Method for realizing artificial intelligence in Django framework
CN111144492B (en) Scene map generation method for mobile terminal virtual reality and augmented reality
Qi et al. An efficient deep learning hashing neural network for mobile visual search
Orhei Urban landmark detection using computer vision
Fu et al. Deep fusion feature presentations for nonaligned person re-identification

Legal Events

Date Code Title Description
FGA Letters patent sealed or granted (standard patent)