CN109934116B

CN109934116B - Standard face generation method based on confrontation generation mechanism and attention generation mechanism

Info

Publication number: CN109934116B
Application number: CN201910121233.XA
Authority: CN
Inventors: 谢巍; 余孝源; 潘春文
Original assignee: South China University of Technology SCUT
Current assignee: South China University of Technology SCUT
Priority date: 2019-02-19
Filing date: 2019-02-19
Publication date: 2020-11-24
Anticipated expiration: 2039-02-19
Also published as: CN109934116A; AU2019430859B2; WO2020168731A1; AU2019430859A1

Abstract

The invention discloses a standard face generation method based on a confrontation mechanism and an attention mechanism, which comprises the following steps: a data set design step, namely constructing a face code with various non-limiting factors for a face image according to related annotation data of a database, and taking the code and the face image as input of a model; designing and training a model, namely designing a corresponding network structure by using a generation countermeasure mechanism and an attention mechanism, and performing model training by using the constructed data pair to further obtain the weight of the network model; and model prediction, namely predicting the acquired face image through a model. The invention applies the deep learning network technology to the standard face generation for generating the colorful, positive and normal lighting standard face images; by using the deep learning network method, accurate standard face photos can be obtained, the difficulty in matching with data in a single sample database is reduced, and a solid foundation is laid for the subsequent feature extraction of the face and the single sample face recognition.

Description

Standard face generation method based on confrontation generation mechanism and attention generation mechanism

Technical Field

The invention relates to the technical field of deep learning application, in particular to a standard face generation method based on a confrontation generation mechanism and an attention generation mechanism.

Background

In recent years, video monitoring is popularized in large and medium-sized cities in China, is widely applied to the construction of a social security prevention and control system, and becomes a powerful technical means for detecting and solving the case by public security organs. Particularly, in group events, particularly in very big cases and two robbery cases, evidence clues acquired from video surveillance videos play a key role in rapidly detecting cases. At present, the domestic public security administration mainly uses a video surveillance video to search for a crime clue and crime evidence afterwards, and the identity of a suspect is locked by comparing the face information of the important suspect with the personnel information in the database of the public security administration. However, there are many limiting factors in the face information of the suspect in the surveillance video, such as facial expression information interference, posture interference or interference of shooting illumination. Because most of face information images of people in the public security bureau database only have single identification photo samples, when the face images interfered by the various restrictive factors are identified, the success rate is greatly restricted, and the situations of missing detection, error detection and the like are often caused.

In recent years, the field of artificial intelligence has been referred to as the scope of national emphasis. The combination of artificial intelligence and related industries is a necessary trend of developing towards intellectualization in China, and the method has important significance for promoting the development of industries towards intellectualization and automation. The most important thing in the artificial intelligence field is to design a corresponding deep learning network model aiming at different industry tasks. With the improvement of computer computing power, the difficulty of network training is greatly reduced, and the network prediction precision is continuously improved. The deep learning network has the basic characteristics of strong model fitting capability, large information amount and high precision, and can meet different requirements in different industries. For the face recognition problem with various non-limiting factors, the key problem is how to generate a standard front face image to meet the requirement of subsequent face image feature extraction and recognition. At present, a corresponding and reasonable deep learning network framework is urgently needed to be designed, a high-performance computer processing capacity is utilized to train a network, and then a standard front face image can be generated, the accuracy of face matching is improved, and the occurrence of false detection during face recognition is reduced.

Disclosure of Invention

The invention aims to solve the defects in the prior art, and provides a standard face generation method based on a confrontation generation mechanism and an attention generation mechanism.

The purpose of the invention can be achieved by adopting the following technical scheme:

a standard human face generation method based on a confrontation mechanism and an attention mechanism generation method comprises the following steps: designing a data set, designing and training a model and predicting the model; the data set design step is mainly that a face code with various non-limiting factors is constructed for each face image according to related labeling data of a database through a current mainstream RaFD data set and an IAIR face data set, wherein the face code comprises a face expression factor, a face posture factor, a shooting illumination factor and the like, and the code and the face image are used as the input of a model; the model design and training step mainly comprises the steps of designing a corresponding network structure by utilizing a correlation principle of a countermeasure generation mechanism and an attention generation mechanism, and performing model training by utilizing the constructed data pair to further obtain network model weight; the model prediction step is mainly to carry out model processing on a face image obtained in reality to obtain a predicted result.

Specifically, the operation steps are as follows:

s1, data construction, wherein face data in a RaFD face data set and an IAIR face data set are collected, face codes with various non-limiting factors are constructed for each face image, then the face data are classified, the non-limiting factors comprise face expression factors, face posture factors and shooting illumination factors, and the coded face image constitutes an information unit U ═ L_u,E_u,A_uComprises an 8-bit illumination code L_u8-bit expression code E_uAnd 19-bit attitude code A_u；

S2, establishing a network model based on a confrontation generating mechanism and an attention generating mechanism, wherein the network model comprises three sub-networks, namely an image generator sub-network for generating a standard human face, a model discriminator sub-network for discriminating a generated result and an image restoring sub-network for restoring through the generated result; firstly, standard face generation is carried out on an input human face image by utilizing an image generator sub-network and an attention mechanism; then, utilizing a model discriminator subnetwork to discriminate the generated image, finally constructing an image reduction subnetwork, reducing the generated image, comparing a reduction result with an input image, and carrying out optimization constraint on a network model;

s3, model training, namely optimizing the similarity between the output and the label of the image generator sub-network, the model discriminator sub-network and the image restoration sub-network by using the image unit generated in the step S1 and taking the image with various non-limiting factors as input, and realizing the convergence of the network model based on the generation countermeasure mechanism and the attention mechanism;

and S4, model prediction is carried out, the human face in the actual image is extracted and used as the input of the model, and the standard front face image output is finally obtained by controlling the unified information unit.

Further, in step S1, the face information in the face data set is correspondingly encoded and divided into two types, i.e., a non-limiting face image and a standard front natural face image;

the step S1 procedure is as follows;

and S11, encoding the face information. And constructing a face code with a plurality of non-limiting factors for each face image according to different face data in the data set, wherein the non-limiting factors comprise but are not limited to facial expression factors, facial posture factors, shooting illumination factors and the like.

The rule for coding the face image is as follows:

A) the facial expression factors are divided into eight conditions, namely distraction, anger, sadness, slight, disappointment, fear, surprise and nature, and the facial expression is coded into E_u＝(E_u1,E_u2,...,E_u8) In which E_ulRepresents the expression of the first expression, wherein l is 0,1,2, …,8, and the value is [0,1]，E_u1, expressed as a natural expression;

B) the face illumination factor is divided into eight conditions, mainly including front illumination, left illumination, right illumination and the combination of the three types of illumination, namely front illumination, left illumination, right illumination, left illumination, no illumination and full illumination, and the illumination information of the face is encoded into L_u＝(L_u1,L_u2,...,L_u8) Wherein L is_unRepresents the nth lighting situation, n is 0,1,2, …,8, and takes the value of [0,1]，L_uExpressed as front side illumination image information, (0, 0.., 1);

C) the face pose factors are divided into 19 cases including 9 poses of the left face at intervals of 10 °, 9 poses of the right face at intervals of 10 °, and face pose images, i.e., left 90 °, left 80 °, left 70 °, left 60 °, left 50 °, left 40 °, left 30 °, left 20 °, left 10 °, front face, right 10 °, right 20 °, right 30 °, right 40 °, right 50 °, right 60 °, right 70 °, right 80 °, right 90 °, and encoding the pose information of the face into a_u＝(A_u1,A_u2,...,A_um,...,A_u19) Wherein A is_umRepresents the mth human face pose, and m is 0,1,2, …,19, and takes the value of [0,1]，A_uAs the frontal posture information, (0, 0.., 1) is expressed. Finally, the face information code is integrated into a unified information code U ═ L_u,E_u,A_uWhich is a 35-bit one-dimensional information.

S12, classifying the face data, namely classifying the encoded face data into a non-limiting face image and a standard front natural clear face image, wherein the specific steps are as follows:

unify the coding information into U₀＝(L_u(0,0,...,1),E_u(0,0,...,1),A_u(0, 0., 1),) as a standard front natural clear face image, and as a target image of the model; the rest of the face image is used as the non-limiting face image and is used as the input image of the model.

Further, in step S2,

suppose the input image is Y and its corresponding original unified information is encoded as U_yThe generated standard face image is I_oStandard face image I_oCorresponding unified information coding

The corresponding standard face image in the database is I, and the unified information code corresponding to the standard face image I is U₀。

In the image generator subnetwork, the input content is the image Y and the uniform information code U₀. The invention designs two codec networks G_cAnd G_fRespectively generating a color information mask C and an attention mask F by combining an attention mechanism; then, a standard human face is generated through the following synthesis mechanism:

C＝G_c(Y,U₀)，F＝G_f(Y,U₀)

I_o＝(1-F)⊙C+F⊙Y

wherein |, indicates an operation of element-by-element multiplication of the matrix.

Thus, the codec network G_cOf primary interest are color information and texture information of human faces, codec network G_fThe main concern is the area of the face that needs to be changed;

in the model discriminator subnetwork, the content of its input is the image I generated by the image generator subnetwork_o. Similarly, the invention also designs two deep convolution networks, namely an image discrimination subnetwork D_ISum information coding discrimination sub-network D_URespectively used for judging the generated standard face image I_oThe difference with the corresponding standard face image I in the database, and the generated standard face image I_oCorresponding unified information coding

Unified information code U corresponding to standard face image I in database₀The difference between them;

in the image restoring sub-network, the input content is the generated standard face image I_oOriginal unified information encoding U corresponding to input image Y_y. The restore sub-network is identical to the image generator sub-network, the network restore result is

The aim of circularly optimizing the network result is achieved by comparing the reduction result with the input image Y of the whole network.

Further, the process flow of the network model based on the generation of the countermeasure mechanism and the attention mechanism is as follows:

firstly, unified information coding U corresponding to an input image Y and a standard face image I₀Input into a sub-network of image generators for generating a standard face image I_oWherein the image is generatedThe device sub-network integrates an attention mechanism;

then, in order to distinguish the real image from the generated image, the generated standard face image I_oSending the standard face image I (namely the real image I) corresponding to the database into the image discrimination sub-network D in the model discriminator sub-network_IThe standard face image I is generated at the same time of discrimination_oCorresponding unified information coding

Unified information coding U corresponding to standard face image I in database₀Information coding discrimination subnetwork D in model discriminator subnetwork_UJudging, and making the image generator sub-network and the model discriminator sub-network progress together through continuous circulation optimization;

finally, in order to realize the purpose of circularly optimizing the network model, the invention designs an image reduction sub-network, and generates a standard face image I_oFurther encoding U according to original unified information corresponding to original input image Y_yRestoration is performed and the restoration result is compared with the input image Y. The whole network realizes the convergence of the whole network model by continuously optimizing the corresponding loss function. And finally, removing the non-limiting environmental factors of the face image.

Further, the model training in step S3 realizes the convergence of the model by optimizing the loss function, wherein the loss function design process specifically includes:

1) standard face image I generated by optimization discrimination_oDifference between standard face image I corresponding to the database: the image loss function is set as follows

Where H and W are the height and width of the output face image, respectively, D_I(I_o) And D_I(I) Discriminating sub-network pairs of images I for images respectively_oThe judgment result of I; then, considering the effectiveness of the gradient loss, adding gradient-based to the image loss functionPenalty term capable of improving convergence efficiency and image generation quality, i.e. image loss function is designed as

Wherein

Representing gradient operations of the image, λ_IIs a penalty term weight;

2) optimizing the difference of conditional uniform information coding: setting conditional expression loss function, i.e. discriminating generated standard face image I_oUnified information coding corresponding to standard face image I in database

And U₀The difference between them, therefore, the conditional expression loss function is designed as follows:

wherein N is the output uniform information coding length. Then, the input image Y and the corresponding original uniform information code U are added in the conditional expression loss function_yThe conditional expression loss function is designed to be such that the discrimination ability of the discriminator is improved by the mapping relation between them

U_yEncoding the original unified information corresponding to the input image Y, U₀For uniform information coding, D, corresponding to the standard face image I_U(I_o) And D_U(Y) discrimination sub-network pair image I for information coding_oThe result of discrimination with Y;

3) optimizing the difference between the result of the image restoration subnetwork and the original input image: image I generated by an input generator_oAnd original unified information coding U_yAnd restoring and comparing with the original input image Y. The reduction loss function is thus designed as

Where h and w represent the height and width of the image.

Thus, the training loss function for the entire network is as follows:

L＝L_I+L_U+L_r

by optimizing the loss function, the convergence of the network model is realized, and the generator structure and the weight for generating the standard human face are obtained.

Further, for the generation of the actual face image in step S4, firstly, a face positioning method based on the face HOG image is used to obtain the face image in the actual image; and then, a generator for model training and artificially set unified information coding are utilized to realize the rapid standard face generation of the human face in the actual image. Furthermore, it is anticipated that other configurations of the face can be changed by setting different unified message codes, such as controlling other expressions, or further changes in the pose of the face, which should all be possible.

Compared with the prior art, the invention has the following advantages and effects:

the deep learning network technology is applied to a standard face generation task and is used for generating colorful, forward and normal-illumination standard face images; by using the deep learning network method, accurate standard face photos can be obtained, the difficulty in matching with data in a single sample database is reduced, and a solid foundation is laid for the subsequent feature extraction of the face and the single sample face recognition.

Drawings

FIG. 1 is a flow chart of model training and model application according to an embodiment of the present invention;

FIG. 2 is a flow chart of data construction of a database according to an embodiment of the present invention;

FIG. 3 is a diagram of the overall design of a network model in an embodiment of the invention;

FIG. 4 is a detailed block diagram of an image generation network in an embodiment of the present invention;

fig. 5 is a specific structural diagram of an image discrimination network in the embodiment of the present invention.

Detailed Description

In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, but not all, embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.

Examples

The embodiment discloses a standard face generation method based on a confrontation generation mechanism and an attention generation mechanism, which mainly relates to the following technologies: 1) design of training data: designing a unified information code by using the existing data set; 2) and (3) designing a network model structure: generating a countermeasure network framework and a cycle optimization network method as a basic network structure; 3) the standard face generation method comprises the following steps: and an attention mechanism is added into the generator to restrict the generation accuracy of the standard human face.

This example is based on the TensorFlow framework and Pycharm development environment: the TensorFlow framework is a development framework based on python language, can conveniently and quickly build a reasonable deep learning network, and has good cross-platform interaction capability. TensorFlow provides interfaces for a plurality of encapsulation functions and various image processing functions in the deep learning architecture, including OpenCV related image processing functions. The TensorFlow framework can use the GPU to train and verify the model at the same time, and calculation efficiency is improved.

The development environment (IDE) is a development environment of Pycharm under Windows platform or Linux platform, which is one of the first choices in deep learning network design and development. Pycharm provides new templates, design tools and testing and debugging tools for clients, and simultaneously can provide an interface for the clients to directly call a remote server.

The embodiment discloses a standard face generation method based on a confrontation generation mechanism and an attention generation mechanism.

In the model training phase: firstly, processing the existing face data set, and generating a data set which accords with model training by designing a unified information coding mechanism; and then, training the network model by using a cloud server with high computational power, and obtaining a generator structure and weight for generating the standard human face by optimizing the loss function and adjusting network model parameters until the network model is converged.

In the model application stage: firstly, extracting an actual picture by using an HOG (human eye group) face image processing method to obtain an actual face image; then, calling a trained network model, and taking a face image with non-limiting factors and a designed unified information code as input to generate a standard face; finally, a colorful and frontal face image is obtained.

Fig. 1 is a flowchart of a standard face generation method based on a confrontation mechanism and an attention mechanism disclosed in this embodiment. The method comprises the following specific steps:

step one, because the current face database mainly takes recognition tasks and does not meet the face image database with unified information coding required by the invention, the existing database needs to be integrated to construct a proper database.

Fig. 2 is a process of constructing a face image and unified information code in a database.

Step two, fig. 3 is an overall architecture diagram of the network model. The whole model framework mainly comprises three sub-networks, namely an image generator sub-network for generating a standard human face, a model discriminator sub-network for discriminating a generated result and an image restoring sub-network for restoring the generated result; the image generator sub-network and the image restoration sub-network share parameters, and the image generator sub-network mainly combines an attention mechanism to generate the face image. Fig. 4 is a specific network structure of the image generator sub-network, and fig. 5 is a specific network structure of the model discriminator sub-network.

The main parameters are as follows:

1) the image generator sub-network and the image restoration sub-network have the same parameters and respectively comprise two generators, namely a color information generator and an attention mask generator, and the parameters are as follows:

the color information generator comprises 8 convolutional layers and 7 deconvolution layers, the size of a convolutional kernel of each convolutional layer is 5, the step length is 1, and finally a color information image of 3 channels is generated;

the attention mask generator contains 8 convolutional layers and 7 deconvolution layers, the convolutional kernel size of all convolutional layers is 5 steps 1, and finally 1-channel attention masks are generated.

2) The model discriminator subnetwork comprises two parts, namely an information coding discrimination subnetwork and an image discrimination subnetwork, and the specific steps are as follows: the information coding discrimination sub-network comprises 6 convolutional layers and 1 full-connection layer, the size of a convolutional kernel of each convolutional layer is 5, the step length is 1, and finally one-dimensional unified information codes with the length of N are generated; the image discrimination subnetwork contains 6 convolution layers, the convolution kernel size is 5, and the step size is 1.

Step three, training the model on a high-performance GPU, wherein specific training parameters are designed as follows: an Adam optimizer with parameters set to 0.9/0.999 can be used; the learning rate was set to 0.0001; the epoch for training is set to 100; the training batch setting depends on the training sample of data.

And step four, model prediction, namely extracting the human face in the actual image to be used as the input of the model, and finally obtaining a relatively standard front face image to be output by controlling the unified information unit.

The above embodiments are preferred embodiments of the present invention, but the present invention is not limited to the above embodiments, and any other changes, modifications, substitutions, combinations, and simplifications which do not depart from the spirit and principle of the present invention should be construed as equivalents thereof, and all such changes, modifications, substitutions, combinations, and simplifications are intended to be included in the scope of the present invention.

Claims

1. A standard human face generation method based on a confrontation mechanism and an attention mechanism is characterized by comprising the following steps:

s1, data structureThe method comprises the steps of establishing and collecting face data, establishing face codes with various non-limiting factors for each face image, and then classifying the face data, wherein the non-limiting factors comprise face expression factors, face posture factors and shooting illumination factors, and the coded face images form an information unit U ═ { L ═ L { (L) }_u,E_u,A_uComprises an 8-bit illumination code L_u8-bit expression code E_uAnd 19-bit attitude code A_u；

s3, model training, using information unit U ═ L_u,E_u,A_uOptimizing the output and label similarity of the image generator sub-network, the model discriminator sub-network and the image restoration sub-network by taking the images as input, and realizing the convergence of the network model based on a generation countermeasure mechanism and an attention mechanism;

s4, model prediction is carried out, a face image in an actual image is extracted to be used as the input of a network model, and finally a standard front face image is obtained through a control information unit U to be output;

the facial expression factors are divided into eight conditions, namely distraction, anger, sadness, slight, disappointment, fear, surprise and nature, and the facial expression is coded into E_u＝(E_u1,E_u2,...,E_u8) In which E_ulRepresents the expression of the first expression, wherein l is 0,1,2, …,8, and the value is [0,1]，E_u1, expressed as a natural expression;

the human face illumination factors are divided into eight conditions, namely front illumination, left side illumination, right left illumination, right side illumination, left and right illumination, no illumination and full illumination, and the illumination information of the human face is encoded into L_u＝(L_u1,L_u2,...,L_u8) Wherein L is_unRepresents the nth lighting situation and takes the value of [0, 1%]，L_uExpressed as full-illumination image information, (0, 0.., 1);

the human face posture factor is divided into 19 conditions, namely, 90 degrees on the left side, 80 degrees on the left side, 70 degrees on the left side, 60 degrees on the left side, 50 degrees on the left side, 40 degrees on the left side, 30 degrees on the left side, 20 degrees on the left side, 10 degrees on the right side, 20 degrees on the right side, 30 degrees on the right side, 40 degrees on the right side, 50 degrees on the right side, 60 degrees on the right side, 70 degrees on the right side, 80 degrees on_u＝(A_u1,A_u2,...,A_um,...,A_u19) Wherein A is_umRepresents the mth human face pose, and m is 0,1,2, …,19, and takes the value of [0,1]，A_uAs the frontal posture information, (0, 0.., 1) is expressed.

2. The method for generating a standard human face based on a confrontation mechanism and an attention mechanism according to claim 1, wherein the classifying process of the human face data in step S1 is as follows: the encoded face data is classified into a non-limiting factor face image and a standard front natural sharp face image, wherein,

unify the coding information into U₀＝(L_u(0,0,...,1),E_u(0,0,...,1),A_u(0, 0., 1),) as a standard front natural sharp face image, and using this as a target image of the model, and the remaining face images as non-limiting face images, and using this as input images of the model.

3. The standard face generation method based on the generation of confrontation mechanism and attention mechanism as claimed in claim 1,

the image generator sub-network having as its inputImage Y and standard face unified information coding U₀The image generator subnetwork comprises two codec networks G_cAnd G_fWherein the codec network G_cCodec network G focusing on color information and texture information of human face_fPaying attention to the area needing to be changed in the human face, respectively generating a color information mask C and an attention mask F by combining an attention mechanism, and then generating a standard human face by the following synthesis mechanism:

C＝G_c(Y,U₀)，F＝G_f(Y,U₀)

I_o＝(1-F)⊙C+F⊙Y

wherein |, indicates an element-by-element multiplication operation of the matrix;

the model discriminator subnetwork has the input of the image I generated by the image generator subnetwork_oThe model discriminator subnetwork comprises two deep convolution network image discrimination subnetworks D_ISum information coding discrimination sub-network D_URespectively used for judging the generated standard face image I_oThe difference with the corresponding standard face image I in the database, and the generated standard face image I_oCorresponding unified information coding

the image restoring sub-network inputs the generated standard face image I_oOriginal unified information encoding U corresponding to input image Y_yThe output is the network restoration result

By subjecting the reduction result to

And comparing the image with the input image Y of the whole network to realize the result of the circular optimization network.

4. The method for generating a standard human face based on a confrontation mechanism and an attention mechanism as claimed in claim 3, wherein said step S2 is as follows:

firstly, unified information coding U corresponding to an input image Y and a standard face image I₀Inputting the image into an image generator sub-network of a fusion attention mechanism for generating a standard face image I_o；

Then, the generated standard face image I_oSending the standard face image I corresponding to the database into a deep convolution network D in a model discriminator subnetwork_IThe standard face image I is generated at the same time of discrimination_oCorresponding unified information coding

Unified information coding U corresponding to standard face image I in database₀Deep convolutional network D in model arbiter subnetwork_UCarrying out discrimination to simultaneously optimize the image generator sub-network and the model discriminator sub-network;

finally, the generated standard face image I_oInputting the image into an image recovery subnetwork, and coding U according to the original uniform information corresponding to the original input image Y_yCarrying out reduction and obtaining the reduction result

And comparing the image with the input image Y, and realizing the convergence of the network model based on the generation countermeasure mechanism and the attention mechanism by continuously optimizing the corresponding loss function.

5. The method as claimed in claim 1, wherein the model training in step S3 optimizes the loss function to achieve convergence of the model, wherein the loss function design process is as follows:

standard face image I generated by optimization discrimination_oDifference between standard face image I corresponding to the database: the image loss function is set as follows

Where H and W are the height and width of the output face image, respectively, D_I(I_o) And D_I(I) Discriminating sub-network pairs of images I for images respectively_oThe judgment result of I; then, considering the effectiveness of gradient loss, a penalty term based on gradient is added to the image loss function, namely the image loss function is designed to be

Wherein

Representing gradient operations of the image, λ_IIs a penalty term weight;

optimizing the difference of conditional uniform information coding: setting conditional expression loss function, i.e. discriminating generated standard face image I_oUnified information coding corresponding to standard face image I in database

And U₀The difference between them, the conditional expression loss function is designed as follows:

wherein N is the length of the outputted unified information code, and then, in the conditional expression loss function, the input image Y and the corresponding original unified information code U are added_yAnd therefore, the conditional expression loss function is designed as follows:

wherein U is_yCompiling original unified information corresponding to input image YCode, U₀For uniform information coding, D, corresponding to the standard face image I_U(I_o) And D_U(Y) discrimination sub-network pair image I for information coding_oThe result of discrimination with Y;

optimizing the difference between the result of the image restoration subnetwork and the original input image: image I generated by an input generator_oAnd original unified information coding U_yThe restoration is performed and compared with the original input image Y, and therefore, the restoration loss function is designed to

Wherein h and w represent the height and width of the image, and G represents the image generator subnetwork;

the loss function of the entire network model is as follows:

L＝L_I+L_U+L_r。

6. the method for generating a standard human face based on a confrontation mechanism and an attention mechanism as claimed in claim 1, wherein the procedure of step S4 is as follows:

firstly, acquiring a face image in an actual image by using a face positioning method based on a face HOG image;

and then, a generator for network model training and artificially set unified information coding are utilized to realize the rapid standard face generation of the human face in the actual image.

7. The standard face generation method based on the confrontation mechanism and attention mechanism generation of claim 1, wherein in step S1, face data in the RaFD face data set and the IAIR face data set are collected.