CN111523406A - Deflected face correction method based on an improved generative adversarial network structure - Google Patents

Deflected face correction method based on an improved generative adversarial network structure

Info

Publication number
CN111523406A
CN111523406A (application CN202010269281.6A)
Authority
CN
China
Prior art keywords
face
pred
picture
classifier
loss
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202010269281.6A
Other languages
Chinese (zh)
Other versions
CN111523406B (en)
Inventor
达飞鹏
胡惠雅
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Southeast University
Original Assignee
Southeast University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Southeast University
Priority to CN202010269281.6A priority Critical patent/CN111523406B/en
Publication of CN111523406A publication Critical patent/CN111523406A/en
Application granted granted Critical
Publication of CN111523406B publication Critical patent/CN111523406B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16Human faces, e.g. facial parts, sketches or expressions
    • G06V40/161Detection; Localisation; Normalisation
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16Human faces, e.g. facial parts, sketches or expressions
    • G06V40/168Feature extraction; Face representation
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16Human faces, e.g. facial parts, sketches or expressions
    • G06V40/172Classification, e.g. identification
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02TCLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T90/00Enabling technologies or technologies with a potential or indirect contribution to GHG emissions mitigation

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Physics & Mathematics (AREA)
  • Oral & Maxillofacial Surgery (AREA)
  • General Health & Medical Sciences (AREA)
  • General Physics & Mathematics (AREA)
  • Human Computer Interaction (AREA)
  • Multimedia (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a deflected face correction method based on an improved generative adversarial network structure. The processing steps are: (1) detect the facial feature points and extract fixed-size face region blocks (eyes, nose and mouth); (2) input the face region blocks and the whole face from step 1 into a local channel and a whole channel respectively, obtaining the corrected results of both channels; (3) fuse the outputs of the local channel and the whole channel, setting each pixel in an overlapping region to the maximum value over that region, to obtain the final generated face; (4) input the generated face and the frontal face into a discriminator and a classifier to ensure the accuracy and identity consistency of the generated face; (5) save the trained network model for testing. The BEGAN network adopted by the invention has a simple and efficient structure, and improves the accuracy and speed of deflected face correction to a certain extent.

Description

Deflected face correction method based on an improved generative adversarial network structure
Technical Field
The invention relates to a deflected face correction method based on an improved generative adversarial network structure, and belongs to the field of computer vision.
Background
With the continuous development of deep learning, research on face recognition has made many breakthroughs. Recognition algorithms based on deep learning even exceed human-level performance; however, most of this research assumes a frontal or near-frontal face and is therefore limited. There is evidence that even the best-performing frontal face recognition methods suffer a greatly reduced recognition rate under large angular deflection. Existing methods for face recognition under pose variation can be roughly divided into three categories: pose-robust feature extraction methods, frontal face generation methods, and subspace analysis methods.
For the first category, traditional feature extraction mainly relies on robust local descriptors such as Gabor, SIFT and LBP features, while more recent methods extract features with deep learning, for example with LightCNN or FaceNet structures; neither kind of feature extraction, however, handles large pose deflection effectively. For the third category, a linear subspace can hardly express the nonlinearity of the face pose variation process, and learning a nonlinear subspace is usually accompanied by complex training. The invention therefore focuses mainly on the second category, the frontal face generation approach. Early work generated faces by building a three-dimensional face model, which places high demands on the feature points; in particular, when the deflection angle is somewhat large, some feature points in the two-dimensional face picture are invisible, so this approach has certain limitations.
Compared with generating the face from a three-dimensional face model, performing face conversion with a generative adversarial network has become a major trend and has achieved exciting performance. Current methods that use a generative adversarial network for face frontalization can be divided into two-dimensional and three-dimensional methods. The two-dimensional methods include TP-GAN and PIM; the three-dimensional methods mainly apply a 3D morphable face model (3DMM) to the generative adversarial network, obtaining shape and texture parameters from the model to provide a prior for accurately recovering the face structure. Because the face is a complex three-dimensional structure and a two-dimensional method lacks such constraints, two-dimensional methods usually adopt a two-path network that generates the face contour and the face details separately, and establish a series of supervisory loss functions to constrain the face structure, such as a constraint that keeps the generated face symmetric.
In addition, whether the method is two-dimensional or three-dimensional, a feature extraction module is often added to maintain identity consistency before and after correction; among these, the face feature extraction structure Light-CNN performs well in both time and space complexity and is therefore widely used.
Disclosure of Invention
Purpose of the invention: in TP-GAN, a two-dimensional face correction method, the DCGAN structure adopted as the generative adversarial network is difficult to train and prone to mode collapse, and collecting multi-scale images of the face makes the training process relatively cumbersome. To address these problems, the invention provides a deflected face correction method based on an improved generative adversarial network structure.
On the basis of the traditional two-path, two-player generative adversarial network structure, the method introduces a third adversarial structure, namely a classifier, to maintain identity consistency of the face before and after generation. Practice shows that the method better preserves identity consistency before and after face correction and yields a higher-quality generated frontal face, while also greatly reducing the difficulty of network training and improving training efficiency.
The technical scheme is as follows: in order to realize the purpose of the invention, the technical scheme adopted by the invention is: a method for correcting a deflected face based on an improved generative adversarial network structure, comprising the following steps:
Step 1, for the input deflected face I_in, detect the facial feature points and extract fixed-size face region blocks, namely the eyes, nose and mouth;
Step 2, input the face region blocks and the whole face obtained in step 1 into a local channel and a whole channel respectively, obtaining the corrected results of the local channel and the whole channel;
Step 3, fuse the corrected face regions output by the local channel with the whole face contour output by the whole channel, setting each pixel in an overlapping region to the maximum value over that region, to obtain the final generated face I_pred;
Step 4, input the generated face I_pred and the corresponding frontal face I_gt into the discriminator in sequence to ensure the accuracy of face generation, where I_gt is the frontal face corresponding to the input deflected face;
Step 5, input the generated face I_pred and the corresponding frontal face I_gt into the classifier together to ensure identity consistency between the generated face and the input deflected face;
Step 6, save the generative adversarial network model obtained by training and use it to correct deflected faces during testing.
As a further technical solution of the present invention, step 1 specifically comprises:
Step 1.1, uniformly normalize the face size to 128 × 128 and build a caffe deep learning environment;
Step 1.2, in the caffe deep learning environment, detect the feature points of the face and locate five feature points: the center points of the two eyes, the nose tip, and the two mouth-corner points;
Step 1.3, obtain each region block in the face, namely the mouth, the nose and the two eyes, from the five feature points obtained in step 1.2; to ensure that the network can be trained smoothly, the region block sizes are kept constant across different faces, with the eye regions set to 40 × 40 and the nose and mouth regions set to 32 × 40 and 32 × 48, respectively.
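As an illustration of step 1.3, the following Python sketch crops fixed-size region blocks around the detected landmarks. It is not the patent's caffe implementation; the function and dictionary names are hypothetical, and landmark coordinates are assumed to be (x, y) pixel positions on the 128 × 128 face.

```python
import numpy as np

# Illustrative sketch: crop fixed-size region blocks around the five landmarks.
# Sizes follow the text: eyes 40x40, nose 32x40, mouth 32x48 (width x height).
PATCH_SIZES = {"eyel": (40, 40), "eyer": (40, 40),
               "nose": (32, 40), "mouth": (32, 48)}  # (width, height)

def crop_patch(img, center_xy, size_wh):
    """Crop a fixed-size patch centered on a landmark from a 128x128 face."""
    w, h = size_wh
    cx, cy = int(round(center_xy[0])), int(round(center_xy[1]))
    # Clamp the top-left corner so the patch stays inside the image.
    x0 = min(max(cx - w // 2, 0), img.shape[1] - w)
    y0 = min(max(cy - h // 2, 0), img.shape[0] - h)
    return img[y0:y0 + h, x0:x0 + w]

def extract_region_blocks(img, landmarks):
    """landmarks: dict with 'eyel', 'eyer', 'nose', 'mouth_l', 'mouth_r' -> (x, y)."""
    mouth_center = ((landmarks["mouth_l"][0] + landmarks["mouth_r"][0]) / 2,
                    (landmarks["mouth_l"][1] + landmarks["mouth_r"][1]) / 2)
    centers = {"eyel": landmarks["eyel"], "eyer": landmarks["eyer"],
               "nose": landmarks["nose"], "mouth": mouth_center}
    return {k: crop_patch(img, centers[k], PATCH_SIZES[k]) for k in centers}
```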
As a further technical solution of the present invention, step 2 specifically is:
Step 2.1, extract the eye, nose and mouth region blocks from the frontal face I_gt in the same way, obtaining Eyel_gt, Eyer_gt, Nose_gt and Mouth_gt respectively;
Step 2.2: for both the local channel and the whole channel, design the generator as a U-Net structure so that the picture size is kept unchanged before and after generation;
Step 2.3: for the local channel, input the face region blocks obtained in step 1 into the local channel generator to obtain the corrected face region blocks, namely Eyel_pred, Eyer_pred, Nose_pred and Mouth_pred; the loss function of the local channel is designed as an L1 loss and consists of the following parts:
eyel_loss = (1/(W_eyel·H_eyel)) Σ_{x=1}^{W_eyel} Σ_{y=1}^{H_eyel} |Eyel_pred(x,y) − Eyel_gt(x,y)|
eyer_loss = (1/(W_eyer·H_eyer)) Σ_{x=1}^{W_eyer} Σ_{y=1}^{H_eyer} |Eyer_pred(x,y) − Eyer_gt(x,y)|
nose_loss = (1/(W_nose·H_nose)) Σ_{x=1}^{W_nose} Σ_{y=1}^{H_nose} |Nose_pred(x,y) − Nose_gt(x,y)|
mouth_loss = (1/(W_mou·H_mou)) Σ_{x=1}^{W_mou} Σ_{y=1}^{H_mou} |Mouth_pred(x,y) − Mouth_gt(x,y)|
wherein W_eyel, W_eyer, W_nose and W_mou are the widths of the left-eye, right-eye, nose and mouth regions, equal to 40, 40, 32 and 32 respectively, and H_eyel, H_eyer, H_nose and H_mou are the corresponding heights, equal to 40, 40, 40 and 48 respectively; Eyel_pred(x,y), Eyer_pred(x,y), Nose_pred(x,y) and Mouth_pred(x,y) are the gray values of the corrected left eye, right eye, nose and mouth at coordinates (x, y), and Eyel_gt(x,y), Eyer_gt(x,y), Nose_gt(x,y) and Mouth_gt(x,y) are the gray values of the corresponding region blocks of the frontal face at coordinates (x, y).
The loss function of the local channel is the sum of the losses of the region blocks; setting λ2 as the weight of the local channel loss, the loss function local_loss of the local channel can be expressed as:
local_loss = λ2·(eyel_loss + eyer_loss + nose_loss + mouth_loss)
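A minimal Python sketch of this local-channel loss is given below; the default λ2 value and the region keys are illustrative placeholders, since the text does not fix them here.

```python
import numpy as np

def region_l1_loss(pred, gt):
    """Mean absolute gray-value difference over one region block."""
    return np.mean(np.abs(pred.astype(np.float64) - gt.astype(np.float64)))

def local_channel_loss(pred_blocks, gt_blocks, lambda2=1.0):
    """local_loss = lambda2 * (eyel_loss + eyer_loss + nose_loss + mouth_loss)."""
    return lambda2 * sum(region_l1_loss(pred_blocks[k], gt_blocks[k])
                         for k in ("eyel", "eyer", "nose", "mouth"))
```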
Step 2.4: for the whole channel, likewise, input the deflected face I_in into the encoder structure with 3 × 3 convolution kernels to obtain a 128 × 128 frontal face contour of the same size as the original face.
As a further technical solution of the present invention, step 3 specifically is:
Step 3.1: based on the corrected results of the region blocks obtained in step 2, zero-pad each region block to a picture of the same size as the original face, i.e. one face-sized picture each for the left eye, right eye, nose and mouth;
Step 3.2: obtain the face picture corresponding to the local channel from the four equally sized face pictures obtained in step 3.1, as shown in the sketch below. The specific method is: compare the gray values of the pixels with the same coordinates in the four face pictures, select the maximum value and take it as the gray value at that coordinate; repeating this pixel by pixel finally yields the face picture corresponding to the local channel;
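The pixel-wise maximum fusion of step 3.2, and the later maximum fusion with the whole-channel output, can be sketched as follows; this is an illustrative NumPy version, not code from the patent.

```python
import numpy as np

def fuse_local_channel(eyel_img, eyer_img, nose_img, mouth_img):
    """Pixel-wise maximum over the four zero-padded 128x128 region pictures."""
    return np.maximum.reduce([eyel_img, eyer_img, nose_img, mouth_img])

def fuse_with_global(local_img, global_img):
    """Overlay local details on the whole-face contour: overlapping pixels
    take the maximum value, as described for the fusion step."""
    return np.maximum(local_img, global_img)
```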
Step 3.3: fuse the local-detail face picture obtained in the previous step with the frontal face contour obtained from the whole channel in step 2.4, and obtain the final generated picture I_pred through several applied loss functions.
The loss function applied in step 3.3 is specifically:
1) Pixel-wise L1 loss:
L_pixel = (1/(W·H)) Σ_{x=1}^{W} Σ_{y=1}^{H} |I_pred(x,y) − I_gt(x,y)|
wherein W and H are the width and height of the face picture, both equal to 128, and I_pred(x,y) and I_gt(x,y) denote the gray values of the generated face picture I_pred and of the corresponding frontal face at coordinates (x, y);
2) Generative adversarial loss:
L_G = λ0·L_G-D + λ1·L_G-C
Since in the invention the generator is trained adversarially against both the discriminator and the classifier, L_G-D and L_G-C in the above formula are the loss functions established when the generator competes with the discriminator and with the classifier respectively; the adversarial loss of the generator is designed as the weighted sum of these two parts, addressing the quality of the generated picture and the preservation of identity consistency, with weights λ0 and λ1 respectively. The two parts are given by the following formulas, wherein I_pred' denotes the output of the discriminator D for the generated face picture I_pred, W and H are the width and height of the face picture (both 128), I_pred(x,y) and I_pred'(x,y) are the gray values of I_pred and I_pred' at coordinates (x, y), and C(·) denotes the features extracted by the classifier:
L_G-D = (1/(W·H)) Σ_{x=1}^{W} Σ_{y=1}^{H} |I_pred(x,y) − I_pred'(x,y)|
L_G-C = 1 − (C(I_pred)·C(I_gt)) / (‖C(I_pred)‖·‖C(I_gt)‖)
3) face symmetry loss:
L_sym = (1/(W·H)) Σ_{x=1}^{W} Σ_{y=1}^{H} |I_pred(x,y) − I_pred(W−(x−1), y)|
wherein W and H are the width and height of the face picture, both equal to 128, and I_pred(x,y) and I_pred(W−(x−1), y) denote the gray values of the generated face picture I_pred at coordinates (x, y) and at the horizontally mirrored coordinates (W−(x−1), y);
4) A regularization loss, which is applied to suppress the noise present in the generated face picture.
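The pixel-level terms above can be sketched as follows; the combination weights are illustrative placeholders (the text only names λ0, λ1 and λ2 for specific terms), and the adversarial and regularization terms are passed in from outside.

```python
import numpy as np

def pixel_l1_loss(i_pred, i_gt):
    """Pixel-wise L1 loss between generated face and frontal face (W = H = 128)."""
    return np.mean(np.abs(i_pred - i_gt))

def symmetry_loss(i_pred):
    """L1 difference between the generated face and its horizontal mirror."""
    return np.mean(np.abs(i_pred - i_pred[:, ::-1]))

def total_generator_loss(i_pred, i_gt, adv_loss, reg_loss,
                         w_pixel=1.0, w_sym=1.0, w_adv=1.0, w_reg=1.0):
    """Weighted combination of the four losses listed above; the weights are
    illustrative placeholders, not values given in the patent."""
    return (w_pixel * pixel_l1_loss(i_pred, i_gt)
            + w_sym * symmetry_loss(i_pred)
            + w_adv * adv_loss
            + w_reg * reg_loss)
```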
As a further technical solution of the present invention, step 4 specifically is:
Step 4.1: input the face picture I_pred obtained in step 3.3 and the frontal face I_gt into the discriminator in sequence; the discriminator is structured as an auto-encoder, and its output is a face picture;
Step 4.2: denote the face pictures output by the discriminator for the generated face I_pred and the frontal face I_gt as I_pred' and I_gt' respectively, and compute the pixel-wise errors separately, i.e. compute and sum the L1-norm error over each pixel of the picture, as expressed below:
L(I_gt) = (1/(W·H)) Σ_{x=1}^{W} Σ_{y=1}^{H} |I_gt(x,y) − I_gt'(x,y)|
L(I_pred) = (1/(W·H)) Σ_{x=1}^{W} Σ_{y=1}^{H} |I_pred(x,y) − I_pred'(x,y)|
wherein W and H are the width and height of the face picture, both equal to 128, and I_gt(x,y), I_pred(x,y), I_gt'(x,y) and I_pred'(x,y) denote the gray values of I_gt, I_pred, I_gt' and I_pred' at coordinates (x, y);
Step 4.3: the loss function of the discriminator D is designed as:
L_D = L(I_gt) − k_t·L(I_pred)
wherein k_t manually controls the degree of emphasis placed on the discriminator; the larger k_t is, the stronger the discriminating ability the discriminator is set to have. During training, the discriminator minimizes L_D, which requires maximizing the pixel-wise reconstruction error of the generated face while minimizing the pixel-wise reconstruction error of the real face, thereby achieving the goal of distinguishing the two;
step 4.4: for the game process between the discriminator and the generator, the loss function of the generator is designed as follows:
L_G-D = L(I_pred)
Contrary to the discriminator, the generator aims to make the generated face as close as possible to the real face, so the generator's loss minimizes the pixel-wise reconstruction error of the generated face I_pred, which ensures the quality of the generated face;
Step 4.5: with the loss functions of step 4.3 and step 4.4, the discriminator and the generator are trained adversarially against each other; their expressive power grows ever stronger and the generated training data become more and more realistic.
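A minimal NumPy sketch of the reconstruction-error-based losses of steps 4.2–4.4 is given below; the variable names are illustrative.

```python
import numpy as np

def reconstruction_error(img, img_rec):
    """L(I) = mean pixel-wise L1 error between an image and its
    reconstruction by the auto-encoder discriminator."""
    return np.mean(np.abs(img - img_rec))

def discriminator_loss(i_gt, i_gt_rec, i_pred, i_pred_rec, k_t):
    """L_D = L(I_gt) - k_t * L(I_pred)."""
    return (reconstruction_error(i_gt, i_gt_rec)
            - k_t * reconstruction_error(i_pred, i_pred_rec))

def generator_adv_loss_d(i_pred, i_pred_rec):
    """L_G-D = L(I_pred): the generator minimizes the reconstruction error
    of the generated face under the discriminator."""
    return reconstruction_error(i_pred, i_pred_rec)
```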
As a further technical solution of the present invention, step 5 specifically is:
Step 5.1: input the face picture I_pred obtained in step 3.3 and the corresponding frontal face I_gt into the classifier together (assuming the training set contains N faces in total, there are thus 2N pictures); the classifier C is a Light-CNN model trained in advance and acts as an independent structure that competes with the generator;
Step 5.2: the features extracted by the classifier from the frontal face I_gt and from the generated face I_pred are denoted C(I_gt) and C(I_pred) respectively. The result labels of the classifier are defined as 1 to 2N; the classification target of the classifier is to assign the frontal face pictures I_gt to the first N labels and the generated pictures I_pred to the last N labels. The corresponding cross-entropy loss functions are:
L_C-gt = −Σ_{j=1}^{2N} y_gt^j·log({C(I_gt)}_j)
L_C-pred = −Σ_{j=1}^{2N} y_pred^j·log({C(I_pred)}_j)
wherein j ∈ {1, ..., 2N}; y_gt^j and y_pred^j are the classifier labels corresponding to the frontal face picture and the generated face picture: y_gt^j denotes the j-th component of the correct label of the frontal face picture I_gt and {C(I_gt)}_j denotes the classifier's classification output for the frontal face picture on label j; y_pred^j denotes the j-th component of the correct label of the generated face picture I_pred and {C(I_pred)}_j is the classifier's output on label j after classification;
Step 5.3: the classifier loss function is designed as a weighted sum of the two cross-entropy losses, with α as the weight factor, because the generated face is relatively close to its corresponding frontal face picture in the feature space. The loss function of the classifier is formulated as:
L_C = L_C-gt + α·L_C-pred
Step 5.4: for the game between the generator and the classifier, the corresponding loss function of the generator is designed as:
L_G-C = 1 − (C(I_pred)·C(I_gt)) / (‖C(I_pred)‖·‖C(I_gt)‖)
that is, the cosine distance between the features of the generated face picture and of the corresponding frontal face picture is minimized, so as to maintain identity consistency between the generated face picture and the corresponding frontal face picture;
Step 5.5: weighting the adversarial losses of the generator against the discriminator (step 4.4) and against the classifier (step 5.4) gives the generative adversarial loss of the generator in step 3.4, whose formula is rewritten as:
L_G = λ0·L_G-D + λ1·L_G-C
Step 5.6: the classifier and the generator each have their corresponding adversarial loss functions; as they continuously compete and generate, their expressive power grows ever stronger and the generated training data come closer to the identity information of the original face.
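The classifier and identity-consistency losses of steps 5.2–5.4 can be sketched as follows; the default α value is an illustrative placeholder, since the patent only states that it is a weight factor.

```python
import numpy as np

def cross_entropy(one_hot_label, probs, eps=1e-12):
    """-sum_j y_j * log(p_j) over the 2N classifier labels."""
    return -np.sum(one_hot_label * np.log(probs + eps))

def classifier_loss(y_gt, p_gt, y_pred, p_pred, alpha=0.5):
    """L_C = L_C-gt + alpha * L_C-pred (alpha is a placeholder value)."""
    return cross_entropy(y_gt, p_gt) + alpha * cross_entropy(y_pred, p_pred)

def generator_identity_loss(feat_pred, feat_gt):
    """L_G-C = 1 - cosine similarity between classifier features of the
    generated face and the corresponding frontal face."""
    cos = np.dot(feat_pred, feat_gt) / (np.linalg.norm(feat_pred) * np.linalg.norm(feat_gt))
    return 1.0 - cos
```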
As a further technical solution of the present invention, step 6 specifically is:
step 6.1: storing the network model parameters during training;
step 6.2: extracting the characteristic points of the test set deflection human face picture, wherein the extraction method is the same as that of the training set and is completed in a caffe deep learning environment;
Step 6.3: extract the face blocks of the deflected face regions (eyes, nose and mouth) from the feature points obtained in step 6.2, with sizes consistent with those in the training set, i.e. 40 × 40 for the eyes, 32 × 40 for the nose and 32 × 48 for the mouth;
Step 6.4: use the network model obtained in training to correct the face region blocks of the test set; similarly, the corrected whole-face contour of the deflected face can also be obtained;
Step 6.5: fuse the two outputs of the local channel and the whole channel according to the network model parameters from training, keeping the loss function of the generator consistent with that in step 3.4, thereby obtaining the generated frontal face picture corresponding to the deflected face of the test set.
Beneficial effects: compared with the prior art, the technical scheme of the invention has the following beneficial technical effects:
1) First, unlike other single-channel network structures, the invention adopts a two-channel network structure that recovers the face from both the local details and the whole face. Compared with feeding the whole face into the network as a single input, which loses detail, the two-channel structure better preserves facial detail information and therefore generates a more realistic frontal face picture.
2) Second, a dedicated structure is used to keep the facial identity features consistent before and after correction, and this structure participates in network training as a third component of the generative adversarial network; experimental results show that this accelerates network convergence and preserves facial identity information.
3) Finally, the discriminator is designed as an auto-encoder whose output is still a face picture, and the Wasserstein distance between the pixel-wise error distributions of the generated face and the original face is minimized; the loss function of the discriminator is an energy-based loss with a continuous value, unlike a traditional generative adversarial network that defines the discriminator output as a discrete value, so high-resolution face pictures can be generated better.
Drawings
FIG. 1 is a flow chart of the overall process of the present invention.
Fig. 2 shows distribution positions and serial numbers of 68 contour feature points extracted in a caffe environment.
Fig. 3 is a schematic diagram of the structure of the generator.
Fig. 4 is a schematic diagram of the structure of the discriminator.
Fig. 5 is a schematic diagram of a classifier structure.
Fig. 6 is a schematic diagram of the correction effect of the present invention on a Multi-PIE data set.
Fig. 7 is a schematic diagram of the frontalization effect on the LFW test set, where (a) is the original face picture in the LFW test set, (b) is the frontalization result obtained by the invention, and (c) and (d) are the corresponding result pictures from other studies.
Detailed Description
The technical solution of the present invention will be further described in detail with reference to the following examples and accompanying drawings.
Under a Linux operating system, Spyder is selected as the programming tool and the generative adversarial network model is built. This example is trained with pictures of 337 individuals from the Multi-PIE face database in 13 different poses under the same lighting conditions, and tested on the LFW deflected-face data set.
Fig. 1 is a schematic diagram of the network structure of the present invention, and the specific steps are as follows:
step 1: the method comprises the following steps of detecting characteristic points of a human face, and extracting human face region blocks (eyes, a nose and a mouth) with fixed sizes, wherein the specific steps are as follows:
step 1.1, uniformly normalizing the size of the face to be 128 multiplied by 128, and constructing a caffe deep learning environment;
Step 1.2, in the caffe deep learning environment, detect the feature points of the face with the keypoint extraction method proposed in "Combining Data-driven and Model-driven Methods for Robust Facial Landmark Detection" (IEEE Transactions on Information Forensics & Security, 2018), and locate five feature points: the center points of the two eyes, the nose tip, and the two mouth-corner points;
Step 1.3, obtain each region block in the face (mouth, nose and two eyes) from the five feature points obtained in step 1.2; to ensure that the network can be trained smoothly, the region block sizes are kept constant across different faces, with the eye regions set to 40 × 40 and the nose and mouth regions set to 32 × 40 and 32 × 48, respectively.
Fig. 2 shows distribution positions and serial numbers of 68 contour feature points extracted in a caffe environment.
Step 2: respectively inputting the human face region blocks and the whole human face obtained in the step 1 into a local channel and a whole channel to obtain a corrected result of the local channel and the whole channel, and specifically comprising the following steps:
Step 2.1, extract the eye, nose and mouth region blocks from the frontal face I_gt in the same way, obtaining Eyel_gt, Eyer_gt, Nose_gt and Mouth_gt respectively;
Step 2.2: for both the local channel and the whole channel, design the generator as a U-Net structure so that the picture size is kept unchanged before and after generation;
Step 2.3: for the local channel, input the face region blocks obtained in step 1 into the local channel generator to obtain the corrected face region blocks, namely Eyel_pred, Eyer_pred, Nose_pred and Mouth_pred.
The loss function of the local channel is designed as an L1 loss function and consists of the following parts:
eyel_loss = (1/(W_eyel·H_eyel)) Σ_{x=1}^{W_eyel} Σ_{y=1}^{H_eyel} |Eyel_pred(x,y) − Eyel_gt(x,y)|
eyer_loss = (1/(W_eyer·H_eyer)) Σ_{x=1}^{W_eyer} Σ_{y=1}^{H_eyer} |Eyer_pred(x,y) − Eyer_gt(x,y)|
nose_loss = (1/(W_nose·H_nose)) Σ_{x=1}^{W_nose} Σ_{y=1}^{H_nose} |Nose_pred(x,y) − Nose_gt(x,y)|
mouth_loss = (1/(W_mou·H_mou)) Σ_{x=1}^{W_mou} Σ_{y=1}^{H_mou} |Mouth_pred(x,y) − Mouth_gt(x,y)|
wherein W_eyel, W_eyer, W_nose and W_mou are the widths of the left-eye, right-eye, nose and mouth regions, equal to 40, 40, 32 and 32 respectively, and H_eyel, H_eyer, H_nose and H_mou are the corresponding heights, equal to 40, 40, 40 and 48 respectively; Eyel_pred(x,y), Eyer_pred(x,y), Nose_pred(x,y) and Mouth_pred(x,y) are the gray values of the corrected left eye, right eye, nose and mouth at coordinates (x, y), and Eyel_gt(x,y), Eyer_gt(x,y), Nose_gt(x,y) and Mouth_gt(x,y) are the gray values of the corresponding region blocks of the frontal face at coordinates (x, y).
The loss function of the local channel is the sum of the losses of the region blocks; setting λ2 as the weight of the local channel loss, the loss function local_loss of the local channel can be expressed as:
local_loss = λ2·(eyel_loss + eyer_loss + nose_loss + mouth_loss)
Step 2.4: for the whole channel, likewise, input the deflected face I_in into the encoder structure with 3 × 3 convolution kernels to obtain a 128 × 128 frontal face contour of the same size as the original face.
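For illustration, a minimal U-Net-style generator with 3 × 3 convolutions that preserves the 128 × 128 input size is sketched below. The patent implements its networks in a caffe environment, so this PyTorch version is only a structural sketch; the channel counts and layer depth are arbitrary assumptions.

```python
import torch
import torch.nn as nn

class TinyUNet(nn.Module):
    """Toy U-Net-style generator: one downsampling step, one upsampling step,
    with a skip connection, so the output has the same size as the input."""
    def __init__(self, ch=32):
        super().__init__()
        self.enc1 = nn.Sequential(nn.Conv2d(1, ch, 3, padding=1), nn.ReLU(inplace=True))
        self.down = nn.Sequential(nn.Conv2d(ch, ch * 2, 3, stride=2, padding=1), nn.ReLU(inplace=True))
        self.up = nn.Sequential(nn.ConvTranspose2d(ch * 2, ch, 4, stride=2, padding=1), nn.ReLU(inplace=True))
        self.dec1 = nn.Conv2d(ch * 2, 1, 3, padding=1)  # skip connection doubles the channels

    def forward(self, x):               # x: (B, 1, 128, 128)
        e1 = self.enc1(x)               # (B, ch, 128, 128)
        d = self.up(self.down(e1))      # (B, ch, 128, 128)
        return self.dec1(torch.cat([e1, d], dim=1))  # same size as the input
```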
Step 3: fuse the corrected face regions output by the local channel with the whole face contour output by the whole channel, setting each pixel in an overlapping region to the maximum value over that region, to obtain the final generated face I_pred; the specific steps are as follows:
Step 3.1: based on the corrected results of the region blocks obtained in step 2, zero-pad each region block to a picture of the same size as the original face, i.e. one face-sized picture each for the left eye, right eye, nose and mouth;
Step 3.2: obtain the face picture corresponding to the local channel from the four equally sized face pictures obtained in step 3.1. The specific method is: compare the gray values of the pixels with the same coordinates in the four face pictures, select the maximum value and take it as the gray value at that coordinate; repeating this pixel by pixel finally yields the face picture corresponding to the local channel;
Step 3.3: fuse the local-detail face picture obtained in the previous step with the frontal face contour obtained from the whole channel in step 2.4, and obtain the final generated picture I_pred through several applied loss functions.
The loss function applied in step 3.3 is specifically:
1) Pixel-wise L1 loss:
L_pixel = (1/(W·H)) Σ_{x=1}^{W} Σ_{y=1}^{H} |I_pred(x,y) − I_gt(x,y)|
wherein W and H are the width and height of the face picture, both equal to 128, and I_pred(x,y) and I_gt(x,y) denote the gray values of the generated face picture I_pred and of the corresponding frontal face at coordinates (x, y);
2) Generative adversarial loss:
L_G = λ0·L_G-D + λ1·L_G-C
Since in the invention the generator is trained adversarially against both the discriminator and the classifier, L_G-D and L_G-C in the above formula are the loss functions established when the generator competes with the discriminator and with the classifier respectively; the adversarial loss of the generator is designed as the weighted sum of these two parts, addressing the quality of the generated picture and the preservation of identity consistency, with weights λ0 and λ1 respectively. The two parts are given by the following formulas, wherein I_pred' denotes the output of the discriminator D for the generated face picture I_pred, W and H are the width and height of the face picture (both 128), I_pred(x,y) and I_pred'(x,y) are the gray values of I_pred and I_pred' at coordinates (x, y), and C(·) denotes the features extracted by the classifier:
L_G-D = (1/(W·H)) Σ_{x=1}^{W} Σ_{y=1}^{H} |I_pred(x,y) − I_pred'(x,y)|
L_G-C = 1 − (C(I_pred)·C(I_gt)) / (‖C(I_pred)‖·‖C(I_gt)‖)
3) face symmetry loss:
L_sym = (1/(W·H)) Σ_{x=1}^{W} Σ_{y=1}^{H} |I_pred(x,y) − I_pred(W−(x−1), y)|
wherein W and H are the width and height of the face picture, both equal to 128, and I_pred(x,y) and I_pred(W−(x−1), y) denote the gray values of the generated face picture I_pred at coordinates (x, y) and at the horizontally mirrored coordinates (W−(x−1), y);
4) A regularization loss, which is applied to suppress the noise present in the generated face picture.
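The patent states only that a regularization loss suppresses noise in the generated picture without giving its exact form. A common choice for this purpose is total-variation regularization; the sketch below is that assumption, not the patent's definition.

```python
import numpy as np

def total_variation(img):
    """Sum of absolute differences between neighbouring pixels,
    a common smoothness regularizer (assumed form, not from the patent)."""
    dh = np.abs(img[1:, :] - img[:-1, :]).sum()
    dw = np.abs(img[:, 1:] - img[:, :-1]).sum()
    return dh + dw
```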
Fig. 3 is a schematic diagram of the structure of the generator.
Step 4: input the generated face I_pred and the corresponding frontal face I_gt into the discriminator in sequence to ensure the accuracy of face generation; the specific steps are as follows:
Step 4.1: input the face picture I_pred obtained in step 3.3 and the frontal face I_gt into the discriminator in sequence; the discriminator is structured as an auto-encoder, and its output is a face picture;
Step 4.2: denote the face pictures output by the discriminator for the generated face I_pred and the frontal face I_gt as I_pred' and I_gt' respectively, and compute the pixel-wise errors separately, i.e. compute and sum the L1-norm error over each pixel of the picture, as expressed below:
L(I_gt) = (1/(W·H)) Σ_{x=1}^{W} Σ_{y=1}^{H} |I_gt(x,y) − I_gt'(x,y)|
L(I_pred) = (1/(W·H)) Σ_{x=1}^{W} Σ_{y=1}^{H} |I_pred(x,y) − I_pred'(x,y)|
wherein W and H are the width and height of the face picture, both equal to 128, and I_gt(x,y), I_pred(x,y), I_gt'(x,y) and I_pred'(x,y) denote the gray values of I_gt, I_pred, I_gt' and I_pred' at coordinates (x, y);
Step 4.3: the loss function of the discriminator D is designed as:
L_D = L(I_gt) − k_t·L(I_pred)
wherein k_t manually controls the degree of emphasis placed on the discriminator; the larger k_t is, the stronger the discriminating ability the discriminator is set to have. During training, the discriminator minimizes L_D, which requires maximizing the pixel-wise reconstruction error of the generated face while minimizing the pixel-wise reconstruction error of the real face, thereby achieving the goal of distinguishing the two;
step 4.4: for the game process between the discriminator and the generator, the loss function of the generator is designed as follows:
L_G-D = L(I_pred)
Contrary to the discriminator, the generator aims to make the generated face as close as possible to the real face, so the generator's loss minimizes the pixel-wise reconstruction error of the generated face I_pred, which ensures the quality of the generated face;
Step 4.5: with the loss functions of step 4.3 and step 4.4, the discriminator and the generator are trained adversarially against each other; their expressive power grows ever stronger and the generated training data become more and more realistic.
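One adversarial training step between the generator and the auto-encoder discriminator, following the losses of steps 4.3–4.4, could look like the PyTorch sketch below. The names (D, optimizer_d, optimizer_g) are illustrative, and k_t is treated as a manually chosen constant, as the text states.

```python
import torch

def recon_error(img, img_rec):
    return torch.mean(torch.abs(img - img_rec))      # pixel-wise L1, i.e. L(I)

def discriminator_step(D, i_gt, i_pred, k_t, optimizer_d):
    # L_D = L(I_gt) - k_t * L(I_pred); detach the generated face so only D is updated.
    loss_d = recon_error(i_gt, D(i_gt)) - k_t * recon_error(i_pred.detach(), D(i_pred.detach()))
    optimizer_d.zero_grad(); loss_d.backward(); optimizer_d.step()
    return loss_d.item()

def generator_step(D, i_pred, optimizer_g):
    loss_g_d = recon_error(i_pred, D(i_pred))         # L_G-D = L(I_pred)
    optimizer_g.zero_grad(); loss_g_d.backward(); optimizer_g.step()
    return loss_g_d.item()
```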
Fig. 4 is a schematic diagram of the structure of the discriminator.
Step 5, input the generated face I_pred and the corresponding frontal face I_gt into the classifier together to ensure identity consistency between the generated face and the input deflected face; the specific steps are as follows:
Step 5.1: input the face picture I_pred obtained in step 3.3 and the corresponding frontal face I_gt into the classifier together (assuming the training set contains N faces in total, there are thus 2N pictures); the classifier C is a Light-CNN model trained in advance and acts as an independent structure that competes with the generator;
Step 5.2: the features extracted by the classifier from the frontal face I_gt and from the generated face I_pred are denoted C(I_gt) and C(I_pred) respectively. The result labels of the classifier are defined as 1 to 2N; the classification target of the classifier is to assign the frontal face pictures I_gt to the first N labels and the generated pictures I_pred to the last N labels. The corresponding cross-entropy loss functions are:
L_C-gt = −Σ_{j=1}^{2N} y_gt^j·log({C(I_gt)}_j)
L_C-pred = −Σ_{j=1}^{2N} y_pred^j·log({C(I_pred)}_j)
wherein j ∈ {1, ..., 2N}; y_gt^j and y_pred^j are the classifier labels corresponding to the frontal face picture and the generated face picture: y_gt^j denotes the j-th component of the correct label of the frontal face picture I_gt and {C(I_gt)}_j denotes the classifier's classification output for the frontal face picture on label j; y_pred^j denotes the j-th component of the correct label of the generated face picture I_pred and {C(I_pred)}_j is the classifier's output on label j after classification;
Step 5.3: the classifier loss function is designed as a weighted sum of the two cross-entropy losses, with α as the weight factor, because the generated face is relatively close to its corresponding frontal face picture in the feature space. The loss function of the classifier is formulated as:
L_C = L_C-gt + α·L_C-pred
Step 5.4: for the game between the generator and the classifier, the corresponding loss function of the generator is designed as:
L_G-C = 1 − (C(I_pred)·C(I_gt)) / (‖C(I_pred)‖·‖C(I_gt)‖)
that is, the cosine distance between the features of the generated face picture and of the corresponding frontal face picture is minimized, so as to maintain identity consistency between the generated face picture and the corresponding frontal face picture;
Step 5.5: weighting the adversarial losses of the generator against the discriminator (step 4.4) and against the classifier (step 5.4) gives the generative adversarial loss of the generator in step 3.4, whose formula is rewritten as:
L_G = λ0·L_G-D + λ1·L_G-C
Step 5.6: the classifier and the generator each have their corresponding adversarial loss functions; as they continuously compete and generate, their expressive power grows ever stronger and the generated training data come closer to the identity information of the original face.
Fig. 5 is a schematic diagram of the structure of the classifier.
Fig. 6 is a schematic diagram of the correction effect of the present invention on a Multi-PIE data set.
Step 6, save the generative adversarial network model obtained by training and use it to correct deflected faces during testing; the specific steps are as follows:
step 6.1: storing the network model parameters during training;
step 6.2: extracting the characteristic points of the test set deflection human face picture, wherein the extraction method is the same as that of the training set and is completed in a caffe deep learning environment;
Step 6.3: extract the face blocks of the deflected face regions (eyes, nose and mouth) from the feature points obtained in step 6.2, with sizes consistent with those in the training set, i.e. 40 × 40 for the eyes, 32 × 40 for the nose and 32 × 48 for the mouth;
Step 6.4: use the network model obtained in training to correct the face region blocks of the test set; similarly, the corrected whole-face contour of the deflected face can also be obtained;
Step 6.5: fuse the two outputs of the local channel and the whole channel according to the network model parameters from training, keeping the loss function of the generator consistent with that in step 3.4, thereby obtaining the generated frontal face picture corresponding to the deflected face of the test set, as sketched below.
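The following Python sketch strings steps 6.3–6.5 together for a single test picture. local_generator and global_generator stand in for the trained local-channel and whole-channel generators; they and the block-position bookkeeping are hypothetical, not APIs defined by the patent.

```python
import numpy as np

def paste_into_canvas(patch, top_left_xy, canvas_hw=(128, 128)):
    """Zero-pad a corrected region block back to the full 128x128 face size."""
    canvas = np.zeros(canvas_hw, dtype=patch.dtype)
    x0, y0 = top_left_xy
    canvas[y0:y0 + patch.shape[0], x0:x0 + patch.shape[1]] = patch
    return canvas

def frontalize(deflected_face, blocks, block_positions, local_generator, global_generator):
    corrected = {k: local_generator(v) for k, v in blocks.items()}            # step 6.4
    padded = [paste_into_canvas(corrected[k], block_positions[k]) for k in corrected]
    local_full = np.maximum.reduce(padded)                                    # pixel-wise max over regions
    global_full = global_generator(deflected_face)                            # whole-face contour
    return np.maximum(local_full, global_full)                                # overlap -> max (step 6.5)
```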
Fig. 7 is a schematic diagram of the positive effects on the LFW test set.
The invention is based on BEGAN among generative adversarial networks; this structure minimizes the Wasserstein distance between the pixel-wise error distributions of the faces before and after generation and avoids many problems of traditional generative adversarial networks. A two-channel network structure is used to accurately recover the face structure at both the local and the whole-face level. Furthermore, with a three-player adversarial network structure, the classifier that maintains identity consistency competes with the generator as an independent structure instead of merely supervising the generation process as a loss-function term; this improves the accuracy and speed of deflected face correction, and a good correction effect can be obtained even under large-angle deflection.
The above description is only an embodiment of the present invention, but the scope of the present invention is not limited thereto; any modification or substitution that a person skilled in the art can readily conceive of within the technical scope disclosed by the present invention falls within the scope of the present invention, and therefore the protection scope of the present invention shall be subject to the protection scope of the claims.

Claims (7)

1. A method for correcting a deflected face based on an improved generative adversarial network structure, characterized by comprising the following steps:
Step 1, for the input deflected face I_in, detect the facial feature points and extract fixed-size face region blocks, namely the eyes, nose and mouth;
Step 2, input the face region blocks and the whole face obtained in step 1 into a local channel and a whole channel respectively, obtaining the corrected results of the local channel and the whole channel;
Step 3, fuse the corrected face regions output by the local channel with the whole face contour output by the whole channel, setting each pixel in an overlapping region to the maximum value over that region, to obtain the final generated face I_pred;
Step 4, input the generated face I_pred and the corresponding frontal face I_gt into the discriminator in sequence to ensure the accuracy of face generation, wherein I_gt is the frontal face corresponding to the input deflected face;
Step 5, input the generated face I_pred and the corresponding frontal face I_gt into the classifier together to ensure identity consistency between the generated face and the input deflected face;
Step 6, save the generative adversarial network model obtained by training and use it to correct deflected faces during testing.
2. The method for correcting a deflected face based on an improved generative adversarial network structure according to claim 1, wherein step 1 specifically comprises:
Step 1.1, uniformly normalize the face size to 128 × 128 and build a caffe deep learning environment;
Step 1.2, in the caffe deep learning environment, detect the feature points of the face and locate five feature points: the center points of the two eyes, the nose tip, and the two mouth-corner points;
Step 1.3, obtain each region block in the face, namely the mouth, the nose and the two eyes, from the five feature points obtained in step 1.2; the region block sizes are kept constant across different faces, with the eye regions set to 40 × 40 and the nose and mouth regions set to 32 × 40 and 32 × 48, respectively.
3. The method for correcting a deflected face based on an improved generative adversarial network structure according to claim 2, wherein step 2 specifically comprises:
Step 2.1, extract the eye, nose and mouth region blocks from the frontal face I_gt in the same way, obtaining Eyel_gt, Eyer_gt, Nose_gt and Mouth_gt respectively;
Step 2.2: for both the local channel and the whole channel, design the generator as a U-Net structure so that the picture size is kept unchanged before and after generation;
Step 2.3: for the local channel, input the face region blocks obtained in step 1 into the local channel generator to obtain the corrected face region blocks, namely Eyel_pred, Eyer_pred, Nose_pred and Mouth_pred, wherein the loss function of the local channel is designed as an L1 loss and consists of the following parts:
eyel_loss = (1/(W_eyel·H_eyel)) Σ_{x=1}^{W_eyel} Σ_{y=1}^{H_eyel} |Eyel_pred(x,y) − Eyel_gt(x,y)|
eyer_loss = (1/(W_eyer·H_eyer)) Σ_{x=1}^{W_eyer} Σ_{y=1}^{H_eyer} |Eyer_pred(x,y) − Eyer_gt(x,y)|
nose_loss = (1/(W_nose·H_nose)) Σ_{x=1}^{W_nose} Σ_{y=1}^{H_nose} |Nose_pred(x,y) − Nose_gt(x,y)|
mouth_loss = (1/(W_mou·H_mou)) Σ_{x=1}^{W_mou} Σ_{y=1}^{H_mou} |Mouth_pred(x,y) − Mouth_gt(x,y)|
wherein W_eyel, W_eyer, W_nose and W_mou are the widths of the left-eye, right-eye, nose and mouth regions, equal to 40, 40, 32 and 32 respectively, and H_eyel, H_eyer, H_nose and H_mou are the corresponding heights, equal to 40, 40, 40 and 48 respectively; Eyel_pred(x,y), Eyer_pred(x,y), Nose_pred(x,y) and Mouth_pred(x,y) are the gray values of the corrected left eye, right eye, nose and mouth at coordinates (x, y), and Eyel_gt(x,y), Eyer_gt(x,y), Nose_gt(x,y) and Mouth_gt(x,y) are the gray values of the corresponding region blocks of the frontal face at coordinates (x, y);
the loss function of the local channel is the sum of the losses of the region blocks; setting λ2 as the weight of the local channel loss, the loss function local_loss of the local channel can be expressed as:
local_loss = λ2·(eyel_loss + eyer_loss + nose_loss + mouth_loss)
Step 2.4: for the whole channel, input the deflected face I_in into the encoder structure with 3 × 3 convolution kernels to obtain a 128 × 128 frontal face contour of the same size as the original face.
4. The method for correcting a deflected face based on an improved generative adversarial network structure according to claim 3, wherein step 3 specifically comprises:
Step 3.1: based on the corrected results of the region blocks obtained in step 2, zero-pad each region block to a picture of the same size as the original face, i.e. one face-sized picture each for the left eye, right eye, nose and mouth;
Step 3.2: obtain the face picture corresponding to the local channel from the four equally sized face pictures obtained in step 3.1; the specific method is: compare the gray values of the pixels with the same coordinates in the four face pictures, select the maximum value and take it as the gray value at that coordinate; repeating this pixel by pixel finally yields the face picture corresponding to the local channel;
Step 3.3: fuse the local-detail face picture obtained in the previous step with the frontal face contour obtained from the whole channel in step 2.4, and obtain the final generated picture I_pred through training with several applied loss functions;
The loss function applied in step 3.3 is specifically:
1) Pixel-wise L1 loss:
L_pixel = (1/(W·H)) Σ_{x=1}^{W} Σ_{y=1}^{H} |I_pred(x,y) − I_gt(x,y)|
wherein W and H are the width and height of the face picture, both equal to 128, and I_pred(x,y) and I_gt(x,y) denote the gray values of the generated face picture I_pred and of the corresponding frontal face at coordinates (x, y);
2) Generative adversarial loss:
L_G = λ0·L_G-D + λ1·L_G-C
L_G-D and L_G-C in the above formula are the loss functions established when the generator competes with the discriminator and with the classifier respectively; the adversarial loss of the generator is designed as the weighted sum of these two parts, addressing the quality of the generated picture and the preservation of identity consistency, with weights λ0 and λ1 respectively. The two parts are given by the following formulas, wherein I_pred' denotes the output of the discriminator D for the generated face picture I_pred, W and H are the width and height of the face picture (both 128), I_pred(x,y) and I_pred'(x,y) are the gray values of I_pred and I_pred' at coordinates (x, y), and C(·) denotes the features extracted by the classifier:
L_G-D = (1/(W·H)) Σ_{x=1}^{W} Σ_{y=1}^{H} |I_pred(x,y) − I_pred'(x,y)|
L_G-C = 1 − (C(I_pred)·C(I_gt)) / (‖C(I_pred)‖·‖C(I_gt)‖)
3) face symmetry loss:
L_sym = (1/(W·H)) Σ_{x=1}^{W} Σ_{y=1}^{H} |I_pred(x,y) − I_pred(W−(x−1), y)|
wherein W and H are the width and height of the face picture, both equal to 128, and I_pred(x,y) and I_pred(W−(x−1), y) denote the gray values of the generated face picture I_pred at coordinates (x, y) and at the horizontally mirrored coordinates (W−(x−1), y);
4) A regularization loss, which is applied to suppress the noise present in the generated face picture.
5. The method for correcting a deflected face based on an improved generative adversarial network structure according to claim 4, wherein step 4 specifically comprises:
Step 4.1: input the face picture I_pred obtained in step 3.3 and the frontal face I_gt into the discriminator in sequence; the discriminator is structured as an auto-encoder, and its output is a face picture;
Step 4.2: denote the face pictures output by the discriminator for the generated face I_pred and the frontal face I_gt as I_pred' and I_gt' respectively, and compute the pixel-wise errors separately, i.e. compute and sum the L1-norm error over each pixel of the picture, as expressed below:
L(I_gt) = (1/(W·H)) Σ_{x=1}^{W} Σ_{y=1}^{H} |I_gt(x,y) − I_gt'(x,y)|
L(I_pred) = (1/(W·H)) Σ_{x=1}^{W} Σ_{y=1}^{H} |I_pred(x,y) − I_pred'(x,y)|
wherein W and H are the width and height of the face picture, both equal to 128, and I_gt(x,y), I_pred(x,y), I_gt'(x,y) and I_pred'(x,y) denote the gray values of I_gt, I_pred, I_gt' and I_pred' at coordinates (x, y);
Step 4.3: the loss function of the discriminator D is designed as:
L_D = L(I_gt) − k_t·L(I_pred)
wherein k_t manually controls the degree of emphasis placed on the discriminator; during training, the discriminator minimizes L_D, which requires maximizing the pixel-wise reconstruction error of the generated face while minimizing the pixel-wise reconstruction error of the real face, thereby achieving the goal of distinguishing the two;
step 4.4: for the game process between the discriminator and the generator, the loss function of the generator is designed as follows:
L_G-D = L(I_pred)
the loss function of the generator minimizes the pixel-wise reconstruction error of the generated face I_pred, which ensures the quality of the generated face;
Step 4.5: according to the loss functions of step 4.3 and step 4.4, the discriminator and the generator are continuously trained adversarially against each other until the training reaches a preset condition.
6. The method for correcting a deflected face based on an improved generative adversarial network structure according to claim 5, wherein step 5 specifically comprises:
Step 5.1: input the face picture I_pred obtained in step 3.3 and the corresponding frontal face I_gt into the classifier together; assuming the training set contains N faces in total, there are thus 2N pictures; the classifier C is a Light-CNN model trained in advance and acts as an independent structure that competes with the generator;
Step 5.2: the features extracted by the classifier from the frontal face I_gt and from the generated face I_pred are denoted C(I_gt) and C(I_pred) respectively; the result labels of the classifier are defined as 1 to 2N, and the classification target of the classifier is to assign the frontal face pictures I_gt to the first N labels and the generated pictures I_pred to the last N labels; the corresponding cross-entropy loss functions are:
L_C-gt = −Σ_{j=1}^{2N} y_gt^j·log({C(I_gt)}_j)
L_C-pred = −Σ_{j=1}^{2N} y_pred^j·log({C(I_pred)}_j)
wherein j ∈ {1, ..., 2N}; y_gt^j and y_pred^j are the classifier labels corresponding to the frontal face picture and the generated face picture: y_gt^j denotes the j-th component of the correct label of the frontal face picture I_gt and {C(I_gt)}_j denotes the classifier's classification output for the frontal face picture on label j; y_pred^j denotes the j-th component of the correct label of the generated face picture I_pred and {C(I_pred)}_j is the classifier's output on label j after classification;
Step 5.3: the classifier loss function is designed as a weighted sum of the two cross-entropy losses, with α as the weight factor; the loss function of the classifier is formulated as:
L_C = L_C-gt + α·L_C-pred
Step 5.4: for the game between the generator and the classifier, the corresponding loss function of the generator is designed as:
L_G-C = 1 − (C(I_pred)·C(I_gt)) / (‖C(I_pred)‖·‖C(I_gt)‖)
that is, the cosine distance between the features of the generated face picture and of the corresponding frontal face picture is minimized;
Step 5.5: weighting the adversarial losses of the generator against the discriminator (step 4.4) and against the classifier (step 5.4) gives the generative adversarial loss of the generator in step 3.4, whose formula is rewritten as:
L_G = λ0·L_G-D + λ1·L_G-C
Step 5.6: the classifier and the generator each have their corresponding adversarial loss functions, and they continuously compete and generate until the training reaches a preset condition.
7. The method for correcting a deflected face based on an improved generative adversarial network structure according to claim 6, wherein step 6 specifically comprises:
step 6.1: storing the network model parameters during training;
step 6.2: extracting the characteristic points of the test set deflection human face picture, wherein the extraction method is the same as that of the training set and is completed in a caffe deep learning environment;
Step 6.3: extract the face blocks of the deflected face regions, namely the eyes, nose and mouth, from the feature points obtained in step 6.2, with sizes consistent with those in the training set, i.e. 40 × 40 for the eyes, 32 × 40 for the nose and 32 × 48 for the mouth;
Step 6.4: use the network model obtained in training to correct the face region blocks of the test set; similarly, the corrected whole-face contour of the deflected face can also be obtained;
Step 6.5: fuse the two outputs of the local channel and the whole channel according to the network model parameters from training, keeping the loss function of the generator consistent with that in step 3.4, thereby obtaining the generated frontal face picture corresponding to the deflected face of the test set.
CN202010269281.6A 2020-04-08 2020-04-08 Deflected face correction method based on an improved generative adversarial network structure Active CN111523406B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010269281.6A CN111523406B (en) 2020-04-08 2020-04-08 Deflected face correction method based on an improved generative adversarial network structure

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010269281.6A CN111523406B (en) 2020-04-08 2020-04-08 Deflected face correction method based on an improved generative adversarial network structure

Publications (2)

Publication Number Publication Date
CN111523406A true CN111523406A (en) 2020-08-11
CN111523406B CN111523406B (en) 2023-04-18

Family

ID=71902548

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010269281.6A Active CN111523406B (en) 2020-04-08 2020-04-08 Deflected face correction method based on an improved generative adversarial network structure

Country Status (1)

Country Link
CN (1) CN111523406B (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112288861A (en) * 2020-11-02 2021-01-29 湖北大学 Automatic face three-dimensional model construction method and system based on single photo

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110738161A (en) * 2019-10-12 2020-01-31 电子科技大学 face image correction method based on improved generation type confrontation network

Patent Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110738161A (en) * 2019-10-12 2020-01-31 电子科技大学 face image correction method based on improved generation type confrontation network

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
RUI HUANG等: "Beyond Face Rotation: Global and Local Perception GAN for Photorealistic and Identity Preserving Frontal View Synthesis" *

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112288861A (en) * 2020-11-02 2021-01-29 湖北大学 Automatic face three-dimensional model construction method and system based on single photo
CN112288861B (en) * 2020-11-02 2022-11-25 湖北大学 Single-photo-based automatic construction method and system for three-dimensional model of human face

Also Published As

Publication number Publication date
CN111523406B (en) 2023-04-18

Similar Documents

Publication Publication Date Title
CN109684924B (en) Face living body detection method and device
Lee et al. Ensemble deep learning for skeleton-based action recognition using temporal sliding lstm networks
CN106815566B (en) Face retrieval method based on multitask convolutional neural network
CN108629336B (en) Face characteristic point identification-based color value calculation method
Vemulapalli et al. R3DG features: Relative 3D geometry-based skeletal representations for human action recognition
Tang et al. Facial landmark detection by semi-supervised deep learning
CN110728209A (en) Gesture recognition method and device, electronic equipment and storage medium
CN108171133B (en) Dynamic gesture recognition method based on characteristic covariance matrix
CN105335719A (en) Living body detection method and device
CN111783748A (en) Face recognition method and device, electronic equipment and storage medium
CN115830652B (en) Deep palm print recognition device and method
CN111914643A (en) Human body action recognition method based on skeleton key point detection
CN112036260A (en) Expression recognition method and system for multi-scale sub-block aggregation in natural environment
Zhou et al. MTCNet: Multi-task collaboration network for rotation-invariance face detection
Iqbal et al. Facial expression recognition with active local shape pattern and learned-size block representations
Wu et al. An unsupervised real-time framework of human pose tracking from range image sequences
CN111523406B (en) Deflected face correction method based on an improved generative adversarial network structure
Das et al. A fusion of appearance based CNNs and temporal evolution of skeleton with LSTM for daily living action recognition
Liao et al. 3D face tracking and expression inference from a 2D sequence using manifold learning
CN110598595B (en) Multi-attribute face generation algorithm based on face key points and postures
US20210042510A1 (en) Adaptive hand tracking and gesture recognition using face-shoulder feature coordinate transforms
CN112784800B (en) Face key point detection method based on neural network and shape constraint
Liu et al. Adaptive recognition method for VR image of Wushu decomposition based on feature extraction
Deng et al. Multi-stream face anti-spoofing system using 3D information
CN114782992A (en) Super-joint and multi-mode network and behavior identification method thereof

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant