CN111523406B - Method for correcting a deflected face based on an improved generative adversarial network structure - Google Patents

Method for correcting a deflected face based on an improved generative adversarial network structure

Info

Publication number
CN111523406B
CN111523406B (application CN202010269281.6A)
Authority
CN
China
Prior art keywords
face
pred
picture
loss
classifier
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202010269281.6A
Other languages
Chinese (zh)
Other versions
CN111523406A (en)
Inventor
达飞鹏
胡惠雅
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Southeast University
Original Assignee
Southeast University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Southeast University filed Critical Southeast University
Priority to CN202010269281.6A priority Critical patent/CN111523406B/en
Publication of CN111523406A publication Critical patent/CN111523406A/en
Application granted granted Critical
Publication of CN111523406B publication Critical patent/CN111523406B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16Human faces, e.g. facial parts, sketches or expressions
    • G06V40/161Detection; Localisation; Normalisation
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16Human faces, e.g. facial parts, sketches or expressions
    • G06V40/168Feature extraction; Face representation
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16Human faces, e.g. facial parts, sketches or expressions
    • G06V40/172Classification, e.g. identification
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02TCLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T90/00Enabling technologies or technologies with a potential or indirect contribution to GHG emissions mitigation

Landscapes

  • Engineering & Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Health & Medical Sciences (AREA)
  • Oral & Maxillofacial Surgery (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Human Computer Interaction (AREA)
  • Biomedical Technology (AREA)
  • Software Systems (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Mathematical Physics (AREA)
  • Evolutionary Computation (AREA)
  • Computational Linguistics (AREA)
  • Biophysics (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a method for correcting a deflected face based on an improved generative adversarial network structure. The processing steps are as follows: (1) detect the facial feature points and extract fixed-size face region blocks (eyes, nose and mouth); (2) input the face region blocks and the whole face of step 1 into a local channel and a global channel respectively to obtain the corrected outputs of the two channels; (3) fuse the outputs of the local channel and the global channel, setting each pixel of an overlapping area to the maximum value over that area, to obtain the final generated face; (4) input the generated face and the frontal face into a discriminator and a classifier to ensure the accuracy and identity consistency of the generated face; (5) save the trained network model for testing. The BEGAN network adopted by the invention has a simple and efficient structure and improves the accuracy and speed of deflected-face correction to a certain extent.

Description

Method for correcting a deflected face based on an improved generative adversarial network structure
Technical Field
The invention relates to a method for correcting a deflected face based on an improved generative adversarial network structure, and belongs to the field of computer vision.
Background
With the continuous development of deep learning, research on face recognition has made many breakthroughs. Recognition algorithms based on deep learning even exceed human-eye performance; however, most of this research assumes a frontal or near-frontal face and is therefore limited. There is evidence that even the best-performing frontal face recognition methods suffer a greatly reduced recognition rate under large angular deflections. Existing methods for face recognition under pose variation can be roughly grouped into three categories: pose-robust feature extraction methods, frontal face generation methods, and subspace analysis methods.
For the first category, conventional feature extraction mainly relies on robust local descriptors such as Gabor, SIFT and LBP features, while more recent methods extract features with deep learning, e.g. Light CNN and FaceNet; neither approach, however, handles large pose deflections effectively. For the third category, a linear subspace can hardly express the nonlinearity of face pose variation, and learning a nonlinear subspace is usually accompanied by complex training. The present invention therefore focuses on the second category, frontal face generation. Early work built three-dimensional face models, which place high demands on the feature points; in particular, when the deflection angle is somewhat large, some feature points in a two-dimensional face picture become invisible, so this approach has certain limitations.
Compared with generating a face from a three-dimensional face model, face conversion with a generative adversarial network has become a major trend and has achieved impressive performance. Current methods that use generative adversarial networks for face frontalization can be divided into two-dimensional and three-dimensional methods. Two-dimensional methods include TP-GAN and PIM; three-dimensional methods mainly apply a 3D Morphable Model (3DMM) to the generative adversarial network, obtaining shape and texture parameters through the model as a prior for accurately recovering the face structure. Because the face is a complex three-dimensional structure and a purely two-dimensional method lacks such constraints, two-dimensional methods usually adopt a two-pathway network that generates face contours and face details separately, and establish a series of supervisory loss functions to constrain the face structure, for example a constraint that keeps the generated face symmetric.
In addition, in both two-dimensional and three-dimensional methods, a feature extraction module is often added to maintain identity consistency before and after correction; the face feature extraction structure Light-CNN performs well in both time and space complexity and is therefore widely used.
Disclosure of Invention
Purpose of the invention: in TP-GAN, a two-dimensional face frontalization method, the DCGAN structure adopted as the generative adversarial network is difficult to train and prone to mode collapse, and collecting multi-scale features of the face makes the training process relatively complicated. To address these problems, the invention provides a method for correcting a deflected face based on an improved generative adversarial network structure.
The method maintains identity consistency between the faces before and after generation by introducing a third adversarial component, a classifier, on top of the conventional two-player, two-pathway generative adversarial network structure. Experiments show that the method better preserves identity consistency before and after face correction, produces frontal faces of higher quality, and at the same time greatly reduces the difficulty of network training and improves training efficiency.
The technical scheme is as follows: in order to realize the purpose of the invention, the technical scheme adopted by the invention is as follows. A method for correcting a deflected face based on an improved generative adversarial network structure comprises the following steps:
Step 1, detecting feature points of the input deflected face I_in and extracting fixed-size face region blocks, namely the eyes, the nose and the mouth;
Step 2, inputting the face region blocks and the whole face obtained in step 1 into a local channel and a global channel respectively to obtain the corrected results of the two channels;
Step 3, fusing the corrected face regions output by the local channel with the whole-face contour output by the global channel, setting each pixel of an overlapping region to the maximum value over that region, to obtain the final generated face I_pred;
Step 4, inputting the generated face I_pred and the corresponding frontal face I_gt into the discriminator in sequence to ensure the accuracy of face generation, wherein I_gt is the frontal face corresponding to the input deflected face;
Step 5, inputting the generated face I_pred and the corresponding frontal face I_gt into the classifier at the same time to ensure identity consistency between the generated face and the input deflected face;
Step 6, saving the generative adversarial network model obtained by training and using it to correct deflected faces at test time.
As a further technical solution of the present invention, step 1 specifically comprises:
Step 1.1, uniformly normalizing the face size to 128 × 128 and building a Caffe deep learning environment;
Step 1.2, detecting the facial feature points in the Caffe deep learning environment and locating five feature points: the centers of the two eyes, the nose tip, and the two corners of the mouth;
Step 1.3, obtaining each region block in the face, namely the mouth, the nose and the eyes, from the five feature points of step 1.2; to ensure that the network trains smoothly, the region block sizes are kept constant across different faces, with the eye regions set to 40 × 40 and the nose and mouth set to 32 × 40 and 32 × 48 respectively.
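For illustration only, a minimal Python sketch of this fixed-size cropping might look as follows; the landmark dictionary, the helper names and the clamping behaviour are assumptions for the example and are not prescribed by the patent:

```python
import numpy as np

# Fixed region-block sizes from step 1.3, given as (width, height).
PATCH_SIZES = {
    "eye_left":  (40, 40),
    "eye_right": (40, 40),
    "nose":      (32, 40),
    "mouth":     (32, 48),
}

def crop_patch(face_128, center, size):
    """Crop a fixed-size patch centered on `center` = (x, y) from a 128 x 128 gray image."""
    w, h = size
    cx, cy = int(round(center[0])), int(round(center[1]))
    # Clamp the top-left corner so the patch stays inside the image and keeps its size.
    x0 = min(max(cx - w // 2, 0), face_128.shape[1] - w)
    y0 = min(max(cy - h // 2, 0), face_128.shape[0] - h)
    return face_128[y0:y0 + h, x0:x0 + w]

def extract_region_blocks(face_128, centers):
    """`centers` maps each key of PATCH_SIZES to a landmark-derived region center (x, y)."""
    return {name: crop_patch(face_128, centers[name], size)
            for name, size in PATCH_SIZES.items()}
```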
As a further technical solution of the present invention, step 2 is specifically:
Step 2.1, extracting face blocks of the eye, nose and mouth regions from the frontal face I_gt in the same way, obtaining Eyel_gt, Eyer_gt, Nose_gt and Mouth_gt respectively;
Step 2.2: for both the local channel and the global channel, the generator is designed as a U-Net structure so that the picture size remains unchanged before and after generation;
Step 2.3: for the local channel, the face region blocks obtained in step 1 are input into the local-channel generator to obtain the corrected face region blocks, namely Eyel_pred, Eyer_pred, Nose_pred and Mouth_pred; the loss function of the local channel is designed as an L1 loss and consists of the following parts:

eyel_loss = (1 / (W_eyel * H_eyel)) * Σ_{x=1..W_eyel} Σ_{y=1..H_eyel} | Eyel_pred(x, y) - Eyel_gt(x, y) |

eyer_loss = (1 / (W_eyer * H_eyer)) * Σ_{x=1..W_eyer} Σ_{y=1..H_eyer} | Eyer_pred(x, y) - Eyer_gt(x, y) |

nose_loss = (1 / (W_nose * H_nose)) * Σ_{x=1..W_nose} Σ_{y=1..H_nose} | Nose_pred(x, y) - Nose_gt(x, y) |

mouth_loss = (1 / (W_mou * H_mou)) * Σ_{x=1..W_mou} Σ_{y=1..H_mou} | Mouth_pred(x, y) - Mouth_gt(x, y) |

where W_eyel, W_eyer, W_nose and W_mou are the widths of the left-eye, right-eye, nose and mouth regions (equal to 40, 40, 32 and 32 respectively) and H_eyel, H_eyer, H_nose and H_mou are the corresponding heights (equal to 40, 40, 40 and 48 respectively); Eyel_pred(x, y), Eyer_pred(x, y), Nose_pred(x, y) and Mouth_pred(x, y) are the gray values of the corrected left eye, right eye, nose and mouth at coordinate (x, y), and Eyel_gt(x, y), Eyer_gt(x, y), Nose_gt(x, y) and Mouth_gt(x, y) are the gray values of the corresponding region blocks of the frontal face at coordinate (x, y).
The loss function of the local channel is the sum of the losses of the region blocks; with λ_2 the weight of the local-channel loss, the local-channel loss function local_loss can be expressed as follows:
local_loss = λ_2 * (eyel_loss + eyer_loss + nose_loss + mouth_loss)
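As an illustration only, the local-channel loss above could be written as the following PyTorch sketch; the tensor names, dictionary keys and framework choice are assumptions, not part of the patent:

```python
import torch

def region_l1_loss(pred_block, gt_block):
    """Mean absolute gray-value error over one region block (eyel_loss, eyer_loss, ...)."""
    return (pred_block - gt_block).abs().mean()

def local_channel_loss(pred_blocks, gt_blocks, lambda_2=1.0):
    """local_loss = lambda_2 * (eyel_loss + eyer_loss + nose_loss + mouth_loss)."""
    keys = ("eye_left", "eye_right", "nose", "mouth")
    return lambda_2 * sum(region_l1_loss(pred_blocks[k], gt_blocks[k]) for k in keys)
```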
step 2.4: for the whole channel, likewise, the face I will be deflected in And inputting the data into an encoder structure containing a 3 x 3 convolution kernel to obtain a 128 x 128-dimensional frontal face contour with the same size as the original face.
As a further technical solution of the present invention, step 3 specifically is:
Step 3.1: based on the corrected region blocks obtained in step 2, each region block is zero-padded to a picture of the same size as the original face, yielding face pictures corresponding to the left eye, right eye, nose and mouth respectively;
Step 3.2: the face picture corresponding to the local channel is obtained from the four equally sized face pictures of step 3.1, as follows: the gray values of the pixels at the same coordinate in the four face pictures are compared, the maximum value is selected and taken as the gray value at that coordinate, and repeating this operation pixel by pixel finally yields the face picture corresponding to the local channel (a sketch is given below);
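A minimal sketch of this zero-padding and pixel-wise maximum fusion (steps 3.1 and 3.2) is given below; the block positions and the single-channel tensor layout are assumptions for the example:

```python
import torch

def paste_block(block, top_left, size=(128, 128)):
    """Zero-pad one corrected region block back to the original 128 x 128 face size."""
    canvas = torch.zeros(size)
    y0, x0 = top_left
    h, w = block.shape
    canvas[y0:y0 + h, x0:x0 + w] = block
    return canvas

def fuse_local_channel(blocks_with_positions):
    """`blocks_with_positions` is an iterable of (block, (row, col)) pairs; the local-channel
    face is the pixel-wise maximum of the zero-padded pictures."""
    padded = [paste_block(b, pos) for b, pos in blocks_with_positions]
    return torch.stack(padded, dim=0).max(dim=0).values
```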
Step 3.3: the local-detail face picture obtained in the previous step is fused with the frontal face contour obtained by the global channel in step 2.4, and the final generated picture I_pred is obtained by training with several applied loss functions.
The loss functions applied in step 3.3 are specifically:
1) Pixel-wise L1 loss:

L_pixel = (1 / (W * H)) * Σ_{x=1..W} Σ_{y=1..H} | I_pred(x, y) - I_gt(x, y) |

where W and H are the width and height of the face picture (both 128), and I_pred(x, y) and I_gt(x, y) are the gray values of the generated face picture I_pred and of the corresponding frontal face at coordinate (x, y);
2) Generative adversarial loss:

L_adv = λ_0 * L_{G-D} + λ_1 * L_{G-C}

Since in the invention the generator plays an adversarial game with the discriminator and with the classifier separately, L_{G-D} and L_{G-C} in the formula above are the loss functions of the generator against the discriminator and against the classifier respectively. The adversarial loss of the generator is designed as the weighted sum of these two parts, addressing the quality of the generated picture and the preservation of identity consistency, with weights λ_0 and λ_1. The two parts are given by the following formulas, where I_pred' denotes the output of the generated face picture I_pred after passing through discriminator D, W and H are the width and height of the face picture (both 128), I_pred(x, y) and I_pred'(x, y) are the gray values of I_pred and I_pred' at coordinate (x, y), and C(·) denotes the features extracted by the classifier:

L_{G-D} = (1 / (W * H)) * Σ_{x=1..W} Σ_{y=1..H} | I_pred(x, y) - I_pred'(x, y) |

L_{G-C} = 1 - ( C(I_pred) · C(I_gt) ) / ( ||C(I_pred)|| * ||C(I_gt)|| )

3) Face symmetry loss:

L_sym = (1 / ((W/2) * H)) * Σ_{x=1..W/2} Σ_{y=1..H} | I_pred(x, y) - I_pred(W - (x - 1), y) |

where W and H are the width and height of the face picture (both 128), and I_pred(x, y) and I_pred(W - (x - 1), y) are the gray values of the generated face picture I_pred at coordinate (x, y) and at the horizontally mirrored coordinate (W - (x - 1), y);
4) Regularization loss: a regularization term that suppresses the noise present in the generated face picture.
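For illustration, the supervision terms above might be sketched as follows; the total-variation form of the regularization term is an assumption (the patent only states that the term suppresses noise), and the (H, W) = (128, 128) tensor layout is assumed:

```python
import torch

def pixel_l1_loss(i_pred, i_gt):
    """Pixel-wise L1 loss between the generated face and the frontal face."""
    return (i_pred - i_gt).abs().mean()

def symmetry_loss(i_pred):
    """Face symmetry loss: compare the left half with the mirrored right half."""
    w = i_pred.shape[-1]
    left = i_pred[..., :, : w // 2]
    mirrored_right = i_pred.flip(-1)[..., :, : w // 2]
    return (left - mirrored_right).abs().mean()

def tv_regularization(i_pred):
    """Assumed total-variation style regularizer suppressing noise in the generated picture."""
    dx = (i_pred[..., :, 1:] - i_pred[..., :, :-1]).abs().mean()
    dy = (i_pred[..., 1:, :] - i_pred[..., :-1, :]).abs().mean()
    return dx + dy
```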
As a further technical scheme of the present invention, step 4 specifically comprises:
Step 4.1: the face picture I_pred obtained in step 3.3 and the frontal face I_gt are input into the discriminator in succession; the discriminator is structured as an auto-encoder, and its output is a face picture;
Step 4.2: the face pictures output by the discriminator for the generated face I_pred and the frontal face I_gt are denoted I_pred' and I_gt' respectively, and the pixel-wise errors are computed separately, i.e. the L1-norm error of every pixel in the picture is calculated and summed, as follows:

L(I_gt) = (1 / (W * H)) * Σ_{x=1..W} Σ_{y=1..H} | I_gt(x, y) - I_gt'(x, y) |

L(I_pred) = (1 / (W * H)) * Σ_{x=1..W} Σ_{y=1..H} | I_pred(x, y) - I_pred'(x, y) |

where W and H are the width and height of the face picture (both 128), and I_gt(x, y), I_pred(x, y), I_gt'(x, y) and I_pred'(x, y) are the gray values of I_gt, I_pred, I_gt' and I_pred' at coordinate (x, y);
Step 4.3: the loss function of the discriminator D is designed as:

L_D = L(I_gt) - k_t * L(I_pred)

where k_t manually controls how much attention is paid to the generated samples; the larger k_t, the stronger the discriminating ability the discriminator is set to have. During training, the discriminator minimizes L_D, i.e. it maximizes the pixel-wise error of the generated face while minimizing the pixel-wise error of the real face, thereby distinguishing the two;
Step 4.4: for the game between the discriminator and the generator, the loss function of the generator is designed as:

L_{G-D} = L(I_pred)

Contrary to the discriminator, the generator aims to make the generated face as close as possible to the real face, so its loss function minimizes the pixel-wise error of the generated face I_pred, ensuring the quality of the generated face;
Step 4.5: with the loss functions of step 4.3 and step 4.4, the discriminator and the generator keep competing and generating; both become more and more expressive, and the generated data become more and more realistic.
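A minimal sketch of this BEGAN-style objective is given below. The proportional update rule for k_t follows the original BEGAN formulation and is an assumption here; the patent only states that k_t is controlled manually:

```python
import torch

def recon_error(discriminator, img):
    """L(img): pixel-wise L1 reconstruction error of the auto-encoder discriminator."""
    return (discriminator(img) - img).abs().mean()

def discriminator_loss(discriminator, i_gt, i_pred, k_t):
    """L_D = L(I_gt) - k_t * L(I_pred); the generated face is detached so only D is updated."""
    return recon_error(discriminator, i_gt) - k_t * recon_error(discriminator, i_pred.detach())

def generator_d_loss(discriminator, i_pred):
    """L_{G-D} = L(I_pred): the generator minimizes the reconstruction error of its output."""
    return recon_error(discriminator, i_pred)

def update_k(k_t, loss_gt, loss_pred, gamma=0.5, lambda_k=1e-3):
    """Assumed BEGAN-style proportional control of k_t; `loss_gt` and `loss_pred` are the
    scalar values of L(I_gt) and L(I_pred) for the current batch."""
    k_t = k_t + lambda_k * (gamma * loss_gt - loss_pred)
    return float(min(max(k_t, 0.0), 1.0))
```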
As a further technical solution of the present invention, step 5 is specifically:
Step 5.1: the face picture I_pred obtained in step 3.3 and the corresponding frontal face I_gt are input into the classifier together (assuming the training set contains N faces, there are 2N pictures in total); the classifier C is a Light-CNN model trained in advance and acts as a separate structure that competes with the generator;
Step 5.2: the features extracted by the classifier from the frontal face I_gt and from the face I_pred produced by the generator are denoted C(I_gt) and C(I_pred) respectively. The result labels of the classifier are defined as 1 to 2N; the classification target of the classifier is to assign the frontal face pictures I_gt to the first N labels and the generated pictures I_pred to the last N labels. The corresponding cross-entropy loss functions are:

L_gt = - Σ_{j=1..2N} {l_gt}_j * log( {C(I_gt)}_j )

L_pred = - Σ_{j=1..2N} {l_pred}_j * log( {C(I_pred)}_j )

where j ∈ {1, ..., 2N}, l_gt and l_pred are the classifier labels of the frontal face and of the generated face picture respectively, {l_gt}_j denotes the correct label of the jth frontal face picture I_gt and {C(I_gt)}_j denotes the classification label output by the classifier for the frontal face picture; {l_pred}_j denotes the correct label of the jth generated face picture I_pred and {C(I_pred)}_j denotes the label output by the classifier after classifying it;
Step 5.3: the classifier loss function is designed as the weighted sum of the two cross-entropy losses above, with weight factor α, because the generated face lies relatively close to its corresponding frontal face picture in the feature space. The loss function of the classifier is formulated as follows:

L_C = L_gt + α * L_pred

Step 5.4: for the game between the generator and the classifier, the corresponding loss function of the generator is designed as:

L_{G-C} = d_cos( C(I_pred), C(I_gt) )

where the cosine distance d_cos is computed as:

d_cos( C(I_pred), C(I_gt) ) = 1 - ( C(I_pred) · C(I_gt) ) / ( ||C(I_pred)|| * ||C(I_gt)|| )

that is, the cosine distance between the features of the generated face picture and of the corresponding frontal face picture is minimized, so as to keep the identities of the two consistent;
Step 5.5: weighting the adversarial losses of the generator against the discriminator (step 4.4) and against the classifier (step 5.4) gives the generative adversarial loss of the generator used in step 3.3, rewritten as:

L_adv = λ_0 * L_{G-D} + λ_1 * L_{G-C}
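For illustration only, the classifier-side and identity-preserving terms above might be sketched as follows; the helper names, the use of torch.nn.functional and the example value of α are assumptions, and `features_*` stands for the feature vectors of the pretrained Light-CNN classifier:

```python
import torch
import torch.nn.functional as F

def classifier_loss(logits_gt, logits_pred, labels_gt, labels_pred, alpha=0.1):
    """L_C = L_gt + alpha * L_pred; frontal faces map to the first N class indices and
    generated faces to the last N (0-based indices of the 2N classes)."""
    return F.cross_entropy(logits_gt, labels_gt) + alpha * F.cross_entropy(logits_pred, labels_pred)

def generator_identity_loss(features_pred, features_gt):
    """L_{G-C}: cosine distance between the classifier features of I_pred and I_gt."""
    return 1.0 - F.cosine_similarity(features_pred, features_gt, dim=-1).mean()
```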
Step 5.6: the classifier and the generator each have their corresponding adversarial loss function and keep competing and generating; both become more and more expressive, and the generated data come closer and closer to the identity information of the original face.
As a further technical solution of the present invention, step 6 is specifically:
Step 6.1: storing the network model parameters obtained during training;
Step 6.2: extracting the feature points of the deflected face pictures of the test set by the same method as for the training set, in the Caffe deep learning environment;
Step 6.3: extracting the deflected-face region blocks (eyes, nose and mouth) from the feature points obtained in step 6.2, with sizes consistent with those in the training set, i.e. 40 × 40, 32 × 40 and 32 × 48 for the eyes, the nose and the mouth respectively;
Step 6.4: correcting the test-set face region blocks with the network model obtained in training; similarly, the corrected contour of the whole deflected face is also obtained;
Step 6.5: fusing the two outputs of the local channel and the global channel with the network model parameters obtained in training, keeping the loss function of the generator consistent with step 3.3, thereby obtaining the generated frontal face picture corresponding to the deflected face of the test set.
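A test-time sketch under these steps might look as follows; the attribute and method names on `model` are assumptions, with the model restored from the parameters saved in step 6.1:

```python
import torch

@torch.no_grad()
def frontalize(model, face_128, region_blocks):
    """`face_128` is the normalized deflected face tensor, `region_blocks` maps each
    region name to its cropped block (step 6.3, same sizes as in training)."""
    corrected = {k: model.local_generator(v) for k, v in region_blocks.items()}  # step 6.4
    local_face = model.fuse_local(corrected)      # zero-pad and take the pixel-wise maximum
    contour = model.global_generator(face_128)    # corrected whole-face contour
    return model.fuse(local_face, contour)        # step 6.5: final generated frontal face
```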
Beneficial effects: compared with the prior art, the technical scheme of the invention has the following beneficial technical effects:
1) First, unlike single-channel network structures, the invention adopts a two-channel network structure that recovers the face from two channels, local details and the whole face. Compared with feeding the face into the network as a single input, which loses detail, the two-channel structure better preserves facial detail information and therefore generates a more realistic frontal face picture.
2) Second, a dedicated structure is used to keep the identity features of the face consistent before and after correction. This structure participates in network training as a third component of the generative adversarial network, and experimental results show that it accelerates network convergence and better preserves face identity information.
3) Finally, the discriminator is designed as an auto-encoder whose output is still a face picture, and the Wasserstein distance between the pixel-wise error distributions of the generated face and the original face is minimized; the loss function of the discriminator is an energy loss. Unlike a conventional generative adversarial network, in which the discriminator outputs a discrete real/fake value, this continuous energy value is better able to generate high-resolution face pictures.
Drawings
FIG. 1 is a flow chart of the overall process of the present invention.
Fig. 2 shows the positions and serial numbers of the 68 contour feature points extracted in the Caffe environment.
Fig. 3 is a schematic diagram of the structure of the generator.
Fig. 4 is a schematic diagram of the structure of the discriminator.
Fig. 5 is a schematic diagram of a classifier structure.
Fig. 6 is a schematic diagram of the correction effect of the present invention on a Multi-PIE data set.
Fig. 7 shows frontalization results on the LFW test set, where (a) is an original face image from the LFW test set, (b) is the frontalization result obtained by the present invention, and (c) and (d) are the corresponding result images from other studies.
Detailed Description
The technical solution of the present invention will be further described in detail with reference to the following examples and accompanying drawings.
Spyder is selected as the programming tool under a Linux operating system, and the generative adversarial network model is built. This example is trained with 13 pictures of different poses under the same lighting conditions for each of 337 individuals from the Multi-PIE face database, and tested on the LFW deflected-face data set.
Fig. 1 is a schematic diagram of the network structure of the present invention; the specific steps are as follows:
Step 1: detect the facial feature points and extract fixed-size face region blocks (eyes, nose and mouth). The specific steps are as follows:
Step 1.1, uniformly normalizing the face size to 128 × 128 and building a Caffe deep learning environment;
Step 1.2, detecting the facial feature points in the Caffe deep learning environment using the keypoint extraction method proposed in "Combining Data-Driven and Model-Driven Methods for Robust Facial Landmark Detection" (IEEE Transactions on Information Forensics & Security, 2018), and locating five feature points: the centers of the two eyes, the nose tip, and the two corners of the mouth;
Step 1.3, obtaining each region block in the face (mouth, nose and eyes) from the five feature points of step 1.2; to ensure that the network trains smoothly, the region block sizes are kept constant across different faces, with the eye regions set to 40 × 40 and the nose and mouth set to 32 × 40 and 32 × 48 respectively.
Fig. 2 shows the positions and serial numbers of the 68 contour feature points extracted in the Caffe environment.
Step 2: input the face region blocks and the whole face obtained in step 1 into the local channel and the global channel respectively to obtain the corrected result of each channel. The specific steps are as follows:
Step 2.1, extracting face blocks of the eye, nose and mouth regions from the frontal face I_gt in the same way, obtaining Eyel_gt, Eyer_gt, Nose_gt and Mouth_gt respectively;
Step 2.2: for both the local channel and the global channel, the generator is designed as a U-Net structure so that the picture size remains unchanged before and after generation;
Step 2.3: for the local channel, the face region blocks obtained in step 1 are input into the local-channel generator to obtain the corrected face region blocks, namely Eyel_pred, Eyer_pred, Nose_pred and Mouth_pred.
The loss function of the local channel is designed as an L1 loss and consists of the following parts:

eyel_loss = (1 / (W_eyel * H_eyel)) * Σ_{x=1..W_eyel} Σ_{y=1..H_eyel} | Eyel_pred(x, y) - Eyel_gt(x, y) |

eyer_loss = (1 / (W_eyer * H_eyer)) * Σ_{x=1..W_eyer} Σ_{y=1..H_eyer} | Eyer_pred(x, y) - Eyer_gt(x, y) |

nose_loss = (1 / (W_nose * H_nose)) * Σ_{x=1..W_nose} Σ_{y=1..H_nose} | Nose_pred(x, y) - Nose_gt(x, y) |

mouth_loss = (1 / (W_mou * H_mou)) * Σ_{x=1..W_mou} Σ_{y=1..H_mou} | Mouth_pred(x, y) - Mouth_gt(x, y) |

where W_eyel, W_eyer, W_nose and W_mou are the widths of the left-eye, right-eye, nose and mouth regions (equal to 40, 40, 32 and 32 respectively) and H_eyel, H_eyer, H_nose and H_mou are the corresponding heights (equal to 40, 40, 40 and 48 respectively); Eyel_pred(x, y), Eyer_pred(x, y), Nose_pred(x, y) and Mouth_pred(x, y) are the gray values of the corrected left eye, right eye, nose and mouth at coordinate (x, y), and Eyel_gt(x, y), Eyer_gt(x, y), Nose_gt(x, y) and Mouth_gt(x, y) are the gray values of the corresponding region blocks of the frontal face at coordinate (x, y).
The loss function of the local channel is the sum of the losses of the region blocks; with λ_2 the weight of the local-channel loss, the local-channel loss function local_loss can be expressed as follows:
local_loss = λ_2 * (eyel_loss + eyer_loss + nose_loss + mouth_loss)
Step 2.4: for the global channel, the deflected face I_in is likewise input into an encoder structure with 3 × 3 convolution kernels, obtaining a 128 × 128 frontal face contour of the same size as the original face.
Step 3: fuse the corrected face regions output by the local channel with the whole-face contour output by the global channel, setting each pixel of an overlapping region to the maximum value over that region, to obtain the final generated face I_pred. The specific steps are as follows:
Step 3.1: based on the corrected region blocks obtained in step 2, each region block is zero-padded to a picture of the same size as the original face, yielding face pictures corresponding to the left eye, right eye, nose and mouth respectively;
Step 3.2: the face picture corresponding to the local channel is obtained from the four equally sized face pictures of step 3.1, as follows: the gray values of the pixels at the same coordinate in the four face pictures are compared, the maximum value is selected and taken as the gray value at that coordinate, and repeating this operation pixel by pixel finally yields the face picture corresponding to the local channel;
Step 3.3: the local-detail face picture obtained in the previous step is fused with the frontal face contour obtained by the global channel in step 2.4, and the final generated picture I_pred is obtained by training with several applied loss functions.
The loss functions applied in step 3.3 are specifically:
1) Pixel-wise L1 loss:

L_pixel = (1 / (W * H)) * Σ_{x=1..W} Σ_{y=1..H} | I_pred(x, y) - I_gt(x, y) |

where W and H are the width and height of the face picture (both 128), and I_pred(x, y) and I_gt(x, y) are the gray values of the generated face picture I_pred and of the corresponding frontal face at coordinate (x, y);
2) Generative adversarial loss:

L_adv = λ_0 * L_{G-D} + λ_1 * L_{G-C}

Since in the invention the generator plays an adversarial game with the discriminator and with the classifier separately, L_{G-D} and L_{G-C} in the formula above are the loss functions of the generator against the discriminator and against the classifier respectively. The adversarial loss of the generator is designed as the weighted sum of these two parts, addressing the quality of the generated picture and the preservation of identity consistency, with weights λ_0 and λ_1. The two parts are given by the following formulas, where I_pred' denotes the output of the generated face picture I_pred after passing through discriminator D, W and H are the width and height of the face picture (both 128), I_pred(x, y) and I_pred'(x, y) are the gray values of I_pred and I_pred' at coordinate (x, y), and C(·) denotes the features extracted by the classifier:

L_{G-D} = (1 / (W * H)) * Σ_{x=1..W} Σ_{y=1..H} | I_pred(x, y) - I_pred'(x, y) |

L_{G-C} = 1 - ( C(I_pred) · C(I_gt) ) / ( ||C(I_pred)|| * ||C(I_gt)|| )

3) Face symmetry loss:

L_sym = (1 / ((W/2) * H)) * Σ_{x=1..W/2} Σ_{y=1..H} | I_pred(x, y) - I_pred(W - (x - 1), y) |

where W and H are the width and height of the face picture (both 128), and I_pred(x, y) and I_pred(W - (x - 1), y) are the gray values of the generated face picture I_pred at coordinate (x, y) and at the horizontally mirrored coordinate (W - (x - 1), y);
4) Regularization loss: a regularization term that suppresses the noise present in the generated face picture.
Fig. 3 is a schematic diagram of the structure of the generator.
Step 4: input the generated face I_pred and the corresponding frontal face I_gt into the discriminator in sequence to ensure the accuracy of face generation. The specific steps are as follows:
Step 4.1: the face picture I_pred obtained in step 3.3 and the frontal face I_gt are input into the discriminator in succession; the discriminator is structured as an auto-encoder, and its output is a face picture;
Step 4.2: the face pictures output by the discriminator for the generated face I_pred and the frontal face I_gt are denoted I_pred' and I_gt' respectively, and the pixel-wise errors are computed separately, i.e. the L1-norm error of every pixel in the picture is calculated and summed, as follows:

L(I_gt) = (1 / (W * H)) * Σ_{x=1..W} Σ_{y=1..H} | I_gt(x, y) - I_gt'(x, y) |

L(I_pred) = (1 / (W * H)) * Σ_{x=1..W} Σ_{y=1..H} | I_pred(x, y) - I_pred'(x, y) |

where W and H are the width and height of the face picture (both 128), and I_gt(x, y), I_pred(x, y), I_gt'(x, y) and I_pred'(x, y) are the gray values of I_gt, I_pred, I_gt' and I_pred' at coordinate (x, y);
Step 4.3: the loss function of the discriminator D is designed as:

L_D = L(I_gt) - k_t * L(I_pred)

where k_t manually controls how much attention is paid to the generated samples; the larger k_t, the stronger the discriminating ability the discriminator is set to have. During training, the discriminator minimizes L_D, i.e. it maximizes the pixel-wise error of the generated face while minimizing the pixel-wise error of the real face, thereby distinguishing the two;
Step 4.4: for the game between the discriminator and the generator, the loss function of the generator is designed as:

L_{G-D} = L(I_pred)

Contrary to the discriminator, the generator aims to make the generated face as close as possible to the real face, so its loss function minimizes the pixel-wise error of the generated face I_pred, ensuring the quality of the generated face;
Step 4.5: with the loss functions of step 4.3 and step 4.4, the discriminator and the generator keep competing and generating; both become more and more expressive, and the generated data become more and more realistic.
Fig. 4 is a schematic diagram of the structure of the discriminator.
Step 5: input the generated face I_pred and the corresponding frontal face I_gt into the classifier at the same time to ensure identity consistency between the generated face and the input deflected face. The specific steps are as follows:
Step 5.1: the face picture I_pred obtained in step 3.3 and the corresponding frontal face I_gt are input into the classifier together (assuming the training set contains N faces, there are 2N pictures in total); the classifier C is a Light-CNN model trained in advance and acts as a separate structure that competes with the generator;
Step 5.2: the features extracted by the classifier from the frontal face I_gt and from the face I_pred produced by the generator are denoted C(I_gt) and C(I_pred) respectively. The result labels of the classifier are defined as 1 to 2N; the classification target of the classifier is to assign the frontal face pictures I_gt to the first N labels and the generated pictures I_pred to the last N labels. The corresponding cross-entropy loss functions are:

L_gt = - Σ_{j=1..2N} {l_gt}_j * log( {C(I_gt)}_j )

L_pred = - Σ_{j=1..2N} {l_pred}_j * log( {C(I_pred)}_j )

where j ∈ {1, ..., 2N}, l_gt and l_pred are the classifier labels of the frontal face and of the generated face picture respectively, {l_gt}_j denotes the correct label of the jth frontal face picture I_gt and {C(I_gt)}_j denotes the classification label output by the classifier for the frontal face picture; {l_pred}_j denotes the correct label of the jth generated face picture I_pred and {C(I_pred)}_j denotes the label output by the classifier after classifying it;
Step 5.3: the classifier loss function is designed as the weighted sum of the two cross-entropy losses above, with weight factor α, because the generated face lies relatively close to its corresponding frontal face picture in the feature space. The loss function of the classifier is formulated as follows:

L_C = L_gt + α * L_pred

Step 5.4: for the game between the generator and the classifier, the corresponding loss function of the generator is designed as:

L_{G-C} = d_cos( C(I_pred), C(I_gt) )

where the cosine distance d_cos is computed as:

d_cos( C(I_pred), C(I_gt) ) = 1 - ( C(I_pred) · C(I_gt) ) / ( ||C(I_pred)|| * ||C(I_gt)|| )

that is, the cosine distance between the features of the generated face picture and of the corresponding frontal face picture is minimized, so as to keep the identities of the two consistent;
Step 5.5: weighting the adversarial losses of the generator against the discriminator (step 4.4) and against the classifier (step 5.4) gives the generative adversarial loss of the generator used in step 3.3, rewritten as:

L_adv = λ_0 * L_{G-D} + λ_1 * L_{G-C}

Step 5.6: the classifier and the generator each have their corresponding adversarial loss function and keep competing and generating; both become more and more expressive, and the generated data come closer and closer to the identity information of the original face.
Fig. 5 is a schematic diagram of the structure of the classifier.
Fig. 6 is a schematic diagram of the correction effect of the present invention on the Multi-PIE data set.
Step 6: save the generative adversarial network model obtained by training and use it to correct deflected faces at test time. The specific steps are as follows:
Step 6.1: storing the network model parameters obtained during training;
Step 6.2: extracting the feature points of the deflected face pictures of the test set by the same method as for the training set, in the Caffe deep learning environment;
Step 6.3: extracting the deflected-face region blocks (eyes, nose and mouth) from the feature points obtained in step 6.2, with sizes consistent with those in the training set, i.e. 40 × 40, 32 × 40 and 32 × 48 for the eyes, the nose and the mouth respectively;
Step 6.4: correcting the test-set face region blocks with the network model obtained in training; similarly, the corrected contour of the whole deflected face is also obtained;
Step 6.5: fusing the two outputs of the local channel and the global channel with the network model parameters obtained in training, keeping the loss function of the generator consistent with step 3.3, thereby obtaining the generated frontal face picture corresponding to the deflected face of the test set.
Fig. 7 shows frontalization results on the LFW test set.
The invention is based on BEGAN among generative adversarial networks; this structure minimizes the Wasserstein distance between the pixel-wise error distributions of the faces before and after generation and avoids many problems of conventional generative adversarial networks. A two-channel network structure is used to accurately recover the face structure at both the local and the global level. Furthermore, with a three-player adversarial network structure, the classifier that preserves identity consistency competes with the generator as a separate structure rather than merely acting as a loss-function term supervising the generation process. This improves the accuracy and speed of deflected-face correction, and a good correction effect is obtained even under large-angle deflection.
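As a rough illustration of the three-player game described above, one training iteration could be organized as in the sketch below, reusing the loss sketches given earlier; the optimizer setup, the `features` accessor on the classifier and the weight values are assumptions for the example:

```python
def train_step(G, D, C, opt_g, opt_d, opt_c, batch, k_t, w):
    """One iteration: update discriminator D, classifier C, then generator G."""
    i_in, i_gt, labels_gt, labels_pred = batch
    i_pred = G(i_in)                                # two-channel generator output

    # 1) discriminator step: minimize L_D = L(I_gt) - k_t * L(I_pred)
    opt_d.zero_grad()
    loss_d = discriminator_loss(D, i_gt, i_pred, k_t)
    loss_d.backward()
    opt_d.step()

    # 2) classifier step: minimize L_C = L_gt + alpha * L_pred
    opt_c.zero_grad()
    loss_c = classifier_loss(C(i_gt), C(i_pred.detach()), labels_gt, labels_pred)
    loss_c.backward()
    opt_c.step()

    # 3) generator step: pixel, symmetry, regularization and both adversarial terms
    #    (the local-channel term is omitted here for brevity)
    opt_g.zero_grad()
    loss_g = (pixel_l1_loss(i_pred, i_gt)
              + w["sym"] * symmetry_loss(i_pred)
              + w["tv"] * tv_regularization(i_pred)
              + w["lambda_0"] * generator_d_loss(D, i_pred)
              + w["lambda_1"] * generator_identity_loss(C.features(i_pred), C.features(i_gt)))
    loss_g.backward()
    opt_g.step()
    return loss_g.item(), loss_d.item(), loss_c.item()
```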
The above description is only one embodiment of the present invention, but the scope of the present invention is not limited thereto. Modifications or substitutions that a person skilled in the art can readily conceive within the technical scope disclosed by the present invention shall fall within the scope of the present invention, which should therefore be defined by the protection scope of the claims.

Claims (7)

1. A method for correcting a deflected face based on an improved generative adversarial network structure, characterized by comprising the following steps:
step 1, detecting feature points of the input deflected face I_in and extracting fixed-size face region blocks, namely the eyes, the nose and the mouth;
step 2, inputting the face region blocks and the whole face obtained in step 1 into a local channel and a global channel respectively to obtain the corrected results of the two channels;
step 3, fusing the corrected face regions output by the local channel with the whole-face contour output by the global channel, setting each pixel of an overlapping region to the maximum value over that region, to obtain the final generated face I_pred;
step 4, inputting the generated face I_pred and the corresponding frontal face I_gt into the discriminator in sequence to ensure the accuracy of face generation, wherein I_gt is the frontal face corresponding to the input deflected face;
step 5, inputting the generated face I_pred and the corresponding frontal face I_gt into the classifier at the same time to ensure identity consistency between the generated face and the input deflected face;
step 6, saving the generative adversarial network model obtained by training and using it to correct deflected faces at test time.
2. The method for correcting a deflected face based on an improved generative adversarial network structure according to claim 1, wherein step 1 specifically comprises:
step 1.1, uniformly normalizing the face size to 128 × 128 and building a Caffe deep learning environment;
step 1.2, detecting the facial feature points in the Caffe deep learning environment and locating five feature points: the centers of the two eyes, the nose tip, and the two corners of the mouth;
step 1.3, obtaining each region block in the face, namely the mouth, the nose and the eyes, from the five feature points obtained in step 1.2, wherein the region block sizes are kept constant across different faces, the eye regions being set to 40 × 40 and the nose and mouth being set to 32 × 40 and 32 × 48 respectively.
3. The method for correcting a deflected face based on an improved generative adversarial network structure according to claim 2, wherein step 2 is specifically:
step 2.1, extracting face blocks of the eye, nose and mouth regions from the frontal face I_gt in the same way, obtaining Eyel_gt, Eyer_gt, Nose_gt and Mouth_gt respectively;
step 2.2: for both the local channel and the global channel, the generator is designed as a U-Net structure so that the picture size remains unchanged before and after generation;
step 2.3: for the local channel, the face region blocks obtained in step 1 are input into the local-channel generator to obtain the corrected face region blocks, namely Eyel_pred, Eyer_pred, Nose_pred and Mouth_pred, the loss function of the local channel being designed as an L1 loss consisting of the following parts:

eyel_loss = (1 / (W_eyel * H_eyel)) * Σ_{x=1..W_eyel} Σ_{y=1..H_eyel} | Eyel_pred(x, y) - Eyel_gt(x, y) |

eyer_loss = (1 / (W_eyer * H_eyer)) * Σ_{x=1..W_eyer} Σ_{y=1..H_eyer} | Eyer_pred(x, y) - Eyer_gt(x, y) |

nose_loss = (1 / (W_nose * H_nose)) * Σ_{x=1..W_nose} Σ_{y=1..H_nose} | Nose_pred(x, y) - Nose_gt(x, y) |

mouth_loss = (1 / (W_mou * H_mou)) * Σ_{x=1..W_mou} Σ_{y=1..H_mou} | Mouth_pred(x, y) - Mouth_gt(x, y) |

wherein W_eyel, W_eyer, W_nose and W_mou are the widths of the left-eye, right-eye, nose and mouth regions (equal to 40, 40, 32 and 32 respectively) and H_eyel, H_eyer, H_nose and H_mou are the corresponding heights (equal to 40, 40, 40 and 48 respectively); Eyel_pred(x, y), Eyer_pred(x, y), Nose_pred(x, y) and Mouth_pred(x, y) are the gray values of the corrected left eye, right eye, nose and mouth at coordinate (x, y), and Eyel_gt(x, y), Eyer_gt(x, y), Nose_gt(x, y) and Mouth_gt(x, y) are the gray values of the corresponding region blocks of the frontal face at coordinate (x, y);
the loss function of the local channel is the sum of the losses of the region blocks; with λ_2 the weight of the local-channel loss, the local-channel loss function local_loss can be expressed as follows:
local_loss = λ_2 * (eyel_loss + eyer_loss + nose_loss + mouth_loss)
step 2.4: for the global channel, the deflected face I_in is input into an encoder structure with 3 × 3 convolution kernels, obtaining a 128 × 128 frontal face contour of the same size as the original face.
4. The method for correcting a deflected face based on an improved generative adversarial network structure according to claim 3, wherein step 3 is specifically:
step 3.1: based on the corrected region blocks obtained in step 2, each region block is zero-padded to a picture of the same size as the original face, yielding face pictures corresponding to the left eye, right eye, nose and mouth respectively;
step 3.2: the face picture corresponding to the local channel is obtained from the four equally sized face pictures of step 3.1, the specific method being: the gray values of the pixels at the same coordinate in the four face pictures are compared, the maximum value is selected and taken as the gray value at that coordinate, and repeating this operation pixel by pixel finally yields the face picture corresponding to the local channel;
step 3.3: the local-detail face picture obtained in the previous step is fused with the frontal face contour obtained by the global channel in step 2.4, and the final generated picture I_pred is obtained by training with several applied loss functions;
the loss functions applied in step 3.3 are specifically:
1) pixel-wise L1 loss:

L_pixel = (1 / (W * H)) * Σ_{x=1..W} Σ_{y=1..H} | I_pred(x, y) - I_gt(x, y) |

wherein W and H are the width and height of the face picture (both 128), and I_pred(x, y) and I_gt(x, y) are the gray values of the generated face picture I_pred and of the corresponding frontal face at coordinate (x, y);
2) generative adversarial loss:

L_adv = λ_0 * L_{G-D} + λ_1 * L_{G-C}

wherein L_{G-D} and L_{G-C} in the formula above are the loss functions of the generator against the discriminator and against the classifier respectively; the adversarial loss of the generator is designed as the weighted sum of these two parts, addressing the quality of the generated picture and the preservation of identity consistency, with weights λ_0 and λ_1; the two parts are given by the following formulas, where I_pred' denotes the output of the generated face picture I_pred after passing through discriminator D, W and H are the width and height of the face picture (both 128), I_pred(x, y) and I_pred'(x, y) are the gray values of I_pred and I_pred' at coordinate (x, y), and C(·) denotes the features extracted by the classifier:

L_{G-D} = (1 / (W * H)) * Σ_{x=1..W} Σ_{y=1..H} | I_pred(x, y) - I_pred'(x, y) |

L_{G-C} = 1 - ( C(I_pred) · C(I_gt) ) / ( ||C(I_pred)|| * ||C(I_gt)|| )

3) face symmetry loss:

L_sym = (1 / ((W/2) * H)) * Σ_{x=1..W/2} Σ_{y=1..H} | I_pred(x, y) - I_pred(W - (x - 1), y) |

wherein W and H are the width and height of the face picture (both 128), and I_pred(x, y) and I_pred(W - (x - 1), y) are the gray values of the generated face picture I_pred at coordinate (x, y) and at the horizontally mirrored coordinate (W - (x - 1), y);
4) regularization loss: a regularization term that suppresses the noise present in the generated face picture.
5. The method for correcting a deflected face based on an improved generative adversarial network structure according to claim 4, wherein step 4 is specifically:
step 4.1: the face picture I_pred obtained in step 3.3 and the frontal face I_gt are input into the discriminator in succession, the discriminator being structured as an auto-encoder whose output is a face picture;
step 4.2: the face pictures output by the discriminator for the generated face I_pred and the frontal face I_gt are denoted I_pred' and I_gt' respectively, and the pixel-wise errors are computed separately, i.e. the L1-norm error of every pixel in the picture is calculated and summed, as follows:

L(I_gt) = (1 / (W * H)) * Σ_{x=1..W} Σ_{y=1..H} | I_gt(x, y) - I_gt'(x, y) |

L(I_pred) = (1 / (W * H)) * Σ_{x=1..W} Σ_{y=1..H} | I_pred(x, y) - I_pred'(x, y) |

wherein W and H are the width and height of the face picture (both 128), and I_gt(x, y), I_pred(x, y), I_gt'(x, y) and I_pred'(x, y) are the gray values of I_gt, I_pred, I_gt' and I_pred' at coordinate (x, y);
step 4.3: the loss function of the discriminator D is designed as:

L_D = L(I_gt) - k_t * L(I_pred)

wherein k_t manually controls how much attention is paid to the generated samples; during training the discriminator is required to minimize L_D, i.e. to maximize the pixel-wise error of the generated face while minimizing the pixel-wise error of the real face, thereby distinguishing the two;
step 4.4: for the game between the discriminator and the generator, the loss function of the generator is designed as:

L_{G-D} = L(I_pred)

the loss function of the generator minimizes the pixel-wise error of the generated face I_pred, ensuring the quality of the generated face;
step 4.5: according to the loss functions of step 4.3 and step 4.4, the discriminator and the generator continue to compete and generate until training reaches a preset condition.
6. The method for correcting a deflected face based on an improved generative adversarial network structure according to claim 5, wherein step 5 is specifically:
step 5.1: the face picture I_pred obtained in step 3.3 and the corresponding frontal face I_gt are input into the classifier together, wherein, assuming the training set contains N faces, there are 2N pictures in total; the classifier C is a Light-CNN model trained in advance and acts as a separate structure that competes with the generator;
step 5.2: the features extracted by the classifier from the frontal face I_gt and from the face I_pred produced by the generator are denoted C(I_gt) and C(I_pred) respectively; the result labels of the classifier are defined as 1 to 2N, the classification target of the classifier being to assign the frontal face pictures I_gt to the first N labels and the generated pictures I_pred to the last N labels, with the corresponding cross-entropy loss functions:

L_gt = - Σ_{j=1..2N} {l_gt}_j * log( {C(I_gt)}_j )

L_pred = - Σ_{j=1..2N} {l_pred}_j * log( {C(I_pred)}_j )

wherein j ∈ {1, ..., 2N}, l_gt and l_pred are the classifier labels of the frontal face and of the generated face picture respectively, {l_gt}_j denotes the correct label of the jth frontal face picture I_gt and {C(I_gt)}_j denotes the classification label output by the classifier for the frontal face picture; {l_pred}_j denotes the correct label of the jth generated face picture I_pred and {C(I_pred)}_j denotes the label output by the classifier after classifying it;
step 5.3: the classifier loss function is designed as the weighted sum of the two cross-entropy loss functions, with α a weight factor, and the loss function of the classifier is expressed as:

L_C = L_gt + α * L_pred

step 5.4: for the game between the generator and the classifier, the corresponding loss function of the generator is designed as:

L_{G-C} = d_cos( C(I_pred), C(I_gt) )

wherein the cosine distance d_cos is computed as:

d_cos( C(I_pred), C(I_gt) ) = 1 - ( C(I_pred) · C(I_gt) ) / ( ||C(I_pred)|| * ||C(I_gt)|| )

namely, the cosine distance between the features of the generated face picture and of the corresponding frontal face picture is minimized;
step 5.5: weighting the adversarial losses of the generator against the discriminator (step 4.4) and against the classifier (step 5.4) gives the generative adversarial loss of the generator used in step 3.3, rewritten as:

L_adv = λ_0 * L_{G-D} + λ_1 * L_{G-C}

step 5.6: the classifier and the generator each have a corresponding adversarial loss function, and the classifier and the generator continue to compete and generate until training reaches a preset condition.
7. The method for correcting a deflected face based on an improved generative adversarial network structure according to claim 6, wherein step 6 specifically comprises:
step 6.1: storing the network model parameters obtained during training;
step 6.2: extracting the feature points of the deflected face pictures of the test set by the same method as for the training set, in the Caffe deep learning environment;
step 6.3: extracting the deflected-face region blocks, namely the eyes, the nose and the mouth, from the feature points obtained in step 6.2, with sizes consistent with those in the training set, i.e. 40 × 40, 32 × 40 and 32 × 48 for the eyes, the nose and the mouth respectively;
step 6.4: correcting the test-set face region blocks with the network model obtained in training, whereby the corrected contour of the whole deflected face is likewise obtained;
step 6.5: fusing the two outputs of the local channel and the global channel with the network model parameters obtained in training, keeping the loss function of the generator consistent with step 3.3, thereby obtaining the generated frontal face picture corresponding to the deflected face of the test set.
CN202010269281.6A 2020-04-08 2020-04-08 Method for correcting a deflected face based on an improved generative adversarial network structure Active CN111523406B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010269281.6A CN111523406B (en) Method for correcting a deflected face based on an improved generative adversarial network structure

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010269281.6A CN111523406B (en) Method for correcting a deflected face based on an improved generative adversarial network structure

Publications (2)

Publication Number Publication Date
CN111523406A CN111523406A (en) 2020-08-11
CN111523406B true CN111523406B (en) 2023-04-18

Family

ID=71902548

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010269281.6A Active CN111523406B (en) Method for correcting a deflected face based on an improved generative adversarial network structure

Country Status (1)

Country Link
CN (1) CN111523406B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112288861B (en) * 2020-11-02 2022-11-25 湖北大学 Single-photo-based automatic construction method and system for three-dimensional model of human face

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110738161A (en) * 2019-10-12 2020-01-31 电子科技大学 face image correction method based on improved generation type confrontation network

Patent Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110738161A (en) * 2019-10-12 2020-01-31 电子科技大学 face image correction method based on improved generation type confrontation network

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Rui Huang et al. Beyond Face Rotation: Global and Local Perception GAN for Photorealistic and Identity Preserving Frontal View Synthesis. 2017 IEEE International Conference on Computer Vision (ICCV), 2017, full text. *

Also Published As

Publication number Publication date
CN111523406A (en) 2020-08-11

Similar Documents

Publication Publication Date Title
CN106815566B (en) Face retrieval method based on multitask convolutional neural network
Li et al. Robust visual tracking based on convolutional features with illumination and occlusion handing
WO2022111236A1 (en) Facial expression recognition method and system combined with attention mechanism
CN108629336B (en) Face characteristic point identification-based color value calculation method
CN110852182B (en) Depth video human body behavior recognition method based on three-dimensional space time sequence modeling
US20070242856A1 (en) Object Recognition Method and Apparatus Therefor
Tang et al. Facial landmark detection by semi-supervised deep learning
CN105335719A (en) Living body detection method and device
CN111783748A (en) Face recognition method and device, electronic equipment and storage medium
CN115830652B (en) Deep palm print recognition device and method
CN110991258B (en) Face fusion feature extraction method and system
CN111062328A (en) Image processing method and device and intelligent robot
Chen et al. Silhouette-based object phenotype recognition using 3D shape priors
CN112836680A (en) Visual sense-based facial expression recognition method
CN113312973A (en) Method and system for extracting features of gesture recognition key points
CN112036260A (en) Expression recognition method and system for multi-scale sub-block aggregation in natural environment
CN111914643A (en) Human body action recognition method based on skeleton key point detection
CN105912126A (en) Method for adaptively adjusting gain, mapped to interface, of gesture movement
CN112329516A (en) Method, device and medium for detecting wearing of mask of driver based on key point positioning and image classification
Zhou et al. MTCNet: Multi-task collaboration network for rotation-invariance face detection
CN115205933A (en) Facial expression recognition method, device, equipment and readable storage medium
Iqbal et al. Facial expression recognition with active local shape pattern and learned-size block representations
CN111523406B (en) Method for correcting a deflected face based on an improved generative adversarial network structure
CN110598647B (en) Head posture recognition method based on image recognition
Lv et al. A spontaneous facial expression recognition method using head motion and AAM features

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant