CN110543846B - Multi-pose face image frontalization method based on a generative adversarial network - Google Patents

Multi-pose face image frontalization method based on a generative adversarial network

Info

Publication number
CN110543846B
Authority
CN
China
Prior art keywords: face image, image, face, network, synthesized
Prior art date
2019-08-29
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Expired - Fee Related
Application number
CN201910806159.5A
Other languages
Chinese (zh)
Other versions
CN110543846A (en)
Inventor
张星明 (Zhang Xingming)
容昌乐 (Rong Changle)
林育蓓 (Lin Yubei)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
South China University of Technology SCUT
Original Assignee
South China University of Technology SCUT
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
2019-08-29
Publication date
2021-12-17
Application filed by South China University of Technology SCUT filed Critical South China University of Technology SCUT
Priority to CN201910806159.5A priority Critical patent/CN110543846B/en
Publication of CN110543846A publication Critical patent/CN110543846A/en
Application granted granted Critical
Publication of CN110543846B publication Critical patent/CN110543846B/en
Expired - Fee Related legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Images

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00: Pattern recognition
    • G06F18/20: Analysing
    • G06F18/21: Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214: Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00: Computing arrangements based on biological models
    • G06N3/02: Neural networks
    • G06N3/04: Architecture, e.g. interconnection topology
    • G06N3/045: Combinations of networks
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00: Computing arrangements based on biological models
    • G06N3/02: Neural networks
    • G06N3/04: Architecture, e.g. interconnection topology
    • G06N3/048: Activation functions
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00: Computing arrangements based on biological models
    • G06N3/02: Neural networks
    • G06N3/08: Learning methods
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00: Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10: Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16: Human faces, e.g. facial parts, sketches or expressions
    • G06V40/168: Feature extraction; Face representation

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Computing Systems (AREA)
  • Computational Linguistics (AREA)
  • Biophysics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Biomedical Technology (AREA)
  • Molecular Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Oral & Maxillofacial Surgery (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Human Computer Interaction (AREA)
  • Multimedia (AREA)
  • Image Analysis (AREA)
  • Image Processing (AREA)

Abstract

The invention discloses a multi-pose face frontalization method based on a generative adversarial network. In the testing stage after training, the method corrects input face images of various poses into frontal face images. The corrected images are clear and retain the identity features of the original face, so they can be used for face recognition. The method effectively mitigates the negative influence of pose on face recognition and supports the practical application of face recognition under unconstrained conditions.

Description

Multi-pose face image frontalization method based on a generative adversarial network
Technical Field
The invention relates to the technical field of image processing, and in particular to a multi-pose face image frontalization method based on a generative adversarial network.
Background
At present, face recognition technology is widely applied in many fields such as access control and security, social networking, and finance. However, most face recognition systems work reliably only under strictly standardized conditions: the subject must be in a scene with sufficient and uniform illumination, keep a neutral expression, and cooperate with the image acquisition device by adopting a standard pose. In many practical applications, such as tracking criminal suspects, these conditions are often difficult to satisfy, which greatly degrades the performance of many face recognition techniques and hinders their adoption in these fields. Among the adverse factors affecting face recognition performance, the pose of the photographed face is the most important. Handling the pose problem well would be a major step toward applying face recognition in unconstrained environments.
One way to deal with the pose problem is to frontalize the input profile face image, that is, to correct a profile face image into a frontal face image of the same person, and then recognize the identity from the synthesized frontal image. At present, most multi-pose face frontalization methods cannot handle face images whose yaw angle exceeds 60°; the face images they synthesize are severely deformed and lose the identity features of the person, making subsequent face recognition difficult. The more effective multi-pose face frontalization methods are based on generative adversarial networks.
Compared with other multi-pose face frontalization methods based on generative adversarial networks, the present method adopts a different network structure and different loss functions. Even when the input face image has a yaw angle exceeding 60°, the model can synthesize a realistic frontal face image while retaining more identity information, which greatly improves the effectiveness of subsequent face recognition.
Disclosure of Invention
The invention aims to overcome the shortcomings of existing multi-pose face frontalization methods by providing a multi-pose face frontalization method based on a generative adversarial network. It addresses the poor performance of similar methods when the yaw angle of the input face exceeds 60°, improves the fidelity of the synthesized face, and retains more of the identity information of the input face image.
To achieve the above purpose, the technical scheme provided by the invention is as follows: a multi-pose face image frontalization method based on a generative adversarial network, comprising the following steps:
1) Collect face images of all poses as a training set and a test set, ensuring that for each input face image I_a of arbitrary pose, a non-synthesized frontal face image I_g of the same person can be found in the data set.
2) In the training stage, input a face image I_a of arbitrary pose from the training set into the generator G to obtain a corrected code X_2 and a synthesized frontal face image I_f; input the non-synthesized frontal face image I_g into the generator G to obtain a frontal code X_3.
3) Input the synthesized frontal face image I_f or the non-synthesized frontal face image I_g into a discrimination network D, which judges whether the input face image is synthesized; also input the synthesized frontal face image I_f or the non-synthesized frontal face image I_g into a face identity feature extractor F, which extracts the identity features of the face image.
4) Substitute the discrimination result of step 3), the extracted identity features, the synthesized frontal face image I_f, the non-synthesized frontal face image I_g, the corrected code X_2, and the frontal code X_3 into the pre-designed loss functions, and train the generator G and the discrimination network D alternately until training is finished.
5) In the testing stage, input a face image I_a of arbitrary pose into the trained generator G to obtain a synthesized frontal face image I_f; the effect is verified by direct observation of the synthesized frontal image I_f.
In step 1), all face images used in the data set are derived from the data set Multi_Pie. The data set contains more than 750,000 pictures, covering 337 individuals under 20 illuminations and 15 poses. The illumination of the pictures changes from dark to light as the illumination index goes from 01 to 20, where illumination index 07 is the standard illumination condition. All face images of the data set are denoted I_a; for each image I_a, the image of the same person with a face yaw angle of 0° and illumination index 07 is found in the data set and denoted I_g.
In step 2), the generator G consists of a pose estimator P, an encoder En, a coding residual network R, and a decoder De. The pose estimator P estimates the face pose using the PnP algorithm to obtain the yaw angle of the face; it is implemented with the function cv2.solvePnP() of the open-source OpenCV library. The encoder En is a convolutional neural network, the coding residual network R is a two-layer fully connected neural network, and the decoder De is a deconvolution neural network.
the process of the generator G for synthesizing the picture is as follows: inputting human face image I with any posture to generator GaThe encoder En converts it to the initial code X1The coding residual network R is formed by initially coding X1Estimating a coded residual R (X)1) The pose estimator P calculates the accurate deflection angle gamma of the face in the yaw direction in the input image, the deflection angle gamma inputs the function Y to obtain the weight Y (gamma) of the coding residual error, and the initial coding X1Fusing with the coding residual to obtain a modified code X2Wherein X is2=X1+Y(γ)×R(X1) Encoding the correction X2Input to a decoder De, and generate a front face image I by deconvolutionf
In step 3), the discrimination network D is a classifier based on a convolutional neural network that determines whether the input image comes from the generator G or from the original image data. The face identity feature extractor F adopts the open-source Light-CNN-29, a lightweight convolutional neural network with a depth of 29 layers and about 12 million parameters.
In step 4), the objective of the loss functions is to minimize the difference between the synthesized frontal face image I_f and the non-synthesized frontal face image I_g, so that the synthesized frontal face image I_f retains more identity information of the input face image. Besides the pixel loss, identity loss, symmetry loss, and adversarial loss commonly used in methods of the same type, the loss functions used in step 4) include a newly devised coding loss. The first objective of the coding loss is to make the codes obtained by the encoder En for the input face image I_a and the non-synthesized frontal face image I_g closer: because existing face pose correction methods work better when the yaw angle of the input face is smaller, the closer the code of the input face image is to the code of the same face at 0° yaw, the better the frontalization result. Thus, the first part of the coding loss is formulated as follows:
$$L_{code1} = \frac{1}{N} \sum_{i=1}^{N} \left| En(I_a)_i + R(En(I_a))_i - X_3^i \right|$$
where N is the dimension of each code; En is the encoder; En(I_a)_i is the value of the i-th dimension of the initial code En(I_a); R is the coding residual network; R(En(I_a))_i is the value of the i-th dimension of the coding residual R(En(I_a)); En(I_a)_i + R(En(I_a))_i equals the value of the i-th dimension of the corrected code; and X_3^i is the value of the i-th dimension of the frontal code X_3 obtained by the encoder from the non-synthesized frontal face image. The first part of the coding loss is thus the Manhattan distance between the corrected code and the frontal code.
the second objective of the coding loss function is to differentiate between different characters in the modified code En (I)a)+R(En(Ia) A full-connected layer C is constructed after the full-connected layer C, the number of neurons of the full-connected layer C is equal to the number M of people in the training set, and the second part of the coding loss function adopts a cross-entropy loss function:
$$L_{code2} = -\sum_{i=1}^{M} y_i \log C\!\left(En(I_a) + R(En(I_a))\right)_i$$
where M is the number of persons in the training set; y_i is the value of the i-th dimension of a one-hot vector y of dimension M that indicates which person in the training set the input face image I_a belongs to: if image I_a belongs to the j-th person, the j-th dimension of y is 1 and the remaining dimensions are 0; En(I_a) + R(En(I_a)) is the corrected code; and C(En(I_a) + R(En(I_a)))_i is the value of the i-th dimension of the feature vector obtained by passing the corrected code through the fully connected layer C.
Thus, the complete coding loss is:

$$L_{code} = L_{code1} + \lambda L_{code2}$$

where λ is a weighting constant with value 0.1.
In step 4), the generator G and the discrimination network D are trained alternately so that the two optimize and improve each other through competition. In the initial stage, the face images generated by the generator G are blurred, and the discrimination network D can easily judge the source of an input image, which pushes the generator G to generate clearer images and thus improves the generator. In later stages, the images generated by the generator G become clearer and closer to the original image data, which pushes the discrimination network to judge input images more accurately and thus improves the discrimination ability of the network D.
Compared with the prior art, the invention has the following advantages and beneficial effects:
1. The invention makes full use of the yaw angle information of the input face image and defines a corresponding coding loss, which helps synthesize frontal face images of higher quality.
2. Even when the yaw angle of the input face image exceeds 60°, the invention can generate clear, realistic, and undeformed frontal face images.
3. The frontal face image synthesized by the invention retains the identity information of the input face image, which helps reduce the adverse effect of face pose variation on face recognition and facilitates subsequent face identification.
4. In terms of practical application scenarios, the method is expected to advance fields such as suspect tracking: correcting a profile image of a target person into a frontal image improves the efficiency of the related work.
Drawings
FIG. 1 is a flow chart of the method of the present invention.
Fig. 2 is a flow diagram of image synthesis by the generator.
Fig. 3 is a diagram of an encoder neural network architecture.
Fig. 4 is a diagram of a decoder neural network architecture.
Fig. 5 is a diagram of a discrimination network structure.
FIG. 6 is a diagram showing the effect of the present invention.
Detailed Description
To describe the present invention more specifically, the technical solution is explained in detail below in conjunction with the steps, the accompanying drawings, and a specific embodiment.
As shown in fig. 1, the multi-pose face image frontalization method based on a generative adversarial network provided by this embodiment comprises the following steps:
1) Collect face images of all poses as a training set and a test set, ensuring that for each input face image I_a of arbitrary pose, a non-synthesized frontal face image I_g of the same person with a face yaw angle of 0° can be found in the data set.
The images of the data set are derived from the Multi_Pie face data set, which contains more than 750,000 pictures covering 337 individuals under 20 illuminations and 15 poses. The illumination of the pictures changes from dark to light as the illumination index goes from 01 to 20, where index 07 is the standard illumination condition. All face images of the data set are denoted I_a; for each image I_a, the image of the same person with a face yaw angle of 0° and illumination index 07 is found in the data set and denoted I_g. Before use, the data set undergoes preprocessing steps such as face detection and face cropping. Images in the 13 poses with yaw angle within 90°, under all lighting conditions, are selected as the data set; the images of the first 200 persons form the training set and the images of the remaining 137 persons form the test set. All images of the data set are normalized and resized: normalization divides the values of all pixels by 255.0 so that they lie in [0, 1], and resizing adjusts the dimensions of all images to 128 × 128 × 3 using bilinear interpolation.
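These two operations can be sketched as follows (a minimal illustration assuming OpenCV and NumPy; the path handling and the preceding face detection and cropping steps are outside this snippet):

```python
import cv2
import numpy as np

def preprocess_face(image_path):
    """Resize a cropped face image to 128x128x3 with bilinear interpolation
    and normalize all pixel values to the range [0, 1]."""
    img = cv2.imread(image_path)                                    # uint8 BGR
    img = cv2.resize(img, (128, 128), interpolation=cv2.INTER_LINEAR)
    return img.astype(np.float32) / 255.0
```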
2) In the training stage, input a face image I_a of arbitrary pose from the training set into the generator G to obtain the corrected code X_2 and the synthesized frontal face image I_f; input the non-synthesized frontal face image I_g into the generator G to obtain the frontal code X_3.
The generator G converts the input face image I_a into a synthesized frontal face image I_f. It consists of a pose estimator P, an encoder En, a coding residual network R, and a decoder De. As shown in fig. 2, the generator G synthesizes a picture as follows: a face image I_a of arbitrary pose is input to the generator G; the encoder En converts it to the initial code X_1; the coding residual network R estimates a coding residual R(X_1) from the initial code X_1; the pose estimator P calculates the yaw angle γ of the face in the input image; the yaw angle γ is input to the function Y to obtain the weight Y(γ) of the coding residual; the initial code X_1 is fused with the coding residual to obtain the corrected code X_2, where X_2 = X_1 + Y(γ) × R(X_1); and the corrected code X_2 is input to the decoder De, which generates the frontal face image I_f by deconvolution.
The pose estimator P solves for the yaw angle γ of the face in the input face image, from which the weight Y(γ) of the coding residual is obtained. [The formula for the weight function Y(γ) appears only as an image in the original publication and is not recoverable from this text.]
the pose estimator P estimates the pose of the face using the PnP algorithm. The cv2.solvepnp () function in opencv library is used directly at the time of implementation. The parameters required by the function comprise 2D characteristic points of the face image and 3D positions corresponding to the characteristic points. The 2D characteristic points of the human face are directly obtained by a human face characteristic point detection algorithm provided by an open source dlib library. The 3D positions of the individual feature points are the positions of the feature points on the average face model, and these 3D positions are fixed and provided by the relevant documents of the 2D feature point detection algorithm in the dlib library.
The encoder En is a convolutional neural network whose structure is shown in fig. 3; its role is to convert the input face image I_a into the initial code X_1. The dimension of the input image I_a is 128 × 128 × 3, the activation function of the last layer of the encoder is Maxout, and the dimension of the output initial code X_1 is 256.
The coding residual network R is a two-layer fully connected neural network with 256 neurons in each layer. Let X_3 be the code that the encoder obtains from a non-synthesized frontal face image I_g. The role of the coding residual network R is to estimate the coding residual R(X_1) between the initial code X_1 and the frontal code X_3. The coding residual is multiplied by a weight and fused with the initial code X_1 to obtain the corrected code X_2, where X_2 = X_1 + Y(γ) × R(X_1).
The decoder De is a deconvolution neural network whose structure is shown in fig. 4. Its role is to decode the corrected code X_2 into the synthesized frontal face image I_f through deconvolution steps. For existing frontalization methods based on generative adversarial networks, the smaller the yaw angle of the input face, the higher the quality of the synthesized face image and the more identity information it retains. Because the corrected code X_2 is closer to the frontal code X_3 than the initial code X_1 is, feeding the decoder X_2 instead of X_1 yields a synthesized face image of higher quality.
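A minimal sketch of the generator's code-fusion step follows (TensorFlow 2 style for brevity; `encoder`, `residual_net`, and `decoder` stand in for the networks of figs. 3 and 4, and the linear form of the weight function Y is an assumption, since its exact formula is not legible in this text):

```python
import tensorflow as tf

def yaw_weight(gamma):
    # Assumed form of Y: the farther the face is from frontal (gamma in
    # degrees), the more of the coding residual is applied.
    return tf.clip_by_value(abs(gamma) / 90.0, 0.0, 1.0)

def generator_forward(encoder, residual_net, decoder, face_img, gamma):
    """Synthesize a frontal face image I_f from an arbitrary-pose input."""
    x1 = encoder(face_img)            # initial code X1 (256-dimensional)
    r = residual_net(x1)              # coding residual R(X1)
    x2 = x1 + yaw_weight(gamma) * r   # corrected code X2 = X1 + Y(gamma)*R(X1)
    i_f = decoder(x2)                 # synthesized frontal face image I_f
    return i_f, x2
```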
3) Input the synthesized frontal face image I_f or the non-synthesized frontal face image I_g into the discrimination network D, which judges whether the input face image is synthesized. Then input the synthesized frontal face image I_f or the non-synthesized frontal face image I_g into the face identity feature extractor F to obtain the identity features of the input face image.
The discrimination network D is a classifier based on a convolutional neural network; its structure is shown in fig. 5. Its role is to determine whether the input image comes from the generator G or from the original image data. The final output value of the discrimination network represents the probability that the input image comes from the generator G; the larger this value, the more likely the input image was synthesized.
The face identity feature extractor F adopts the open-source Light-CNN-29, a lightweight convolutional neural network with a depth of 29 layers and about 12 million parameters. It extracts the identity features of the input face picture; the dimension of the final extracted identity feature is 256.
4) Substitute the discrimination result of step 3), the extracted identity features, the synthesized frontal face image I_f, the non-synthesized frontal face image I_g, the corrected code X_2, and the frontal code X_3 into the pre-designed loss functions, and train the generator G and the discrimination network D alternately until training is finished.
Besides the pixel loss, identity loss, symmetry loss, and adversarial loss commonly used in methods of the same type, the loss functions for training the generator include a newly devised coding loss.
The first is the pixel loss, which measures the pixel difference between the synthesized frontal face image I_f and the non-synthesized frontal face image I_g:

$$L_{pixel} = \frac{1}{W \times H} \sum_{x=1}^{W} \sum_{y=1}^{H} \left| I_f^{(x,y)} - I_g^{(x,y)} \right|$$

where W and H are the width and height of the image, and I_f^{(x,y)} and I_g^{(x,y)} are the pixel values of the synthesized frontal face image I_f and the non-synthesized frontal face image I_g at coordinate (x, y).
Next is the symmetry loss: given the symmetry of the human face, the synthesized face image I_f should be as close as possible to the image I_sym obtained by flipping it left-to-right:

$$L_{sym} = \frac{1}{W \times H} \sum_{x=1}^{W} \sum_{y=1}^{H} \left| I_f^{(x,y)} - I_{sym}^{(x,y)} \right|$$

where W and H are the width and height of the image, and I_f^{(x,y)} and I_sym^{(x,y)} are the pixel values of the synthesized frontal face image I_f and its left-right flipped image I_sym at coordinate (x, y).
Then comes the identity loss. The face identity feature extractor F efficiently extracts the identity features of frontal face images: the synthesized frontal face image I_f and the non-synthesized frontal face image I_g are input to F separately, yielding their identity features F(I_f) and F(I_g). To ensure that the synthesized frontal face image I_f contains the identity information of the non-synthesized frontal face image I_g, the Manhattan distance between the feature maps of the last two layers of F is minimized:

$$L_{id} = \sum_{i=1}^{2} \frac{1}{W_i \times H_i} \sum_{x=1}^{W_i} \sum_{y=1}^{H_i} \left| F_i(I_f)^{(x,y)} - F_i(I_g)^{(x,y)} \right|$$

where W_i and H_i are the width and height of the identity feature map of the i-th last layer, F is the face identity feature extractor, and F_i(I_f)^{(x,y)} and F_i(I_g)^{(x,y)} are the values at coordinate (x, y) of the i-th last-layer identity feature maps of the frontal face image I_f and the non-synthesized frontal face image I_g, respectively.
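A sketch of this loss follows (the helper `lightcnn_feature_maps`, returning the feature maps of the last two layers of Light-CNN-29, is a hypothetical stand-in for the pretrained extractor F):

```python
import tensorflow as tf

def identity_loss(lightcnn_feature_maps, i_f, i_g):
    """L_id: Manhattan distance between the last-two-layer identity
    feature maps of the synthesized and real frontal faces."""
    feats_f = lightcnn_feature_maps(i_f)   # [second-to-last, last] feature maps
    feats_g = lightcnn_feature_maps(i_g)
    return tf.add_n([tf.reduce_mean(tf.abs(ff - fg))
                     for ff, fg in zip(feats_f, feats_g)])
```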
Finally, the adversarial loss aims to make the synthesized image confuse the discrimination network D, bringing the synthesized image closer to a real image and enhancing its fidelity:

$$L_{adv} = \frac{1}{N} \sum_{n=1}^{N} \log D\!\left(G\!\left(I_a^{(n)}\right)\right)$$

where N is the size of the current training batch, G and D are the generator and the discrimination network respectively, and I_a and G(I_a) are the input face image and the frontal face image synthesized by the generator. The value D(G(I_a)) reflects how likely the discrimination network D judges the synthesized image G(I_a) to be a synthesized picture. Minimizing the loss L_adv drives the frontal face images G(I_a) synthesized by the generator to pass the discrimination network's inspection, improving their realism.
The first objective of the coding loss is to make the codes obtained by the encoder En for the input face image I_a and the non-synthesized frontal face image I_g closer: because existing face pose correction methods work better when the yaw angle of the input face is smaller, the closer the code of the input face image is to the code of the same face at 0° yaw, the better the frontalization result. Thus, the first part of the coding loss is formulated as follows:

$$L_{code1} = \frac{1}{N} \sum_{i=1}^{N} \left| En(I_a)_i + R(En(I_a))_i - X_3^i \right|$$

where N is the dimension of each code; En is the encoder; En(I_a)_i is the value of the i-th dimension of the initial code En(I_a); R is the coding residual network; R(En(I_a))_i is the value of the i-th dimension of the coding residual R(En(I_a)); En(I_a)_i + R(En(I_a))_i equals the value of the i-th dimension of the corrected code; and X_3^i is the value of the i-th dimension of the frontal code X_3 obtained by the encoder from the non-synthesized frontal face image. The first part of the coding loss is thus the Manhattan distance between the corrected code and the frontal code.
The second objective of the coding loss is to differentiate the corrected codes of different persons. A fully connected layer C is constructed after the corrected code En(I_a) + R(En(I_a)); the number of neurons of the fully connected layer C equals the number of persons M in the training set, and the second part of the coding loss adopts a cross-entropy loss:

$$L_{code2} = -\sum_{i=1}^{M} y_i \log C\!\left(En(I_a) + R(En(I_a))\right)_i$$

where M is the number of persons in the training set; y_i is the value of the i-th dimension of a one-hot vector y of dimension M that indicates which person in the training set the input face image I_a belongs to: if image I_a belongs to the j-th person, the j-th dimension of y is 1 and the remaining dimensions are 0; En(I_a) + R(En(I_a)) is the corrected code; and C(En(I_a) + R(En(I_a)))_i is the value of the i-th dimension of the feature vector obtained by passing the corrected code through the fully connected layer C.
Thus, the complete coding loss is:

$$L_{code} = L_{code1} + \lambda L_{code2}$$

where λ is a weighting constant with value 0.1.
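Both parts of the coding loss can be sketched as follows (TensorFlow 2 style; `fc_c` stands for the fully connected layer C with M output neurons, applied to the batch of corrected codes):

```python
import tensorflow as tf

def coding_loss(x2, x3, fc_c, y_onehot, lam=0.1):
    """L_code = L_code1 + lambda * L_code2."""
    # Part 1: Manhattan distance between corrected code X2 and frontal code X3.
    l_code1 = tf.reduce_mean(tf.abs(x2 - x3))
    # Part 2: cross-entropy over the M identities of the training set.
    logits = fc_c(x2)                                  # shape [batch, M]
    l_code2 = tf.reduce_mean(
        tf.nn.softmax_cross_entropy_with_logits(labels=y_onehot, logits=logits))
    return l_code1 + lam * l_code2
```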
In summary, the loss function for training the generator is:

$$L_{total} = L_{pixel} + \lambda_1 L_{sym} + \lambda_2 L_{id} + \lambda_3 L_{adv} + \lambda_4 L_{code}$$

where L_pixel, L_sym, L_id, L_adv, and L_code are the pixel loss, symmetry loss, identity loss, adversarial loss, and coding loss respectively, and λ_1, λ_2, λ_3, and λ_4 are the weights of the different losses.
Following the parameter settings of methods of the same type and extensive experimental experience, the loss weights λ_1, λ_2, λ_3, and λ_4 are set to 0.2, 0.003, 0.001, and 0.002 respectively. While training the generator, the discrimination network D must be trained at the same time; the goal of training D is to enable it to distinguish whether an input frontal face image comes from the generator G or from the original data set. The loss function for training the discrimination network is:

$$L_{adv2} = -\frac{1}{N} \sum_{n=1}^{N} \left[ \log D\!\left(G\!\left(I_a^{(n)}\right)\right) + \log\left(1 - D\!\left(I_g^{(n)}\right)\right) \right]$$

where N is the size of the current training batch, G and D are the generator and the discrimination network respectively, I_a is the input image, and G(I_a) and I_g are the synthesized and non-synthesized frontal face images. The terms D(G(I_a)) and D(I_g) reflect how likely the discrimination network D judges the synthesized frontal face image G(I_a) and the non-synthesized frontal face image I_g, respectively, to be synthesized. Minimizing L_adv2 enables the discrimination network D to accurately reflect the probability that an input picture is synthesized.
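Taking D's output as the probability that its input is synthesized, as stated above, the two adversarial losses can be sketched as follows (a small epsilon guards the logarithms; the sign conventions follow the reconstructed formulas above):

```python
import tensorflow as tf

EPS = 1e-8  # numerical guard for the logarithms

def adv_loss_generator(d_fake):
    """L_adv: the generator lowers the 'synthesized' probability that D
    assigns to its outputs, i.e. it tries to fool D."""
    return tf.reduce_mean(tf.math.log(d_fake + EPS))

def adv_loss_discriminator(d_fake, d_real):
    """L_adv2: D learns to output high probability on synthesized images
    (d_fake) and low probability on real frontal images (d_real)."""
    return -tf.reduce_mean(tf.math.log(d_fake + EPS)
                           + tf.math.log(1.0 - d_real + EPS))
```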
Training the generator G and the discrimination network D alternately lets the two optimize and improve each other through competition. In the initial stage, the face images generated by the generator G are blurred, and the discrimination network D can easily judge the source of an input image, which pushes the generator G to generate clearer images and thus improves the generator. In later stages, the images generated by the generator G become clearer and closer to the original image data, which pushes the discrimination network to judge input images more accurately and thus improves the discrimination ability of the network D.
After the loss functions of the generator G and the discrimination network D are designed, the parameters of the generative adversarial network are optimized with the Adam method, with the learning rate set to 0.0002 and the batch size set to 12. After each training step of the generator G, the discrimination network D is trained once. As training proceeds, the quality of the images produced by the generator keeps improving and the ability of the discrimination network to judge input images keeps strengthening, until training is complete. The deep learning framework used in the experiments is TensorFlow, the graphics card is a 1080 Ti, and training stops after 20,000 batches.
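An abbreviated training step under these settings is sketched below (TensorFlow 2 style, although the original experiments used an earlier TensorFlow; `generator`, `encoder`, `discriminator`, `lightcnn_feature_maps`, and `fc_c` are the stand-in callables from the sketches above, and batches of 12 aligned triples (I_a, I_g, one-hot label) plus the yaw angle are assumed):

```python
import tensorflow as tf

g_opt = tf.keras.optimizers.Adam(learning_rate=0.0002)
d_opt = tf.keras.optimizers.Adam(learning_rate=0.0002)
W_SYM, W_ID, W_ADV, W_CODE = 0.2, 0.003, 0.001, 0.002   # loss weights

def train_step(generator, encoder, discriminator, i_a, i_g, gamma, y_onehot):
    # One generator update, then one discriminator update.
    with tf.GradientTape() as tape:
        i_f, x2 = generator(i_a, gamma)        # synthesized frontal face and X2
        x3 = encoder(i_g)                      # frontal code X3
        loss_g = (pixel_loss(i_f, i_g)
                  + W_SYM * symmetry_loss(i_f)
                  + W_ID * identity_loss(lightcnn_feature_maps, i_f, i_g)
                  + W_ADV * adv_loss_generator(discriminator(i_f))
                  + W_CODE * coding_loss(x2, x3, fc_c, y_onehot))
    grads = tape.gradient(loss_g, generator.trainable_variables)
    g_opt.apply_gradients(zip(grads, generator.trainable_variables))

    with tf.GradientTape() as tape:
        loss_d = adv_loss_discriminator(discriminator(i_f), discriminator(i_g))
    grads = tape.gradient(loss_d, discriminator.trainable_variables)
    d_opt.apply_gradients(zip(grads, discriminator.trainable_variables))
```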
5) In the testing stage, for an input face image I_a of arbitrary pose, the trained generator G synthesizes the frontal face image I_f of the same face; the effect of the invention can then be verified by direct observation of the synthesized frontal face image I_f. The generated results are shown in fig. 6: in each row, the first picture is an input face image with a yaw angle exceeding 45°, the second is the frontal face image synthesized by the generator, and the third is the non-synthesized frontal face image of the same person from the data set. As the figure shows, for face images with a yaw angle exceeding 45°, the invention synthesizes their frontal face images while retaining the identity information of the original face.
The embodiments described above are merely preferred embodiments of the present invention, and the scope of the present invention is not limited thereto; changes made according to the shape and principle of the present invention shall be covered within the protection scope of the present invention.

Claims (6)

1. A multi-pose face image frontalization method based on a generative adversarial network, characterized by comprising the following steps:
1) collecting face images of all poses as a training set and a test set, wherein for each input face image I_a of arbitrary pose, a non-synthesized frontal face image I_g of the same person can be found in the data set;
2) in the training stage, inputting a face image I_a of arbitrary pose from the training set into a generator G to obtain a corrected code X_2 and a synthesized frontal face image I_f, and inputting the non-synthesized frontal face image I_g into the generator G to obtain a frontal code X_3;
3) inputting the synthesized frontal face image I_f or the non-synthesized frontal face image I_g into a discrimination network D, which judges whether the input face image is synthesized, and inputting the synthesized frontal face image I_f or the non-synthesized frontal face image I_g into a face identity feature extractor F, which extracts the identity features of the face image;
4) substituting the discrimination result of step 3), the extracted identity features, the synthesized frontal face image I_f, the non-synthesized frontal face image I_g, the corrected code X_2, and the frontal code X_3 into pre-designed loss functions, and training the generator G and the discrimination network D alternately until training is finished;
5) in the testing stage, inputting a face image I_a of arbitrary pose into the trained generator G to obtain a synthesized frontal face image I_f, the effect being verified by direct observation of the synthesized frontal image I_f.
2. The multi-pose face image frontalization method based on a generative adversarial network according to claim 1, wherein: in step 1), all face images used in the data set are derived from the data set Multi_Pie; the data set contains more than 750,000 pictures, covering 337 individuals under 20 illuminations and 15 poses; the illumination of the pictures changes from dark to light as the illumination index goes from 01 to 20, where illumination index 07 is the standard illumination condition; all face images of the data set are denoted I_a, and for each image I_a, the image of the same person with a face yaw angle of 0° and illumination index 07 is found in the data set and denoted I_g.
3. The multi-pose face image frontalization method based on a generative adversarial network according to claim 1, wherein: in step 2), the generator G consists of a pose estimator P, an encoder En, a coding residual network R, and a decoder De; the pose estimator P estimates the face pose using the PnP algorithm to obtain the yaw angle of the face, and is implemented with the function cv2.solvePnP() of the open-source OpenCV library; the encoder En is a convolutional neural network, the coding residual network R is a two-layer fully connected neural network, and the decoder De is a deconvolution neural network;
the generator G synthesizes a picture as follows: a face image I_a of arbitrary pose is input to the generator G; the encoder En converts it to the initial code X_1; the coding residual network R estimates a coding residual R(X_1) from the initial code X_1; the pose estimator P calculates the yaw angle γ of the face in the input image; the yaw angle γ is input to the function Y to obtain the weight Y(γ) of the coding residual; the initial code X_1 is fused with the coding residual to obtain the corrected code X_2, where X_2 = X_1 + Y(γ) × R(X_1); and the corrected code X_2 is input to the decoder De, which generates the frontal face image I_f by deconvolution.
4. The multi-pose face image frontalization method based on a generative adversarial network according to claim 1, wherein: in step 3), the discrimination network D is a classifier based on a convolutional neural network that determines whether the input image comes from the generator G or from the original image data; the face identity feature extractor F adopts the open-source Light-CNN-29, a lightweight convolutional neural network with a depth of 29 layers and about 12 million parameters.
5. The multi-pose face image frontalization method based on a generative adversarial network according to claim 1, wherein: in step 4), the objective of the loss functions is to minimize the difference between the synthesized frontal face image I_f and the non-synthesized frontal face image I_g, so that the synthesized frontal face image I_f retains more identity information of the input face image; besides the pixel loss, identity loss, symmetry loss, and adversarial loss commonly used in methods of the same type, the loss functions used in step 4) include a newly devised coding loss; the first objective of the coding loss is to make the codes obtained by the encoder En for the input face image I_a and the non-synthesized frontal face image I_g closer; because existing face pose correction methods work better when the yaw angle of the input face is smaller, the closer the code of the input face image is to the code of the same face at 0° yaw, the better the frontalization result; thus, the first part of the coding loss is formulated as follows:

$$L_{code1} = \frac{1}{N} \sum_{i=1}^{N} \left| En(I_a)_i + R(En(I_a))_i - X_3^i \right|$$

where N is the dimension of each code; En is the encoder; En(I_a)_i is the value of the i-th dimension of the initial code En(I_a); R is the coding residual network; R(En(I_a))_i is the value of the i-th dimension of the coding residual R(En(I_a)); En(I_a)_i + R(En(I_a))_i equals the value of the i-th dimension of the corrected code; and X_3^i is the value of the i-th dimension of the frontal code X_3 obtained by the encoder from the non-synthesized frontal face image; the first part of the coding loss is the Manhattan distance between the corrected code and the frontal code;
the second objective of the coding loss is to differentiate the corrected codes of different persons; a fully connected layer C is constructed after the corrected code En(I_a) + R(En(I_a)); the number of neurons of the fully connected layer C equals the number of persons M in the training set, and the second part of the coding loss adopts a cross-entropy loss:

$$L_{code2} = -\sum_{i=1}^{M} y_i \log C\!\left(En(I_a) + R(En(I_a))\right)_i$$

where M is the number of persons in the training set; y_i is the value of the i-th dimension of a one-hot vector y of dimension M indicating which person in the training set the input face image I_a belongs to; if image I_a belongs to the j-th person, the j-th dimension of y is 1 and the remaining dimensions are 0; En(I_a) + R(En(I_a)) is the corrected code; and C(En(I_a) + R(En(I_a)))_i is the value of the i-th dimension of the feature vector obtained by passing the corrected code through the fully connected layer C;
thus, the complete coding loss is:

$$L_{code} = L_{code1} + \lambda L_{code2}$$

where λ is a weighting constant with value 0.1.
6. The multi-pose face image frontalization method based on a generative adversarial network according to claim 1, wherein: in step 4), the generator G and the discrimination network D are trained alternately so that the two optimize and improve each other through competition; in the initial stage, the face images generated by the generator G are blurred, and the discrimination network D can easily judge the source of an input image, which pushes the generator G to generate clearer images and thus improves the generator; in later stages, the images generated by the generator G become clearer and closer to the original image data, which pushes the discrimination network to judge input images more accurately and thus improves the discrimination ability of the network D.
CN201910806159.5A 2019-08-29 2019-08-29 Multi-pose face image frontalization method based on a generative adversarial network Expired - Fee Related CN110543846B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910806159.5A CN110543846B (en) 2019-08-29 2019-08-29 Multi-pose face image frontalization method based on a generative adversarial network

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910806159.5A CN110543846B (en) 2019-08-29 2019-08-29 Multi-pose face image frontalization method based on a generative adversarial network

Publications (2)

Publication Number Publication Date
CN110543846A CN110543846A (en) 2019-12-06
CN110543846B true CN110543846B (en) 2021-12-17

Family

ID=68710718

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910806159.5A Expired - Fee Related CN110543846B (en) 2019-08-29 2019-08-29 Multi-pose face image frontalization method based on a generative adversarial network

Country Status (1)

Country Link
CN (1) CN110543846B (en)

Families Citing this family (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111144240B (en) * 2019-12-12 2023-02-07 深圳数联天下智能科技有限公司 Image processing method and related equipment
CN111428667A (en) * 2020-03-31 2020-07-17 天津中科智能识别产业技术研究院有限公司 Face image frontalization method based on a generative adversarial network with decoupled representation learning
CN111931484B (en) * 2020-07-31 2022-02-25 贵州多彩宝互联网服务有限公司 Data transmission method based on big data
CN111856962A (en) * 2020-08-13 2020-10-30 郑州智利信信息技术有限公司 Intelligent home control system based on cloud computing
CN111985995A (en) * 2020-08-14 2020-11-24 足购科技(杭州)有限公司 WeChat applet-based shoe virtual fitting method and device
US11810397B2 (en) 2020-08-18 2023-11-07 Samsung Electronics Co., Ltd. Method and apparatus with facial image generating
CN112164002B (en) * 2020-09-10 2024-02-09 深圳前海微众银行股份有限公司 Training method and device of face correction model, electronic equipment and storage medium
CN114981835A (en) * 2020-10-29 2022-08-30 京东方科技集团股份有限公司 Training method and device of face reconstruction model, face reconstruction method and device, electronic equipment and readable storage medium
CN112818850B (en) * 2021-02-01 2023-02-10 华南理工大学 Cross-pose face recognition method and system based on progressive neural network and attention mechanism
CN113140015B (en) * 2021-04-13 2023-03-14 杭州欣禾圣世科技有限公司 Multi-view face synthesis method and system based on a generative adversarial network
CN113361489B (en) * 2021-07-09 2022-09-16 重庆理工大学 Face frontalization model construction method and training method based on decoupled representation
CN113469269A (en) * 2021-07-16 2021-10-01 上海电力大学 Residual convolutional autoencoding wind-solar-charging scenario generation method based on multi-channel fusion
CN114049250B (en) * 2022-01-13 2022-04-12 广州卓腾科技有限公司 Method, device and medium for correcting face pose of certificate photo

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107239766A (en) * 2017-06-08 2017-10-10 深圳市唯特视科技有限公司 Large-pose face frontalization method using adversarial networks and a three-dimensional morphable model
CN109815928A (en) * 2019-01-31 2019-05-28 中国电子进出口有限公司 Face image synthesis method and apparatus based on adversarial learning

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10474880B2 (en) * 2017-03-15 2019-11-12 Nec Corporation Face recognition using larger pose face frontalization

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107239766A (en) * 2017-06-08 2017-10-10 深圳市唯特视科技有限公司 Large-pose face frontalization method using adversarial networks and a three-dimensional morphable model
CN109815928A (en) * 2019-01-31 2019-05-28 中国电子进出口有限公司 Face image synthesis method and apparatus based on adversarial learning

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
Multi-poses Face Frontalization based on Pose Weighted GAN;Jiaxin Ma et al.;《2019 IEEE 3rd Information Technology, Networking, Electronic and Automation Control Conference (ITNEC)》;20190606;1271-1276 *
Pose-Weighted Gan for Photorealistic Face Frontalization;Sufang Zhang et al.;《2019 IEEE International Conference on Image Processing (ICIP)》;20190826;2384-2388 *
Face frontalization generation based on generative adversarial learning; 钱一琛 (Qian Yichen); China Master's Theses Full-text Database (Information Science and Technology); 20190815; I138-1148 *

Also Published As

Publication number Publication date
CN110543846A (en) 2019-12-06

Similar Documents

Publication Publication Date Title
CN110543846B (en) Multi-pose face image frontalization method based on a generative adversarial network
CN108537743B (en) Face image enhancement method based on a generative adversarial network
CN110443143B (en) Multi-branch convolutional neural network fused remote sensing image scene classification method
CN109543606B (en) Human face recognition method with attention mechanism
CN109657595B (en) Key feature region matching face recognition method based on stacked hourglass network
CN111401257B (en) Face recognition method based on cosine loss under unconstrained conditions
CN106960202B (en) Smiling face identification method based on visible light and infrared image fusion
CN110348330B (en) Face pose virtual view generation method based on VAE-ACGAN
CN110909690B (en) Method for detecting occluded face image based on region generation
CN108268859A (en) Facial expression recognition method based on deep learning
CN112766160A (en) Face replacement method based on multi-stage attribute encoder and attention mechanism
CN112418074A (en) Coupled posture face recognition method based on self-attention
CN112950661A (en) Face cartoon generation method based on an attention generative adversarial network
CN107423678A (en) Training method of a convolutional neural network for feature extraction and face recognition method
CN1975759A (en) Face recognition method based on structural principal component analysis
CN112418041B (en) Multi-pose face recognition method based on face frontalization
CN113963032A (en) Siamese network target tracking method fused with target re-identification
CN111783748A (en) Face recognition method and device, electronic equipment and storage medium
CN111832405A (en) Face recognition method based on HOG and a deep residual network
CN112836625A (en) Face liveness detection method and device, and electronic equipment
CN111222433A (en) Automatic face auditing method, system, equipment and readable storage medium
CN113724354A (en) Grayscale image colorization method based on the color style of a reference image
CN114387641A (en) False video detection method and system based on multi-scale convolutional network and ViT
CN109360179A (en) Image fusion method and device, and readable storage medium
CN113378949A (en) Dual generative adversarial learning method based on capsule network and mixed attention

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20211217