CN111523406B - Method for correcting a deflected face based on an improved generative adversarial network structure - Google Patents

Method for correcting a deflected face based on an improved generative adversarial network structure

Info

Publication number
CN111523406B
CN111523406B (application CN202010269281.6A)
Authority
CN
China
Prior art keywords
face
pred
picture
loss
classifier
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202010269281.6A
Other languages
Chinese (zh)
Other versions
CN111523406A (en)
Inventor
达飞鹏
胡惠雅
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Southeast University
Original Assignee
Southeast University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Southeast University filed Critical Southeast University
Priority to CN202010269281.6A priority Critical patent/CN111523406B/en
Publication of CN111523406A publication Critical patent/CN111523406A/en
Application granted granted Critical
Publication of CN111523406B publication Critical patent/CN111523406B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16Human faces, e.g. facial parts, sketches or expressions
    • G06V40/161Detection; Localisation; Normalisation
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16Human faces, e.g. facial parts, sketches or expressions
    • G06V40/168Feature extraction; Face representation
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16Human faces, e.g. facial parts, sketches or expressions
    • G06V40/172Classification, e.g. identification
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02TCLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T90/00Enabling technologies or technologies with a potential or indirect contribution to GHG emissions mitigation

Landscapes

  • Engineering & Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Health & Medical Sciences (AREA)
  • Oral & Maxillofacial Surgery (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Human Computer Interaction (AREA)
  • Biomedical Technology (AREA)
  • Software Systems (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Mathematical Physics (AREA)
  • Evolutionary Computation (AREA)
  • Computational Linguistics (AREA)
  • Biophysics (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a method for correcting a deflected face based on an improved generative adversarial network structure. The processing steps are as follows: (1) detect the facial feature points and extract fixed-size face region blocks (eyes, nose and mouth); (2) input the face region blocks and the whole face of step 1 into a local channel and a global channel respectively to obtain the corrected outputs of the two channels; (3) fuse the outputs of the local channel and the global channel, setting each pixel of an overlapping area to the maximum value over that area, to obtain the final generated face; (4) input the generated face and the frontal face into a discriminator and a classifier to ensure the accuracy and identity consistency of the generated face; (5) save the trained network model for testing. The BEGAN network adopted by the invention has a simple and efficient structure and improves the accuracy and speed of deflected-face correction to a certain extent.

Description

Method for correcting a deflected face based on an improved generative adversarial network structure
Technical Field
The invention relates to a method for correcting a deflected face based on an improved generative adversarial network structure, and belongs to the field of computer vision.
Background
With the continuous development of deep learning, research on face recognition has made many breakthroughs. Recognition algorithms based on deep learning even exceed human-eye performance; however, most of this research assumes a frontal or near-frontal face and is therefore limited. There is evidence that even the best-performing frontal face recognition methods suffer a greatly reduced recognition rate under large angular deflections. Existing methods for face recognition under pose variation can be roughly grouped into three categories: pose-robust feature extraction methods, frontal face generation methods, and subspace analysis methods.
For the first category, conventional feature extraction mainly relies on robust local descriptors such as Gabor, SIFT and LBP features, while more recent methods extract features with deep learning, e.g. Light CNN and FaceNet; neither approach, however, handles large pose deflections effectively. For the third category, a linear subspace can hardly express the nonlinearity of face pose variation, and learning a nonlinear subspace is usually accompanied by complex training. The present invention therefore focuses on the second category, frontal face generation. Early work built three-dimensional face models, which place high demands on the feature points; in particular, when the deflection angle is somewhat large, some feature points in a two-dimensional face picture become invisible, so this approach has certain limitations.
Compared with generating a face from a three-dimensional face model, face conversion with a generative adversarial network has become a major trend and has achieved impressive performance. Current methods that use generative adversarial networks for face frontalization can be divided into two-dimensional and three-dimensional methods. Two-dimensional methods include TP-GAN and PIM; three-dimensional methods mainly apply a 3D Morphable Model (3DMM) to the generative adversarial network, obtaining shape and texture parameters through the model as a prior for accurately recovering the face structure. Because the face is a complex three-dimensional structure and a purely two-dimensional method lacks such constraints, two-dimensional methods usually adopt a two-pathway network that generates face contours and face details separately, and establish a series of supervisory loss functions to constrain the face structure, for example a constraint that keeps the generated face symmetric.
In addition, in both two-dimensional and three-dimensional methods, a feature extraction module is often added to maintain identity consistency before and after correction; the face feature extraction structure Light-CNN performs well in both time and space complexity and is therefore widely used.
Disclosure of Invention
Purpose of the invention: in TP-GAN, a two-dimensional face frontalization method, the DCGAN structure adopted as the generative adversarial network is difficult to train and prone to mode collapse, and collecting multi-scale features of the face makes the training process relatively complicated. To address these problems, the invention provides a method for correcting a deflected face based on an improved generative adversarial network structure.
The method maintains identity consistency between the faces before and after generation by introducing a third adversarial component, a classifier, on top of the conventional two-player, two-pathway generative adversarial network structure. Experiments show that the method better preserves identity consistency before and after face correction, produces frontal faces of higher quality, and at the same time greatly reduces the difficulty of network training and improves training efficiency.
The technical scheme is as follows: in order to realize the purpose of the invention, the technical scheme adopted by the invention is as follows. A method for correcting a deflected face based on an improved generative adversarial network structure comprises the following steps:
Step 1, detecting feature points of the input deflected face I_in and extracting fixed-size face region blocks, namely the eyes, the nose and the mouth;
Step 2, inputting the face region blocks and the whole face obtained in step 1 into a local channel and a global channel respectively to obtain the corrected results of the two channels;
Step 3, fusing the corrected face regions output by the local channel with the whole-face contour output by the global channel, setting each pixel of an overlapping region to the maximum value over that region, to obtain the final generated face I_pred;
Step 4, inputting the generated face I_pred and the corresponding frontal face I_gt into the discriminator in sequence to ensure the accuracy of face generation, wherein I_gt is the frontal face corresponding to the input deflected face;
Step 5, inputting the generated face I_pred and the corresponding frontal face I_gt into the classifier at the same time to ensure identity consistency between the generated face and the input deflected face;
Step 6, saving the generative adversarial network model obtained by training and using it to correct deflected faces at test time.
As a further technical solution of the present invention, step 1 specifically comprises:
Step 1.1, uniformly normalizing the face size to 128 × 128 and building a Caffe deep learning environment;
Step 1.2, detecting the facial feature points in the Caffe deep learning environment and locating five feature points: the centers of the two eyes, the nose tip, and the two corners of the mouth;
Step 1.3, obtaining each region block in the face, namely the mouth, the nose and the eyes, from the five feature points of step 1.2; to ensure that the network trains smoothly, the region block sizes are kept constant across different faces, with the eye regions set to 40 × 40 and the nose and mouth set to 32 × 40 and 32 × 48 respectively.
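For illustration only, a minimal Python sketch of this fixed-size cropping might look as follows; the landmark dictionary, the helper names and the clamping behaviour are assumptions for the example and are not prescribed by the patent:

```python
import numpy as np

# Fixed region-block sizes from step 1.3, given as (width, height).
PATCH_SIZES = {
    "eye_left":  (40, 40),
    "eye_right": (40, 40),
    "nose":      (32, 40),
    "mouth":     (32, 48),
}

def crop_patch(face_128, center, size):
    """Crop a fixed-size patch centered on `center` = (x, y) from a 128 x 128 gray image."""
    w, h = size
    cx, cy = int(round(center[0])), int(round(center[1]))
    # Clamp the top-left corner so the patch stays inside the image and keeps its size.
    x0 = min(max(cx - w // 2, 0), face_128.shape[1] - w)
    y0 = min(max(cy - h // 2, 0), face_128.shape[0] - h)
    return face_128[y0:y0 + h, x0:x0 + w]

def extract_region_blocks(face_128, centers):
    """`centers` maps each key of PATCH_SIZES to a landmark-derived region center (x, y)."""
    return {name: crop_patch(face_128, centers[name], size)
            for name, size in PATCH_SIZES.items()}
```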
As a further technical solution of the present invention, step 2 is specifically:
Step 2.1, extracting face blocks of the eye, nose and mouth regions from the frontal face I_gt in the same way, obtaining Eyel_gt, Eyer_gt, Nose_gt and Mouth_gt respectively;
Step 2.2: for both the local channel and the global channel, the generator is designed as a U-Net structure so that the picture size remains unchanged before and after generation;
Step 2.3: for the local channel, the face region blocks obtained in step 1 are input into the local-channel generator to obtain the corrected face region blocks, namely Eyel_pred, Eyer_pred, Nose_pred and Mouth_pred; the loss function of the local channel is designed as an L1 loss and consists of the following parts:

eyel_loss = (1 / (W_eyel * H_eyel)) * Σ_{x=1..W_eyel} Σ_{y=1..H_eyel} | Eyel_pred(x, y) - Eyel_gt(x, y) |

eyer_loss = (1 / (W_eyer * H_eyer)) * Σ_{x=1..W_eyer} Σ_{y=1..H_eyer} | Eyer_pred(x, y) - Eyer_gt(x, y) |

nose_loss = (1 / (W_nose * H_nose)) * Σ_{x=1..W_nose} Σ_{y=1..H_nose} | Nose_pred(x, y) - Nose_gt(x, y) |

mouth_loss = (1 / (W_mou * H_mou)) * Σ_{x=1..W_mou} Σ_{y=1..H_mou} | Mouth_pred(x, y) - Mouth_gt(x, y) |

where W_eyel, W_eyer, W_nose and W_mou are the widths of the left-eye, right-eye, nose and mouth regions (equal to 40, 40, 32 and 32 respectively) and H_eyel, H_eyer, H_nose and H_mou are the corresponding heights (equal to 40, 40, 40 and 48 respectively); Eyel_pred(x, y), Eyer_pred(x, y), Nose_pred(x, y) and Mouth_pred(x, y) are the gray values of the corrected left eye, right eye, nose and mouth at coordinate (x, y), and Eyel_gt(x, y), Eyer_gt(x, y), Nose_gt(x, y) and Mouth_gt(x, y) are the gray values of the corresponding region blocks of the frontal face at coordinate (x, y).
The loss function of the local channel is the sum of the losses of the region blocks; with λ_2 the weight of the local-channel loss, the local-channel loss function local_loss can be expressed as follows:
local_loss = λ_2 * (eyel_loss + eyer_loss + nose_loss + mouth_loss)
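As an illustration only, the local-channel loss above could be written as the following PyTorch sketch; the tensor names, dictionary keys and framework choice are assumptions, not part of the patent:

```python
import torch

def region_l1_loss(pred_block, gt_block):
    """Mean absolute gray-value error over one region block (eyel_loss, eyer_loss, ...)."""
    return (pred_block - gt_block).abs().mean()

def local_channel_loss(pred_blocks, gt_blocks, lambda_2=1.0):
    """local_loss = lambda_2 * (eyel_loss + eyer_loss + nose_loss + mouth_loss)."""
    keys = ("eye_left", "eye_right", "nose", "mouth")
    return lambda_2 * sum(region_l1_loss(pred_blocks[k], gt_blocks[k]) for k in keys)
```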
step 2.4: for the whole channel, likewise, the face I will be deflected in And inputting the data into an encoder structure containing a 3 x 3 convolution kernel to obtain a 128 x 128-dimensional frontal face contour with the same size as the original face.
As a further technical solution of the present invention, step 3 specifically is:
Step 3.1: based on the corrected region blocks obtained in step 2, each region block is zero-padded to a picture of the same size as the original face, yielding face pictures corresponding to the left eye, right eye, nose and mouth respectively;
Step 3.2: the face picture corresponding to the local channel is obtained from the four equally sized face pictures of step 3.1, as follows: the gray values of the pixels at the same coordinate in the four face pictures are compared, the maximum value is selected and taken as the gray value at that coordinate, and repeating this operation pixel by pixel finally yields the face picture corresponding to the local channel (a sketch is given below);
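A minimal sketch of this zero-padding and pixel-wise maximum fusion (steps 3.1 and 3.2) is given below; the block positions and the single-channel tensor layout are assumptions for the example:

```python
import torch

def paste_block(block, top_left, size=(128, 128)):
    """Zero-pad one corrected region block back to the original 128 x 128 face size."""
    canvas = torch.zeros(size)
    y0, x0 = top_left
    h, w = block.shape
    canvas[y0:y0 + h, x0:x0 + w] = block
    return canvas

def fuse_local_channel(blocks_with_positions):
    """`blocks_with_positions` is an iterable of (block, (row, col)) pairs; the local-channel
    face is the pixel-wise maximum of the zero-padded pictures."""
    padded = [paste_block(b, pos) for b, pos in blocks_with_positions]
    return torch.stack(padded, dim=0).max(dim=0).values
```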
Step 3.3: the local-detail face picture obtained in the previous step is fused with the frontal face contour obtained by the global channel in step 2.4, and the final generated picture I_pred is obtained by training with several applied loss functions.
The loss functions applied in step 3.3 are specifically:
1) Pixel-wise L1 loss:

L_pixel = (1 / (W * H)) * Σ_{x=1..W} Σ_{y=1..H} | I_pred(x, y) - I_gt(x, y) |

where W and H are the width and height of the face picture (both 128), and I_pred(x, y) and I_gt(x, y) are the gray values of the generated face picture I_pred and of the corresponding frontal face at coordinate (x, y);
2) Generative adversarial loss:

L_adv = λ_0 * L_{G-D} + λ_1 * L_{G-C}

Since in the invention the generator plays an adversarial game with the discriminator and with the classifier separately, L_{G-D} and L_{G-C} in the formula above are the loss functions of the generator against the discriminator and against the classifier respectively. The adversarial loss of the generator is designed as the weighted sum of these two parts, addressing the quality of the generated picture and the preservation of identity consistency, with weights λ_0 and λ_1. The two parts are given by the following formulas, where I_pred' denotes the output of the generated face picture I_pred after passing through discriminator D, W and H are the width and height of the face picture (both 128), I_pred(x, y) and I_pred'(x, y) are the gray values of I_pred and I_pred' at coordinate (x, y), and C(·) denotes the features extracted by the classifier:

L_{G-D} = (1 / (W * H)) * Σ_{x=1..W} Σ_{y=1..H} | I_pred(x, y) - I_pred'(x, y) |

L_{G-C} = 1 - ( C(I_pred) · C(I_gt) ) / ( ||C(I_pred)|| * ||C(I_gt)|| )

3) Face symmetry loss:

L_sym = (1 / ((W/2) * H)) * Σ_{x=1..W/2} Σ_{y=1..H} | I_pred(x, y) - I_pred(W - (x - 1), y) |

where W and H are the width and height of the face picture (both 128), and I_pred(x, y) and I_pred(W - (x - 1), y) are the gray values of the generated face picture I_pred at coordinate (x, y) and at the horizontally mirrored coordinate (W - (x - 1), y);
4) Regularization loss: a regularization term that suppresses the noise present in the generated face picture.
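For illustration, the supervision terms above might be sketched as follows; the total-variation form of the regularization term is an assumption (the patent only states that the term suppresses noise), and the (H, W) = (128, 128) tensor layout is assumed:

```python
import torch

def pixel_l1_loss(i_pred, i_gt):
    """Pixel-wise L1 loss between the generated face and the frontal face."""
    return (i_pred - i_gt).abs().mean()

def symmetry_loss(i_pred):
    """Face symmetry loss: compare the left half with the mirrored right half."""
    w = i_pred.shape[-1]
    left = i_pred[..., :, : w // 2]
    mirrored_right = i_pred.flip(-1)[..., :, : w // 2]
    return (left - mirrored_right).abs().mean()

def tv_regularization(i_pred):
    """Assumed total-variation style regularizer suppressing noise in the generated picture."""
    dx = (i_pred[..., :, 1:] - i_pred[..., :, :-1]).abs().mean()
    dy = (i_pred[..., 1:, :] - i_pred[..., :-1, :]).abs().mean()
    return dx + dy
```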
As a further technical scheme of the present invention, step 4 specifically comprises:
Step 4.1: the face picture I_pred obtained in step 3.3 and the frontal face I_gt are input into the discriminator in succession; the discriminator is structured as an auto-encoder, and its output is a face picture;
Step 4.2: the face pictures output by the discriminator for the generated face I_pred and the frontal face I_gt are denoted I_pred' and I_gt' respectively, and the pixel-wise errors are computed separately, i.e. the L1-norm error of every pixel in the picture is calculated and summed, as follows:

L(I_gt) = (1 / (W * H)) * Σ_{x=1..W} Σ_{y=1..H} | I_gt(x, y) - I_gt'(x, y) |

L(I_pred) = (1 / (W * H)) * Σ_{x=1..W} Σ_{y=1..H} | I_pred(x, y) - I_pred'(x, y) |

where W and H are the width and height of the face picture (both 128), and I_gt(x, y), I_pred(x, y), I_gt'(x, y) and I_pred'(x, y) are the gray values of I_gt, I_pred, I_gt' and I_pred' at coordinate (x, y);
Step 4.3: the loss function of the discriminator D is designed as:

L_D = L(I_gt) - k_t * L(I_pred)

where k_t manually controls how much attention is paid to the generated samples; the larger k_t, the stronger the discriminating ability the discriminator is set to have. During training, the discriminator minimizes L_D, i.e. it maximizes the pixel-wise error of the generated face while minimizing the pixel-wise error of the real face, thereby distinguishing the two;
Step 4.4: for the game between the discriminator and the generator, the loss function of the generator is designed as:

L_{G-D} = L(I_pred)

Contrary to the discriminator, the generator aims to make the generated face as close as possible to the real face, so its loss function minimizes the pixel-wise error of the generated face I_pred, ensuring the quality of the generated face;
Step 4.5: with the loss functions of step 4.3 and step 4.4, the discriminator and the generator keep competing and generating; both become more and more expressive, and the generated data become more and more realistic.
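A minimal sketch of this BEGAN-style objective is given below. The proportional update rule for k_t follows the original BEGAN formulation and is an assumption here; the patent only states that k_t is controlled manually:

```python
import torch

def recon_error(discriminator, img):
    """L(img): pixel-wise L1 reconstruction error of the auto-encoder discriminator."""
    return (discriminator(img) - img).abs().mean()

def discriminator_loss(discriminator, i_gt, i_pred, k_t):
    """L_D = L(I_gt) - k_t * L(I_pred); the generated face is detached so only D is updated."""
    return recon_error(discriminator, i_gt) - k_t * recon_error(discriminator, i_pred.detach())

def generator_d_loss(discriminator, i_pred):
    """L_{G-D} = L(I_pred): the generator minimizes the reconstruction error of its output."""
    return recon_error(discriminator, i_pred)

def update_k(k_t, loss_gt, loss_pred, gamma=0.5, lambda_k=1e-3):
    """Assumed BEGAN-style proportional control of k_t; `loss_gt` and `loss_pred` are the
    scalar values of L(I_gt) and L(I_pred) for the current batch."""
    k_t = k_t + lambda_k * (gamma * loss_gt - loss_pred)
    return float(min(max(k_t, 0.0), 1.0))
```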
As a further technical solution of the present invention, step 5 is specifically:
Step 5.1: the face picture I_pred obtained in step 3.3 and the corresponding frontal face I_gt are input into the classifier together (assuming the training set contains N faces, there are 2N pictures in total); the classifier C is a Light-CNN model trained in advance and acts as a separate structure that competes with the generator;
Step 5.2: the features extracted by the classifier from the frontal face I_gt and from the face I_pred produced by the generator are denoted C(I_gt) and C(I_pred) respectively. The result labels of the classifier are defined as 1 to 2N; the classification target of the classifier is to assign the frontal face pictures I_gt to the first N labels and the generated pictures I_pred to the last N labels. The corresponding cross-entropy loss functions are:

L_gt = - Σ_{j=1..2N} {l_gt}_j * log( {C(I_gt)}_j )

L_pred = - Σ_{j=1..2N} {l_pred}_j * log( {C(I_pred)}_j )

where j ∈ {1, ..., 2N}, l_gt and l_pred are the classifier labels of the frontal face and of the generated face picture respectively, {l_gt}_j denotes the correct label of the jth frontal face picture I_gt and {C(I_gt)}_j denotes the classification label output by the classifier for the frontal face picture; {l_pred}_j denotes the correct label of the jth generated face picture I_pred and {C(I_pred)}_j denotes the label output by the classifier after classifying it;
Step 5.3: the classifier loss function is designed as the weighted sum of the two cross-entropy losses above, with weight factor α, because the generated face lies relatively close to its corresponding frontal face picture in the feature space. The loss function of the classifier is formulated as follows:

L_C = L_gt + α * L_pred

Step 5.4: for the game between the generator and the classifier, the corresponding loss function of the generator is designed as:

L_{G-C} = d_cos( C(I_pred), C(I_gt) )

where the cosine distance d_cos is computed as:

d_cos( C(I_pred), C(I_gt) ) = 1 - ( C(I_pred) · C(I_gt) ) / ( ||C(I_pred)|| * ||C(I_gt)|| )

that is, the cosine distance between the features of the generated face picture and of the corresponding frontal face picture is minimized, so as to keep the identities of the two consistent;
Step 5.5: weighting the adversarial losses of the generator against the discriminator (step 4.4) and against the classifier (step 5.4) gives the generative adversarial loss of the generator used in step 3.3, rewritten as:

L_adv = λ_0 * L_{G-D} + λ_1 * L_{G-C}
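For illustration only, the classifier-side and identity-preserving terms above might be sketched as follows; the helper names, the use of torch.nn.functional and the example value of α are assumptions, and `features_*` stands for the feature vectors of the pretrained Light-CNN classifier:

```python
import torch
import torch.nn.functional as F

def classifier_loss(logits_gt, logits_pred, labels_gt, labels_pred, alpha=0.1):
    """L_C = L_gt + alpha * L_pred; frontal faces map to the first N class indices and
    generated faces to the last N (0-based indices of the 2N classes)."""
    return F.cross_entropy(logits_gt, labels_gt) + alpha * F.cross_entropy(logits_pred, labels_pred)

def generator_identity_loss(features_pred, features_gt):
    """L_{G-C}: cosine distance between the classifier features of I_pred and I_gt."""
    return 1.0 - F.cosine_similarity(features_pred, features_gt, dim=-1).mean()
```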
Step 5.6: the classifier and the generator each have their corresponding adversarial loss function and keep competing and generating; both become more and more expressive, and the generated data come closer and closer to the identity information of the original face.
As a further technical solution of the present invention, step 6 is specifically:
Step 6.1: storing the network model parameters obtained during training;
Step 6.2: extracting the feature points of the deflected face pictures of the test set by the same method as for the training set, in the Caffe deep learning environment;
Step 6.3: extracting the deflected-face region blocks (eyes, nose and mouth) from the feature points obtained in step 6.2, with sizes consistent with those in the training set, i.e. 40 × 40, 32 × 40 and 32 × 48 for the eyes, the nose and the mouth respectively;
Step 6.4: correcting the test-set face region blocks with the network model obtained in training; similarly, the corrected contour of the whole deflected face is also obtained;
Step 6.5: fusing the two outputs of the local channel and the global channel with the network model parameters obtained in training, keeping the loss function of the generator consistent with step 3.3, thereby obtaining the generated frontal face picture corresponding to the deflected face of the test set.
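A test-time sketch under these steps might look as follows; the attribute and method names on `model` are assumptions, with the model restored from the parameters saved in step 6.1:

```python
import torch

@torch.no_grad()
def frontalize(model, face_128, region_blocks):
    """`face_128` is the normalized deflected face tensor, `region_blocks` maps each
    region name to its cropped block (step 6.3, same sizes as in training)."""
    corrected = {k: model.local_generator(v) for k, v in region_blocks.items()}  # step 6.4
    local_face = model.fuse_local(corrected)      # zero-pad and take the pixel-wise maximum
    contour = model.global_generator(face_128)    # corrected whole-face contour
    return model.fuse(local_face, contour)        # step 6.5: final generated frontal face
```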
Beneficial effects: compared with the prior art, the technical scheme of the invention has the following beneficial technical effects:
1) First, unlike single-channel network structures, the invention adopts a two-channel network structure that recovers the face from two channels, local details and the whole face. Compared with feeding the face into the network as a single input, which loses detail, the two-channel structure better preserves facial detail information and therefore generates a more realistic frontal face picture.
2) Second, a dedicated structure is used to keep the identity features of the face consistent before and after correction. This structure participates in network training as a third component of the generative adversarial network, and experimental results show that it accelerates network convergence and better preserves face identity information.
3) Finally, the discriminator is designed as an auto-encoder whose output is still a face picture, and the Wasserstein distance between the pixel-wise error distributions of the generated face and the original face is minimized; the loss function of the discriminator is an energy loss. Unlike a conventional generative adversarial network, in which the discriminator outputs a discrete real/fake value, this continuous energy value is better able to generate high-resolution face pictures.
Drawings
FIG. 1 is a flow chart of the overall process of the present invention.
Fig. 2 shows the positions and serial numbers of the 68 contour feature points extracted in the Caffe environment.
Fig. 3 is a schematic diagram of the structure of the generator.
Fig. 4 is a schematic diagram of the structure of the discriminator.
Fig. 5 is a schematic diagram of a classifier structure.
Fig. 6 is a schematic diagram of the correction effect of the present invention on a Multi-PIE data set.
Fig. 7 shows frontalization results on the LFW test set, where (a) is an original face image from the LFW test set, (b) is the frontalization result obtained by the present invention, and (c) and (d) are the corresponding result images from other studies.
Detailed Description
The technical solution of the present invention will be further described in detail with reference to the following examples and accompanying drawings.
Spyder is selected as the programming tool under a Linux operating system, and the generative adversarial network model is built. This example is trained with 13 pictures of different poses under the same lighting conditions for each of 337 individuals from the Multi-PIE face database, and tested on the LFW deflected-face data set.
Fig. 1 is a schematic diagram of the network structure of the present invention; the specific steps are as follows:
Step 1: detect the facial feature points and extract fixed-size face region blocks (eyes, nose and mouth). The specific steps are as follows:
Step 1.1, uniformly normalizing the face size to 128 × 128 and building a Caffe deep learning environment;
Step 1.2, detecting the facial feature points in the Caffe deep learning environment using the keypoint extraction method proposed in "Combining Data-Driven and Model-Driven Methods for Robust Facial Landmark Detection" (IEEE Transactions on Information Forensics & Security, 2018), and locating five feature points: the centers of the two eyes, the nose tip, and the two corners of the mouth;
Step 1.3, obtaining each region block in the face (mouth, nose and eyes) from the five feature points of step 1.2; to ensure that the network trains smoothly, the region block sizes are kept constant across different faces, with the eye regions set to 40 × 40 and the nose and mouth set to 32 × 40 and 32 × 48 respectively.
Fig. 2 shows the positions and serial numbers of the 68 contour feature points extracted in the Caffe environment.
Step 2: input the face region blocks and the whole face obtained in step 1 into the local channel and the global channel respectively to obtain the corrected result of each channel. The specific steps are as follows:
Step 2.1, extracting face blocks of the eye, nose and mouth regions from the frontal face I_gt in the same way, obtaining Eyel_gt, Eyer_gt, Nose_gt and Mouth_gt respectively;
Step 2.2: for both the local channel and the global channel, the generator is designed as a U-Net structure so that the picture size remains unchanged before and after generation;
Step 2.3: for the local channel, the face region blocks obtained in step 1 are input into the local-channel generator to obtain the corrected face region blocks, namely Eyel_pred, Eyer_pred, Nose_pred and Mouth_pred.
The loss function of the local channel is designed as an L1 loss and consists of the following parts:

eyel_loss = (1 / (W_eyel * H_eyel)) * Σ_{x=1..W_eyel} Σ_{y=1..H_eyel} | Eyel_pred(x, y) - Eyel_gt(x, y) |

eyer_loss = (1 / (W_eyer * H_eyer)) * Σ_{x=1..W_eyer} Σ_{y=1..H_eyer} | Eyer_pred(x, y) - Eyer_gt(x, y) |

nose_loss = (1 / (W_nose * H_nose)) * Σ_{x=1..W_nose} Σ_{y=1..H_nose} | Nose_pred(x, y) - Nose_gt(x, y) |

mouth_loss = (1 / (W_mou * H_mou)) * Σ_{x=1..W_mou} Σ_{y=1..H_mou} | Mouth_pred(x, y) - Mouth_gt(x, y) |

where W_eyel, W_eyer, W_nose and W_mou are the widths of the left-eye, right-eye, nose and mouth regions (equal to 40, 40, 32 and 32 respectively) and H_eyel, H_eyer, H_nose and H_mou are the corresponding heights (equal to 40, 40, 40 and 48 respectively); Eyel_pred(x, y), Eyer_pred(x, y), Nose_pred(x, y) and Mouth_pred(x, y) are the gray values of the corrected left eye, right eye, nose and mouth at coordinate (x, y), and Eyel_gt(x, y), Eyer_gt(x, y), Nose_gt(x, y) and Mouth_gt(x, y) are the gray values of the corresponding region blocks of the frontal face at coordinate (x, y).
The loss function of the local channel is the sum of the losses of the region blocks; with λ_2 the weight of the local-channel loss, the local-channel loss function local_loss can be expressed as follows:
local_loss = λ_2 * (eyel_loss + eyer_loss + nose_loss + mouth_loss)
Step 2.4: for the global channel, the deflected face I_in is likewise input into an encoder structure with 3 × 3 convolution kernels, obtaining a 128 × 128 frontal face contour of the same size as the original face.
Step 3: fuse the corrected face regions output by the local channel with the whole-face contour output by the global channel, setting each pixel of an overlapping region to the maximum value over that region, to obtain the final generated face I_pred. The specific steps are as follows:
Step 3.1: based on the corrected region blocks obtained in step 2, each region block is zero-padded to a picture of the same size as the original face, yielding face pictures corresponding to the left eye, right eye, nose and mouth respectively;
Step 3.2: the face picture corresponding to the local channel is obtained from the four equally sized face pictures of step 3.1, as follows: the gray values of the pixels at the same coordinate in the four face pictures are compared, the maximum value is selected and taken as the gray value at that coordinate, and repeating this operation pixel by pixel finally yields the face picture corresponding to the local channel;
Step 3.3: the local-detail face picture obtained in the previous step is fused with the frontal face contour obtained by the global channel in step 2.4, and the final generated picture I_pred is obtained by training with several applied loss functions.
The loss functions applied in step 3.3 are specifically:
1) Pixel-wise L1 loss:

L_pixel = (1 / (W * H)) * Σ_{x=1..W} Σ_{y=1..H} | I_pred(x, y) - I_gt(x, y) |

where W and H are the width and height of the face picture (both 128), and I_pred(x, y) and I_gt(x, y) are the gray values of the generated face picture I_pred and of the corresponding frontal face at coordinate (x, y);
2) Generative adversarial loss:

L_adv = λ_0 * L_{G-D} + λ_1 * L_{G-C}

Since in the invention the generator plays an adversarial game with the discriminator and with the classifier separately, L_{G-D} and L_{G-C} in the formula above are the loss functions of the generator against the discriminator and against the classifier respectively. The adversarial loss of the generator is designed as the weighted sum of these two parts, addressing the quality of the generated picture and the preservation of identity consistency, with weights λ_0 and λ_1. The two parts are given by the following formulas, where I_pred' denotes the output of the generated face picture I_pred after passing through discriminator D, W and H are the width and height of the face picture (both 128), I_pred(x, y) and I_pred'(x, y) are the gray values of I_pred and I_pred' at coordinate (x, y), and C(·) denotes the features extracted by the classifier:

L_{G-D} = (1 / (W * H)) * Σ_{x=1..W} Σ_{y=1..H} | I_pred(x, y) - I_pred'(x, y) |

L_{G-C} = 1 - ( C(I_pred) · C(I_gt) ) / ( ||C(I_pred)|| * ||C(I_gt)|| )

3) Face symmetry loss:

L_sym = (1 / ((W/2) * H)) * Σ_{x=1..W/2} Σ_{y=1..H} | I_pred(x, y) - I_pred(W - (x - 1), y) |

where W and H are the width and height of the face picture (both 128), and I_pred(x, y) and I_pred(W - (x - 1), y) are the gray values of the generated face picture I_pred at coordinate (x, y) and at the horizontally mirrored coordinate (W - (x - 1), y);
4) Regularization loss: a regularization term that suppresses the noise present in the generated face picture.
Fig. 3 is a schematic diagram of the structure of the generator.
Step 4: input the generated face I_pred and the corresponding frontal face I_gt into the discriminator in sequence to ensure the accuracy of face generation. The specific steps are as follows:
Step 4.1: the face picture I_pred obtained in step 3.3 and the frontal face I_gt are input into the discriminator in succession; the discriminator is structured as an auto-encoder, and its output is a face picture;
Step 4.2: the face pictures output by the discriminator for the generated face I_pred and the frontal face I_gt are denoted I_pred' and I_gt' respectively, and the pixel-wise errors are computed separately, i.e. the L1-norm error of every pixel in the picture is calculated and summed, as follows:

L(I_gt) = (1 / (W * H)) * Σ_{x=1..W} Σ_{y=1..H} | I_gt(x, y) - I_gt'(x, y) |

L(I_pred) = (1 / (W * H)) * Σ_{x=1..W} Σ_{y=1..H} | I_pred(x, y) - I_pred'(x, y) |

where W and H are the width and height of the face picture (both 128), and I_gt(x, y), I_pred(x, y), I_gt'(x, y) and I_pred'(x, y) are the gray values of I_gt, I_pred, I_gt' and I_pred' at coordinate (x, y);
Step 4.3: the loss function of the discriminator D is designed as:

L_D = L(I_gt) - k_t * L(I_pred)

where k_t manually controls how much attention is paid to the generated samples; the larger k_t, the stronger the discriminating ability the discriminator is set to have. During training, the discriminator minimizes L_D, i.e. it maximizes the pixel-wise error of the generated face while minimizing the pixel-wise error of the real face, thereby distinguishing the two;
Step 4.4: for the game between the discriminator and the generator, the loss function of the generator is designed as:

L_{G-D} = L(I_pred)

Contrary to the discriminator, the generator aims to make the generated face as close as possible to the real face, so its loss function minimizes the pixel-wise error of the generated face I_pred, ensuring the quality of the generated face;
Step 4.5: with the loss functions of step 4.3 and step 4.4, the discriminator and the generator keep competing and generating; both become more and more expressive, and the generated data become more and more realistic.
Fig. 4 is a schematic diagram of the structure of the discriminator.
Step 5: input the generated face I_pred and the corresponding frontal face I_gt into the classifier at the same time to ensure identity consistency between the generated face and the input deflected face. The specific steps are as follows:
Step 5.1: the face picture I_pred obtained in step 3.3 and the corresponding frontal face I_gt are input into the classifier together (assuming the training set contains N faces, there are 2N pictures in total); the classifier C is a Light-CNN model trained in advance and acts as a separate structure that competes with the generator;
Step 5.2: the features extracted by the classifier from the frontal face I_gt and from the face I_pred produced by the generator are denoted C(I_gt) and C(I_pred) respectively. The result labels of the classifier are defined as 1 to 2N; the classification target of the classifier is to assign the frontal face pictures I_gt to the first N labels and the generated pictures I_pred to the last N labels. The corresponding cross-entropy loss functions are:

L_gt = - Σ_{j=1..2N} {l_gt}_j * log( {C(I_gt)}_j )

L_pred = - Σ_{j=1..2N} {l_pred}_j * log( {C(I_pred)}_j )

where j ∈ {1, ..., 2N}, l_gt and l_pred are the classifier labels of the frontal face and of the generated face picture respectively, {l_gt}_j denotes the correct label of the jth frontal face picture I_gt and {C(I_gt)}_j denotes the classification label output by the classifier for the frontal face picture; {l_pred}_j denotes the correct label of the jth generated face picture I_pred and {C(I_pred)}_j denotes the label output by the classifier after classifying it;
Step 5.3: the classifier loss function is designed as the weighted sum of the two cross-entropy losses above, with weight factor α, because the generated face lies relatively close to its corresponding frontal face picture in the feature space. The loss function of the classifier is formulated as follows:

L_C = L_gt + α * L_pred

Step 5.4: for the game between the generator and the classifier, the corresponding loss function of the generator is designed as:

L_{G-C} = d_cos( C(I_pred), C(I_gt) )

where the cosine distance d_cos is computed as:

d_cos( C(I_pred), C(I_gt) ) = 1 - ( C(I_pred) · C(I_gt) ) / ( ||C(I_pred)|| * ||C(I_gt)|| )

that is, the cosine distance between the features of the generated face picture and of the corresponding frontal face picture is minimized, so as to keep the identities of the two consistent;
Step 5.5: weighting the adversarial losses of the generator against the discriminator (step 4.4) and against the classifier (step 5.4) gives the generative adversarial loss of the generator used in step 3.3, rewritten as:

L_adv = λ_0 * L_{G-D} + λ_1 * L_{G-C}

Step 5.6: the classifier and the generator each have their corresponding adversarial loss function and keep competing and generating; both become more and more expressive, and the generated data come closer and closer to the identity information of the original face.
Fig. 5 is a schematic diagram of the structure of the classifier.
Fig. 6 is a schematic diagram of the correction effect of the present invention on the Multi-PIE data set.
Step 6: save the generative adversarial network model obtained by training and use it to correct deflected faces at test time. The specific steps are as follows:
Step 6.1: storing the network model parameters obtained during training;
Step 6.2: extracting the feature points of the deflected face pictures of the test set by the same method as for the training set, in the Caffe deep learning environment;
Step 6.3: extracting the deflected-face region blocks (eyes, nose and mouth) from the feature points obtained in step 6.2, with sizes consistent with those in the training set, i.e. 40 × 40, 32 × 40 and 32 × 48 for the eyes, the nose and the mouth respectively;
Step 6.4: correcting the test-set face region blocks with the network model obtained in training; similarly, the corrected contour of the whole deflected face is also obtained;
Step 6.5: fusing the two outputs of the local channel and the global channel with the network model parameters obtained in training, keeping the loss function of the generator consistent with step 3.3, thereby obtaining the generated frontal face picture corresponding to the deflected face of the test set.
Fig. 7 shows frontalization results on the LFW test set.
The invention is based on BEGAN among generative adversarial networks; this structure minimizes the Wasserstein distance between the pixel-wise error distributions of the faces before and after generation and avoids many problems of conventional generative adversarial networks. A two-channel network structure is used to accurately recover the face structure at both the local and the global level. Furthermore, with a three-player adversarial network structure, the classifier that preserves identity consistency competes with the generator as a separate structure rather than merely acting as a loss-function term supervising the generation process. This improves the accuracy and speed of deflected-face correction, and a good correction effect is obtained even under large-angle deflection.
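As a rough illustration of the three-player game described above, one training iteration could be organized as in the sketch below, reusing the loss sketches given earlier; the optimizer setup, the `features` accessor on the classifier and the weight values are assumptions for the example:

```python
def train_step(G, D, C, opt_g, opt_d, opt_c, batch, k_t, w):
    """One iteration: update discriminator D, classifier C, then generator G."""
    i_in, i_gt, labels_gt, labels_pred = batch
    i_pred = G(i_in)                                # two-channel generator output

    # 1) discriminator step: minimize L_D = L(I_gt) - k_t * L(I_pred)
    opt_d.zero_grad()
    loss_d = discriminator_loss(D, i_gt, i_pred, k_t)
    loss_d.backward()
    opt_d.step()

    # 2) classifier step: minimize L_C = L_gt + alpha * L_pred
    opt_c.zero_grad()
    loss_c = classifier_loss(C(i_gt), C(i_pred.detach()), labels_gt, labels_pred)
    loss_c.backward()
    opt_c.step()

    # 3) generator step: pixel, symmetry, regularization and both adversarial terms
    #    (the local-channel term is omitted here for brevity)
    opt_g.zero_grad()
    loss_g = (pixel_l1_loss(i_pred, i_gt)
              + w["sym"] * symmetry_loss(i_pred)
              + w["tv"] * tv_regularization(i_pred)
              + w["lambda_0"] * generator_d_loss(D, i_pred)
              + w["lambda_1"] * generator_identity_loss(C.features(i_pred), C.features(i_gt)))
    loss_g.backward()
    opt_g.step()
    return loss_g.item(), loss_d.item(), loss_c.item()
```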
The above description is only one embodiment of the present invention, but the scope of the present invention is not limited thereto. Modifications or substitutions that a person skilled in the art can readily conceive within the technical scope disclosed by the present invention shall fall within the scope of the present invention, which should therefore be defined by the protection scope of the claims.

Claims (7)

1. A method for correcting a deflected face based on an improved generative adversarial network structure, characterized by comprising the following steps:
step 1, detecting feature points of the input deflected face I_in and extracting fixed-size face region blocks, namely the eyes, the nose and the mouth;
step 2, inputting the face region blocks and the whole face obtained in step 1 into a local channel and a global channel respectively to obtain the corrected results of the two channels;
step 3, fusing the corrected face regions output by the local channel with the whole-face contour output by the global channel, setting each pixel of an overlapping region to the maximum value over that region, to obtain the final generated face I_pred;
step 4, inputting the generated face I_pred and the corresponding frontal face I_gt into the discriminator in sequence to ensure the accuracy of face generation, wherein I_gt is the frontal face corresponding to the input deflected face;
step 5, inputting the generated face I_pred and the corresponding frontal face I_gt into the classifier at the same time to ensure identity consistency between the generated face and the input deflected face;
step 6, saving the generative adversarial network model obtained by training and using it to correct deflected faces at test time.
2. The method for correcting a deflected face based on an improved generative adversarial network structure according to claim 1, wherein step 1 specifically comprises:
step 1.1, uniformly normalizing the face size to 128 × 128 and building a Caffe deep learning environment;
step 1.2, detecting the facial feature points in the Caffe deep learning environment and locating five feature points: the centers of the two eyes, the nose tip, and the two corners of the mouth;
step 1.3, obtaining each region block in the face, namely the mouth, the nose and the eyes, from the five feature points obtained in step 1.2, wherein the region block sizes are kept constant across different faces, the eye regions being set to 40 × 40 and the nose and mouth being set to 32 × 40 and 32 × 48 respectively.
3. The method for correcting a deflected face based on an improved generative adversarial network structure according to claim 2, wherein step 2 is specifically:
step 2.1, extracting face blocks of the eye, nose and mouth regions from the frontal face I_gt in the same way, obtaining Eyel_gt, Eyer_gt, Nose_gt and Mouth_gt respectively;
step 2.2: for both the local channel and the global channel, the generator is designed as a U-Net structure so that the picture size remains unchanged before and after generation;
step 2.3: for the local channel, the face region blocks obtained in step 1 are input into the local-channel generator to obtain the corrected face region blocks, namely Eyel_pred, Eyer_pred, Nose_pred and Mouth_pred, the loss function of the local channel being designed as an L1 loss consisting of the following parts:

eyel_loss = (1 / (W_eyel * H_eyel)) * Σ_{x=1..W_eyel} Σ_{y=1..H_eyel} | Eyel_pred(x, y) - Eyel_gt(x, y) |

eyer_loss = (1 / (W_eyer * H_eyer)) * Σ_{x=1..W_eyer} Σ_{y=1..H_eyer} | Eyer_pred(x, y) - Eyer_gt(x, y) |

nose_loss = (1 / (W_nose * H_nose)) * Σ_{x=1..W_nose} Σ_{y=1..H_nose} | Nose_pred(x, y) - Nose_gt(x, y) |

mouth_loss = (1 / (W_mou * H_mou)) * Σ_{x=1..W_mou} Σ_{y=1..H_mou} | Mouth_pred(x, y) - Mouth_gt(x, y) |

wherein W_eyel, W_eyer, W_nose and W_mou are the widths of the left-eye, right-eye, nose and mouth regions (equal to 40, 40, 32 and 32 respectively) and H_eyel, H_eyer, H_nose and H_mou are the corresponding heights (equal to 40, 40, 40 and 48 respectively); Eyel_pred(x, y), Eyer_pred(x, y), Nose_pred(x, y) and Mouth_pred(x, y) are the gray values of the corrected left eye, right eye, nose and mouth at coordinate (x, y), and Eyel_gt(x, y), Eyer_gt(x, y), Nose_gt(x, y) and Mouth_gt(x, y) are the gray values of the corresponding region blocks of the frontal face at coordinate (x, y);
the loss function of the local channel is the sum of the losses of the region blocks; with λ_2 the weight of the local-channel loss, the local-channel loss function local_loss can be expressed as follows:
local_loss = λ_2 * (eyel_loss + eyer_loss + nose_loss + mouth_loss)
step 2.4: for the global channel, the deflected face I_in is input into an encoder structure with 3 × 3 convolution kernels, obtaining a 128 × 128 frontal face contour of the same size as the original face.
4. The method for correcting a deflected face based on an improved generative adversarial network structure according to claim 3, wherein step 3 is specifically:
step 3.1: based on the corrected region blocks obtained in step 2, each region block is zero-padded to a picture of the same size as the original face, yielding face pictures corresponding to the left eye, right eye, nose and mouth respectively;
step 3.2: the face picture corresponding to the local channel is obtained from the four equally sized face pictures of step 3.1, the specific method being: the gray values of the pixels at the same coordinate in the four face pictures are compared, the maximum value is selected and taken as the gray value at that coordinate, and repeating this operation pixel by pixel finally yields the face picture corresponding to the local channel;
step 3.3: the local-detail face picture obtained in the previous step is fused with the frontal face contour obtained by the global channel in step 2.4, and the final generated picture I_pred is obtained by training with several applied loss functions;
the loss functions applied in step 3.3 are specifically:
1) pixel-wise L1 loss:

L_pixel = (1 / (W * H)) * Σ_{x=1..W} Σ_{y=1..H} | I_pred(x, y) - I_gt(x, y) |

wherein W and H are the width and height of the face picture (both 128), and I_pred(x, y) and I_gt(x, y) are the gray values of the generated face picture I_pred and of the corresponding frontal face at coordinate (x, y);
2) generative adversarial loss:

L_adv = λ_0 * L_{G-D} + λ_1 * L_{G-C}

wherein L_{G-D} and L_{G-C} in the formula above are the loss functions of the generator against the discriminator and against the classifier respectively; the adversarial loss of the generator is designed as the weighted sum of these two parts, addressing the quality of the generated picture and the preservation of identity consistency, with weights λ_0 and λ_1; the two parts are given by the following formulas, where I_pred' denotes the output of the generated face picture I_pred after passing through discriminator D, W and H are the width and height of the face picture (both 128), I_pred(x, y) and I_pred'(x, y) are the gray values of I_pred and I_pred' at coordinate (x, y), and C(·) denotes the features extracted by the classifier:

L_{G-D} = (1 / (W * H)) * Σ_{x=1..W} Σ_{y=1..H} | I_pred(x, y) - I_pred'(x, y) |

L_{G-C} = 1 - ( C(I_pred) · C(I_gt) ) / ( ||C(I_pred)|| * ||C(I_gt)|| )

3) face symmetry loss:

L_sym = (1 / ((W/2) * H)) * Σ_{x=1..W/2} Σ_{y=1..H} | I_pred(x, y) - I_pred(W - (x - 1), y) |

wherein W and H are the width and height of the face picture (both 128), and I_pred(x, y) and I_pred(W - (x - 1), y) are the gray values of the generated face picture I_pred at coordinate (x, y) and at the horizontally mirrored coordinate (W - (x - 1), y);
4) regularization loss: a regularization term that suppresses the noise present in the generated face picture.
5. The method for correcting a deflected face based on an improved generative adversarial network structure according to claim 4, wherein step 4 is specifically:
step 4.1: the face picture I_pred obtained in step 3.3 and the frontal face I_gt are input into the discriminator in succession, the discriminator being structured as an auto-encoder whose output is a face picture;
step 4.2: the face pictures output by the discriminator for the generated face I_pred and the frontal face I_gt are denoted I_pred' and I_gt' respectively, and the pixel-wise errors are computed separately, i.e. the L1-norm error of every pixel in the picture is calculated and summed, as follows:

L(I_gt) = (1 / (W * H)) * Σ_{x=1..W} Σ_{y=1..H} | I_gt(x, y) - I_gt'(x, y) |

L(I_pred) = (1 / (W * H)) * Σ_{x=1..W} Σ_{y=1..H} | I_pred(x, y) - I_pred'(x, y) |

wherein W and H are the width and height of the face picture (both 128), and I_gt(x, y), I_pred(x, y), I_gt'(x, y) and I_pred'(x, y) are the gray values of I_gt, I_pred, I_gt' and I_pred' at coordinate (x, y);
step 4.3: the loss function of the discriminator D is designed as:

L_D = L(I_gt) - k_t * L(I_pred)

wherein k_t manually controls how much attention is paid to the generated samples; during training the discriminator is required to minimize L_D, i.e. to maximize the pixel-wise error of the generated face while minimizing the pixel-wise error of the real face, thereby distinguishing the two;
step 4.4: for the game between the discriminator and the generator, the loss function of the generator is designed as:

L_{G-D} = L(I_pred)

the loss function of the generator minimizes the pixel-wise error of the generated face I_pred, ensuring the quality of the generated face;
step 4.5: according to the loss functions of step 4.3 and step 4.4, the discriminator and the generator continue to compete and generate until training reaches a preset condition.
6. The method for correcting a deflected face based on an improved generative adversarial network structure according to claim 5, wherein step 5 is specifically:
step 5.1: the face picture I_pred obtained in step 3.3 and the corresponding frontal face I_gt are input into the classifier together, wherein, assuming the training set contains N faces, there are 2N pictures in total; the classifier C is a Light-CNN model trained in advance and acts as a separate structure that competes with the generator;
step 5.2: the features extracted by the classifier from the frontal face I_gt and from the face I_pred produced by the generator are denoted C(I_gt) and C(I_pred) respectively; the result labels of the classifier are defined as 1 to 2N, the classification target of the classifier being to assign the frontal face pictures I_gt to the first N labels and the generated pictures I_pred to the last N labels, with the corresponding cross-entropy loss functions:

L_gt = - Σ_{j=1..2N} {l_gt}_j * log( {C(I_gt)}_j )

L_pred = - Σ_{j=1..2N} {l_pred}_j * log( {C(I_pred)}_j )

wherein j ∈ {1, ..., 2N}, l_gt and l_pred are the classifier labels of the frontal face and of the generated face picture respectively, {l_gt}_j denotes the correct label of the jth frontal face picture I_gt and {C(I_gt)}_j denotes the classification label output by the classifier for the frontal face picture; {l_pred}_j denotes the correct label of the jth generated face picture I_pred and {C(I_pred)}_j denotes the label output by the classifier after classifying it;
step 5.3: the classifier loss function is designed as the weighted sum of the two cross-entropy loss functions, with α a weight factor, and the loss function of the classifier is expressed as:

L_C = L_gt + α * L_pred

step 5.4: for the game between the generator and the classifier, the corresponding loss function of the generator is designed as:

L_{G-C} = d_cos( C(I_pred), C(I_gt) )

wherein the cosine distance d_cos is computed as:

d_cos( C(I_pred), C(I_gt) ) = 1 - ( C(I_pred) · C(I_gt) ) / ( ||C(I_pred)|| * ||C(I_gt)|| )

namely, the cosine distance between the features of the generated face picture and of the corresponding frontal face picture is minimized;
step 5.5: weighting the adversarial losses of the generator against the discriminator (step 4.4) and against the classifier (step 5.4) gives the generative adversarial loss of the generator used in step 3.3, rewritten as:

L_adv = λ_0 * L_{G-D} + λ_1 * L_{G-C}

step 5.6: the classifier and the generator each have a corresponding adversarial loss function, and the classifier and the generator continue to compete and generate until training reaches a preset condition.
7. The method for correcting a deflected face based on an improved generative adversarial network structure according to claim 6, wherein step 6 specifically comprises:
step 6.1: storing the network model parameters obtained during training;
step 6.2: extracting the feature points of the deflected face pictures of the test set by the same method as for the training set, in the Caffe deep learning environment;
step 6.3: extracting the deflected-face region blocks, namely the eyes, the nose and the mouth, from the feature points obtained in step 6.2, with sizes consistent with those in the training set, i.e. 40 × 40, 32 × 40 and 32 × 48 for the eyes, the nose and the mouth respectively;
step 6.4: correcting the test-set face region blocks with the network model obtained in training, whereby the corrected contour of the whole deflected face is likewise obtained;
step 6.5: fusing the two outputs of the local channel and the global channel with the network model parameters obtained in training, keeping the loss function of the generator consistent with step 3.3, thereby obtaining the generated frontal face picture corresponding to the deflected face of the test set.
CN202010269281.6A 2020-04-08 2020-04-08 Method for correcting a deflected face based on an improved generative adversarial network structure Active CN111523406B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010269281.6A CN111523406B (en) Method for correcting a deflected face based on an improved generative adversarial network structure

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010269281.6A CN111523406B (en) Method for correcting a deflected face based on an improved generative adversarial network structure

Publications (2)

Publication Number Publication Date
CN111523406A CN111523406A (en) 2020-08-11
CN111523406B true CN111523406B (en) 2023-04-18

Family

ID=71902548

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010269281.6A Active CN111523406B (en) Method for correcting a deflected face based on an improved generative adversarial network structure

Country Status (1)

Country Link
CN (1) CN111523406B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112288861B (en) * 2020-11-02 2022-11-25 湖北大学 Single-photo-based automatic construction method and system for three-dimensional model of human face

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110738161A (en) * 2019-10-12 2020-01-31 电子科技大学 face image correction method based on improved generation type confrontation network

Patent Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110738161A (en) * 2019-10-12 2020-01-31 电子科技大学 face image correction method based on improved generation type confrontation network

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Rui Huang et al. Beyond Face Rotation: Global and Local Perception GAN for Photorealistic and Identity Preserving Frontal View Synthesis. 2017 IEEE International Conference on Computer Vision (ICCV), 2017, full text. *

Also Published As

Publication number Publication date
CN111523406A (en) 2020-08-11

Similar Documents

Publication Publication Date Title
CN106815566B (en) Face retrieval method based on multitask convolutional neural network
Li et al. Robust visual tracking based on convolutional features with illumination and occlusion handing
WO2022111236A1 (en) Facial expression recognition method and system combined with attention mechanism
CN108629336B (en) Face characteristic point identification-based color value calculation method
CN110852182B (en) Depth video human body behavior recognition method based on three-dimensional space time sequence modeling
US20070242856A1 (en) Object Recognition Method and Apparatus Therefor
Tang et al. Facial landmark detection by semi-supervised deep learning
CN105335719A (en) Living body detection method and device
CN111783748A (en) Face recognition method and device, electronic equipment and storage medium
CN115830652B (en) Deep palm print recognition device and method
CN110991258B (en) Face fusion feature extraction method and system
CN111062328A (en) Image processing method and device and intelligent robot
Chen et al. Silhouette-based object phenotype recognition using 3D shape priors
CN112836680A (en) Visual sense-based facial expression recognition method
CN113312973A (en) Method and system for extracting features of gesture recognition key points
CN112036260A (en) Expression recognition method and system for multi-scale sub-block aggregation in natural environment
CN111914643A (en) Human body action recognition method based on skeleton key point detection
CN105912126A (en) Method for adaptively adjusting gain, mapped to interface, of gesture movement
CN112329516A (en) Method, device and medium for detecting wearing of mask of driver based on key point positioning and image classification
Zhou et al. MTCNet: Multi-task collaboration network for rotation-invariance face detection
CN115205933A (en) Facial expression recognition method, device, equipment and readable storage medium
Iqbal et al. Facial expression recognition with active local shape pattern and learned-size block representations
CN111523406B (en) Method for correcting a deflected face based on an improved generative adversarial network structure
CN110598647B (en) Head posture recognition method based on image recognition
Lv et al. A spontaneous facial expression recognition method using head motion and AAM features

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant