CN110543846B - Multi-pose face image frontalization method based on a generative adversarial network - Google Patents

Multi-pose face image frontalization method based on a generative adversarial network

Info

Publication number
CN110543846B
Authority
CN
China
Prior art keywords: face image, image, face, network, synthesized
Prior art date
2019-08-29
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Expired - Fee Related
Application number
CN201910806159.5A
Other languages
Chinese (zh)
Other versions
CN110543846A (en)
Inventor
张星明 (Zhang Xingming)
容昌乐 (Rong Changle)
林育蓓 (Lin Yubei)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
South China University of Technology SCUT
Original Assignee
South China University of Technology SCUT
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
2019-08-29
Publication date
2021-12-17
Application filed by South China University of Technology SCUT filed Critical South China University of Technology SCUT
Priority to CN201910806159.5A priority Critical patent/CN110543846B/en
Publication of CN110543846A publication Critical patent/CN110543846A/en
Application granted granted Critical
Publication of CN110543846B publication Critical patent/CN110543846B/en
Expired - Fee Related legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Images

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00: Pattern recognition
    • G06F18/20: Analysing
    • G06F18/21: Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214: Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00: Computing arrangements based on biological models
    • G06N3/02: Neural networks
    • G06N3/04: Architecture, e.g. interconnection topology
    • G06N3/045: Combinations of networks
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00: Computing arrangements based on biological models
    • G06N3/02: Neural networks
    • G06N3/04: Architecture, e.g. interconnection topology
    • G06N3/048: Activation functions
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00: Computing arrangements based on biological models
    • G06N3/02: Neural networks
    • G06N3/08: Learning methods
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00: Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10: Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16: Human faces, e.g. facial parts, sketches or expressions
    • G06V40/168: Feature extraction; Face representation

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Computing Systems (AREA)
  • Computational Linguistics (AREA)
  • Biophysics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Biomedical Technology (AREA)
  • Molecular Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Oral & Maxillofacial Surgery (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Human Computer Interaction (AREA)
  • Multimedia (AREA)
  • Image Analysis (AREA)
  • Image Processing (AREA)

Abstract

The invention discloses a multi-pose face frontalization method based on a generative adversarial network. In the testing stage after training, the method corrects input face images of various poses into frontal face images. The corrected images are clear and retain the identity features of the original face, so they can be used for face recognition. The method effectively mitigates the negative influence of pose on face recognition and supports the practical application of face recognition under unconstrained conditions.

Description

Multi-pose face image frontalization method based on a generative adversarial network
Technical Field
The invention relates to the technical field of image processing, and in particular to a multi-pose face image frontalization method based on a generative adversarial network.
Background
At present, face recognition technology is widely applied in many fields such as access control and security, social networking, and finance. However, most face recognition systems work reliably only under strictly standardized conditions: the subject must be in a scene with sufficient and uniform illumination, keep a neutral expression, and cooperate with the image acquisition device by adopting a standard pose. In many practical applications, such as tracking criminal suspects, these conditions are often difficult to satisfy, which greatly degrades the performance of many face recognition techniques and hinders their adoption in these fields. Among the adverse factors affecting face recognition performance, the pose of the photographed face is the most important. Handling the pose problem well would be a major step toward applying face recognition in unconstrained environments.
One way to deal with the pose problem is to frontalize the input profile face image, that is, to correct a profile face image into a frontal face image of the same person, and then recognize the identity from the synthesized frontal image. At present, most multi-pose face frontalization methods cannot handle face images whose yaw angle exceeds 60°; the face images they synthesize are severely deformed and lose the identity features of the person, making subsequent face recognition difficult. The more effective multi-pose face frontalization methods are based on generative adversarial networks.
Compared with other multi-pose face frontalization methods based on generative adversarial networks, the present method adopts a different network structure and different loss functions. Even when the input face image has a yaw angle exceeding 60°, the model can synthesize a realistic frontal face image while retaining more identity information, which greatly improves the effectiveness of subsequent face recognition.
Disclosure of Invention
The invention aims to overcome the shortcomings of existing multi-pose face frontalization methods by providing a multi-pose face frontalization method based on a generative adversarial network. It addresses the poor performance of similar methods when the yaw angle of the input face exceeds 60°, improves the fidelity of the synthesized face, and retains more of the identity information of the input face image.
To achieve the above purpose, the technical scheme provided by the invention is as follows: a multi-pose face image frontalization method based on a generative adversarial network, comprising the following steps:
1) Collect face images of all poses as a training set and a test set, ensuring that for each input face image I_a of arbitrary pose, a non-synthesized frontal face image I_g of the same person can be found in the data set.
2) In the training stage, input a face image I_a of arbitrary pose from the training set into the generator G to obtain a corrected code X_2 and a synthesized frontal face image I_f; input the non-synthesized frontal face image I_g into the generator G to obtain a frontal code X_3.
3) Input the synthesized frontal face image I_f or the non-synthesized frontal face image I_g into a discrimination network D, which judges whether the input face image is synthesized; also input the synthesized frontal face image I_f or the non-synthesized frontal face image I_g into a face identity feature extractor F, which extracts the identity features of the face image.
4) Substitute the discrimination result of step 3), the extracted identity features, the synthesized frontal face image I_f, the non-synthesized frontal face image I_g, the corrected code X_2, and the frontal code X_3 into the pre-designed loss functions, and train the generator G and the discrimination network D alternately until training is finished.
5) In the testing stage, input a face image I_a of arbitrary pose into the trained generator G to obtain a synthesized frontal face image I_f; the effect is verified by direct observation of the synthesized frontal image I_f.
In step 1), all face images used in the data set are derived from the data set Multi_Pie. The data set contains more than 750,000 pictures, covering 337 individuals under 20 illuminations and 15 poses. The illumination of the pictures changes from dark to light as the illumination index goes from 01 to 20, where illumination index 07 is the standard illumination condition. All face images of the data set are denoted I_a; for each image I_a, the image of the same person with a face yaw angle of 0° and illumination index 07 is found in the data set and denoted I_g.
In step 2), the generator G consists of a pose estimator P, an encoder En, a coding residual network R, and a decoder De. The pose estimator P estimates the face pose using the PnP algorithm to obtain the yaw angle of the face; it is implemented with the function cv2.solvePnP() of the open-source OpenCV library. The encoder En is a convolutional neural network, the coding residual network R is a two-layer fully connected neural network, and the decoder De is a deconvolution neural network.
the process of the generator G for synthesizing the picture is as follows: inputting human face image I with any posture to generator GaThe encoder En converts it to the initial code X1The coding residual network R is formed by initially coding X1Estimating a coded residual R (X)1) The pose estimator P calculates the accurate deflection angle gamma of the face in the yaw direction in the input image, the deflection angle gamma inputs the function Y to obtain the weight Y (gamma) of the coding residual error, and the initial coding X1Fusing with the coding residual to obtain a modified code X2Wherein X is2=X1+Y(γ)×R(X1) Encoding the correction X2Input to a decoder De, and generate a front face image I by deconvolutionf
In step 3), the discrimination network D is a classifier based on a convolutional neural network that determines whether the input image comes from the generator G or from the original image data. The face identity feature extractor F adopts the open-source Light-CNN-29, a lightweight convolutional neural network with a depth of 29 layers and about 12 million parameters.
In step 4), the objective of the loss functions is to minimize the difference between the synthesized frontal face image I_f and the non-synthesized frontal face image I_g, so that the synthesized frontal face image I_f retains more identity information of the input face image. Besides the pixel loss, identity loss, symmetry loss, and adversarial loss commonly used in methods of the same type, the loss functions used in step 4) include a newly devised coding loss. The first objective of the coding loss is to make the codes obtained by the encoder En for the input face image I_a and the non-synthesized frontal face image I_g closer: because existing face pose correction methods work better when the yaw angle of the input face is smaller, the closer the code of the input face image is to the code of the same face at 0° yaw, the better the frontalization result. Thus, the first part of the coding loss is formulated as follows:
$$L_{code1} = \frac{1}{N} \sum_{i=1}^{N} \left| En(I_a)_i + R(En(I_a))_i - X_3^i \right|$$
where N is the dimension of each code; En is the encoder; En(I_a)_i is the value of the i-th dimension of the initial code En(I_a); R is the coding residual network; R(En(I_a))_i is the value of the i-th dimension of the coding residual R(En(I_a)); En(I_a)_i + R(En(I_a))_i equals the value of the i-th dimension of the corrected code; and X_3^i is the value of the i-th dimension of the frontal code X_3 obtained by the encoder from the non-synthesized frontal face image. The first part of the coding loss is thus the Manhattan distance between the corrected code and the frontal code.
the second objective of the coding loss function is to differentiate between different characters in the modified code En (I)a)+R(En(Ia) A full-connected layer C is constructed after the full-connected layer C, the number of neurons of the full-connected layer C is equal to the number M of people in the training set, and the second part of the coding loss function adopts a cross-entropy loss function:
$$L_{code2} = -\sum_{i=1}^{M} y_i \log C\!\left(En(I_a) + R(En(I_a))\right)_i$$
where M is the number of persons in the training set; y_i is the value of the i-th dimension of a one-hot vector y of dimension M that indicates which person in the training set the input face image I_a belongs to: if image I_a belongs to the j-th person, the j-th dimension of y is 1 and the remaining dimensions are 0; En(I_a) + R(En(I_a)) is the corrected code; and C(En(I_a) + R(En(I_a)))_i is the value of the i-th dimension of the feature vector obtained by passing the corrected code through the fully connected layer C.
Thus, the complete coding loss is:

$$L_{code} = L_{code1} + \lambda L_{code2}$$

where λ is a weighting constant with value 0.1.
In step 4), the generator G and the discrimination network D are trained alternately so that the two optimize and improve each other through competition. In the initial stage, the face images generated by the generator G are blurred, and the discrimination network D can easily judge the source of an input image, which pushes the generator G to generate clearer images and thus improves the generator. In later stages, the images generated by the generator G become clearer and closer to the original image data, which pushes the discrimination network to judge input images more accurately and thus improves the discrimination ability of the network D.
Compared with the prior art, the invention has the following advantages and beneficial effects:
1. The invention makes full use of the yaw angle information of the input face image and defines a corresponding coding loss, which helps synthesize frontal face images of higher quality.
2. Even when the yaw angle of the input face image exceeds 60°, the invention can generate clear, realistic, and undeformed frontal face images.
3. The frontal face image synthesized by the invention retains the identity information of the input face image, which helps reduce the adverse effect of face pose variation on face recognition and facilitates subsequent face identification.
4. In terms of practical application scenarios, the method is expected to advance fields such as suspect tracking: correcting a profile image of a target person into a frontal image improves the efficiency of the related work.
Drawings
FIG. 1 is a flow chart of the method of the present invention.
Fig. 2 is a flow diagram of image synthesis by the generator.
Fig. 3 is a diagram of an encoder neural network architecture.
Fig. 4 is a diagram of a decoder neural network architecture.
Fig. 5 is a diagram of a discrimination network structure.
FIG. 6 is a diagram showing the effect of the present invention.
Detailed Description
To describe the present invention more specifically, the technical solution is explained in detail below in conjunction with the steps, the accompanying drawings, and a specific embodiment.
As shown in fig. 1, the multi-pose face image frontalization method based on a generative adversarial network provided by this embodiment comprises the following steps:
1) Collect face images of all poses as a training set and a test set, ensuring that for each input face image I_a of arbitrary pose, a non-synthesized frontal face image I_g of the same person with a face yaw angle of 0° can be found in the data set.
The images of the data set are derived from the Multi_Pie face data set, which contains more than 750,000 pictures covering 337 individuals under 20 illuminations and 15 poses. The illumination of the pictures changes from dark to light as the illumination index goes from 01 to 20, where index 07 is the standard illumination condition. All face images of the data set are denoted I_a; for each image I_a, the image of the same person with a face yaw angle of 0° and illumination index 07 is found in the data set and denoted I_g. Before use, the data set undergoes preprocessing steps such as face detection and face cropping. Images in the 13 poses with yaw angle within 90°, under all lighting conditions, are selected as the data set; the images of the first 200 persons form the training set and the images of the remaining 137 persons form the test set. All images of the data set are normalized and resized: normalization divides the values of all pixels by 255.0 so that they lie in [0, 1], and resizing adjusts the dimensions of all images to 128 × 128 × 3 using bilinear interpolation.
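These two operations can be sketched as follows (a minimal illustration assuming OpenCV and NumPy; the path handling and the preceding face detection and cropping steps are outside this snippet):

```python
import cv2
import numpy as np

def preprocess_face(image_path):
    """Resize a cropped face image to 128x128x3 with bilinear interpolation
    and normalize all pixel values to the range [0, 1]."""
    img = cv2.imread(image_path)                                    # uint8 BGR
    img = cv2.resize(img, (128, 128), interpolation=cv2.INTER_LINEAR)
    return img.astype(np.float32) / 255.0
```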
2) In the training stage, input a face image I_a of arbitrary pose from the training set into the generator G to obtain the corrected code X_2 and the synthesized frontal face image I_f; input the non-synthesized frontal face image I_g into the generator G to obtain the frontal code X_3.
The generator G converts the input face image I_a into a synthesized frontal face image I_f. It consists of a pose estimator P, an encoder En, a coding residual network R, and a decoder De. As shown in fig. 2, the generator G synthesizes a picture as follows: a face image I_a of arbitrary pose is input to the generator G; the encoder En converts it to the initial code X_1; the coding residual network R estimates a coding residual R(X_1) from the initial code X_1; the pose estimator P calculates the yaw angle γ of the face in the input image; the yaw angle γ is input to the function Y to obtain the weight Y(γ) of the coding residual; the initial code X_1 is fused with the coding residual to obtain the corrected code X_2, where X_2 = X_1 + Y(γ) × R(X_1); and the corrected code X_2 is input to the decoder De, which generates the frontal face image I_f by deconvolution.
The pose estimator P solves for the yaw angle γ of the face in the input face image, from which the weight Y(γ) of the coding residual is obtained. [The formula for the weight function Y(γ) appears only as an image in the original publication and is not recoverable from this text.]
the pose estimator P estimates the pose of the face using the PnP algorithm. The cv2.solvepnp () function in opencv library is used directly at the time of implementation. The parameters required by the function comprise 2D characteristic points of the face image and 3D positions corresponding to the characteristic points. The 2D characteristic points of the human face are directly obtained by a human face characteristic point detection algorithm provided by an open source dlib library. The 3D positions of the individual feature points are the positions of the feature points on the average face model, and these 3D positions are fixed and provided by the relevant documents of the 2D feature point detection algorithm in the dlib library.
The encoder En is a convolutional neural network whose structure is shown in fig. 3; its role is to convert the input face image I_a into the initial code X_1. The dimension of the input image I_a is 128 × 128 × 3, the activation function of the last layer of the encoder is Maxout, and the dimension of the output initial code X_1 is 256.
The coding residual network R is a two-layer fully connected neural network with 256 neurons in each layer. Let X_3 be the code that the encoder obtains from a non-synthesized frontal face image I_g. The role of the coding residual network R is to estimate the coding residual R(X_1) between the initial code X_1 and the frontal code X_3. The coding residual is multiplied by a weight and fused with the initial code X_1 to obtain the corrected code X_2, where X_2 = X_1 + Y(γ) × R(X_1).
The decoder De is a deconvolution neural network whose structure is shown in fig. 4. Its role is to decode the corrected code X_2 into the synthesized frontal face image I_f through deconvolution steps. For existing frontalization methods based on generative adversarial networks, the smaller the yaw angle of the input face, the higher the quality of the synthesized face image and the more identity information it retains. Because the corrected code X_2 is closer to the frontal code X_3 than the initial code X_1 is, feeding the decoder X_2 instead of X_1 yields a synthesized face image of higher quality.
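A minimal sketch of the generator's code-fusion step follows (TensorFlow 2 style for brevity; `encoder`, `residual_net`, and `decoder` stand in for the networks of figs. 3 and 4, and the linear form of the weight function Y is an assumption, since its exact formula is not legible in this text):

```python
import tensorflow as tf

def yaw_weight(gamma):
    # Assumed form of Y: the farther the face is from frontal (gamma in
    # degrees), the more of the coding residual is applied.
    return tf.clip_by_value(abs(gamma) / 90.0, 0.0, 1.0)

def generator_forward(encoder, residual_net, decoder, face_img, gamma):
    """Synthesize a frontal face image I_f from an arbitrary-pose input."""
    x1 = encoder(face_img)            # initial code X1 (256-dimensional)
    r = residual_net(x1)              # coding residual R(X1)
    x2 = x1 + yaw_weight(gamma) * r   # corrected code X2 = X1 + Y(gamma)*R(X1)
    i_f = decoder(x2)                 # synthesized frontal face image I_f
    return i_f, x2
```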
3) Input the synthesized frontal face image I_f or the non-synthesized frontal face image I_g into the discrimination network D, which judges whether the input face image is synthesized. Then input the synthesized frontal face image I_f or the non-synthesized frontal face image I_g into the face identity feature extractor F to obtain the identity features of the input face image.
The discrimination network D is a classifier based on a convolutional neural network; its structure is shown in fig. 5. Its role is to determine whether the input image comes from the generator G or from the original image data. The final output value of the discrimination network represents the probability that the input image comes from the generator G; the larger this value, the more likely the input image was synthesized.
The face identity feature extractor F adopts the open-source Light-CNN-29, a lightweight convolutional neural network with a depth of 29 layers and about 12 million parameters. It extracts the identity features of the input face picture; the dimension of the final extracted identity feature is 256.
4) Substitute the discrimination result of step 3), the extracted identity features, the synthesized frontal face image I_f, the non-synthesized frontal face image I_g, the corrected code X_2, and the frontal code X_3 into the pre-designed loss functions, and train the generator G and the discrimination network D alternately until training is finished.
Besides the pixel loss, identity loss, symmetry loss, and adversarial loss commonly used in methods of the same type, the loss functions for training the generator include a newly devised coding loss.
The first is the pixel loss, which measures the pixel difference between the synthesized frontal face image I_f and the non-synthesized frontal face image I_g:

$$L_{pixel} = \frac{1}{W \times H} \sum_{x=1}^{W} \sum_{y=1}^{H} \left| I_f^{(x,y)} - I_g^{(x,y)} \right|$$

where W and H are the width and height of the image, and I_f^{(x,y)} and I_g^{(x,y)} are the pixel values of the synthesized frontal face image I_f and the non-synthesized frontal face image I_g at coordinate (x, y).
Next is the symmetry loss: given the symmetry of the human face, the synthesized face image I_f should be as close as possible to the image I_sym obtained by flipping it left-to-right:

$$L_{sym} = \frac{1}{W \times H} \sum_{x=1}^{W} \sum_{y=1}^{H} \left| I_f^{(x,y)} - I_{sym}^{(x,y)} \right|$$

where W and H are the width and height of the image, and I_f^{(x,y)} and I_sym^{(x,y)} are the pixel values of the synthesized frontal face image I_f and its left-right flipped image I_sym at coordinate (x, y).
Then comes the identity loss. The face identity feature extractor F efficiently extracts the identity features of frontal face images: the synthesized frontal face image I_f and the non-synthesized frontal face image I_g are input to F separately, yielding their identity features F(I_f) and F(I_g). To ensure that the synthesized frontal face image I_f contains the identity information of the non-synthesized frontal face image I_g, the Manhattan distance between the feature maps of the last two layers of F is minimized:

$$L_{id} = \sum_{i=1}^{2} \frac{1}{W_i \times H_i} \sum_{x=1}^{W_i} \sum_{y=1}^{H_i} \left| F_i(I_f)^{(x,y)} - F_i(I_g)^{(x,y)} \right|$$

where W_i and H_i are the width and height of the identity feature map of the i-th last layer, F is the face identity feature extractor, and F_i(I_f)^{(x,y)} and F_i(I_g)^{(x,y)} are the values at coordinate (x, y) of the i-th last-layer identity feature maps of the frontal face image I_f and the non-synthesized frontal face image I_g, respectively.
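A sketch of this loss follows (the helper `lightcnn_feature_maps`, returning the feature maps of the last two layers of Light-CNN-29, is a hypothetical stand-in for the pretrained extractor F):

```python
import tensorflow as tf

def identity_loss(lightcnn_feature_maps, i_f, i_g):
    """L_id: Manhattan distance between the last-two-layer identity
    feature maps of the synthesized and real frontal faces."""
    feats_f = lightcnn_feature_maps(i_f)   # [second-to-last, last] feature maps
    feats_g = lightcnn_feature_maps(i_g)
    return tf.add_n([tf.reduce_mean(tf.abs(ff - fg))
                     for ff, fg in zip(feats_f, feats_g)])
```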
Finally, the adversarial loss aims to make the synthesized image confuse the discrimination network D, bringing the synthesized image closer to a real image and enhancing its fidelity:

$$L_{adv} = \frac{1}{N} \sum_{n=1}^{N} \log D\!\left(G\!\left(I_a^{(n)}\right)\right)$$

where N is the size of the current training batch, G and D are the generator and the discrimination network respectively, and I_a and G(I_a) are the input face image and the frontal face image synthesized by the generator. The value D(G(I_a)) reflects how likely the discrimination network D judges the synthesized image G(I_a) to be a synthesized picture. Minimizing the loss L_adv drives the frontal face images G(I_a) synthesized by the generator to pass the discrimination network's inspection, improving their realism.
The first objective of the coding loss is to make the codes obtained by the encoder En for the input face image I_a and the non-synthesized frontal face image I_g closer: because existing face pose correction methods work better when the yaw angle of the input face is smaller, the closer the code of the input face image is to the code of the same face at 0° yaw, the better the frontalization result. Thus, the first part of the coding loss is formulated as follows:

$$L_{code1} = \frac{1}{N} \sum_{i=1}^{N} \left| En(I_a)_i + R(En(I_a))_i - X_3^i \right|$$

where N is the dimension of each code; En is the encoder; En(I_a)_i is the value of the i-th dimension of the initial code En(I_a); R is the coding residual network; R(En(I_a))_i is the value of the i-th dimension of the coding residual R(En(I_a)); En(I_a)_i + R(En(I_a))_i equals the value of the i-th dimension of the corrected code; and X_3^i is the value of the i-th dimension of the frontal code X_3 obtained by the encoder from the non-synthesized frontal face image. The first part of the coding loss is thus the Manhattan distance between the corrected code and the frontal code.
The second objective of the coding loss is to differentiate the corrected codes of different persons. A fully connected layer C is constructed after the corrected code En(I_a) + R(En(I_a)); the number of neurons of the fully connected layer C equals the number of persons M in the training set, and the second part of the coding loss adopts a cross-entropy loss:

$$L_{code2} = -\sum_{i=1}^{M} y_i \log C\!\left(En(I_a) + R(En(I_a))\right)_i$$

where M is the number of persons in the training set; y_i is the value of the i-th dimension of a one-hot vector y of dimension M that indicates which person in the training set the input face image I_a belongs to: if image I_a belongs to the j-th person, the j-th dimension of y is 1 and the remaining dimensions are 0; En(I_a) + R(En(I_a)) is the corrected code; and C(En(I_a) + R(En(I_a)))_i is the value of the i-th dimension of the feature vector obtained by passing the corrected code through the fully connected layer C.
Thus, the complete coding loss is:

$$L_{code} = L_{code1} + \lambda L_{code2}$$

where λ is a weighting constant with value 0.1.
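Both parts of the coding loss can be sketched as follows (TensorFlow 2 style; `fc_c` stands for the fully connected layer C with M output neurons, applied to the batch of corrected codes):

```python
import tensorflow as tf

def coding_loss(x2, x3, fc_c, y_onehot, lam=0.1):
    """L_code = L_code1 + lambda * L_code2."""
    # Part 1: Manhattan distance between corrected code X2 and frontal code X3.
    l_code1 = tf.reduce_mean(tf.abs(x2 - x3))
    # Part 2: cross-entropy over the M identities of the training set.
    logits = fc_c(x2)                                  # shape [batch, M]
    l_code2 = tf.reduce_mean(
        tf.nn.softmax_cross_entropy_with_logits(labels=y_onehot, logits=logits))
    return l_code1 + lam * l_code2
```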
In summary, the loss function for training the generator is:

$$L_{total} = L_{pixel} + \lambda_1 L_{sym} + \lambda_2 L_{id} + \lambda_3 L_{adv} + \lambda_4 L_{code}$$

where L_pixel, L_sym, L_id, L_adv, and L_code are the pixel loss, symmetry loss, identity loss, adversarial loss, and coding loss respectively, and λ_1, λ_2, λ_3, and λ_4 are the weights of the different losses.
Following the parameter settings of methods of the same type and extensive experimental experience, the loss weights λ_1, λ_2, λ_3, and λ_4 are set to 0.2, 0.003, 0.001, and 0.002 respectively. While training the generator, the discrimination network D must be trained at the same time; the goal of training D is to enable it to distinguish whether an input frontal face image comes from the generator G or from the original data set. The loss function for training the discrimination network is:

$$L_{adv2} = -\frac{1}{N} \sum_{n=1}^{N} \left[ \log D\!\left(G\!\left(I_a^{(n)}\right)\right) + \log\left(1 - D\!\left(I_g^{(n)}\right)\right) \right]$$

where N is the size of the current training batch, G and D are the generator and the discrimination network respectively, I_a is the input image, and G(I_a) and I_g are the synthesized and non-synthesized frontal face images. The terms D(G(I_a)) and D(I_g) reflect how likely the discrimination network D judges the synthesized frontal face image G(I_a) and the non-synthesized frontal face image I_g, respectively, to be synthesized. Minimizing L_adv2 enables the discrimination network D to accurately reflect the probability that an input picture is synthesized.
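Taking D's output as the probability that its input is synthesized, as stated above, the two adversarial losses can be sketched as follows (a small epsilon guards the logarithms; the sign conventions follow the reconstructed formulas above):

```python
import tensorflow as tf

EPS = 1e-8  # numerical guard for the logarithms

def adv_loss_generator(d_fake):
    """L_adv: the generator lowers the 'synthesized' probability that D
    assigns to its outputs, i.e. it tries to fool D."""
    return tf.reduce_mean(tf.math.log(d_fake + EPS))

def adv_loss_discriminator(d_fake, d_real):
    """L_adv2: D learns to output high probability on synthesized images
    (d_fake) and low probability on real frontal images (d_real)."""
    return -tf.reduce_mean(tf.math.log(d_fake + EPS)
                           + tf.math.log(1.0 - d_real + EPS))
```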
Training the generator G and the discrimination network D alternately lets the two optimize and improve each other through competition. In the initial stage, the face images generated by the generator G are blurred, and the discrimination network D can easily judge the source of an input image, which pushes the generator G to generate clearer images and thus improves the generator. In later stages, the images generated by the generator G become clearer and closer to the original image data, which pushes the discrimination network to judge input images more accurately and thus improves the discrimination ability of the network D.
After the loss functions of the generator G and the discrimination network D are designed, the parameters of the generative adversarial network are optimized with the Adam method, with the learning rate set to 0.0002 and the batch size set to 12. After each training step of the generator G, the discrimination network D is trained once. As training proceeds, the quality of the images produced by the generator keeps improving and the ability of the discrimination network to judge input images keeps strengthening, until training is complete. The deep learning framework used in the experiments is TensorFlow, the graphics card is a 1080 Ti, and training stops after 20,000 batches.
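An abbreviated training step under these settings is sketched below (TensorFlow 2 style, although the original experiments used an earlier TensorFlow; `generator`, `encoder`, `discriminator`, `lightcnn_feature_maps`, and `fc_c` are the stand-in callables from the sketches above, and batches of 12 aligned triples (I_a, I_g, one-hot label) plus the yaw angle are assumed):

```python
import tensorflow as tf

g_opt = tf.keras.optimizers.Adam(learning_rate=0.0002)
d_opt = tf.keras.optimizers.Adam(learning_rate=0.0002)
W_SYM, W_ID, W_ADV, W_CODE = 0.2, 0.003, 0.001, 0.002   # loss weights

def train_step(generator, encoder, discriminator, i_a, i_g, gamma, y_onehot):
    # One generator update, then one discriminator update.
    with tf.GradientTape() as tape:
        i_f, x2 = generator(i_a, gamma)        # synthesized frontal face and X2
        x3 = encoder(i_g)                      # frontal code X3
        loss_g = (pixel_loss(i_f, i_g)
                  + W_SYM * symmetry_loss(i_f)
                  + W_ID * identity_loss(lightcnn_feature_maps, i_f, i_g)
                  + W_ADV * adv_loss_generator(discriminator(i_f))
                  + W_CODE * coding_loss(x2, x3, fc_c, y_onehot))
    grads = tape.gradient(loss_g, generator.trainable_variables)
    g_opt.apply_gradients(zip(grads, generator.trainable_variables))

    with tf.GradientTape() as tape:
        loss_d = adv_loss_discriminator(discriminator(i_f), discriminator(i_g))
    grads = tape.gradient(loss_d, discriminator.trainable_variables)
    d_opt.apply_gradients(zip(grads, discriminator.trainable_variables))
```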
5) In the testing stage, for an input face image I_a of arbitrary pose, the trained generator G synthesizes the frontal face image I_f of the same face; the effect of the invention can then be verified by direct observation of the synthesized frontal face image I_f. The generated results are shown in fig. 6: in each row, the first picture is an input face image with a yaw angle exceeding 45°, the second is the frontal face image synthesized by the generator, and the third is the non-synthesized frontal face image of the same person from the data set. As the figure shows, for face images with a yaw angle exceeding 45°, the invention synthesizes their frontal face images while retaining the identity information of the original face.
The embodiments described above are merely preferred embodiments of the present invention, and the scope of the present invention is not limited thereto; changes made according to the shape and principle of the present invention shall be covered within the protection scope of the present invention.

Claims (6)

1. A multi-pose face image frontalization method based on a generative adversarial network, characterized by comprising the following steps:
1) collecting face images of all poses as a training set and a test set, wherein for each input face image I_a of arbitrary pose, a non-synthesized frontal face image I_g of the same person can be found in the data set;
2) in the training stage, inputting a face image I_a of arbitrary pose from the training set into a generator G to obtain a corrected code X_2 and a synthesized frontal face image I_f, and inputting the non-synthesized frontal face image I_g into the generator G to obtain a frontal code X_3;
3) inputting the synthesized frontal face image I_f or the non-synthesized frontal face image I_g into a discrimination network D, which judges whether the input face image is synthesized, and inputting the synthesized frontal face image I_f or the non-synthesized frontal face image I_g into a face identity feature extractor F, which extracts the identity features of the face image;
4) substituting the discrimination result of step 3), the extracted identity features, the synthesized frontal face image I_f, the non-synthesized frontal face image I_g, the corrected code X_2, and the frontal code X_3 into pre-designed loss functions, and training the generator G and the discrimination network D alternately until training is finished;
5) in the testing stage, inputting a face image I_a of arbitrary pose into the trained generator G to obtain a synthesized frontal face image I_f, the effect being verified by direct observation of the synthesized frontal image I_f.
2. The multi-pose face image frontalization method based on a generative adversarial network according to claim 1, wherein: in step 1), all face images used in the data set are derived from the data set Multi_Pie; the data set contains more than 750,000 pictures, covering 337 individuals under 20 illuminations and 15 poses; the illumination of the pictures changes from dark to light as the illumination index goes from 01 to 20, where illumination index 07 is the standard illumination condition; all face images of the data set are denoted I_a, and for each image I_a, the image of the same person with a face yaw angle of 0° and illumination index 07 is found in the data set and denoted I_g.
3. The multi-pose face image frontalization method based on a generative adversarial network according to claim 1, wherein: in step 2), the generator G consists of a pose estimator P, an encoder En, a coding residual network R, and a decoder De; the pose estimator P estimates the face pose using the PnP algorithm to obtain the yaw angle of the face, and is implemented with the function cv2.solvePnP() of the open-source OpenCV library; the encoder En is a convolutional neural network, the coding residual network R is a two-layer fully connected neural network, and the decoder De is a deconvolution neural network;
the generator G synthesizes a picture as follows: a face image I_a of arbitrary pose is input to the generator G; the encoder En converts it to the initial code X_1; the coding residual network R estimates a coding residual R(X_1) from the initial code X_1; the pose estimator P calculates the yaw angle γ of the face in the input image; the yaw angle γ is input to the function Y to obtain the weight Y(γ) of the coding residual; the initial code X_1 is fused with the coding residual to obtain the corrected code X_2, where X_2 = X_1 + Y(γ) × R(X_1); and the corrected code X_2 is input to the decoder De, which generates the frontal face image I_f by deconvolution.
4. The multi-pose face image frontalization method based on a generative adversarial network according to claim 1, wherein: in step 3), the discrimination network D is a classifier based on a convolutional neural network that determines whether the input image comes from the generator G or from the original image data; the face identity feature extractor F adopts the open-source Light-CNN-29, a lightweight convolutional neural network with a depth of 29 layers and about 12 million parameters.
5. The multi-pose face image frontalization method based on a generative adversarial network according to claim 1, wherein: in step 4), the objective of the loss functions is to minimize the difference between the synthesized frontal face image I_f and the non-synthesized frontal face image I_g, so that the synthesized frontal face image I_f retains more identity information of the input face image; besides the pixel loss, identity loss, symmetry loss, and adversarial loss commonly used in methods of the same type, the loss functions used in step 4) include a newly devised coding loss; the first objective of the coding loss is to make the codes obtained by the encoder En for the input face image I_a and the non-synthesized frontal face image I_g closer; because existing face pose correction methods work better when the yaw angle of the input face is smaller, the closer the code of the input face image is to the code of the same face at 0° yaw, the better the frontalization result; thus, the first part of the coding loss is formulated as follows:

$$L_{code1} = \frac{1}{N} \sum_{i=1}^{N} \left| En(I_a)_i + R(En(I_a))_i - X_3^i \right|$$

where N is the dimension of each code; En is the encoder; En(I_a)_i is the value of the i-th dimension of the initial code En(I_a); R is the coding residual network; R(En(I_a))_i is the value of the i-th dimension of the coding residual R(En(I_a)); En(I_a)_i + R(En(I_a))_i equals the value of the i-th dimension of the corrected code; and X_3^i is the value of the i-th dimension of the frontal code X_3 obtained by the encoder from the non-synthesized frontal face image; the first part of the coding loss is the Manhattan distance between the corrected code and the frontal code;
the second objective of the coding loss is to differentiate the corrected codes of different persons; a fully connected layer C is constructed after the corrected code En(I_a) + R(En(I_a)); the number of neurons of the fully connected layer C equals the number of persons M in the training set, and the second part of the coding loss adopts a cross-entropy loss:

$$L_{code2} = -\sum_{i=1}^{M} y_i \log C\!\left(En(I_a) + R(En(I_a))\right)_i$$

where M is the number of persons in the training set; y_i is the value of the i-th dimension of a one-hot vector y of dimension M indicating which person in the training set the input face image I_a belongs to; if image I_a belongs to the j-th person, the j-th dimension of y is 1 and the remaining dimensions are 0; En(I_a) + R(En(I_a)) is the corrected code; and C(En(I_a) + R(En(I_a)))_i is the value of the i-th dimension of the feature vector obtained by passing the corrected code through the fully connected layer C;
thus, the complete coding loss is:

$$L_{code} = L_{code1} + \lambda L_{code2}$$

where λ is a weighting constant with value 0.1.
6. The multi-pose face image frontalization method based on a generative adversarial network according to claim 1, wherein: in step 4), the generator G and the discrimination network D are trained alternately so that the two optimize and improve each other through competition; in the initial stage, the face images generated by the generator G are blurred, and the discrimination network D can easily judge the source of an input image, which pushes the generator G to generate clearer images and thus improves the generator; in later stages, the images generated by the generator G become clearer and closer to the original image data, which pushes the discrimination network to judge input images more accurately and thus improves the discrimination ability of the network D.
CN201910806159.5A 2019-08-29 2019-08-29 Multi-pose face image frontalization method based on a generative adversarial network Expired - Fee Related CN110543846B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910806159.5A CN110543846B (en) 2019-08-29 2019-08-29 Multi-pose face image frontalization method based on a generative adversarial network

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910806159.5A CN110543846B (en) 2019-08-29 2019-08-29 Multi-pose face image frontalization method based on a generative adversarial network

Publications (2)

Publication Number Publication Date
CN110543846A CN110543846A (en) 2019-12-06
CN110543846B true CN110543846B (en) 2021-12-17

Family

ID=68710718

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910806159.5A Expired - Fee Related CN110543846B (en) 2019-08-29 2019-08-29 Multi-pose face image frontalization method based on a generative adversarial network

Country Status (1)

Country Link
CN (1) CN110543846B (en)

Families Citing this family (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111144240B (en) * 2019-12-12 2023-02-07 深圳数联天下智能科技有限公司 Image processing method and related equipment
CN111428667A (en) * 2020-03-31 2020-07-17 天津中科智能识别产业技术研究院有限公司 Face image frontalization method based on a generative adversarial network with decoupled representation learning
CN111931484B (en) * 2020-07-31 2022-02-25 贵州多彩宝互联网服务有限公司 Data transmission method based on big data
CN111856962A (en) * 2020-08-13 2020-10-30 郑州智利信信息技术有限公司 Intelligent home control system based on cloud computing
CN111985995A (en) * 2020-08-14 2020-11-24 足购科技(杭州)有限公司 WeChat applet-based shoe virtual fitting method and device
US11810397B2 (en) 2020-08-18 2023-11-07 Samsung Electronics Co., Ltd. Method and apparatus with facial image generating
CN112164002B (en) * 2020-09-10 2024-02-09 深圳前海微众银行股份有限公司 Training method and device of face correction model, electronic equipment and storage medium
CN114981835A (en) * 2020-10-29 2022-08-30 京东方科技集团股份有限公司 Training method and device of face reconstruction model, face reconstruction method and device, electronic equipment and readable storage medium
CN112818850B (en) * 2021-02-01 2023-02-10 华南理工大学 Cross-pose face recognition method and system based on progressive neural network and attention mechanism
CN113140015B (en) * 2021-04-13 2023-03-14 杭州欣禾圣世科技有限公司 Multi-view face synthesis method and system based on a generative adversarial network
CN113361489B (en) * 2021-07-09 2022-09-16 重庆理工大学 Face frontalization model construction method and training method based on decoupled representation
CN113469269A (en) * 2021-07-16 2021-10-01 上海电力大学 Residual convolutional autoencoding wind-solar-charging scenario generation method based on multi-channel fusion
CN114049250B (en) * 2022-01-13 2022-04-12 广州卓腾科技有限公司 Method, device and medium for correcting face pose of certificate photo

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107239766A (en) * 2017-06-08 2017-10-10 深圳市唯特视科技有限公司 Large-pose face frontalization method using adversarial networks and a three-dimensional morphable model
CN109815928A (en) * 2019-01-31 2019-05-28 中国电子进出口有限公司 Face image synthesis method and apparatus based on adversarial learning

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10474880B2 (en) * 2017-03-15 2019-11-12 Nec Corporation Face recognition using larger pose face frontalization

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107239766A (en) * 2017-06-08 2017-10-10 深圳市唯特视科技有限公司 Large-pose face frontalization method using adversarial networks and a three-dimensional morphable model
CN109815928A (en) * 2019-01-31 2019-05-28 中国电子进出口有限公司 Face image synthesis method and apparatus based on adversarial learning

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
Multi-poses Face Frontalization based on Pose Weighted GAN;Jiaxin Ma et al.;《2019 IEEE 3rd Information Technology, Networking, Electronic and Automation Control Conference (ITNEC)》;20190606;1271-1276 *
Pose-Weighted Gan for Photorealistic Face Frontalization;Sufang Zhang et al.;《2019 IEEE International Conference on Image Processing (ICIP)》;20190826;2384-2388 *
Face frontalization generation based on generative adversarial learning; 钱一琛 (Qian Yichen); China Master's Theses Full-text Database (Information Science and Technology); 20190815; I138-1148 *

Also Published As

Publication number Publication date
CN110543846A (en) 2019-12-06

Similar Documents

Publication Publication Date Title
CN110543846B (en) Multi-pose face image frontalization method based on a generative adversarial network
CN108537743B (en) Face image enhancement method based on a generative adversarial network
CN110443143B (en) Multi-branch convolutional neural network fused remote sensing image scene classification method
CN109543606B (en) Human face recognition method with attention mechanism
CN109657595B (en) Key feature region matching face recognition method based on stacked hourglass network
CN111401257B (en) Face recognition method based on cosine loss under unconstrained conditions
CN106960202B (en) Smiling face identification method based on visible light and infrared image fusion
CN110348330B (en) Face pose virtual view generation method based on VAE-ACGAN
CN110909690B (en) Method for detecting occluded face image based on region generation
CN108268859A (en) Facial expression recognition method based on deep learning
CN112766160A (en) Face replacement method based on multi-stage attribute encoder and attention mechanism
CN112418074A (en) Coupled posture face recognition method based on self-attention
CN112950661A (en) Face cartoon generation method based on an attention generative adversarial network
CN107423678A (en) Training method of a convolutional neural network for feature extraction and face recognition method
CN1975759A (en) Face recognition method based on structural principal component analysis
CN112418041B (en) Multi-pose face recognition method based on face frontalization
CN113963032A (en) Siamese network target tracking method fused with target re-identification
CN111783748A (en) Face recognition method and device, electronic equipment and storage medium
CN111832405A (en) Face recognition method based on HOG and a deep residual network
CN112836625A (en) Face liveness detection method and device, and electronic equipment
CN111222433A (en) Automatic face auditing method, system, equipment and readable storage medium
CN113724354A (en) Grayscale image colorization method based on the color style of a reference image
CN114387641A (en) False video detection method and system based on multi-scale convolutional network and ViT
CN109360179A (en) Image fusion method and device, and readable storage medium
CN113378949A (en) Dual generative adversarial learning method based on capsule network and mixed attention

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20211217