CN110543846B - Multi-pose face image frontalization method based on a generative adversarial network - Google Patents

Multi-pose face image frontalization method based on a generative adversarial network
- Publication number: CN110543846B (application CN201910806159.5A)
- Authority
- CN
- China
- Legal status: Expired - Fee Related (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis)
Classifications

- G06F18/214 — Pattern recognition; generating training patterns; bootstrap methods, e.g. bagging or boosting
- G06N3/045 — Neural networks; combinations of networks
- G06N3/048 — Neural networks; activation functions
- G06N3/08 — Neural networks; learning methods
- G06V40/168 — Human faces; feature extraction; face representation
Abstract
The invention discloses a multi-pose face frontalization method based on a generative adversarial network. In the testing stage after training, the method corrects input face images of various poses into frontal face images. The corrected images are sharp and retain the identity characteristics of the original faces, so they can be used for face recognition. The invention effectively mitigates the negative influence of pose on face recognition and supports the practical deployment of face recognition under unconstrained conditions.
Description
Technical Field
The invention relates to the technical field of image processing, in particular to a multi-pose face image frontalization method based on a generative adversarial network.
Background
At present, face recognition technology is widely applied in fields such as access control and security, social networking, and finance. However, most face recognition systems work well only under strictly controlled conditions: the subject must be in a scene with sufficient, uniform illumination, maintain a neutral expression, and adopt a standard pose facing the image acquisition device. In many practical applications, such as suspect tracking, these conditions are difficult to satisfy, so the performance of many face recognition techniques degrades sharply and they are hard to deploy in these fields. Among the adverse factors affecting face recognition performance, the pose of the photographed face is the most important. If the pose problem is handled well, the application of face recognition in unconstrained environments will take a large step forward.
One way to deal with the pose problem is to frontalize the input profile face image, that is, to correct a profile image into a frontal image of the same person and then recognize the identity from the synthesized frontal image. At present, most multi-pose face frontalization methods cannot handle face images with yaw angles exceeding 60°; the face images they synthesize are severely deformed and lose the person's identity characteristics, making subsequent face recognition difficult. The frontalization methods with the best results are based on generative adversarial networks.
Compared with other multi-pose face frontalization methods based on generative adversarial networks, this method adopts a different network structure and different loss functions. Even when the input face has a yaw angle exceeding 60°, the model can synthesize realistic frontal face images and retain more identity information, which greatly improves subsequent face recognition.
Disclosure of Invention
The invention aims to overcome the shortcomings of existing multi-pose face frontalization methods. It provides a frontalization method based on a generative adversarial network that addresses the poor performance of similar methods when the yaw angle of the input face exceeds 60°, improves the fidelity of the synthesized face, and retains more of the identity information in the input face image.
To achieve this purpose, the technical scheme provided by the invention is as follows. A multi-pose face image frontalization method based on a generative adversarial network comprises the following steps:
1) Collect face images of all poses as a training set and a test set, ensuring that for each input face image I_a of arbitrary pose, a non-synthesized frontal face image I_g of the same person can be found in the data set;
2) In the training stage, input each face image I_a of arbitrary pose in the training set into the generator G to obtain the corrected code X_2 and a synthesized frontal face image I_f; input the non-synthesized frontal face image I_g into the generator G to obtain the frontal code X_3;
3) Input the synthesized frontal face image I_f or the non-synthesized frontal face image I_g into the discrimination network D, which judges whether the input face image is synthesized; also input I_f or I_g into the face identity feature extractor F, which extracts the identity features of the face image;
4) Substitute the discrimination result of step 3), the extracted identity features, the synthesized frontal face image I_f, the non-synthesized frontal face image I_g, the corrected code X_2, and the frontal code X_3 into the pre-designed loss functions, and alternately train the generator G and the discrimination network D until training is finished;
5) In the testing stage, input a face image I_a of arbitrary pose into the trained generator G to obtain a synthesized frontal face image I_f; the effect is verified by directly observing I_f.
In step 1), all the face images used in the data set come from the Multi-PIE data set, which contains more than 750,000 pictures covering 337 people under 20 illuminations and 15 poses. The illumination numbers 01 to 20 order the pictures from dark to light, with illumination number 07 being the standard illumination condition. All face images in the data set are denoted I_a; for each image I_a, the image of the same person with a face yaw angle of 0° and illumination number 07 is found in the data set and denoted I_g.
In step 2), the generator G consists of a pose estimator P, an encoder En, a coding residual network R, and a decoder De. The pose estimator P estimates the pose of the face with the PnP algorithm to obtain the yaw angle of the face; it is implemented with the cv2.solvePnP() function of the open-source OpenCV library. The encoder En is a convolutional neural network, the coding residual network R is a two-layer fully connected neural network, and the decoder De is a deconvolutional neural network.
The generator G synthesizes a picture as follows: a face image I_a of arbitrary pose is input into G; the encoder En converts it into the initial code X_1; the coding residual network R estimates a coding residual R(X_1) from X_1; the pose estimator P computes the yaw angle γ of the face in the input image; γ is fed to the function Y to obtain the coding-residual weight Y(γ); the initial code X_1 is fused with the coding residual to obtain the corrected code X_2, where X_2 = X_1 + Y(γ) × R(X_1); finally, X_2 is input to the decoder De, which generates the frontal face image I_f by deconvolution.
In step 3), the discrimination network D is a classifier based on a convolutional neural network that determines whether the input image comes from the generator G or from the original image data. The face identity feature extractor F adopts the open-source Light-CNN-29, a lightweight convolutional neural network with a depth of 29 layers and about 12 million parameters.
In step 4), the goal of the loss functions is to minimize the difference between the synthesized frontal face image I_f and the non-synthesized frontal face image I_g, so that I_f retains more of the identity information of the input face image. Besides the pixel loss, identity loss, symmetry loss, and adversarial loss commonly used by similar methods, the losses used in step 4) include a newly designed coding loss. The first goal of the coding loss is to bring the codes that the encoder En produces for the input face image I_a and for the non-synthesized frontal face image I_g closer together: because existing face frontalization methods work better when the yaw angle of the input face is small, the closer the code of the input face image is to the code of the same face at a 0° yaw angle, the better the synthesized frontalization result. The first part of the coding loss is therefore formulated as follows:
L_code1 = Σ_{i=1..N} | En(I_a)_i + R(En(I_a))_i − X_{3,i} |

where N is the dimension of each code; En is the encoder; En(I_a)_i is the value of the i-th dimension of the initial code En(I_a); R is the coding residual network; R(En(I_a))_i is the value of the i-th dimension of the coding residual R(En(I_a)); En(I_a)_i + R(En(I_a))_i equals the i-th dimension of the corrected code; and X_{3,i} is the value of the i-th dimension of the frontal code X_3 that the encoder produces from the non-synthesized frontal face image. The first part of the coding loss is thus the Manhattan distance between the corrected code and the frontal code;
The second goal of the coding loss is to separate the corrected codes of different people. A fully connected layer C is built after the corrected code En(I_a) + R(En(I_a)); the number of neurons of C equals the number of people M in the training set, and the second part of the coding loss adopts a cross-entropy loss:
L_code2 = − Σ_{i=1..M} y_i log( softmax(C(En(I_a) + R(En(I_a))))_i )

where M is the number of people in the training set; y_i is the value of the i-th dimension of a one-hot vector y of dimension M that indicates which person in the training set the input face image I_a belongs to (if I_a belongs to the j-th person, the j-th dimension of y is 1 and the remaining dimensions are 0); En(I_a) + R(En(I_a)) is the corrected code; and C(En(I_a) + R(En(I_a)))_i is the value of the i-th dimension of the feature vector obtained by passing the corrected code through the fully connected layer C;
Thus, the complete coding loss function is:

L_code = L_code1 + λ · L_code2

where λ is a weighting constant with value 0.1.
In step 4), the generator G and the discrimination network D are trained alternately so that the two optimize and improve each other through competition. In the initial stage, the face images generated by G are blurry, and D easily judges the source of an input image; this pushes G to generate sharper images, improving its quality. In later stages, the images generated by G become sharp and close to the original image data, which pushes D to judge input images more accurately, improving its discrimination ability.
Compared with the prior art, the invention has the following advantages and beneficial effects:
1. The invention makes full use of the yaw-angle information of the input face image and defines an associated coding loss, which helps synthesize frontal face images of higher quality.
2. Even when the yaw angle of the input face image exceeds 60°, the invention generates sharp, realistic frontal face images without deformation.
3. The frontal face images synthesized by the invention retain the identity information of the input face image, which reduces the adverse effect of pose variation on face recognition and simplifies subsequent identity recognition.
4. In terms of practical application scenarios, the method is expected to advance fields such as suspect tracking: correcting a profile image of a target person into a frontal image improves the efficiency of the related work.
Drawings
FIG. 1 is a flow chart of the method of the present invention.
Fig. 2 is a generator composite image flow diagram.
Fig. 3 is a diagram of an encoder neural network architecture.
Fig. 4 is a diagram of a decoder neural network architecture.
Fig. 5 is a diagram of a discrimination network structure.
FIG. 6 is a diagram showing the effect of the present invention.
Detailed Description
To describe the present invention more concretely, the following details the technical solution in conjunction with the steps, the accompanying drawings, and the specific embodiment.
As shown in fig. 1, the multi-pose face image frontalization method based on a generative adversarial network provided by this embodiment comprises the following steps:
1) Collect face images of all poses as a training set and a test set, ensuring that for each input face image I_a of arbitrary pose, a non-synthesized frontal image I_g of the same face with a 0° yaw angle can be found in the data set.
The images come from the Multi-PIE face data set, which contains more than 750,000 pictures covering 337 people under 20 illuminations and 15 poses. The illumination numbers 01 to 20 order the pictures from dark to light, with illumination number 07 being the standard illumination condition. All face images in the data set are denoted I_a; for each image I_a, the image of the same person with a face yaw angle of 0° and illumination number 07 is found and denoted I_g. Before use, the data set goes through preprocessing steps such as face detection and face cropping. The 13 poses within a 90° yaw angle are selected, and images under all lighting conditions are taken as the data set; the images of the first 200 people form the training set, and the images of the remaining 137 people form the test set. All images of the data set are normalized and resized: normalization divides all pixel values by 255.0 so that they lie in [0, 1], and resizing uses bilinear interpolation to adjust all images to 128 × 128 × 3.
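The normalization and resize steps above can be sketched as follows. This is a minimal numpy illustration; the `bilinear_resize` helper and its align-corners sampling are assumptions for this sketch, not the embodiment's implementation, which could equally use an image library.

```python
import numpy as np

def bilinear_resize(img, out_h, out_w):
    """Resize an H x W x C float image with bilinear interpolation
    (align-corners sampling; an assumption for this sketch)."""
    h, w, _ = img.shape
    ys = np.linspace(0, h - 1, out_h)          # source rows to sample
    xs = np.linspace(0, w - 1, out_w)          # source columns to sample
    y0 = np.floor(ys).astype(int)
    x0 = np.floor(xs).astype(int)
    y1 = np.minimum(y0 + 1, h - 1)
    x1 = np.minimum(x0 + 1, w - 1)
    wy = (ys - y0)[:, None, None]              # vertical interpolation weights
    wx = (xs - x0)[None, :, None]              # horizontal interpolation weights
    top = img[y0][:, x0] * (1 - wx) + img[y0][:, x1] * wx
    bot = img[y1][:, x0] * (1 - wx) + img[y1][:, x1] * wx
    return top * (1 - wy) + bot * wy

def preprocess(img_uint8):
    """Scale pixel values into [0, 1] and resize to 128 x 128 x 3."""
    img = img_uint8.astype(np.float32) / 255.0
    return bilinear_resize(img, 128, 128)
```

Any input crop, whatever its original size, thus ends up as a 128 × 128 × 3 array of values in [0, 1] before entering the network.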
2) In the training stage, input each face image I_a of arbitrary pose in the training set into the generator G to obtain the corrected code X_2 and the synthesized frontal face image I_f. Input the non-synthesized frontal face image I_g into the generator G to obtain the frontal code X_3.
The generator G converts the input face image I_a into a synthesized frontal face image I_f. It consists of a pose estimator P, an encoder En, a coding residual network R, and a decoder De. As shown in fig. 2, the generator G synthesizes a picture as follows: a face image I_a of arbitrary pose is input into G; the encoder En converts it into the initial code X_1; the coding residual network R estimates a coding residual R(X_1) from X_1; the pose estimator P computes the yaw angle γ of the face in the input image; γ is fed to the function Y to obtain the coding-residual weight Y(γ); X_1 is fused with the coding residual to obtain the corrected code X_2, where X_2 = X_1 + Y(γ) × R(X_1); finally, X_2 is input to the decoder De, which generates the frontal face image I_f by deconvolution.
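The fusion step X_2 = X_1 + Y(γ) × R(X_1) can be sketched as below. The toy random weights, the tanh activation, and the example weight function Y(γ) = |γ|/90 are all hypothetical stand-ins: this passage does not disclose the form of Y or the internals of the trained residual network R.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-ins for the trained components (hypothetical weights for
# illustration only): R is a two-layer fully connected network over a
# 256-dimensional code, as in the embodiment.
W1 = rng.standard_normal((256, 256)) * 0.01
W2 = rng.standard_normal((256, 256)) * 0.01

def coding_residual(x1):
    # Two-layer fully connected residual network R (tanh activation assumed).
    return np.tanh(x1 @ W1) @ W2

def residual_weight(gamma):
    # Hypothetical example of the undisclosed weight function Y(gamma):
    # larger yaw angle -> larger correction.
    return abs(gamma) / 90.0

def corrected_code(x1, gamma):
    """X2 = X1 + Y(gamma) * R(X1): fuse the initial code with the weighted
    coding residual before decoding."""
    return x1 + residual_weight(gamma) * coding_residual(x1)
```

Under this example weight function, a frontal input (γ = 0) passes its code through unchanged, while a 90° profile applies the full residual.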
The pose estimator P computes the yaw angle γ of the face in the input image, from which the weight Y(γ) of the coding residual is obtained. P estimates the pose of the face with the PnP algorithm; in the implementation, the cv2.solvePnP() function of the OpenCV library is used directly. The parameters required by this function are the 2D feature points of the face image and the corresponding 3D positions of those feature points. The 2D feature points are obtained with the face landmark detection algorithm provided by the open-source dlib library; the 3D positions are the fixed positions of the landmarks on an average face model, provided with the documentation of the 2D landmark detection algorithm in the dlib library.
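Once cv2.solvePnP() returns a rotation vector, the yaw angle can be recovered from the corresponding rotation matrix. The sketch below illustrates that post-processing step in plain numpy; the Rz·Ry·Rx Euler factoring is an assumption (conventions vary between head-pose pipelines).

```python
import numpy as np

def rodrigues(rvec):
    """Rotation vector -> rotation matrix (the Rodrigues formula, as used
    by cv2.Rodrigues)."""
    theta = np.linalg.norm(rvec)
    if theta < 1e-12:
        return np.eye(3)
    k = np.asarray(rvec, dtype=float) / theta
    K = np.array([[0.0, -k[2], k[1]],
                  [k[2], 0.0, -k[0]],
                  [-k[1], k[0], 0.0]])
    return np.eye(3) + np.sin(theta) * K + (1 - np.cos(theta)) * (K @ K)

def yaw_degrees(rvec):
    """Extract the yaw (rotation about the vertical y axis) in degrees,
    assuming an R = Rz @ Ry @ Rx factoring of the rotation matrix."""
    R = rodrigues(rvec)
    return np.degrees(np.arctan2(-R[2, 0], np.hypot(R[2, 1], R[2, 2])))
```

For a pure rotation of 30° about the vertical axis, `yaw_degrees` returns 30.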
The encoder En is a convolutional neural network whose structure is shown in fig. 3; its role is to convert the input face image I_a into the initial code X_1. The input image I_a has dimensions 128 × 128 × 3, the activation function of the last layer of the encoder is MaxOut, and the output initial code X_1 has dimension 256.
The coding residual network R is a two-layer fully connected neural network with 256 neurons in each layer. Let X_3 be the code the encoder produces from the non-synthesized frontal face image I_g; the role of R is to estimate the coding residual R(X_1) between the initial code X_1 and the frontal code X_3. The coding residual is multiplied by a weight and fused with the initial code X_1 to obtain the corrected code X_2, where X_2 = X_1 + Y(γ) × R(X_1).
The decoder De is a deconvolutional neural network whose structure is shown in fig. 4. Its role is to decode the corrected code X_2 through deconvolution steps into the synthesized frontal face image I_f. For existing GAN-based face frontalization methods, the smaller the yaw angle of the input face, the higher the quality of the synthesized face image and the more identity information it retains. Because the corrected code X_2 is closer to the frontal code X_3 than the initial code X_1 is, feeding the decoder X_2 instead of X_1 yields a higher-quality synthesized face image.
3) Input the synthesized frontal face image I_f or the non-synthesized frontal face image I_g into the discrimination network D, which judges whether the input face image is synthesized. Then input I_f or I_g into the face identity feature extractor F to obtain the identity features of the input face image.
The discrimination network D is a classifier based on a convolutional neural network whose structure is shown in fig. 5. Its role is to determine whether the input image comes from the generator G or from the original image data; its final output represents the probability that the input image comes from G, and the larger this value, the more likely the input image is synthesized.
The face identity feature extractor F adopts the open-source Light-CNN-29, a lightweight convolutional neural network with a depth of 29 layers and about 12 million parameters. It extracts the identity features of the input face picture; the finally extracted identity feature has dimension 256.
4) Substitute the discrimination result of step 3), the extracted identity features, the synthesized frontal face image I_f, the non-synthesized frontal face image I_g, the corrected code X_2, and the frontal code X_3 into the pre-designed loss functions, and alternately train the generator G and the discrimination network D until training is finished.
Besides the pixel loss, identity loss, symmetry loss, and adversarial loss commonly used by similar methods, the losses used to train the generator include a newly designed coding loss.
First is the pixel loss, which represents the pixel difference between the synthesized frontal face image I_f and the non-synthesized frontal face image I_g:

L_pixel = (1 / (W × H)) Σ_{x=1..W} Σ_{y=1..H} | I_f(x, y) − I_g(x, y) |

where W and H are the width and height of the image, and I_f(x, y) and I_g(x, y) are the pixel values of the synthesized and non-synthesized frontal face images at coordinate (x, y).
Then a symmetrical loss function is carried out, and a synthesized face image I is obtained in view of the symmetrical characteristic of the facefShould be compared with the image I obtained after it is flipped left and rightsymAs close as possible, the symmetric loss function is formulated as follows:
where W and H represent the width and height of the image, respectively.Andrespectively representing the synthesized front face image IfAnd an image I obtained by left-right turningsymPixel value at coordinate (x, y).
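Both of these losses amount to mean absolute differences and can be sketched in numpy as:

```python
import numpy as np

def pixel_loss(i_f, i_g):
    """L_pixel: mean absolute pixel difference between the synthesized
    frontal image I_f and the non-synthesized frontal image I_g."""
    return np.abs(i_f - i_g).mean()

def symmetric_loss(i_f):
    """L_sym: mean absolute difference between I_f and its left-right
    flip I_sym (axis 1 is the width axis)."""
    i_sym = i_f[:, ::-1]
    return np.abs(i_f - i_sym).mean()
```

A perfectly left-right-symmetric synthesized image incurs zero symmetry loss, and identical synthesized and ground-truth images incur zero pixel loss.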
Then, the face identity characteristic extractor F can efficiently obtain the identity characteristic of the front face image by an identity loss function. Respectively synthesizing the front face images IfAnd a non-synthesized frontal face image IgRespectively inputting F to obtain their identity characteristics F (I)f) And F (I)g). To ensure a composite frontal face image IfCan contain non-synthesized frontal face image IgThe identity information of (2) needs to minimize the manhattan distance between the last two layers of identity feature maps obtained after inputting them into F. The identity loss function is formulated as follows:
Wiand HiRespectively representing the width and the height of the face identity characteristic map of the ith last layer. And F is a face identity feature extractor.Andrespectively representing a frontal face image IfAnd a non-synthesized frontal face image IgThe value of the coordinate (x, y) on the identity profile at the i-last level.
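A minimal numpy sketch of this identity loss, assuming the feature maps of the last two layers of F are available as arrays:

```python
import numpy as np

def identity_loss(feats_f, feats_g):
    """L_id: sum over the last two layers of F of the mean absolute
    difference between the feature maps of I_f and I_g.

    feats_f / feats_g: lists holding the two feature-map arrays per image.
    """
    return sum(np.abs(a - b).mean() for a, b in zip(feats_f, feats_g))
```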
Finally, the adversarial loss aims to make the synthesized image confuse the discrimination network D, bringing it closer to a real image and enhancing its fidelity:

L_adv = (1 / N) Σ_{n=1..N} log D(G(I_a^(n)))

where N is the size of the current training batch, G and D are the generator and the discrimination network, I_a^(n) is an input face image, and G(I_a^(n)) is the frontal face image synthesized by the generator. D(G(I_a^(n))) reflects the probability, as judged by D, that the synthesized image G(I_a^(n)) is a synthesized picture. Minimizing L_adv makes the frontal face images G(I_a) synthesized by the generator pass the discrimination network's inspection, which improves their realism.
The first goal of the coding loss is to bring the codes that the encoder En produces for the input face image I_a and for the non-synthesized frontal face image I_g closer together: because existing face frontalization methods work better when the yaw angle of the input face is small, the closer the code of the input face image is to the code of the same face at a 0° yaw angle, the better the synthesized frontalization result. The first part of the coding loss is therefore formulated as follows:
L_code1 = Σ_{i=1..N} | En(I_a)_i + R(En(I_a))_i − X_{3,i} |

where N is the dimension of each code; En is the encoder; En(I_a)_i is the value of the i-th dimension of the initial code En(I_a); R is the coding residual network; R(En(I_a))_i is the value of the i-th dimension of the coding residual R(En(I_a)); En(I_a)_i + R(En(I_a))_i equals the i-th dimension of the corrected code; and X_{3,i} is the value of the i-th dimension of the frontal code X_3 that the encoder produces from the non-synthesized frontal face image. The first part of the coding loss is thus the Manhattan distance between the corrected code and the frontal code.
The second goal of the coding loss is to separate the corrected codes of different people. A fully connected layer C is built after the corrected code En(I_a) + R(En(I_a)); the number of neurons of C equals the number of people M in the training set, and the second part of the coding loss adopts a cross-entropy loss:
L_code2 = − Σ_{i=1..M} y_i log( softmax(C(En(I_a) + R(En(I_a))))_i )

where M is the number of people in the training set; y_i is the value of the i-th dimension of a one-hot vector y of dimension M that indicates which person in the training set the input face image I_a belongs to (if I_a belongs to the j-th person, the j-th dimension of y is 1 and the remaining dimensions are 0); En(I_a) + R(En(I_a)) is the corrected code; and C(En(I_a) + R(En(I_a)))_i is the value of the i-th dimension of the feature vector obtained by passing the corrected code through the fully connected layer C.
Thus, the complete coding loss function is:

L_code = L_code1 + λ · L_code2

where λ is a weighting constant with value 0.1.
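The two parts of the coding loss can be sketched together as below; the softmax normalization before the cross-entropy is an assumption (the text only states that a fully connected layer C followed by a cross-entropy loss is used).

```python
import numpy as np

def coding_loss(initial_code, residual, frontal_code, logits, y_onehot, lam=0.1):
    """L_code = L_code1 + lam * L_code2.

    L_code1: Manhattan distance between the corrected code
             (initial_code + residual) and the frontal code X3.
    L_code2: cross-entropy over the output of the fully connected layer C
             (a softmax normalization is assumed here).
    """
    corrected = initial_code + residual
    l_code1 = np.abs(corrected - frontal_code).sum()
    probs = np.exp(logits - logits.max())      # numerically stable softmax
    probs = probs / probs.sum()
    l_code2 = -np.sum(y_onehot * np.log(probs + 1e-12))
    return l_code1 + lam * l_code2
```

When the corrected code matches the frontal code and layer C already assigns near-certain probability to the correct person, the loss is close to zero.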
In summary, the loss function of the training generator is:
Ltotal=Lpixel+λ1Lsym+λ2Lid+λ3Ladv+λ4Lcode
wherein L ispixel、Lsym、Lid、LadvAnd LcodeRespectively representing a pixel loss function, a symmetric loss function, an identity loss function, an opponent loss function and an encoding loss function. Lambda [ alpha ]1、λ2、λ3And λ4Representing the weights of the different loss functions.
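A minimal sketch of this weighted sum; the default weights below are the values the embodiment reports (0.2, 0.003, 0.001, 0.002):

```python
def total_generator_loss(l_pixel, l_sym, l_id, l_adv, l_code,
                         w_sym=0.2, w_id=0.003, w_adv=0.001, w_code=0.002):
    """L_total = L_pixel + l1*L_sym + l2*L_id + l3*L_adv + l4*L_code,
    with the embodiment's reported weights as defaults."""
    return l_pixel + w_sym * l_sym + w_id * l_id + w_adv * l_adv + w_code * l_code
```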
Based on the parameter settings of similar methods and extensive experimental experience, the loss weights λ1, λ2, λ3, and λ4 are set to 0.2, 0.003, 0.001, and 0.002, respectively. The discrimination network D must be trained at the same time as the generator; the goal of training D is to let it distinguish whether an input frontal face image comes from the generator G or from the original data set, with the loss:

L_adv2 = − (1 / N) Σ_{n=1..N} [ log D(G(I_a^(n))) + log(1 − D(I_g^(n))) ]

where N is the size of the current training batch, G and D are the generator and the discrimination network, I_a^(n) is an input image, and G(I_a^(n)) and I_g^(n) are the synthesized and non-synthesized frontal face images. D(G(I_a^(n))) and D(I_g^(n)) reflect the probability, as judged by D, that each input is a synthesized frontal face image. Minimizing L_adv2 lets the discrimination network D accurately reflect the probability that an input picture is synthesized.
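Under the reading that D outputs the probability that its input is synthesized, the generator and discriminator adversarial objectives can be sketched as below; the sign conventions here are an interpretation of the text, not a verbatim reproduction of the patent's formulas.

```python
import numpy as np

EPS = 1e-12  # numerical floor inside the logarithms

def generator_adv_loss(d_on_fake):
    """L_adv: the generator minimizes this so that D's 'synthesized'
    probability for G(I_a) is driven toward zero."""
    return np.mean(np.log(d_on_fake + EPS))

def discriminator_adv_loss(d_on_fake, d_on_real):
    """L_adv2: the discriminator minimizes this so that it scores
    synthesized images near 1 and non-synthesized frontal images near 0."""
    return -np.mean(np.log(d_on_fake + EPS) + np.log(1.0 - d_on_real + EPS))
```

A confident, correct discriminator (scores near 1 on synthesized images, near 0 on real ones) drives L_adv2 toward zero, while the generator's loss drops as it fools D.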
Training the generator G and the discrimination network D alternately enables the two to optimize and improve each other through their competition. In the initial stage, the face images generated by the generator G are blurred and the discrimination network D can easily determine the source of an input image; this drives the generator G to produce sharper images, improving its quality. In later stages, the images generated by the generator G become sharper and close to the original image data; this in turn drives the discrimination network to judge input images more accurately, improving the discriminative ability of D.
After the loss functions of the generator G and the discrimination network D are designed, the losses of the generative adversarial network are minimized with the Adam optimizer; the learning rate is set to 0.0002 and the batch size to 12. After each training step of the generator G, the discrimination network D is trained once. As the number of training iterations grows, the quality of the images produced by the generator keeps improving and the ability of the discrimination network to judge input images keeps strengthening, until training is finally complete. The deep learning framework used in the experiments is TensorFlow, the GPU is an NVIDIA GTX 1080 Ti, and training is stopped after 20,000 batches.
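The alternating schedule just described can be sketched as follows. This is an assumed skeleton, not the patent's code: the two step methods are hypothetical stubs standing in for the actual TensorFlow update ops, which the text does not list.

```python
# Sketch of the alternating GAN training schedule described above.

LEARNING_RATE = 0.0002  # Adam learning rate from the text
BATCH_SIZE = 12         # batch size from the text
TOTAL_BATCHES = 20000   # training stops after 20,000 batches

class GanTrainer:
    """Placeholder trainer: counts updates so the schedule itself is checkable."""
    def __init__(self):
        self.g_updates = 0
        self.d_updates = 0

    def train_generator_step(self, batch):
        # would minimize L_total = L_pixel + λ1·L_sym + λ2·L_id + λ3·L_adv + λ4·L_code
        self.g_updates += 1

    def train_discriminator_step(self, batch):
        # would minimize L_adv2 on real (I_g) and synthesized (G(I_a)) images
        self.d_updates += 1

def train(trainer, num_batches):
    for _ in range(num_batches):
        batch = None  # placeholder for a batch of BATCH_SIZE face images
        trainer.train_generator_step(batch)      # train G first ...
        trainer.train_discriminator_step(batch)  # ... then train D once
    return trainer
```

Running `train(GanTrainer(), TOTAL_BATCHES)` performs exactly one D update per G update, the one-to-one alternation the text specifies.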
5) In the testing stage, for an input face image I_a of any pose, the trained generator G can synthesize a frontal face image I_f of the same face, and the effect of the invention can be verified by directly observing the synthesized frontal face image I_f. The generated results are shown in Fig. 6: in each row of pictures, the first is an input face image with a deflection angle exceeding 45°, the second is the frontal face image synthesized by the generator, and the third is a non-synthesized frontal face image of the same person from the data set. As the figure shows, for face images with deflection angles exceeding 45°, the invention can synthesize their frontal face images while preserving the identity information of the original face.
The above-described embodiments are merely preferred embodiments of the present invention, and the scope of the present invention is not limited thereto; any change made according to the shape and principle of the present invention shall fall within the protection scope of the present invention.
Claims (6)
1. A multi-pose face image obverse method based on a generation countermeasure network is characterized by comprising the following steps:
1) collecting face images of various poses as a training set and a test set, wherein it must be ensured that for each input face image I_a of any pose, a non-synthesized frontal face image I_g of the same person can be found in the data set;
2) in the training stage, inputting a face image I_a of any pose in the training set into the generator G to obtain a corrected code X_2 and a synthesized frontal face image I_f, and inputting a non-synthesized frontal face image I_g into the generator G to obtain a frontal code X_3;
3) inputting the synthesized frontal face image I_f or the non-synthesized frontal face image I_g into a discrimination network D, which discriminates whether the input face image is synthesized, and inputting the synthesized frontal face image I_f or the non-synthesized frontal face image I_g into a face identity feature extractor F, which extracts the identity features of the face image;
4) substituting the discrimination result of step 3), the extracted identity features, the synthesized frontal face image I_f, the non-synthesized frontal face image I_g, the corrected code X_2 and the frontal code X_3 into the pre-designed loss functions, and training the generator G and the discrimination network D alternately until training is finished;
5) in the testing stage, inputting a face image I_a of any pose into the trained generator G to obtain a synthesized frontal face image I_f, and verifying the effect by directly observing the synthesized frontal image I_f.
2. The multi-pose face image obverse method based on a generation countermeasure network as claimed in claim 1, wherein: in step 1), all the face images of various poses used by the data set come from the Multi_Pie data set; the data set contains more than 750,000 pictures, covering images of 337 individuals under 20 illumination conditions and 15 poses; the illumination of the pictures changes from dark to bright as the illumination number goes from 01 to 20, where illumination number 07 is the standard illumination condition; all face images of the data set are marked as I_a, and for each image I_a, the image of the same person with a face deflection angle of 0° and illumination number 07 is found in the data set and marked as I_g.
3. The multi-pose face image obverse method based on a generation countermeasure network as claimed in claim 1, wherein: in step 2), the generator G consists of a pose estimator P, an encoder En, a coding residual network R and a decoder De; the pose estimator P estimates the pose of the face with the PnP algorithm to obtain the deflection angle of the face in the yaw direction, and is implemented with the function cv2.solvePnP() of the open-source OpenCV library; the encoder En is a convolutional neural network, the coding residual network R is a two-layer fully connected neural network, and the decoder De is a deconvolutional neural network;
the process by which the generator G synthesizes a picture is as follows: a face image I_a of any pose is input into the generator G; the encoder En converts it into the initial code X_1; the coding residual network R estimates a coding residual R(X_1) from the initial code X_1; the pose estimator P calculates the exact deflection angle γ of the face in the yaw direction in the input image; the deflection angle γ is input into the function Y to obtain the weight Y(γ) of the coding residual; the initial code X_1 is fused with the coding residual to obtain the corrected code X_2, where X_2 = X_1 + Y(γ)·R(X_1); and the corrected code X_2 is input into the decoder De, which generates the frontal face image I_f by deconvolution.
4. The multi-pose face image obverse method based on a generation countermeasure network as claimed in claim 1, wherein: in step 3), the discrimination network D is a classifier based on a convolutional neural network, used to determine whether an input image comes from the generator G or from the original image data; the face identity feature extractor F adopts the open-source Light-CNN-29, a lightweight convolutional neural network with a depth of 29 layers and 12 million parameters.
5. The multi-pose face image obverse method based on a generation countermeasure network as claimed in claim 1, wherein: in step 4), the objective of the loss functions is to minimize the difference between the synthesized frontal face image I_f and the non-synthesized frontal face image I_g, so that the synthesized frontal face image I_f retains more identity information of the input face image; besides the pixel loss function, identity loss function, symmetry loss function and adversarial loss function commonly used in methods of the same type, the loss functions used in step 4) include a self-created coding loss function; the first objective of the coding loss function is to make the codes of the input face image I_a and of the non-synthesized frontal face image I_g obtained by the encoder En more similar, because the present face pose correction method performs better when the deflection angle of the input face is smaller, so the closer the code of the input face image is to the code of the same face at a 0° deflection angle, the better the synthesized frontal face; the formula of the first part of the coding loss function is as follows:
where N is the dimension of each code; En is the encoder, and En(I_a)_i is the value of the i-th dimension of the initial code En(I_a); R is the coding residual network, and R(En(I_a))_i is the value of the i-th dimension of the coding residual R(En(I_a)); En(I_a)_i + R(En(I_a))_i equals the value of the i-th dimension of the corrected code; (X_3)_i is the value of the i-th dimension of the frontal code X_3 obtained by the encoder from the non-synthesized frontal face image; the first part of the coding loss function is the Manhattan distance between the corrected code and the frontal code;
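The displayed formula is an image in the source; from the term-by-term description above, the first part of the coding loss would be the L1 (Manhattan) distance between the corrected code and the frontal code, reconstructed here as:

```latex
L_{code1} = \sum_{i=1}^{N}\left|\,En(I_a)_i + R(En(I_a))_i - (X_3)_i\,\right|
```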
the second objective of the coding loss function is to make the corrected codes En(I_a) + R(En(I_a)) of different persons distinguishable; a fully connected layer C is constructed after the corrected code, the number of neurons of the fully connected layer C equals the number M of people in the training set, and the second part of the coding loss function adopts a cross-entropy loss function:
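The cross-entropy formula is likewise an image in the source; assuming a softmax over the outputs of the fully connected layer C (a standard choice the text does not state explicitly), the second part of the coding loss would read:

```latex
L_{code2} = -\sum_{i=1}^{M} y_i \,\log\!\Big(\mathrm{softmax}\big(C\big(En(I_a)+R(En(I_a))\big)\big)_i\Big)
```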
wherein M is the number of people in the training set; y_i is the value of the i-th dimension of the one-hot vector y, which indicates which person in the training set the input face image I_a belongs to: if image I_a belongs to the j-th person, the j-th dimension of the vector y is 1, the remaining dimensions are 0, and the dimension of y is M; En(I_a) + R(En(I_a)) is the corrected code; C(En(I_a) + R(En(I_a)))_i is the value of the i-th dimension of the feature vector obtained by passing the corrected code through the fully connected layer C;
thus, the complete coding loss function is:
L_code = L_code1 + λ·L_code2
where λ is a weight constant with a value of 0.1.
6. The multi-pose face image obverse method based on a generation countermeasure network as claimed in claim 1, wherein: in step 4), the generator G and the discrimination network D are trained alternately so that the two optimize and improve each other through their competition; in the initial stage, the face images generated by the generator G are blurred and the discrimination network D can easily determine the source of an input image, which drives the generator G to produce sharper images and improves its quality; in later stages, the images generated by the generator G become sharper and close to the original image data, which in turn drives the discrimination network to judge input images more accurately and improves the discriminative ability of D.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910806159.5A CN110543846B (en) | 2019-08-29 | 2019-08-29 | Multi-pose face image obverse method based on generation countermeasure network |
Publications (2)
Publication Number | Publication Date |
---|---|
CN110543846A CN110543846A (en) | 2019-12-06 |
CN110543846B true CN110543846B (en) | 2021-12-17 |
Family
ID=68710718
Families Citing this family (13)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111144240B (en) * | 2019-12-12 | 2023-02-07 | 深圳数联天下智能科技有限公司 | Image processing method and related equipment |
CN111428667A (en) * | 2020-03-31 | 2020-07-17 | 天津中科智能识别产业技术研究院有限公司 | Human face image correcting method for generating confrontation network based on decoupling expression learning |
CN111931484B (en) * | 2020-07-31 | 2022-02-25 | 贵州多彩宝互联网服务有限公司 | Data transmission method based on big data |
CN111856962A (en) * | 2020-08-13 | 2020-10-30 | 郑州智利信信息技术有限公司 | Intelligent home control system based on cloud computing |
CN111985995A (en) * | 2020-08-14 | 2020-11-24 | 足购科技(杭州)有限公司 | WeChat applet-based shoe virtual fitting method and device |
US11810397B2 (en) | 2020-08-18 | 2023-11-07 | Samsung Electronics Co., Ltd. | Method and apparatus with facial image generating |
CN112164002B (en) * | 2020-09-10 | 2024-02-09 | 深圳前海微众银行股份有限公司 | Training method and device of face correction model, electronic equipment and storage medium |
CN114981835A (en) * | 2020-10-29 | 2022-08-30 | 京东方科技集团股份有限公司 | Training method and device of face reconstruction model, face reconstruction method and device, electronic equipment and readable storage medium |
CN112818850B (en) * | 2021-02-01 | 2023-02-10 | 华南理工大学 | Cross-posture face recognition method and system based on progressive neural network and attention mechanism |
CN113140015B (en) * | 2021-04-13 | 2023-03-14 | 杭州欣禾圣世科技有限公司 | Multi-view face synthesis method and system based on generation countermeasure network |
CN113361489B (en) * | 2021-07-09 | 2022-09-16 | 重庆理工大学 | Decoupling representation-based face orthogonalization model construction method and training method |
CN113469269A (en) * | 2021-07-16 | 2021-10-01 | 上海电力大学 | Residual convolution self-coding wind-solar-charged scene generation method based on multi-channel fusion |
CN114049250B (en) * | 2022-01-13 | 2022-04-12 | 广州卓腾科技有限公司 | Method, device and medium for correcting face pose of certificate photo |
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107239766A (en) * | 2017-06-08 | 2017-10-10 | 深圳市唯特视科技有限公司 | A kind of utilization resists network and the significantly face of three-dimensional configuration model ajusts method |
CN109815928A (en) * | 2019-01-31 | 2019-05-28 | 中国电子进出口有限公司 | A kind of face image synthesis method and apparatus based on confrontation study |
Family Cites Families (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US10474880B2 (en) * | 2017-03-15 | 2019-11-12 | Nec Corporation | Face recognition using larger pose face frontalization |
Non-Patent Citations (3)
Title |
---|
Multi-poses Face Frontalization based on Pose Weighted GAN;Jiaxin Ma et al.;《2019 IEEE 3rd Information Technology, Networking, Electronic and Automation Control Conference (ITNEC)》;20190606;1271-1276 * |
Pose-Weighted Gan for Photorealistic Face Frontalization;Sufang Zhang et al.;《2019 IEEE International Conference on Image Processing (ICIP)》;20190826;2384-2388 * |
Face Frontalization Generation Based on Generative Adversarial Learning; Qian Yichen; China Master's Theses Full-text Database (Information Science and Technology); 20190815; I138-1148 *
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
CF01 | Termination of patent right due to non-payment of annual fee |
Granted publication date: 20211217 |