CN115713680B - Semantic guidance-based face image identity synthesis method - Google Patents

Semantic guidance-based face image identity synthesis method

Info

Publication number
CN115713680B
Authority
CN
China
Prior art keywords
layer
feature
input
attribute
features
Prior art date
Legal status
Active
Application number
CN202211451581.1A
Other languages
Chinese (zh)
Other versions
CN115713680A (en)
Inventor
刘瑞霞
李子安
舒明雷
陈长芳
单珂
Current Assignee
Qilu University of Technology
Shandong Institute of Artificial Intelligence
Original Assignee
Qilu University of Technology
Shandong Institute of Artificial Intelligence
Priority date
Filing date
Publication date
Application filed by Qilu University of Technology and Shandong Institute of Artificial Intelligence
Priority to CN202211451581.1A
Publication of CN115713680A
Application granted
Publication of CN115713680B


Classifications

    • Y — General tagging of new technological developments; general tagging of cross-sectional technologies spanning over several sections of the IPC; technical subjects covered by former USPC cross-reference art collections [XRACs] and digests
    • Y02 — Technologies or applications for mitigation or adaptation against climate change
    • Y02T — Climate change mitigation technologies related to transportation
    • Y02T 10/00 — Road transport of goods or passengers
    • Y02T 10/10 — Internal combustion engine [ICE] based vehicles
    • Y02T 10/40 — Engine management systems

Abstract

A semantic guidance-based face image identity synthesis method extracts identity information, attribute information and background information from each image, fuses this information by feature fusion, and finally generates the result image from the fused information. The method introduces feature key points that guide changes in face shape. Meanwhile, the background information added during training keeps the quality of the generated face image stable while the face shape changes.

Description

Semantic guidance-based face image identity synthesis method
Technical Field
The invention relates to the field of image-level deep forgery (deepfake) technology, and in particular to a semantic guidance-based face image identity synthesis method.
Background
In recent years, breakthroughs in machine learning and graphics have greatly advanced the deep forgery field, and face identity synthesis, one of its sub-directions, has developed rapidly, so that more and more forged images and videos appear on the network. Specifically, face identity synthesis transfers the identity information of a source face onto a target face while leaving the attribute information of the target face in the image (such as background, pose and illumination) intact. Face identity synthesis is now widely applied in information protection, the film industry, virtual entertainment and other fields; in the film industry, advanced equipment is used to reconstruct an actor's face model and the lighting conditions of a scene, so that a realistic effect can be obtained. Compared with directions such as attribute editing and image restoration in the deep forgery field, face identity synthesis is more open and involves more innovations in generative models.
Traditional research on face identity synthesis is mainly based on image editing and falls into two categories: face image parsing and fusion, and 3D face modeling. The first category requires manually parsing the face region and fusing faces by rendering, deformation and the like, which is inefficient and consumes much time and effort. The second category requires acquiring a 3D model of the face image and generating the image with deep learning techniques, which can cause loss of illumination and background. In addition, these generation methods pay little attention to the structure of the face, so the generated face images suffer from face shape problems.
Disclosure of Invention
To overcome the above shortcomings, the invention provides a face image identity synthesis method that first uses feature key points for semantic guidance of face shape changes, then extracts identity information, attribute information and background information from the images, fuses this information by feature fusion, and finally generates the image from the fused information.
The technical scheme adopted for overcoming the technical problems is as follows:
a face image identity synthesis method based on semantic guidance comprises the following steps:
a) Extract key points from all face images in the CelebA face image data set;
b) Establish a PET key point adjustment network, input the face image key points into it to obtain the feature key points lm_fake, and iterate on lm_fake to obtain the optimized feature key points lm_fake;
c) Establish a face image feature extraction network, input the source image Pic_s and the target image Pic_t from the CelebA data set into it, and output the identity features F_id and the attribute features F_attr respectively;
d) Establish a background feature extraction network, input the target image Pic_t into it, and obtain the background feature information F_bg;
e) Establish a generation network, input the identity features F_id, the attribute features F_attr, the background feature information F_bg and the optimized feature key points lm_fake into it to obtain the face image Pic_fake, and iterate on Pic_fake to obtain the optimized face image Pic_fake;
f) Repeat steps b) to e) to obtain a realistic face image Pic_fake with a changed face shape; an end-to-end sketch of this flow is given below.
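As a quick orientation, the following is a minimal sketch of this flow in PyTorch-style Python. The callables h3r, pet_net, feat_net, bg_net and gen_net are hypothetical stand-ins for the H3R detector and the networks detailed in steps b) to e); they are not names used by the invention.

```python
import torch

def synthesize(pic_s: torch.Tensor, pic_t: torch.Tensor,
               h3r, pet_net, feat_net, bg_net, gen_net) -> torch.Tensor:
    """End-to-end flow of steps a)-e) for one source/target image pair."""
    lm_s, lm_t = h3r(pic_s), h3r(pic_t)          # a) key point extraction
    lm_fake = pet_net(lm_s, lm_t)                # b) adjusted feature key points
    f_id, f_attr = feat_net(pic_s, pic_t)        # c) identity / attribute features
    f_bg = bg_net(pic_t)                         # d) background features
    return gen_net(f_id, f_attr, f_bg, lm_fake)  # e) generated face image Pic_fake
```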
Further, step a) comprises the steps of:
a-1) Detect key points in all face images of the CelebA face image data set with the face key point detection algorithm H3R; the key points extracted from the source image Pic_s of the CelebA data set are denoted as the source key points lm_s, and the key points extracted from the target image Pic_t of the CelebA data set are denoted as the target key points lm_t.
Further, step b) comprises the steps of:
b-1) A PET key point adjustment network is established, consisting of the source encoder E_lms, the target encoder E_lmt, the key point generator G_lm, the similarity discriminator D_S and the true/false discriminator D_TF;
b-2) The source encoder E_lms consists of first to fifth downsampling convolution layers. The source key points lm_s are input into the first downsampling convolution layer of E_lms, which outputs the feature information F_lms^1; F_lms^1 is input into the second downsampling convolution layer, which outputs F_lms^2; F_lms^2 is input into the third downsampling convolution layer, which outputs F_lms^3; F_lms^3 is input into the fourth downsampling convolution layer, which outputs F_lms^4; F_lms^4 is input into the fifth downsampling convolution layer, which outputs F_lms^5.
b-3) The target encoder E_lmt consists of first to fifth fully connected layers. The target key points lm_t are input into the first fully connected layer of E_lmt, which outputs the feature information F_lmt^1; F_lmt^1 is input into the second fully connected layer, which outputs F_lmt^2; F_lmt^2 is input into the third fully connected layer, which outputs F_lmt^3; F_lmt^3 is input into the fourth fully connected layer, which outputs F_lmt^4; F_lmt^4 is input into the fifth fully connected layer, which outputs F_lmt^5.
b-4) The torch.cat() function stacks F_lms^5 and F_lmt^5 into the feature vector F_lm.
b-5) The key point generator G_lm consists of first to fifth upsampling convolution layers. The feature vector F_lm is input into the first upsampling convolution layer of G_lm, which outputs the feature key points lm_fake^1; lm_fake^1 is input into the second upsampling convolution layer, which outputs lm_fake^2; lm_fake^2 is input into the third upsampling convolution layer, which outputs lm_fake^3; lm_fake^3 is input into the fourth upsampling convolution layer, which outputs lm_fake^4; lm_fake^4 is input into the fifth upsampling convolution layer, which outputs the feature key points lm_fake;
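A minimal sketch of the two encoders and the key point generator of steps b-2) to b-5), assuming 106 two-dimensional key points (matching the 1 × 212 dimension of lm_fake stated in Example 2 below). All channel widths are assumptions, since the text only fixes kernel size 1, stride 1 and padding 0; the Linear layers in the generator stand in for the 1 × 1 "upsampling" convolutions, to which they are equivalent at kernel size 1, and the mean-pooling before concatenation is also an assumption.

```python
import torch
import torch.nn as nn

class SourceEncoder(nn.Module):
    """E_lms: five 1x1 'downsampling' convolutions (kernel 1, stride 1, padding 0)."""
    def __init__(self, chs=(2, 32, 64, 128, 256, 512)):
        super().__init__()
        self.layers = nn.ModuleList(
            nn.Conv1d(chs[i], chs[i + 1], kernel_size=1) for i in range(5))

    def forward(self, lm_s):                  # lm_s: (b, 2, 106) point coordinates
        f = lm_s
        for layer in self.layers:             # F_lms^1 ... F_lms^5
            f = torch.relu(layer(f))
        return f.mean(dim=2)                  # pooled to (b, 512) for concatenation

class TargetEncoder(nn.Module):
    """E_lmt: five fully connected layers."""
    def __init__(self, dims=(212, 256, 256, 256, 256, 512)):
        super().__init__()
        self.layers = nn.ModuleList(
            nn.Linear(dims[i], dims[i + 1]) for i in range(5))

    def forward(self, lm_t):                  # lm_t: (b, 212) flattened key points
        f = lm_t
        for layer in self.layers:             # F_lmt^1 ... F_lmt^5
            f = torch.relu(layer(f))
        return f

class KeypointGenerator(nn.Module):
    """G_lm: five layers mapping the stacked F_lm back to 212 key point values."""
    def __init__(self, dims=(1024, 512, 512, 384, 256, 212)):
        super().__init__()
        self.layers = nn.ModuleList(
            nn.Linear(dims[i], dims[i + 1]) for i in range(5))

    def forward(self, f_lm):
        for i, layer in enumerate(self.layers):
            f_lm = layer(f_lm) if i == 4 else torch.relu(layer(f_lm))
        return f_lm                           # lm_fake: (b, 212)

# Step b-4): stack the two deepest features into F_lm, then generate lm_fake.
# src, tgt, gen = SourceEncoder(), TargetEncoder(), KeypointGenerator()
# f_lm = torch.cat([src(lm_s), tgt(lm_t)], dim=1)   # (b, 1024)
# lm_fake = gen(f_lm)
```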
b-6) The similarity discriminator D_S consists of a Layer_s module, a Layer_fake module and a Layer_c module. The Layer_fake module consists of first to fourth fully connected layers: the feature key points lm_fake are input into the first fully connected layer of the Layer_fake module, which outputs the feature information F_fake^1; F_fake^1 is input into the second fully connected layer, which outputs F_fake^2; F_fake^2 is input into the third fully connected layer, which outputs F_fake^3; F_fake^3 is input into the fourth fully connected layer, which outputs F_fake^4. The Layer_s module consists of first to fourth fully connected layers: the source key points lm_s are input into the first fully connected layer of the Layer_s module, which outputs the feature information F_s^1; F_s^1 is input into the second fully connected layer, which outputs F_s^2; F_s^2 is input into the third fully connected layer, which outputs F_s^3; F_s^3 is input into the fourth fully connected layer, which outputs F_s^4. The torch.cat() function stacks F_fake^4 and F_s^4 into the feature vector F_c. The Layer_c module consists of first to fourth fully connected layers: the feature vector F_c is input into the first fully connected layer of the Layer_c module, which outputs the similarity feature Fscore1; Fscore1 is input into the second fully connected layer, which outputs the similarity feature Fscore2; Fscore2 is input into the third fully connected layer, which outputs the similarity feature Fscore3; Fscore3 is input into the fourth fully connected layer, which outputs the similarity score;
b-7) The true/false discriminator D_TF consists of first to sixth fully connected layers. The feature key points lm_fake are input into the first fully connected layer of D_TF, which outputs the feature F_TF^1; F_TF^1 is input into the second fully connected layer, which outputs F_TF^2; F_TF^2 is input into the third fully connected layer, which outputs F_TF^3; F_TF^3 is input into the fourth fully connected layer, which outputs F_TF^4; F_TF^4 is input into the fifth fully connected layer, which outputs F_TF^5; F_TF^5 is input into the sixth fully connected layer, which outputs the 1-channel value d_TF;
b-8) The point-by-point loss is calculated by the formula loss_L1 = ||lm_fake − lm_s||_2, where ||·||_2 is the mean squared error; the reconstruction loss is calculated by the formula loss_Cycle = ||lm_fake − lm_t||_2; the true/false loss loss_DTF is calculated from the output d_TF of the true/false discriminator, and the similarity loss loss_DS is calculated from the similarity score of the similarity discriminator. The feature key points lm_fake are iteratively optimized through back propagation using the point-by-point loss loss_L1, the reconstruction loss loss_Cycle, the true/false loss loss_DTF and the similarity loss loss_DS.
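A sketch of the loss computation in step b-8). loss_L1 and loss_Cycle follow the stated formulas with ||·||_2 as the mean squared error; because the exact adversarial and similarity expressions are not spelled out in this text, a standard binary cross-entropy GAN form is used for loss_DTF and loss_DS as an assumption.

```python
import torch
import torch.nn.functional as F

def pet_keypoint_loss(lm_fake, lm_s, lm_t, d_tf, sim_score):
    loss_l1 = F.mse_loss(lm_fake, lm_s)        # point-by-point loss loss_L1
    loss_cycle = F.mse_loss(lm_fake, lm_t)     # reconstruction loss loss_Cycle
    # Generator-side adversarial terms: push both discriminators toward "real"
    # (assumed form; the patent does not reproduce these formulas here).
    loss_dtf = F.binary_cross_entropy_with_logits(d_tf, torch.ones_like(d_tf))
    loss_ds = F.binary_cross_entropy_with_logits(sim_score,
                                                 torch.ones_like(sim_score))
    return loss_l1 + loss_cycle + loss_dtf + loss_ds
```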
In the step b-2), the convolution kernels of the first to fifth downsampling convolution layers are all 1, the step sizes are all 1, and the filling is all 0; in the step b-5), the convolution kernels of the first to fifth upsampling convolution layers are all 1, the step sizes are all 1, and the filling is all 0.
Further, step c) comprises the steps of:
c-1) A face image feature extraction network is established, consisting of the identity encoder E_id and the attribute encoder E_attr;
c-2) The identity encoder E_id consists of the Arcface algorithm. The source image Pic_s is input into E_id and resized to 112 × 112 resolution through the interpolate() function; the 112 × 112 image is input into the Arcface algorithm, which outputs the identity vector z_id of shape b × c × h × w, where b is the training batch, c is the number of channels, h is the image height and w is the image width; z_id is input into the filling layer and the regularization layer in sequence, and the identity feature F_id is output;
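A sketch of step c-2), where arcface stands for a pretrained ArcFace embedding network (the patent names only the algorithm); the no-op padding call marks the filling layer, whose parameters are not given here, and L2 normalization is used for the regularization layer as an assumption.

```python
import torch
import torch.nn.functional as F

def extract_identity(pic_s: torch.Tensor, arcface) -> torch.Tensor:
    x = F.interpolate(pic_s, size=(112, 112),
                      mode='bilinear', align_corners=False)  # resize to 112 x 112
    z_id = arcface(x)                      # identity vector, e.g. (b, 512)
    z_id = F.pad(z_id, (0, 0))             # placeholder for the filling layer
    return F.normalize(z_id, dim=1)        # regularization layer -> F_id
```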
c-3) The attribute encoder E_attr consists of first to fifth downsampling residual blocks and first and second bottleneck residual blocks. The first to fifth downsampling residual blocks each consist, in sequence, of a first normalization layer, a second normalization layer, a first ReLU activation layer, a second ReLU activation layer, a first convolution layer, a second convolution layer, a downsampling layer and a residual connection layer; the first and second bottleneck residual blocks each consist, in sequence, of a first normalization layer, a second normalization layer, a first ReLU activation layer, a second ReLU activation layer, a first convolution layer, a second convolution layer and a residual connection layer. The target image Pic_t is input into the first downsampling residual block of E_attr, which outputs the attribute features F_attr^1; F_attr^1 is input into the second downsampling residual block, which outputs F_attr^2; F_attr^2 is input into the third downsampling residual block, which outputs F_attr^3; F_attr^3 is input into the fourth downsampling residual block, which outputs F_attr^4; F_attr^4 is input into the fifth downsampling residual block, which outputs F_attr^5; F_attr^5 is input into the first bottleneck residual block, which outputs F_attr^6; F_attr^6 is input into the second bottleneck residual block, which outputs the attribute features F_attr.
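A sketch of one downsampling residual block of the attribute encoder E_attr, wiring the components listed in step c-3) (BatchNorm2d, ReLU, two 3 × 3 convolutions with stride and padding 1, a downsampling layer and a residual connection) in the usual pre-activation order; the wiring order, the average-pool downsampling and the 1 × 1 skip convolution are assumptions.

```python
import torch
import torch.nn as nn

class DownResBlock(nn.Module):
    """One downsampling residual block of E_attr (step c-3), assumed wiring)."""
    def __init__(self, c_in, c_out):
        super().__init__()
        self.norm1, self.norm2 = nn.BatchNorm2d(c_in), nn.BatchNorm2d(c_out)
        self.conv1 = nn.Conv2d(c_in, c_out, 3, stride=1, padding=1)
        self.conv2 = nn.Conv2d(c_out, c_out, 3, stride=1, padding=1)
        self.down = nn.AvgPool2d(2)            # downsampling layer
        self.skip = nn.Conv2d(c_in, c_out, 1)  # match channels on the residual path

    def forward(self, x):
        h = self.conv1(torch.relu(self.norm1(x)))
        h = self.conv2(torch.relu(self.norm2(h)))
        return self.down(h) + self.down(self.skip(x))  # residual connection
```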
In the step c-3), the first and second normalization layers in the first to fifth downsampling residual blocks are all BatchNorm2d layers; the convolution kernels of the first and second convolution layers in the first to fifth downsampling residual blocks are all 3, and the filling and step sizes are all 1.
Further, step d) comprises the steps of:
d-1) A background feature extraction network is established, consisting of a face parsing module and the background information encoder E_bg;
d-2) The face parsing module consists of the face parsing algorithm BiSeNet. The target image Pic_t is input into the face parsing module and parsed into the individual face parts, and each face part is filled with color to obtain the image Pic_bg in which only the background region is retained;
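A sketch of the background-only image construction in step d-2); bisenet stands for a pretrained BiSeNet face-parsing network returning per-pixel class logits, and treating label 0 as background follows the common CelebAMask-HQ convention (both are assumptions).

```python
import torch

def mask_background(pic_t: torch.Tensor, bisenet, fill: float = 0.0):
    parsing = bisenet(pic_t).argmax(dim=1, keepdim=True)  # (b, 1, h, w) labels
    face = (parsing != 0).float()                         # all face-part regions
    pic_bg = pic_t * (1 - face) + fill * face             # fill faces, keep background
    return pic_bg
```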
d-3) The background information encoder E_bg consists of first to fifth self-attention modules, each consisting in sequence of a downsampling convolution layer, a self-attention layer and a ReLU activation layer. The image Pic_bg is input into the first self-attention module of E_bg, which outputs the background features F_bg^1; F_bg^1 is input into the second self-attention module, which outputs F_bg^2; F_bg^2 is input into the third self-attention module, which outputs F_bg^3; F_bg^3 is input into the fourth self-attention module, which outputs F_bg^4; F_bg^4 is input into the fifth self-attention module, which outputs the background features F_bg.
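A sketch of one stage of the background information encoder E_bg in step d-3); the SAGAN-style self-attention layer and the stride-2 downsampling are assumptions (the text fixes only the kernel size 3).

```python
import torch
import torch.nn as nn

class SelfAttention2d(nn.Module):
    """SAGAN-style self-attention (assumed form of the 'self-attention layer')."""
    def __init__(self, c):
        super().__init__()
        self.q = nn.Conv2d(c, c // 8, 1)
        self.k = nn.Conv2d(c, c // 8, 1)
        self.v = nn.Conv2d(c, c, 1)
        self.gamma = nn.Parameter(torch.zeros(1))

    def forward(self, x):
        b, c, h, w = x.shape
        q = self.q(x).flatten(2).transpose(1, 2)   # (b, hw, c/8)
        k = self.k(x).flatten(2)                   # (b, c/8, hw)
        attn = torch.softmax(q @ k, dim=-1)        # (b, hw, hw)
        v = self.v(x).flatten(2)                   # (b, c, hw)
        out = (v @ attn.transpose(1, 2)).view(b, c, h, w)
        return self.gamma * out + x

class BgAttentionModule(nn.Module):
    """One E_bg stage: downsampling conv -> self-attention -> ReLU."""
    def __init__(self, c_in, c_out):
        super().__init__()
        self.conv = nn.Conv2d(c_in, c_out, 3, stride=2, padding=1)
        self.attn = SelfAttention2d(c_out)

    def forward(self, x):
        return torch.relu(self.attn(self.conv(x)))
```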
In the step d-3), the convolution kernels of the downsampling convolution layers of the first to fifth self-attention modules are all 3, the step sizes are all 0, and the filling is all 0.
Further, step e) comprises the steps of:
e-1) establishing a generating network formed by a fusion module, an up-sampling module and a discriminator module;
e-2) The fusion module consists of first to sixth fusion blocks, each consisting in sequence of a first convolution layer, a first adaptive instance normalization layer, a ReLU activation layer, a second convolution layer and a second adaptive instance normalization layer. The attribute feature F_attr is input into the fusion module; the first convolution layer of the first fusion block gives the attribute feature F_attr^(1,1). The identity feature F_id and the attribute feature F_attr^(1,1) are input into the first adaptive instance normalization layer, and the fusion feature F_fuse^(1,1) is calculated by the formula
F_fuse^(1,1) = σ_id × (F_attr^(1,1) − μ(F_attr^(1,1))) / σ(F_attr^(1,1)) + μ_id,
where σ_id is the standard deviation of the identity feature F_id, μ_id is the channel mean of F_id, μ(·) is the channel mean operation, and σ(·) is the standard deviation operation. The fusion feature F_fuse^(1,1) is input into the ReLU activation layer to obtain the feature F_relu^1; F_relu^1 is input into the second convolution layer to give the attribute feature F_attr^(1,2); the identity feature F_id and F_attr^(1,2) are input into the second adaptive instance normalization layer, and the fusion feature F_fuse^1 is calculated by the same formula;
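A sketch of the adaptive instance normalization formula of step e-2), with μ(·) and σ(·) taken per channel; broadcasting the statistics of f_id over f_attr assumes matching channel counts between the two feature maps.

```python
import torch

def adain(f_attr: torch.Tensor, f_id: torch.Tensor, eps: float = 1e-5):
    """F = sigma(F_id) * (F_attr - mu(F_attr)) / sigma(F_attr) + mu(F_id)."""
    mu_a = f_attr.mean(dim=(2, 3), keepdim=True)        # channel mean of F_attr
    sd_a = f_attr.std(dim=(2, 3), keepdim=True) + eps   # channel std of F_attr
    mu_i = f_id.mean(dim=(2, 3), keepdim=True)          # channel mean of F_id
    sd_i = f_id.std(dim=(2, 3), keepdim=True) + eps     # channel std of F_id
    return sd_i * (f_attr - mu_a) / sd_a + mu_i
```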
e-3) The fusion feature F_fuse^1 is input into the second fusion block of the fusion module; its first convolution layer gives the attribute feature F_attr^(2,1); the identity feature F_id and F_attr^(2,1) are input into the first adaptive instance normalization layer, and the fusion feature F_fuse^(2,1) is calculated by the formula in step e-2); F_fuse^(2,1) passes through the ReLU activation layer and the second convolution layer to give the attribute feature F_attr^(2,2); F_id and F_attr^(2,2) are input into the second adaptive instance normalization layer, and the fusion feature F_fuse^2 is calculated by the same formula;
e-4) In the same way, the fusion feature F_fuse^2 is input into the third fusion block, which outputs the fusion feature F_fuse^3;
e-5) The fusion feature F_fuse^3 is input into the fourth fusion block, which outputs the fusion feature F_fuse^4;
e-6) The fusion feature F_fuse^4 is input into the fifth fusion block, which outputs the fusion feature F_fuse^5;
e-7) The fusion feature F_fuse^5 is input into the sixth fusion block, which outputs the fusion feature F_fuse^6;
e-8) The optimized feature key points lm_fake are input into two separate convolution layers to obtain the feature F_gamma and the feature F_beta, and the fusion vector F_fuse is calculated by the formula F_fuse = F_gamma ⊗ F_fuse^6 + F_beta, where ⊗ is element-wise multiplication and F_fuse^6 is the output of the sixth fusion block;
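A sketch of the key point modulation of step e-8); lm_fake is assumed to be rendered as a spatial map (e.g. a heat map) so that the two 1 × 1 convolutions can produce F_gamma and F_beta, and the affine form F_gamma ⊗ F_fuse^6 + F_beta follows the spatially-adaptive normalization of FIG. 5.

```python
import torch
import torch.nn as nn

class KeypointModulation(nn.Module):
    """Two 1x1 convolutions map the key-point map to F_gamma / F_beta,
    which modulate the fused feature (SPADE-style affine transform)."""
    def __init__(self, c_lm, c_feat):
        super().__init__()
        self.to_gamma = nn.Conv2d(c_lm, c_feat, 1)
        self.to_beta = nn.Conv2d(c_lm, c_feat, 1)

    def forward(self, lm_fake_map, f_fused):      # lm_fake rendered as a map
        f_gamma = self.to_gamma(lm_fake_map)
        f_beta = self.to_beta(lm_fake_map)
        return f_gamma * f_fused + f_beta         # fusion vector F_fuse
```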
e-9) The upsampling module consists of first to fifth upsampling layers. A background feature from the background information encoder and the fusion vector F_fuse are input into the first upsampling layer of the upsampling module, which outputs the feature F_up^1; each subsequent upsampling layer receives the previous output together with the background feature of the corresponding scale, so that the second, third and fourth upsampling layers output the features F_up^2, F_up^3 and F_up^4; finally, F_up^4 and the background feature F_bg are input together into the fifth upsampling layer, which outputs the face image Pic_fake;
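A sketch of one stage of the upsampling module of step e-9); concatenating the incoming feature with the matching-scale background feature and using nearest-neighbor upsampling are assumptions (the text fixes kernel 3, stride 1 and padding 1 for the first four layers).

```python
import torch
import torch.nn as nn

class UpBlock(nn.Module):
    """One upsampling stage: inject the background feature, upsample, convolve."""
    def __init__(self, c_in, c_bg, c_out, k=3, p=1):
        super().__init__()
        self.conv = nn.Conv2d(c_in + c_bg, c_out, k, stride=1, padding=p)

    def forward(self, f, f_bg):
        x = torch.cat([f, f_bg], dim=1)                       # skip connection
        x = nn.functional.interpolate(x, scale_factor=2, mode='nearest')
        return torch.relu(self.conv(x))
```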
e-10) The discriminator module consists of first to sixth downsampling convolution layers and a Sigmoid function layer. The face image Pic_fake is input into the first downsampling convolution layer, which outputs the feature F_d^1; F_d^1 is input into the second downsampling convolution layer, which outputs F_d^2; F_d^2 is input into the third downsampling convolution layer, which outputs F_d^3; F_d^3 is input into the fourth downsampling convolution layer, which outputs F_d^4; F_d^4 is input into the fifth downsampling convolution layer, which outputs F_d^5; F_d^5 is input into the sixth downsampling convolution layer, which outputs F_d^6; F_d^6 is input into the Sigmoid function layer, which outputs the value d_fake. The target image Pic_t is passed through the same six downsampling convolution layers and the Sigmoid function layer in the same way, which outputs the value d_real.
e-11) The identity loss l1 is calculated from the distance between the identity feature of the face image Pic_fake and the identity feature F_id of the source image; the reconstruction loss is calculated by the formula l2 = ||Pic_fake − Pic_t||_2; the attribute loss l3 is calculated from the distance between the attribute features of Pic_fake and those of the target image; the face image Pic_fake is iteratively optimized through back propagation using the identity loss l1, the reconstruction loss l2 and the attribute loss l3.
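A sketch of the three generator losses in step e-11). l2 follows the stated formula; the cosine form of the identity loss and the L2 form of the attribute loss are assumptions, since those formulas are not spelled out in this text; id_net and attr_net stand for the encoders of step c).

```python
import torch
import torch.nn.functional as F

def generator_losses(pic_fake, pic_s, pic_t, id_net, attr_net):
    # l1: identity distance to the SOURCE image (assumed cosine form)
    l1 = 1 - F.cosine_similarity(id_net(pic_fake), id_net(pic_s), dim=1).mean()
    l2 = F.mse_loss(pic_fake, pic_t)                      # reconstruction loss
    l3 = F.mse_loss(attr_net(pic_fake), attr_net(pic_t))  # attribute loss (assumed L2)
    return l1 + l2 + l3
```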
Further, in step e-2), the convolution kernels of the first and second convolution layers of the first to sixth fusion blocks are all 3, the step sizes are all 1, and the filling is all 0; in step e-8), the convolution kernels of the two convolution layers are 1, the step sizes are 1, and the filling is 0; in step e-9), the convolution kernels of the first to fourth upsampling layers are all 3, the step sizes are all 1 and the filling is all 1, while the convolution kernel of the fifth upsampling layer is 7 with step size 1 and filling 0; in step e-10), the convolution kernels of the first to third downsampling convolution layers are 4 × 4 with step size 2 and filling 1, and the convolution kernels of the fourth to sixth downsampling convolution layers are 4 × 4 with step size 1 and filling 1.
The beneficial effects of the invention are as follows: identity information, attribute information and background information are extracted from each image, fused by feature fusion, and the final result is generated from the fused information. The method introduces feature key points that guide changes in face shape. Meanwhile, the background information added during training keeps the quality of the generated face image stable while the face shape changes.
Drawings
FIG. 1 is a flow chart of the method of the present invention;
FIG. 2 is a diagram of a key point extraction and adjustment architecture of the present invention;
FIG. 3 is a network architecture diagram of the key point discriminator of the present invention;
FIG. 4 is a diagram of an attribute extraction architecture and a downsampling architecture of the present invention;
FIG. 5 is a structural diagram of the spatially-adaptive instance normalization of the present invention;
FIG. 6 is a diagram of the semantic parsing and background information extraction structure of the present invention.
Detailed Description
The invention is further described below with reference to FIGS. 1 to 6.
A face image identity synthesis method based on semantic guidance comprises the following steps:
a) Key points are extracted from all face images in the CelebA face image data set.
b) A PET key point adjustment network is established, the face image key points are input into it to obtain the feature key points lm_fake, and lm_fake is iterated to obtain the optimized feature key points lm_fake.
c) A face image feature extraction network is established, the source image Pic_s and the target image Pic_t from the CelebA data set are input into it, and the identity features F_id and the attribute features F_attr are output respectively.
d) A background feature extraction network is established, the target image Pic_t is input into it, and the background feature information F_bg is obtained.
e) A generation network is established; the identity features F_id, the attribute features F_attr, the background feature information F_bg and the optimized feature key points lm_fake are input into it to obtain the face image Pic_fake, and Pic_fake is iterated to obtain the optimized face image Pic_fake.
f) Steps b) to e) are repeated to obtain a realistic face image Pic_fake with a changed face shape. Feature key points that semantically guide face shape changes are thus provided; identity information, attribute information and background information are extracted from each image, fused by feature fusion, and the final result is generated from the fused information. Meanwhile, the background information added during training keeps the quality of the generated face image stable while the face shape changes.
Example 1:
step a) comprises the steps of:
a-1) Key points are detected in all face images of the CelebA face image data set with the face key point detection algorithm H3R; the key points extracted from the source image Pic_s of the CelebA data set are denoted as the source key points lm_s, and the key points extracted from the target image Pic_t of the CelebA data set are denoted as the target key points lm_t. The CelebA face image data set consists of 30000 face images with different identities, the resolution of each image is 512 × 512, and the source image Pic_s and the target image Pic_t are both images in the CelebA data set.
Example 2:
step b) comprises the steps of:
b-1) A PET key point adjustment network is established, consisting of the source encoder E_lms, the target encoder E_lmt, the key point generator G_lm, the similarity discriminator D_S and the true/false discriminator D_TF.
b-2) The source encoder E_lms consists of first to fifth downsampling convolution layers. The source key points lm_s are input into the first downsampling convolution layer of E_lms, which outputs the feature information F_lms^1; F_lms^1 is input into the second downsampling convolution layer, which outputs F_lms^2; F_lms^2 is input into the third downsampling convolution layer, which outputs F_lms^3; F_lms^3 is input into the fourth downsampling convolution layer, which outputs F_lms^4; F_lms^4 is input into the fifth downsampling convolution layer, which outputs F_lms^5.
b-3) The target encoder E_lmt consists of first to fifth fully connected layers. The target key points lm_t are input into the first fully connected layer of E_lmt, which outputs the feature information F_lmt^1; F_lmt^1 is input into the second fully connected layer, which outputs F_lmt^2; F_lmt^2 is input into the third fully connected layer, which outputs F_lmt^3; F_lmt^3 is input into the fourth fully connected layer, which outputs F_lmt^4; F_lmt^4 is input into the fifth fully connected layer, which outputs F_lmt^5.
b-4) The torch.cat() function stacks F_lms^5 and F_lmt^5 into the feature vector F_lm.
b-5) The key point generator G_lm consists of first to fifth upsampling convolution layers. The feature vector F_lm is input into the first upsampling convolution layer of G_lm, which outputs the feature key points lm_fake^1; lm_fake^1 is input into the second upsampling convolution layer, which outputs lm_fake^2; lm_fake^2 is input into the third upsampling convolution layer, which outputs lm_fake^3; lm_fake^3 is input into the fourth upsampling convolution layer, which outputs lm_fake^4; lm_fake^4 is input into the fifth upsampling convolution layer, which outputs the feature key points lm_fake, whose dimension is 1 × 212.
b-6) The similarity discriminator D_S consists of a Layer_s module, a Layer_fake module and a Layer_c module. The Layer_fake module consists of first to fourth fully connected layers: the feature key points lm_fake are input into the first fully connected layer of the Layer_fake module, which outputs the feature information F_fake^1; F_fake^1 is input into the second fully connected layer, which outputs F_fake^2; F_fake^2 is input into the third fully connected layer, which outputs F_fake^3; F_fake^3 is input into the fourth fully connected layer, which outputs F_fake^4. The Layer_s module consists of first to fourth fully connected layers: the source key points lm_s are input into the first fully connected layer of the Layer_s module, which outputs the feature information F_s^1; F_s^1 is input into the second fully connected layer, which outputs F_s^2; F_s^2 is input into the third fully connected layer, which outputs F_s^3; F_s^3 is input into the fourth fully connected layer, which outputs F_s^4. The torch.cat() function stacks F_fake^4 and F_s^4 into the feature vector F_c. The Layer_c module consists of first to fourth fully connected layers: the feature vector F_c is input into the first fully connected layer of the Layer_c module, which outputs the similarity feature Fscore1; Fscore1 is input into the second fully connected layer, which outputs the similarity feature Fscore2; Fscore2 is input into the third fully connected layer, which outputs the similarity feature Fscore3; Fscore3 is input into the fourth fully connected layer, which outputs the similarity score.
b-7) The true/false discriminator D_TF consists of first to sixth fully connected layers. The feature key points lm_fake are input into the first fully connected layer of D_TF, which outputs the feature F_TF^1; F_TF^1 is input into the second fully connected layer, which outputs F_TF^2; F_TF^2 is input into the third fully connected layer, which outputs F_TF^3; F_TF^3 is input into the fourth fully connected layer, which outputs F_TF^4; F_TF^4 is input into the fifth fully connected layer, which outputs F_TF^5; F_TF^5 is input into the sixth fully connected layer, which outputs the 1-channel value d_TF.
b-8) The point-by-point loss is calculated by the formula loss_L1 = ||lm_fake − lm_s||_2, where ||·||_2 is the mean squared error; the reconstruction loss is calculated by the formula loss_Cycle = ||lm_fake − lm_t||_2; the true/false loss loss_DTF is calculated from the output d_TF of the true/false discriminator, and the similarity loss loss_DS is calculated from the similarity score of the similarity discriminator. The feature key points lm_fake are iteratively optimized through back propagation using the point-by-point loss loss_L1, the reconstruction loss loss_Cycle, the true/false loss loss_DTF and the similarity loss loss_DS.
Example 3:
in the step b-2), the convolution kernels of the first downsampling convolution layer, the second downsampling convolution layer, the third downsampling convolution layer, the fourth downsampling convolution layer and the fifth downsampling convolution layer are all 1, the step sizes are all 1, and the filling is all 0; in the step b-5), the convolution kernels of the first up-sampling convolution layer, the second up-sampling convolution layer, the third up-sampling convolution layer, the fourth up-sampling convolution layer and the fifth up-sampling convolution layer are all 1, the step length is all 1, and the filling is all 0.
Example 4:
step c) comprises the steps of:
c-1) A face image feature extraction network is established, consisting of the identity encoder E_id and the attribute encoder E_attr. c-2) The identity encoder E_id consists of the Arcface algorithm. The source image Pic_s is input into E_id and resized to 112 × 112 resolution through the interpolate() function; the 112 × 112 image is input into the Arcface algorithm, which outputs the identity vector z_id of shape b × c × h × w, where b is the training batch, c is the number of channels, h is the image height and w is the image width; z_id is input into the filling layer and the regularization layer in sequence, and the identity feature F_id is output.
c-3) The attribute encoder E_attr consists of first to fifth downsampling residual blocks and first and second bottleneck residual blocks. The first to fifth downsampling residual blocks each consist, in sequence, of a first normalization layer, a second normalization layer, a first ReLU activation layer, a second ReLU activation layer, a first convolution layer, a second convolution layer, a downsampling layer and a residual connection layer; the first and second bottleneck residual blocks each consist, in sequence, of a first normalization layer, a second normalization layer, a first ReLU activation layer, a second ReLU activation layer, a first convolution layer, a second convolution layer and a residual connection layer. The target image Pic_t is input into the first downsampling residual block of E_attr, which outputs the attribute features F_attr^1; F_attr^1 is input into the second downsampling residual block, which outputs F_attr^2; F_attr^2 is input into the third downsampling residual block, which outputs F_attr^3; F_attr^3 is input into the fourth downsampling residual block, which outputs F_attr^4; F_attr^4 is input into the fifth downsampling residual block, which outputs F_attr^5; F_attr^5 is input into the first bottleneck residual block, which outputs F_attr^6; F_attr^6 is input into the second bottleneck residual block, which outputs the attribute features F_attr.
Example 5:
in the step c-3), the first normalization layer and the second normalization layer in the first downsampling residual block, the second downsampling residual block, the third downsampling residual block, the fourth downsampling residual block and the fifth downsampling residual block are all of BatchNorm2d; in the step c-3), convolution kernels of a first convolution layer and a second convolution layer in the first downsampling residual block, the second downsampling residual block, the third downsampling residual block, the fourth downsampling residual block and the fifth downsampling residual block are 3, and filling and step sizes are 1.
Example 6:
step d) comprises the steps of:
d-1) A background feature extraction network is established, consisting of a face parsing module and the background information encoder E_bg;
d-2) The face parsing module consists of the face parsing algorithm BiSeNet. The target image Pic_t is input into the face parsing module and parsed into the individual face parts, and each face part is filled with color to obtain the image Pic_bg in which only the background region is retained;
d-3) The background information encoder E_bg consists of first to fifth self-attention modules, each consisting in sequence of a downsampling convolution layer, a self-attention layer and a ReLU activation layer. The image Pic_bg is input into the first self-attention module of E_bg, which outputs the background features F_bg^1; F_bg^1 is input into the second self-attention module, which outputs F_bg^2; F_bg^2 is input into the third self-attention module, which outputs F_bg^3; F_bg^3 is input into the fourth self-attention module, which outputs F_bg^4; F_bg^4 is input into the fifth self-attention module, which outputs the background features F_bg.
Example 7:
in the step d-3), the convolution kernels of the downsampled convolution layers of the first self-attention module, the second self-attention module, the third self-attention module, the fourth self-attention module and the fifth self-attention module are all 3, the step sizes are all 0, and the filling is all 0.
Example 8:
step e) comprises the steps of:
e-1) establishing a generating network formed by a fusion module, an up-sampling module and a discriminator module;
e-2) The fusion module consists of first to sixth fusion blocks, each consisting in sequence of a first convolution layer, a first adaptive instance normalization layer, a ReLU activation layer, a second convolution layer and a second adaptive instance normalization layer. The attribute feature F_attr is input into the fusion module; the first convolution layer of the first fusion block gives the attribute feature F_attr^(1,1). The identity feature F_id and the attribute feature F_attr^(1,1) are input into the first adaptive instance normalization layer, and the fusion feature F_fuse^(1,1) is calculated by the formula
F_fuse^(1,1) = σ_id × (F_attr^(1,1) − μ(F_attr^(1,1))) / σ(F_attr^(1,1)) + μ_id,
where σ_id is the standard deviation of the identity feature F_id, μ_id is the channel mean of F_id, μ(·) is the channel mean operation, and σ(·) is the standard deviation operation. The fusion feature F_fuse^(1,1) is input into the ReLU activation layer to obtain the feature F_relu^1; F_relu^1 is input into the second convolution layer to give the attribute feature F_attr^(1,2); the identity feature F_id and F_attr^(1,2) are input into the second adaptive instance normalization layer, and the fusion feature F_fuse^1 is calculated by the same formula;
e-3) The fusion feature F_fuse^1 is input into the second fusion block of the fusion module; its first convolution layer gives the attribute feature F_attr^(2,1); the identity feature F_id and F_attr^(2,1) are input into the first adaptive instance normalization layer, and the fusion feature F_fuse^(2,1) is calculated by the formula in step e-2); F_fuse^(2,1) passes through the ReLU activation layer and the second convolution layer to give the attribute feature F_attr^(2,2); F_id and F_attr^(2,2) are input into the second adaptive instance normalization layer, and the fusion feature F_fuse^2 is calculated by the same formula;
e-4) In the same way, the fusion feature F_fuse^2 is input into the third fusion block, which outputs the fusion feature F_fuse^3;
e-5) The fusion feature F_fuse^3 is input into the fourth fusion block, which outputs the fusion feature F_fuse^4;
e-6) The fusion feature F_fuse^4 is input into the fifth fusion block, which outputs the fusion feature F_fuse^5;
e-7) The fusion feature F_fuse^5 is input into the sixth fusion block, which outputs the fusion feature F_fuse^6;
e-8) The optimized feature key points lm_fake are input into two separate convolution layers to obtain the feature F_gamma and the feature F_beta, and the fusion vector F_fuse is calculated by the formula F_fuse = F_gamma ⊗ F_fuse^6 + F_beta, where ⊗ is element-wise multiplication and F_fuse^6 is the output of the sixth fusion block;
e-9) The upsampling module consists of first to fifth upsampling layers. A background feature from the background information encoder and the fusion vector F_fuse are input into the first upsampling layer of the upsampling module, which outputs the feature F_up^1; each subsequent upsampling layer receives the previous output together with the background feature of the corresponding scale, so that the second, third and fourth upsampling layers output the features F_up^2, F_up^3 and F_up^4; finally, F_up^4 and the background feature F_bg are input together into the fifth upsampling layer, which outputs the face image Pic_fake.
e-10) The discriminator module consists of first to sixth downsampling convolution layers and a Sigmoid function layer. The face image Pic_fake is input into the first downsampling convolution layer, which outputs the feature F_d^1; F_d^1 is input into the second downsampling convolution layer, which outputs F_d^2; F_d^2 is input into the third downsampling convolution layer, which outputs F_d^3; F_d^3 is input into the fourth downsampling convolution layer, which outputs F_d^4; F_d^4 is input into the fifth downsampling convolution layer, which outputs F_d^5; F_d^5 is input into the sixth downsampling convolution layer, which outputs F_d^6; F_d^6 is input into the Sigmoid function layer, which outputs the value d_fake. The target image Pic_t is passed through the same six downsampling convolution layers and the Sigmoid function layer in the same way, which outputs the value d_real.
e-11) The identity loss l1 is calculated from the distance between the identity feature of the face image Pic_fake and the identity feature F_id of the source image; the reconstruction loss is calculated by the formula l2 = ||Pic_fake − Pic_t||_2; the attribute loss l3 is calculated from the distance between the attribute features of Pic_fake and those of the target image; the face image Pic_fake is iteratively optimized through back propagation using the identity loss l1, the reconstruction loss l2 and the attribute loss l3.
Example 9:
in the step e-2), the convolution kernels of the first convolution layer and the second convolution layer of the first fusion block, the second fusion block, the third fusion block, the fourth fusion block, the fifth fusion block and the sixth fusion block are 3, the step sizes are 1, and the filling is 0; in the step e-8), the convolution kernels of the two convolution layers are 1, the step sizes are 1, and the filling is 0; in the step e-9), the convolution kernels of the first upsampling layer, the second upsampling layer, the third upsampling layer and the fourth upsampling layer are all 3, the step sizes are all 1, the filling is all 1, the convolution kernel of the fifth upsampling layer is 7, the step size is 1, and the filling is 0; e-10), the convolution kernels of the first downsampling convolution layer, the second downsampling convolution layer and the third downsampling convolution layer are 4*4, the step sizes are 2, the filling is 1, the convolution kernels of the fourth downsampling convolution layer, the fifth downsampling convolution layer and the sixth downsampling convolution layer are 4*4, the step sizes are 1, and the filling is 1.
Finally, it should be noted that the foregoing describes only preferred embodiments of the present invention and is not intended to limit it. Although the present invention has been described in detail with reference to the foregoing embodiments, those skilled in the art may still modify the technical solutions described therein or replace some of their technical features with equivalents. Any modification, equivalent replacement or improvement made within the spirit and principle of the present invention shall be included in the protection scope of the present invention.

Claims (8)

1. The face image identity synthesis method based on semantic guidance is characterized by comprising the following steps of:
a) Extracting key points of face images from all face images in the CelebA face image data set;
b) Establishing a PET key point adjustment network, inputting the key points of the face image into the PET key point adjustment network to obtain feature key points lm_fake, and iterating on the feature key points lm_fake to obtain optimized feature key points lm_fake;
c) Establishing a face image feature extraction network, inputting the source image Pic_s and the target image Pic_t from the CelebA face image dataset into the face image feature extraction network, and outputting the identity feature F_id and the attribute feature F_attr respectively;
d) Establishing a background feature extraction network, and inputting the target image Pic_t into the background feature extraction network to obtain the background feature information F_bg;
e) Establishing a generating network, inputting the identity feature F_id, the attribute feature F_attr, the background feature information F_bg and the optimized feature key points lm_fake into the generating network to obtain the face image Pic_fake, and iterating on the image Pic_fake to obtain an optimized face image Pic_fake;
f) Repeating steps b) to e) to obtain a realistic face image Pic_fake with a changed facial contour.
Step b) comprises the steps of:
b-1) Establishing a PET key point adjustment network composed of a source encoder E_lms, a target encoder E_lmt, a key point generator G_lm, a similarity discriminator D_S and a true/false discriminator D_TF;
b-2) The source encoder E_lms consists of a first downsampling convolution layer, a second downsampling convolution layer, a third downsampling convolution layer, a fourth downsampling convolution layer and a fifth downsampling convolution layer. The source key points lm_s are input into the first downsampling convolution layer of the source encoder E_lms and passed successively through the second, third, fourth and fifth downsampling convolution layers, each layer outputting the feature information consumed by the next; the fifth layer outputs the feature information of the source key points;
b-3) The target encoder E_lmt consists of a first, a second, a third, a fourth and a fifth fully-connected layer. The target key points lm_t are input into the first fully-connected layer of the target encoder E_lmt and passed successively through the second, third, fourth and fifth fully-connected layers, each layer outputting the feature information consumed by the next; the fifth layer outputs the feature information of the target key points;
b-4) The feature information output by the source encoder E_lms and the feature information output by the target encoder E_lmt are stacked with the torch.cat() function to obtain a feature vector;
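A minimal sketch of steps b-2) to b-4) follows. Claim 3 fixes kernel 1, stride 1, padding 0 for the encoder convolutions; the channel widths, the 68-point (x, y) key point layout and the pooling of the convolutional features to a vector are assumptions:

import torch
import torch.nn as nn

# Source encoder: five downsampling convolution layers (claim 3: kernel 1,
# stride 1, padding 0); channel widths here are hypothetical.
e_lms = nn.Sequential(
    nn.Conv1d(2, 32, 1, 1, 0), nn.Conv1d(32, 64, 1, 1, 0),
    nn.Conv1d(64, 128, 1, 1, 0), nn.Conv1d(128, 256, 1, 1, 0),
    nn.Conv1d(256, 512, 1, 1, 0))

# Target encoder: five fully-connected layers over flattened key points.
e_lmt = nn.Sequential(
    nn.Linear(68 * 2, 256), nn.Linear(256, 256), nn.Linear(256, 256),
    nn.Linear(256, 256), nn.Linear(256, 512))

lm_s = torch.randn(1, 2, 68)      # assumed: 68 (x, y) source key points
lm_t = torch.randn(1, 68 * 2)     # assumed: flattened target key points
f_s = e_lms(lm_s).mean(dim=2)     # pooled to a vector (an assumption)
f_t = e_lmt(lm_t)
f = torch.cat([f_s, f_t], dim=1)  # step b-4): stack the two features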
b-5) The key point generator G_lm consists of a first upsampling convolution layer, a second upsampling convolution layer, a third upsampling convolution layer, a fourth upsampling convolution layer and a fifth upsampling convolution layer. The feature vector is input into the first upsampling convolution layer of the key point generator G_lm and passed successively through the second, third, fourth and fifth upsampling convolution layers, each layer outputting the feature key points consumed by the next; the fifth layer outputs the feature key points lm_fake;
b-6) The similarity discriminator D_S consists of a Layer_s module, a Layer_fake module and a Layer_c module, each composed of a first, a second, a third and a fourth fully-connected layer. The feature key points lm_fake are passed successively through the four fully-connected layers of the Layer_fake module, and the source key points lm_s are passed successively through the four fully-connected layers of the Layer_s module; the two resulting pieces of feature information are stacked with the torch.cat() function to obtain a feature vector. This feature vector is input into the first fully-connected layer of the Layer_c module, which outputs the similarity feature Fscore1; Fscore1 is input into the second fully-connected layer, which outputs the similarity feature Fscore2; Fscore2 is input into the third fully-connected layer, which outputs the similarity feature Fscore3; Fscore3 is input into the fourth fully-connected layer of the Layer_c module, which outputs the similarity score;
b-7) The true/false discriminator D_TF consists of a first, a second, a third, a fourth, a fifth and a sixth fully-connected layer. The feature key points lm_fake are input into the first fully-connected layer of the true/false discriminator D_TF and passed successively through the second to sixth fully-connected layers; the sixth fully-connected layer outputs a single-channel value;
b-8) The point-by-point loss is calculated by the formula loss_L1 = ||lm_fake - lm_s||_2, where ||·||_2 is the mean squared error; the reconstruction loss is calculated by the formula loss_Cycle = ||lm_fake - lm_t||_2; the true/false loss loss_DTF and the similarity loss loss_DS are calculated by their respective formulas. The feature key points lm_fake are iteratively optimized by back-propagation using the point-by-point loss loss_L1, the reconstruction loss loss_Cycle, the true/false loss loss_DTF and the similarity loss loss_DS.
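A sketch of the b-8) losses follows. The point-by-point and reconstruction losses match the formulas above; the true/false and similarity loss formulas are images in the source, so standard logistic GAN terms over the discriminator outputs are assumed:

import torch
import torch.nn.functional as F

def pet_losses(lm_fake, lm_s, lm_t, d_tf_out, d_s_out):
    # loss_L1: point-by-point loss ||lm_fake - lm_s||_2 (mean squared error).
    loss_l1 = F.mse_loss(lm_fake, lm_s)
    # loss_Cycle: reconstruction loss ||lm_fake - lm_t||_2.
    loss_cycle = F.mse_loss(lm_fake, lm_t)
    # loss_DTF / loss_DS: the source formulas are images; standard logistic
    # GAN terms on the discriminator logits are assumed here.
    loss_dtf = F.binary_cross_entropy_with_logits(d_tf_out, torch.ones_like(d_tf_out))
    loss_ds = F.binary_cross_entropy_with_logits(d_s_out, torch.ones_like(d_s_out))
    return loss_l1 + loss_cycle + loss_dtf + loss_ds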
Step c) comprises the steps of:
c-1) Establishing a face image feature extraction network composed of the identity encoder E_id and the attribute encoder E_attr;
c-2) The identity encoder E_id consists of the Arcface algorithm. The source image Pic_s is input into the identity encoder E_id and resized to 112×112 resolution by the interpolate() function; the 112×112 image is input into the Arcface algorithm, which outputs an identity vector of shape (b, c, h, w), where b is the training batch, c is the number of channels, h is the image height and w is the image width. The identity vector is input successively into a padding layer and a normalization layer, which outputs the identity feature F_id;
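A sketch of step c-2) follows, with arcface standing in for a pretrained ArcFace backbone (a hypothetical handle, not specified by the patent); the padding and normalization layers are assumed to be zero-padding and L2 normalization:

import torch
import torch.nn.functional as F

def extract_identity(pic_s, arcface):
    # pic_s: (b, 3, H, W) source image; arcface: hypothetical handle to a
    # pretrained ArcFace backbone returning a (b, c, h, w) identity vector.
    x = F.interpolate(pic_s, size=(112, 112), mode="bilinear", align_corners=False)
    ident = arcface(x)
    ident = F.pad(ident, (1, 1, 1, 1))     # assumed padding layer
    return F.normalize(ident, p=2, dim=1)  # assumed normalization layer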
c-3) The attribute encoder E_attr consists of a first, a second, a third, a fourth and a fifth downsampling residual block followed by a first and a second bottleneck residual block. Each downsampling residual block consists, in sequence, of a first normalization layer, a second normalization layer, a first ReLU activation layer, a second ReLU activation layer, a first convolution layer, a second convolution layer, a downsampling layer and a residual connection layer; each bottleneck residual block consists, in sequence, of a first normalization layer, a second normalization layer, a first ReLU activation layer, a second ReLU activation layer, a first convolution layer, a second convolution layer and a residual connection layer. The target image Pic_t is input into the first downsampling residual block of the attribute encoder E_attr and passed successively through the second to fifth downsampling residual blocks and the first bottleneck residual block, each block outputting the attribute feature consumed by the next; the second bottleneck residual block outputs the attribute feature F_attr.
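A sketch of one downsampling residual block from c-3) follows, using claim 4's hyperparameters (BatchNorm2d, 3×3 convolutions, stride and padding 1). The exact wiring of the listed layers is not spelled out, so a pre-activation norm-ReLU-conv ordering, average-pool downsampling and a 1×1 skip projection are assumed:

import torch
import torch.nn as nn

class DownResBlock(nn.Module):
    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.norm1 = nn.BatchNorm2d(in_ch)       # first normalization layer
        self.norm2 = nn.BatchNorm2d(out_ch)      # second normalization layer
        self.relu = nn.ReLU(inplace=True)
        self.conv1 = nn.Conv2d(in_ch, out_ch, 3, stride=1, padding=1)
        self.conv2 = nn.Conv2d(out_ch, out_ch, 3, stride=1, padding=1)
        self.down = nn.AvgPool2d(2)               # downsampling layer (assumed)
        self.skip = nn.Conv2d(in_ch, out_ch, 1)  # residual connection layer

    def forward(self, x):
        h = self.conv1(self.relu(self.norm1(x)))
        h = self.conv2(self.relu(self.norm2(h)))
        return self.down(h) + self.down(self.skip(x))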
2. The semantic guidance-based face image identity synthesis method according to claim 1, wherein the step a) comprises the steps of:
a-1) detecting the key points of all face images in the CelebA face image dataset with the face key point detection algorithm H3R; the key points extracted from the source image Pic_s in the CelebA face image dataset are denoted as the source key points lm_s, and the key points extracted from the target image Pic_t in the CelebA face image dataset are denoted as the target key points lm_t.
3. The semantic guidance-based face image identity synthesis method according to claim 1, characterized in that: in step b-2), the convolution kernels of the first to fifth downsampling convolution layers are all 1, the strides are all 1, and the padding is all 0; in step b-5), the convolution kernels of the first to fifth upsampling convolution layers are all 1, the strides are all 1, and the padding is all 0.
4. The semantic guidance-based face image identity synthesis method according to claim 1, characterized in that: in step c-3), the first and second normalization layers in the first to fifth downsampling residual blocks are all BatchNorm2d layers, and the convolution kernels of the first and second convolution layers in the first to fifth downsampling residual blocks are all 3, with the padding and strides all 1.
5. The semantic guidance-based face image identity synthesis method according to claim 1, wherein the step d) comprises the steps of:
d-1) Establishing a background feature extraction network composed of a face parsing module and a background information encoder E_bg;
d-2) The face parsing module consists of the face parsing algorithm BiSeNet. The target image Pic_t is input into the face parsing module, which parses out each part of the face; each facial part is then filled with color so that only the background area is retained, giving the image Pic_bg;
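A sketch of step d-2) follows, with bisenet standing in for a pretrained BiSeNet parsing network (a hypothetical handle) and the convention that parsing label 0 marks the background; both are assumptions:

import torch

def keep_background(pic_t, bisenet, fill=0.0):
    # pic_t: (b, 3, H, W) target image; bisenet: hypothetical handle to a
    # pretrained BiSeNet returning per-pixel class scores (b, K, H, W).
    parsing = bisenet(pic_t).argmax(dim=1, keepdim=True)
    bg_mask = (parsing == 0).float()  # assumed: label 0 == background
    # Fill every facial region with a constant color, keep the background.
    return pic_t * bg_mask + fill * (1.0 - bg_mask)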
d-3) The background information encoder E_bg consists of a first, a second, a third, a fourth and a fifth self-attention module, each composed, in sequence, of a downsampling convolution layer, a self-attention layer and a ReLU activation layer. The image Pic_bg is input into the first self-attention module of the background information encoder E_bg and passed successively through the second, third and fourth self-attention modules, each module outputting the background feature consumed by the next; the fifth self-attention module outputs the background feature F_bg.
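A sketch of one self-attention module from d-3) follows. Claim 6 fixes kernel 3 for the downsampling convolution; the stride of 2, the non-local form of the self-attention layer and the residual connection inside it are assumptions:

import torch
import torch.nn as nn

class SelfAttention2d(nn.Module):
    # Non-local style self-attention over spatial positions (assumed form).
    def __init__(self, ch):
        super().__init__()
        self.q = nn.Conv2d(ch, ch // 8, 1)
        self.k = nn.Conv2d(ch, ch // 8, 1)
        self.v = nn.Conv2d(ch, ch, 1)

    def forward(self, x):
        b, c, h, w = x.shape
        q = self.q(x).flatten(2).transpose(1, 2)  # (b, hw, c/8)
        k = self.k(x).flatten(2)                  # (b, c/8, hw)
        v = self.v(x).flatten(2)                  # (b, c, hw)
        attn = torch.softmax(q @ k, dim=-1)       # (b, hw, hw)
        out = (v @ attn.transpose(1, 2)).view(b, c, h, w)
        return x + out                            # residual (assumed)

class SABlock(nn.Module):
    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.down = nn.Conv2d(in_ch, out_ch, 3, stride=2, padding=1)  # stride assumed
        self.attn = SelfAttention2d(out_ch)
        self.act = nn.ReLU(inplace=True)

    def forward(self, x):
        return self.act(self.attn(self.down(x)))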
6. The semantic guidance-based face image identity synthesis method according to claim 5, characterized in that: in step d-3), the convolution kernels of the downsampling convolution layers of the first to fifth self-attention modules are all 3, the strides are all 0, and the padding is all 0.
7. The semantic guidance-based face image identity synthesis method according to claim 1, wherein the step e) comprises the steps of:
e-1) establishing a generating network formed by a fusion module, an up-sampling module and a discriminator module;
e-2) The fusion module consists of a first fusion block, a second fusion block, a third fusion block, a fourth fusion block, a fifth fusion block and a sixth fusion block, each composed, in sequence, of a first convolution layer, a first adaptive instance normalization layer, a ReLU activation layer, a second convolution layer and a second adaptive instance normalization layer. The attribute feature F_attr is input into the fusion module and passed through the first convolution layer of the first fusion block to obtain an attribute feature; the identity feature F_id and this attribute feature are input into the first adaptive instance normalization layer, and the fusion feature is calculated by the formula F = σ_id · (F_a − μ(F_a)) / σ(F_a) + μ_id, where F_a denotes the attribute feature, σ_id and μ_id are the standard deviation and channel mean of the identity feature F_id, μ(·) is the channel mean operation and σ(·) is the standard deviation operation. The fusion feature is input into the ReLU activation layer, the result is passed through the second convolution layer to obtain an attribute feature, and the identity feature F_id and this attribute feature are input into the second adaptive instance normalization layer, where the same formula is applied to obtain the fusion feature of the first fusion block;
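A sketch of the adaptive instance normalization and one fusion block from e-2) follows, implementing the formula above with channel statistics taken over the spatial dimensions; it assumes the identity feature is a feature map whose channel width matches the attribute feature. Kernel 3, stride 1, padding 0 follow Example 9 / claim 8:

import torch
import torch.nn as nn

def adain(f_attr, f_id, eps=1e-5):
    # F = sigma_id * (F_attr - mu(F_attr)) / sigma(F_attr) + mu_id,
    # with channel-wise statistics over the spatial dimensions.
    mu_a = f_attr.mean(dim=(2, 3), keepdim=True)
    sd_a = f_attr.std(dim=(2, 3), keepdim=True) + eps
    mu_i = f_id.mean(dim=(2, 3), keepdim=True)
    sd_i = f_id.std(dim=(2, 3), keepdim=True)
    return sd_i * (f_attr - mu_a) / sd_a + mu_i

class FusionBlock(nn.Module):
    # conv -> AdaIN -> ReLU -> conv -> AdaIN, per step e-2).
    def __init__(self, ch):
        super().__init__()
        self.conv1 = nn.Conv2d(ch, ch, 3, stride=1, padding=0)
        self.conv2 = nn.Conv2d(ch, ch, 3, stride=1, padding=0)
        self.act = nn.ReLU(inplace=True)

    def forward(self, f, f_id):
        h = adain(self.conv1(f), f_id)
        h = self.conv2(self.act(h))
        return adain(h, f_id)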
e-3) The fusion feature output by the first fusion block is input into the second fusion block, where the same computation as in step e-2) is carried out with the identity feature F_id (first convolution layer, first adaptive instance normalization layer, ReLU activation layer, second convolution layer, second adaptive instance normalization layer), yielding the fusion feature of the second fusion block;
e-4) the fusion feature output by the second fusion block is processed by the third fusion block in the same way, yielding the fusion feature of the third fusion block;
e-5) the fusion feature output by the third fusion block is processed by the fourth fusion block in the same way, yielding the fusion feature of the fourth fusion block;
e-6) the fusion feature output by the fourth fusion block is processed by the fifth fusion block in the same way, yielding the fusion feature of the fifth fusion block;
e-7) the fusion feature output by the fifth fusion block is processed by the sixth fusion block in the same way, yielding the fusion feature of the sixth fusion block;
e-8) The optimized feature key points lm_fake are input into two convolution layers respectively to obtain the feature F_gamma and the feature F_beta, and the fusion vector F_fuse is calculated from F_gamma, F_beta and the fusion feature of the sixth fusion block;
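The e-8) formula is an image in the source; given the gamma/beta naming, a SPADE-style affine modulation F_fuse = F_gamma · F + F_beta over the sixth fusion feature is a plausible reading and is sketched below under that assumption, with 1×1 convolutions per claim 8 and a rasterized key point heat map as input (also an assumption):

import torch
import torch.nn as nn

conv_gamma = nn.Conv2d(1, 256, 1, 1, 0)  # kernel 1, stride 1, padding 0 (claim 8)
conv_beta = nn.Conv2d(1, 256, 1, 1, 0)   # channel width 256 is hypothetical

def keypoint_modulate(lm_map, f_fuse6):
    # lm_map: optimized key points rasterized to a (b, 1, H, W) heat map
    # (assumed representation); f_fuse6: (b, 256, H, W) sixth fusion feature.
    f_gamma = conv_gamma(lm_map)
    f_beta = conv_beta(lm_map)
    return f_gamma * f_fuse6 + f_beta  # assumed SPADE-style formula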
e-9) The upsampling module consists of a first upsampling layer, a second upsampling layer, a third upsampling layer, a fourth upsampling layer and a fifth upsampling layer. The background feature from the background information encoder and the fusion vector F_fuse are input into the first upsampling layer of the upsampling module, which outputs a feature; this feature and the corresponding background feature are input together into the second upsampling layer, which outputs a feature; this feature and the corresponding background feature are input together into the third upsampling layer, which outputs a feature; this feature and the corresponding background feature are input together into the fourth upsampling layer, which outputs a feature; finally, this feature and the background feature F_bg are input together into the fifth upsampling layer, which outputs the face image Pic_fake;
e-10) The discriminator module consists of a first downsampling convolution layer, a second downsampling convolution layer, a third downsampling convolution layer, a fourth downsampling convolution layer, a fifth downsampling convolution layer, a sixth downsampling convolution layer and a Sigmoid function layer. The face image Pic_fake is input into the first downsampling convolution layer and passed successively through the second to sixth downsampling convolution layers; the resulting feature is input into the Sigmoid function layer, which outputs a realness value for Pic_fake. The target image Pic_t is passed through the same six downsampling convolution layers and the Sigmoid function layer, which outputs a realness value for Pic_t;
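A sketch of the e-10) discriminator follows, using claim 8's hyperparameters (4×4 kernels, stride 2 for the first three layers, stride 1 for the last three, padding 1 throughout); the channel widths and LeakyReLU activations are assumptions:

import torch
import torch.nn as nn

def d_layer(i, o, s):
    # 4x4 kernel, padding 1; stride per claim 8.
    return nn.Conv2d(i, o, kernel_size=4, stride=s, padding=1)

discriminator = nn.Sequential(
    d_layer(3, 64, 2), nn.LeakyReLU(0.2),   # activations are assumptions
    d_layer(64, 128, 2), nn.LeakyReLU(0.2),
    d_layer(128, 256, 2), nn.LeakyReLU(0.2),
    d_layer(256, 256, 1), nn.LeakyReLU(0.2),
    d_layer(256, 256, 1), nn.LeakyReLU(0.2),
    d_layer(256, 1, 1),
    nn.Sigmoid())                            # Sigmoid function layer

score = discriminator(torch.randn(1, 3, 256, 256))  # realness map for an image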
e-11) The identity loss l1, the reconstruction loss l2 = ||Pic_fake - Pic_t||_2 and the attribute loss l3 are calculated by their respective formulas, and the face image Pic_fake is iteratively optimized by back-propagation using the identity loss l1, the reconstruction loss l2 and the attribute loss l3.
8. The semantic guidance-based face image identity synthesis method according to claim 7, characterized in that: in step e-2), the convolution kernels of the first and second convolution layers of the first to sixth fusion blocks are all 3, the strides are all 1, and the padding is all 0; in step e-8), the convolution kernels of the two convolution layers are all 1, the strides are all 1, and the padding is all 0; in step e-9), the convolution kernels of the first to fourth upsampling layers are all 3, the strides are all 1 and the padding is all 1, while the convolution kernel of the fifth upsampling layer is 7 with stride 1 and padding 0; in step e-10), the convolution kernels of the first to third downsampling convolution layers are all 4×4 with stride 2 and padding 1, and the convolution kernels of the fourth to sixth downsampling convolution layers are all 4×4 with stride 1 and padding 1.
CN202211451581.1A 2022-11-18 2022-11-18 Semantic guidance-based face image identity synthesis method Active CN115713680B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211451581.1A CN115713680B (en) 2022-11-18 2022-11-18 Semantic guidance-based face image identity synthesis method


Publications (2)

Publication Number Publication Date
CN115713680A (en) 2023-02-24
CN115713680B (en) 2023-07-25

Family

ID=85233817





Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113112411A * 2020-01-13 2021-07-13 Nanjing University of Information Science and Technology Face image semantic restoration method based on multi-scale feature fusion
WO2022151535A1 * 2021-01-15 2022-07-21 Soochow University Deep learning-based face feature point detection method
CN112766160A * 2021-01-20 2021-05-07 Xidian University Face replacement method based on multi-stage attribute encoder and attention mechanism



Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant