CN115713680B - Semantic guidance-based face image identity synthesis method - Google Patents

Semantic guidance-based face image identity synthesis method

Info

Publication number
CN115713680B
Authority
CN
China
Prior art keywords
layer
feature
input
attribute
features
Prior art date
Legal status
Active
Application number
CN202211451581.1A
Other languages
Chinese (zh)
Other versions
CN115713680A (en)
Inventor
刘瑞霞
李子安
舒明雷
陈长芳
单珂
Current Assignee
Qilu University of Technology
Shandong Institute of Artificial Intelligence
Original Assignee
Qilu University of Technology
Shandong Institute of Artificial Intelligence
Priority date
Filing date
Publication date
Application filed by Qilu University of Technology and Shandong Institute of Artificial Intelligence
Priority to CN202211451581.1A
Publication of CN115713680A
Application granted
Publication of CN115713680B


Classifications

    • Y — General tagging of new technological developments; general tagging of cross-sectional technologies spanning over several sections of the IPC; technical subjects covered by former USPC cross-reference art collections [XRACs] and digests
    • Y02 — Technologies or applications for mitigation or adaptation against climate change
    • Y02T — Climate change mitigation technologies related to transportation
    • Y02T 10/00 — Road transport of goods or passengers
    • Y02T 10/10 — Internal combustion engine [ICE] based vehicles
    • Y02T 10/40 — Engine management systems

Abstract

A semantic guidance-based face image identity synthesis method extracts identity information, attribute information and background information from each image, fuses this information by feature fusion, and finally generates the result image from the fused information. The method introduces feature key points that guide changes in face shape. Meanwhile, the background information added during training keeps the quality of the generated face image stable while the face shape changes.

Description

Semantic guidance-based face image identity synthesis method
Technical Field
The invention relates to the field of image-level deep forgery (deepfake) technology, and in particular to a semantic guidance-based face image identity synthesis method.
Background
In recent years, breakthroughs in machine learning and graphics have greatly advanced the deep forgery field, and face identity synthesis, one of its sub-directions, has developed rapidly, so that more and more forged images and videos appear on the network. Specifically, face identity synthesis transfers the identity information of a source face onto a target face while leaving the attribute information of the target face in the image (such as background, pose and illumination) intact. Face identity synthesis is now widely applied in information protection, the film industry, virtual entertainment and other fields; in the film industry, advanced equipment is used to reconstruct an actor's face model and the lighting conditions of a scene, so that a realistic effect can be obtained. Compared with directions such as attribute editing and image restoration in the deep forgery field, face identity synthesis is more open and involves more innovations in generative models.
Traditional research on face identity synthesis is mainly based on image editing and falls into two categories: face image parsing and fusion, and 3D face modeling. The first category requires manually parsing the face region and fusing faces by rendering, deformation and the like, which is inefficient and consumes much time and effort. The second category requires acquiring a 3D model of the face image and generating the image with deep learning techniques, which can cause loss of illumination and background. In addition, these generation methods pay little attention to the structure of the face, so the generated face images suffer from face shape problems.
Disclosure of Invention
To overcome the above shortcomings, the invention provides a face image identity synthesis method that first uses feature key points for semantic guidance of face shape changes, then extracts identity information, attribute information and background information from the images, fuses this information by feature fusion, and finally generates the image from the fused information.
The technical scheme adopted for overcoming the technical problems is as follows:
a face image identity synthesis method based on semantic guidance comprises the following steps:
a) Extract key points from all face images in the CelebA face image data set;
b) Establish a PET key point adjustment network, input the face image key points into it to obtain the feature key points lm_fake, and iterate on lm_fake to obtain the optimized feature key points lm_fake;
c) Establish a face image feature extraction network, input the source image Pic_s and the target image Pic_t from the CelebA data set into it, and output the identity features F_id and the attribute features F_attr respectively;
d) Establish a background feature extraction network, input the target image Pic_t into it, and obtain the background feature information F_bg;
e) Establish a generation network, input the identity features F_id, the attribute features F_attr, the background feature information F_bg and the optimized feature key points lm_fake into it to obtain the face image Pic_fake, and iterate on Pic_fake to obtain the optimized face image Pic_fake;
f) Repeat steps b) to e) to obtain a realistic face image Pic_fake with a changed face shape; an end-to-end sketch of this flow is given below.
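As a quick orientation, the following is a minimal sketch of this flow in PyTorch-style Python. The callables h3r, pet_net, feat_net, bg_net and gen_net are hypothetical stand-ins for the H3R detector and the networks detailed in steps b) to e); they are not names used by the invention.

```python
import torch

def synthesize(pic_s: torch.Tensor, pic_t: torch.Tensor,
               h3r, pet_net, feat_net, bg_net, gen_net) -> torch.Tensor:
    """End-to-end flow of steps a)-e) for one source/target image pair."""
    lm_s, lm_t = h3r(pic_s), h3r(pic_t)          # a) key point extraction
    lm_fake = pet_net(lm_s, lm_t)                # b) adjusted feature key points
    f_id, f_attr = feat_net(pic_s, pic_t)        # c) identity / attribute features
    f_bg = bg_net(pic_t)                         # d) background features
    return gen_net(f_id, f_attr, f_bg, lm_fake)  # e) generated face image Pic_fake
```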
Further, step a) comprises the steps of:
a-1) Detect key points in all face images of the CelebA face image data set with the face key point detection algorithm H3R; the key points extracted from the source image Pic_s of the CelebA data set are denoted as the source key points lm_s, and the key points extracted from the target image Pic_t of the CelebA data set are denoted as the target key points lm_t.
Further, step b) comprises the steps of:
b-1) A PET key point adjustment network is established, consisting of the source encoder E_lms, the target encoder E_lmt, the key point generator G_lm, the similarity discriminator D_S and the true/false discriminator D_TF;
b-2) The source encoder E_lms consists of first to fifth downsampling convolution layers. The source key points lm_s are input into the first downsampling convolution layer of E_lms, which outputs the feature information F_lms^1; F_lms^1 is input into the second downsampling convolution layer, which outputs F_lms^2; F_lms^2 is input into the third downsampling convolution layer, which outputs F_lms^3; F_lms^3 is input into the fourth downsampling convolution layer, which outputs F_lms^4; F_lms^4 is input into the fifth downsampling convolution layer, which outputs F_lms^5.
b-3) The target encoder E_lmt consists of first to fifth fully connected layers. The target key points lm_t are input into the first fully connected layer of E_lmt, which outputs the feature information F_lmt^1; F_lmt^1 is input into the second fully connected layer, which outputs F_lmt^2; F_lmt^2 is input into the third fully connected layer, which outputs F_lmt^3; F_lmt^3 is input into the fourth fully connected layer, which outputs F_lmt^4; F_lmt^4 is input into the fifth fully connected layer, which outputs F_lmt^5.
b-4) The torch.cat() function stacks F_lms^5 and F_lmt^5 into the feature vector F_lm.
b-5) The key point generator G_lm consists of first to fifth upsampling convolution layers. The feature vector F_lm is input into the first upsampling convolution layer of G_lm, which outputs the feature key points lm_fake^1; lm_fake^1 is input into the second upsampling convolution layer, which outputs lm_fake^2; lm_fake^2 is input into the third upsampling convolution layer, which outputs lm_fake^3; lm_fake^3 is input into the fourth upsampling convolution layer, which outputs lm_fake^4; lm_fake^4 is input into the fifth upsampling convolution layer, which outputs the feature key points lm_fake;
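A minimal sketch of the two encoders and the key point generator of steps b-2) to b-5), assuming 106 two-dimensional key points (matching the 1 × 212 dimension of lm_fake stated in Example 2 below). All channel widths are assumptions, since the text only fixes kernel size 1, stride 1 and padding 0; the Linear layers in the generator stand in for the 1 × 1 "upsampling" convolutions, to which they are equivalent at kernel size 1, and the mean-pooling before concatenation is also an assumption.

```python
import torch
import torch.nn as nn

class SourceEncoder(nn.Module):
    """E_lms: five 1x1 'downsampling' convolutions (kernel 1, stride 1, padding 0)."""
    def __init__(self, chs=(2, 32, 64, 128, 256, 512)):
        super().__init__()
        self.layers = nn.ModuleList(
            nn.Conv1d(chs[i], chs[i + 1], kernel_size=1) for i in range(5))

    def forward(self, lm_s):                  # lm_s: (b, 2, 106) point coordinates
        f = lm_s
        for layer in self.layers:             # F_lms^1 ... F_lms^5
            f = torch.relu(layer(f))
        return f.mean(dim=2)                  # pooled to (b, 512) for concatenation

class TargetEncoder(nn.Module):
    """E_lmt: five fully connected layers."""
    def __init__(self, dims=(212, 256, 256, 256, 256, 512)):
        super().__init__()
        self.layers = nn.ModuleList(
            nn.Linear(dims[i], dims[i + 1]) for i in range(5))

    def forward(self, lm_t):                  # lm_t: (b, 212) flattened key points
        f = lm_t
        for layer in self.layers:             # F_lmt^1 ... F_lmt^5
            f = torch.relu(layer(f))
        return f

class KeypointGenerator(nn.Module):
    """G_lm: five layers mapping the stacked F_lm back to 212 key point values."""
    def __init__(self, dims=(1024, 512, 512, 384, 256, 212)):
        super().__init__()
        self.layers = nn.ModuleList(
            nn.Linear(dims[i], dims[i + 1]) for i in range(5))

    def forward(self, f_lm):
        for i, layer in enumerate(self.layers):
            f_lm = layer(f_lm) if i == 4 else torch.relu(layer(f_lm))
        return f_lm                           # lm_fake: (b, 212)

# Step b-4): stack the two deepest features into F_lm, then generate lm_fake.
# src, tgt, gen = SourceEncoder(), TargetEncoder(), KeypointGenerator()
# f_lm = torch.cat([src(lm_s), tgt(lm_t)], dim=1)   # (b, 1024)
# lm_fake = gen(f_lm)
```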
b-6) The similarity discriminator D_S consists of a Layer_s module, a Layer_fake module and a Layer_c module. The Layer_fake module consists of first to fourth fully connected layers: the feature key points lm_fake are input into the first fully connected layer of the Layer_fake module, which outputs the feature information F_fake^1; F_fake^1 is input into the second fully connected layer, which outputs F_fake^2; F_fake^2 is input into the third fully connected layer, which outputs F_fake^3; F_fake^3 is input into the fourth fully connected layer, which outputs F_fake^4. The Layer_s module consists of first to fourth fully connected layers: the source key points lm_s are input into the first fully connected layer of the Layer_s module, which outputs the feature information F_s^1; F_s^1 is input into the second fully connected layer, which outputs F_s^2; F_s^2 is input into the third fully connected layer, which outputs F_s^3; F_s^3 is input into the fourth fully connected layer, which outputs F_s^4. The torch.cat() function stacks F_fake^4 and F_s^4 into the feature vector F_c. The Layer_c module consists of first to fourth fully connected layers: the feature vector F_c is input into the first fully connected layer of the Layer_c module, which outputs the similarity feature Fscore1; Fscore1 is input into the second fully connected layer, which outputs the similarity feature Fscore2; Fscore2 is input into the third fully connected layer, which outputs the similarity feature Fscore3; Fscore3 is input into the fourth fully connected layer, which outputs the similarity score;
b-7) The true/false discriminator D_TF consists of first to sixth fully connected layers. The feature key points lm_fake are input into the first fully connected layer of D_TF, which outputs the feature F_TF^1; F_TF^1 is input into the second fully connected layer, which outputs F_TF^2; F_TF^2 is input into the third fully connected layer, which outputs F_TF^3; F_TF^3 is input into the fourth fully connected layer, which outputs F_TF^4; F_TF^4 is input into the fifth fully connected layer, which outputs F_TF^5; F_TF^5 is input into the sixth fully connected layer, which outputs the 1-channel value d_TF;
b-8) The point-by-point loss is calculated by the formula loss_L1 = ||lm_fake − lm_s||_2, where ||·||_2 is the mean squared error; the reconstruction loss is calculated by the formula loss_Cycle = ||lm_fake − lm_t||_2; the true/false loss loss_DTF is calculated from the output d_TF of the true/false discriminator, and the similarity loss loss_DS is calculated from the similarity score of the similarity discriminator. The feature key points lm_fake are iteratively optimized through back propagation using the point-by-point loss loss_L1, the reconstruction loss loss_Cycle, the true/false loss loss_DTF and the similarity loss loss_DS.
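A sketch of the loss computation in step b-8). loss_L1 and loss_Cycle follow the stated formulas with ||·||_2 as the mean squared error; because the exact adversarial and similarity expressions are not spelled out in this text, a standard binary cross-entropy GAN form is used for loss_DTF and loss_DS as an assumption.

```python
import torch
import torch.nn.functional as F

def pet_keypoint_loss(lm_fake, lm_s, lm_t, d_tf, sim_score):
    loss_l1 = F.mse_loss(lm_fake, lm_s)        # point-by-point loss loss_L1
    loss_cycle = F.mse_loss(lm_fake, lm_t)     # reconstruction loss loss_Cycle
    # Generator-side adversarial terms: push both discriminators toward "real"
    # (assumed form; the patent does not reproduce these formulas here).
    loss_dtf = F.binary_cross_entropy_with_logits(d_tf, torch.ones_like(d_tf))
    loss_ds = F.binary_cross_entropy_with_logits(sim_score,
                                                 torch.ones_like(sim_score))
    return loss_l1 + loss_cycle + loss_dtf + loss_ds
```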
In the step b-2), the convolution kernels of the first to fifth downsampling convolution layers are all 1, the step sizes are all 1, and the filling is all 0; in the step b-5), the convolution kernels of the first to fifth upsampling convolution layers are all 1, the step sizes are all 1, and the filling is all 0.
Further, step c) comprises the steps of:
c-1) A face image feature extraction network is established, consisting of the identity encoder E_id and the attribute encoder E_attr;
c-2) The identity encoder E_id consists of the Arcface algorithm. The source image Pic_s is input into E_id and resized to 112 × 112 resolution through the interpolate() function; the 112 × 112 image is input into the Arcface algorithm, which outputs the identity vector z_id of shape b × c × h × w, where b is the training batch, c is the number of channels, h is the image height and w is the image width; z_id is input into the filling layer and the regularization layer in sequence, and the identity feature F_id is output;
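A sketch of step c-2), where arcface stands for a pretrained ArcFace embedding network (the patent names only the algorithm); the no-op padding call marks the filling layer, whose parameters are not given here, and L2 normalization is used for the regularization layer as an assumption.

```python
import torch
import torch.nn.functional as F

def extract_identity(pic_s: torch.Tensor, arcface) -> torch.Tensor:
    x = F.interpolate(pic_s, size=(112, 112),
                      mode='bilinear', align_corners=False)  # resize to 112 x 112
    z_id = arcface(x)                      # identity vector, e.g. (b, 512)
    z_id = F.pad(z_id, (0, 0))             # placeholder for the filling layer
    return F.normalize(z_id, dim=1)        # regularization layer -> F_id
```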
c-3) The attribute encoder E_attr consists of first to fifth downsampling residual blocks and first and second bottleneck residual blocks. The first to fifth downsampling residual blocks each consist, in sequence, of a first normalization layer, a second normalization layer, a first ReLU activation layer, a second ReLU activation layer, a first convolution layer, a second convolution layer, a downsampling layer and a residual connection layer; the first and second bottleneck residual blocks each consist, in sequence, of a first normalization layer, a second normalization layer, a first ReLU activation layer, a second ReLU activation layer, a first convolution layer, a second convolution layer and a residual connection layer. The target image Pic_t is input into the first downsampling residual block of E_attr, which outputs the attribute features F_attr^1; F_attr^1 is input into the second downsampling residual block, which outputs F_attr^2; F_attr^2 is input into the third downsampling residual block, which outputs F_attr^3; F_attr^3 is input into the fourth downsampling residual block, which outputs F_attr^4; F_attr^4 is input into the fifth downsampling residual block, which outputs F_attr^5; F_attr^5 is input into the first bottleneck residual block, which outputs F_attr^6; F_attr^6 is input into the second bottleneck residual block, which outputs the attribute features F_attr.
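A sketch of one downsampling residual block of the attribute encoder E_attr, wiring the components listed in step c-3) (BatchNorm2d, ReLU, two 3 × 3 convolutions with stride and padding 1, a downsampling layer and a residual connection) in the usual pre-activation order; the wiring order, the average-pool downsampling and the 1 × 1 skip convolution are assumptions.

```python
import torch
import torch.nn as nn

class DownResBlock(nn.Module):
    """One downsampling residual block of E_attr (step c-3), assumed wiring)."""
    def __init__(self, c_in, c_out):
        super().__init__()
        self.norm1, self.norm2 = nn.BatchNorm2d(c_in), nn.BatchNorm2d(c_out)
        self.conv1 = nn.Conv2d(c_in, c_out, 3, stride=1, padding=1)
        self.conv2 = nn.Conv2d(c_out, c_out, 3, stride=1, padding=1)
        self.down = nn.AvgPool2d(2)            # downsampling layer
        self.skip = nn.Conv2d(c_in, c_out, 1)  # match channels on the residual path

    def forward(self, x):
        h = self.conv1(torch.relu(self.norm1(x)))
        h = self.conv2(torch.relu(self.norm2(h)))
        return self.down(h) + self.down(self.skip(x))  # residual connection
```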
In the step c-3), the first and second normalization layers in the first to fifth downsampling residual blocks are all BatchNorm2d layers; the convolution kernels of the first and second convolution layers in the first to fifth downsampling residual blocks are all 3, and the filling and step sizes are all 1.
Further, step d) comprises the steps of:
d-1) A background feature extraction network is established, consisting of a face parsing module and the background information encoder E_bg;
d-2) The face parsing module consists of the face parsing algorithm BiSeNet. The target image Pic_t is input into the face parsing module and parsed into the individual face parts, and each face part is filled with color to obtain the image Pic_bg in which only the background region is retained;
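A sketch of the background-only image construction in step d-2); bisenet stands for a pretrained BiSeNet face-parsing network returning per-pixel class logits, and treating label 0 as background follows the common CelebAMask-HQ convention (both are assumptions).

```python
import torch

def mask_background(pic_t: torch.Tensor, bisenet, fill: float = 0.0):
    parsing = bisenet(pic_t).argmax(dim=1, keepdim=True)  # (b, 1, h, w) labels
    face = (parsing != 0).float()                         # all face-part regions
    pic_bg = pic_t * (1 - face) + fill * face             # fill faces, keep background
    return pic_bg
```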
d-3) The background information encoder E_bg consists of first to fifth self-attention modules, each consisting in sequence of a downsampling convolution layer, a self-attention layer and a ReLU activation layer. The image Pic_bg is input into the first self-attention module of E_bg, which outputs the background features F_bg^1; F_bg^1 is input into the second self-attention module, which outputs F_bg^2; F_bg^2 is input into the third self-attention module, which outputs F_bg^3; F_bg^3 is input into the fourth self-attention module, which outputs F_bg^4; F_bg^4 is input into the fifth self-attention module, which outputs the background features F_bg.
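A sketch of one stage of the background information encoder E_bg in step d-3); the SAGAN-style self-attention layer and the stride-2 downsampling are assumptions (the text fixes only the kernel size 3).

```python
import torch
import torch.nn as nn

class SelfAttention2d(nn.Module):
    """SAGAN-style self-attention (assumed form of the 'self-attention layer')."""
    def __init__(self, c):
        super().__init__()
        self.q = nn.Conv2d(c, c // 8, 1)
        self.k = nn.Conv2d(c, c // 8, 1)
        self.v = nn.Conv2d(c, c, 1)
        self.gamma = nn.Parameter(torch.zeros(1))

    def forward(self, x):
        b, c, h, w = x.shape
        q = self.q(x).flatten(2).transpose(1, 2)   # (b, hw, c/8)
        k = self.k(x).flatten(2)                   # (b, c/8, hw)
        attn = torch.softmax(q @ k, dim=-1)        # (b, hw, hw)
        v = self.v(x).flatten(2)                   # (b, c, hw)
        out = (v @ attn.transpose(1, 2)).view(b, c, h, w)
        return self.gamma * out + x

class BgAttentionModule(nn.Module):
    """One E_bg stage: downsampling conv -> self-attention -> ReLU."""
    def __init__(self, c_in, c_out):
        super().__init__()
        self.conv = nn.Conv2d(c_in, c_out, 3, stride=2, padding=1)
        self.attn = SelfAttention2d(c_out)

    def forward(self, x):
        return torch.relu(self.attn(self.conv(x)))
```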
In the step d-3), the convolution kernels of the downsampling convolution layers of the first to fifth self-attention modules are all 3, the step sizes are all 0, and the filling is all 0.
Further, step e) comprises the steps of:
e-1) establishing a generating network formed by a fusion module, an up-sampling module and a discriminator module;
e-2) The fusion module consists of first to sixth fusion blocks, each consisting in sequence of a first convolution layer, a first adaptive instance normalization layer, a ReLU activation layer, a second convolution layer and a second adaptive instance normalization layer. The attribute feature F_attr is input into the fusion module; the first convolution layer of the first fusion block gives the attribute feature F_attr^(1,1). The identity feature F_id and the attribute feature F_attr^(1,1) are input into the first adaptive instance normalization layer, and the fusion feature F_fuse^(1,1) is calculated by the formula
F_fuse^(1,1) = σ_id × (F_attr^(1,1) − μ(F_attr^(1,1))) / σ(F_attr^(1,1)) + μ_id,
where σ_id is the standard deviation of the identity feature F_id, μ_id is the channel mean of F_id, μ(·) is the channel mean operation, and σ(·) is the standard deviation operation. The fusion feature F_fuse^(1,1) is input into the ReLU activation layer to obtain the feature F_relu^1; F_relu^1 is input into the second convolution layer to give the attribute feature F_attr^(1,2); the identity feature F_id and F_attr^(1,2) are input into the second adaptive instance normalization layer, and the fusion feature F_fuse^1 is calculated by the same formula;
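A sketch of the adaptive instance normalization formula of step e-2), with μ(·) and σ(·) taken per channel; broadcasting the statistics of f_id over f_attr assumes matching channel counts between the two feature maps.

```python
import torch

def adain(f_attr: torch.Tensor, f_id: torch.Tensor, eps: float = 1e-5):
    """F = sigma(F_id) * (F_attr - mu(F_attr)) / sigma(F_attr) + mu(F_id)."""
    mu_a = f_attr.mean(dim=(2, 3), keepdim=True)        # channel mean of F_attr
    sd_a = f_attr.std(dim=(2, 3), keepdim=True) + eps   # channel std of F_attr
    mu_i = f_id.mean(dim=(2, 3), keepdim=True)          # channel mean of F_id
    sd_i = f_id.std(dim=(2, 3), keepdim=True) + eps     # channel std of F_id
    return sd_i * (f_attr - mu_a) / sd_a + mu_i
```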
e-3) The fusion feature F_fuse^1 is input into the second fusion block of the fusion module; its first convolution layer gives the attribute feature F_attr^(2,1); the identity feature F_id and F_attr^(2,1) are input into the first adaptive instance normalization layer, and the fusion feature F_fuse^(2,1) is calculated by the formula in step e-2); F_fuse^(2,1) passes through the ReLU activation layer and the second convolution layer to give the attribute feature F_attr^(2,2); F_id and F_attr^(2,2) are input into the second adaptive instance normalization layer, and the fusion feature F_fuse^2 is calculated by the same formula;
e-4) In the same way, the fusion feature F_fuse^2 is input into the third fusion block, which outputs the fusion feature F_fuse^3;
e-5) The fusion feature F_fuse^3 is input into the fourth fusion block, which outputs the fusion feature F_fuse^4;
e-6) The fusion feature F_fuse^4 is input into the fifth fusion block, which outputs the fusion feature F_fuse^5;
e-7) The fusion feature F_fuse^5 is input into the sixth fusion block, which outputs the fusion feature F_fuse^6;
e-8) The optimized feature key points lm_fake are input into two separate convolution layers to obtain the feature F_gamma and the feature F_beta, and the fusion vector F_fuse is calculated by the formula F_fuse = F_gamma ⊗ F_fuse^6 + F_beta, where ⊗ is element-wise multiplication and F_fuse^6 is the output of the sixth fusion block;
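A sketch of the key point modulation of step e-8); lm_fake is assumed to be rendered as a spatial map (e.g. a heat map) so that the two 1 × 1 convolutions can produce F_gamma and F_beta, and the affine form F_gamma ⊗ F_fuse^6 + F_beta follows the spatially-adaptive normalization of FIG. 5.

```python
import torch
import torch.nn as nn

class KeypointModulation(nn.Module):
    """Two 1x1 convolutions map the key-point map to F_gamma / F_beta,
    which modulate the fused feature (SPADE-style affine transform)."""
    def __init__(self, c_lm, c_feat):
        super().__init__()
        self.to_gamma = nn.Conv2d(c_lm, c_feat, 1)
        self.to_beta = nn.Conv2d(c_lm, c_feat, 1)

    def forward(self, lm_fake_map, f_fused):      # lm_fake rendered as a map
        f_gamma = self.to_gamma(lm_fake_map)
        f_beta = self.to_beta(lm_fake_map)
        return f_gamma * f_fused + f_beta         # fusion vector F_fuse
```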
e-9) The upsampling module consists of first to fifth upsampling layers. A background feature from the background information encoder and the fusion vector F_fuse are input into the first upsampling layer of the upsampling module, which outputs the feature F_up^1; each subsequent upsampling layer receives the previous output together with the background feature of the corresponding scale, so that the second, third and fourth upsampling layers output the features F_up^2, F_up^3 and F_up^4; finally, F_up^4 and the background feature F_bg are input together into the fifth upsampling layer, which outputs the face image Pic_fake;
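A sketch of one stage of the upsampling module of step e-9); concatenating the incoming feature with the matching-scale background feature and using nearest-neighbor upsampling are assumptions (the text fixes kernel 3, stride 1 and padding 1 for the first four layers).

```python
import torch
import torch.nn as nn

class UpBlock(nn.Module):
    """One upsampling stage: inject the background feature, upsample, convolve."""
    def __init__(self, c_in, c_bg, c_out, k=3, p=1):
        super().__init__()
        self.conv = nn.Conv2d(c_in + c_bg, c_out, k, stride=1, padding=p)

    def forward(self, f, f_bg):
        x = torch.cat([f, f_bg], dim=1)                       # skip connection
        x = nn.functional.interpolate(x, scale_factor=2, mode='nearest')
        return torch.relu(self.conv(x))
```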
e-10) The discriminator module consists of first to sixth downsampling convolution layers and a Sigmoid function layer. The face image Pic_fake is input into the first downsampling convolution layer, which outputs the feature F_d^1; F_d^1 is input into the second downsampling convolution layer, which outputs F_d^2; F_d^2 is input into the third downsampling convolution layer, which outputs F_d^3; F_d^3 is input into the fourth downsampling convolution layer, which outputs F_d^4; F_d^4 is input into the fifth downsampling convolution layer, which outputs F_d^5; F_d^5 is input into the sixth downsampling convolution layer, which outputs F_d^6; F_d^6 is input into the Sigmoid function layer, which outputs the value d_fake. The target image Pic_t is passed through the same six downsampling convolution layers and the Sigmoid function layer in the same way, which outputs the value d_real.
e-11) The identity loss l1 is calculated from the distance between the identity feature of the face image Pic_fake and the identity feature F_id of the source image; the reconstruction loss is calculated by the formula l2 = ||Pic_fake − Pic_t||_2; the attribute loss l3 is calculated from the distance between the attribute features of Pic_fake and those of the target image; the face image Pic_fake is iteratively optimized through back propagation using the identity loss l1, the reconstruction loss l2 and the attribute loss l3.
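A sketch of the three generator losses in step e-11). l2 follows the stated formula; the cosine form of the identity loss and the L2 form of the attribute loss are assumptions, since those formulas are not spelled out in this text; id_net and attr_net stand for the encoders of step c).

```python
import torch
import torch.nn.functional as F

def generator_losses(pic_fake, pic_s, pic_t, id_net, attr_net):
    # l1: identity distance to the SOURCE image (assumed cosine form)
    l1 = 1 - F.cosine_similarity(id_net(pic_fake), id_net(pic_s), dim=1).mean()
    l2 = F.mse_loss(pic_fake, pic_t)                      # reconstruction loss
    l3 = F.mse_loss(attr_net(pic_fake), attr_net(pic_t))  # attribute loss (assumed L2)
    return l1 + l2 + l3
```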
Further, in step e-2), the convolution kernels of the first and second convolution layers of the first to sixth fusion blocks are all 3, the step sizes are all 1, and the filling is all 0; in step e-8), the convolution kernels of the two convolution layers are 1, the step sizes are 1, and the filling is 0; in step e-9), the convolution kernels of the first to fourth upsampling layers are all 3, the step sizes are all 1 and the filling is all 1, while the convolution kernel of the fifth upsampling layer is 7 with step size 1 and filling 0; in step e-10), the convolution kernels of the first to third downsampling convolution layers are 4 × 4 with step size 2 and filling 1, and the convolution kernels of the fourth to sixth downsampling convolution layers are 4 × 4 with step size 1 and filling 1.
The beneficial effects of the invention are as follows: identity information, attribute information and background information are extracted from each image, fused by feature fusion, and the final result is generated from the fused information. The method introduces feature key points that guide changes in face shape. Meanwhile, the background information added during training keeps the quality of the generated face image stable while the face shape changes.
Drawings
FIG. 1 is a flow chart of the method of the present invention;
FIG. 2 is a diagram of a key point extraction and adjustment architecture of the present invention;
FIG. 3 is a network architecture diagram of the key point discriminator of the present invention;
FIG. 4 is a diagram of an attribute extraction architecture and a downsampling architecture of the present invention;
FIG. 5 is a structural diagram of the spatially-adaptive instance normalization of the present invention;
FIG. 6 is a diagram of the semantic parsing and background information extraction structure of the present invention.
Detailed Description
The invention is further described below with reference to FIGS. 1 to 6.
A face image identity synthesis method based on semantic guidance comprises the following steps:
a) Key points are extracted from all face images in the CelebA face image data set.
b) A PET key point adjustment network is established, the face image key points are input into it to obtain the feature key points lm_fake, and lm_fake is iterated to obtain the optimized feature key points lm_fake.
c) A face image feature extraction network is established, the source image Pic_s and the target image Pic_t from the CelebA data set are input into it, and the identity features F_id and the attribute features F_attr are output respectively.
d) A background feature extraction network is established, the target image Pic_t is input into it, and the background feature information F_bg is obtained.
e) A generation network is established; the identity features F_id, the attribute features F_attr, the background feature information F_bg and the optimized feature key points lm_fake are input into it to obtain the face image Pic_fake, and Pic_fake is iterated to obtain the optimized face image Pic_fake.
f) Steps b) to e) are repeated to obtain a realistic face image Pic_fake with a changed face shape. Feature key points that semantically guide face shape changes are thus provided; identity information, attribute information and background information are extracted from each image, fused by feature fusion, and the final result is generated from the fused information. Meanwhile, the background information added during training keeps the quality of the generated face image stable while the face shape changes.
Example 1:
step a) comprises the steps of:
a-1) Key points are detected in all face images of the CelebA face image data set with the face key point detection algorithm H3R; the key points extracted from the source image Pic_s of the CelebA data set are denoted as the source key points lm_s, and the key points extracted from the target image Pic_t of the CelebA data set are denoted as the target key points lm_t. The CelebA face image data set consists of 30000 face images with different identities, the resolution of each image is 512 × 512, and the source image Pic_s and the target image Pic_t are both images in the CelebA data set.
Example 2:
step b) comprises the steps of:
b-1) A PET key point adjustment network is established, consisting of the source encoder E_lms, the target encoder E_lmt, the key point generator G_lm, the similarity discriminator D_S and the true/false discriminator D_TF.
b-2) The source encoder E_lms consists of first to fifth downsampling convolution layers. The source key points lm_s are input into the first downsampling convolution layer of E_lms, which outputs the feature information F_lms^1; F_lms^1 is input into the second downsampling convolution layer, which outputs F_lms^2; F_lms^2 is input into the third downsampling convolution layer, which outputs F_lms^3; F_lms^3 is input into the fourth downsampling convolution layer, which outputs F_lms^4; F_lms^4 is input into the fifth downsampling convolution layer, which outputs F_lms^5.
b-3) The target encoder E_lmt consists of first to fifth fully connected layers. The target key points lm_t are input into the first fully connected layer of E_lmt, which outputs the feature information F_lmt^1; F_lmt^1 is input into the second fully connected layer, which outputs F_lmt^2; F_lmt^2 is input into the third fully connected layer, which outputs F_lmt^3; F_lmt^3 is input into the fourth fully connected layer, which outputs F_lmt^4; F_lmt^4 is input into the fifth fully connected layer, which outputs F_lmt^5.
b-4) The torch.cat() function stacks F_lms^5 and F_lmt^5 into the feature vector F_lm.
b-5) The key point generator G_lm consists of first to fifth upsampling convolution layers. The feature vector F_lm is input into the first upsampling convolution layer of G_lm, which outputs the feature key points lm_fake^1; lm_fake^1 is input into the second upsampling convolution layer, which outputs lm_fake^2; lm_fake^2 is input into the third upsampling convolution layer, which outputs lm_fake^3; lm_fake^3 is input into the fourth upsampling convolution layer, which outputs lm_fake^4; lm_fake^4 is input into the fifth upsampling convolution layer, which outputs the feature key points lm_fake, whose dimension is 1 × 212.
b-6) The similarity discriminator D_S consists of a Layer_s module, a Layer_fake module and a Layer_c module. The Layer_fake module consists of first to fourth fully connected layers: the feature key points lm_fake are input into the first fully connected layer of the Layer_fake module, which outputs the feature information F_fake^1; F_fake^1 is input into the second fully connected layer, which outputs F_fake^2; F_fake^2 is input into the third fully connected layer, which outputs F_fake^3; F_fake^3 is input into the fourth fully connected layer, which outputs F_fake^4. The Layer_s module consists of first to fourth fully connected layers: the source key points lm_s are input into the first fully connected layer of the Layer_s module, which outputs the feature information F_s^1; F_s^1 is input into the second fully connected layer, which outputs F_s^2; F_s^2 is input into the third fully connected layer, which outputs F_s^3; F_s^3 is input into the fourth fully connected layer, which outputs F_s^4. The torch.cat() function stacks F_fake^4 and F_s^4 into the feature vector F_c. The Layer_c module consists of first to fourth fully connected layers: the feature vector F_c is input into the first fully connected layer of the Layer_c module, which outputs the similarity feature Fscore1; Fscore1 is input into the second fully connected layer, which outputs the similarity feature Fscore2; Fscore2 is input into the third fully connected layer, which outputs the similarity feature Fscore3; Fscore3 is input into the fourth fully connected layer, which outputs the similarity score.
b-7) The true/false discriminator D_TF consists of first to sixth fully connected layers. The feature key points lm_fake are input into the first fully connected layer of D_TF, which outputs the feature F_TF^1; F_TF^1 is input into the second fully connected layer, which outputs F_TF^2; F_TF^2 is input into the third fully connected layer, which outputs F_TF^3; F_TF^3 is input into the fourth fully connected layer, which outputs F_TF^4; F_TF^4 is input into the fifth fully connected layer, which outputs F_TF^5; F_TF^5 is input into the sixth fully connected layer, which outputs the 1-channel value d_TF.
b-8) The point-by-point loss is calculated by the formula loss_L1 = ||lm_fake − lm_s||_2, where ||·||_2 is the mean squared error; the reconstruction loss is calculated by the formula loss_Cycle = ||lm_fake − lm_t||_2; the true/false loss loss_DTF is calculated from the output d_TF of the true/false discriminator, and the similarity loss loss_DS is calculated from the similarity score of the similarity discriminator. The feature key points lm_fake are iteratively optimized through back propagation using the point-by-point loss loss_L1, the reconstruction loss loss_Cycle, the true/false loss loss_DTF and the similarity loss loss_DS.
Example 3:
in the step b-2), the convolution kernels of the first downsampling convolution layer, the second downsampling convolution layer, the third downsampling convolution layer, the fourth downsampling convolution layer and the fifth downsampling convolution layer are all 1, the step sizes are all 1, and the filling is all 0; in the step b-5), the convolution kernels of the first up-sampling convolution layer, the second up-sampling convolution layer, the third up-sampling convolution layer, the fourth up-sampling convolution layer and the fifth up-sampling convolution layer are all 1, the step length is all 1, and the filling is all 0.
Example 4:
step c) comprises the steps of:
c-1) A face image feature extraction network is established, consisting of the identity encoder E_id and the attribute encoder E_attr. c-2) The identity encoder E_id consists of the Arcface algorithm. The source image Pic_s is input into E_id and resized to 112 × 112 resolution through the interpolate() function; the 112 × 112 image is input into the Arcface algorithm, which outputs the identity vector z_id of shape b × c × h × w, where b is the training batch, c is the number of channels, h is the image height and w is the image width; z_id is input into the filling layer and the regularization layer in sequence, and the identity feature F_id is output.
c-3) The attribute encoder E_attr consists of first to fifth downsampling residual blocks and first and second bottleneck residual blocks. The first to fifth downsampling residual blocks each consist, in sequence, of a first normalization layer, a second normalization layer, a first ReLU activation layer, a second ReLU activation layer, a first convolution layer, a second convolution layer, a downsampling layer and a residual connection layer; the first and second bottleneck residual blocks each consist, in sequence, of a first normalization layer, a second normalization layer, a first ReLU activation layer, a second ReLU activation layer, a first convolution layer, a second convolution layer and a residual connection layer. The target image Pic_t is input into the first downsampling residual block of E_attr, which outputs the attribute features F_attr^1; F_attr^1 is input into the second downsampling residual block, which outputs F_attr^2; F_attr^2 is input into the third downsampling residual block, which outputs F_attr^3; F_attr^3 is input into the fourth downsampling residual block, which outputs F_attr^4; F_attr^4 is input into the fifth downsampling residual block, which outputs F_attr^5; F_attr^5 is input into the first bottleneck residual block, which outputs F_attr^6; F_attr^6 is input into the second bottleneck residual block, which outputs the attribute features F_attr.
Example 5:
in the step c-3), the first normalization layer and the second normalization layer in the first downsampling residual block, the second downsampling residual block, the third downsampling residual block, the fourth downsampling residual block and the fifth downsampling residual block are all of BatchNorm2d; in the step c-3), convolution kernels of a first convolution layer and a second convolution layer in the first downsampling residual block, the second downsampling residual block, the third downsampling residual block, the fourth downsampling residual block and the fifth downsampling residual block are 3, and filling and step sizes are 1.
Example 6:
step d) comprises the steps of:
d-1) A background feature extraction network is established, consisting of a face parsing module and the background information encoder E_bg;
d-2) The face parsing module consists of the face parsing algorithm BiSeNet. The target image Pic_t is input into the face parsing module and parsed into the individual face parts, and each face part is filled with color to obtain the image Pic_bg in which only the background region is retained;
d-3) The background information encoder E_bg consists of first to fifth self-attention modules, each consisting in sequence of a downsampling convolution layer, a self-attention layer and a ReLU activation layer. The image Pic_bg is input into the first self-attention module of E_bg, which outputs the background features F_bg^1; F_bg^1 is input into the second self-attention module, which outputs F_bg^2; F_bg^2 is input into the third self-attention module, which outputs F_bg^3; F_bg^3 is input into the fourth self-attention module, which outputs F_bg^4; F_bg^4 is input into the fifth self-attention module, which outputs the background features F_bg.
Example 7:
in the step d-3), the convolution kernels of the downsampled convolution layers of the first self-attention module, the second self-attention module, the third self-attention module, the fourth self-attention module and the fifth self-attention module are all 3, the step sizes are all 0, and the filling is all 0.
Example 8:
step e) comprises the steps of:
e-1) establishing a generating network formed by a fusion module, an up-sampling module and a discriminator module;
e-2) The fusion module consists of first to sixth fusion blocks, each consisting in sequence of a first convolution layer, a first adaptive instance normalization layer, a ReLU activation layer, a second convolution layer and a second adaptive instance normalization layer. The attribute feature F_attr is input into the fusion module; the first convolution layer of the first fusion block gives the attribute feature F_attr^(1,1). The identity feature F_id and the attribute feature F_attr^(1,1) are input into the first adaptive instance normalization layer, and the fusion feature F_fuse^(1,1) is calculated by the formula
F_fuse^(1,1) = σ_id × (F_attr^(1,1) − μ(F_attr^(1,1))) / σ(F_attr^(1,1)) + μ_id,
where σ_id is the standard deviation of the identity feature F_id, μ_id is the channel mean of F_id, μ(·) is the channel mean operation, and σ(·) is the standard deviation operation. The fusion feature F_fuse^(1,1) is input into the ReLU activation layer to obtain the feature F_relu^1; F_relu^1 is input into the second convolution layer to give the attribute feature F_attr^(1,2); the identity feature F_id and F_attr^(1,2) are input into the second adaptive instance normalization layer, and the fusion feature F_fuse^1 is calculated by the same formula;
e-3) The fusion feature F_fuse^1 is input into the second fusion block of the fusion module; its first convolution layer gives the attribute feature F_attr^(2,1); the identity feature F_id and F_attr^(2,1) are input into the first adaptive instance normalization layer, and the fusion feature F_fuse^(2,1) is calculated by the formula in step e-2); F_fuse^(2,1) passes through the ReLU activation layer and the second convolution layer to give the attribute feature F_attr^(2,2); F_id and F_attr^(2,2) are input into the second adaptive instance normalization layer, and the fusion feature F_fuse^2 is calculated by the same formula;
e-4) In the same way, the fusion feature F_fuse^2 is input into the third fusion block, which outputs the fusion feature F_fuse^3;
e-5) The fusion feature F_fuse^3 is input into the fourth fusion block, which outputs the fusion feature F_fuse^4;
e-6) The fusion feature F_fuse^4 is input into the fifth fusion block, which outputs the fusion feature F_fuse^5;
e-7) The fusion feature F_fuse^5 is input into the sixth fusion block, which outputs the fusion feature F_fuse^6;
e-8) The optimized feature key points lm_fake are input into two separate convolution layers to obtain the feature F_gamma and the feature F_beta, and the fusion vector F_fuse is calculated by the formula F_fuse = F_gamma ⊗ F_fuse^6 + F_beta, where ⊗ is element-wise multiplication and F_fuse^6 is the output of the sixth fusion block;
e-9) The upsampling module consists of first to fifth upsampling layers. A background feature from the background information encoder and the fusion vector F_fuse are input into the first upsampling layer of the upsampling module, which outputs the feature F_up^1; each subsequent upsampling layer receives the previous output together with the background feature of the corresponding scale, so that the second, third and fourth upsampling layers output the features F_up^2, F_up^3 and F_up^4; finally, F_up^4 and the background feature F_bg are input together into the fifth upsampling layer, which outputs the face image Pic_fake.
e-10) The discriminator module consists of first to sixth downsampling convolution layers and a Sigmoid function layer. The face image Pic_fake is input into the first downsampling convolution layer, which outputs the feature F_d^1; F_d^1 is input into the second downsampling convolution layer, which outputs F_d^2; F_d^2 is input into the third downsampling convolution layer, which outputs F_d^3; F_d^3 is input into the fourth downsampling convolution layer, which outputs F_d^4; F_d^4 is input into the fifth downsampling convolution layer, which outputs F_d^5; F_d^5 is input into the sixth downsampling convolution layer, which outputs F_d^6; F_d^6 is input into the Sigmoid function layer, which outputs the value d_fake. The target image Pic_t is passed through the same six downsampling convolution layers and the Sigmoid function layer in the same way, which outputs the value d_real.
e-11) The identity loss l1 is calculated from the distance between the identity feature of the face image Pic_fake and the identity feature F_id of the source image; the reconstruction loss is calculated by the formula l2 = ||Pic_fake − Pic_t||_2; the attribute loss l3 is calculated from the distance between the attribute features of Pic_fake and those of the target image; the face image Pic_fake is iteratively optimized through back propagation using the identity loss l1, the reconstruction loss l2 and the attribute loss l3.
Example 9:
in the step e-2), the convolution kernels of the first convolution layer and the second convolution layer of the first fusion block, the second fusion block, the third fusion block, the fourth fusion block, the fifth fusion block and the sixth fusion block are 3, the step sizes are 1, and the filling is 0; in the step e-8), the convolution kernels of the two convolution layers are 1, the step sizes are 1, and the filling is 0; in the step e-9), the convolution kernels of the first upsampling layer, the second upsampling layer, the third upsampling layer and the fourth upsampling layer are all 3, the step sizes are all 1, the filling is all 1, the convolution kernel of the fifth upsampling layer is 7, the step size is 1, and the filling is 0; e-10), the convolution kernels of the first downsampling convolution layer, the second downsampling convolution layer and the third downsampling convolution layer are 4*4, the step sizes are 2, the filling is 1, the convolution kernels of the fourth downsampling convolution layer, the fifth downsampling convolution layer and the sixth downsampling convolution layer are 4*4, the step sizes are 1, and the filling is 1.
Finally, it should be noted that the foregoing describes only preferred embodiments of the present invention and is not intended to limit it. Although the present invention has been described in detail with reference to the foregoing embodiments, those skilled in the art may still modify the technical solutions described therein or replace some of their technical features with equivalents. Any modification, equivalent replacement or improvement made within the spirit and principle of the present invention shall be included in the protection scope of the present invention.

Claims (8)

1. The face image identity synthesis method based on semantic guidance is characterized by comprising the following steps of:
a) Extracting key points of face images from all face images in the CelebA face image data set;
b) Establishing a PET key point adjustment network, inputting the key points of the face image into the PET key point adjustment network to obtain feature key points lm_fake, and iterating on the feature key points lm_fake to obtain optimized feature key points lm_fake;
c) Establishing a face image feature extraction network, inputting the source image Pic_s and the target image Pic_t from the CelebA face image dataset into the face image feature extraction network, and outputting the identity feature F_id and the attribute feature F_attr respectively;
d) Establishing a background feature extraction network, and inputting the target image Pic_t into the background feature extraction network to obtain the background feature information F_bg;
e) Establishing a generating network, inputting the identity feature F_id, the attribute feature F_attr, the background feature information F_bg and the optimized feature key points lm_fake into the generating network to obtain the face image Pic_fake, and iterating on the image Pic_fake to obtain an optimized face image Pic_fake;
f) Repeating steps b) to e) to obtain a realistic face image Pic_fake with a changed facial contour.
Step b) comprises the steps of:
b-1) Establishing a PET key point adjustment network composed of a source encoder E_lms, a target encoder E_lmt, a key point generator G_lm, a similarity discriminator D_S and a true/false discriminator D_TF;
b-2) The source encoder E_lms consists of a first downsampling convolution layer, a second downsampling convolution layer, a third downsampling convolution layer, a fourth downsampling convolution layer and a fifth downsampling convolution layer. The source key points lm_s are input into the first downsampling convolution layer of the source encoder E_lms and passed successively through the second, third, fourth and fifth downsampling convolution layers, each layer outputting the feature information consumed by the next; the fifth layer outputs the feature information of the source key points;
b-3) The target encoder E_lmt consists of a first, a second, a third, a fourth and a fifth fully-connected layer. The target key points lm_t are input into the first fully-connected layer of the target encoder E_lmt and passed successively through the second, third, fourth and fifth fully-connected layers, each layer outputting the feature information consumed by the next; the fifth layer outputs the feature information of the target key points;
b-4) The feature information output by the source encoder E_lms and the feature information output by the target encoder E_lmt are stacked with the torch.cat() function to obtain a feature vector;
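A minimal sketch of steps b-2) to b-4) follows. Claim 3 fixes kernel 1, stride 1, padding 0 for the encoder convolutions; the channel widths, the 68-point (x, y) key point layout and the pooling of the convolutional features to a vector are assumptions:

import torch
import torch.nn as nn

# Source encoder: five downsampling convolution layers (claim 3: kernel 1,
# stride 1, padding 0); channel widths here are hypothetical.
e_lms = nn.Sequential(
    nn.Conv1d(2, 32, 1, 1, 0), nn.Conv1d(32, 64, 1, 1, 0),
    nn.Conv1d(64, 128, 1, 1, 0), nn.Conv1d(128, 256, 1, 1, 0),
    nn.Conv1d(256, 512, 1, 1, 0))

# Target encoder: five fully-connected layers over flattened key points.
e_lmt = nn.Sequential(
    nn.Linear(68 * 2, 256), nn.Linear(256, 256), nn.Linear(256, 256),
    nn.Linear(256, 256), nn.Linear(256, 512))

lm_s = torch.randn(1, 2, 68)      # assumed: 68 (x, y) source key points
lm_t = torch.randn(1, 68 * 2)     # assumed: flattened target key points
f_s = e_lms(lm_s).mean(dim=2)     # pooled to a vector (an assumption)
f_t = e_lmt(lm_t)
f = torch.cat([f_s, f_t], dim=1)  # step b-4): stack the two features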
b-5) The key point generator G_lm consists of a first upsampling convolution layer, a second upsampling convolution layer, a third upsampling convolution layer, a fourth upsampling convolution layer and a fifth upsampling convolution layer. The feature vector is input into the first upsampling convolution layer of the key point generator G_lm and passed successively through the second, third, fourth and fifth upsampling convolution layers, each layer outputting the feature key points consumed by the next; the fifth layer outputs the feature key points lm_fake;
b-6) The similarity discriminator D_S consists of a Layer_s module, a Layer_fake module and a Layer_c module, each composed of a first, a second, a third and a fourth fully-connected layer. The feature key points lm_fake are passed successively through the four fully-connected layers of the Layer_fake module, and the source key points lm_s are passed successively through the four fully-connected layers of the Layer_s module; the two resulting pieces of feature information are stacked with the torch.cat() function to obtain a feature vector. This feature vector is input into the first fully-connected layer of the Layer_c module, which outputs the similarity feature Fscore1; Fscore1 is input into the second fully-connected layer, which outputs the similarity feature Fscore2; Fscore2 is input into the third fully-connected layer, which outputs the similarity feature Fscore3; Fscore3 is input into the fourth fully-connected layer of the Layer_c module, which outputs the similarity score;
b-7) The true/false discriminator D_TF consists of a first, a second, a third, a fourth, a fifth and a sixth fully-connected layer. The feature key points lm_fake are input into the first fully-connected layer of the true/false discriminator D_TF and passed successively through the second to sixth fully-connected layers; the sixth fully-connected layer outputs a single-channel value;
b-8) The point-by-point loss is calculated by the formula loss_L1 = ||lm_fake - lm_s||_2, where ||·||_2 is the mean squared error; the reconstruction loss is calculated by the formula loss_Cycle = ||lm_fake - lm_t||_2; the true/false loss loss_DTF and the similarity loss loss_DS are calculated by their respective formulas. The feature key points lm_fake are iteratively optimized by back-propagation using the point-by-point loss loss_L1, the reconstruction loss loss_Cycle, the true/false loss loss_DTF and the similarity loss loss_DS.
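A sketch of the b-8) losses follows. The point-by-point and reconstruction losses match the formulas above; the true/false and similarity loss formulas are images in the source, so standard logistic GAN terms over the discriminator outputs are assumed:

import torch
import torch.nn.functional as F

def pet_losses(lm_fake, lm_s, lm_t, d_tf_out, d_s_out):
    # loss_L1: point-by-point loss ||lm_fake - lm_s||_2 (mean squared error).
    loss_l1 = F.mse_loss(lm_fake, lm_s)
    # loss_Cycle: reconstruction loss ||lm_fake - lm_t||_2.
    loss_cycle = F.mse_loss(lm_fake, lm_t)
    # loss_DTF / loss_DS: the source formulas are images; standard logistic
    # GAN terms on the discriminator logits are assumed here.
    loss_dtf = F.binary_cross_entropy_with_logits(d_tf_out, torch.ones_like(d_tf_out))
    loss_ds = F.binary_cross_entropy_with_logits(d_s_out, torch.ones_like(d_s_out))
    return loss_l1 + loss_cycle + loss_dtf + loss_ds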
Step c) comprises the steps of:
c-1) Establishing a face image feature extraction network composed of the identity encoder E_id and the attribute encoder E_attr;
c-2) The identity encoder E_id consists of the Arcface algorithm. The source image Pic_s is input into the identity encoder E_id and resized to 112×112 resolution by the interpolate() function; the 112×112 image is input into the Arcface algorithm, which outputs an identity vector of shape (b, c, h, w), where b is the training batch, c is the number of channels, h is the image height and w is the image width. The identity vector is input successively into a padding layer and a normalization layer, which outputs the identity feature F_id;
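A sketch of step c-2) follows, with arcface standing in for a pretrained ArcFace backbone (a hypothetical handle, not specified by the patent); the padding and normalization layers are assumed to be zero-padding and L2 normalization:

import torch
import torch.nn.functional as F

def extract_identity(pic_s, arcface):
    # pic_s: (b, 3, H, W) source image; arcface: hypothetical handle to a
    # pretrained ArcFace backbone returning a (b, c, h, w) identity vector.
    x = F.interpolate(pic_s, size=(112, 112), mode="bilinear", align_corners=False)
    ident = arcface(x)
    ident = F.pad(ident, (1, 1, 1, 1))     # assumed padding layer
    return F.normalize(ident, p=2, dim=1)  # assumed normalization layer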
c-3) The attribute encoder E_attr consists of a first, a second, a third, a fourth and a fifth downsampling residual block followed by a first and a second bottleneck residual block. Each downsampling residual block consists, in sequence, of a first normalization layer, a second normalization layer, a first ReLU activation layer, a second ReLU activation layer, a first convolution layer, a second convolution layer, a downsampling layer and a residual connection layer; each bottleneck residual block consists, in sequence, of a first normalization layer, a second normalization layer, a first ReLU activation layer, a second ReLU activation layer, a first convolution layer, a second convolution layer and a residual connection layer. The target image Pic_t is input into the first downsampling residual block of the attribute encoder E_attr and passed successively through the second to fifth downsampling residual blocks and the first bottleneck residual block, each block outputting the attribute feature consumed by the next; the second bottleneck residual block outputs the attribute feature F_attr.
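A sketch of one downsampling residual block from c-3) follows, using claim 4's hyperparameters (BatchNorm2d, 3×3 convolutions, stride and padding 1). The exact wiring of the listed layers is not spelled out, so a pre-activation norm-ReLU-conv ordering, average-pool downsampling and a 1×1 skip projection are assumed:

import torch
import torch.nn as nn

class DownResBlock(nn.Module):
    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.norm1 = nn.BatchNorm2d(in_ch)       # first normalization layer
        self.norm2 = nn.BatchNorm2d(out_ch)      # second normalization layer
        self.relu = nn.ReLU(inplace=True)
        self.conv1 = nn.Conv2d(in_ch, out_ch, 3, stride=1, padding=1)
        self.conv2 = nn.Conv2d(out_ch, out_ch, 3, stride=1, padding=1)
        self.down = nn.AvgPool2d(2)               # downsampling layer (assumed)
        self.skip = nn.Conv2d(in_ch, out_ch, 1)  # residual connection layer

    def forward(self, x):
        h = self.conv1(self.relu(self.norm1(x)))
        h = self.conv2(self.relu(self.norm2(h)))
        return self.down(h) + self.down(self.skip(x))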
2. The semantic guidance-based face image identity synthesis method according to claim 1, wherein the step a) comprises the steps of:
a-1) detecting the key points of all face images in the CelebA face image dataset with the face key point detection algorithm H3R; the key points extracted from the source image Pic_s in the CelebA face image dataset are denoted as the source key points lm_s, and the key points extracted from the target image Pic_t in the CelebA face image dataset are denoted as the target key points lm_t.
3. The semantic guidance-based face image identity synthesis method according to claim 1, characterized in that: in step b-2), the convolution kernels of the first to fifth downsampling convolution layers are all 1, the strides are all 1, and the padding is all 0; in step b-5), the convolution kernels of the first to fifth upsampling convolution layers are all 1, the strides are all 1, and the padding is all 0.
4. The semantic guidance-based face image identity synthesis method according to claim 1, characterized in that: in step c-3), the first and second normalization layers in the first to fifth downsampling residual blocks are all BatchNorm2d layers, and the convolution kernels of the first and second convolution layers in the first to fifth downsampling residual blocks are all 3, with the padding and strides all 1.
5. The semantic guidance-based face image identity synthesis method according to claim 1, wherein the step d) comprises the steps of:
d-1) Establishing a background feature extraction network composed of a face parsing module and a background information encoder E_bg;
d-2) The face parsing module consists of the face parsing algorithm BiSeNet. The target image Pic_t is input into the face parsing module, which parses out each part of the face; each facial part is then filled with color so that only the background area is retained, giving the image Pic_bg;
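A sketch of step d-2) follows, with bisenet standing in for a pretrained BiSeNet parsing network (a hypothetical handle) and the convention that parsing label 0 marks the background; both are assumptions:

import torch

def keep_background(pic_t, bisenet, fill=0.0):
    # pic_t: (b, 3, H, W) target image; bisenet: hypothetical handle to a
    # pretrained BiSeNet returning per-pixel class scores (b, K, H, W).
    parsing = bisenet(pic_t).argmax(dim=1, keepdim=True)
    bg_mask = (parsing == 0).float()  # assumed: label 0 == background
    # Fill every facial region with a constant color, keep the background.
    return pic_t * bg_mask + fill * (1.0 - bg_mask)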
d-3) The background information encoder E_bg consists of a first, a second, a third, a fourth and a fifth self-attention module, each composed, in sequence, of a downsampling convolution layer, a self-attention layer and a ReLU activation layer. The image Pic_bg is input into the first self-attention module of the background information encoder E_bg and passed successively through the second, third and fourth self-attention modules, each module outputting the background feature consumed by the next; the fifth self-attention module outputs the background feature F_bg.
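A sketch of one self-attention module from d-3) follows. Claim 6 fixes kernel 3 for the downsampling convolution; the stride of 2, the non-local form of the self-attention layer and the residual connection inside it are assumptions:

import torch
import torch.nn as nn

class SelfAttention2d(nn.Module):
    # Non-local style self-attention over spatial positions (assumed form).
    def __init__(self, ch):
        super().__init__()
        self.q = nn.Conv2d(ch, ch // 8, 1)
        self.k = nn.Conv2d(ch, ch // 8, 1)
        self.v = nn.Conv2d(ch, ch, 1)

    def forward(self, x):
        b, c, h, w = x.shape
        q = self.q(x).flatten(2).transpose(1, 2)  # (b, hw, c/8)
        k = self.k(x).flatten(2)                  # (b, c/8, hw)
        v = self.v(x).flatten(2)                  # (b, c, hw)
        attn = torch.softmax(q @ k, dim=-1)       # (b, hw, hw)
        out = (v @ attn.transpose(1, 2)).view(b, c, h, w)
        return x + out                            # residual (assumed)

class SABlock(nn.Module):
    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.down = nn.Conv2d(in_ch, out_ch, 3, stride=2, padding=1)  # stride assumed
        self.attn = SelfAttention2d(out_ch)
        self.act = nn.ReLU(inplace=True)

    def forward(self, x):
        return self.act(self.attn(self.down(x)))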
6. The semantic guidance-based face image identity synthesis method according to claim 5, characterized in that: in step d-3), the convolution kernels of the downsampling convolution layers of the first to fifth self-attention modules are all 3, the strides are all 0, and the padding is all 0.
7. The semantic guidance-based face image identity synthesis method according to claim 1, wherein the step e) comprises the steps of:
e-1) establishing a generating network formed by a fusion module, an up-sampling module and a discriminator module;
e-2) The fusion module consists of a first fusion block, a second fusion block, a third fusion block, a fourth fusion block, a fifth fusion block and a sixth fusion block, each composed, in sequence, of a first convolution layer, a first adaptive instance normalization layer, a ReLU activation layer, a second convolution layer and a second adaptive instance normalization layer. The attribute feature F_attr is input into the fusion module and passed through the first convolution layer of the first fusion block to obtain an attribute feature; the identity feature F_id and this attribute feature are input into the first adaptive instance normalization layer, and the fusion feature is calculated by the formula F = σ_id · (F_a − μ(F_a)) / σ(F_a) + μ_id, where F_a denotes the attribute feature, σ_id and μ_id are the standard deviation and channel mean of the identity feature F_id, μ(·) is the channel mean operation and σ(·) is the standard deviation operation. The fusion feature is input into the ReLU activation layer, the result is passed through the second convolution layer to obtain an attribute feature, and the identity feature F_id and this attribute feature are input into the second adaptive instance normalization layer, where the same formula is applied to obtain the fusion feature of the first fusion block;
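A sketch of the adaptive instance normalization and one fusion block from e-2) follows, implementing the formula above with channel statistics taken over the spatial dimensions; it assumes the identity feature is a feature map whose channel width matches the attribute feature. Kernel 3, stride 1, padding 0 follow Example 9 / claim 8:

import torch
import torch.nn as nn

def adain(f_attr, f_id, eps=1e-5):
    # F = sigma_id * (F_attr - mu(F_attr)) / sigma(F_attr) + mu_id,
    # with channel-wise statistics over the spatial dimensions.
    mu_a = f_attr.mean(dim=(2, 3), keepdim=True)
    sd_a = f_attr.std(dim=(2, 3), keepdim=True) + eps
    mu_i = f_id.mean(dim=(2, 3), keepdim=True)
    sd_i = f_id.std(dim=(2, 3), keepdim=True)
    return sd_i * (f_attr - mu_a) / sd_a + mu_i

class FusionBlock(nn.Module):
    # conv -> AdaIN -> ReLU -> conv -> AdaIN, per step e-2).
    def __init__(self, ch):
        super().__init__()
        self.conv1 = nn.Conv2d(ch, ch, 3, stride=1, padding=0)
        self.conv2 = nn.Conv2d(ch, ch, 3, stride=1, padding=0)
        self.act = nn.ReLU(inplace=True)

    def forward(self, f, f_id):
        h = adain(self.conv1(f), f_id)
        h = self.conv2(self.act(h))
        return adain(h, f_id)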
e-3) The fusion feature output by the first fusion block is input into the second fusion block, where the same computation as in step e-2) is carried out with the identity feature F_id (first convolution layer, first adaptive instance normalization layer, ReLU activation layer, second convolution layer, second adaptive instance normalization layer), yielding the fusion feature of the second fusion block;
e-4) the fusion feature output by the second fusion block is processed by the third fusion block in the same way, yielding the fusion feature of the third fusion block;
e-5) the fusion feature output by the third fusion block is processed by the fourth fusion block in the same way, yielding the fusion feature of the fourth fusion block;
e-6) the fusion feature output by the fourth fusion block is processed by the fifth fusion block in the same way, yielding the fusion feature of the fifth fusion block;
e-7) the fusion feature output by the fifth fusion block is processed by the sixth fusion block in the same way, yielding the fusion feature of the sixth fusion block;
e-8) The optimized feature key points lm_fake are input into two convolution layers respectively to obtain the feature F_gamma and the feature F_beta, and the fusion vector F_fuse is calculated from F_gamma, F_beta and the fusion feature of the sixth fusion block;
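The e-8) formula is an image in the source; given the gamma/beta naming, a SPADE-style affine modulation F_fuse = F_gamma · F + F_beta over the sixth fusion feature is a plausible reading and is sketched below under that assumption, with 1×1 convolutions per claim 8 and a rasterized key point heat map as input (also an assumption):

import torch
import torch.nn as nn

conv_gamma = nn.Conv2d(1, 256, 1, 1, 0)  # kernel 1, stride 1, padding 0 (claim 8)
conv_beta = nn.Conv2d(1, 256, 1, 1, 0)   # channel width 256 is hypothetical

def keypoint_modulate(lm_map, f_fuse6):
    # lm_map: optimized key points rasterized to a (b, 1, H, W) heat map
    # (assumed representation); f_fuse6: (b, 256, H, W) sixth fusion feature.
    f_gamma = conv_gamma(lm_map)
    f_beta = conv_beta(lm_map)
    return f_gamma * f_fuse6 + f_beta  # assumed SPADE-style formula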
e-9) The upsampling module consists of a first upsampling layer, a second upsampling layer, a third upsampling layer, a fourth upsampling layer and a fifth upsampling layer. The background feature from the background information encoder and the fusion vector F_fuse are input into the first upsampling layer of the upsampling module, which outputs a feature; this feature and the corresponding background feature are input together into the second upsampling layer, which outputs a feature; this feature and the corresponding background feature are input together into the third upsampling layer, which outputs a feature; this feature and the corresponding background feature are input together into the fourth upsampling layer, which outputs a feature; finally, this feature and the background feature F_bg are input together into the fifth upsampling layer, which outputs the face image Pic_fake;
e-10) The discriminator module consists of a first downsampling convolution layer, a second downsampling convolution layer, a third downsampling convolution layer, a fourth downsampling convolution layer, a fifth downsampling convolution layer, a sixth downsampling convolution layer and a Sigmoid function layer. The face image Pic_fake is input into the first downsampling convolution layer and passed successively through the second to sixth downsampling convolution layers; the resulting feature is input into the Sigmoid function layer, which outputs a realness value for Pic_fake. The target image Pic_t is passed through the same six downsampling convolution layers and the Sigmoid function layer, which outputs a realness value for Pic_t;
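A sketch of the e-10) discriminator follows, using claim 8's hyperparameters (4×4 kernels, stride 2 for the first three layers, stride 1 for the last three, padding 1 throughout); the channel widths and LeakyReLU activations are assumptions:

import torch
import torch.nn as nn

def d_layer(i, o, s):
    # 4x4 kernel, padding 1; stride per claim 8.
    return nn.Conv2d(i, o, kernel_size=4, stride=s, padding=1)

discriminator = nn.Sequential(
    d_layer(3, 64, 2), nn.LeakyReLU(0.2),   # activations are assumptions
    d_layer(64, 128, 2), nn.LeakyReLU(0.2),
    d_layer(128, 256, 2), nn.LeakyReLU(0.2),
    d_layer(256, 256, 1), nn.LeakyReLU(0.2),
    d_layer(256, 256, 1), nn.LeakyReLU(0.2),
    d_layer(256, 1, 1),
    nn.Sigmoid())                            # Sigmoid function layer

score = discriminator(torch.randn(1, 3, 256, 256))  # realness map for an image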
e-11) The identity loss l1, the reconstruction loss l2 = ||Pic_fake - Pic_t||_2 and the attribute loss l3 are calculated by their respective formulas, and the face image Pic_fake is iteratively optimized by back-propagation using the identity loss l1, the reconstruction loss l2 and the attribute loss l3.
8. The semantic guidance-based face image identity synthesis method according to claim 7, characterized in that: in step e-2), the convolution kernels of the first and second convolution layers of the first to sixth fusion blocks are all 3, the strides are all 1, and the padding is all 0; in step e-8), the convolution kernels of the two convolution layers are all 1, the strides are all 1, and the padding is all 0; in step e-9), the convolution kernels of the first to fourth upsampling layers are all 3, the strides are all 1 and the padding is all 1, while the convolution kernel of the fifth upsampling layer is 7 with stride 1 and padding 0; in step e-10), the convolution kernels of the first to third downsampling convolution layers are all 4×4 with stride 2 and padding 1, and the convolution kernels of the fourth to sixth downsampling convolution layers are all 4×4 with stride 1 and padding 1.
CN202211451581.1A 2022-11-18 2022-11-18 Semantic guidance-based face image identity synthesis method Active CN115713680B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211451581.1A CN115713680B (en) 2022-11-18 2022-11-18 Semantic guidance-based face image identity synthesis method


Publications (2)

Publication Number Publication Date
CN115713680A (en) 2023-02-24
CN115713680B (en) 2023-07-25

Family

ID=85233817





Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113112411A * 2020-01-13 2021-07-13 Nanjing University of Information Science and Technology Face image semantic restoration method based on multi-scale feature fusion
WO2022151535A1 * 2021-01-15 2022-07-21 Soochow University Deep learning-based face feature point detection method
CN112766160A * 2021-01-20 2021-05-07 Xidian University Face replacement method based on multi-stage attribute encoder and attention mechanism



Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant