CN117391929A - High-definition face changing method based on a stylized generative adversarial network

High-definition face changing method based on a stylized generative adversarial network

Info

Publication number
CN117391929A
CN117391929A (application number CN202311280255.3A)
Authority
CN
China
Prior art keywords
face
image
loss
target
changing
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202311280255.3A
Other languages
Chinese (zh)
Inventor
黄东晋
刘传蔓
刘金华
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
University of Shanghai for Science and Technology
Original Assignee
University of Shanghai for Science and Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by University of Shanghai for Science and Technology filed Critical University of Shanghai for Science and Technology
Priority to CN202311280255.3A
Publication of CN117391929A
Legal status: Pending

Classifications

    • G06V 40/168 Human faces: feature extraction; face representation
    • G06N 3/0455 Neural network architectures: auto-encoder networks; encoder-decoder networks
    • G06N 3/0464 Neural network architectures: convolutional networks [CNN, ConvNet]
    • G06N 3/0475 Neural network architectures: generative networks
    • G06N 3/094 Neural network learning methods: adversarial learning
    • G06V 10/26 Image preprocessing: segmentation of patterns in the image field; cutting or merging of image elements to establish the pattern region; detection of occlusion
    • G06V 10/82 Image or video recognition or understanding using neural networks
    • G06V 40/172 Human faces: classification, e.g. identification
    • G06T 2207/20081 Image analysis indexing scheme: training; learning
    • G06T 2207/20084 Image analysis indexing scheme: artificial neural networks [ANN]
    • G06T 2207/30201 Image analysis indexing scheme: subject of image - human face
    • Y02T 10/40 Climate change mitigation in transportation: engine management systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Computing Systems (AREA)
  • Artificial Intelligence (AREA)
  • Software Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Multimedia (AREA)
  • Mathematical Physics (AREA)
  • General Engineering & Computer Science (AREA)
  • Molecular Biology (AREA)
  • Computational Linguistics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Oral & Maxillofacial Surgery (AREA)
  • Human Computer Interaction (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Medical Informatics (AREA)
  • Databases & Information Systems (AREA)
  • Image Processing (AREA)

Abstract

The invention relates to a high-definition face changing method based on a stylized generative adversarial network, comprising the following steps: acquiring a source face-target face image pair; inputting the image pair into a pre-constructed feature extraction network model to output a predicted face-changing image, and inputting the target face image into a target mask predictor to output a predicted target mask, the feature extraction network model comprising an identity encoder, an attribute encoder, a mapping network and a StyleGAN2 generator; inputting the face-changing image and the target mask into an FS-Net network model to reconstruct the face-changing image; and extracting the background part of the target mask as a background mask and linearly blending it with the reconstructed face-changing image to obtain the high-definition face-changing image. Compared with the prior art, the invention significantly improves the quality of the face-changing image.

Description

High-definition face changing method based on a stylized generative adversarial network
Technical Field
The invention relates to the fields of image processing and face attribute editing, in particular to a high-definition face changing method based on a stylized generative adversarial network.
Background
Face changing aims to transfer the identity of a source face onto a target face while keeping the attribute characteristics of the target face (such as expression, pose and lighting) unchanged; it is a long-standing and challenging problem in visual effects. With the advent of the ultra-high-definition video era, the research value and market value of high-definition face-changing methods have attracted great attention in digital entertainment fields such as film, games and virtual digital humans.
However, current face-changing methods still cannot meet practical industrial requirements. First, mainstream face-changing methods are limited by the information compression of their encoder-decoder structures, making it difficult to generate high-definition face-changing images. Recent studies have achieved high-definition face changing with stylized generative adversarial networks, but these methods do not sufficiently decouple the latent space, so the features of the generated face-changing images remain entangled, and the training process requires significant computational resources. Second, mainstream face-changing methods stitch the generated face and the background with Poisson fusion, which easily produces fusion artifacts. Therefore, how to generate a fully decoupled face-changing image with high visual quality remains a focus of current research.
Disclosure of Invention
The invention aims to provide a high-definition face changing method based on a stylized generative adversarial network that improves the quality of face-changing images.
The aim of the invention can be achieved by the following technical scheme:
a high-definition face changing method based on stylized generation of an countermeasure network comprises the following steps:
acquiring a source face-target face image pair;
inputting the image pairs into a pre-constructed feature extraction network model, outputting a predicted original face-changing image, inputting a target face image into a target mask predictor, and outputting a predicted target mask, wherein the feature extraction network model comprises an identity encoder, an attribute encoder, a mapping network and a StyleGAN2 generator;
inputting the original face-changing image and the target mask into an FS-Net network model to reconstruct the original face-changing image;
and extracting the background part of the target mask as a background mask, and linearly blending it with the reconstructed original face-changing image to obtain a final high-definition face-changing image.
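For illustration only, the four steps above can be summarized in the following minimal sketch; the module names (feature_extractor, mask_predictor, fs_net, stylegan2_generator) and their call signatures are hypothetical placeholders for the components described in this disclosure, not a definitive implementation.

```python
import torch

def high_definition_face_swap(x_source, x_target,
                              feature_extractor, mask_predictor,
                              fs_net, stylegan2_generator):
    """Hypothetical end-to-end sketch of the four steps described above."""
    # Steps 1-2: predict the raw face-changing image and the target mask
    y_raw = feature_extractor(x_source, x_target)       # predicted original face-changing image
    m_target = mask_predictor(x_target)                 # predicted target mask (BiSeNet-based)

    # Step 3: map (image, mask) into the hybrid FS latent space and regenerate
    fs_code = fs_net(y_raw, m_target)
    y_corrected = stylegan2_generator(fs_code)          # pre-corrected face-changing image

    # Step 4: blend the corrected image with the real target background
    m_background = (m_target == 0).float()              # background part of the target mask
    return m_background * x_target + (1 - m_background) * y_corrected
```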
Further, the construction process of the feature extraction network model specifically includes:
randomly generating a plurality of high-definition face images to construct a data set;
preprocessing the high-definition face images in the data set to center and align the faces;
pairing the aligned face images in pairs to obtain source face-target face image pairs;
the source face-target face image pair is input into a feature extraction network for training to construct a final feature extraction network model, wherein an Adam optimization algorithm is adopted for optimization in the training process.
Further, the preprocessing is performed with the MTCNN face detection and alignment method.
Further, the loss function of the feature extraction network model is expressed as:
L_t = λ_1 L_s + λ_2 L_g + λ_3 L_b + λ_4 L_c
where L_t denotes the loss function used to train the feature extraction network model, L_s is the identity loss, L_g is the perceptual loss, L_b is the landmark loss, L_c is the reconstruction loss, and λ_1, λ_2, λ_3 and λ_4 are the weight coefficients of the identity loss, perceptual loss, landmark loss and reconstruction loss, respectively.
Further, the identity loss is expressed as:
L_s = 1 - cos(Ψ_s(Y_{s-t}), Ψ_s(X_s))
where L_s denotes the identity loss, and Ψ_s(Y_{s-t}) and Ψ_s(X_s) denote the feature vectors extracted by the pre-trained face recognition network ArcFace from the predicted original face-changing image and the source image, respectively.
Further, the perceptual loss is expressed as:
L_g = ||Φ_g(Y_{s-t}) - Φ_g(X_t)||_2
where L_g is the perceptual loss, and Φ_g(Y_{s-t}) and Φ_g(X_t) denote the feature vectors extracted by the pre-trained VGG16 model from the predicted original face-changing image and the target image, respectively.
Further, the landmark loss is expressed as:
L_b = ||Φ_b(Y_{s-t}) - Φ_b(X_t)||_2
where L_b is the landmark loss, and Φ_b(Y_{s-t}) and Φ_b(X_t) denote the feature vectors extracted by a pre-trained landmark predictor from the predicted original face-changing image and the target image, respectively.
Further, the reconstruction loss is expressed as:
L_c = ||Y_{s-t} - X_t||_2 if X_s and X_t share the same identity, and L_c = 0 otherwise,
where L_c is the reconstruction loss, and Y_{s-t}, X_s and X_t denote the predicted face-changing image, the source face image and the target face image, respectively.
Further, the specific steps of reconstructing the original face-changing image include:
mapping the original face-changing image and the target mask into a hybrid high-dimensional latent space through the FS-Net network model;
searching the hybrid high-dimensional latent space for a latent code that matches the target mask and is similar to the face-changing image;
inputting the latent code into a StyleGAN2 generator to generate a pre-corrected face-changing image whose attribute characteristics are fully consistent with those of the target face image, completing the reconstruction of the original face-changing image.
Further, the target mask predictor is constructed based on a BiSeNet network.
Compared with the prior art, the invention has the following beneficial effects:
(1) The method constructs a feature extraction network model; in the feature extraction stage, the feature encoders and the StyleGAN2 generator efficiently generate a decoupled predicted original face-changing image. The original face-changing image is then reconstructed, which improves image detail and definition, corrects erroneous face attribute characteristics in the original face-changing image, effectively resolves the image consistency and fusion artifact problems, and improves the quality of the face-changing image.
(2) In training the feature extraction network model, an identity loss, a perceptual loss, a landmark loss and a reconstruction loss are introduced; they keep the identity of the face-changing image consistent with the source image, stabilize the training process, and keep the attribute characteristics of the face-changing image consistent with those of the target image, so that features are extracted more accurately and the accuracy of the face-changing image is improved.
(3) Compared with existing high-definition face-changing methods, the face-changing images generated by the method exhibit a higher degree of identity decoupling and better visual quality, which can effectively promote the high-quality development of digital entertainment.
Drawings
FIG. 1 is a schematic flow chart of the method of the present invention;
FIG. 2 is an overall frame diagram of an embodiment of the present invention;
FIG. 3 is a network architecture diagram of an FS-Net in accordance with an embodiment of the present invention;
FIG. 4 is a diagram of high-definition face-changing results according to an embodiment of the present invention;
FIG. 5 is a comparison of the high-definition face-changing results of the embodiment of the present invention with the results of several existing high-definition face-changing methods.
Detailed Description
The invention will now be described in detail with reference to the drawings and specific examples. The present embodiment is implemented on the premise of the technical scheme of the present invention, and a detailed implementation manner and a specific operation process are given, but the protection scope of the present invention is not limited to the following examples.
The embodiment provides a high-definition face changing method based on a stylized generative adversarial network; as shown in FIG. 1, the method comprises the following steps:
S1, obtaining a source face-target face image pair.
S2, inputting the image pairs into a pre-constructed feature extraction network model, outputting a predicted original face-changing image, inputting a target face image into a target mask predictor, and outputting a predicted target mask, wherein the feature extraction network model comprises an identity encoder, an attribute encoder, a mapping network and a StyleGAN2 generator.
This step requires training the feature extraction network model to achieve a good extraction effect. The specific construction process is as follows:
S2.1, data set creation and preprocessing.
S2.1.1 generating a pre-training model of an countermeasure network by utilizing stylization, and randomly generating 8 ten thousand high-definition face images, wherein the resolution of each image is 1024 multiplied by 1024;
S2.1.2, preprocessing: centering and aligning the faces using the MTCNN (Multi-task Cascaded Convolutional Networks) face detection and alignment method;
S2.1.3, randomly pairing the face images of the data set and labeling each pair as a source face image and a target face image, respectively, to obtain 40,000 source-target face image pairs;
S2.1.4, dividing the data set into a training data set and a test data set at a ratio of 9:1.
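As a reference, steps S2.1.1 to S2.1.4 could be scripted roughly as follows; this is a sketch that assumes a pre-trained StyleGAN2 generator with a z -> image interface and the facenet-pytorch MTCNN aligner, and the helper to_pil as well as the generator call signature are assumptions rather than the exact tools used in this embodiment.

```python
import random
import torch
from facenet_pytorch import MTCNN   # MTCNN detector/aligner (assumed dependency)

mtcnn = MTCNN(image_size=1024, margin=0)          # center and align detected faces

def build_dataset(G, to_pil, num_images=80_000, device="cuda"):
    """Generate faces with a StyleGAN2 generator G, align them, pair them, and split 9:1."""
    faces = []
    for _ in range(num_images):
        z = torch.randn(1, G.z_dim, device=device)
        img = to_pil(G(z, None))                  # synthesize a 1024x1024 face (API assumed)
        aligned = mtcnn(img)                      # centered, aligned face tensor (or None)
        if aligned is not None:
            faces.append(aligned)

    random.shuffle(faces)
    pairs = [(faces[i], faces[i + 1])             # (source, target) pairs, about 40,000 in total
             for i in range(0, len(faces) - 1, 2)]
    split = int(0.9 * len(pairs))                 # 9:1 train/test split
    return pairs[:split], pairs[split:]
```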
S2.2, training a feature extraction network.
In this step, the preprocessed face image pairs are input into the feature extraction network according to their labels to train the feature extraction network model, which specifically comprises the following steps:
S2.2.1, scaling the face images in the training set to 256×256;
S2.2.2, inputting the face images into the feature extraction network for training; the framework of the feature extraction network provided by the invention is shown in FIG. 2.
The feature extraction network proposed in this embodiment includes an identity encoder, an attribute encoder, a mapping network and a StyleGAN2 generator. The identity encoder uses a pre-trained ArcFace model; the attribute encoder is a face attribute feature encoder based on Inception-v3; the mapping network consists of four fully connected layers and an activation layer and is responsible for broadening the latent codes in Z space and mapping them to the W space of the stylized generative adversarial network; the StyleGAN2 generator uses a pre-trained network model. During training of the feature extraction network, the identity encoder and the StyleGAN2 generator are kept fixed, and only the attribute encoder and the mapping network are trained.
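A minimal PyTorch sketch of how these four components could be wired together is given below; the pre-trained ArcFace, Inception-v3 attribute encoder and StyleGAN2 generator are assumed to be loaded elsewhere, and the concatenation of identity and attribute codes, the encoder output dimensions and the assumption that the generator accepts W+ codes directly are illustrative choices, not the exact disclosed design.

```python
import torch
import torch.nn as nn

class MappingNetwork(nn.Module):
    """Four fully connected layers with activations, mapping fused codes to W+ style codes."""
    def __init__(self, in_dim, w_dim=512, num_ws=18):
        super().__init__()
        self.num_ws, self.w_dim = num_ws, w_dim
        self.net = nn.Sequential(
            nn.Linear(in_dim, 1024), nn.LeakyReLU(0.2),
            nn.Linear(1024, 1024), nn.LeakyReLU(0.2),
            nn.Linear(1024, 1024), nn.LeakyReLU(0.2),
            nn.Linear(1024, w_dim * num_ws),
        )

    def forward(self, code):
        return self.net(code).view(-1, self.num_ws, self.w_dim)   # one style code per layer

class FeatureExtractionNet(nn.Module):
    def __init__(self, id_encoder, attr_encoder, generator, id_dim=512, attr_dim=2048):
        super().__init__()
        self.id_encoder = id_encoder        # fixed pre-trained ArcFace model
        self.attr_encoder = attr_encoder    # trainable Inception-v3 based attribute encoder
        self.mapping = MappingNetwork(id_dim + attr_dim)   # trainable mapping network
        self.generator = generator          # fixed pre-trained StyleGAN2 generator
        for p in self.id_encoder.parameters():
            p.requires_grad_(False)
        for p in self.generator.parameters():
            p.requires_grad_(False)

    def forward(self, x_source, x_target):
        id_code = self.id_encoder(x_source)       # identity features of the source face
        attr_code = self.attr_encoder(x_target)   # attribute features of the target face
        w_plus = self.mapping(torch.cat([id_code, attr_code], dim=1))
        return self.generator(w_plus)              # predicted original face-changing image
```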
S2.2.3, the loss function consists of four loss terms: identity loss, perceptual loss, landmark loss and reconstruction loss. The expression of the loss function is as follows:
L_t = λ_1 L_s + λ_2 L_g + λ_3 L_b + λ_4 L_c
where L_t denotes the loss function for training the feature extraction network, L_s is the identity loss, L_g is the perceptual loss, L_b is the landmark loss, L_c is the reconstruction loss, and λ_1, λ_2, λ_3 and λ_4 are the weight coefficients of the identity loss, perceptual loss, landmark loss and reconstruction loss, respectively.
To keep the identity of the original face-changing image consistent with that of the source image, the identity loss L_s is used to constrain the identity consistency of the predicted original face-changing image and the source image. The detailed expression of L_s is:
L_s = 1 - cos(Ψ_s(Y_{s-t}), Ψ_s(X_s))
where Ψ_s(Y_{s-t}) and Ψ_s(X_s) denote the feature vectors extracted by the pre-trained face recognition network ArcFace from the predicted original face-changing image and the source image, respectively.
To stabilize the training process and avoid the mode collapse that the StyleGAN2 network may cause, the perceptual loss L_g is used to constrain the semantic difference between the predicted original face-changing image and the target image. The detailed expression of L_g is:
L_g = ||Φ_g(Y_{s-t}) - Φ_g(X_t)||_2
where Φ_g(Y_{s-t}) and Φ_g(X_t) denote the feature vectors extracted by the pre-trained VGG16 model from the predicted original face-changing image and the target image, respectively.
To keep the attribute characteristics of the predicted face-changing image consistent with those of the target image, the landmark loss L_b is used to reduce the difference in facial expression and pose between the predicted face-changing image and the target image. The detailed expression of L_b is:
L_b = ||Φ_b(Y_{s-t}) - Φ_b(X_t)||_2
where Φ_b(Y_{s-t}) and Φ_b(X_t) denote the feature vectors extracted by a pre-trained landmark predictor from the predicted original face-changing image and the target image, respectively.
In addition, to ensure identity consistency, if the source image and the target image depict the same natural person, the generated predicted original face-changing image should depict that same person; the reconstruction loss L_c is used to constrain this behavior. The detailed expression of L_c is:
L_c = ||Y_{s-t} - X_t||_2 if X_s and X_t share the same identity, and L_c = 0 otherwise,
where Y_{s-t}, X_s and X_t denote the predicted face-changing image, the source face image and the target face image, respectively.
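The four loss terms above can be written compactly as in the following sketch; the pre-trained ArcFace, VGG16 and landmark predictor are assumed to be callables returning flat feature vectors, the squared-error forms stand in for the L2 norms above, and the weights λ default to 1 here as placeholders.

```python
import torch
import torch.nn.functional as F

def total_loss(y_swap, x_source, x_target, same_identity,
               arcface, vgg16_feat, landmark_net, lambdas=(1.0, 1.0, 1.0, 1.0)):
    l1, l2, l3, l4 = lambdas

    # Identity loss: cosine distance between ArcFace embeddings of result and source
    loss_id = 1.0 - F.cosine_similarity(arcface(y_swap), arcface(x_source), dim=1).mean()

    # Perceptual loss: distance between VGG16 features of result and target
    loss_percep = F.mse_loss(vgg16_feat(y_swap), vgg16_feat(x_target))

    # Landmark loss: distance between landmark features of result and target
    loss_lmk = F.mse_loss(landmark_net(y_swap), landmark_net(x_target))

    # Reconstruction loss: only for pairs whose source and target share an identity
    # (same_identity is a float tensor of shape (batch,), 1.0 where identities match)
    loss_rec = (same_identity.view(-1, 1, 1, 1) * (y_swap - x_target) ** 2).mean()

    return l1 * loss_id + l2 * loss_percep + l3 * loss_lmk + l4 * loss_rec
```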
S2.2.4, the network model is trained with the Adam optimization algorithm, with a learning rate of 0.00001, a batch size of 8, and a total of 54,000 training iterations.
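Because the identity encoder and the StyleGAN2 generator stay fixed, only the attribute encoder and the mapping network are passed to the optimizer; the sketch below shows one way to set this up with the stated hyper-parameters, reusing the hypothetical FeatureExtractionNet and total_loss sketches above.

```python
import itertools
import torch

model = FeatureExtractionNet(id_encoder, attr_encoder, generator)   # modules from the sketch above
trainable_params = itertools.chain(model.attr_encoder.parameters(),
                                   model.mapping.parameters())
optimizer = torch.optim.Adam(trainable_params, lr=1e-5)             # learning rate 0.00001

# Training skeleton: batch size 8, 54,000 iterations in total
# for step, (x_src, x_tgt, same_id) in zip(range(54_000), loader):
#     y = model(x_src, x_tgt)
#     loss = total_loss(y, x_src, x_tgt, same_id, arcface, vgg16_feat, landmark_net)
#     optimizer.zero_grad(); loss.backward(); optimizer.step()
```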
The source face-target face image pairs of the test set are input into the trained feature extraction network model to output predicted original face-changing images, and the target face images of the test set are fed into a BiSeNet-based target mask predictor to obtain predicted target masks. Through this training and testing process, a feature extraction network model with good prediction performance is obtained.
S3, inputting the original face-changing image and the target mask into an FS-Net network model to reconstruct the original face-changing image.
S3.1, the original face-changing image and the target mask are input into the FS-Net network model and mapped into a hybrid high-dimensional latent space; the FS-Net network structure is shown in FIG. 3.
The FS-Net used in the invention replaces the first 7 layers of the W+ space with the output vector F of the 32×32 style block of the StyleGAN2 generator, forming the hybrid latent code FS = (F, S), where F is the spatial feature tensor output by the 32×32 style block and S represents the last 11 layers of the W+ space.
S3.2, searching the hybrid high-dimensional latent space for a latent code FS_g that matches the predicted target mask and is similar to the predicted face-changing image.
S3.3, feeding the latent code FS_g into the pre-trained StyleGAN2 generator to generate a pre-corrected face-changing image whose attribute characteristics are fully consistent with those of the target face image.
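The latent search of step S3.2 can be viewed as an optimization over the hybrid code FS_g = (F, S); the heavily simplified sketch below illustrates that idea, and the fs_net.encode and generator.synthesize_from_fs interfaces, the segmentation loss and the optimizer settings are all assumptions for illustration rather than the exact disclosed procedure.

```python
import torch
import torch.nn.functional as F

def reconstruct_in_fs_space(y_raw, m_target, fs_net, generator, seg_net, steps=200, lr=0.01):
    """Search a hybrid code (F, S) whose image matches the target mask and stays
    close to the predicted face-changing image."""
    F_code, S_code = fs_net.encode(y_raw, m_target)        # initial embedding (interface assumed)
    F_code = F_code.clone().requires_grad_(True)
    S_code = S_code.clone().requires_grad_(True)
    opt = torch.optim.Adam([F_code, S_code], lr=lr)

    for _ in range(steps):
        y = generator.synthesize_from_fs(F_code, S_code)   # FS-space synthesis (interface assumed)
        loss_img = F.l1_loss(y, y_raw)                     # stay close to the face-changing image
        loss_mask = F.cross_entropy(seg_net(y), m_target)  # match the target mask (class-index map)
        loss = loss_img + loss_mask
        opt.zero_grad()
        loss.backward()
        opt.step()

    return generator.synthesize_from_fs(F_code, S_code)    # pre-corrected face-changing image
```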
S4, extracting the background part of the predicted target mask as a background mask, and linearly blending it with the reconstructed original face-changing image to obtain the final high-definition face-changing image.
The method specifically comprises the following steps:
S4.1, extracting the background part of the predicted target mask as a background mask;
and S4.2, performing linear calculation aiming at corrosion and mixing of the background mask and the pre-correction face-changing image output in the step 3, and correcting the distorted background generated by the StyleGAN2 generator to obtain a decoupled and high-definition face-changing image.
In this embodiment, images are selected from the public high-definition face image data set CelebA-HQ and a self-built high-definition face data set to verify the performance of the method. The method of the invention is used to generate high-definition face-changing images from random source-target face image pairs, and the results are compared with the state-of-the-art high-definition face-changing methods MegaFS and Hires. The high-definition face-changing effect of the method is shown in FIG. 4: the predicted original face-changing image only has facial features that meet the face-changing requirement, while attribute characteristics such as the image background, lighting, hair and clothing still differ considerably from the target image; the pre-corrected face-changing image obtained through image reconstruction largely resolves these attribute errors, but its background is still entangled with features of the target image; the final high-definition face-changing image is obtained by linearly blending in the corrected background. FIG. 5 compares the method of the invention with existing state-of-the-art high-definition face-changing methods. Compared with other high-definition face-changing methods, the proposed method achieves better decoupling of identity and attribute characteristics, more faithful restoration of facial features and expressions, better handling of complex lighting and occlusion, higher definition of the final face-changing image, and better preservation of skin texture details of the generated face.
The above functions, if implemented in the form of software functional units and sold or used as a stand-alone product, may be stored in a computer readable storage medium. Based on this understanding, the technical solution of the present invention may be embodied essentially or in a part contributing to the prior art or in a part of the technical solution, in the form of a software product stored in a storage medium, comprising several instructions for causing a computer device (which may be a personal computer, a server, a network device, etc.) to perform all or part of the steps of the method according to the embodiments of the present invention. And the aforementioned storage medium includes: a U-disk, a removable hard disk, a Read-Only Memory (ROM), a random access Memory (RAM, random Access Memory), a magnetic disk, or an optical disk, or other various media capable of storing program codes.
It will be appreciated by those skilled in the art that embodiments of the present invention may be provided as a method, system, or computer program product. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present invention may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein. The scheme in the embodiments of the present invention can be implemented in various computer languages, such as the object-oriented programming language Java or the interpreted scripting language JavaScript.
The present invention is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
While preferred embodiments of the present invention have been described, additional variations and modifications in those embodiments may occur to those skilled in the art once they learn of the basic inventive concepts. It is therefore intended that the following claims be interpreted as including the preferred embodiments and all such alterations and modifications as fall within the scope of the invention.
It will be apparent to those skilled in the art that various modifications and variations can be made to the present invention without departing from the spirit or scope of the invention. Thus, it is intended that the present invention also include such modifications and alterations insofar as they come within the scope of the appended claims or the equivalents thereof.

Claims (10)

1. A high-definition face changing method based on a stylized generative adversarial network, characterized by comprising the following steps:
acquiring a source face-target face image pair;
inputting the image pairs into a pre-constructed feature extraction network model, outputting a predicted original face-changing image, inputting a target face image into a target mask predictor, and outputting a predicted target mask, wherein the feature extraction network model comprises an identity encoder, an attribute encoder, a mapping network and a StyleGAN2 generator;
inputting the original face-changing image and the target mask into an FS-Net network model to reconstruct the original face-changing image;
and extracting the background part of the target mask as a background mask, and linearly blending it with the reconstructed original face-changing image to obtain a final high-definition face-changing image.
2. The high-definition face changing method based on a stylized generative adversarial network according to claim 1, wherein the construction process of the feature extraction network model specifically comprises:
randomly generating a plurality of high-definition face images to construct a data set;
preprocessing the high-definition face images in the data set to center and align the faces;
pairing the aligned face images in pairs to obtain source face-target face image pairs;
the source face-target face image pair is input into a feature extraction network for training to construct a final feature extraction network model, wherein an Adam optimization algorithm is adopted for optimization in the training process.
3. The high-definition face changing method based on a stylized generative adversarial network according to claim 2, wherein the preprocessing is performed with the MTCNN face detection and alignment method.
4. The high-definition face changing method based on a stylized generative adversarial network according to claim 1, wherein the loss function of the feature extraction network model is expressed as:
L_t = λ_1 L_s + λ_2 L_g + λ_3 L_b + λ_4 L_c
where L_t denotes the loss function used to train the feature extraction network model, L_s is the identity loss, L_g is the perceptual loss, L_b is the landmark loss, L_c is the reconstruction loss, and λ_1, λ_2, λ_3 and λ_4 are the weight coefficients of the identity loss, perceptual loss, landmark loss and reconstruction loss, respectively.
5. The high-definition face changing method based on a stylized generative adversarial network according to claim 4, wherein the identity loss is expressed as:
L_s = 1 - cos(Ψ_s(Y_{s-t}), Ψ_s(X_s))
where L_s denotes the identity loss, and Ψ_s(Y_{s-t}) and Ψ_s(X_s) denote the feature vectors extracted by the pre-trained face recognition network ArcFace from the predicted original face-changing image and the source image, respectively.
6. The high-definition face changing method based on a stylized generative adversarial network according to claim 4, wherein the perceptual loss is expressed as:
L_g = ||Φ_g(Y_{s-t}) - Φ_g(X_t)||_2
where L_g is the perceptual loss, and Φ_g(Y_{s-t}) and Φ_g(X_t) denote the feature vectors extracted by the pre-trained VGG16 model from the predicted original face-changing image and the target image, respectively.
7. The high-definition face changing method based on a stylized generative adversarial network according to claim 4, wherein the landmark loss is expressed as:
L_b = ||Φ_b(Y_{s-t}) - Φ_b(X_t)||_2
where L_b is the landmark loss, and Φ_b(Y_{s-t}) and Φ_b(X_t) denote the feature vectors extracted by a pre-trained landmark predictor from the predicted original face-changing image and the target image, respectively.
8. The high-definition face changing method based on a stylized generative adversarial network according to claim 4, wherein the reconstruction loss is expressed as:
L_c = ||Y_{s-t} - X_t||_2 if X_s and X_t share the same identity, and L_c = 0 otherwise,
where L_c is the reconstruction loss, and Y_{s-t}, X_s and X_t denote the predicted face-changing image, the source face image and the target face image, respectively.
9. The high-definition face changing method based on a stylized generative adversarial network according to claim 1, wherein the specific steps of reconstructing the original face-changing image comprise:
mapping the original face-changing image and the target mask into a hybrid high-dimensional latent space through the FS-Net network model;
searching the hybrid high-dimensional latent space for a latent code that matches the target mask and is similar to the face-changing image;
inputting the latent code into the StyleGAN2 generator to generate a pre-corrected face-changing image whose attribute characteristics are fully consistent with those of the target face image, completing the reconstruction of the original face-changing image.
10. The high-definition face changing method based on a stylized generative adversarial network according to claim 1, wherein the target mask predictor is constructed based on a BiSeNet network.
CN202311280255.3A 2023-09-28 2023-09-28 High-definition face changing method based on a stylized generative adversarial network Pending CN117391929A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311280255.3A CN117391929A (en) High-definition face changing method based on a stylized generative adversarial network

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202311280255.3A CN117391929A (en) High-definition face changing method based on a stylized generative adversarial network

Publications (1)

Publication Number Publication Date
CN117391929A true CN117391929A (en) 2024-01-12

Family

ID=89464042

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311280255.3A Pending CN117391929A (en) High-definition face changing method based on a stylized generative adversarial network

Country Status (1)

Country Link
CN (1) CN117391929A (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination