CN115527258A - Face exchange method based on identity information response - Google Patents

Face exchange method based on identity information response

Info

Publication number
CN115527258A
Authority
CN
China
Prior art keywords
identity
feature
code
face
texture
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202211224223.7A
Other languages
Chinese (zh)
Inventor
杨嘉琛
李新锋
程晨
肖帅
温家宝
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tianjin University
Original Assignee
Tianjin University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tianjin University filed Critical Tianjin University
Priority to CN202211224223.7A priority Critical patent/CN115527258A/en
Publication of CN115527258A publication Critical patent/CN115527258A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16 Human faces, e.g. facial parts, sketches or expressions
    • G06V40/161 Detection; Localisation; Normalisation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/40 Extraction of image or video features
    • G06V10/54 Extraction of image or video features relating to texture
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77 Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/774 Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16 Human faces, e.g. facial parts, sketches or expressions
    • G06V40/168 Feature extraction; Face representation

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • General Health & Medical Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Software Systems (AREA)
  • Computing Systems (AREA)
  • Oral & Maxillofacial Surgery (AREA)
  • Artificial Intelligence (AREA)
  • Medical Informatics (AREA)
  • Databases & Information Systems (AREA)
  • Human Computer Interaction (AREA)
  • Biomedical Technology (AREA)
  • Data Mining & Analysis (AREA)
  • Mathematical Physics (AREA)
  • Biophysics (AREA)
  • General Engineering & Computer Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Molecular Biology (AREA)
  • Computational Linguistics (AREA)
  • Collating Specific Patterns (AREA)
  • Compression Or Coding Systems Of Tv Signals (AREA)
  • Image Analysis (AREA)

Abstract

StyleGAN is currently used in face-swapping tasks to guarantee the quality and definition of the generated pictures; how to exploit the latent-space code of StyleGAN for face swapping, and thereby save substantial resources in the face-swapping operation, is a problem that remains to be solved. The invention relates to a face-swapping method. For GAN inversion, a multi-scale feature-pyramid hybrid encoder is proposed, which efficiently extracts both the texture features and the structural features of the face image. After GAN inversion, identity-feature decoupling is performed on the obtained latent-space codes: the invention proposes screening the feature layers by identity-feature response and then exchanging the identity attribute, completing the face-swapping operation while keeping the other attributes unchanged. The method has been verified on an experimental platform and can be widely applied to face-swapping technology in engineering.

Description

Face exchange method based on identity information response
Technical Field
The invention relates to a face image editing method, in particular to a face exchange method based on identity information response.
Background
People now have many face-editing methods at their disposal and place high demands on the definition of the edited picture and the accuracy of the editing technique. In this setting StyleGAN, which can generate a large number of sharp face images, stands out, and a great deal of attribute-editing work has been built on top of it. The latent-space code of StyleGAN is itself feature-decoupled by pixel scale, so editing the latent code can accomplish many image-attribute editing tasks well. In the face-swapping task, however, StyleGAN inversion work largely relies on an extended latent space, and the resulting latent-space code cannot be used directly for the exchange. How to use StyleGAN in the face-swapping task so as to save substantial resources in the face-swapping operation is a problem that remains to be solved.
Disclosure of Invention
Aiming at the defects in the prior art, the invention provides a face exchange method based on identity information response. In order to solve the technical problems, the invention adopts the following technical scheme:
an identity information response face exchange method based on StyleGAN comprises the following steps:
(1) Establishing a StyleGAN-based face-image generation model;
(2) Constructing a StyleGAN inversion model, CTSNet: to achieve a better image-inversion effect, an inversion model combining a Transformer with convolution is proposed; drawing on the advantages of a convolutional network and a vision Transformer, a multi-scale encoder is designed in which the vision Transformer captures the low-pixel-scale structural features and the convolutional network extracts detail features such as texture and color;
(3) Inputting the target face image and the background face image into CTSNet to obtain the latent-space codes C_id and C_back of the corresponding images, respectively;
(4) Constructing an identity-feature response exchange network that directly extracts and exchanges the identity features of the latent space, so that the method is compatible with other inversion methods;
(5) Inputting the latent-space codes C_id and C_back of the target face image and the background face image into the identity-feature response exchange network to exchange the identity attribute, obtaining the latent-space code C_mix of the face-swapped image;
(6) Inputting the latent-space code of the face-swapped image into the StyleGAN-based face-image generation model to obtain the target image (a minimal end-to-end sketch of these steps follows below);
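For concreteness, a minimal end-to-end sketch of steps (1)-(6) in PyTorch-style Python follows. It is an illustration under stated assumptions rather than the patented implementation: encoder, swap_net and generator are hypothetical stand-ins for CTSNet, the identity-feature response exchange network and the StyleGAN generator, each of which this document describes only at the architecture level.

    # Minimal sketch of the whole pipeline; all module names are placeholders.
    import torch

    def face_swap(img_target, img_background, encoder, swap_net, generator):
        # img_*: (1, 3, H, W) tensors with H = W a power of two, e.g. 1024
        # (3) GAN inversion: latent-space codes, (1, 18, 512) at 1024x1024
        c_id = encoder(img_target)        # identity-providing face
        c_back = encoder(img_background)  # attribute-providing background face
        # (4)-(5) exchange the identity attribute in the screened layers
        c_mix = swap_net(c_id, c_back)    # latent code of the swapped face
        # (6) synthesize the face-swapped image with the StyleGAN generator
        return generator(c_mix)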
the invention has the advantages and positive effects that:
(1) By combining the advantages of a convolutional network and a vision Transformer, the invention designs a multi-scale encoder that achieves a better effect in the field of GAN image inversion.
(2) The method proposes an identity-feature response exchange network that can directly extract the identity information in the latent-space code for the exchange, with excellent results.
(3) The inversion model and the identity-exchange module of the method are decoupled from each other and compatible with other methods, so they can be widely applied in the field of face-image editing.
(4) The method has been verified through a large number of experiments, which effectively establishes its reliability.
Drawings
FIG. 1 is an overall flow diagram in an embodiment of the present invention;
FIG. 2 is a graph of face change results in accordance with an embodiment of the present invention;
Detailed Description
This embodiment illustrates a specific implementation of the present invention, taking two face images as an example.
In order to make the purpose and technical scheme of the invention clearer, the specific implementation steps of the invention are described in detail below with reference to the accompanying drawings.
Referring to FIG. 1, the overall flow of an embodiment of the present invention is shown and described in detail below:
1. Inversion network. We propose a multi-scale feature-pyramid hybrid encoder. Matching the generation process of StyleGAN, the input image of the inversion network must be cropped to a power-of-two resolution, e.g. 256 × 256, up to a maximum of 1024 × 1024.
After cropping, the image is normalized and then input to the network. The whole network is divided into an encoding module and a mapping module: the upper levels nearest the input form a convolutional encoding module, and the bottom levels form a Transformer encoding module. Taking a single image as an example, the per-level dimensions of the input are (1, number of feature layers, 4). Low-level feature information is generated at low pixel scales such as 4 × 4 and 8 × 8; at these scales the information the network can extract is mainly structural, and as the pixel scale grows the network captures detail information such as texture and color. The structural design of StyleGAN was inspired by ProGAN and exploits the local attention of the convolutional network well, while the vision Transformer has been shown to extract features of the overall structure of an object better than a convolutional network, so the Transformer plays a better role on the low-scale structural information. A multi-scale encoder is therefore designed that combines the advantages of both: the vision Transformer captures the low-pixel-scale structural features, and the convolutional network extracts detail features such as texture and color.
The inversion encoder group consists of two parts: a texture-feature extractor and a structural-feature extractor. For the texture extractor: layers 16-17 in StyleGAN correspond to 1024 × 1024 images, layers 14-15 to 512 × 512, and so on down to layers 0-1 at 4 × 4. In general, the feature layers containing detail information such as texture are concentrated in layers 12-17, and the higher the layer number, the more texture information it carries, because the local receptive field of convolution covers a small area and its features are closer to texture features. Layers 12-13 are comparatively blurry and still carry considerable internal structure information; through related experiments it was finally determined that layers 12-17 are extracted by the texture encoder, while the 256 × 256 image corresponding to layers 12-13 is additionally input to the vision Transformer that encodes the structural features. The texture-feature encoder is a pure convolutional structure corresponding to StyleGAN: for an input image I of any size, a down-sampled map at 256 × 256, denoted I256, is first obtained; the original image is input to the texture-feature extractor, and the down-sampled map I256 is input to the structural-feature extractor to encode the layer 0-11 features. After all hierarchy information is obtained, feature mapping is performed: the features of each scale are mapped into different styles according to the input image size, i.e. (1, 2(n-1), 512).
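A concrete illustration of the hybrid encoder just described is sketched below in PyTorch-style Python for a 1024 × 1024 input (18 style vectors): the original image passes through a convolutional texture branch producing styles 12-17, and the down-sampled map I256 passes through a vision-Transformer structure branch producing styles 0-11. The layer split follows the text, but every module size, the patch embedding, and the pooling choices are illustrative assumptions, not the patented architecture.

    # Sketch of the multi-scale hybrid encoder; shapes assume a 1024x1024 input.
    # All module dimensions are illustrative placeholders.
    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class HybridInversionEncoder(nn.Module):
        def __init__(self, dim=512):
            super().__init__()
            # texture branch: pure convolution on the original image -> styles 12-17
            self.texture_cnn = nn.Sequential(
                nn.Conv2d(3, 64, 3, stride=2, padding=1), nn.LeakyReLU(0.2),
                nn.Conv2d(64, 128, 3, stride=2, padding=1), nn.LeakyReLU(0.2),
                nn.AdaptiveAvgPool2d(1),
            )
            self.texture_heads = nn.ModuleList([nn.Linear(128, dim) for _ in range(6)])
            # structure branch: vision Transformer on the 256x256 map -> styles 0-11
            self.patch_embed = nn.Conv2d(3, dim, kernel_size=16, stride=16)
            block = nn.TransformerEncoderLayer(d_model=dim, nhead=8, batch_first=True)
            self.vit = nn.TransformerEncoder(block, num_layers=4)
            self.struct_heads = nn.ModuleList([nn.Linear(dim, dim) for _ in range(12)])

        def forward(self, img):                       # img: (1, 3, 1024, 1024)
            i256 = F.interpolate(img, size=256, mode='bilinear', align_corners=False)
            # texture features from the original image
            t = self.texture_cnn(img).flatten(1)      # (1, 128)
            tex = [h(t) for h in self.texture_heads]  # styles for layers 12-17
            # structural features from the down-sampled map I256
            p = self.patch_embed(i256).flatten(2).transpose(1, 2)  # (1, 256, 512)
            s = self.vit(p).mean(dim=1)               # (1, 512) pooled tokens
            struct = [h(s) for h in self.struct_heads]  # styles for layers 0-11
            return torch.stack(struct + tex, dim=1)   # latent code (1, 18, 512)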
2. Inversion vectors of size (1, 2(n-1), 512) are obtained according to the size of the input image, giving the corresponding inversion features C_id and C_back.
3. Identity decoupling. The obtained inversion features are input to the face-information feature-extraction exchange network. The face identity-decoupling network consists mainly of two parts: the first performs identity-feature encoding on one side and background-feature encoding, i.e. the identity-irrelevant attributes, on the other; the second performs identity-feature mixing, carrying out the identity-attribute transformation by the AdaIN method. First, the feature layers with the largest influence on identity information are selected, using the identity-feature response method: 200 face images imgX are inverted with the trained inversion encoder, and the corresponding latent codes, denoted CodeX, are obtained. To find the feature layers with the largest influence on identity information, the feature layers of CodeX are replaced: for each CodeX, 400 face images are randomly drawn from the dataset and inverted to obtain CodeY. From one CodeX and one CodeY, 18 feature-fused latent codes can be obtained, one for each layer of CodeX replaced by the corresponding layer of CodeY. Each mixed code is input into StyleGAN for image generation, and the identity cosine similarity between the generated image imgMIX and imgX is computed to judge which feature layers have the largest influence on identity information. The layers 8-10 screened in this way are then used for decoupling training.
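The screening procedure above is sketched below, assuming PyTorch, the encoder and generator from the earlier sketches, and a pretrained face-recognition embedder id_embed (e.g. an ArcFace-style network) to compute the identity cosine similarity; all names, and the exhaustive triple loop, are illustrative.

    # Sketch of identity-feature response screening over the 18 latent layers.
    # A higher average similarity drop means the layer carries more identity.
    import torch
    import torch.nn.functional as F

    @torch.no_grad()
    def layer_identity_response(imgs_x, imgs_y, encoder, generator, id_embed):
        response = torch.zeros(18)
        for img_x in imgs_x:                         # e.g. 200 probe faces
            code_x = encoder(img_x)                  # CodeX, (1, 18, 512)
            e_x = id_embed(img_x)                    # identity embedding of imgX
            for img_y in imgs_y:                     # e.g. 400 random faces
                code_y = encoder(img_y)              # CodeY
                for layer in range(18):
                    mixed = code_x.clone()
                    mixed[:, layer] = code_y[:, layer]        # swap one layer
                    img_mix = generator(mixed)                # imgMIX
                    sim = F.cosine_similarity(e_x, id_embed(img_mix), dim=-1)
                    response[layer] += 1.0 - sim.item()
        return response / (len(imgs_x) * len(imgs_y))  # layers 8-10 should peak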
4. Taking a single image as an example, the input to the identity-decoupling network has size (1, 3, 512): the codes corresponding to layers 8-10 are input to the identity-decoupling network, whose output of size (1, 3, 512) serves as the identity-decoupled layers 8-10; layers 5-7 are then replaced directly, yielding the final latent-space code C_mix of size (1, 18, 512).
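A minimal sketch of assembling C_mix under these choices follows, again assuming PyTorch: layers 5-7 are replaced directly and layers 8-10 are mixed AdaIN-style before an optional pass through the decoupling network. Both the direction of the direct copy and the exact AdaIN formulation over 512-dimensional style vectors are assumptions for illustration; decouple_net is a hypothetical stand-in for the trained identity-decoupling network.

    # Sketch of latent-code identity mixing; latent shapes are (1, 18, 512).
    import torch

    def adain(content, style, eps=1e-5):
        # re-normalize `content` to the per-layer statistics of `style`
        c_mu, c_std = content.mean(-1, keepdim=True), content.std(-1, keepdim=True)
        s_mu, s_std = style.mean(-1, keepdim=True), style.std(-1, keepdim=True)
        return s_std * (content - c_mu) / (c_std + eps) + s_mu

    def mix_codes(c_id, c_back, decouple_net=None):
        c_mix = c_back.clone()                  # keep the background attributes
        c_mix[:, 5:8] = c_id[:, 5:8]            # layers 5-7: direct replacement
        mixed = adain(c_back[:, 8:11], c_id[:, 8:11])  # layers 8-10: AdaIN mixing
        if decouple_net is not None:            # optional decoupling refinement
            mixed = decouple_net(mixed)         # (1, 3, 512) in and out
        c_mix[:, 8:11] = mixed
        return c_mix                            # final latent code C_mix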
5. The final latent-space code is input into the StyleGAN generator to obtain the corresponding image. FIG. 2 shows the resulting face swaps.
The above embodiments are merely preferred embodiments; their description is intended to help in understanding the method and the core idea of the present invention, and modifications and equivalents of the technical solutions described above fall within the scope of the present invention.

Claims (3)

1. A face exchange method based on identity information response is characterized by comprising the following steps:
(1) Establishing a StyleGAN-based face-image generation model;
(2) Constructing a StyleGAN inversion model, CTSNet: to achieve a better image-inversion effect, an inversion model combining a Transformer with convolution is proposed; drawing on the advantages of a convolutional network and a vision Transformer, a multi-scale encoder is designed in which the vision Transformer captures the low-pixel-scale structural features and the convolutional network extracts detail features such as texture and color;
(3) Inputting the target face image and the background face image into CTSNet to obtain the latent-space codes C_id and C_back of the corresponding images, respectively;
(4) Constructing an identity-feature response exchange network that directly extracts and exchanges the identity features of the latent space, so that the method is compatible with other inversion methods;
(5) Inputting the latent-space codes C_id and C_back of the target face image and the background face image into the identity-feature response exchange network to exchange the identity attribute, obtaining the latent-space code C_mix of the face-swapped image;
(6) Inputting the latent-space code of the face-swapped image into the StyleGAN-based face-image generation model to obtain the target image.
2. The face exchange method based on identity information response as claimed in claim 1, characterized in that the inversion model in step (2) is as follows:
the inversion encoder group is composed of two parts, the first part is a texture feature extractor, and the second part is a structural feature extractor. First we look at the texture extractor, in StyleGAN layers 16-17 correspond to 1024 x 1024 images, i.e., 14-15 to 512 x 512 and so on until 0-1 to 4 x 4. Generally, feature layers containing detail information such as texture information are mainly concentrated in 12-17 layers, wherein the higher the layer number is, the more certain the feature layers contain the texture information, because the local operation of convolution includes a small area, the features are more approximate to the texture features, the 12-13 layers are relatively fuzzy, the internal structure information is still much, and through related experiments, the final determination is that 12-17 layers are extracted by a texture encoder, but 256 × 256 images corresponding to 12-13 layers are still input into a vision transformer of the encoding structure features.
Firstly, a texture feature encoder is used, the group is a pure convolution structure and corresponds to StyleGAN, and for input images I with different sizes, firstly, down-sampled images are obtained and down-sampled to 256 × 256I256, and the original images are input into a texture feature extractor. The downsampled map I256 is input to the structural feature extractor to encode the 0-11 layer features.
3. The face exchange method based on identity information response as claimed in claim 1, characterized in that the identity-feature response exchange network operates as follows:
In order to reduce the difficulty of decoupling the identity information, the method first performs W+ feature screening to select the feature layers with the largest influence on identity information, using the identity-feature response method: 200 face images imgX are inverted with the trained inversion encoder, and the corresponding latent codes, denoted CodeX, are obtained. To find the feature layers with the largest influence on identity information, the feature layers of CodeX are replaced: for each CodeX, 400 face images are randomly drawn from the dataset and inverted to obtain CodeY. From one CodeX and one CodeY, 18 feature-fused latent codes can be obtained, one for each layer of CodeX replaced by the corresponding layer of CodeY. Each mixed code is input into StyleGAN for image generation, and the identity cosine similarity between the generated image imgMIX and imgX is computed to judge which feature layers have the largest influence on identity information.
The face identity-decoupling network consists mainly of two parts. The first performs identity-feature encoding on one side and background-feature encoding, i.e. the identity-irrelevant attributes, on the other. The second performs identity-feature mixing, carrying out the identity-attribute transformation by the AdaIN method.
CN202211224223.7A 2022-09-30 2022-09-30 Face exchange method based on identity information response Pending CN115527258A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211224223.7A CN115527258A (en) 2022-09-30 2022-09-30 Face exchange method based on identity information response

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211224223.7A CN115527258A (en) 2022-09-30 2022-09-30 Face exchange method based on identity information response

Publications (1)

Publication Number Publication Date
CN115527258A true CN115527258A (en) 2022-12-27

Family

ID=84701096

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211224223.7A Pending CN115527258A (en) 2022-09-30 2022-09-30 Face exchange method based on identity information response

Country Status (1)

Country Link
CN (1) CN115527258A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117095136A (en) * 2023-10-19 2023-11-21 中国科学技术大学 Multi-object and multi-attribute image reconstruction and editing method based on 3D GAN
CN117095136B (en) * 2023-10-19 2024-03-29 中国科学技术大学 Multi-object and multi-attribute image reconstruction and editing method based on 3D GAN

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination