CN112991484A - Intelligent face editing method and device, storage medium and equipment - Google Patents

Intelligent face editing method and device, storage medium and equipment

Info

Publication number
CN112991484A
Authority
CN
China
Prior art keywords
image
geometric
face
appearance
local
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202110466411.XA
Other languages
Chinese (zh)
Other versions
CN112991484B (en)
Inventor
Lin Gao
Shu-Yu Chen
Feng-Lin Liu
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Institute Of Digital Economy Industry Institute Of Computing Technology Chinese Academy Of Sciences
Original Assignee
Institute Of Digital Economy Industry Institute Of Computing Technology Chinese Academy Of Sciences
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Institute Of Digital Economy Industry Institute Of Computing Technology Chinese Academy Of Sciences filed Critical Institute Of Digital Economy Industry Institute Of Computing Technology Chinese Academy Of Sciences
Priority to CN202110466411.XA priority Critical patent/CN112991484B/en
Publication of CN112991484A publication Critical patent/CN112991484A/en
Application granted granted Critical
Publication of CN112991484B publication Critical patent/CN112991484B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 11/00 2D [Two Dimensional] image generation
    • G06T 5/00 Image enhancement or restoration
    • G06T 5/50 Image enhancement or restoration using two or more images, e.g. averaging or subtraction
    • G06T 2207/00 Indexing scheme for image analysis or image enhancement
    • G06T 2207/20 Special algorithmic details
    • G06T 2207/20081 Training; Learning
    • G06T 2207/20212 Image combination
    • G06T 2207/20221 Image fusion; Image merging
    • G06T 2207/30 Subject of image; Context of image processing
    • G06T 2207/30196 Human being; Person
    • G06T 2207/30201 Face

Landscapes

  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Image Analysis (AREA)
  • Image Processing (AREA)

Abstract

The invention relates to an intelligent face editing method and device, a storage medium and a computer device, applicable to the fields of computer vision and computer graphics. The technical scheme adopted by the invention is as follows: a face geometric feature image and a face appearance feature image are input, according to face part, into the corresponding trained local decoupling modules, which extract the geometric features and appearance features of each part of the face; the local decoupling modules generate local images and local intermediate feature maps for each face part based on the geometric and appearance features; the trained global fusion module then fuses the local intermediate feature maps of all face parts to generate a final face image with the specified geometric and appearance features.

Description

Intelligent face editing method and device, storage medium and equipment
Technical Field
The invention relates to an intelligent face editing method and device, a storage medium and a computer device, applicable to the fields of computer vision and computer graphics.
Background
Face synthesis is one of the important topics in the field of digital image processing, and many related techniques exist for high-quality face synthesis. Deep-learning-based face synthesis techniques mainly include the following two types: synthesizing a new face from Gaussian samples using a generative adversarial network (GAN); and synthesizing a corresponding face using a conditional generative adversarial network with inputs such as a semantic label map, a sketch, or attribute labels. Although there are many techniques for synthesizing a realistic face from a sketch, most of them cannot control the appearance of the generated face, or the quality of the synthesized face is poor.
For face editing, some prior art techniques use labeled face attribute data to decouple the latent space of a GAN, and edit face attributes by manipulating the projection codes in the latent space. However, these techniques can only edit specific attributes and cannot modify content beyond the attribute labels, so the degree of freedom is low. Some techniques use semantic labels to edit faces, but because their input lacks geometric information, they cannot edit geometric details of the face such as wrinkles or the flow of the hair. Some techniques use sketches to edit faces, but they are based on image completion and are more limited.
Portenier et al., in "Faceshop: Deep Sketch-based Face Image Editing" (ACM Transactions on Graphics, 2018), propose a sketch-based face editing system that edits a face using a mask, a sketch and color strokes. Jo et al., in "SC-FEGAN: Face Editing Generative Adversarial Network with User's Sketch and Color" (Proceedings of the IEEE/CVF International Conference on Computer Vision, 2019), use a style loss to generate higher-quality, more robust results. However, these prior techniques cannot edit the overall appearance of the face and cannot generate a realistic, vivid face when a pure sketch is used as input.
Disclosure of Invention
The technical problem to be solved by the invention is as follows: aiming at the existing problems, an intelligent face editing method and device, a storage medium and a computer device are provided.
The technical scheme adopted by the invention is as follows: an intelligent face editing method is characterized in that:
inputting the geometric feature image and the appearance feature image of the face into corresponding trained local decoupling modules according to the face parts, and extracting the corresponding geometric features and appearance features of each part of the face;
the local decoupling module generates local images and local intermediate feature images corresponding to all parts of the human face based on the geometric features and the appearance features;
and fusing the local intermediate feature images corresponding to all parts of the human face through the trained global fusion module to generate a final human face image with the geometric features and the appearance features.
The geometric feature image is a face sketch or a real face image, and the local decoupling module comprises a sketch encoder $E_s$, an image encoder $E_i$, an appearance encoder $E_a$ and an image synthesis generator $G$.

The extracting of the geometric features corresponding to each part of the face from the geometric feature image includes:

training the sketch encoder $E_s$ and the sketch decoder $D_s$; the hidden space in the bottleneck layer is a low-resolution feature map of dimension $H \times W \times C$, where $H$, $W$ and $C$ are the height, width and channel number of the geometric feature map;

letting $I$ denote a real face image and $S$ the corresponding face sketch;

extracting the geometric features of $S$ through the pre-trained sketch encoder $E_s$, expressed as $f_{geo}^s = E_s(S)$;

training the image encoder $E_i$ to map the corresponding real face image $I$ into the geometric hidden space of the sketch representation, denoted as $f_{geo}^i = E_i(I)$.

When $f_{geo}^s$ and $f_{geo}^i$ are input to the pre-trained decoder $D_s$, the algorithm imposes constraints on each layer of $D_s$, and an L1 loss is also added between the final outputs $D_s(f_{geo}^s)$ and $D_s(f_{geo}^i)$.
The geometric feature extraction training of the local decoupling module comprises the following steps:

first, training $E_s$ and $D_s$ to learn the geometric hidden space of the sketch using an L1 reconstruction loss; once $E_s$ is trained, the geometric features are expressed as $f_{geo} = E_s(S)$;

then training the network $E_i$, which takes the real face image $I$ as input and predicts geometric features $E_i(I)$ so that they follow the same distribution as the learned geometric space; the loss function is defined as follows:

$$\mathcal{L}_{geo\text{-}align} = \sum_{n=0}^{N} \left\| D_s^{(n)}(E_i(I)) - D_s^{(n)}(E_s(S)) \right\|_1$$

where $N$ is the number of layers of the decoder $D_s$; index 0 corresponds to the input feature map, index $N$ to the output image, and the other indices to intermediate feature maps.
The appearance encoder $E_a$ eliminates spatial information on the face appearance feature image by global average pooling and extracts appearance features independent of the geometric features.
The appearance encoder and the image synthesis generator adopt an exchange training strategy:

an image $I_{swap}$ is generated using the geometric features $f_{geo}^{I_1}$ of the face geometric image $I_1$ and the appearance features $f_{app}^{I_2}$ of the face appearance feature image $I_2$, i.e. $I_{swap} = G(f_{geo}^{I_1}, f_{app}^{I_2})$;

using the appearance features of $I_{swap}$ and the geometric features of $I_2$, the image $I_2$ is cyclically reconstructed, i.e. $\hat{I}_2 = G(f_{geo}^{I_2}, E_a(I_{swap}))$.
The local decoupling module is trained with the following loss functions, comprising:

a. self-reconstruction loss:

$$\mathcal{L}_{self} = \lambda_{per}\mathcal{L}_{per} + \lambda_{fm}\mathcal{L}_{fm} + \lambda_{col}\mathcal{L}_{col}$$

wherein: $\mathcal{L}_{per}$ represents the perceptual loss; $\mathcal{L}_{fm}$ represents the feature matching loss of the discriminator; $\mathcal{L}_{col}$ represents the color loss, which converts the image to the CIE-Lab color space and controls hue by calculating the chrominance distance in the $a$ and $b$ channels (the $a$ and $b$ channels contain the color information in the CIE-Lab color space); the weights $\lambda_{per}$, $\lambda_{fm}$ and $\lambda_{col}$ are set empirically;
b. cyclic exchange loss:

$$\mathcal{L}_{geo} = \left\| E_i(I_{swap}) - f_{geo}^{I_1} \right\|_1$$

$$\mathcal{L}_{cyc} = \lambda_{per}\mathcal{L}_{per}(\hat{I}_2, I_2) + \lambda_{fm}\mathcal{L}_{fm}(\hat{I}_2, I_2) + \lambda_{col}\mathcal{L}_{col}(\hat{I}_2, I_2)$$

$$\mathcal{L}_{cyc\text{-}swap} = \lambda_{geo}\mathcal{L}_{geo} + \lambda_{cyc}\mathcal{L}_{cyc}$$

wherein the weights $\lambda_{geo}$ and $\lambda_{cyc}$ are set empirically;
c. adversarial loss:

the distribution of the generated image is constrained to match the distribution of the real image using a multi-scale discriminator $D$:

$$\mathcal{L}_{adv} = \lambda_{adv}\left(\mathbb{E}_{I}\left[\log D(I)\right] + \mathbb{E}\left[\log\left(1 - D(G(f_{geo}, f_{app}))\right)\right]\right)$$

with the weight $\lambda_{adv}$ set empirically.
An intelligent face editing device, comprising:
the feature extraction unit is used for inputting the face geometric feature image and the face appearance feature image into the corresponding trained local decoupling modules according to face part and extracting the geometric features and appearance features corresponding to each part of the face;
the image generation unit is used for generating local images and local intermediate feature images corresponding to all parts of the human face by the local decoupling module based on the geometric features and the appearance features;
and the image fusion unit is used for fusing the local intermediate feature images corresponding to all parts of the human face through the trained global fusion module to generate a final human face image with the geometric features and the appearance features.
A storage medium having stored thereon a computer program executable by a processor, wherein the computer program, when executed, implements the steps of the intelligent face editing method.
A computer device having a memory and a processor, the memory having stored thereon a computer program executable by the processor, wherein the computer program, when executed, implements the steps of the intelligent face editing method.
The invention has the beneficial effects that: the invention provides a sketch-based face synthesis and editing technique; since the geometric information represented by a sketch is rich, the geometric details of a face can be controlled, making the technique more flexible than the prior art. Meanwhile, the decoupling technique can swap the appearance of corresponding face parts and edit information such as skin color and hair color.
The invention divides the face into five parts (left eye, right eye, nose, mouth and background) and combines a local decoupling module with a global fusion module, so that local geometric and appearance features can be edited separately and the synthesis quality in local details is higher.
The local decoupling module of the invention encodes the image and the sketch into the same space, thereby ensuring the decoupling of information. In the training process, the geometry and appearance of different images are combined by the exchange operation to generate an intermediate result, on which geometric and appearance constraints are respectively imposed, so that geometric information can be extracted from a face sketch or a real image and combined with the appearance of another image to generate a new local synthesis result.
According to the invention, the local decoupling modules generate intermediate feature maps, and the global fusion module splices these feature maps at fixed positions and then applies a downsampling network, a residual network and an upsampling network; the block-spliced result is fused by optimizing the network with a GAN discriminator and associated loss functions. Splicing the intermediate feature maps synthesized by the local decoupling modules generates a face with high realism.
Drawings
Fig. 1 is a diagram of the network framework structure of the embodiment (showing the structured local-to-global training strategy and the exchange and cyclic-reconstruction training strategy of the local decoupling module).
Fig. 2 is a schematic diagram of the structure and training strategy of the geometric encoder in the embodiment (the sketch geometry and the real image are encoded into the same latent space, and geometric information is extracted from both the real image and the sketch).
Fig. 3 shows face generation results of geometry and appearance exchange in the embodiment (the first row provides appearance information and the first column provides geometry information).
Fig. 4 shows local editing results using sketches in the embodiment (the input sketch is edited sequentially, the system generates the corresponding face editing results, and the system provides the user with a high degree of freedom and creativity).
Fig. 5 shows results of local appearance editing in the embodiment (the geometric features of the image are fixed, and results are generated by replacing the appearance reference images of the eyes and mouth).
Fig. 6 shows results of interpolating geometry and appearance in the embodiment (the images in the upper left and lower right corners are real images, and the rest are interpolation results).
Detailed Description
As shown in fig. 1, the present embodiment provides an intelligent face editing method by using an image decoupling technology and adopting a local-to-global method.
The face is divided into 5 parts in this embodiment: left eye, right eye, nose, mouth and background. After image blocking is completed, a local decoupling module is designed to extract and generate decoupled features for each part image; after the generation result of each part is obtained, the block results are spliced and fused by the global fusion module to obtain a globally consistent face image.
The network structure of the embodiment comprises 5 local decoupling modules for decoupling geometric and appearance information and 1 global fusion module that fuses the local features and generates high-quality, globally consistent results. In the network training process, an exchange strategy is used and cycle-consistent feature constraints are designed, ensuring the robustness and generalization of the network framework.
The intelligent face editing method in the embodiment specifically comprises the following steps:
1) Local decoupling: a face geometric feature image (an image containing the geometric features of the face) and a face appearance feature image (an image containing the appearance features of the face) are input into the corresponding trained local decoupling module according to face part, and the geometric and appearance features corresponding to each part of the face are extracted; the geometric feature image is a face sketch or a real face image.
The geometric features mainly comprise two aspects: 1. shape information, such as the shape of the facial features, the face shape and the length of the hair; 2. geometric details, i.e. the representation of detailed geometric features of the face, such as wrinkles and the flow of the hair.
The appearance features mainly comprise three kinds of content: 1. color information, such as the hair color, skin color and lip color of the face; 2. material information, i.e. the texture of the hair and skin, such as the smoothness of the skin; 3. illumination information, i.e. the influence of lighting conditions on the brightness of the face, such as the brightness of light and the change of shadows. In some cases these factors affect the appearance jointly; for example, illumination changes may affect the perceived skin color, so the appearance features do not draw a sharp division between the above factors.
For each local block, the local decoupling module extracts its geometric and appearance information and then fuses them to generate a local feature map. Accordingly, the local decoupling module comprises a geometric encoder and an appearance encoder, which acquire the geometric and appearance features respectively.
1a) Geometric encoder: a sketch is a monochromatic outline of a real image, from which geometric information can be extracted, so an autoencoder network extracts the geometric information directly from the input sketch. Extracting geometric features directly from real images is harder; when a real local face image is used as input, an intuitive method is to convert the real image into a sketch using a pre-trained image-to-sketch network and then apply the generated sketch to the sketch geometric encoder.
To simplify the network, this embodiment proposes a unified method for extracting geometric information from both the sketch and the real image, achieved by training two encoders, the sketch encoder $E_s$ and the image encoder $E_i$, one for sketches and the other for images. In this embodiment, the hidden distribution of the image space is aligned with the hidden distribution of the sketch space, so that only geometric information is encoded.

First, the network composed of the sketch encoder $E_s$ and the sketch decoder $D_s$ is trained to generate the intermediate features of the sketch, as shown in Fig. 2. To preserve the necessary spatial information, the hidden space in the bottleneck layer is not a vector but a low-resolution feature map of dimension $H \times W \times C$, where $H$, $W$ and $C$ are the height, width and number of channels of the geometric feature map. The input and output of the network are sketches, which can be edge maps extracted from images or hand-drawn sketches. For hand-drawn sketches, particularly the incomplete sketches that arise during drawing, sketch manifold projection is used in preprocessing to improve the robustness of the system.

Let $I$ denote a real image and $S$ its corresponding sketch. The geometric features of $S$ are extracted by the pre-trained $E_s$ and expressed as $f_{geo}^s = E_s(S)$. The encoder $E_i$ then needs to be trained to map the corresponding image $I$ into the geometric hidden space of the sketch representation, denoted as $f_{geo}^i = E_i(I)$. To ensure that $f_{geo}^s$ and $f_{geo}^i$ follow the same distribution, when $f_{geo}^s$ and $f_{geo}^i$ are input to the pre-trained decoder $D_s$, the algorithm imposes constraints on each layer of $D_s$, and an L1 loss is also added between the final outputs $D_s(f_{geo}^s)$ and $D_s(f_{geo}^i)$.
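To make the shared geometric hidden space concrete, the following minimal PyTorch sketch pairs a sketch encoder/decoder whose bottleneck stays a spatial feature map with an image encoder of the same output shape; all class names, depths and channel widths are illustrative assumptions, not the patented architecture.

```python
import torch
import torch.nn as nn

class SketchEncoder(nn.Module):
    """Encodes a sketch (1 channel) or, with in_ch=3, a real image into a
    low-resolution geometric feature map; widths/depth are assumptions."""
    def __init__(self, in_ch=1, feat_ch=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(in_ch, 64, 4, 2, 1), nn.ReLU(inplace=True),
            nn.Conv2d(64, 128, 4, 2, 1), nn.ReLU(inplace=True),
            nn.Conv2d(128, feat_ch, 4, 2, 1))  # bottleneck stays spatial (H x W x C)
    def forward(self, x):
        return self.net(x)  # (B, C, H/8, W/8), not a flat vector

class SketchDecoder(nn.Module):
    """Decodes the geometric feature map back to a sketch and exposes every
    intermediate activation for the layer-wise alignment loss."""
    def __init__(self, feat_ch=256, out_ch=1):
        super().__init__()
        self.layers = nn.ModuleList([
            nn.Sequential(nn.ConvTranspose2d(feat_ch, 128, 4, 2, 1), nn.ReLU(inplace=True)),
            nn.Sequential(nn.ConvTranspose2d(128, 64, 4, 2, 1), nn.ReLU(inplace=True)),
            nn.Sequential(nn.ConvTranspose2d(64, out_ch, 4, 2, 1), nn.Tanh()),
        ])
    def forward(self, f_geo):
        feats = [f_geo]               # index 0: the input feature map
        for layer in self.layers:
            feats.append(layer(feats[-1]))
        return feats                  # last entry: the reconstructed sketch

E_s, D_s = SketchEncoder(in_ch=1), SketchDecoder()
E_i = SketchEncoder(in_ch=3)  # image encoder shares the bottleneck shape,
                              # so E_i(I) and E_s(S) live in the same space
```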
1b) Appearance encoder: appearance is another important attribute of face images. The mapping between a geometric face sketch and a real face image is one-to-many; by specifying the appearance, this ambiguity can be resolved.

This embodiment uses an appearance encoder $E_a$ to extract appearance features. The appearance encoder $E_a$ utilizes global average pooling (i.e., averaging over all spatial positions of the feature map for each feature channel) to remove spatial information and extract appearance features independent of the geometric features.

Since the appearance features are extracted for local regions, removing spatial information does not cause a significant loss of useful information. The interpolation experiment on face appearance and geometric features in Fig. 6 demonstrates that $E_a$ can learn a continuous face appearance space.
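A minimal sketch of an appearance encoder consistent with the above: the convolutional trunk is an assumption, while the decisive step, global average pooling over all spatial positions of each channel, follows the text.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class AppearanceEncoder(nn.Module):
    """Extracts a spatial-information-free appearance code from a local face
    patch; the trunk is an assumption, the global average pooling is not."""
    def __init__(self, in_ch=3, app_dim=256):
        super().__init__()
        self.trunk = nn.Sequential(
            nn.Conv2d(in_ch, 64, 4, 2, 1), nn.ReLU(inplace=True),
            nn.Conv2d(64, 128, 4, 2, 1), nn.ReLU(inplace=True),
            nn.Conv2d(128, app_dim, 4, 2, 1), nn.ReLU(inplace=True))
    def forward(self, x):
        h = self.trunk(x)  # (B, app_dim, h, w)
        # Global average pooling: average every spatial position per channel,
        # discarding geometry while keeping appearance statistics.
        return F.adaptive_avg_pool2d(h, 1).flatten(1)  # (B, app_dim)
```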
2) Image generation: the local decoupling module generates the local images and local intermediate feature maps corresponding to each part of the face based on the geometric and appearance features.
The local decoupling module further comprises an image synthesis generator $G$, which combines the geometric and appearance features from one image, or from two images (one providing the geometric features and the other the appearance features), to obtain the converted local image and intermediate feature map.
Image synthesis generator: independent geometric and appearance features are input to generate a reconstruction or a geometry/appearance exchange result. To control the appearance of the generated face image, this embodiment employs adaptive instance normalization (AdaIN) in the face image synthesis network.
The image synthesis generator in this embodiment comprises 4 residual blocks and 4 upsampling layers: first, the appearance features are injected into each residual block; then, feature maps are obtained through four upsampling operations, yielding a feature map with the same resolution as the input image but 64 channels; finally, an image consistent with the input geometric and appearance features is generated by a convolutional layer.
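The sketch below illustrates one way to realize the described generator, injecting the pooled appearance code into each of the 4 residual blocks via AdaIN and following with 4 upsampling stages; the channel widths and the linear layer predicting the AdaIN parameters are assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class AdaIN(nn.Module):
    """Adaptive instance normalization: the appearance code predicts a
    per-channel scale and shift applied to the instance-normalized map."""
    def __init__(self, ch, app_dim):
        super().__init__()
        self.norm = nn.InstanceNorm2d(ch, affine=False)
        self.affine = nn.Linear(app_dim, 2 * ch)  # predicts gamma and beta
    def forward(self, x, app):
        gamma, beta = self.affine(app).chunk(2, dim=1)
        return self.norm(x) * (1 + gamma[:, :, None, None]) + beta[:, :, None, None]

class AdaINResBlock(nn.Module):
    def __init__(self, ch, app_dim):
        super().__init__()
        self.conv1 = nn.Conv2d(ch, ch, 3, padding=1)
        self.conv2 = nn.Conv2d(ch, ch, 3, padding=1)
        self.ada1 = AdaIN(ch, app_dim)
        self.ada2 = AdaIN(ch, app_dim)
    def forward(self, x, app):
        h = self.conv1(F.relu(self.ada1(x, app)))
        h = self.conv2(F.relu(self.ada2(h, app)))
        return x + h

class LocalGenerator(nn.Module):
    """4 AdaIN residual blocks on the geometric feature map, then 4 x2
    upsampling stages; returns the local RGB patch plus the 64-channel
    feature map consumed later by the global fusion module."""
    def __init__(self, geo_ch=256, app_dim=256):
        super().__init__()
        self.res = nn.ModuleList([AdaINResBlock(geo_ch, app_dim) for _ in range(4)])
        ups, ch = [], geo_ch
        for nxt in (128, 64, 64, 64):
            ups.append(nn.Sequential(
                nn.Upsample(scale_factor=2, mode="nearest"),
                nn.Conv2d(ch, nxt, 3, padding=1),
                nn.ReLU(inplace=True)))
            ch = nxt
        self.ups = nn.ModuleList(ups)
        self.to_rgb = nn.Conv2d(64, 3, 3, padding=1)
    def forward(self, f_geo, f_app):
        h = f_geo
        for block in self.res:          # appearance injected into every block
            h = block(h, f_app)
        for up in self.ups:
            h = up(h)
        return self.to_rgb(h), h
```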
3) Global fusion: the local intermediate feature maps corresponding to each part of the face are fused by the trained global fusion module to generate a final face image with the specified geometric and appearance features.
To convert the local feature maps into a complete and natural face image, one feasible approach is to directly combine the local image blocks generated by local decoupling (i.e., the generated local images $I_{swap}$). However, this intuitive approach is prone to artifacts at the boundaries of the local partitions.
In this embodiment, the local image blocks are not combined directly; instead, the intermediate feature maps generated by the local decoupling modules are input into the image generation network for combination, so that the network integrates more information streams and generates a high-quality image.
The global fusion module in this embodiment comprises three units: an encoder, residual blocks and a decoder. On the basis of the feature map of a given background part, the corresponding blocks in the background feature map are replaced by the feature maps generated for the other parts in the order mouth, nose, left eye and right eye, reducing the influence of overlap between blocks. The fused feature map is then sent into the global fusion module to generate a brand-new face with the specified appearance and geometric features.
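A hedged sketch of the fusion step: per-part 64-channel feature maps overwrite fixed regions of the background feature map in the stated order (mouth, nose, left eye, right eye), and the assembled map passes through an encoder, residual blocks and a decoder. The patch coordinates and layer sizes below are placeholders.

```python
import torch
import torch.nn as nn

class ResBlock(nn.Module):
    def __init__(self, ch):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(ch, ch, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(ch, ch, 3, padding=1))
    def forward(self, x):
        return x + self.body(x)

# Hypothetical fixed patch boxes (y0, y1, x0, x1) on a 512x512 feature grid;
# the real positions depend on the face-part cropping scheme.
PART_BOXES = {
    "mouth":     (330, 440, 180, 330),
    "nose":      (220, 350, 200, 310),
    "left_eye":  (180, 260, 120, 240),
    "right_eye": (180, 260, 270, 390),
}

def assemble_features(bg_feat, part_feats):
    """Write each part's feature map (shaped to its box) into the background
    feature map in the order mouth -> nose -> left eye -> right eye, so later
    parts overwrite overlaps and boundary conflicts are reduced."""
    fused = bg_feat.clone()
    for name in ("mouth", "nose", "left_eye", "right_eye"):
        y0, y1, x0, x1 = PART_BOXES[name]
        fused[:, :, y0:y1, x0:x1] = part_feats[name]
    return fused

class GlobalFusion(nn.Module):
    """Encoder -> residual blocks -> decoder over the assembled feature map."""
    def __init__(self, ch=64):
        super().__init__()
        self.enc = nn.Sequential(
            nn.Conv2d(ch, 128, 4, 2, 1), nn.ReLU(inplace=True),
            nn.Conv2d(128, 256, 4, 2, 1), nn.ReLU(inplace=True))
        self.res = nn.Sequential(*[ResBlock(256) for _ in range(4)])
        self.dec = nn.Sequential(
            nn.ConvTranspose2d(256, 128, 4, 2, 1), nn.ReLU(inplace=True),
            nn.ConvTranspose2d(128, 64, 4, 2, 1), nn.ReLU(inplace=True),
            nn.Conv2d(64, 3, 3, padding=1), nn.Tanh())
    def forward(self, fused_feat):
        return self.dec(self.res(self.enc(fused_feat)))
```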
In this embodiment, the whole network framework is trained in stages: the local decoupling modules are trained first; their parameters are then fixed, and the global fusion module is trained.
A-1) Training data set: this embodiment requires a large-scale sketch-image paired data set to train the network. The sketches in the sketch-image pairs must be of high quality, similar to hand-drawn sketches. Conventional edge extraction techniques, such as HED and Canny, typically fail to produce an ideal edge map. Therefore, in this embodiment, the edge image is obtained using the photocopy filter in Photoshop and then simplified to obtain the sketch; the training data set is constructed using CelebA-HQ as the training data, with the resolution of both the images and the sketches set to 512 × 512.
A-2) decoupling training, wherein the training process of a local decoupling module comprises three steps:
first, train
Figure 295514DEST_PATH_IMAGE005
And
Figure 733449DEST_PATH_IMAGE006
the geometric hidden space of the sketch is learned using the L1 reconstruction loss function. Once the cover is closed
Figure 888487DEST_PATH_IMAGE005
After training, the geometric characteristics can be expressed as
Figure 829898DEST_PATH_IMAGE021
Then, training the network
Figure 395877DEST_PATH_IMAGE002
From a real image
Figure 321108DEST_PATH_IMAGE011
As input and predict geometric characteristics
Figure 279837DEST_PATH_IMAGE022
So that it follows the same distribution as the learned geometry space. The loss function is defined as follows:
Figure 341334DEST_PATH_IMAGE051
wherein the content of the first and second substances,N=7 is a decoder
Figure 625684DEST_PATH_IMAGE006
Index 0 corresponds to the input feature map, indexNThe other indices are intermediate feature maps corresponding to the output image. In this example, in optimizing
Figure 772632DEST_PATH_IMAGE002
When the parameters of (1) are fixed
Figure 269472DEST_PATH_IMAGE005
And
Figure 185476DEST_PATH_IMAGE006
the weight of (c).
Fixing the weights of $E_s$ and $E_i$, the appearance encoder $E_a$ and the image synthesis generator $G$ are trained. The geometric features of a sketch, $f_{geo}^s$, or of a real image, $f_{geo}^i$, are randomly input to $G$. In the following sections, both $f_{geo}^s$ and $f_{geo}^i$ are denoted $f_{geo}$, without distinguishing their origin.
In this embodiment, the appearance encoder and the image synthesis generator adopt an exchange training strategy, and the appearance and geometric structure of the real face image are decoupled using a cycle consistency loss term; a multi-scale discriminator and an adversarial loss are also employed to ensure the realism of the generated images.
The exchange training strategy in this embodiment is illustrated as follows: given two images in the training set, $I_1$ ($I_1$ is a real image or a sketch, serving as the face geometric feature image) and $I_2$ ($I_2$ is a real image, serving as the face appearance feature image), as shown in Fig. 1, the geometric features $f_{geo}^{I_1}$ and $f_{geo}^{I_2}$ are extracted from $I_1$ and $I_2$ through the pre-trained $E_s$ or $E_i$, and the appearance features are extracted from $I_2$ using $E_a$.

By exchanging the geometric features of $I_2$ for those of $I_1$, an image $I_{swap}$ is generated using the geometric features $f_{geo}^{I_1}$ of $I_1$ and the appearance features of $I_2$, i.e. $I_{swap} = G(f_{geo}^{I_1}, E_a(I_2))$. Using the appearance features of $I_{swap}$ and the geometric features of $I_2$, the image $I_2$ is cyclically reconstructed, i.e. $\hat{I}_2 = G(f_{geo}^{I_2}, E_a(I_{swap}))$.

This embodiment also includes a self-reconstruction loss: when $E_i$ and $E_a$ take the same image (e.g. $I_1$) as input, the image can be reconstructed from its own geometric and appearance features, and the reconstruction of $I_1$ can be expressed as $\hat{I}_1 = G(E_i(I_1), E_a(I_1))$.
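The exchange and cycle passes can be written compactly. This sketch reuses the hypothetical module names from the earlier snippets and assumes $I_1$ may be a sketch (encoded by $E_s$) or a real image (encoded by $E_i$):

```python
import torch

def swap_forward(G, E_s, E_i, E_a, I1, I2, I1_is_sketch):
    """One exchange-training forward pass: returns the swapped image, the
    cycle reconstruction of I2, and the self-reconstruction of I1."""
    f_geo_1 = E_s(I1) if I1_is_sketch else E_i(I1)  # geometry source
    f_geo_2 = E_i(I2)                               # I2 is always a real image
    f_app_2 = E_a(I2)                               # appearance source

    I_swap, _ = G(f_geo_1, f_app_2)       # geometry of I1 + appearance of I2
    I_cyc,  _ = G(f_geo_2, E_a(I_swap))   # should reconstruct I2 (cycle term)

    I_self = None
    if not I1_is_sketch:                  # self-reconstruction needs a real image
        I_self, _ = G(f_geo_1, E_a(I1))
    return I_swap, I_cyc, I_self
```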
In this embodiment, the following loss functions are used to train the local decoupling module:

Self-reconstruction loss: when the geometric and appearance features come from the same image $I$, i.e. $\hat{I} = G(E_i(I), E_a(I))$, the self-reconstruction consistency of the algorithm requires that $I$ can be reconstructed through the network framework. The self-reconstruction loss function contains three terms: 1) the perceptual loss $\mathcal{L}_{per}$, which measures the visual similarity between the generated image and the input image through a pre-trained VGG-19 model; 2) the feature matching loss of the discriminator, $\mathcal{L}_{fm}$, which aims to stabilize the training process; 3) the color loss $\mathcal{L}_{col}$, which converts the image to the CIE-Lab color space and controls hue by calculating the chrominance distance in the $a$ and $b$ channels. The self-reconstruction loss can be expressed as:

$$\mathcal{L}_{self} = \lambda_{per}\mathcal{L}_{per} + \lambda_{fm}\mathcal{L}_{fm} + \lambda_{col}\mathcal{L}_{col}$$

where the $a$ and $b$ channels contain the color information in the CIE-Lab color space, and the weights $\lambda_{per}$, $\lambda_{fm}$ and $\lambda_{col}$ are set empirically.
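One possible realization of the color term, assuming kornia's differentiable RGB-to-Lab conversion and an L1 chrominance distance (the patent specifies the a/b-channel distance but not the exact norm):

```python
import torch
import torch.nn.functional as F
import kornia.color  # assumption: kornia provides a differentiable RGB->Lab

def color_loss(generated, target):
    """L_col: convert both images to CIE-Lab and penalize only the
    chrominance (a, b) channels; inputs are assumed scaled to [0, 1]."""
    gen_lab = kornia.color.rgb_to_lab(generated.clamp(0, 1))  # (B,3,H,W): L,a,b
    tgt_lab = kornia.color.rgb_to_lab(target.clamp(0, 1))
    return F.l1_loss(gen_lab[:, 1:], tgt_lab[:, 1:])          # channels 1,2 = a,b
```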
Cyclic exchange loss: to completely decouple the geometric and appearance features, this embodiment uses the exchange method to generate a face image from the geometric and appearance features of different images, i.e. $I_{swap} = G(f_{geo}^{I_1}, E_a(I_2))$. The cyclic exchange loss $\mathcal{L}_{cyc\text{-}swap}$ contains the terms $\mathcal{L}_{geo}$ and $\mathcal{L}_{cyc}$.

To completely decouple the geometric and appearance characteristics of the face image, after the appearance of $I_1$ is replaced with that of $I_2$ to obtain $I_{swap}$, the geometry of $I_1$ should be preserved. The algorithm therefore introduces a geometric loss $\mathcal{L}_{geo}$ that constrains the geometry of the generated image to remain unchanged by comparing it with the input geometry:

$$\mathcal{L}_{geo} = \left\| E_i(I_{swap}) - f_{geo}^{I_1} \right\|_1$$

The network uses a cycle consistency loss term to ensure that the appearance of the exchanged image $I_{swap}$ is the same as that of $I_2$: using the geometry of $I_2$ and the appearance of the exchanged image $I_{swap}$, the generated image $\hat{I}_2 = G(f_{geo}^{I_2}, E_a(I_{swap}))$ should cyclically reconstruct the image $I_2$. This embodiment uses the previous reconstruction loss formulation as the constraint $\mathcal{L}_{cyc}$ to achieve cycle consistency:

$$\mathcal{L}_{cyc} = \lambda_{per}\mathcal{L}_{per}(\hat{I}_2, I_2) + \lambda_{fm}\mathcal{L}_{fm}(\hat{I}_2, I_2) + \lambda_{col}\mathcal{L}_{col}(\hat{I}_2, I_2)$$

with $\lambda_{per}$, $\lambda_{fm}$ and $\lambda_{col}$ the same as before.

The cyclic exchange loss is:

$$\mathcal{L}_{cyc\text{-}swap} = \lambda_{geo}\mathcal{L}_{geo} + \lambda_{cyc}\mathcal{L}_{cyc}$$

where $\lambda_{geo}$ and $\lambda_{cyc}$ are set to 1 and 1.
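A sketch of the two cyclic-exchange terms with $\lambda_{geo} = \lambda_{cyc} = 1$ as stated; the perceptual and feature-matching losses are passed in as callables since the patent delegates them to VGG-19 and the discriminator, and `color_loss` is the sketch above:

```python
import torch
import torch.nn.functional as F

def cyclic_swap_loss(E_i, I_swap, I_cyc, f_geo_1, I2,
                     perceptual, feat_match,
                     w_per=1.0, w_fm=1.0, w_col=1.0):
    """L_cyc-swap = 1 * L_geo + 1 * L_cyc (lambda_geo = lambda_cyc = 1).
    `perceptual` and `feat_match` are caller-supplied loss callables;
    the w_* defaults are placeholder values for the empirical weights."""
    # L_geo: re-encoding the swapped image must return the input geometry.
    loss_geo = F.l1_loss(E_i(I_swap), f_geo_1)
    # L_cyc: the cycle reconstruction must match I2, with the same three
    # terms as the self-reconstruction loss.
    loss_cyc = (w_per * perceptual(I_cyc, I2)
                + w_fm * feat_match(I_cyc, I2)
                + w_col * color_loss(I_cyc, I2))
    return loss_geo + loss_cyc
```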
Adversarial loss: in this embodiment, a multi-scale discriminator $D$ is used to constrain the distribution of the generated images to match the distribution of the real images:

$$\mathcal{L}_{adv} = \lambda_{adv}\left(\mathbb{E}_{I}\left[\log D(I)\right] + \mathbb{E}\left[\log\left(1 - D(G(f_{geo}, f_{app}))\right)\right]\right)$$

with the weight $\lambda_{adv}$ set empirically.
The optimization target $\mathcal{L}_{total}$ of the local decoupling module in this embodiment is the sum of the above three terms; minimizing $\mathcal{L}_{total}$ optimizes the three networks $G$, $E_a$ and the discriminator $D$. $\mathcal{L}_{total}$ can be expressed as:

$$\mathcal{L}_{total} = \mathcal{L}_{self} + \mathcal{L}_{cyc\text{-}swap} + \mathcal{L}_{adv}$$
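Putting the three terms together, a hedged outline of one training step; the hinge-style adversarial terms and the split into generator and discriminator updates are assumptions the patent does not pin down (it only names a multi-scale discriminator):

```python
import torch

def decoupling_train_step(D, opt_g, opt_d, I_swap, I_real, l_self, l_cyc_swap):
    """One step over L_total = L_self + L_cyc-swap + L_adv. `opt_g` is
    assumed to hold the parameters of G and E_a (E_s and E_i stay frozen);
    the hinge-style adversarial terms below are an assumption."""
    # Discriminator update: real image vs. detached generated image.
    d_real, d_fake = D(I_real), D(I_swap.detach())
    loss_d = torch.relu(1.0 - d_real).mean() + torch.relu(1.0 + d_fake).mean()
    opt_d.zero_grad(); loss_d.backward(); opt_d.step()

    # Generator/encoder update: reconstruction terms plus adversarial term.
    l_adv = -D(I_swap).mean()
    loss_g = l_self + l_cyc_swap + l_adv
    opt_g.zero_grad(); loss_g.backward(); opt_g.step()
    return loss_g.item(), loss_d.item()
```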
a-3) global fusion training: after the local decoupling module is trained, a global fusion module needs to be trained, and a feature map generated by the local decoupling module is fused to generate a final result. Similar to the previous stage, the countermeasure penalty, feature matching penalty, and perceptual penalty are used as penalty functions for the global fusion module, which does not use a swapping strategy because it does not involve any decoupling operations.
The embodiment also provides an intelligent face editing device, which comprises a feature extraction unit, an image generation unit and an image fusion unit, wherein the feature extraction unit is used for inputting the geometric feature image and the appearance feature image of the face into corresponding trained local decoupling modules according to the face part and extracting the geometric features and the appearance features corresponding to all parts of the face; the image generation unit is used for generating local images and intermediate characteristic images corresponding to all parts of the human face by the local decoupling module based on the geometric characteristics and the appearance characteristics; the image fusion unit is used for fusing the local intermediate feature images corresponding to all parts of the human face through the trained global fusion module to generate a final human face image with the geometric features and the appearance features.
The present embodiment also provides a storage medium having stored thereon a computer program executable by a processor, the computer program, when executed, implementing the steps of the intelligent face editing method in the present embodiment.
The present embodiment also provides a computer device having a memory and a processor, the memory storing thereon a computer program executable by the processor, the computer program, when executed, implementing the steps of the intelligent face editing method in the present embodiment.

Claims (10)

1. An intelligent face editing method is characterized in that:
inputting the geometric feature image and the appearance feature image of the face into corresponding trained local decoupling modules according to the face parts, and extracting the corresponding geometric features and appearance features of each part of the face;
the local decoupling module generates local images and local intermediate feature images corresponding to all parts of the human face based on the geometric features and the appearance features;
and fusing the local intermediate feature images corresponding to all parts of the human face through the trained global fusion module to generate a final human face image with the geometric features and the appearance features.
2. The intelligent face editing method according to claim 1, characterized in that: the geometric feature image is a face sketch or a real face image, and the local decoupling module comprises a sketch encoder $E_s$, an image encoder $E_i$, an appearance encoder $E_a$ and an image synthesis generator $G$;
the extracting of the geometric features corresponding to each part of the face from the geometric feature image includes:
training the sketch encoder $E_s$ and the sketch decoder $D_s$; the hidden space in the bottleneck layer is a low-resolution feature map of dimension $H \times W \times C$, where $H$, $W$ and $C$ are the height, width and channel number of the geometric feature map;
letting $I$ denote a real face image and $S$ the corresponding face sketch;
extracting the geometric features of $S$ through the pre-trained sketch encoder $E_s$, expressed as $f_{geo}^s = E_s(S)$;
training the image encoder $E_i$ to map the corresponding real face image $I$ into the geometric hidden space of the sketch representation, denoted as $f_{geo}^i = E_i(I)$.
3. The intelligent face editing method according to claim 2, characterized in that:
when $f_{geo}^s$ and $f_{geo}^i$ are input to the pre-trained decoder $D_s$, the algorithm imposes constraints on each layer of $D_s$, and an L1 loss is also added between the final outputs $D_s(f_{geo}^s)$ and $D_s(f_{geo}^i)$.
4. The intelligent face editing method according to claim 3, wherein the geometric feature extraction training of the local decoupling module comprises:
first, training $E_s$ and $D_s$ to learn the geometric hidden space of the sketch using an L1 reconstruction loss; once $E_s$ is trained, the geometric features are expressed as $f_{geo} = E_s(S)$;
then training the network $E_i$, which takes the real face image $I$ as input and predicts geometric features $E_i(I)$ so that they follow the same distribution as the learned geometric space; the loss function is defined as follows:

$$\mathcal{L}_{geo\text{-}align} = \sum_{n=0}^{N} \left\| D_s^{(n)}(E_i(I)) - D_s^{(n)}(E_s(S)) \right\|_1$$

where $N$ is the number of layers of the decoder $D_s$; index 0 corresponds to the input feature map, index $N$ to the output image, and the other indices to intermediate feature maps.
5. The intelligent face editing method according to claim 2, 3 or 4, characterized in that: the appearance encoder $E_a$ eliminates spatial information on the face appearance feature image by global average pooling and extracts appearance features independent of the geometric features.
6. The intelligent face editing method of claim 5, wherein the appearance encoder and the image synthesis generator adopt an exchange training strategy:
an image $I_{swap}$ is generated using the geometric features $f_{geo}^{I_1}$ of the face geometric image $I_1$ and the appearance features $f_{app}^{I_2}$ of the face appearance feature image $I_2$, i.e. $I_{swap} = G(f_{geo}^{I_1}, f_{app}^{I_2})$;
using the appearance features of $I_{swap}$ and the geometric features of $I_2$, the image $I_2$ is cyclically reconstructed, i.e. $\hat{I}_2 = G(f_{geo}^{I_2}, E_a(I_{swap}))$.
7. The intelligent face editing method of claim 6, wherein training the local decoupling module using the following loss functions comprises:
a. self-reconstruction loss:

$$\mathcal{L}_{self} = \lambda_{per}\mathcal{L}_{per} + \lambda_{fm}\mathcal{L}_{fm} + \lambda_{col}\mathcal{L}_{col}$$

wherein: $\mathcal{L}_{per}$ represents the perceptual loss; $\mathcal{L}_{fm}$ represents the feature matching loss of the discriminator; $\mathcal{L}_{col}$ represents the color loss, which converts the image to the CIE-Lab color space and controls hue by calculating the chrominance distance in the $a$ and $b$ channels (the $a$ and $b$ channels contain the color information in the CIE-Lab color space); the weights $\lambda_{per}$, $\lambda_{fm}$ and $\lambda_{col}$ are set empirically;
b. cyclic exchange loss:

$$\mathcal{L}_{geo} = \left\| E_i(I_{swap}) - f_{geo}^{I_1} \right\|_1$$

$$\mathcal{L}_{cyc} = \lambda_{per}\mathcal{L}_{per}(\hat{I}_2, I_2) + \lambda_{fm}\mathcal{L}_{fm}(\hat{I}_2, I_2) + \lambda_{col}\mathcal{L}_{col}(\hat{I}_2, I_2)$$

$$\mathcal{L}_{cyc\text{-}swap} = \lambda_{geo}\mathcal{L}_{geo} + \lambda_{cyc}\mathcal{L}_{cyc}$$

wherein the weights $\lambda_{geo}$ and $\lambda_{cyc}$ are set empirically;
c. adversarial loss:
the distribution of the generated image is constrained to match the distribution of the real image using a multi-scale discriminator $D$:

$$\mathcal{L}_{adv} = \lambda_{adv}\left(\mathbb{E}_{I}\left[\log D(I)\right] + \mathbb{E}\left[\log\left(1 - D(G(f_{geo}, f_{app}))\right)\right]\right)$$

with the weight $\lambda_{adv}$ set empirically.
8. An intelligent face editing device, comprising:
the feature extraction unit is used for inputting the face geometric feature image and the face appearance feature image into the corresponding trained local decoupling modules according to face part and extracting the geometric features and appearance features corresponding to each part of the face;
the image generation unit is used for generating local images and local intermediate feature images corresponding to all parts of the human face by the local decoupling module based on the geometric features and the appearance features;
and the image fusion unit is used for fusing the local intermediate feature images corresponding to all parts of the human face through the trained global fusion module to generate a final human face image with the geometric features and the appearance features.
9. A storage medium having stored thereon a computer program executable by a processor, wherein the computer program, when executed, implements the steps of the intelligent face editing method of any one of claims 1 to 7.
10. A computer device having a memory and a processor, the memory having stored thereon a computer program executable by the processor, wherein the computer program, when executed, implements the steps of the intelligent face editing method of any one of claims 1 to 7.
CN202110466411.XA 2021-04-28 2021-04-28 Intelligent face editing method and device, storage medium and equipment Active CN112991484B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110466411.XA CN112991484B (en) 2021-04-28 2021-04-28 Intelligent face editing method and device, storage medium and equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110466411.XA CN112991484B (en) 2021-04-28 2021-04-28 Intelligent face editing method and device, storage medium and equipment

Publications (2)

Publication Number Publication Date
CN112991484A (en) 2021-06-18
CN112991484B CN112991484B (en) 2021-09-03

Family

ID=76340521

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110466411.XA Active CN112991484B (en) 2021-04-28 2021-04-28 Intelligent face editing method and device, storage medium and equipment

Country Status (1)

Country Link
CN (1) CN112991484B (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113470182A (en) * 2021-09-03 2021-10-01 中科计算技术创新研究院 Face geometric feature editing method and deep face remodeling editing method
CN114845067A (en) * 2022-07-04 2022-08-02 中科计算技术创新研究院 Hidden space decoupling-based depth video propagation method for face editing

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111160138A (en) * 2019-12-11 2020-05-15 杭州电子科技大学 Fast face exchange method based on convolutional neural network
CN111915693A (en) * 2020-05-22 2020-11-10 中国科学院计算技术研究所 Sketch-based face image generation method and system
CN112188234A (en) * 2019-07-03 2021-01-05 广州虎牙科技有限公司 Image processing and live broadcasting method and related device
CN112241708A (en) * 2020-10-19 2021-01-19 戴姆勒股份公司 Method and apparatus for generating new person image from original person image
CN112258387A (en) * 2020-10-30 2021-01-22 北京航空航天大学 Image conversion system and method for generating cartoon portrait based on face photo
CN112668401A (en) * 2020-12-09 2021-04-16 中国科学院信息工程研究所 Face privacy protection method and device based on feature decoupling
CN112734890A (en) * 2020-12-22 2021-04-30 上海影谱科技有限公司 Human face replacement method and device based on three-dimensional reconstruction
CN112837210A (en) * 2021-01-28 2021-05-25 南京大学 Multi-form-style face cartoon automatic generation method based on feature image blocks

Patent Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112188234A (en) * 2019-07-03 2021-01-05 广州虎牙科技有限公司 Image processing and live broadcasting method and related device
CN111160138A (en) * 2019-12-11 2020-05-15 杭州电子科技大学 Fast face exchange method based on convolutional neural network
CN111915693A (en) * 2020-05-22 2020-11-10 中国科学院计算技术研究所 Sketch-based face image generation method and system
CN112241708A (en) * 2020-10-19 2021-01-19 戴姆勒股份公司 Method and apparatus for generating new person image from original person image
CN112258387A (en) * 2020-10-30 2021-01-22 北京航空航天大学 Image conversion system and method for generating cartoon portrait based on face photo
CN112668401A (en) * 2020-12-09 2021-04-16 中国科学院信息工程研究所 Face privacy protection method and device based on feature decoupling
CN112734890A (en) * 2020-12-22 2021-04-30 上海影谱科技有限公司 Human face replacement method and device based on three-dimensional reconstruction
CN112837210A (en) * 2021-01-28 2021-05-25 南京大学 Multi-form-style face cartoon automatic generation method based on feature image blocks

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
SHU-YU CHEN et al.: "DeepFaceDrawing: Deep Generation of Face Images from Sketches", ACM Transactions on Graphics *

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113470182A (en) * 2021-09-03 2021-10-01 中科计算技术创新研究院 Face geometric feature editing method and deep face remodeling editing method
CN114845067A (en) * 2022-07-04 2022-08-02 中科计算技术创新研究院 Hidden space decoupling-based depth video propagation method for face editing

Also Published As

Publication number Publication date
CN112991484B (en) 2021-09-03

Similar Documents

Publication Publication Date Title
Chen et al. Deep generation of face images from sketches
Chai et al. Using latent space regression to analyze and leverage compositionality in gans
US11880766B2 (en) Techniques for domain to domain projection using a generative model
Chen et al. Beautyglow: On-demand makeup transfer framework with reversible generative network
Khakhulin et al. Realistic one-shot mesh-based head avatars
Wang et al. A survey on face data augmentation
CN108288072A (en) A facial expression synthesis method based on a generative adversarial network
Shi et al. Deep generative models on 3d representations: A survey
CN110222668A (en) Multi-pose facial expression recognition method based on a generative adversarial network
US11562536B2 (en) Methods and systems for personalized 3D head model deformation
Zhang et al. Hair-GAN: Recovering 3D hair structure from a single image using generative adversarial networks
CN113470182B (en) Face geometric feature editing method and deep face remodeling editing method
CN112991484B (en) Intelligent face editing method and device, storage medium and equipment
Singh et al. Neural style transfer: A critical review
CN113807265B (en) Diversified human face image synthesis method and system
US11587288B2 (en) Methods and systems for constructing facial position map
JP7462120B2 (en) Method, system and computer program for extracting color from two-dimensional (2D) facial images
JP2024506170A (en) Methods, electronic devices, and programs for forming personalized 3D head and face models
Xia et al. Controllable continuous gaze redirection
Zhao et al. Cartoon image processing: a survey
CN117635771A (en) Scene text editing method and device based on semi-supervised contrast learning
Hilsmann et al. Going beyond free viewpoint: creating animatable volumetric video of human performances
Huang et al. IA-FaceS: A bidirectional method for semantic face editing
Li et al. Learning disentangled representation for one-shot progressive face swapping
Liu et al. Transformer-based high-fidelity facial displacement completion for detailed 3d face reconstruction

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
CB02 Change of applicant information

Address after: 12 / F, building 4, 108 Xiangyuan Road, Gongshu District, Hangzhou City, Zhejiang Province 310015

Applicant after: Zhongke Computing Technology Innovation Research Institute

Address before: 12 / F, building 4, 108 Xiangyuan Road, Gongshu District, Hangzhou City, Zhejiang Province 310015

Applicant before: Institute of digital economy industry, Institute of computing technology, Chinese Academy of Sciences

CB02 Change of applicant information
GR01 Patent grant
GR01 Patent grant