CN115375596A - Face photo-sketch portrait synthesis method based on two-way conditional normalization - Google Patents

Face photo-sketch portrait synthesis method based on two-way conditional normalization

Info

Publication number: CN115375596A
Application number: CN202210885729.6A
Authority: CN (China)
Prior art keywords: module, sketch, face photo, synthesis
Legal status: Pending (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis)
Other languages: Chinese (zh)
Inventors: Wang Nannan (王楠楠), Wu Zicheng (吴子成), Zhu Mingrui (朱明瑞), Yi Yun (易云), He Xiao (何潇)
Current and Original Assignee: Xidian University (the listed assignees may be inaccurate; Google has not performed a legal analysis)
Application filed by Xidian University
Priority to CN202210885729.6A
Publication of CN115375596A

Classifications

    • G06T5/50 Image enhancement or restoration using two or more images, e.g. averaging or subtraction
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G06V10/774 Generating sets of training patterns; bootstrap methods, e.g. bagging or boosting
    • G06V10/806 Fusion of extracted features, i.e. combining data from various sources at the sensor, preprocessing, feature extraction or classification level
    • G06V10/82 Image or video recognition or understanding using neural networks
    • G06T2207/20081 Training; learning
    • G06T2207/20084 Artificial neural networks [ANN]
    • G06T2207/20221 Image fusion; image merging
    • G06T2207/30201 Face

Abstract

The invention discloses a face photo-sketch portrait synthesis method based on two-way conditional normalization, comprising: acquiring a face photo-sketch image pair to be synthesized; and inputting the face photo-sketch image pair to be synthesized into a trained face photo-sketch synthesis network to obtain a synthesis result. The face photo-sketch synthesis network comprises an encoder, a generator and a semantic segmentation module, wherein the generator comprises a two-way normalization module, a gated attention feature fusion module and a decoder. The trained face photo-sketch synthesis network is obtained by training on a face photo-sketch pair training set. The invention improves the quality of the face photo-sketch synthesis results.

Description

Face photo-sketch portrait synthesis method based on two-way conditional normalization
Technical Field
The invention belongs to the technical field of artificial intelligence and image processing, and particularly relates to a face photo-sketch synthesis method based on two-way conditional normalization.
Background
With the rapid development of computer vision, face photo-sketch synthesis has gradually become a research hotspot. Face photo-sketch synthesis is the process of synthesizing a face sketch from a face photo, or a face photo from a face sketch. Through this conversion, photo information and sketch information originally lying in different domains can be brought into the same domain. Because manually drawing a face sketch requires a professional artist and a large amount of time, computer image processing algorithms that automatically synthesize a face sketch from a face photo have high application value.
Early generation methods were mostly example-based and unidirectional, from photo to sketch. These methods use the idea of nearest-neighbor patch matching, but the generated results often contain a large amount of blurring and artifacts. Zhang et al., in "L. Zhang, L. Lin, X. Wu, S. Ding, and L. Zhang, 'End-to-end photo-sketch generation via fully convolutional representation learning,' in Proceedings of the 5th ACM International Conference on Multimedia Retrieval, 2015, pp. 627-634," propose an end-to-end fully convolutional network (FCN) to directly learn the mapping between a face photo and a sketch. Isola et al., in "P. Isola, J.-Y. Zhu, T. Zhou, and A. A. Efros, 'Image-to-image translation with conditional adversarial networks,' in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2017, pp. 1125-1134," propose a general method named pix2pix for image-to-image translation on paired datasets. Zhu et al., in "J.-Y. Zhu, T. Park, P. Isola, and A. A. Efros, 'Unpaired image-to-image translation using cycle-consistent adversarial networks,' in Proceedings of the IEEE International Conference on Computer Vision, 2017, pp. 2223-2232," propose the CycleGAN framework, which forms a general mapping from domain A to domain B, learning how to translate between two domains rather than being tied to a specific image pair; it can be trained on unpaired datasets and has strong adaptability. Chen et al., in "C. Chen, W. Liu, X. Tan, and K.-Y. K. Wong, 'Semi-supervised learning for face sketch synthesis in the wild,' in Asian Conference on Computer Vision. Springer, 2018, pp. 216-231," propose a semi-supervised learning method that extends photo-sketch pairs by constructing pseudo-sketch features for additional training photos. Yu et al., in "J. Yu, X. Xu, F. Gao, S. Shi, M. Wang, D. Tao, and Q. Huang, 'Toward realistic face photo-sketch synthesis via composition-aided GANs,' IEEE Transactions on Cybernetics, vol. 51, no. 9, pp. 4350-4362, 2020," propose using facial composition information as supplementary input and introducing a compositional loss to concentrate training on specific facial parts. Nie et al., in "L. Nie, L. Liu, Z. Wu, and W. Kang, 'Unconstrained face sketch synthesis via perception-adaptive network and a new benchmark,' Neurocomputing, vol. 494, pp. 192-202, 2022," propose face sketch synthesis based on a perception-adaptive network under both constrained and unconstrained conditions.
However, the above conventional methods do not encode sufficient texture and spatial information during training, and the visual quality of the generated images suffers from defects such as blurring, artifacts, a lack of detail, and a missing sketch-stroke feel.
Disclosure of Invention
In order to solve the above problems in the prior art, the present invention provides a face photo-sketch synthesis method based on two-way conditional normalization. The technical problem to be solved by the invention is realized by the following technical scheme:
An embodiment of the invention provides a face photo-sketch synthesis method based on two-way conditional normalization, comprising the following steps:
acquiring a face photo-sketch image pair to be synthesized;
inputting the face photo-sketch image pair to be synthesized into a trained face photo-sketch synthesis network to obtain a synthesis result;
wherein the face photo-sketch synthesis network comprises an encoder, a generator and a semantic segmentation module, the generator comprising a two-way normalization module, a gated attention feature fusion module and a decoder; the trained face photo-sketch synthesis network is obtained by training on a face photo-sketch pair training set; the corresponding training process comprises the following steps:
the encoder encodes the face photo-sketch pair training set and outputs depth features; the semantic segmentation module extracts the semantic labels of the sketches in the training set; the two-way normalization module performs enhancement along a spatial information branch and a texture information branch according to the semantic labels and the depth features; the gated attention feature fusion module fuses the outputs of the spatial information branch and the texture information branch; the decoder decodes the fusion result and outputs the synthesis results corresponding to the face photo-sketch pair training set; a loss function of the face photo-sketch synthesis network is constructed from the face photo-sketch pair training set and the corresponding synthesis results; and the parameters of the face photo-sketch synthesis network are updated according to the loss function and training continues until an iteration stop condition is met, yielding the trained face photo-sketch synthesis network.
In one embodiment of the invention, the encoder comprises a plurality of sequentially connected convolutional layers;
in the encoder, the outputs of the last three convolutional layers of the encoder are used as the depth features.
In one embodiment of the invention, the two-way normalization module comprises a SPADE Resblock module and an AdaIN Resblock module, wherein
the SPADE Resblock module performs spatial information enhancement according to the semantic labels of the sketches in the training set and the depth features of the face photos in the training set;
and the AdaIN Resblock module performs texture information enhancement according to the depth features of the sketches and face photos in the training set.
In an embodiment of the present invention, the AdaIN Resblock module includes a plurality of residual AdaIN modules connected in sequence; each residual AdaIN module comprises a basic AdaIN module and a residual module which are sequentially connected.
In one embodiment of the invention, the output of the SPADE Resblock module is represented as:

A_i^{c,y,x} = \gamma_i^{c,y,x}(M_s) \cdot \frac{F_p^{i,c,y,x} - \mu_c(F_p^i)}{\sigma_c(F_p^i)} + \beta_i^{c,y,x}(M_s)

wherein p denotes a face photo, A_i denotes the output of the SPADE Resblock module corresponding to the i-th (i = 1,2,3) layer depth feature, F_p^i denotes the depth feature corresponding to the face photo p at the i-th (i = 1,2,3) layer, c denotes the channel, (y, x) denotes a position of F_p^i on channel c, F_p^{i,c,y,x} denotes the value of F_p^i at point (y, x) on the c-th channel, M_s denotes the semantic label, \gamma_i^{c,y,x}(M_s) and \beta_i^{c,y,x}(M_s) respectively denote the scale and offset learned from the semantic label M_s on the c-th channel, and \mu_c(F_p^i) and \sigma_c(F_p^i) respectively denote the mean and variance of F_p^i on the c-th channel.
In one embodiment of the invention, the output of the AdaIN Resblock module is represented as:

B_i^{c,y,x} = \sigma_c(F_p^i) \cdot \frac{F_s^{i,c,y,x} - \mu_c(F_s^i)}{\sigma_c(F_s^i)} + \mu_c(F_p^i)

wherein s denotes a sketch, B_i denotes the output of the AdaIN Resblock module corresponding to the i-th (i = 1,2,3) layer depth feature, F_s^i denotes the depth feature corresponding to the sketch s at the i-th (i = 1,2,3) layer, F_s^{i,c,y,x} denotes the value of F_s^i on the c-th channel at the same point (y, x) as in F_p^i, and \mu_c(F_s^i) and \sigma_c(F_s^i) respectively denote the mean and variance of F_s^i on the c-th channel.
In one embodiment of the invention, the gated attention feature fusion module comprises two gating modules and a channel attention module, wherein
the outputs of the SPADE Resblock module and the AdaIN Resblock module are respectively input into the two gating modules;
the outputs of the SPADE Resblock module and the AdaIN Resblock module are summed and then input into the channel attention module;
and the outputs of the two gating modules and the output of the channel attention module are fused.
In one embodiment of the present invention, the fusion of the outputs of the two gating modules and the output of the channel attention module is represented as:

C_i = \mathrm{CA}(A_i + B_i) \odot \left( G_{A_i}(A_i) \odot A_i \right) \oplus \left( 1 - \mathrm{CA}(A_i + B_i) \right) \odot \left( G_{B_i}(B_i) \odot B_i \right)

wherein C_i denotes the output of the gated attention feature fusion module corresponding to the i-th (i = 1,2,3) layer depth feature, CA(·) denotes the channel attention function, G_{A_i}(·) denotes the gating function corresponding to A_i, G_{B_i}(·) denotes the gating function corresponding to B_i, ⊙ denotes element-wise multiplication, and ⊕ denotes element-wise addition.
In one embodiment of the invention, the decoder is an AFF-module-based decoder;
in the decoder, the outputs of the gated attention feature fusion module are up-sampled to features of the same resolution, fused using AFF modules, and decoded for output.
In one embodiment of the present invention, the constructed loss function of the face photo-sketch synthesis network is represented as:

L_{full} = \lambda_1 L_{adversarial} + \lambda_2 L_{cycle} + \lambda_3 L_{perceptual}

wherein L_{full} denotes the loss function of the face photo-sketch synthesis network; L_{adversarial} denotes the generative adversarial loss, L_{cycle} denotes the cycle-consistency loss, L_{perceptual} denotes the perceptual loss, and \lambda_1, \lambda_2, \lambda_3 denote balance parameters.
The invention has the beneficial effects that:
The invention provides a face photo-sketch synthesis method based on two-way conditional normalization, built on a face photo-sketch synthesis network comprising a two-way normalization module and a gated attention feature fusion module. The two branches of the two-way normalization module fully encode texture and spatial information, strengthening the learning of spatial and texture information so that more realistic edge and detail features are preserved. The gated attention feature fusion module screens and fuses the useful information of the two branches, avoiding information redundancy to a certain extent. Together, the two paths improve the quality of the face photo-sketch synthesis results; in a user voting survey, the images synthesized by the embodiment of the invention gave users a better subjective impression and a better user experience.
The present invention will be described in further detail with reference to the accompanying drawings and examples.
Drawings
FIG. 1 is a schematic flow chart of a face photo-sketch portrait synthesis method based on two-way conditional normalization according to an embodiment of the present invention;
FIG. 2 is a schematic structural diagram of a face photo-sketch synthesis network according to an embodiment of the present invention;
FIG. 3 is a schematic structural diagram of a two-way conditional normalization module according to an embodiment of the present invention;
FIG. 4 is a schematic structural diagram of a gated attention feature fusion module according to an embodiment of the present invention;
FIG. 5 is a schematic diagram of the training process of a face photo-sketch synthesis network according to an embodiment of the present invention;
FIG. 6 is a schematic comparison of the sketch synthesis results of the method of the present invention and 6 existing methods on the CUFS, CUFSF and WildSketch datasets respectively;
FIG. 7 is a schematic comparison of the face photo synthesis results of the method of the present invention and 3 existing methods on the CUFS and CUFSF datasets respectively;
FIG. 8 is a schematic comparison of the satisfaction voting results from user surveys of the face photo synthesis results of the method of the present invention and 3 existing methods on the CUFS and CUFSF datasets;
FIG. 9 is a schematic structural diagram of an electronic device according to an embodiment of the present invention.
Detailed Description
The present invention will be described in further detail with reference to specific examples, but the embodiments of the present invention are not limited thereto.
In order to improve the quality of the face photo-sketch synthesis results, referring to fig. 1, an embodiment of the present invention provides a face photo-sketch synthesis method based on two-way conditional normalization, which specifically comprises the following steps:
and S10, acquiring a picture-pixel drawing image pair of the human face to be synthesized.
Specifically, in the embodiment of the present invention, for the face photo-sketch image pair to be synthesized: if a face photo is to be synthesized from a sketch, a reference face photo is randomly selected, and the reference face photo and the sketch form the face photo-sketch image pair to be synthesized; if a sketch is to be synthesized from a face photo, a reference sketch is randomly selected, and the face photo and the reference sketch form the face photo-sketch image pair to be synthesized. The face photo-sketch synthesis method can therefore realize both sketch synthesis and face photo synthesis. The randomly selected reference face photo or reference sketch provides a large amount of texture and spatial prior information, improving the quality of the final face photo-sketch synthesis results.
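For illustration only, this pairing step can be sketched as below; the function name, the file-list inputs and the direction flags are hypothetical and not part of the claimed method.

```python
import random
from typing import List, Tuple
from PIL import Image

def build_pair_to_synthesize(query_path: str, direction: str,
                             reference_paths: List[str]) -> Tuple[Image.Image, Image.Image]:
    """Pair the query image with a randomly selected reference from the
    opposite domain; the reference supplies texture and spatial priors."""
    query = Image.open(query_path).convert("RGB")
    reference = Image.open(random.choice(reference_paths)).convert("RGB")
    if direction == "sketch2photo":      # synthesize a photo from a sketch
        return reference, query          # (reference photo p_ref, sketch s)
    if direction == "photo2sketch":      # synthesize a sketch from a photo
        return query, reference          # (photo p, reference sketch s_ref)
    raise ValueError(f"unknown direction: {direction!r}")
```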
S20, inputting the face photo-sketch image pair to be synthesized into the trained face photo-sketch synthesis network to obtain the synthesis result.
Specifically, referring to fig. 2, the face photo-sketch synthesis network provided in the embodiment of the present invention comprises an encoder, a generator and a semantic segmentation module, where the generator comprises a two-way normalization module, a gated attention feature fusion module and a decoder. The same network structure is used during synthesis and during training, and likewise for sketch synthesis and face photo synthesis; only the inputs and outputs differ, while the processing in the network is similar. The following description therefore does not distinguish between samples to be synthesized and training samples. For each face photo-sketch pair, whether in a training set or to be synthesized, the specific design of each part in fig. 2 is as follows:
the encoder comprises a plurality of convolution layers which are connected in sequence, and multi-scale depth features are extracted through a depth convolution network; specifically, in the encoder, the output corresponding to the last three convolutional layers of the encoder is used as the depth feature. Specifically, the method comprises the following steps:
an input sketch image s is propagated in the forward direction through an encoder composed of a series of convolution layers. Outputting depth features of different resolutions at the last three levels of the encoder as
Figure BDA0003765580460000071
Where i =1,2,3 represents three levels of output characteristics, c i 、h i 、w i Respectively showing the channel number, height and width of the ith layer characteristic diagram.
The input face picture p is subjected to forward propagation through the same encoder (encoder sharing parameter) as that for the sketch image s. Outputting depth features of different resolutions at the last three levels of the encoder as
Figure BDA0003765580460000072
Where i =1,2,3 represents three levels of output characteristics, c i 、h i 、w i Respectively showing the channel number, height and width of the ith layer characteristic diagram.
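Purely as an illustrative sketch (the patent does not fix the number of layers or channel widths, so those below are assumptions), a shared-parameter encoder that returns the last three levels could look like:

```python
import torch
import torch.nn as nn

class Encoder(nn.Module):
    """Stack of strided conv layers; the outputs of the last three levels
    serve as the multi-scale depth features F^1, F^2, F^3."""
    def __init__(self, in_ch: int = 3, widths=(64, 128, 256, 512)):
        super().__init__()
        blocks, prev = [], in_ch
        for w in widths:
            blocks.append(nn.Sequential(
                nn.Conv2d(prev, w, kernel_size=3, stride=2, padding=1),
                nn.InstanceNorm2d(w),
                nn.ReLU(inplace=True)))
            prev = w
        self.levels = nn.ModuleList(blocks)

    def forward(self, x: torch.Tensor):
        feats = []
        for level in self.levels:
            x = level(x)
            feats.append(x)
        return feats[-3:]  # depth features of the last three levels

# The sketch s and the photo p pass through the same (parameter-shared) encoder:
# F_s1, F_s2, F_s3 = encoder(s); F_p1, F_p2, F_p3 = encoder(p)
```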
Here, if a face photo is to be synthesized from a sketch s, the face photo p in fig. 2 is a randomly selected reference face photo p_ref; if a sketch is to be synthesized from a face photo p, the sketch s in fig. 2 is a randomly selected reference sketch s_ref. Likewise, during training, when training the face photo-sketch synthesis network to synthesize face photos from sketches s, the face photo p in fig. 2 is a random reference face photo p_ref from the training set; when training the face photo-sketch synthesis network to synthesize sketches from face photos p, the sketch s in fig. 2 is a random reference sketch s_ref from the training set.
Further, the embodiment of the invention performs two-way normalization on the extracted depth features F_s^i and F_p^i. The two-way normalization operation comprises two conditional normalization branches in total, explicitly decomposing the overall mapping into two independent mappings that respectively strengthen the learning of spatial information and texture information. Referring to fig. 3, the embodiment of the present invention designs a two-way normalization module following this design concept; the two-way normalization module comprises two branches formed by a SPADE Resblock module and an AdaIN Resblock module, wherein
the first, spatial information branch is realized by the SPADE Resblock module, which performs spatial information enhancement according to the semantic labels of the sketches in the training set and the depth features of the face photos in the training set; and the second, texture information branch is realized by the AdaIN Resblock module, which performs texture information enhancement according to the depth features of the sketches and face photos in the training set.
For the first, spatial information branch, the SPADE Resblock module of the embodiment of the present invention can directly use the existing SPADE Resblock module, described in detail in "T. Park, M.-Y. Liu, T.-C. Wang, and J.-Y. Zhu, 'Semantic image synthesis with spatially-adaptive normalization,' in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2019, pp. 2337-2346," and not described again here; the network structure of the SPADE Resblock module is shown in fig. 3. The embodiment of the invention extracts the semantic label M_s of the input sketch s using the semantic segmentation model BiSeNet pre-trained on the CelebA-HQ database, and then inputs F_p^i and M_s into the SPADE Resblock module, where F_p^i is modulated at the channel level by the scale and offset learned from the semantic label M_s, strengthening the learning of spatial information. The output of the final SPADE Resblock module is expressed as:
A_i^{c,y,x} = \gamma_i^{c,y,x}(M_s) \cdot \frac{F_p^{i,c,y,x} - \mu_c(F_p^i)}{\sigma_c(F_p^i)} + \beta_i^{c,y,x}(M_s)    (1)

wherein p denotes a face photo, A_i denotes the output of the SPADE Resblock module corresponding to the i-th (i = 1,2,3) layer depth feature, F_p^i denotes the depth feature corresponding to the face photo p at the i-th (i = 1,2,3) layer, c denotes the channel, (y, x) denotes a position of F_p^i on channel c, F_p^{i,c,y,x} denotes the value of F_p^i at point (y, x) on the c-th channel, M_s denotes the semantic label, \gamma_i^{c,y,x}(M_s) and \beta_i^{c,y,x}(M_s) respectively denote the scale and offset learned from the semantic label M_s on the c-th channel, and \mu_c(F_p^i) and \sigma_c(F_p^i) respectively denote the mean and variance of F_p^i on the c-th channel, expressed as:

\mu_c(F_p^i) = \frac{1}{H_i W_i} \sum_{y=1}^{H_i} \sum_{x=1}^{W_i} F_p^{i,c,y,x}    (2)

\sigma_c(F_p^i) = \sqrt{ \frac{1}{H_i W_i} \sum_{y=1}^{H_i} \sum_{x=1}^{W_i} \left( F_p^{i,c,y,x} - \mu_c(F_p^i) \right)^2 }    (3)

wherein H_i and W_i respectively denote the height and width of F_p^i.
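A minimal PyTorch sketch of the modulation in equation (1) follows; the hidden width of the label branch and the 3×3 kernels are assumptions borrowed from the cited SPADE paper, and the residual (1 + γ) form used there is noted in a comment.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SPADELayer(nn.Module):
    """Normalize F_p per channel and modulate it with scale/offset maps
    predicted from the semantic label M_s, as in equation (1)."""
    def __init__(self, feat_ch: int, label_ch: int, hidden: int = 128):
        super().__init__()
        self.norm = nn.InstanceNorm2d(feat_ch, affine=False)  # (F_p - mu_c) / sigma_c
        self.shared = nn.Sequential(
            nn.Conv2d(label_ch, hidden, 3, padding=1), nn.ReLU(inplace=True))
        self.gamma = nn.Conv2d(hidden, feat_ch, 3, padding=1)  # gamma(M_s)
        self.beta = nn.Conv2d(hidden, feat_ch, 3, padding=1)   # beta(M_s)

    def forward(self, f_p: torch.Tensor, m_s: torch.Tensor) -> torch.Tensor:
        m_s = F.interpolate(m_s, size=f_p.shape[2:], mode="nearest")
        h = self.shared(m_s)
        # equation (1) writes gamma * normalized + beta; the cited SPADE
        # paper uses (1 + gamma) * normalized + beta as a residual variant
        return self.norm(f_p) * self.gamma(h) + self.beta(h)
```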
However, since the semantic label M_s carries only semantic information, using this branch alone inevitably loses fine texture information. Therefore, the embodiment of the invention designs another branch, in which an AdaIN Resblock module realizes the enhancement of texture information.
For the second, texture information branch, the embodiment of the present invention improves on the existing AdaIN module. The conventional AdaIN module is described in detail in "X. Huang and S. Belongie, 'Arbitrary style transfer in real-time with adaptive instance normalization,' in Proceedings of the IEEE International Conference on Computer Vision, 2017, pp. 1501-1510," and is not described again here. Since the AdaIN module is a real-time style transfer module, introducing it makes it possible, by adjusting the mean and variance, to transfer the style information of F_p^i into F_s^i, enhancing the characterization capability for texture.
However, directly using the existing AdaIN module is not suitable for this task, owing to the large modulation gap between face photos and sketches and the insufficient adaptive capability of a parameter-free operation. Therefore, the embodiment of the present invention introduces residual blocks to improve the learning ability. Referring again to fig. 3, the AdaIN Resblock module in the embodiment of the present invention comprises a plurality of sequentially connected residual AdaIN modules; each residual AdaIN module comprises a basic AdaIN module and a residual module connected in sequence. For example, the AdaIN Resblock module in the embodiment of the present invention consists of 9 sequentially connected residual AdaIN modules, each composed of an existing AdaIN module and a residual block; the residual block added after the existing AdaIN module enlarges the receptive field and enhances the adaptive capability of this branch. The output of the final AdaIN Resblock module is represented as:
B_i^{c,y,x} = \sigma_c(F_p^i) \cdot \frac{F_s^{i,c,y,x} - \mu_c(F_s^i)}{\sigma_c(F_s^i)} + \mu_c(F_p^i)    (4)

wherein s denotes a sketch, B_i denotes the output of the AdaIN Resblock module corresponding to the i-th (i = 1,2,3) layer depth feature, F_s^i denotes the depth feature corresponding to the sketch s at the i-th (i = 1,2,3) layer, F_s^{i,c,y,x} denotes the value of F_s^i on the c-th channel at the same point (y, x) as in F_p^i, and \mu_c(F_s^i), \sigma_c(F_s^i) respectively denote the mean and variance of F_s^i on the c-th channel, calculated analogously to those of F_p^i and respectively expressed as:

\mu_c(F_s^i) = \frac{1}{H_i W_i} \sum_{y=1}^{H_i} \sum_{x=1}^{W_i} F_s^{i,c,y,x}    (5)

\sigma_c(F_s^i) = \sqrt{ \frac{1}{H_i W_i} \sum_{y=1}^{H_i} \sum_{x=1}^{W_i} \left( F_s^{i,c,y,x} - \mu_c(F_s^i) \right)^2 }    (6)

wherein equations (5) and (6) use the height H_i and width W_i of F_s^i, which are the same as those of F_p^i.
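The following sketch illustrates equation (4) and one residual AdaIN block; the 3×3 convolutional layout of the residual block is an assumption, since the text only specifies an AdaIN module followed by a residual block.

```python
import torch
import torch.nn as nn

def adain(f_s: torch.Tensor, f_p: torch.Tensor, eps: float = 1e-5) -> torch.Tensor:
    """Equation (4): normalize F_s with its own channel statistics, then
    re-modulate it with the channel statistics of F_p."""
    mu_s = f_s.mean(dim=(2, 3), keepdim=True)
    sigma_s = f_s.std(dim=(2, 3), keepdim=True) + eps
    mu_p = f_p.mean(dim=(2, 3), keepdim=True)
    sigma_p = f_p.std(dim=(2, 3), keepdim=True) + eps
    return sigma_p * (f_s - mu_s) / sigma_s + mu_p

class ResidualAdaIN(nn.Module):
    """One of the 9 sequentially connected residual AdaIN modules:
    a basic AdaIN module followed by a residual block."""
    def __init__(self, ch: int):
        super().__init__()
        self.res = nn.Sequential(
            nn.Conv2d(ch, ch, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(ch, ch, 3, padding=1))

    def forward(self, f_s: torch.Tensor, f_p: torch.Tensor) -> torch.Tensor:
        x = adain(f_s, f_p)
        return x + self.res(x)  # residual block enlarges the receptive field
```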
Further, the embodiment of the present invention provides a gated channel attention fusion module, integrating a gating mechanism and a channel attention mechanism to fuse the two information branches, allowing the network to selectively amplify useful feature channels based on global information while suppressing useless ones. Referring to fig. 4, the gated attention feature fusion module of the embodiment of the present invention comprises two gating modules and a channel attention module, wherein
the outputs of the SPADE Resblock module and the AdaIN Resblock module are respectively input into two gating modules; the outputs of the SPADE Resblock module and the AdaIN Resblock module are superposed and then input into the channel attention module; the outputs of the two gating modules and the output of the channel attention module are fused, and the fusion of the outputs of the two gating modules and the output of the channel attention module is specifically expressed as:
Figure BDA00037655804600001015
wherein A is i And B i Denotes the output of the SPADE Resblock module and the output of the AdaIN Resblock module corresponding to the i-th (i =1,2,3) layer depth feature, respectively, ". Denotes element-by-element multiplication, which is shown in FIG. 3 by
Figure BDA00037655804600001016
Denotes the addition element by element, which is denoted by "+" in FIG. 3
Figure BDA00037655804600001017
Is represented by C i Represents the output of the gated attention feature fusion module corresponding to the i-th (i =1,2,3) layer depth feature, CA (-) represents the channel attention function,
Figure BDA00037655804600001018
is represented by A i The corresponding gating function is set to be,
Figure BDA0003765580460000111
is represented by B i The corresponding gating function. Gating module input A in FIG. 3 i Corresponding output
Figure BDA0003765580460000112
Gating module input B i Corresponding output
Figure BDA0003765580460000113
In FIG. 3, α is CA (A) i +B i )。
Here, the gating functions G_{A_i}(·) and G_{B_i}(·) are respectively expressed as:

G_{A_i}(A_i) = \sigma(\mathrm{Conv}(A_i))    (8)

G_{B_i}(B_i) = \sigma(\mathrm{Conv}(B_i))    (9)

where Conv denotes a 1×1 convolution operation and σ denotes the sigmoid function.

The channel attention function CA(·) is expressed as:

\mathrm{CA}(A_i + B_i) = \sigma(\mathrm{Conv}_2(\delta(\mathrm{Conv}_1(A_i + B_i))))    (10)

wherein Conv_1 and Conv_2 each denote a 1×1 convolution operation and δ denotes the rectified linear unit (ReLU).
The gating functions of the embodiment of the invention judge the importance of each feature vector in the feature map and effectively control the information flow, while the channel attention explicitly models the interdependencies between channels and filters the information flow globally, so that the network is allowed to selectively amplify useful feature channels based on global information while suppressing useless feature channels.
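A sketch of the fusion of equations (7)-(10) is given below; note that the complementary (1 − α) weighting mirrors the reconstruction of equation (7) above, and the global average pooling before the two 1×1 convolutions of CA(·) is an assumption common to channel attention designs, not stated by equation (10).

```python
import torch
import torch.nn as nn

class GatedAttentionFusion(nn.Module):
    """Fuse the spatial branch output A_i and the texture branch output B_i
    with two sigmoid gates (eqs. (8)-(9)) and channel attention (eq. (10))."""
    def __init__(self, ch: int, reduction: int = 4):
        super().__init__()
        self.gate_a = nn.Sequential(nn.Conv2d(ch, ch, 1), nn.Sigmoid())  # G_A
        self.gate_b = nn.Sequential(nn.Conv2d(ch, ch, 1), nn.Sigmoid())  # G_B
        self.ca = nn.Sequential(            # CA(.): assumed pooled variant
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(ch, ch // reduction, 1), nn.ReLU(inplace=True),
            nn.Conv2d(ch // reduction, ch, 1), nn.Sigmoid())

    def forward(self, a: torch.Tensor, b: torch.Tensor) -> torch.Tensor:
        alpha = self.ca(a + b)              # alpha = CA(A_i + B_i)
        gated_a = self.gate_a(a) * a        # G_A(A_i) elementwise* A_i
        gated_b = self.gate_b(b) * b        # G_B(B_i) elementwise* B_i
        return alpha * gated_a + (1 - alpha) * gated_b  # eq. (7)
```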
Further, referring again to fig. 2, the decoder of the embodiment of the present invention is based on AFF modules; in the decoder, the outputs of the gated attention feature fusion module are up-sampled to features of the same resolution, fused with AFF modules, and decoded for output. Specifically:
The outputs of the three levels of the gated attention feature fusion module are integrated using an attentional feature fusion (AFF) based decoder to generate the final synthesis result. The AFF module is a conventional attentional feature fusion module, described in detail in "Y. Dai, F. Gieseke, S. Oehmcke, Y. Wu, and K. Barnard, 'Attentional feature fusion,' in Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, 2021, pp. 3560-3569," and not described again here. The embodiment of the invention up-samples the three feature maps obtained by the gated attention feature fusion module to the same resolution, performs pairwise feature fusion with AFF modules, and decodes to generate the final synthesis result, which is either a face photo \hat{p} or a sketch \hat{s}.
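The decoding step can be sketched as follows; `aff_factory` stands in for the AFF module of the cited paper (a hypothetical constructor here), and the channel widths are assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class AFFDecoder(nn.Module):
    """Up-sample the three fused maps C_1..C_3 to a common resolution,
    merge them pairwise with AFF modules, then decode to an image."""
    def __init__(self, aff_factory, chans=(512, 256, 128), out_ch: int = 3):
        super().__init__()
        self.proj = nn.ModuleList([nn.Conv2d(c, chans[-1], 1) for c in chans])
        self.aff_12 = aff_factory(chans[-1])  # fuse C_1' with C_2'
        self.aff_23 = aff_factory(chans[-1])  # fuse the result with C_3'
        self.head = nn.Sequential(
            nn.Conv2d(chans[-1], out_ch, 3, padding=1), nn.Tanh())

    def forward(self, feats):
        target = feats[-1].shape[2:]          # highest of the three resolutions
        f = [proj(F.interpolate(x, size=target, mode="bilinear",
                                align_corners=False))
             for proj, x in zip(self.proj, feats)]
        y = self.aff_12(f[0], f[1])
        y = self.aff_23(y, f[2])
        return self.head(y)                   # synthesized photo or sketch
```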
Furthermore, the trained face photo-sketch synthesis network used in the synthesis process of the embodiment of the invention is obtained by training on a face photo-sketch pair training set: M face photos are selected from a face photo dataset to form a face photo training set, and the M face sketches corresponding to these M face photos are selected from a face sketch dataset, together forming M face photo-sketch pairs that serve as the face photo-sketch pair training set.
Referring to fig. 5, the detailed training process includes the following steps:
s201, an encoder encodes a face photo-sketch portrait to a training set and outputs depth characteristics;
s202, a semantic segmentation module extracts semantic labels of sketch portraits in a training set;
s203, the two-way normalization module performs reinforcement according to the semantic tags and the depth features and the spatial information branch and the texture information branch;
s204, the gated attention feature fusion module fuses output results of the spatial information branch and the texture information branch;
s205, decoding the fusion result by a decoder to output a synthetic result corresponding to the training set by the face photo-sketch portrait;
the implementation of S201-S205 is described with reference to the detailed design of each part above.
S206, constructing a loss function of the human face photo-sketch synthesis network according to the human face photo-sketch pair training set and the corresponding synthesis result.
Specifically, the loss function of the face photo-sketch synthesis network constructed by the embodiment of the invention is expressed as:

L_{full} = \lambda_1 L_{adversarial} + \lambda_2 L_{cycle} + \lambda_3 L_{perceptual}    (11)

wherein L_{full} denotes the loss function of the face photo-sketch synthesis network; L_{adversarial} denotes the generative adversarial loss, L_{cycle} denotes the cycle-consistency loss, L_{perceptual} denotes the perceptual loss, and \lambda_1, \lambda_2, \lambda_3 denote balance parameters used to balance the different loss terms; in the embodiment of the invention, \lambda_1 = 1, \lambda_2 = 0.5, \lambda_3 = 0.5. The loss function of the embodiment of the invention thus consists of three parts:
generating the confrontation loss: generating the antagonistic loss aims to guide the generator to obtain a more realistic generated result. When the generation of the damage to the resistance is calculated, a discriminator is connected after the generation, and the discriminator in the CycleGAN in the structural references of J. -Y.Zhu, T.park, P.Isola, and A.A.Efrons, "Unperared image-to-image transformation using cycle-dependent adaptive networks," in Proceedings of the IEEE international conference on computer vision,2017, pp.2223-2232 "is trained. The resulting opposing losses for the generator and the arbiter are expressed as:
L adversarial =E p~Pdata(p) [(D(p)) 2 ]+E s~Pdata(s) [(1-D(G(s,p ref ,M(s)))) 2 ] (12)
wherein E (-) represents the expected value of the distribution function, E p~Pdata(p) (. E) represents the expectation of correspondence of the face photograph p in the face photograph dataset, E s~Pdata(s) (. Table)Showing the corresponding expectation of sketch portrait s in sketch portrait data set, D (-) represents a discriminator, D (p) represents a discrimination result obtained by inputting face picture p into the discriminator, M (-) represents a face semantic segmentation network BiSeNet, M(s) represents a semantic label result obtained by inputting sketch portrait s into the semantic segmentation network, G (-) represents a generator for generating face picture by sketch portrait s, G (s, p) represents a semantic label result obtained by inputting sketch portrait s into the semantic segmentation network ref M (s)) represents a sketch image s and a reference sample picture p ref The semantic label M(s) is input into the synthesis result obtained in the generator, i.e. the photo generated in FIG. 2
Figure BDA0003765580460000131
D(G(s,p ref M (s))) represents the generation of a photograph
Figure BDA0003765580460000132
The discrimination result obtained in the discriminator is input.
Cycle-consistency loss: given a sketch s and its semantic label M(s), after the cyclic translations S → P and P → S between the S and P domains, the sketch s should be mapped back to its original domain. The corresponding cycle-consistency loss is expressed as:

L_{cycle} = \mathbb{E}_{s \sim P_{data}(s)}\left[ \left\| F(G(s, p_{ref}, M(s)), s_{ref}, M(G(s, p_{ref}, M(s)))) - s \right\|_1 \right]    (13)

wherein s_{ref} and p_{ref} denote reference sample images of the S and P domains respectively, G(s, p_{ref}, M(s)) denotes the photo \hat{p} generated by the generator, M(G(s, p_{ref}, M(s))) denotes the semantic label obtained by inputting the generated photo \hat{p} into the semantic segmentation network, F(·) denotes the generator that generates a sketch from a face photo, F(G(s, p_{ref}, M(s)), s_{ref}, M(G(s, p_{ref}, M(s)))) denotes the result of inputting the generated photo \hat{p}, the reference sample sketch s_{ref} and the semantic label M(G(s, p_{ref}, M(s))) into the generator F(·), which has the same network structure as G(·), and ||·||_1 denotes the L1 norm.
Perceptual loss: the perceptual loss is introduced so that generated photos resemble real photos at the semantic feature level. It is designed with a pre-trained VGG-19 model: the real photo and the generated photo are input into the pre-trained VGG-19 model, and the corresponding perceptual loss is expressed as:

L_{perceptual} = \sum_j \frac{1}{C_j H_j W_j} \left\| \phi_j(\hat{p}) - \phi_j(p) \right\|_2^2    (14)

wherein \phi_j(·) denotes the output feature map of the j-th layer of the pre-trained VGG-19 model, and C_j, H_j and W_j respectively denote the number of channels, the height and the width of the j-th layer output feature map.
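A sketch of the total objective of equations (11)-(14), using λ1 = 1, λ2 = λ3 = 0.5 as stated above; `vgg_features` is an assumed helper returning the selected VGG-19 layer outputs.

```python
import torch
import torch.nn.functional as F

def total_loss(D, fake_p, cycled_s, s, real_p, vgg_features,
               lambdas=(1.0, 0.5, 0.5)) -> torch.Tensor:
    """L_full = l1*L_adversarial + l2*L_cycle + l3*L_perceptual (eq. (11)).
    fake_p = G(s, p_ref, M(s)); cycled_s = F(fake_p, s_ref, M(fake_p))."""
    # generator-side adversarial term of eq. (12)
    l_adv = ((1 - D(fake_p)) ** 2).mean()
    # eq. (13): the cycled sketch must reproduce the input sketch (L1 norm)
    l_cyc = F.l1_loss(cycled_s, s)
    # eq. (14): feature-level distance on pre-trained VGG-19 layers
    l_per = fake_p.new_zeros(())
    for phi_fake, phi_real in zip(vgg_features(fake_p), vgg_features(real_p)):
        l_per = l_per + F.mse_loss(phi_fake, phi_real)  # 1/(C_j H_j W_j) folded into the mean
    return lambdas[0] * l_adv + lambdas[1] * l_cyc + lambdas[2] * l_per
```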
Further, the parameters of the face photo-sketch synthesis network are updated according to the loss function, and training continues until the iteration stop condition is met; then:
S207, the current face photo-sketch synthesis network is output as the trained face photo-sketch synthesis network.
Throughout the iterative computation, the parameters can be updated with a gradient descent algorithm until the face photo-sketch synthesis network model converges, although the method is not limited to the gradient descent algorithm.
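An illustrative single update step under the above loss; the Adam-style optimizers, the generator interface and the LSGAN-style discriminator targets are assumptions (the text only requires a gradient descent algorithm and the CycleGAN discriminator structure).

```python
import torch

def train_step(batch, generator, discriminator, opt_g, opt_d, loss_fn):
    """One parameter update of the face photo-sketch synthesis network."""
    s, p_ref, real_p = batch["sketch"], batch["ref_photo"], batch["photo"]

    # generator update: minimize L_full of eq. (11)
    opt_g.zero_grad()
    fake_p, cycled_s = generator(s, p_ref)     # forward pass + cycle pass
    g_loss = loss_fn(discriminator, fake_p, cycled_s, s, real_p)
    g_loss.backward()
    opt_g.step()

    # discriminator update with least-squares targets (assumed)
    opt_d.zero_grad()
    d_loss = ((discriminator(real_p) - 1) ** 2).mean() \
             + (discriminator(fake_p.detach()) ** 2).mean()
    d_loss.backward()
    opt_d.step()
    return g_loss.item(), d_loss.item()
```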
It should be noted that, in the embodiment of the present invention, whether sketches or face photos are to be synthesized, the above training process can be used to train the corresponding face photo-sketch synthesis network model; because the training process selects a corresponding reference face photo or reference sketch, it provides a large amount of texture and spatial prior information, and the trained face photo-sketch synthesis network model can better synthesize sketches or face photos.
In order to verify the effectiveness of the face photo-sketch synthesis method based on two-way conditional normalization provided by the embodiment of the invention, the following experiments were performed.
1. Simulation conditions
The embodiment of the invention uses the PyTorch framework for simulation on an Intel(R) Xeon(R) Gold 6226R 2.90GHz CPU, an NVIDIA GeForce RTX 3090 GPU, and the Ubuntu 16.04 operating system. Training is performed on the CUFS, CUFSF, and WildSketch datasets respectively.
The methods compared in the experiment were as follows:
one is FCN, referred to as "L.Zhang, L.Lin, X.Wu, S.Ding, and L.Zhang," End-to-End photo-deletion generation via complete collaborative presentation learning, "in Proceedings of the 5th ACM on International Conference on Multimedia retrieval,2015, pp.627-634." which proposes an End-to-End full convolution network to directly learn the mapping of a picture of a human face to a portrait. However, the network is too shallow to dig out deep semantic information.
Second is pix2pix, referenced as "p.isola, j. -y.zhu, t.zhou, and a.a.efrost," Image-to-Image transformation with conditional adaptation network, "in Proceedings of the IEEE connection on computer vision and pattern recognition,2017, pp.1125-1134," which uses the condition GAN (cGAN) as a unified solution for Image-to-Image transformation on paired datasets.
Thirdly, cycleGAN, referenced as "j. -y.zhu, t.park, p.isola, and a.a.efros," unknown image-to-image transformation using cycle-dependent adaptive networks, "in Proceedings of the IEEE international conference on computer vision,2017, pp.2223-2232", which proposes a CycleGAN network framework that can form a universal mapping from domain a to domain B, learn how to transform between two domains, rather than being tied to a specific picture transformation, that can be trained using Unpaired datasets, with strong adaptability.
Fourth, a field human face sketch synthesis method based on Semi-supervised learning, which is denoted as Wild in the experiment, and the reference is "c.chen, w.liu, x.tan, and k. -y.k.wong," Semi-supervised learning for face sketch synthesis in the world, "in assistant Conference on Computer vision, spring, 2018, pp.216-231", and the method extends photo-picture pairs by constructing pseudo-picture features of additional training photos.
Fifth, SCAGAN, referenced as "j.yu, x.xu, f.gao, s.shi, m.wang, d.tao, and q.huang," heated responsive face photo-skin synthesis vision composition-aided gates, "IEEE transactions on cybernetics, vol.51, no.9, pp.4350-4362,2020", proposes using facial composition information as a supplemental input and introducing site loss to focus training on a specific site.
Sixthly, PANET, the reference is "L.Nie, L.Liu, Z.Wu, and W.kang," Unconstrained face mask synthesis via prediction-adaptive network and a new benchmark, "neuro-rendering, vol.494, pp.192-202,2022", and the method provides face sketch synthesis based on perception adaptive network under the condition of constraint and no constraint.
2. Simulation content
Partial face photo-sketch pairs are selected from the CUFS, CUFSF and WildSketch datasets as the face photo-sketch image pairs to be synthesized; the sketch synthesis results of the method of the invention and the 6 existing methods are shown in fig. 6. Specifically: row 1 of fig. 6 shows sketch synthesis results on the cuhk sub-dataset of the CUFS dataset, rows 2 and 3 on the ar and xm2vts sub-datasets of the CUFS dataset, and rows 4 and 5 on the CUFSF and WildSketch datasets; row 1 of fig. 7 shows face photo synthesis results on the cuhk sub-dataset of the CUFS dataset, row 2 on the ar sub-dataset of the CUFS dataset, row 3 on the xm2vts sub-dataset of the CUFS dataset, and row 4 on the CUFSF dataset. In fig. 6 and 7, the rightmost Ground Truth column is the target image corresponding to the leftmost Test Photo; the closer a method's synthesis result is to the Ground Truth, the better its synthesis effect. In fig. 6 and 7, the second column from the right shows the synthesis results of the method of the invention.
As can be seen from fig. 6 and fig. 7, the synthesis results of the method of the embodiment of the invention restore texture and structure information well, retain more realistic edge and detail features, and better overcome the problems of blurring and artifacts.
Meanwhile, 3 existing methods and the method of the invention (denoted DCNP in fig. 8) were selected for a user survey of the face photo synthesis results on the CUFS and WildSketch datasets. 14 groups of questionnaires were set in total (7 groups on the CUFS dataset and 7 groups on the WildSketch dataset), and users voted for the most satisfactory result in each group. 700 votes from 50 users were collected; the specific share of votes for each method is shown in fig. 8.
In summary, the face photo-sketch synthesis method based on two-way conditional normalization provided by the embodiment of the invention builds a face photo-sketch synthesis network comprising a two-way normalization module and a gated attention feature fusion module. The branches of the two-way normalization module fully encode texture and spatial information, strengthening the learning of spatial and texture information so that more realistic edge and detail features are retained; the gated attention feature fusion module screens and fuses the useful information of the two branches, avoiding information redundancy to a certain extent. Together, the two paths improve the quality of the face photo-sketch synthesis results, and according to the user survey, the images synthesized by the embodiment of the invention give users a better subjective impression and a better user experience.
Meanwhile, the embodiment of the invention additionally introduces reference sample images (a reference face photo p_ref or a reference sketch s_ref) during synthesis and training to provide a large amount of texture and spatial prior information, further improving the quality of the face photo-sketch synthesis results.
Referring to fig. 9, an embodiment of the present invention provides an electronic device, which includes a processor 901, a communication interface 902, a memory 903, and a communication bus 904, where the processor 901, the communication interface 902, and the memory 903 complete mutual communication through the communication bus 904;
a memory 903 for storing computer programs;
the processor 901 is configured to implement the steps of the above-mentioned method for synthesizing a human face photo-sketch image based on two-way condition normalization when executing the program stored in the memory 903.
The embodiment of the invention provides a computer-readable storage medium, wherein a computer program is stored in the computer-readable storage medium, and when the computer program is executed by a processor, the steps of the human face photo-sketch portrait synthesis method based on two-way condition normalization are realized.
For the electronic device/storage medium embodiment, since it is basically similar to the method embodiment, the description is simple, and for the relevant points, refer to the partial description of the method embodiment.
In the description of the present invention, it is to be understood that the terms "first", "second" and the like are used for descriptive purposes only and are not to be construed as indicating or implying relative importance or implying any number of technical features indicated. Thus, a feature defined as "first" or "second" may explicitly or implicitly include one or more of that feature. In the description of the present invention, "a plurality" means two or more unless specifically defined otherwise.
While the present application has been described in connection with various embodiments, other variations to the disclosed embodiments can be understood and effected by those skilled in the art in practicing the claimed application, from a review of the drawings, the disclosure, and the appended claims. In the claims, the word "comprising" does not exclude other elements or steps, and the word "a" or "an" does not exclude a plurality. A single processor or other unit may fulfill the functions of several items recited in the claims. The mere fact that certain measures are recited in mutually different dependent claims does not indicate that a combination of these measures cannot be used to advantage.
The foregoing is a more detailed description of the invention in connection with specific preferred embodiments and it is not intended that the invention be limited to these specific details. For those skilled in the art to which the invention pertains, several simple deductions or substitutions can be made without departing from the spirit of the invention, and all shall be considered as belonging to the protection scope of the invention.

Claims (10)

1. A face photo-sketch portrait synthesis method based on two-way conditional normalization, characterized by comprising the following steps:
acquiring a face photo-sketch image pair to be synthesized;
inputting the face photo-sketch image pair to be synthesized into a trained face photo-sketch synthesis network to obtain a synthesis result;
wherein the face photo-sketch synthesis network comprises an encoder, a generator and a semantic segmentation module, the generator comprising a two-way normalization module, a gated attention feature fusion module and a decoder; the trained face photo-sketch synthesis network is obtained by training on a face photo-sketch pair training set; the corresponding training process comprises the following steps:
the encoder encodes the face photo-sketch pair training set and outputs depth features; the semantic segmentation module extracts the semantic labels of the sketches in the training set; the two-way normalization module performs enhancement along a spatial information branch and a texture information branch according to the semantic labels and the depth features; the gated attention feature fusion module fuses the outputs of the spatial information branch and the texture information branch; the decoder decodes the fusion result and outputs the synthesis results corresponding to the face photo-sketch pair training set; a loss function of the face photo-sketch synthesis network is constructed from the face photo-sketch pair training set and the corresponding synthesis results; and the parameters of the face photo-sketch synthesis network are updated according to the loss function and training continues until an iteration stop condition is met, obtaining the trained face photo-sketch synthesis network.
2. The method of claim 1, wherein the encoder comprises a plurality of sequentially connected convolutional layers;
in the encoder, the outputs of the last three convolutional layers of the encoder are used as the depth features.
3. The method of claim 1, wherein the two-way normalization module comprises a SPADE Resblock module and an AdaIN Resblock module, wherein
the SPADE Resblock module performs spatial information enhancement according to the semantic labels of the sketches in the training set and the depth features of the face photos in the training set;
and the AdaIN Resblock module performs texture information enhancement according to the depth features of the sketches and face photos in the training set.
4. The method of claim 3, wherein the AdaIN Resblock module comprises a plurality of sequentially connected residual AdaIN modules; each residual AdaIN module comprises a basic AdaIN module and a residual module connected in sequence.
5. The method of claim 3, wherein the output of the SPADE Resblock module is represented as:
$$A_i^{c,y,x} = \gamma_i^{c,y,x}(M_s)\,\frac{F_{p,i}^{c,y,x}-\mu_i^c(F_{p,i})}{\sigma_i^c(F_{p,i})}+\beta_i^{c,y,x}(M_s)$$
wherein $p$ denotes the face photo; $A_i$ denotes the output of the SPADE Resblock module corresponding to the $i$-th ($i=1,2,3$) layer depth feature; $F_{p,i}$ denotes the depth feature corresponding to the face photo $p$ in the $i$-th layer; $c$ indexes the channel and $(y,x)$ a position of $F_{p,i}$ on channel $c$, so that $F_{p,i}^{c,y,x}$ denotes the value of $F_{p,i}$ at point $(y,x)$ on the $c$-th channel; $M_s$ denotes the semantic label; $\gamma_i^{c,y,x}(M_s)$ and $\beta_i^{c,y,x}(M_s)$ respectively denote the scale and offset learned from the semantic label $M_s$ on the $c$-th channel; and $\mu_i^c(F_{p,i})$ and $\sigma_i^c(F_{p,i})$ respectively denote the mean and standard deviation of $F_{p,i}$ on the $c$-th channel.
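[Editor's note] A compact PyTorch reading of this modulation, in the spirit of SPADE; the hidden width, kernel sizes, and nearest-neighbour resizing of the label map are assumptions, not the patent's implementation:

```python
import torch.nn as nn
import torch.nn.functional as F

class SPADENorm(nn.Module):
    """Normalizes the photo feature per channel, then applies a scale gamma and
    offset beta predicted spatially from the semantic label M_s (claim-5 form)."""
    def __init__(self, feat_ch, label_ch, hidden=128, eps=1e-5):
        super().__init__()
        self.eps = eps
        self.shared = nn.Sequential(
            nn.Conv2d(label_ch, hidden, 3, padding=1), nn.ReLU(inplace=True))
        self.gamma = nn.Conv2d(hidden, feat_ch, 3, padding=1)
        self.beta = nn.Conv2d(hidden, feat_ch, 3, padding=1)

    def forward(self, f_p, m_s):
        # mu_c and sigma_c of the photo feature, per channel over (y, x)
        mu = f_p.mean(dim=(2, 3), keepdim=True)
        sigma = f_p.std(dim=(2, 3), keepdim=True) + self.eps
        # Bring the semantic label to the feature resolution
        m = F.interpolate(m_s, size=f_p.shape[2:], mode='nearest')
        h = self.shared(m)
        # A_i = gamma(M_s) * (F_p - mu) / sigma + beta(M_s), element-wise
        return self.gamma(h) * (f_p - mu) / sigma + self.beta(h)
```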
6. The face photo-sketch portrait synthesis method based on two-way condition normalization of claim 5, wherein the output of the AdaIN Resblock module is represented as:
$$B_i^{c,y,x} = \sigma_i^c(F_{p,i})\,\frac{F_{s,i}^{c,y,x}-\mu_i^c(F_{s,i})}{\sigma_i^c(F_{s,i})}+\mu_i^c(F_{p,i})$$
wherein $s$ denotes the sketch portrait; $B_i$ denotes the output of the AdaIN Resblock module corresponding to the $i$-th ($i=1,2,3$) layer depth feature; $F_{s,i}$ denotes the depth feature corresponding to the sketch portrait $s$ in the $i$-th layer; $F_{s,i}^{c,y,x}$ denotes the value of $F_{s,i}$ on the $c$-th channel at the same point $(y,x)$ as $F_{p,i}$; and $\mu_i^c(F_{s,i})$ and $\sigma_i^c(F_{s,i})$ respectively denote the mean and standard deviation of $F_{s,i}$ on the $c$-th channel.
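[Editor's note] The same statistics written out as a function with the claim's symbols; the pairing of sketch values with photo statistics follows the reconstruction above and should be read as an assumption:

```python
def adain_branch(f_s, f_p, eps=1e-5):
    """B_i = sigma_c(F_p) * (F_s - mu_c(F_s)) / sigma_c(F_s) + mu_c(F_p),
    per channel c over spatial positions (y, x)."""
    mu_s = f_s.mean(dim=(2, 3), keepdim=True)        # mu_c(F_s)
    sd_s = f_s.std(dim=(2, 3), keepdim=True) + eps   # sigma_c(F_s)
    mu_p = f_p.mean(dim=(2, 3), keepdim=True)        # mu_c(F_p)
    sd_p = f_p.std(dim=(2, 3), keepdim=True) + eps   # sigma_c(F_p)
    return sd_p * (f_s - mu_s) / sd_s + mu_p
```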
7. The method of claim 6, wherein the gated attention feature fusion module comprises two gating modules and a channel attention module, wherein
the outputs of the SPADE Resblock module and the AdaIN Resblock module are respectively input into the two gating modules;
the outputs of the SPADE Resblock module and the AdaIN Resblock module are superposed and then input into the channel attention module;
and the outputs of the two gating modules and the output of the channel attention module are fused.
8. The method of claim 7, wherein the outputs of the two gating modules and the output of the channel attention module are fused as:
$$C_i = g_{A_i}\odot A_i + g_{B_i}\odot B_i + \mathrm{CA}(A_i + B_i)$$
wherein $C_i$ denotes the output of the gated attention feature fusion module corresponding to the $i$-th ($i=1,2,3$) layer depth feature; $\mathrm{CA}(\cdot)$ denotes the channel attention function; $g_{A_i}$ denotes the gating function corresponding to $A_i$; $g_{B_i}$ denotes the gating function corresponding to $B_i$; and $\odot$ denotes element-wise multiplication.
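[Editor's note] One plausible PyTorch reading of this fusion; the sigmoid gates, the squeeze-and-excitation-style channel attention, and the additive combination are all assumptions layered on the reconstructed formula:

```python
import torch.nn as nn

class GatedAttentionFusion(nn.Module):
    """C_i = g_A (*) A_i + g_B (*) B_i + CA(A_i + B_i), with learned sigmoid gates."""
    def __init__(self, ch, reduction=16):
        super().__init__()
        self.gate_a = nn.Sequential(nn.Conv2d(ch, ch, 3, padding=1), nn.Sigmoid())
        self.gate_b = nn.Sequential(nn.Conv2d(ch, ch, 3, padding=1), nn.Sigmoid())
        # Squeeze-and-excitation style channel attention over the superposition
        self.ca = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(ch, ch // reduction, 1), nn.ReLU(inplace=True),
            nn.Conv2d(ch // reduction, ch, 1), nn.Sigmoid())

    def forward(self, a, b):
        s = a + b  # superposition of the SPADE and AdaIN branch outputs
        return self.gate_a(a) * a + self.gate_b(b) * b + self.ca(s) * s
```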
9. The face photo-sketch portrait synthesis method based on two-way condition normalization of claim 1, wherein the decoder is an AFF module based decoder;
in the decoder, the outputs of the gated attention feature fusion module are up-sampled to features of the same resolution, fused by an AFF module, and then decoded for output.
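[Editor's note] A sketch of such a decoder; the AFF block below is a simplified attention-weighted blend standing in for the published AFF module, and all channel counts and the single-channel output are assumptions:

```python
import torch.nn as nn
import torch.nn.functional as F

class AFFBlock(nn.Module):
    """Attention-weighted blend of two same-shaped features (simplified AFF)."""
    def __init__(self, ch, reduction=4):
        super().__init__()
        self.att = nn.Sequential(
            nn.Conv2d(ch, ch // reduction, 1), nn.ReLU(inplace=True),
            nn.Conv2d(ch // reduction, ch, 1), nn.Sigmoid())

    def forward(self, x, y):
        w = self.att(x + y)
        return w * x + (1 - w) * y

class AFFDecoder(nn.Module):
    """Upsamples the fused multi-scale features to one resolution, merges them
    with AFF blocks, then decodes to a single-channel sketch."""
    def __init__(self, chs=(512, 256, 128)):
        super().__init__()
        self.proj = nn.ModuleList(nn.Conv2d(c, chs[-1], 1) for c in chs)
        self.aff1 = AFFBlock(chs[-1])
        self.aff2 = AFFBlock(chs[-1])
        self.out = nn.Sequential(
            nn.Conv2d(chs[-1], 64, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(64, 1, 3, padding=1), nn.Tanh())

    def forward(self, feats):
        # feats: coarse-to-fine outputs of the gated attention feature fusion module
        size = feats[-1].shape[2:]
        ups = [F.interpolate(p(f), size=size, mode='bilinear', align_corners=False)
               for p, f in zip(self.proj, feats)]
        x = self.aff1(ups[0], ups[1])
        x = self.aff2(x, ups[2])
        return self.out(x)
```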
10. The face photo-sketch portrait synthesis method based on two-way condition normalization of claim 1, wherein the constructed loss function of the face photo-sketch portrait synthesis network is represented as:
$$L_{full} = \lambda_1 L_{adversarial} + \lambda_2 L_{cycle} + \lambda_3 L_{perceptual}$$
wherein $L_{full}$ denotes the loss function of the face photo-sketch portrait synthesis network; $L_{adversarial}$ denotes the adversarial loss, $L_{cycle}$ denotes the cycle consistency loss, and $L_{perceptual}$ denotes the perceptual loss; $\lambda_1$, $\lambda_2$, $\lambda_3$ denote balance parameters.
CN202210885729.6A 2022-07-26 2022-07-26 Face photo-sketch portrait synthesis method based on two-way condition normalization Pending CN115375596A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210885729.6A CN115375596A (en) 2022-07-26 2022-07-26 Face photo-sketch portrait synthesis method based on two-way condition normalization

Publications (1)

Publication Number Publication Date
CN115375596A true CN115375596A (en) 2022-11-22

Family

ID=84063410

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210885729.6A Pending CN115375596A (en) 2022-07-26 2022-07-26 Face photo-sketch portrait synthesis method based on two-way condition normalization

Country Status (1)

Country Link
CN (1) CN115375596A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117392247A (en) * 2023-09-25 2024-01-12 清华大学 Image video semantic coding and decoding method and device based on sketch

Similar Documents

Publication Publication Date Title
US10593021B1 (en) Motion deblurring using neural network architectures
Li et al. Zero-shot image dehazing
Zhu et al. One shot face swapping on megapixels
Xia et al. Gan inversion: A survey
CN111291212B (en) Zero sample sketch image retrieval method and system based on graph convolution neural network
CN111047548B (en) Attitude transformation data processing method and device, computer equipment and storage medium
CN111079601A (en) Video content description method, system and device based on multi-mode attention mechanism
CN110728219A (en) 3D face generation method based on multi-column multi-scale graph convolution neural network
CN109410135B (en) Anti-learning image defogging and fogging method
CN113780149A (en) Method for efficiently extracting building target of remote sensing image based on attention mechanism
CN112837215B (en) Image shape transformation method based on generation countermeasure network
CN114783034A (en) Facial expression recognition method based on fusion of local sensitive features and global features
CN114339409A (en) Video processing method, video processing device, computer equipment and storage medium
CN111986105A (en) Video time sequence consistency enhancing method based on time domain denoising mask
Hu et al. Dear-gan: Degradation-aware face restoration with gan prior
CN113129234A (en) Incomplete image fine repairing method based on intra-field and extra-field feature fusion
Chen et al. MICU: Image super-resolution via multi-level information compensation and U-net
CN115375596A (en) Face photo-sketch portrait synthesis method based on two-way condition normalization
Zhou et al. A superior image inpainting scheme using Transformer-based self-supervised attention GAN model
CN116523985B (en) Structure and texture feature guided double-encoder image restoration method
Zhou et al. Cloud removal for optical remote sensing imagery using distortion coding network combined with compound loss functions
CN112686830A (en) Super-resolution method of single depth map based on image decomposition
Liu et al. Diverse Hyperspectral Remote Sensing Image Synthesis With Diffusion Models
CN116975347A (en) Image generation model training method and related device
Huang et al. DSRD: deep sparse representation with learnable dictionary for remotely sensed image denoising

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination