CN115375596A - Face photo-sketch portrait synthesis method based on two-way condition normalization - Google Patents
- Publication number: CN115375596A
- Application number: CN202210885729.6A
- Authority
- CN
- China
- Prior art keywords
- module
- sketch
- face photo
- human face
- synthesis
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T5/00—Image enhancement or restoration
- G06T5/50—Image enhancement or restoration using two or more images, e.g. averaging or subtraction
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/77—Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
- G06V10/774—Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/77—Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
- G06V10/80—Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level
- G06V10/806—Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level of extracted features
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/82—Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/20—Special algorithmic details
- G06T2207/20081—Training; Learning
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/20—Special algorithmic details
- G06T2207/20084—Artificial neural networks [ANN]
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/20—Special algorithmic details
- G06T2207/20212—Image combination
- G06T2207/20221—Image fusion; Image merging
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/30—Subject of image; Context of image processing
- G06T2207/30196—Human being; Person
- G06T2207/30201—Face
Abstract
The invention discloses a face photo-sketch portrait synthesis method based on two-way condition normalization, which comprises the following steps: acquiring a face photo-sketch image pair to be synthesized; and inputting the face photo-sketch image pair to be synthesized into a trained face photo-sketch synthesis network to obtain a synthesis result. The face photo-sketch synthesis network comprises an encoder, a generator and a semantic segmentation module, wherein the generator comprises a two-way normalization module, a gated attention feature fusion module and a decoder. The trained face photo-sketch synthesis network is obtained by training on a face photo-sketch image pair training set. The invention improves the quality of the face photo-sketch synthesis effect.
Description
Technical Field
The invention belongs to the technical field of artificial intelligence and image processing, and particularly relates to a human face photo-sketch portrait synthesis method based on two-way condition normalization.
Background
With the rapid development of the computer vision field, face photo-sketch synthesis technology has gradually become a research hotspot. Face photo-sketch synthesis is the process of converting a face photo into a face sketch image, or a face sketch image into a face photo. Through this conversion, photo information and sketch information originally in different domains can be brought into the same domain. Because manually drawing a face sketch requires a professional artist and a large amount of time, a computer image processing algorithm that automatically synthesizes a face sketch from a face photo has high application value.
Early generation methods were mostly example-based and unidirectional, from photo to sketch. These methods use the idea of nearest-neighbor block matching, but the generated results often contain a large amount of blurring and artifacts. Zhang et al., in "L. Zhang, L. Lin, X. Wu, S. Ding, and L. Zhang, "End-to-end photo-sketch generation via fully convolutional representation learning," in Proceedings of the 5th ACM International Conference on Multimedia Retrieval, 2015, pp. 627-634," propose an end-to-end fully convolutional network (FCN) to directly learn the mapping between a face photo and a sketch. Isola et al., in "P. Isola, J.-Y. Zhu, T. Zhou, and A. A. Efros, "Image-to-image translation with conditional adversarial networks," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2017, pp. 1125-1134," propose a general method named pix2pix for image-to-image translation on paired datasets. Zhu et al., in "J.-Y. Zhu, T. Park, P. Isola, and A. A. Efros, "Unpaired image-to-image translation using cycle-consistent adversarial networks," in Proceedings of the IEEE International Conference on Computer Vision, 2017, pp. 2223-2232," first propose the CycleGAN framework, which learns a general mapping from domain A to domain B, i.e., how to transform between two domains rather than being tied to a specific image pair, and can be trained on unpaired datasets with strong adaptability. Chen et al., in "C. Chen, W. Liu, X. Tan, and K.-Y. K. Wong, "Semi-supervised learning for face sketch synthesis in the wild," in Asian Conference on Computer Vision. Springer, 2018, pp. 216-231," propose a semi-supervised learning method that extends photo-sketch pairs by constructing pseudo sketch features appended to training photos. Yu et al., in "J. Yu, X. Xu, F. Gao, S. Shi, M. Wang, D. Tao, and Q. Huang, "Toward realistic face photo-sketch synthesis via composition-aided GANs," IEEE Transactions on Cybernetics, vol. 51, no. 9, pp. 4350-4362, 2020," propose taking facial composition information as a supplementary input and introducing a compositional loss to concentrate training on specific facial parts. Nie et al., in "L. Nie, L. Liu, Z. Wu, and W. Kang, "Unconstrained face sketch synthesis via perception-adaptive network and a new benchmark," Neurocomputing, vol. 494, pp. 192-202, 2022," propose face sketch synthesis based on a perception-adaptive network under both constrained and unconstrained conditions.
However, the above conventional methods do not encode sufficient texture and spatial information during training, and the visual quality of the generated images shows defects such as blurring, artifacts, a lack of detail, and a missing sketch-stroke feel.
Disclosure of Invention
In order to solve the above problems in the prior art, the present invention provides a method for synthesizing a face photo-sketch portrait based on two-way condition normalization. The technical problem to be solved by the invention is realized by the following technical scheme:
the embodiment of the invention provides a face photo-sketch portrait synthesis method based on two-way condition normalization, which comprises the following steps:
acquiring a face photo-sketch image pair to be synthesized;
inputting the face photo-sketch image pair to be synthesized into a trained face photo-sketch synthesis network to obtain a synthesis result;
the human face photo-sketch image synthesis network comprises an encoder, a generator and a semantic segmentation module, wherein the generator comprises a two-way normalization module, a gated attention feature fusion module and a decoder; the trained human face photo-sketch image synthesis network is obtained by training a training set through the human face photo-sketch image; the corresponding training process comprises the following steps:
the encoder encodes the face photo-sketch image pair training set and outputs depth features; the semantic segmentation module extracts semantic labels of the sketch images in the training set; the two-way normalization module performs enhancement along a spatial information branch and a texture information branch according to the semantic labels and the depth features; the gated attention feature fusion module fuses the output results of the spatial information branch and the texture information branch; the decoder decodes the fusion result and outputs the synthesis result corresponding to the face photo-sketch image pair training set; a loss function of the face photo-sketch synthesis network is constructed from the face photo-sketch image pair training set and the corresponding synthesis results; and parameters of the face photo-sketch synthesis network are updated according to the loss function, and training continues until an iteration stop condition is met, obtaining the trained face photo-sketch synthesis network.
In one embodiment of the invention, the encoder comprises a plurality of sequentially connected convolutional layers;
in the encoder, the output corresponding to the last three convolutional layers of the encoder is taken as the depth feature.
In one embodiment of the invention, the two-way normalization module comprises a SPADE Resblock module and an AdaIN Resblock module, wherein,
the SPADE Resblock module performs spatial information enhancement according to the semantic labels of the sketch images in the training set and the depth features of the face photos in the training set;
and the AdaIN Resblock module performs texture information enhancement according to the depth features of the sketch images and the face photos in the training set.
In an embodiment of the present invention, the AdaIN Resblock module includes a plurality of residual AdaIN modules connected in sequence; each residual AdaIN module comprises a basic AdaIN module and a residual module which are sequentially connected.
In one embodiment of the invention, the output of the SPADE Resblock module is represented as:
$$A_i^{(c,y,x)} = \gamma_{c,y,x}(M_s)\cdot\frac{F_p^{i,(c,y,x)}-\mu_c(F_p^i)}{\sigma_c(F_p^i)}+\beta_{c,y,x}(M_s)$$

where $p$ denotes the face photo; $A_i$ denotes the output of the SPADE Resblock module for the $i$-th ($i=1,2,3$) level depth feature; $F_p^i$ denotes the level-$i$ depth feature of the face photo $p$; $c$ denotes the channel; $(y,x)$ denotes a position on channel $c$; $F_p^{i,(c,y,x)}$ denotes the value of $F_p^i$ at point $(y,x)$ on the $c$-th channel; $M_s$ denotes the semantic label; $\gamma_{c,y,x}(M_s)$ and $\beta_{c,y,x}(M_s)$ denote the scale and offset learned from the semantic label $M_s$ on the $c$-th channel; and $\mu_c(F_p^i)$, $\sigma_c(F_p^i)$ denote the mean and standard deviation of $F_p^i$ on the $c$-th channel.
In one embodiment of the invention, the output of the AdaIN Resblock module is represented as:
$$B_i^{(c,y,x)} = \sigma_c(F_s^i)\cdot\frac{F_p^{i,(c,y,x)}-\mu_c(F_p^i)}{\sigma_c(F_p^i)}+\mu_c(F_s^i)$$

where $s$ denotes the sketch image; $B_i$ denotes the output of the AdaIN Resblock module for the $i$-th ($i=1,2,3$) level depth feature; $F_s^i$ denotes the level-$i$ depth feature of the sketch image $s$; $F_s^{i,(c,y,x)}$ denotes the value of $F_s^i$ on the $c$-th channel at the same point $(y,x)$ as in $F_p^i$; and $\mu_c(F_s^i)$, $\sigma_c(F_s^i)$ denote the mean and standard deviation of $F_s^i$ on the $c$-th channel.
In one embodiment of the invention, the gated attention feature fusion module comprises two gating modules and a channel attention module; wherein,
the outputs of the SPADE Resblock module and the AdaIN Resblock module are respectively input into two gating modules;
the outputs of the SPADE Resblock module and the AdaIN Resblock module are superposed and then input into the channel attention module;
the outputs of the two gating modules and the output of the channel attention module are fused.
In one embodiment of the present invention, the fusion of the outputs of the two gating modules and the output of the channel attention module is represented as:
$$C_i = \mathrm{CA}(A_i + B_i) + G_{A_i}\odot A_i + G_{B_i}\odot B_i$$

where $C_i$ denotes the output of the gated attention feature fusion module for the $i$-th ($i=1,2,3$) level depth feature; $\mathrm{CA}(\cdot)$ denotes the channel attention function; $G_{A_i}$ denotes the gating function corresponding to $A_i$; and $G_{B_i}$ denotes the gating function corresponding to $B_i$.
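To make the fusion concrete, the following sketch (an assumed illustration, not the patent's actual implementation) combines per-branch sigmoid gates with a simple squeeze-style channel attention applied to the superposed branches; the scalar gate weights and the per-channel weight `ca_w` are placeholder parameterizations standing in for the learned gating and attention sub-networks:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def channel_attention(f, ca_w):
    # Squeeze: global average pool each channel; excite: rescale channels by a
    # sigmoid-weighted score (a stand-in for the usual small FC layers).
    pooled = f.mean(axis=(1, 2))               # (C,)
    scale = sigmoid(ca_w * pooled)             # (C,)
    return f * scale[:, None, None]

def gated_attention_fusion(A, B, wA, wB, ca_w):
    # A: spatial-branch output, B: texture-branch output, both (C, H, W).
    gA = sigmoid(wA * A)                       # elementwise gate for branch A
    gB = sigmoid(wB * B)                       # elementwise gate for branch B
    # Fuse the channel-attended superposition with the two gated branches.
    return channel_attention(A + B, ca_w) + gA * A + gB * B
```

The output keeps the input shape, so the fused feature can be passed straight on to the decoder.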
In one embodiment of the invention, the decoder is an AFF-module-based decoder;
in the decoder, the outputs of the gated attention feature fusion module are upsampled to features of the same resolution, fused by an AFF module, and then decoded and output.
In one embodiment of the present invention, the constructed loss function of the face photo-sketch image synthesis network is represented as:
$$L_{full} = \lambda_1 L_{adversarial} + \lambda_2 L_{cycle} + \lambda_3 L_{perceptual};$$

where $L_{full}$ denotes the loss function of the face photo-sketch synthesis network; $L_{adversarial}$ denotes the generative adversarial loss; $L_{cycle}$ denotes the cycle-consistency loss; $L_{perceptual}$ denotes the perceptual loss; and $\lambda_1$, $\lambda_2$, $\lambda_3$ denote balance parameters.
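Numerically the total objective is just a weighted sum of the three terms. In the sketch below the λ values are placeholders, since the patent does not fix the balance parameters:

```python
def total_loss(l_adversarial, l_cycle, l_perceptual,
               lambdas=(1.0, 10.0, 1.0)):
    # L_full = lambda1*L_adversarial + lambda2*L_cycle + lambda3*L_perceptual
    # (lambda2 = 10 mirrors a common CycleGAN weighting; it is an assumption.)
    l1, l2, l3 = lambdas
    return l1 * l_adversarial + l2 * l_cycle + l3 * l_perceptual
```

For example, with losses 1.0, 0.5 and 2.0 and the default weights, the total comes out to 8.0.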
The invention has the beneficial effects that:
the invention provides a human face photo-sketch synthesis method based on two-way condition normalization, which provides a human face photo-sketch synthesis network comprising a two-way normalization module and a gated attention characteristic fusion module, wherein the two-way normalization module and a branch of the two-way normalization module fully encode texture and spatial information, so that the learning of the spatial information and the texture information is enhanced, more real edge and detail characteristics can be reserved, the gated attention characteristic fusion module fully screens and fuses useful information of the two branches, the redundancy of the information is avoided to a certain extent, the two paths jointly improve the quality of the human face photo-sketch synthesis effect, and through user voting investigation, the synthesized image user provided by the embodiment of the invention has better subjective feeling and better user experience.
The present invention will be described in further detail with reference to the accompanying drawings and examples.
Drawings
FIG. 1 is a schematic flow chart of a method for synthesizing a face photo-sketch portrait based on two-way condition normalization according to an embodiment of the present invention;
FIG. 2 is a schematic diagram of a network for synthesizing a photo-sketch of a human face according to an embodiment of the present invention;
FIG. 3 is a schematic structural diagram of a two-way condition normalization module according to an embodiment of the present invention;
FIG. 4 is a schematic structural diagram of a gated attention feature fusion module according to an embodiment of the present invention;
FIG. 5 is a schematic diagram of a training process of a face photo-sketch synthesis network according to an embodiment of the present invention;
FIG. 6 is a schematic diagram comparing the sketch image synthesis results of the method of the present invention and 6 existing methods on the CUFS, CUFSF and WildSketch data sets;
FIG. 7 is a schematic diagram comparing the face photo synthesis results of the method of the present invention and 3 existing methods on the CUFS and CUFSF data sets;
FIG. 8 is a schematic diagram comparing the satisfaction voting results of user surveys on the face photo synthesis results of the method of the present invention and 3 existing methods on the CUFS and CUFSF data sets;
fig. 9 is a schematic structural diagram of an electronic device according to an embodiment of the present invention.
Detailed Description
The present invention will be described in further detail with reference to specific examples, but the embodiments of the present invention are not limited thereto.
In order to improve the quality of the synthesis effect of the face photo-sketch portrait, please refer to fig. 1, an embodiment of the present invention provides a method for synthesizing a face photo-sketch portrait based on two-way condition normalization, which specifically includes the following steps:
and S10, acquiring a picture-pixel drawing image pair of the human face to be synthesized.
Specifically, in the embodiment of the present invention, for the face photo-sketch image pair to be synthesized: if a face photo is to be synthesized from a sketch image, a reference face photo is randomly selected, and the reference face photo and the sketch image form the face photo-sketch image pair to be synthesized; if a sketch image is to be synthesized from a face photo, a reference sketch image is randomly selected, and the face photo and the reference sketch form the face photo-sketch image pair to be synthesized. The method can therefore realize both sketch image synthesis and face photo synthesis. The randomly selected reference face photo or reference sketch provides a large amount of texture and spatial prior information, improving the quality of the final face photo-sketch synthesis result.
S20, inputting the pair of the face photo and the sketch portrait to be synthesized into a trained face photo and sketch portrait synthesis network to obtain a synthesis result.
Specifically, referring to fig. 2, the face photo-sketch synthesis network provided in the embodiment of the present invention comprises an encoder, a generator and a semantic segmentation module, where the generator comprises a two-way normalization module, a gated attention feature fusion module and a decoder. The same network structure is used for both synthesis and training, and for synthesizing sketch images as well as face photos; only the inputs and outputs differ, while the processing inside the network is similar, so the following description does not distinguish between samples to be synthesized and training samples. For each face photo-sketch image pair, whether in a training set or to be synthesized, the specific design of each part in fig. 2 is as follows:
the encoder comprises a plurality of convolution layers which are connected in sequence, and multi-scale depth features are extracted through a depth convolution network; specifically, in the encoder, the output corresponding to the last three convolutional layers of the encoder is used as the depth feature. Specifically, the method comprises the following steps:
an input sketch image s is propagated in the forward direction through an encoder composed of a series of convolution layers. Outputting depth features of different resolutions at the last three levels of the encoder asWhere i =1,2,3 represents three levels of output characteristics, c i 、h i 、w i Respectively showing the channel number, height and width of the ith layer characteristic diagram.
The input face picture p is subjected to forward propagation through the same encoder (encoder sharing parameter) as that for the sketch image s. Outputting depth features of different resolutions at the last three levels of the encoder asWhere i =1,2,3 represents three levels of output characteristics, c i 、h i 、w i Respectively showing the channel number, height and width of the ith layer characteristic diagram.
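The multi-scale extraction can be pictured with a toy sketch in which 2x2 average pooling stands in for the encoder's strided convolutional layers; a real encoder would also widen the channel count $c_i$ per level, which this simplification omits:

```python
import numpy as np

def avgpool2(f):
    # 2x2 average pooling over a (C, H, W) feature map (H, W even).
    c, h, w = f.shape
    return f.reshape(c, h // 2, 2, w // 2, 2).mean(axis=(2, 4))

def encode_multiscale(img, n_levels=3):
    # Collect the outputs of the last `n_levels` stages, mimicking the
    # encoder's depth features F^1, F^2, F^3 at decreasing resolution.
    feats, f = [], img
    for _ in range(n_levels):
        f = avgpool2(f)
        feats.append(f)
    return feats
```

For a 3x32x32 input this yields feature maps of size 16x16, 8x8 and 4x4, one per level.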
Here, if a face photo is to be synthesized from a sketch image s, the face photo p in fig. 2 is a randomly selected reference face photo p_ref; if a sketch image is to be synthesized from a face photo p, the sketch image s in fig. 2 is a randomly selected reference sketch image s_ref. Likewise, during training, when the face photo-sketch synthesis network is trained to synthesize face photos from sketch images s, the face photo p in fig. 2 is a random reference face photo p_ref from the training set; when the network is trained to synthesize sketch images from face photos p, the sketch image s in fig. 2 is a random reference sketch image s_ref from the training set.
Further, the embodiment of the invention performs two-way normalization on the extracted depth features $F_s^i$ and $F_p^i$. The two-way normalization operation comprises two conditional normalization branches in total, explicitly decomposing the overall mapping into two independent mappings to respectively enhance the learning of spatial information and texture information. Referring to fig. 3, the embodiment of the present invention designs a two-way normalization module for this purpose; the two-way normalization module comprises two branches formed by a SPADE Resblock module and an AdaIN Resblock module, wherein,
the first spatial information branch is realized by a SPADE Resblock module, and the SPADE Resblock module reinforces the spatial information according to the semantic labels of the sketch in the training set and the depth characteristics of the face photos in the training set; and the second texture information branch is realized by an AdaIN Resblock module, and the AdaIN Resblock module performs texture information enhancement according to the depth characteristics of the sketch image and the face photo in the training set.
For the first spatial information branch, the SPADE Resblock module in the embodiment of the present invention may directly use the existing SPADE Resblock module, which is described in detail in "T. Park, M.-Y. Liu, T.-C. Wang, and J.-Y. Zhu, "Semantic image synthesis with spatially-adaptive normalization," in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2019, pp. 2337-2346," and is not described again here; its network structure is shown in fig. 3. The invention extracts the semantic label $M_s$ of the input sketch image s using the semantic segmentation model BiSeNet, pre-trained on the CelebA-HQ database, and then inputs $F_p^i$ and $M_s$ into the SPADE Resblock module, where the scale and offset learned from the semantic label $M_s$ modulate $F_p^i$ at the channel level to enhance the learning of spatial information. The output of the final SPADE Resblock module is expressed as:

$$A_i^{(c,y,x)} = \gamma_{c,y,x}(M_s)\cdot\frac{F_p^{i,(c,y,x)}-\mu_c(F_p^i)}{\sigma_c(F_p^i)}+\beta_{c,y,x}(M_s)$$

where $p$ denotes the face photo; $A_i$ denotes the output of the SPADE Resblock module for the $i$-th ($i=1,2,3$) level depth feature; $F_p^i$ denotes the level-$i$ depth feature of the face photo $p$; $c$ denotes the channel; $(y,x)$ denotes a position on channel $c$; $F_p^{i,(c,y,x)}$ denotes the value of $F_p^i$ at point $(y,x)$ on the $c$-th channel; $M_s$ denotes the semantic label; and $\gamma_{c,y,x}(M_s)$, $\beta_{c,y,x}(M_s)$ denote the scale and offset learned from $M_s$ on the $c$-th channel. The channel-wise statistics are

$$\mu_c(F_p^i)=\frac{1}{h_i w_i}\sum_{y,x}F_p^{i,(c,y,x)},\qquad \sigma_c(F_p^i)=\sqrt{\frac{1}{h_i w_i}\sum_{y,x}\left(F_p^{i,(c,y,x)}-\mu_c(F_p^i)\right)^2}.$$
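In essence the SPADE branch normalizes the photo feature per channel and then applies a spatially varying scale and offset. The sketch below assumes the gamma and beta maps have already been predicted from the semantic label $M_s$ by a small conv network, and takes them as inputs directly:

```python
import numpy as np

def spade_modulate(f_p, gamma, beta, eps=1e-5):
    # f_p: (C, H, W) photo depth feature; gamma, beta: (C, H, W) modulation
    # maps assumed to be predicted from the semantic label M_s.
    mu = f_p.mean(axis=(1, 2), keepdims=True)        # per-channel mean
    sigma = f_p.std(axis=(1, 2), keepdims=True)      # per-channel std
    return gamma * (f_p - mu) / (sigma + eps) + beta
```

With gamma = 1 and beta = 0 this reduces to plain per-channel normalization, which is a quick way to sanity-check the statistics.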
However, since the semantic label $M_s$ contains only semantic information, using this branch alone inevitably loses fine texture information. Therefore, the embodiment of the invention also designs another branch, in which an AdaIN Resblock module realizes the enhancement of texture information.
For the second texture information branch, the embodiment of the present invention improves on the existing AdaIN module. The conventional AdaIN module is described in detail in "X. Huang and S. Belongie, "Arbitrary style transfer in real-time with adaptive instance normalization," in Proceedings of the IEEE International Conference on Computer Vision, 2017, pp. 1501-1510," and is not described again here. Since the AdaIN module is a real-time style transfer module, introducing it allows the style information of $F_s^i$ to be transferred into $F_p^i$ by adjusting the mean and variance, enhancing the characterization capability for texture.
However, directly using the existing AdaIN module is not suitable for this task, owing to the large modulation gap between face photos and sketch portraits and the insufficient adaptive capability of its parameter-free operation. Therefore, the embodiment of the present invention introduces residual blocks to improve the learning ability. Referring again to fig. 3, the AdaIN Resblock module in the embodiment of the present invention comprises a plurality of residual AdaIN modules connected in sequence; each residual AdaIN module comprises a basic AdaIN module and a residual module connected in sequence. For example, the AdaIN Resblock module in the embodiment of the present invention is composed of 9 residual AdaIN modules connected in sequence, each composed of an existing AdaIN module followed by a residual block; the residual block added after the existing AdaIN module enlarges the receptive field and enhances the adaptive capability of the branch. The output of the AdaIN Resblock module is expressed as:
$$B_i^{c,y,x} = \sigma_c(F_p^i)\,\frac{F_s^{i,c,y,x} - \mu_c(F_s^i)}{\sigma_c(F_s^i)} + \mu_c(F_p^i) \tag{3}$$

where s represents a sketch portrait, B_i represents the output of the AdaIN Resblock module corresponding to the i-th (i = 1, 2, 3) layer depth feature, F_s^i represents the depth feature corresponding to the sketch portrait s at the i-th (i = 1, 2, 3) layer, F_s^{i,c,y,x} represents the value of F_s^i at the point (y, x) on the c-th channel corresponding to F_p^{i,c,y,x}, and μ_c(F_s^i) and σ_c(F_s^i) respectively represent the mean and standard deviation of F_s^i on the c-th channel.
The channel statistics are computed with formula (5) and formula (6):

$$\mu_c(F^i) = \frac{1}{H_i W_i}\sum_{y=1}^{H_i}\sum_{x=1}^{W_i} F^{i,c,y,x} \tag{5}$$

$$\sigma_c(F^i) = \sqrt{\frac{1}{H_i W_i}\sum_{y=1}^{H_i}\sum_{x=1}^{W_i}\left(F^{i,c,y,x} - \mu_c(F^i)\right)^2} \tag{6}$$

Here, the total height and total width corresponding to F_s^i are taken as H_i and W_i respectively, and the total height and total width corresponding to F_p^i are the same.
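A minimal sketch of the underlying AdaIN operation described above (the residual blocks of the Resblock variant are omitted; array shapes are assumptions for illustration):

```python
import numpy as np

def adain(content, style, eps=1e-5):
    """Adaptive instance normalization (sketch).

    content : (C, H, W) sketch feature F_s^i
    style   : (C, H, W) photo feature F_p^i whose per-channel statistics
              are transferred onto the content feature
    """
    mu_c = content.mean(axis=(1, 2), keepdims=True)
    sd_c = content.std(axis=(1, 2), keepdims=True)
    mu_s = style.mean(axis=(1, 2), keepdims=True)
    sd_s = style.std(axis=(1, 2), keepdims=True)
    # re-normalize the content to carry the style's mean and variance
    return sd_s * (content - mu_c) / (sd_c + eps) + mu_s
```

After the operation, each channel of the output carries the mean and (up to eps) the variance of the corresponding style channel, which is exactly the texture-transfer behaviour the branch exploits.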
Further, the embodiment of the present invention provides a gated channel attention fusion module, which integrates a gating mechanism and a channel attention mechanism and fuses the two information branches, allowing the network to selectively amplify useful feature channels based on global information while suppressing useless feature channels. Referring to fig. 4, the gated attention feature fusion module according to the embodiment of the present invention comprises two gating modules and a channel attention module; wherein,
the outputs of the SPADE Resblock module and the AdaIN Resblock module are respectively input into the two gating modules; the outputs of the SPADE Resblock module and the AdaIN Resblock module are summed and then input into the channel attention module; the outputs of the two gating modules and the output of the channel attention module are then fused, which is specifically expressed as:
$$C_i = \alpha \odot \left(G(A_i)\odot A_i\right) + (1-\alpha)\odot\left(G(B_i)\odot B_i\right),\qquad \alpha = CA(A_i+B_i) \tag{7}$$

$$G(A_i)=\sigma(\mathrm{Conv}(A_i)) \tag{8}$$

$$G(B_i)=\sigma(\mathrm{Conv}(B_i)) \tag{9}$$

where A_i and B_i respectively denote the outputs of the SPADE Resblock module and the AdaIN Resblock module corresponding to the i-th (i = 1, 2, 3) layer depth feature, ⊙ denotes element-by-element multiplication, + denotes element-by-element addition, C_i denotes the output of the gated attention feature fusion module corresponding to the i-th (i = 1, 2, 3) layer depth feature, CA(·) denotes the channel attention function, G(A_i) denotes the gating function corresponding to A_i, and G(B_i) denotes the gating function corresponding to B_i; in fig. 3, G(A_i) and G(B_i) are the outputs of the gating modules for inputs A_i and B_i, and α is CA(A_i + B_i).
where Conv denotes a convolution operation of 1 × 1, and σ denotes a sigmoid function.
The channel attention function CA (-) is expressed as:
CA(A i +B i )=σ(Conv 2 (δ(Conv 1 (A i +B i )))) (10)
wherein, conv 1 And Conv 2 Each represents a 1 × 1 convolution operation, and δ represents a modified linear unit ReLU.
The gating function of the embodiment of the invention judges the importance of each feature vector in the feature map and effectively controls the information flow, while the channel attention explicitly models the interdependencies between channels and filters the information flow globally. Together they allow the network to selectively amplify useful feature channels based on global information while suppressing useless feature channels.
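The mechanism can be sketched numerically as follows. This is one plausible reading of the fusion under assumed weight matrices (the exact combination is rendered only as a figure in the source), with the 1×1 convolutions modeled as plain channel-wise linear maps:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def conv1x1(x, w):
    """A 1x1 convolution is a linear map over the channel axis.
    x: (C_in, H, W), w: (C_out, C_in) -> (C_out, H, W)."""
    return np.einsum('oc,chw->ohw', w, x)

def gated_attention_fusion(A, B, w_ga, w_gb, w1, w2):
    """Gated channel-attention fusion (sketch; all weights are assumed).

    Gates G(.) = sigmoid(Conv1x1(.)) score each branch locally, while the
    channel attention CA(A+B) = sigmoid(Conv2(ReLU(Conv1(A+B)))) weighs
    the two gated branches against each other.
    """
    g_a = sigmoid(conv1x1(A, w_ga))   # gate for the spatial branch
    g_b = sigmoid(conv1x1(B, w_gb))   # gate for the texture branch
    alpha = sigmoid(conv1x1(np.maximum(conv1x1(A + B, w1), 0.0), w2))
    # attention-weighted sum of the two gated branches
    return alpha * (g_a * A) + (1.0 - alpha) * (g_b * B)
```

Because alpha and (1 − alpha) sum to one per position, the channel attention acts as a soft selector between the spatial-information and texture-information branches rather than a free-form mixer.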
Further, referring again to fig. 2, the decoder according to the embodiment of the present invention is based on an AFF module; in the decoder, the outputs of the gated attention feature fusion modules are up-sampled to features of the same resolution, the AFF module then performs feature fusion, and the result is decoded and output. Specifically:
the outputs of the three-level gated attention feature fusion module are integrated using an Attention Feature Fusion (AFF) based decoder to generate the final composite result. AFF moduleIs a conventional attention feature fusion module, which is described in detail in "Y.Dai, F.Gieseke, S.Oehmcke, Y.Wu, and K.Barnard," Attentional feature fusion, "in Proceedings of the IEEE/CVF Window Conference on Applications of Computer Vision,2021, pp.3560-3569", and will not be described herein again. The embodiment of the invention samples three characteristic graphs obtained by a gating attention characteristic fusion module to the same resolution ratio, uses an AFF module to perform characteristic fusion pairwise, and decodes to generate a final synthetic result, wherein the synthetic result is a face photoOr sketch of an image
Furthermore, the trained face photo-sketch portrait synthesis network adopted in the synthesis process of the embodiment of the invention is obtained by training on a face photo-sketch portrait pair training set. Specifically, M face photos are selected from a face photo data set, and the M corresponding face sketch portraits are selected from a face sketch portrait data set, forming M face photo-sketch portrait pairs that together serve as the face photo-sketch portrait pair training set.
Referring to fig. 5, the detailed training process includes the following steps:
S201, the encoder encodes the face photo-sketch portrait pair training set and outputs depth features;
S202, the semantic segmentation module extracts semantic labels of the sketch portraits in the training set;
S203, the two-way normalization module performs enhancement along the spatial information branch and the texture information branch according to the semantic labels and the depth features;
S204, the gated attention feature fusion module fuses the output results of the spatial information branch and the texture information branch;
S205, the decoder decodes the fusion result and outputs the synthesis result corresponding to the face photo-sketch portrait pair training set;
the implementation of S201-S205 is described with reference to the detailed design of each part above.
S206, constructing a loss function of the face photo-sketch portrait synthesis network according to the face photo-sketch portrait pair training set and the corresponding synthesis results.
Specifically, the loss function of the human face photo-sketch image synthesis network constructed by the embodiment of the invention is expressed as follows:
L full =λ 1 L adversarial +λ 2 L cycle +λ 3 L perceptual (11)
where L_full represents the loss function of the face photo-sketch portrait synthesis network; L_adversarial represents the generative adversarial loss, L_cycle represents the cycle-consistency loss, L_perceptual represents the perceptual loss, and λ_1, λ_2, λ_3 represent balance parameters that weigh the different loss terms; in the embodiment of the invention, λ_1 = 1, λ_2 = 0.5, λ_3 = 0.5. The loss function of the embodiment of the present invention is thus composed of three parts:
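The weighting of equation (11), with the balance parameters given in the text as defaults, is straightforward:

```python
def total_loss(l_adv, l_cycle, l_perc, lam1=1.0, lam2=0.5, lam3=0.5):
    """Equation (11): weighted sum of the three loss terms."""
    return lam1 * l_adv + lam2 * l_cycle + lam3 * l_perc
```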
Generative adversarial loss: the generative adversarial loss aims to guide the generator toward a more realistic generated result. When computing the generative adversarial loss, a discriminator is connected after the generator; its structure follows the discriminator of CycleGAN, see "J.-Y. Zhu, T. Park, P. Isola, and A. A. Efros, "Unpaired image-to-image translation using cycle-consistent adversarial networks," in Proceedings of the IEEE International Conference on Computer Vision, 2017, pp. 2223-2232", and it is trained jointly. The resulting adversarial loss for the generator and the discriminator is expressed as:
L adversarial =E p~Pdata(p) [(D(p)) 2 ]+E s~Pdata(s) [(1-D(G(s,p ref ,M(s)))) 2 ] (12)
where E(·) represents the expected value over a distribution, E_{p~Pdata(p)}(·) represents the expectation over face photos p in the face photo data set, E_{s~Pdata(s)}(·) represents the expectation over sketch portraits s in the sketch portrait data set, D(·) represents the discriminator, D(p) represents the discrimination result obtained by inputting the face photo p into the discriminator, M(·) represents the face semantic segmentation network BiSeNet, M(s) represents the semantic label obtained by inputting the sketch portrait s into the segmentation network, G(·) represents the generator that generates a face photo from a sketch portrait, G(s, p_ref, M(s)) represents the synthesis result obtained by inputting the sketch portrait s, the reference sample photo p_ref, and the semantic label M(s) into the generator, i.e., the generated photo in fig. 2, and D(G(s, p_ref, M(s))) represents the discrimination result obtained by inputting the generated photo into the discriminator.
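Equation (12) is a least-squares (LSGAN-style) objective. A direct transcription over batches of discriminator scores (the batch arrays are assumptions for illustration; the expectations become means):

```python
import numpy as np

def adversarial_loss(d_real, d_fake):
    """Equation (12): mean over a batch of discriminator outputs
    D(p) (real photos) and D(G(s, p_ref, M(s))) (generated photos)."""
    return float((d_real ** 2).mean() + ((1.0 - d_fake) ** 2).mean())
```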
Cycle-consistency loss: given a sketch portrait s and its face label representation M(s), after the cyclic transformations S → P and P → S between the S and P domains, the sketch s should be transformed back to the original domain; the corresponding cycle-consistency loss is expressed as:
L cycle =E s~Pdata(s) [||F(G(s,p ref ,M(s)),s ref ,M(G(s,p ref ,M(s))))-s|| 1 ] (13)
where s_ref and p_ref represent reference sample images of the S and P domains respectively, G(s, p_ref, M(s)) represents the photo generated by the generator, M(G(s, p_ref, M(s))) represents the semantic label obtained by inputting the generated photo into the semantic segmentation network, F(·) represents the generator that generates a sketch portrait from a face photo, and F(G(s, p_ref, M(s)), s_ref, M(G(s, p_ref, M(s)))) represents inputting the generated photo G(s, p_ref, M(s)), the reference sample sketch s_ref, and the semantic label M(G(s, p_ref, M(s))) into the generator F(·); F(·) has the same network structure as G(·), and ‖·‖_1 denotes the L1 norm.
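The core of equation (13), once the generated reconstruction is in hand, is a plain L1 comparison (the mean over pixels stands in for the expectation; array shapes are assumptions):

```python
import numpy as np

def cycle_loss(s, s_reconstructed):
    """Equation (13): L1 distance between the original sketch s and its
    S -> P -> S reconstruction F(G(s, ...), ...)."""
    return float(np.abs(s_reconstructed - s).mean())
```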
Perceptual loss: the perceptual loss is introduced so that the generated photo resembles the real photo at the semantic feature level. It is designed with a pre-trained VGG-19 model: the real photo and the generated photo are input into the pre-trained VGG-19 model, and the corresponding perceptual loss is expressed as:

$$L_{perceptual} = \sum_{j} \frac{1}{C_j H_j W_j} \left\| \phi_j(p) - \phi_j\big(G(s, p_{ref}, M(s))\big) \right\|_1 \tag{14}$$
where φ_j(·) represents the output feature map of the j-th layer in the pre-trained VGG-19 model, and C_j, H_j, and W_j respectively represent the number of channels, the height, and the width of the j-th layer output feature map.
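Given lists of per-layer feature maps for the real and generated photos, the perceptual loss above reduces to a normalized sum of per-layer distances (the equation itself appears only as an image in the source, so the L1 norm is an assumption consistent with the cycle loss; the VGG-19 forward pass is assumed to have produced the feature lists):

```python
import numpy as np

def perceptual_loss(feats_real, feats_fake):
    """Per-layer feature distance, normalized by C_j * H_j * W_j.

    feats_real, feats_fake : lists of (C_j, H_j, W_j) arrays, the VGG-19
    feature maps phi_j of the real and generated photos (assumed given).
    """
    total = 0.0
    for f_real, f_fake in zip(feats_real, feats_fake):
        c, h, w = f_real.shape
        total += float(np.abs(f_real - f_fake).sum()) / (c * h * w)
    return total
```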
Further, the network parameters of the face photo-sketch portrait synthesis network are updated according to the loss function, and training continues until an iteration stop condition is met; then:
s207, outputting the current human face photo-sketch image synthetic network as a trained human face photo-sketch image synthetic network.
Throughout the iterative computation, the parameters can be updated using a gradient descent algorithm until the face photo-sketch portrait synthesis network model converges, although the method is not limited to gradient descent.
It should be noted that, in the embodiment of the present invention, whether a sketch portrait or a face photo is to be synthesized, the above training process can be used to train the corresponding face photo-sketch portrait synthesis network model. The training process selects the corresponding reference face photo or reference sketch portrait, which provides a large amount of texture and spatial prior information, so that the trained model synthesizes sketch portraits or face photos better.
In order to verify the effectiveness of the method for synthesizing the face photo-sketch portrait based on two-way condition normalization provided by the embodiment of the invention, the following experiment is performed for verification.
1. Simulation conditions
The embodiment of the invention uses the PyTorch framework for simulation on an Intel(R) Xeon(R) Gold 6226R 2.90 GHz CPU, an NVIDIA GeForce RTX 3090 GPU, and the Ubuntu 16.04 operating system. Training is performed on the CUFS data set, the CUFSF data set, and the WildSketch data set, respectively.
The methods compared in the experiment were as follows:
one is FCN, referred to as "L.Zhang, L.Lin, X.Wu, S.Ding, and L.Zhang," End-to-End photo-deletion generation via complete collaborative presentation learning, "in Proceedings of the 5th ACM on International Conference on Multimedia retrieval,2015, pp.627-634." which proposes an End-to-End full convolution network to directly learn the mapping of a picture of a human face to a portrait. However, the network is too shallow to dig out deep semantic information.
The second is pix2pix, see "P. Isola, J.-Y. Zhu, T. Zhou, and A. A. Efros, "Image-to-image translation with conditional adversarial networks," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2017, pp. 1125-1134", which uses the conditional GAN (cGAN) as a unified solution for image-to-image translation on paired data sets.
The third is CycleGAN, see "J.-Y. Zhu, T. Park, P. Isola, and A. A. Efros, "Unpaired image-to-image translation using cycle-consistent adversarial networks," in Proceedings of the IEEE International Conference on Computer Vision, 2017, pp. 2223-2232", which proposes the CycleGAN framework: it forms a general mapping from domain A to domain B and learns how to translate between the two domains rather than being tied to a specific image pair, can be trained with unpaired data sets, and has strong adaptability.
The fourth is a face sketch synthesis method in the wild based on semi-supervised learning, denoted Wild in the experiments, see "C. Chen, W. Liu, X. Tan, and K.-Y. K. Wong, "Semi-supervised learning for face sketch synthesis in the wild," in Asian Conference on Computer Vision, Springer, 2018, pp. 216-231"; the method extends the photo-sketch pairs by constructing pseudo sketch features for additional training photos.
The fifth is SCAGAN, see "J. Yu, X. Xu, F. Gao, S. Shi, M. Wang, D. Tao, and Q. Huang, "Toward realistic face photo-sketch synthesis via composition-aided GANs," IEEE Transactions on Cybernetics, vol. 51, no. 9, pp. 4350-4362, 2020", which proposes using facial composition information as a supplementary input and introduces a compositional loss to focus training on specific facial regions.
The sixth is PANet, see "L. Nie, L. Liu, Z. Wu, and W. Kang, "Unconstrained face sketch synthesis via perception-adaptive network and a new benchmark," Neurocomputing, vol. 494, pp. 192-202, 2022", which performs face sketch synthesis under both constrained and unconstrained conditions based on a perception-adaptive network.
2. Simulation content
Partial face photo-sketch portrait pairs are selected from the CUFS, CUFSF, and WildSketch data sets as the pairs to be synthesized, and the sketch portrait synthesis results of the present method and the 6 existing methods are shown in fig. 6. Specifically: line 1 of fig. 6 shows the sketch synthesis results on the cuhk sub data set of the CUFS data set, lines 2 and 3 show results on sub data sets of the CUFS data set including xm2vts, and lines 4 and 5 show the sketch synthesis results on the WildSketch data set; line 1 of fig. 7 shows the face photo synthesis results on the cuhk sub data set of the CUFS data set, line 2 on the ar sub data set of the CUFS data set, line 3 on the xm2vts sub data set of the CUFS data set, and line 4 on the CUFSF data set. In figs. 6 and 7, the rightmost Ground Truth column is the reference image corresponding to the leftmost Test Photo; the closer a method's synthesis result is to the Ground Truth, the better its synthesis effect. In figs. 6 and 7, the 2nd column from the right shows the synthesis results of the method of the present invention.
As can be seen from fig. 6 and 7, the synthesis result of the method adopted in the embodiment of the present invention well restores texture information and structure information, retains more real edge and detail features, and better overcomes the problems of blurring and artifacts.
Meanwhile, in the embodiment of the present invention, 3 existing methods and the method of the present invention (denoted DCNP in fig. 8) were selected for a user survey of the face photo synthesis results on the CUFS and WildSketch data sets. 14 groups of questionnaires were set in total (7 on the CUFS data set and 7 on the WildSketch data set), and users voted for the most satisfactory result in each group. 700 votes from 50 users were collected; the specific share of votes for each method is shown in fig. 8.
In summary, the method for synthesizing a face photo-sketch portrait based on two-way condition normalization provided by the embodiment of the invention proposes a face photo-sketch portrait synthesis network comprising a two-way normalization module and a gated attention feature fusion module. The branches of the two-way normalization module fully encode texture and spatial information, enhancing the learning of both and retaining more real edge and detail features; the gated attention feature fusion module fully screens and fuses the useful information of the two branches, avoiding information redundancy to a certain extent. Together, the two paths improve the quality of the face photo-sketch portrait synthesis effect, and the user survey shows that the images synthesized by the embodiment of the invention are subjectively preferred and provide a better user experience.
Meanwhile, in the embodiment of the invention, reference sample images (a reference face photo p_ref or a reference sketch portrait s_ref) are additionally introduced in the synthesis and training processes to provide a large amount of texture and spatial prior information, further improving the quality of the face photo-sketch portrait synthesis effect.
Referring to fig. 9, an embodiment of the present invention provides an electronic device, which includes a processor 901, a communication interface 902, a memory 903, and a communication bus 904, where the processor 901, the communication interface 902, and the memory 903 complete mutual communication through the communication bus 904;
a memory 903 for storing computer programs;
the processor 901 is configured to implement the steps of the above-mentioned method for synthesizing a human face photo-sketch image based on two-way condition normalization when executing the program stored in the memory 903.
The embodiment of the invention provides a computer-readable storage medium, wherein a computer program is stored in the computer-readable storage medium, and when the computer program is executed by a processor, the steps of the human face photo-sketch portrait synthesis method based on two-way condition normalization are realized.
For the electronic device/storage medium embodiment, since it is basically similar to the method embodiment, the description is simple, and for the relevant points, refer to the partial description of the method embodiment.
In the description of the present invention, it is to be understood that the terms "first", "second" and the like are used for descriptive purposes only and are not to be construed as indicating or implying relative importance or implying any number of technical features indicated. Thus, a feature defined as "first" or "second" may explicitly or implicitly include one or more of that feature. In the description of the present invention, "a plurality" means two or more unless specifically defined otherwise.
While the present application has been described in connection with various embodiments, other variations to the disclosed embodiments can be understood and effected by those skilled in the art in practicing the claimed application, from a review of the drawings, the disclosure, and the appended claims. In the claims, the word "comprising" does not exclude other elements or steps, and the word "a" or "an" does not exclude a plurality. A single processor or other unit may fulfill the functions of several items recited in the claims. The mere fact that certain measures are recited in mutually different dependent claims does not indicate that a combination of these measures cannot be used to advantage.
The foregoing is a more detailed description of the invention in connection with specific preferred embodiments and it is not intended that the invention be limited to these specific details. For those skilled in the art to which the invention pertains, several simple deductions or substitutions can be made without departing from the spirit of the invention, and all shall be considered as belonging to the protection scope of the invention.
Claims (10)
1. A face photo-sketch portrait synthesis method based on two-way condition normalization is characterized by comprising the following steps:
acquiring a face photo-sketch portrait pair to be synthesized;
inputting the pair of the face photo-sketch portrait to be synthesized into a trained face photo-sketch portrait synthesis network to obtain a synthesis result;
the human face photo-sketch portrait synthesis network comprises an encoder, a generator and a semantic segmentation module, wherein the generator comprises a double-path normalization module, a gated attention feature fusion module and a decoder; the trained human face photo-sketch portrait synthesis network is obtained by training a training set through the human face photo-sketch portrait; the corresponding training process comprises the following steps:
the encoder encodes the face photo-sketch portrait pair training set and outputs depth features; the semantic segmentation module extracts semantic labels of the sketch portraits in the training set; the two-way normalization module performs enhancement along a spatial information branch and a texture information branch according to the semantic labels and the depth features; the gated attention feature fusion module fuses output results of the spatial information branch and the texture information branch; the decoder decodes the fusion result and outputs the synthesis result corresponding to the face photo-sketch portrait pair training set; constructing a loss function of the face photo-sketch portrait synthesis network according to the face photo-sketch portrait pair training set and the corresponding synthesis results; and updating parameters of the face photo-sketch portrait synthesis network according to the loss function, and continuing training until an iteration stop condition is met to obtain the trained face photo-sketch portrait synthesis network.
2. The method of claim 1, wherein the encoder comprises a plurality of sequentially connected convolution layers;
in the encoder, the output corresponding to the last three convolutional layers of the encoder is taken as the depth feature.
3. The method of two-way conditional normalization-based human face photo-sketch portrait compositing of claim 1, wherein said two-way normalization module comprises a SPADE Resblock module and an AdaIN Resblock module, wherein,
the SPADE Resblock module performs spatial information enhancement according to semantic labels of sketch pictures in the training set and depth characteristics of face photos in the training set;
and the AdaIN Resblock module performs texture information enhancement according to the depth characteristics of the sketch image and the face photo in the training set.
4. The method of claim 3, wherein the AdaIN Resblock module comprises a plurality of residual AdaIN modules connected in sequence; each residual AdaIN module comprises a basic AdaIN module and a residual module connected in sequence.
5. The method of claim 3, wherein the output of the SPADE Resblock module is represented as:
$$A_i^{c,y,x} = \gamma_{c,y,x}(M_s)\,\frac{F_p^{i,c,y,x} - \mu_c(F_p^i)}{\sigma_c(F_p^i)} + \beta_{c,y,x}(M_s)$$

wherein p represents a face photo, A_i represents the output of the SPADE Resblock module corresponding to the i-th (i = 1, 2, 3) layer depth feature, F_p^i represents the depth feature corresponding to the face photo p at the i-th (i = 1, 2, 3) layer, c represents the channel, (y, x) represents a position on channel c, F_p^{i,c,y,x} represents the value of F_p^i at point (y, x) on the c-th channel, M_s represents the semantic label, γ_{c,y,x}(M_s) and β_{c,y,x}(M_s) respectively represent the scale and offset learned from the semantic label M_s on the c-th channel, and μ_c(F_p^i) and σ_c(F_p^i) respectively represent the mean and standard deviation of F_p^i on the c-th channel.
6. The method of two-way conditional normalization-based face photo-sketch synthesis of claim 5, wherein the output of the AdaIN Resblock module is represented as:
$$B_i^{c,y,x} = \sigma_c(F_p^i)\,\frac{F_s^{i,c,y,x} - \mu_c(F_s^i)}{\sigma_c(F_s^i)} + \mu_c(F_p^i)$$

wherein s represents a sketch portrait, B_i represents the output of the AdaIN Resblock module corresponding to the i-th (i = 1, 2, 3) layer depth feature, F_s^i represents the depth feature corresponding to the sketch portrait s at the i-th (i = 1, 2, 3) layer, F_s^{i,c,y,x} represents the value of F_s^i at the point (y, x) on the c-th channel corresponding to F_p^{i,c,y,x}, and μ_c(F_s^i) and σ_c(F_s^i) respectively represent the mean and standard deviation of F_s^i on the c-th channel.
7. The method of claim 6, wherein the gated attention feature fusion module comprises two gating modules and a channel attention module; wherein,
the outputs of the SPADE Resblock module and the AdaIN Resblock module are respectively input into two gating modules;
the outputs of the SPADE Resblock module and the AdaIN Resblock module are superposed and then input into the channel attention module;
the outputs of the two gating modules and the output of the channel attention module are fused.
8. The method of claim 7, wherein the output of two gating modules and the output of the channel attention module are fused as:
$$C_i = \alpha \odot \left(G(A_i)\odot A_i\right) + (1-\alpha)\odot\left(G(B_i)\odot B_i\right),\qquad \alpha = CA(A_i+B_i)$$

wherein C_i represents the output of the gated attention feature fusion module corresponding to the i-th (i = 1, 2, 3) layer depth feature, ⊙ represents element-by-element multiplication, CA(·) represents the channel attention function, G(A_i) represents the gating function corresponding to A_i, and G(B_i) represents the gating function corresponding to B_i.
9. The method for synthesizing a face photo-sketch portrait based on two-way condition normalization according to claim 1, wherein the decoder is an AFF module based decoder;
in the decoder, after the output of the gated attention feature fusion module is up-sampled to obtain features with the same resolution, an AFF module is used for feature fusion, and decoding and outputting are performed.
10. The method for synthesizing a human face photo-sketch image based on two-way condition normalization according to claim 1, wherein the constructed loss function of the human face photo-sketch image synthesis network is represented as follows:
L full =λ 1 L adversarial +λ 2 L cycle +λ 3 L perceptual ;
wherein L_full represents the loss function of the face photo-sketch portrait synthesis network; L_adversarial represents the generative adversarial loss, L_cycle represents the cycle-consistency loss, L_perceptual represents the perceptual loss, and λ_1, λ_2, λ_3 represent balance parameters.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210885729.6A CN115375596A (en) | 2022-07-26 | 2022-07-26 | Face photo-sketch portrait synthesis method based on two-way condition normalization |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210885729.6A CN115375596A (en) | 2022-07-26 | 2022-07-26 | Face photo-sketch portrait synthesis method based on two-way condition normalization |
Publications (1)
Publication Number | Publication Date |
---|---|
CN115375596A true CN115375596A (en) | 2022-11-22 |
Family
ID=84063410
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202210885729.6A Pending CN115375596A (en) | 2022-07-26 | 2022-07-26 | Face photo-sketch portrait synthesis method based on two-way condition normalization |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN115375596A (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN117392247A (en) * | 2023-09-25 | 2024-01-12 | 清华大学 | Image video semantic coding and decoding method and device based on sketch |
-
2022
- 2022-07-26 CN CN202210885729.6A patent/CN115375596A/en active Pending
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN117392247A (en) * | 2023-09-25 | 2024-01-12 | 清华大学 | Image video semantic coding and decoding method and device based on sketch |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US10593021B1 (en) | Motion deblurring using neural network architectures | |
Li et al. | Zero-shot image dehazing | |
Zhu et al. | One shot face swapping on megapixels | |
Xia et al. | Gan inversion: A survey | |
CN111291212B (en) | Zero sample sketch image retrieval method and system based on graph convolution neural network | |
CN111047548B (en) | Attitude transformation data processing method and device, computer equipment and storage medium | |
CN111079601A (en) | Video content description method, system and device based on multi-mode attention mechanism | |
CN110728219A (en) | 3D face generation method based on multi-column multi-scale graph convolution neural network | |
CN109410135B (en) | Anti-learning image defogging and fogging method | |
CN113780149A (en) | Method for efficiently extracting building target of remote sensing image based on attention mechanism | |
CN112837215B (en) | Image shape transformation method based on generation countermeasure network | |
CN114783034A (en) | Facial expression recognition method based on fusion of local sensitive features and global features | |
CN114339409A (en) | Video processing method, video processing device, computer equipment and storage medium | |
CN111986105A (en) | Video time sequence consistency enhancing method based on time domain denoising mask | |
Hu et al. | Dear-gan: Degradation-aware face restoration with gan prior | |
CN113129234A (en) | Incomplete image fine repairing method based on intra-field and extra-field feature fusion | |
Chen et al. | MICU: Image super-resolution via multi-level information compensation and U-net | |
CN115375596A (en) | Face photo-sketch portrait synthesis method based on two-way condition normalization | |
Zhou et al. | A superior image inpainting scheme using Transformer-based self-supervised attention GAN model | |
CN116523985B (en) | Structure and texture feature guided double-encoder image restoration method | |
Zhou et al. | Cloud removal for optical remote sensing imagery using distortion coding network combined with compound loss functions | |
CN112686830A (en) | Super-resolution method of single depth map based on image decomposition | |
Liu et al. | Diverse Hyperspectral Remote Sensing Image Synthesis With Diffusion Models | |
CN116975347A (en) | Image generation model training method and related device | |
Huang et al. | DSRD: deep sparse representation with learnable dictionary for remotely sensed image denoising |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
SE01 | Entry into force of request for substantive examination |