CN110659727B - Sketch-based image generation method - Google Patents

Sketch-based image generation method

Info

Publication number
CN110659727B
Authority
CN
China
Prior art keywords
sketch
training
module
map
training sample
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201910909387.5A
Other languages
Chinese (zh)
Other versions
CN110659727A (en)
Inventor
陈雪锦
李宇航
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
University of Science and Technology of China USTC
Original Assignee
University of Science and Technology of China USTC
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by University of Science and Technology of China USTC filed Critical University of Science and Technology of China USTC
Priority to CN201910909387.5A priority Critical patent/CN110659727B/en
Publication of CN110659727A publication Critical patent/CN110659727A/en
Application granted granted Critical
Publication of CN110659727B publication Critical patent/CN110659727B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Images

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T3/00Geometric image transformations in the plane of the image
    • G06T3/04Context-preserving transformations, e.g. by using an importance map
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Molecular Biology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Image Analysis (AREA)

Abstract

A sketch-based image generation method, comprising: S1, constructing an adversarial generative model comprising a generator and a discriminator, wherein the generator comprises down-sampling mask residual modules, residual modules, up-sampling mask residual modules and a conditional self-attention module, and the discriminator comprises more than one sub-discrimination network of different depths; S2, inputting one or more training sample sketches into the generator to generate a generated image corresponding to each training sample sketch; S3, inputting the training sample sketch, the corresponding real image and the generated image into the discriminator to compute a loss function, and computing a training objective function from the loss function; S4, training the parameters of the generator and the discriminator according to the training objective function so as to minimize it; and S5, generating, with the trained generator, a target generated image corresponding to a target sketch. The method ensures that the generated face image has realistic local texture and a complete face structure.

Description

Sketch-based image generation method
Technical Field
The present disclosure relates to the field of image processing technologies, and in particular, to a sketch-based image generation method.
Background
Sketch-based image generation is a special case of image-to-image translation and plays an important role in computer graphics. Early sketch-based image generation algorithms adopted a search-and-fusion approach: image patches related to the sketch are first retrieved from a large-scale image database and then fused together. With the rapid development of deep learning in recent years, generative adversarial networks have been increasingly applied to image-to-image translation. Isola et al. proposed a general image-to-image translation model based on a conditional generative adversarial network trained with supervision; it is designed for dense input images and performs unsatisfactorily when the input is a sparse image such as a sketch.
Because sketches are diverse, abstract and sparse, sketch-based face image generation faces great challenges. Existing methods still cannot generate ideal face images; in particular, when the input sketch does not contain a complete facial structure (eyes, nose, mouth and the like), the generated face image also tends to lack the corresponding parts. The network models of existing methods are built on convolutional layers, whose receptive field is very limited; a global receptive field can only be reached by stacking many convolutional layers, yet such stacking does not make it easy for the network to learn global structure information. In addition, the realism discriminator used by the generative adversarial networks of existing methods judges image patches locally, which can only guarantee the local realism of the generated image and cannot directly judge the structural integrity of the face image.
Disclosure of Invention
Technical problem to be solved
The present disclosure has been made in view of the above problems and provides a sketch-based image generation method that solves them by introducing a self-attention mechanism and a multi-scale discriminator into a conditional generative adversarial network.
(II) technical scheme
The present disclosure provides a sketch-based image generation method, including: S1, constructing an adversarial generative model comprising a generator and a discriminator, wherein the generator comprises down-sampling mask residual modules, residual modules, up-sampling mask residual modules and a conditional self-attention module, and the discriminator comprises more than one sub-discrimination network of different depths; S2, inputting one or more training sample sketches into the generator to generate a generated image corresponding to each training sample sketch; S3, inputting the training sample sketch, the corresponding real image and the generated image into the discriminator to compute a loss function, and computing a training objective function from the loss function; S4, training the parameters of the generator and the discriminator according to the training objective function so as to minimize it; and S5, generating, with the trained generator, a target generated image corresponding to a target sketch.
Optionally, the number of down-sampling mask residual modules equals the number of up-sampling mask residual modules and is N; the down-sampling mask residual modules, the first N-1 up-sampling mask residual modules, the conditional self-attention module and the last up-sampling mask residual module are connected in sequence, and the outputs of the first to (N-1)-th down-sampling mask residual modules are further connected to the N-th to second up-sampling mask residual modules, respectively. Step S2 includes: inputting the training sample sketch into the first down-sampling mask residual module; after processing by the down-sampling mask residual modules, the first N-1 up-sampling mask residual modules, the conditional self-attention module and the last up-sampling mask residual module in sequence, the last up-sampling mask residual module outputs the generated image corresponding to each training sample sketch.
Optionally, the processing of the conditional self-attention module comprises: concatenating the received input feature map with the training sample sketch to obtain condition features; mapping the condition features with three mapping matrices, respectively, to obtain three mapping feature maps, wherein the mapping matrices consist of trainable parameters; processing the three mapping feature maps to obtain a response map; and adding the response map to the input feature map to obtain an output feature map.
Optionally, the mapping feature maps are respectively: f([a,x]) = W_f [a,x], g([a,x]) = W_g [a,x], h([a,x]) = W_h [a,x], where a is the input feature map, x is the training sample sketch scaled to the same resolution as the feature map, f([a,x]), g([a,x]) and h([a,x]) are the three mapping feature maps, W_f, W_g and W_h are the mapping matrices, a ∈ R^(C×H×W), x ∈ R^(1×H×W), W_f ∈ R^(D×(C+1)), W_g ∈ R^(D×(C+1)), W_h ∈ R^(C×(C+1)), D = C/8, and C, H and W are the number of channels, the height and the width of the input feature map, respectively.
Optionally, processing the three mapping feature maps to obtain a response map includes: processing the mapping feature map f([a,x]) and the mapping feature map g([a,x]) to obtain an attention map; and processing the attention map with the mapping feature map h([a,x]) to obtain the response map.
Optionally, the response map is: r = (r_1, r_2, ..., r_N) ∈ R^(C×N), where r is the response map, N = H×W, r_j = Σ_{i=1}^{N} B_{i,j} h([a,x])_i, B_{i,j} = exp(s_{i,j}) / Σ_{i=1}^{N} exp(s_{i,j}), and s_{i,j} = f([a,x])_i^T g([a,x])_j.
Optionally, the output feature map is: o_j = γ r_j + a_j, where γ is a trainable weight parameter with an initial value of 0, o_j is the j-th pixel of the output feature map, r_j is the j-th pixel of the response map, and a_j is the j-th pixel of the input feature map received by the conditional self-attention module.
Optionally, each sub-discrimination network of the discriminator is composed of more than one convolutional layer; the number of convolutional layers differs between sub-discrimination networks, while the hyper-parameters of every sub-discrimination network are the same.
Optionally, the loss function comprises an adversarial loss function L_adv(G, D), a reconstruction loss function L_L1(G) and a feature matching loss function L_fm(G), where:

L_adv(G, D) = Σ_{k=1}^{N_D} ( E_{(x,y)}[log D_k(x, y)] + E_x[log(1 − D_k(x, G(x)))] )

L_L1(G) = E_{(x,y)}[ ‖y − G(x)‖_1 ]

L_fm(G) = E_{(x,y)}[ Σ_{k=1}^{N_D} Σ_{q∈Q} (1/n_q) ‖ F_k^(q)(x, y) − F_k^(q)(x, G(x)) ‖_1 ]

and the training objective function is

min_G max_D L_adv(G, D) + λ L_L1(G) + μ L_fm(G)

where x is the training sample sketch, y is the real image, N_D is the number of sub-discrimination networks, G(x) is the generated image corresponding to the training sample sketch, D_k(x, y) is the output of the k-th sub-discrimination network given the training sample sketch and the real image, D_k(x, G(x)) is the output of the k-th sub-discrimination network given the training sample sketch and the generated image, E_{(x,y)} denotes the expectation over the (x, y) data distribution, E_x denotes the expectation over the data distribution of x, Q is the set of selected feature layers of the sub-discrimination networks, N_Q is the number of selected feature layers of each sub-discrimination network, n_q is the number of elements of the q-th feature layer, F_k^(q)(x, G(x)) is the intermediate feature map output by the q-th layer of the k-th sub-discrimination network for the generated image, F_k^(q)(x, y) is the intermediate feature map output by the q-th layer of the k-th sub-discrimination network for the real image, λ is the weight of the reconstruction loss function L_L1(G), and μ is the weight of the feature matching loss function L_fm(G).
Optionally, step S4 includes: training the parameters of the adversarial generative model other than the conditional self-attention module according to the training objective function; fixing the parameters other than the conditional self-attention module and training the parameters of the conditional self-attention module; and training all parameters of the adversarial generative model simultaneously so as to minimize the training objective function.
(III) advantageous effects
The sketch-based image generation method provided by the present disclosure has the following beneficial effects:
(1) by introducing a conditional self-attention module into the conditional generative adversarial network, long-distance dependencies of the image can be learned directly;
(2) by introducing a multi-scale discriminator into the conditional generative adversarial network, the realism of the generated image can be judged at different scales, ensuring that the local texture and details of the generated image are realistic and that its structure remains complete even when the sketch is incomplete;
(3) the sub-discrimination networks share the parameters of their first layers, which reduces the number of network parameters and facilitates network convergence.
Drawings
FIG. 1 schematically illustrates a flow chart of a sketch-based image generation method provided by an embodiment of the present disclosure;
fig. 2A schematically illustrates the structure of the generator of the adversarial generative model constructed in the sketch-based image generation method provided by the embodiment of the disclosure;
fig. 2B schematically illustrates the structure of the discriminator of the adversarial generative model constructed in the sketch-based image generation method provided by the embodiment of the disclosure;
FIG. 3 schematically illustrates a schematic diagram of a conditional self-attention module in the generator shown in FIG. 2A.
Detailed Description
For the purpose of promoting a better understanding of the objects, aspects and advantages of the present disclosure, reference is made to the following detailed description taken in conjunction with the accompanying drawings.
This embodiment provides a sketch-based image generation method. The method shown in fig. 1 is described in detail below with reference to fig. 1, fig. 2A, fig. 2B and fig. 3, and includes the following operations.
S1, constructing an adversarial generative model comprising a generator and a discriminator, wherein the generator comprises down-sampling mask residual modules, residual modules, up-sampling mask residual modules and a conditional self-attention module, and the discriminator comprises more than one sub-discrimination network of different depths.
In this embodiment, the generator is a network with an encoder-decoder structure. The encoding part comprises N down-sampling mask residual modules, the decoding part comprises N up-sampling mask residual modules, and a conditional self-attention module is inserted before the last up-sampling mask residual module. The conditional self-attention module can effectively learn long-distance dependency information in the input feature map and helps the network learn the structural information of the image. Several residual modules can be added between the encoding part and the decoding part to enhance the capacity and fitting ability of the network.
Specifically, referring to fig. 2A, the number of down-sampling mask residual modules equals the number of up-sampling mask residual modules and is N. The down-sampling mask residual modules, the first N-1 up-sampling mask residual modules, the conditional self-attention module and the last up-sampling mask residual module are connected in sequence, and the outputs of the first to (N-1)-th down-sampling mask residual modules are further connected to the N-th to second up-sampling mask residual modules, respectively; that is, the decoding part and the encoding part are connected by skip connections, which ensures that low-level information (the outputs of the down-sampling modules) can be passed directly to the higher layers (the up-sampling modules). In the embodiments of the present disclosure, "connected in sequence" means that the output of the previous module is connected to the input of the next module, i.e., the output feature map of the previous module serves as the input feature map of the next module.
It should be understood that the adversarial generative model of the embodiments of the present disclosure may also contain no residual modules, i.e., the number of residual modules may be 0. The more residual modules there are, the stronger the fitting ability of the generator, but too many residual modules slow down the convergence and computation of training the adversarial generative model; the number of residual modules is therefore preferably set between 1 and 8.
In the embodiments of the present disclosure, the parameters of the generator are illustrated by taking N = 6 and the number of residual modules as 8, and the output size of each module of the generator is illustrated by taking a training sample sketch of size 256 × 256 × 1, as shown in Table 1. A person skilled in the art can likewise obtain a generator composed of other numbers and parameters of down-sampling mask residual modules, up-sampling mask residual modules, residual modules and conditional self-attention modules according to the description of this embodiment.
TABLE 1 (generator module parameters and per-module output sizes for N = 6, eight residual modules and a 256 × 256 × 1 input sketch)
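To make the data flow through the generator concrete, the following is a minimal PyTorch sketch of the topology just described: six down-sampling stages, eight middle residual modules, skip connections from the encoder to the decoder, and the conditional self-attention module inserted before the last up-sampling module. It is only an illustration under stated assumptions: the mask residual modules are not detailed in this section, so plain convolutional and residual blocks stand in for them, and the channel widths and the 3-channel output are assumed rather than taken from Table 1.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class DownBlock(nn.Module):
    """Stand-in for a down-sampling mask residual module (halves H and W)."""
    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.conv = nn.Conv2d(in_ch, out_ch, 4, stride=2, padding=1)
        self.norm = nn.InstanceNorm2d(out_ch)

    def forward(self, x):
        return F.relu(self.norm(self.conv(x)))


class UpBlock(nn.Module):
    """Stand-in for an up-sampling mask residual module (doubles H and W)."""
    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.conv = nn.ConvTranspose2d(in_ch, out_ch, 4, stride=2, padding=1)
        self.norm = nn.InstanceNorm2d(out_ch)

    def forward(self, x):
        return F.relu(self.norm(self.conv(x)))


class ResBlock(nn.Module):
    """Plain residual module used between the encoder and the decoder."""
    def __init__(self, ch):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(ch, ch, 3, padding=1), nn.InstanceNorm2d(ch), nn.ReLU(inplace=True),
            nn.Conv2d(ch, ch, 3, padding=1), nn.InstanceNorm2d(ch))

    def forward(self, x):
        return x + self.body(x)


class SketchGenerator(nn.Module):
    """Encoder-decoder with skip connections; a conditional self-attention
    module (if supplied) sits right before the last up-sampling module."""
    def __init__(self, attn=None, n_res=8):
        super().__init__()
        enc_ch = [64, 128, 256, 512, 512, 512]            # N = 6 stages, assumed widths
        self.downs = nn.ModuleList(
            [DownBlock(1 if i == 0 else enc_ch[i - 1], enc_ch[i]) for i in range(6)])
        self.mid = nn.Sequential(*[ResBlock(512) for _ in range(n_res)])
        dec_in = [512, 1024, 1024, 512, 256]               # bottleneck, then concat(prev, skip)
        dec_out = [512, 512, 256, 128, 64]
        self.ups = nn.ModuleList([UpBlock(i, o) for i, o in zip(dec_in, dec_out)])
        self.attn = attn                                   # e.g. ConditionalSelfAttention(128)
        self.last = nn.Sequential(                         # last up-sampling module -> image
            nn.ConvTranspose2d(128, 3, 4, stride=2, padding=1), nn.Tanh())

    def forward(self, sketch):                             # sketch: (B, 1, 256, 256)
        skips, h = [], sketch
        for down in self.downs:                            # down-sampling modules 1..6
            h = down(h)
            skips.append(h)
        h = self.mid(skips[-1])
        h = self.ups[0](h)                                 # up module 1: no skip connection
        for k, up in enumerate(self.ups[1:], start=1):     # up modules 2..5 with skips
            h = up(torch.cat([h, skips[5 - k]], dim=1))
        h = torch.cat([h, skips[0]], dim=1)                # skip from down module 1
        if self.attn is not None:
            h = self.attn(h, sketch)                       # conditional self-attention
        return self.last(h)


# Shape check: SketchGenerator()(torch.randn(1, 1, 256, 256)).shape == (1, 3, 256, 256)
```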
The inputs of the discriminator are the generated image and the real image corresponding to the training sample sketch, and its output is a judgment of the realism of the input image. Referring to fig. 2B, according to an embodiment of the disclosure, the discriminator is composed of a plurality of sub-discrimination networks, each sub-discrimination network has more than three convolutional layers, and the weights of the first three convolutional layers are shared among the sub-discrimination networks. The hyper-parameters of every sub-discrimination network are the same; only the number of convolutional layers differs. The hyper-parameters are preset parameters such as the convolution kernel size and stride. Because the sub-discrimination networks have different depths, each pixel of their output feature maps corresponds to a different receptive field on the input image, so sub-discrimination networks of different depths judge the realism of the generated image at different scales; and because the low-level features of the sub-discrimination networks are consistent, the sub-discrimination networks share the parameters of their first layers, which reduces the number of network parameters and facilitates network convergence.
In the embodiments of the disclosure, the discriminator is a multi-scale discriminator whose sub-discrimination networks have consistent structures but different depths. The structure of the multi-scale discriminator is illustrated by taking a deepest sub-discrimination network of 8 convolution modules as an example; Table 2 shows only this deepest sub-discrimination network. In the embodiment shown in Table 2, the parameters of convolution module 1, convolution module 2 and convolution module 3 are shared by all sub-discrimination networks, and further convolution modules are added on this basis to form each sub-discrimination network. In practical applications, the number of sub-discrimination networks can be chosen as required; for example, there may be only two sub-discrimination networks, one formed by convolution modules 1, 2, 3 and 4 and the other formed by convolution modules 1, 2, 3, 4, 5, 6, 7 and 8.
TABLE 2 (structure of the deepest sub-discrimination network, consisting of eight convolution modules; modules 1-3 are shared by all sub-discrimination networks)
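The multi-scale discriminator can be sketched along the same lines. Below, the sub-discrimination networks are modelled as exits of different depths from a single chain of identical convolution modules, so that at least the first three modules are shared as described above; the channel widths, the 3 × 3 prediction heads and the exit depths (4 and 8, taken from the two-sub-network example) are illustrative assumptions rather than the values of Table 2.

```python
import torch
import torch.nn as nn


def conv_module(in_ch, out_ch):
    """One 'convolution module'; every module uses the same hyper-parameters."""
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, 4, stride=2, padding=1),
        nn.InstanceNorm2d(out_ch),
        nn.LeakyReLU(0.2, inplace=True))


class MultiScaleDiscriminator(nn.Module):
    """Sub-discrimination networks of different depths over one shared chain of
    convolution modules, each ending in its own realism-prediction head."""
    def __init__(self, in_ch=4, widths=(64, 128, 256, 256, 256, 256, 256, 256),
                 exits=(4, 8)):
        super().__init__()
        self.blocks = nn.ModuleList(
            [conv_module(in_ch if i == 0 else widths[i - 1], widths[i])
             for i in range(len(widths))])
        self.exits = exits
        self.heads = nn.ModuleList(                       # one head per sub-network
            [nn.Conv2d(widths[d - 1], 1, 3, padding=1) for d in exits])

    def forward(self, sketch, image):
        # Condition on the sketch: concatenate it with the real or generated image.
        x = torch.cat([sketch, image], dim=1)
        feats = []
        for block in self.blocks:
            x = block(x)
            feats.append(x)                               # intermediate feature maps (for L_fm)
        # Each sub-network judges realism at its own depth / receptive field.
        outputs = [head(feats[d - 1]) for head, d in zip(self.heads, self.exits)]
        return outputs, feats


# d = MultiScaleDiscriminator()
# outs, feats = d(torch.randn(1, 1, 256, 256), torch.randn(1, 3, 256, 256))
# [o.shape for o in outs]  ->  16x16 and 1x1 realism maps (two receptive fields)
```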
S2, inputting one or more training sample sketches into the generator to generate a generated image corresponding to each training sample sketch.
Specifically, the training sample sketch is input into the first down-sampling mask residual module; after processing by the down-sampling mask residual modules, the first N-1 up-sampling mask residual modules, the conditional self-attention module and the last up-sampling mask residual module in sequence, the last up-sampling mask residual module outputs the generated image corresponding to each training sample sketch.
Referring to fig. 3, the processing of the conditional self-attention module is: concatenating the received input feature map with the training sample sketch to obtain condition features; mapping the condition features with three mapping matrices to obtain three mapping feature maps, wherein the mapping matrices consist of trainable parameters; processing the three mapping feature maps to obtain a response map; and adding the response map to the input feature map element by element to obtain an output feature map.
Specifically, the input feature map a received by the conditional self-attention module consists of the output feature map of the (N-1)-th up-sampling module together with the output feature map of the first down-sampling module (via the skip connection), with a ∈ R^(C×H×W). The training sample sketch is first resized so that x ∈ R^(1×H×W), and it is then concatenated with a along the channel dimension to obtain the condition features [a, x].
The condition features are mapped with the three mapping matrices into three new feature spaces, giving three mapping feature maps:

f([a,x]) = W_f [a,x]

g([a,x]) = W_g [a,x]

h([a,x]) = W_h [a,x]

where a is the input feature map, x is the training sample sketch scaled to the same resolution as the feature map, f([a,x]), g([a,x]) and h([a,x]) are the three mapping feature maps, W_f, W_g and W_h are the mapping matrices, a ∈ R^(C×H×W), x ∈ R^(1×H×W), W_f ∈ R^(D×(C+1)), W_g ∈ R^(D×(C+1)), W_h ∈ R^(C×(C+1)), D = C/8, and C, H and W are the number of channels, the height and the width of the input feature map, respectively. In the embodiments of the disclosure, W_f, W_g and W_h are not implemented explicitly as matrices but are each realized with a 1 × 1 convolutional layer.
Further, the mapping feature map f([a,x]) and the mapping feature map g([a,x]) are processed to obtain an attention map, and the attention map is processed with the mapping feature map h([a,x]) to obtain the response map.

Specifically, the mapping feature map f([a,x]) is transposed and multiplied by the mapping feature map g([a,x]), the result is normalized to obtain the attention map, and the attention map is multiplied by the mapping feature map h([a,x]) to obtain the response map r:

r = (r_1, r_2, ..., r_N) ∈ R^(C×N)

where N = H × W,

r_j = Σ_{i=1}^{N} B_{i,j} h([a,x])_i

B_{i,j} = exp(s_{i,j}) / Σ_{i=1}^{N} exp(s_{i,j})

s_{i,j} = f([a,x])_i^T g([a,x])_j

Here B ∈ R^(N×N) is the attention map, and each element B_{i,j} indicates the weight of the i-th pixel of the mapping feature map h([a,x]) when synthesizing the j-th pixel of the response map.
Further, the response map is added to the input feature map element by element, forming a residual structure, to obtain the output feature map of the conditional self-attention module:

o_j = γ r_j + a_j

where γ is a trainable weight parameter with an initial value of 0, o_j is the j-th pixel of the output feature map, r_j is the j-th pixel of the response map, and a_j is the j-th pixel of the input feature map received by the conditional self-attention module.
Through the conditional self-attention module, the generator can progressively learn the long-distance dependencies of the image.
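The computation just described maps directly onto a small PyTorch module. The sketch below realizes W_f, W_g and W_h as 1 × 1 convolutions, forms B as a softmax over s_{i,j} = f([a,x])_i^T g([a,x])_j, and adds the response back through the trainable scalar γ initialized to 0; the bilinear resizing of the sketch and the rounding in D = C/8 are implementation assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class ConditionalSelfAttention(nn.Module):
    """Conditional self-attention: attends over all spatial positions of the
    input feature map, conditioned on the (resized) input sketch."""
    def __init__(self, channels):
        super().__init__()
        d = max(channels // 8, 1)                         # D = C / 8
        self.f = nn.Conv2d(channels + 1, d, 1)            # W_f as a 1 x 1 convolution
        self.g = nn.Conv2d(channels + 1, d, 1)            # W_g
        self.h = nn.Conv2d(channels + 1, channels, 1)     # W_h
        self.gamma = nn.Parameter(torch.zeros(1))         # trainable weight, initial value 0

    def forward(self, a, sketch):
        b, c, height, width = a.shape
        # Resize the sketch to the feature-map resolution and concatenate -> [a, x].
        x = F.interpolate(sketch, size=(height, width), mode="bilinear", align_corners=False)
        ax = torch.cat([a, x], dim=1)                     # condition features, C + 1 channels
        f = self.f(ax).flatten(2)                         # (b, D, N)
        g = self.g(ax).flatten(2)                         # (b, D, N)
        h = self.h(ax).flatten(2)                         # (b, C, N)
        s = torch.bmm(f.transpose(1, 2), g)               # s_ij = f_i^T g_j, shape (b, N, N)
        attn = torch.softmax(s, dim=1)                    # B_ij: softmax over i for each j
        r = torch.bmm(h, attn).view(b, c, height, width)  # r_j = sum_i B_ij h_i (response map)
        return self.gamma * r + a                         # o_j = gamma * r_j + a_j


# Example (small spatial size keeps the N x N attention matrix manageable):
# csa = ConditionalSelfAttention(128)
# out = csa(torch.randn(2, 128, 32, 32), torch.randn(2, 1, 256, 256))
# out.shape == (2, 128, 32, 32)
```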
S3, inputting the training sample sketch, the corresponding real image and the generated image into the discriminator to compute a loss function, and computing a training objective function from the loss function.
In the embodiments of the disclosure, the loss function includes an adversarial loss function L_adv(G, D), a reconstruction loss function L_L1(G) and a feature matching loss function L_fm(G), where:

L_adv(G, D) = Σ_{k=1}^{N_D} ( E_{(x,y)}[log D_k(x, y)] + E_x[log(1 − D_k(x, G(x)))] )

L_L1(G) = E_{(x,y)}[ ‖y − G(x)‖_1 ]

L_fm(G) = E_{(x,y)}[ Σ_{k=1}^{N_D} Σ_{q∈Q} (1/n_q) ‖ F_k^(q)(x, y) − F_k^(q)(x, G(x)) ‖_1 ]

x is the training sample sketch, y is the real image, N_D is the number of sub-discrimination networks, G(x) is the generated image corresponding to the training sample sketch, D_k(x, y) is the output of the k-th sub-discrimination network given the training sample sketch and the real image, D_k(x, G(x)) is the output of the k-th sub-discrimination network given the training sample sketch and the generated image, E_{(x,y)} denotes the expectation over the (x, y) data distribution, E_x denotes the expectation over the data distribution of x, Q is the set of selected feature layers of the sub-discrimination networks, N_Q is the number of selected feature layers of each sub-discrimination network, n_q is the number of elements of the q-th feature layer, F_k^(q)(x, G(x)) is the intermediate feature map output by the q-th layer of the k-th sub-discrimination network for the generated image, and F_k^(q)(x, y) is the intermediate feature map output by the q-th layer of the k-th sub-discrimination network for the real image.
The adversarial loss function L_adv(G, D) drives the adversarial training of the generative adversarial network (i.e., the adversarial generative model) and ensures the realism of the generated image. The reconstruction loss function L_L1(G) pulls the generated image closer to the real image. The feature matching loss function L_fm(G) pulls the generated image closer to the real image in feature space.
Further, the training objective function of the adversarial generative model can be computed from L_adv(G, D), L_L1(G) and L_fm(G):

min_G max_D L_adv(G, D) + λ L_L1(G) + μ L_fm(G)

where λ and μ are the weights of the reconstruction loss function L_L1(G) and the feature matching loss function L_fm(G), respectively; for example, λ = 100.0 and μ = 1.0 were used in the experiments.
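As a concrete reference, the following PyTorch sketch computes the three losses and the combined generator objective for a discriminator that returns one realism map per sub-discrimination network together with its intermediate feature maps (as in the discriminator sketch above). The non-saturating binary cross-entropy form of the adversarial loss is an assumption, since the exact formulation is not spelled out here; λ = 100.0 and μ = 1.0 follow the text.

```python
import torch
import torch.nn.functional as F


def discriminator_loss(real_outs, fake_outs):
    """Discriminator side of L_adv: each sub-network should score real pairs 1
    and generated pairs 0 (binary cross-entropy on the realism maps)."""
    loss = 0.0
    for real, fake in zip(real_outs, fake_outs):
        loss = loss + F.binary_cross_entropy_with_logits(real, torch.ones_like(real)) \
                    + F.binary_cross_entropy_with_logits(fake, torch.zeros_like(fake))
    return loss


def generator_adv_loss(fake_outs):
    """Generator side of L_adv (non-saturating form): fool every sub-network."""
    return sum(F.binary_cross_entropy_with_logits(fake, torch.ones_like(fake))
               for fake in fake_outs)


def reconstruction_loss(fake_img, real_img):
    """L_L1(G) = E[ ||y - G(x)||_1 ]."""
    return F.l1_loss(fake_img, real_img)


def feature_matching_loss(real_feats, fake_feats):
    """L_fm(G): L1 distance between intermediate discriminator feature maps of
    the real and generated images; .mean() supplies the 1/n_q normalisation."""
    return sum(torch.abs(fr.detach() - ff).mean()
               for fr, ff in zip(real_feats, fake_feats))


def generator_objective(fake_outs, fake_img, real_img, real_feats, fake_feats,
                        lam=100.0, mu=1.0):
    """Generator part of  min_G max_D  L_adv + lambda * L_L1 + mu * L_fm."""
    return (generator_adv_loss(fake_outs)
            + lam * reconstruction_loss(fake_img, real_img)
            + mu * feature_matching_loss(real_feats, fake_feats))
```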
S4, training the parameters of the generator and the discriminator according to the training objective function to minimize the training objective function.
The parameters of the generator and the discriminator are trained alternately: the parameters of the discriminator are first fixed and the generator is trained for one iteration; then the generator is fixed and the discriminator is trained for one iteration; and this alternation is repeated.
Further, in order for the adversarial generative model to converge better, operation S4 is divided into three phases, which respectively include the following operations:
training the parameters of the adversarial generative model other than the conditional self-attention module according to the training objective function, i.e., training the model excluding the conditional self-attention module; fixing the parameters other than the conditional self-attention module and training the parameters of the conditional self-attention module, i.e., training only the conditional self-attention module; and training all parameters of the adversarial generative model simultaneously so as to minimize the training objective function, i.e., training all trainable parameters of the model at the same time.
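The alternating update and the three-phase schedule can be combined as in the sketch below, which freezes and unfreezes parameter groups between stages; `generator`, `discriminator` and the loss helpers refer to the sketches given earlier, and `attention` is the conditional self-attention module instance plugged into the generator. The optimizer settings, the number of epochs per stage and the decision to skip the discriminator update while its parameters are frozen are illustrative assumptions.

```python
import torch

# discriminator_loss and generator_objective are the loss helpers sketched above.


def set_requires_grad(module, flag):
    for p in module.parameters():
        p.requires_grad_(flag)


def train_step(generator, discriminator, sketch, real_img, g_opt, d_opt, update_d=True):
    """One alternating iteration: first update G with D fixed, then D with G fixed."""
    fake_img = generator(sketch)                            # generator step
    fake_outs, fake_feats = discriminator(sketch, fake_img)
    with torch.no_grad():
        _, real_feats = discriminator(sketch, real_img)
    g_loss = generator_objective(fake_outs, fake_img, real_img, real_feats, fake_feats)
    g_opt.zero_grad(); g_loss.backward(); g_opt.step()

    if update_d:                                            # discriminator step (generator fixed)
        real_outs, _ = discriminator(sketch, real_img)
        fake_outs, _ = discriminator(sketch, fake_img.detach())
        d_loss = discriminator_loss(real_outs, fake_outs)
        d_opt.zero_grad(); d_loss.backward(); d_opt.step()


def train(generator, discriminator, attention, loader, epochs_per_stage=(20, 10, 40)):
    g_opt = torch.optim.Adam(generator.parameters(), lr=2e-4, betas=(0.5, 0.999))
    d_opt = torch.optim.Adam(discriminator.parameters(), lr=2e-4, betas=(0.5, 0.999))
    stages = [
        (True, False),   # stage 1: everything except the conditional self-attention module
        (False, True),   # stage 2: only the conditional self-attention module
        (True, True),    # stage 3: all trainable parameters simultaneously
    ]
    for (train_rest, train_attn), n_epochs in zip(stages, epochs_per_stage):
        set_requires_grad(generator, train_rest)
        set_requires_grad(discriminator, train_rest)
        set_requires_grad(attention, train_attn)            # overrides the generator-wide flag
        for _ in range(n_epochs):
            for sketch, real_img in loader:
                train_step(generator, discriminator, sketch, real_img,
                           g_opt, d_opt, update_d=train_rest)
```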
S5, generating a target generated image corresponding to the target sketch by using the trained generator.
After the adversarial generative model of the embodiments of the present disclosure has been trained according to operations S1-S4 above, the target generated image corresponding to the target sketch can be generated with the generator of the trained adversarial generative model.
The superiority of the method of the present disclosure is demonstrated below by comparing the sketch-based image generation method of the present disclosure with the better-performing existing methods pix2pix and SketchyGAN.
IS, FID and KID are three commonly used indicators for evaluating the quality of generated images. A higher IS score indicates greater realism and diversity of the generated images; lower FID and KID values indicate greater realism of the generated images. Table 3 reports the IS, FID and KID scores of the method of the present disclosure and of pix2pix and SketchyGAN. It can be seen that the method of the present disclosure outperforms pix2pix and SketchyGAN even when the conditional self-attention module or the multi-scale discriminator is removed, and far outperforms them when the conditional self-attention module and the multi-scale discriminator are used together. This also demonstrates that both the conditional self-attention module and the multi-scale discriminator of the present disclosure can effectively enhance the generation quality of the adversarial generative model.
TABLE 3

Model                                                                  IS           FID             KID
pix2pix                                                                2.55±0.20    605.97±13.95    3.05±0.08
SketchyGAN                                                             2.75±0.17    479.09±15.25    2.11±0.09
Method of the disclosure (without conditional self-attention module)  3.73±0.14    451.58±14.91    1.88±0.07
Method of the disclosure (without multi-scale discriminator)          2.83±0.15    409.36±17.04    1.37±0.03
Method of the disclosure                                               2.99±0.21    333.19±16.75    0.91±0.05
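For reference, IS, FID and KID scores of the kind reported in Table 3 can be computed, for example, with the torchmetrics library; the text does not state which implementation was used, so the snippet below is only an assumed evaluation setup, with `real_images` and `generated_images` standing for hypothetical uint8 tensors of shape (N, 3, H, W).

```python
# Requires torchmetrics with its image extras (torch-fidelity) installed.
import torch
from torchmetrics.image.fid import FrechetInceptionDistance
from torchmetrics.image.inception import InceptionScore
from torchmetrics.image.kid import KernelInceptionDistance


def evaluate(real_images, generated_images):
    fid = FrechetInceptionDistance(feature=2048)
    kid = KernelInceptionDistance(subset_size=min(50, len(generated_images)))
    inception = InceptionScore()

    fid.update(real_images, real=True)
    fid.update(generated_images, real=False)
    kid.update(real_images, real=True)
    kid.update(generated_images, real=False)
    inception.update(generated_images)

    is_mean, is_std = inception.compute()       # higher IS: more realism / diversity
    kid_mean, kid_std = kid.compute()           # lower KID: closer to the real distribution
    return {"IS": (is_mean.item(), is_std.item()),
            "FID": fid.compute().item(),        # lower FID: closer to the real distribution
            "KID": (kid_mean.item(), kid_std.item())}
```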
In summary, the sketch-based image generation method provided by the present disclosure introduces a conditional self-attention module and a multi-scale discriminator into a conditional generative adversarial network: the conditional self-attention module lets the adversarial generative model learn long-distance dependency information of the image directly, and the multi-scale discriminator ensures the realism of local texture and the integrity of the global structure of the generated image, giving the adversarial generative model strong robustness.
The above-mentioned embodiments are intended to illustrate the objects, aspects and advantages of the present disclosure in further detail, and it should be understood that the above-mentioned embodiments are only illustrative of the present disclosure and are not intended to limit the present disclosure, and any modifications, equivalents, improvements and the like made within the spirit and principle of the present disclosure should be included in the scope of the present disclosure.

Claims (10)

1. A sketch-based image generation method, comprising:
s1, constructing an adversarial generative model comprising a generator and a discriminator, wherein the generator comprises down-sampling mask residual modules, residual modules, up-sampling mask residual modules and a conditional self-attention module, the discriminator comprises a plurality of sub-discrimination networks of different depths, and the conditional self-attention module is used for learning long-distance dependencies of the input feature map fed into it;
s2, inputting more than one training sample sketch into the generator to generate a generated image corresponding to each training sample sketch;
s3, inputting the training sample sketch, the corresponding real image and the generated image into the discriminator to compute a loss function, and computing a training objective function from the loss function;
s4, training the parameters of the generator and the discriminator according to the training objective function so as to minimize it;
and S5, generating a target generation image corresponding to the target sketch by using the trained generator.
2. The sketch-based image generation method as claimed in claim 1, wherein the number of the down-sampling mask residual modules equals the number of the up-sampling mask residual modules and is N, the down-sampling mask residual modules, the first N-1 up-sampling mask residual modules, the conditional self-attention module and the last up-sampling mask residual module are connected in sequence, the outputs of the first to (N-1)-th down-sampling mask residual modules are further connected to the N-th to second up-sampling mask residual modules, respectively, and step S2 comprises:
inputting the training sample sketch into the first down-sampling mask residual module; after processing by the down-sampling mask residual modules, the first N-1 up-sampling mask residual modules, the conditional self-attention module and the last up-sampling mask residual module in sequence, the last up-sampling mask residual module outputs the generated image corresponding to each training sample sketch.
3. The sketch-based image generation method of claim 1, wherein the processing of the conditional self-attention module comprises:
concatenating the received input feature map with the training sample sketch to obtain condition features;
mapping the condition features with three mapping matrices, respectively, to obtain three mapping feature maps, wherein the mapping matrices consist of trainable parameters;
processing the three mapping feature maps to obtain a response map;
and adding the response map to the input feature map to obtain an output feature map.
4. The sketch-based image generation method as claimed in claim 3, wherein the mapping feature maps are respectively:

f([a,x]) = W_f [a,x]

g([a,x]) = W_g [a,x]

h([a,x]) = W_h [a,x]

wherein a is the input feature map, x is the training sample sketch scaled to the same resolution as the feature map, f([a,x]), g([a,x]) and h([a,x]) are the three mapping feature maps, W_f, W_g and W_h are the mapping matrices, a ∈ R^(C×H×W), x ∈ R^(1×H×W), W_f ∈ R^(D×(C+1)), W_g ∈ R^(D×(C+1)), W_h ∈ R^(C×(C+1)), D = C/8, and C, H and W are the number of channels, the height and the width of the input feature map, respectively.
5. The sketch-based image generation method as claimed in claim 4, wherein processing the three mapping feature maps to obtain a response map comprises:
processing the mapping feature map f([a,x]) and the mapping feature map g([a,x]) to obtain an attention map;
and processing the attention map with the mapping feature map h([a,x]) to obtain the response map.
6. The sketch-based image generation method of claim 5, wherein the response map is:

r = (r_1, r_2, ..., r_N) ∈ R^(C×N)

wherein r is the response map, N = H × W,

r_j = Σ_{i=1}^{N} B_{i,j} h([a,x])_i

B_{i,j} = exp(s_{i,j}) / Σ_{i=1}^{N} exp(s_{i,j})

s_{i,j} = f([a,x])_i^T g([a,x])_j

wherein B_{i,j} is the element of the attention map indicating the weight of the i-th pixel of the mapping feature map h([a,x]) when synthesizing the j-th pixel of the response map.
7. The sketch-based image generation method of claim 6, wherein the output feature map is:

o_j = γ r_j + a_j

wherein γ is a trainable weight parameter with an initial value of 0, o_j is the j-th pixel of the output feature map, r_j is the j-th pixel of the response map, and a_j is the j-th pixel of the input feature map received by the conditional self-attention module.
8. The sketch-based image generation method as claimed in claim 1, wherein each sub-discrimination network of the discriminator is composed of more than one convolutional layer, the number of convolutional layers differs between sub-discrimination networks, and the hyper-parameters of every sub-discrimination network are the same.
9. The sketch-based image generation method of claim 1, wherein the loss function comprises an adversarial loss function L_adv(G, D), a reconstruction loss function L_L1(G) and a feature matching loss function L_fm(G), wherein:

L_adv(G, D) = Σ_{k=1}^{N_D} ( E_{(x,y)}[log D_k(x, y)] + E_x[log(1 − D_k(x, G(x)))] )

L_L1(G) = E_{(x,y)}[ ‖y − G(x)‖_1 ]

L_fm(G) = E_{(x,y)}[ Σ_{k=1}^{N_D} Σ_{q∈Q} (1/n_q) ‖ F_k^(q)(x, y) − F_k^(q)(x, G(x)) ‖_1 ]

and the training objective function is

min_G max_D L_adv(G, D) + λ L_L1(G) + μ L_fm(G)

wherein x is the training sample sketch, y is the real image, N_D is the number of sub-discrimination networks, N_D is an integer greater than 1, G(x) is the generated image corresponding to the training sample sketch, D_k(x, y) is the output of the k-th sub-discrimination network given the training sample sketch and the real image, D_k(x, G(x)) is the output of the k-th sub-discrimination network given the training sample sketch and the generated image, E_{(x,y)} denotes the expectation over the (x, y) data distribution, E_x denotes the expectation over the data distribution of x, Q is the set of selected feature layers of the sub-discrimination networks, N_Q is the number of selected feature layers of each sub-discrimination network, n_q is the number of elements of the q-th feature layer, F_k^(q)(x, G(x)) is the intermediate feature map output by the q-th layer of the k-th sub-discrimination network for the generated image, F_k^(q)(x, y) is the intermediate feature map output by the q-th layer of the k-th sub-discrimination network for the real image, λ is the weight of the reconstruction loss function L_L1(G), and μ is the weight of the feature matching loss function L_fm(G).
10. The sketch-based image generating method as claimed in claim 1, wherein said step S4 comprises:
training the parameters of the adversarial generative model other than the conditional self-attention module according to the training objective function;
fixing the parameters other than the conditional self-attention module and training the parameters of the conditional self-attention module;
and training all parameters of the adversarial generative model simultaneously so as to minimize the training objective function.
CN201910909387.5A 2019-09-24 2019-09-24 Sketch-based image generation method Active CN110659727B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910909387.5A CN110659727B (en) 2019-09-24 2019-09-24 Sketch-based image generation method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910909387.5A CN110659727B (en) 2019-09-24 2019-09-24 Sketch-based image generation method

Publications (2)

Publication Number Publication Date
CN110659727A CN110659727A (en) 2020-01-07
CN110659727B true CN110659727B (en) 2022-05-13

Family

ID=69039033

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910909387.5A Active CN110659727B (en) 2019-09-24 2019-09-24 Sketch-based image generation method

Country Status (1)

Country Link
CN (1) CN110659727B (en)

Families Citing this family (17)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113313133A (en) * 2020-02-25 2021-08-27 武汉Tcl集团工业研究院有限公司 Training method for generating countermeasure network and animation image generation method
CN111428761B (en) * 2020-03-11 2023-03-28 深圳先进技术研究院 Image feature visualization method, image feature visualization device and electronic equipment
CN111382845B (en) * 2020-03-12 2022-09-02 成都信息工程大学 Template reconstruction method based on self-attention mechanism
CN111489405B (en) * 2020-03-21 2022-09-16 复旦大学 Face sketch synthesis system for generating confrontation network based on condition enhancement
CN111489287B (en) * 2020-04-10 2024-02-09 腾讯科技(深圳)有限公司 Image conversion method, device, computer equipment and storage medium
CN113592724A (en) * 2020-04-30 2021-11-02 北京金山云网络技术有限公司 Target face image restoration method and device
CN111508069B (en) * 2020-05-22 2023-03-21 南京大学 Three-dimensional face reconstruction method based on single hand-drawn sketch
CN112132172A (en) * 2020-08-04 2020-12-25 绍兴埃瓦科技有限公司 Model training method, device, equipment and medium based on image processing
CN112070658B (en) * 2020-08-25 2024-04-16 西安理工大学 Deep learning-based Chinese character font style migration method
CN112149802B (en) * 2020-09-17 2022-08-09 广西大学 Image content conversion method with consistent semantic structure
CN112862110B (en) * 2021-02-11 2024-01-30 脸萌有限公司 Model generation method and device and electronic equipment
CN112949553A (en) * 2021-03-22 2021-06-11 陈懋宁 Face image restoration method based on self-attention cascade generation countermeasure network
CN112837215B (en) * 2021-03-31 2022-10-18 电子科技大学 Image shape transformation method based on generation countermeasure network
CN113205521A (en) * 2021-04-23 2021-08-03 复旦大学 Image segmentation method of medical image data
CN113269256B (en) * 2021-05-26 2024-08-27 广州密码营地信息科技有限公司 Construction method and application of MiSrc-GAN medical image model
CN113823296A (en) * 2021-06-15 2021-12-21 腾讯科技(深圳)有限公司 Voice data processing method and device, computer equipment and storage medium
CN114299218A (en) * 2021-12-13 2022-04-08 吉林大学 System for searching real human face based on hand-drawing sketch

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109145992A (en) * 2018-08-27 2019-01-04 西安电子科技大学 Cooperation generates confrontation network and sky composes united hyperspectral image classification method
CN109978165A (en) * 2019-04-04 2019-07-05 重庆大学 A kind of generation confrontation network method merged from attention mechanism

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109145992A (en) * 2018-08-27 2019-01-04 西安电子科技大学 Cooperation generates confrontation network and sky composes united hyperspectral image classification method
CN109978165A (en) * 2019-04-04 2019-07-05 重庆大学 A kind of generation confrontation network method merged from attention mechanism

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Self-Attention Generative Adversarial Networks;Han Zhang et al.;《arXiv》;20190114;第1-10页 *
SketchyGAN: Towards Diverse and Realistic Sketch to Image Synthesis;Wengling Chen et al.;《arXiv》;20180412;第1-19页 *

Also Published As

Publication number Publication date
CN110659727A (en) 2020-01-07

Similar Documents

Publication Publication Date Title
CN110659727B (en) Sketch-based image generation method
CN111047548B (en) Attitude transformation data processing method and device, computer equipment and storage medium
CN106462724B (en) Method and system based on normalized images verification face-image
CN110533712A (en) A kind of binocular solid matching process based on convolutional neural networks
CN110544297A (en) Three-dimensional model reconstruction method for single image
CN109685819A (en) A kind of three-dimensional medical image segmentation method based on feature enhancing
CN110728219A (en) 3D face generation method based on multi-column multi-scale graph convolution neural network
CN113658322B (en) Three-dimensional voxel reconstruction method based on visual transducer
CN105981050A (en) Method and system for exacting face features from data of face images
CN111127538A (en) Multi-view image three-dimensional reconstruction method based on convolution cyclic coding-decoding structure
CN114998525A (en) Action identification method based on dynamic local-global graph convolutional neural network
CN113792641A (en) High-resolution lightweight human body posture estimation method combined with multispectral attention mechanism
CN113096239B (en) Three-dimensional point cloud reconstruction method based on deep learning
CN114419412A (en) Multi-modal feature fusion method and system for point cloud registration
CN113449612B (en) Three-dimensional target point cloud identification method based on sub-flow sparse convolution
CN112634438A (en) Single-frame depth image three-dimensional model reconstruction method and device based on countermeasure network
CN114004847A (en) Medical image segmentation method based on graph reversible neural network
CN113688765A (en) Attention mechanism-based action recognition method for adaptive graph convolution network
CN113344869A (en) Driving environment real-time stereo matching method and device based on candidate parallax
CN115546032A (en) Single-frame image super-resolution method based on feature fusion and attention mechanism
CN114758152A (en) Feature matching method based on attention mechanism and neighborhood consistency
CN112509021A (en) Parallax optimization method based on attention mechanism
CN114463235A (en) Infrared and visible light image fusion method and device and storage medium
CN115222998A (en) Image classification method
CN114612902A (en) Image semantic segmentation method, device, equipment, storage medium and program product

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant