CN110659727B - Sketch-based image generation method
- Publication number
- CN110659727B (application CN201910909387.5A)
- Authority
- CN
- China
- Legal status: Active
Classifications
- G06T3/04—Context-preserving transformations, e.g. by using an importance map (G06T—Image data processing or generation, in general)
- G06N3/045—Combinations of networks (G06N3/04—Neural network architecture)
- G06N3/08—Learning methods (G06N3/02—Neural networks)
Abstract
A sketch-based image generation method, comprising: s1, constructing a confrontation generation model, wherein the confrontation generation model comprises a generator and a discriminator, the generator comprises a down-sampling mask residual error module, a residual error module, an up-sampling mask residual error module and a condition self-attention module, and the discriminator comprises more than one sub-discrimination network with different depths; s2, inputting one or more than one training sample sketch into the generator to generate a generated image corresponding to each training sample sketch; s3, inputting the real image and the generated image corresponding to the draft of the training sample into a discriminator to calculate a loss function, and calculating a training target function according to the loss function; s4, training the parameters of the generator and the discriminator according to the training target function to minimize the training target function; and S5, generating a target generation image corresponding to the target sketch by using the trained generator. The method ensures that the generated face image has real local texture and complete face structure.
Description
Technical Field
The present disclosure relates to the field of image processing technologies, and in particular, to an image generation method based on a sketch.
Background
Sketch-based image generation is a special case of image-to-image translation and plays an important role in computer graphics. Early sketch-based image generation algorithms adopted a search-and-fusion approach: image patches related to the sketch are first retrieved from a large-scale image database and then fused together. With the rapid development of deep learning in recent years, generative adversarial networks have been applied to image-to-image translation more and more widely. Isola et al. proposed a general image-to-image translation model based on supervised training of a conditional generative adversarial network; it is designed for dense input images and performs unsatisfactorily on sparse inputs such as sketches.
Because sketches are diverse, abstract and sparse, sketch-based face image generation faces great challenges. Existing methods still cannot generate ideal face images; in particular, when the input sketch does not contain the complete facial structure (eyes, nose, mouth, etc.), the generated face image tends to lack the corresponding parts. Existing network models are built on convolutional layers, whose receptive field is very limited; a global receptive field can only be reached by stacking many convolutional layers, yet stacking layers does not make it easy for the network to learn global structure information. In addition, the realism discriminator used by the generative adversarial networks of existing methods judges local image patches, which can only guarantee the local realism of the generated image and cannot directly judge the structural integrity of the face image.
Disclosure of Invention
(I) Technical problem to be solved
The present disclosure has been made in view of the above problems, and provides a sketch-based image generation method that addresses them by introducing a self-attention mechanism and a multi-scale discriminator into a conditional generative adversarial network.
(II) Technical scheme
The present disclosure provides a sketch-based image generation method, comprising: S1, constructing an adversarial generative model comprising a generator and a discriminator, wherein the generator comprises downsampling masked residual modules, residual modules, upsampling masked residual modules and a conditional self-attention module, and the discriminator comprises more than one sub-discrimination network of different depths; S2, inputting one or more training sample sketches into the generator to generate a generated image corresponding to each training sample sketch; S3, inputting the training sample sketch, the corresponding real image and the generated image into the discriminator to calculate a loss function, and calculating a training objective function from the loss function; S4, training the parameters of the generator and the discriminator according to the training objective function so as to minimize it; and S5, generating a target generated image corresponding to a target sketch with the trained generator.
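For orientation, the following is a minimal PyTorch-style sketch of how steps S1-S5 could be orchestrated. Generator, MultiScaleDiscriminator, discriminator_loss and generator_loss are hypothetical names whose sketches appear in the detailed description below, and the Adam settings are assumptions, not values given by this disclosure.

```python
import torch

def train_and_generate(loader, target_sketch, epochs=100):
    G, D = Generator(), MultiScaleDiscriminator()            # S1: build the model
    opt_g = torch.optim.Adam(G.parameters(), lr=2e-4, betas=(0.5, 0.999))
    opt_d = torch.optim.Adam(D.parameters(), lr=2e-4, betas=(0.5, 0.999))
    for _ in range(epochs):
        for x, y in loader:                                  # x: sketch, y: real image
            fake = G(x)                                      # S2: generate an image
            loss_d = discriminator_loss(D, x, y, fake)       # S3: loss on real/fake
            opt_d.zero_grad(); loss_d.backward(); opt_d.step()   # S4: update D
            loss_g = generator_loss(D, x, y, fake)
            opt_g.zero_grad(); loss_g.backward(); opt_g.step()   # S4: update G
    return G(target_sketch)                                  # S5: inference
```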
Optionally, the number of the downsampling masked residual modules equals the number of the upsampling masked residual modules, both being N; the downsampling masked residual modules, the first N-1 upsampling masked residual modules, the conditional self-attention module and the last upsampling masked residual module are connected in sequence; the outputs of the first to (N-1)th downsampling masked residual modules are further connected to the Nth to second upsampling masked residual modules, respectively; and step S2 comprises: inputting the training sample sketch into the first downsampling masked residual module, and, after processing in sequence by the downsampling masked residual modules, the first N-1 upsampling masked residual modules, the conditional self-attention module and the last upsampling masked residual module, outputting from the last upsampling masked residual module a generated image corresponding to each training sample sketch.
Optionally, the processing of the conditional self-attention module comprises: concatenating the received input feature map with the training sample sketch to obtain a condition feature; mapping the condition feature with three mapping matrices, composed of trainable parameters, to obtain three mapping feature maps; processing the three mapping feature maps to obtain a response map; and adding the response map to the input feature map to obtain an output feature map.
Optionally, the mapping feature maps are respectively: f([a,x]) = W_f [a,x], g([a,x]) = W_g [a,x], h([a,x]) = W_h [a,x], where a is the input feature map, x is the training sample sketch scaled to the same resolution as the feature map, f([a,x]), g([a,x]), h([a,x]) are the three mapping feature maps, W_f, W_g, W_h are the mapping matrices, a ∈ R^(C×H×W), x ∈ R^(1×H×W), W_f ∈ R^(D×(C+1)), W_g ∈ R^(D×(C+1)), W_h ∈ R^(C×(C+1)), D = C/8, and C, H, W are the number of channels, the height and the width of the input feature map, respectively.
Optionally, processing the three mapping feature maps to obtain a response map comprises: processing the mapping feature map f([a,x]) and the mapping feature map g([a,x]) to obtain an attention map; and processing the attention map with the mapping feature map h([a,x]) to obtain the response map.
Optionally, the response map is: r = (r_1, r_2, …, r_N) ∈ R^(C×N), where r is the response map, N = H × W, r_j = Σ_i β_(i,j) h([a,x])_i, β_(i,j) is obtained by normalizing s_(i,j), and s_(i,j) = f([a,x])^T g([a,x]).
Optionally, the output feature map is: o_j = γ r_j + a_j, where γ is a trainable weight parameter initialized to 0, o_j is the jth pixel of the output feature map, r_j is the jth pixel of the response map, and a_j is the jth pixel of the input feature map received by the conditional self-attention module.
Optionally, the discriminator is composed of more than one sub-discrimination network built from convolutional layers; the number of convolutional layers of the different sub-discrimination networks is different, while the hyper-parameters of each sub-discrimination network are the same.
Optionally, the loss function comprises an adversarial loss function L_adv(G;D), a reconstruction loss function L_L1(G) and a feature matching loss function L_fm(G), wherein:

L_adv(G;D) = Σ_(k=1)^(N_D) ( E_(x,y)[log D_k(x,y)] + E_x[log(1 − D_k(x, G(x)))] )

L_L1(G) = E_(x,y)[ ‖y − G(x)‖_1 ]

L_fm(G) = E_(x,y)[ Σ_(k=1)^(N_D) (1/N_Q) Σ_(q∈Q) (1/n_q) ‖D_k^(q)(x,y) − D_k^(q)(x,G(x))‖_1 ]

and the training objective function is

min_G ( max_D L_adv(G;D) + λ L_L1(G) + μ L_fm(G) )

where x is the training sample sketch, y is the real image, N_D is the number of sub-discrimination networks, G(x) is the generated image corresponding to the training sample sketch, D_k(x,y) is the output of the kth sub-discrimination network given the training sample sketch and the real image, D_k(x,G(x)) is the output of the kth sub-discrimination network given the training sample sketch and the generated image, E_(x,y) denotes expectation over the (x,y) data distribution, E_x denotes expectation over the data distribution of x, Q is the set of selected feature layers of the sub-discrimination networks, N_Q is the number of selected feature layers of each sub-discrimination network, n_q is the number of elements of the qth feature layer, D_k^(q)(x,G(x)) is the intermediate output feature map of the qth layer of the kth sub-discrimination network given the generated image, D_k^(q)(x,y) is the intermediate output feature map of the qth layer of the kth sub-discrimination network given the real image, λ is the weight of the reconstruction loss function L_L1(G), and μ is the weight of the feature matching loss function L_fm(G).
Optionally, step S4 comprises: training the parameters of the adversarial generative model other than the conditional self-attention module according to the training objective function; fixing the parameters other than those of the conditional self-attention module and training the parameters of the conditional self-attention module; and training all parameters of the adversarial generative model simultaneously so as to minimize the training objective function.
(III) Advantageous effects
The sketch-based image generation method provided by the present disclosure has the following beneficial effects:

(1) by introducing a conditional self-attention module into the conditional generative adversarial network, long-range dependencies of the image can be learned directly;

(2) by introducing a multi-scale discriminator into the conditional generative adversarial network, the realism of the generated image can be judged at different scales, ensuring realistic local texture and detail as well as a complete structure even when the input sketch is incomplete;

(3) the sub-discrimination networks share the parameters of their first layers, which reduces the number of network parameters and facilitates convergence.
Drawings
FIG. 1 schematically illustrates a flow chart of a sketch-based image generation method provided by an embodiment of the present disclosure;
fig. 2A schematically illustrates the structure of the generator of the adversarial generative model constructed in the sketch-based image generation method provided by the embodiments of the present disclosure;

fig. 2B schematically illustrates the structure of the discriminator of the adversarial generative model constructed in the sketch-based image generation method provided by the embodiments of the present disclosure;
FIG. 3 schematically illustrates the conditional self-attention module in the generator shown in FIG. 2A.
Detailed Description
For the purpose of promoting a better understanding of the objects, aspects and advantages of the present disclosure, reference is made to the following detailed description taken in conjunction with the accompanying drawings.
This embodiment provides a sketch-based image generation method. Referring to fig. 1, and with reference to fig. 2A, fig. 2B and fig. 3, the method shown in fig. 1 is described in detail; it includes the following operations.
And S1, constructing an adversarial generative model comprising a generator and a discriminator, wherein the generator comprises downsampling masked residual modules, residual modules, upsampling masked residual modules and a conditional self-attention module, and the discriminator comprises more than one sub-discrimination network of different depths.
In this embodiment, the generator is an encoder-decoder network. The encoding part comprises N downsampling masked residual modules and the decoding part comprises N upsampling masked residual modules, with a conditional self-attention module inserted before the last upsampling masked residual module. The conditional self-attention module can effectively learn long-range dependency information in the input feature map and help the network learn the structural information of the image. Several residual modules can be added between the encoding part and the decoding part to enhance the capacity and fitting ability of the network.
Specifically, referring to fig. 2A, the number of downsampling masked residual modules is the same as the number of upsampling masked residual modules, namely N. The downsampling masked residual modules, the first N-1 upsampling masked residual modules, the conditional self-attention module and the last upsampling masked residual module are connected in sequence, and the outputs of the first to (N-1)th downsampling masked residual modules are further connected to the Nth to second upsampling masked residual modules, respectively; that is, the decoding part and the encoding part are connected by skip connections, ensuring that low-level information (the outputs of the downsampling modules) can be passed directly to the high-level layers (the upsampling modules). In the embodiments of the present disclosure, connected in sequence means that the output of the previous module is connected to the input of the next module, i.e., the output feature map of the previous module serves as the input feature map of the next module.
It is understood that the adversarial generative model of the embodiments of the present disclosure may also contain no residual modules, i.e., the number of residual modules may be 0. The more residual modules, the stronger the fitting ability of the generator; however, too many residual modules slow the convergence and the computation of training the adversarial generative model, so the number of residual modules is preferably set between 1 and 8.
In the embodiments of the present disclosure, taking N = 6 and 8 residual modules as an example, the parameters of the generator are illustrated, and taking a training sample sketch of size 256 × 256 × 1 as an example, the output size of each module of the generator is illustrated, as shown in table 1. A person skilled in the art can likewise obtain generators composed of other numbers and parameters of downsampling masked residual modules, upsampling masked residual modules, residual modules and conditional self-attention modules from the description of this embodiment.
TABLE 1
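To make the wiring above concrete, the following is a minimal PyTorch sketch of such a generator under assumed channel widths. MRB is a plain residual block standing in for the masked residual modules (whose masking mechanism is not reproduced here), and ConditionalSelfAttention is the module sketched later in this description.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MRB(nn.Module):
    """Residual block with optional 2x down/up-sampling; a stand-in for the
    masked residual modules (the masking mechanism is not reproduced here)."""
    def __init__(self, c_in, c_out, mode=None):  # mode: 'down', 'up' or None
        super().__init__()
        self.mode = mode
        self.body = nn.Sequential(
            nn.Conv2d(c_in, c_out, 3, padding=1), nn.InstanceNorm2d(c_out),
            nn.ReLU(True),
            nn.Conv2d(c_out, c_out, 3, padding=1), nn.InstanceNorm2d(c_out))
        self.skip = nn.Conv2d(c_in, c_out, 1)

    def forward(self, x):
        if self.mode == 'down':
            x = F.avg_pool2d(x, 2)
        elif self.mode == 'up':
            x = F.interpolate(x, scale_factor=2, mode='nearest')
        return F.relu(self.body(x) + self.skip(x))

class Generator(nn.Module):
    """Encoder-decoder with N down/up blocks, skip connections from down
    blocks 1..N-1 to up blocks N..2, and conditional self-attention inserted
    before the last up block, as described above."""
    def __init__(self, N=6, base=64, n_res=8):
        super().__init__()
        ch = [1] + [min(base * 2 ** i, 512) for i in range(N)]
        self.down = nn.ModuleList(
            MRB(ch[i], ch[i + 1], 'down') for i in range(N))
        self.mid = nn.Sequential(*[MRB(ch[N], ch[N]) for _ in range(n_res)])
        self.up = nn.ModuleList(
            MRB(ch[N - k] if k == 0 else 2 * ch[N - k],
                ch[N - 1 - k] if k < N - 1 else base, 'up')
            for k in range(N))
        self.csam = ConditionalSelfAttention(2 * ch[1])  # sketched further below
        self.to_rgb = nn.Conv2d(base, 3, 3, padding=1)

    def forward(self, sketch):
        n = len(self.down)
        h, skips = sketch, []
        for d in self.down:
            h = d(h)
            skips.append(h)
        h = self.mid(h)
        for k, u in enumerate(self.up):
            if k > 0:                                    # skip connection
                h = torch.cat([h, skips[n - 1 - k]], dim=1)
            if k == n - 1:                               # CSAM before last up block
                h = self.csam(h, sketch)
            h = u(h)
        return torch.tanh(self.to_rgb(h))
```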
The inputs of the discriminator are the generated image and the real image corresponding to the training sample sketch, and its output is a judgment of the realism of the input image. Referring to fig. 2B, according to the embodiments of the present disclosure, the discriminator is composed of a plurality of sub-discrimination networks, each having more than three convolutional layers, and the weights of the first three convolutional layers are shared between the sub-discrimination networks. The hyper-parameters of the sub-discrimination networks (preset parameters such as convolution kernel size and stride) are the same; only the number of convolutional layers differs. Because the depths of the sub-discrimination networks differ, each pixel of their output feature maps corresponds to a different receptive field over the input image, so sub-discrimination networks of different depths judge the realism of the generated image at different scales. And because the low-level features of the sub-discrimination networks are consistent, the sub-discrimination networks share the parameters of their first layers, which reduces the number of network parameters and facilitates convergence.
In the embodiments of the present disclosure, the discriminator is a multi-scale discriminator whose sub-discrimination networks have consistent structures and different depths. Taking a deepest sub-discrimination network of 8 convolution modules as an example, the structure of the multi-scale discriminator is illustrated in table 2 (only the deepest sub-discrimination network is shown). In the embodiment shown in table 2, the parameters of convolution modules 1, 2 and 3 are shared by all sub-discrimination networks, and further convolution modules are added on this basis to form each sub-discrimination network. In practical applications, the number of sub-discrimination networks can be chosen as required; for example, with only two sub-discrimination networks, one is formed by convolution modules 1-4 and the other by convolution modules 1-8.
TABLE 2
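The shared-bottom, multi-depth design above can be sketched as follows. This is an illustrative PyTorch assumption: the channel widths, kernel sizes and the two tap depths (4 and 8, matching the two-sub-network example) are not prescribed by this disclosure. The per-stage features are also returned for the feature matching loss used later.

```python
import torch
import torch.nn as nn

class MultiScaleDiscriminator(nn.Module):
    """Sub-discrimination networks of different depths sharing their first
    convolution modules; each depth in `depths` taps the shared trunk with its
    own 1-channel prediction head. Input is the sketch concatenated with the
    (real or generated) image, so the judgment is conditioned on the sketch."""
    def __init__(self, in_ch=4, base=64, depths=(4, 8)):
        super().__init__()
        def block(cin, cout):
            return nn.Sequential(nn.Conv2d(cin, cout, 4, stride=2, padding=1),
                                 nn.InstanceNorm2d(cout), nn.LeakyReLU(0.2, True))
        ch = [in_ch] + [min(base * 2 ** i, 512) for i in range(max(depths))]
        self.stages = nn.ModuleList(
            block(ch[i], ch[i + 1]) for i in range(max(depths)))
        self.heads = nn.ModuleDict(
            {str(d): nn.Conv2d(ch[d], 1, 3, padding=1) for d in depths})
        self.depths = depths

    def forward(self, sketch, image):
        h = torch.cat([sketch, image], dim=1)
        feats, outs = [], []
        for i, stage in enumerate(self.stages, start=1):
            h = stage(h)
            feats.append(h)              # intermediate maps, for feature matching
            if i in self.depths:
                outs.append(self.heads[str(i)](h))  # realism score at this depth
        return outs, feats
```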
S2, inputting one or more training sample sketches into the generator to generate a generated image corresponding to each training sample sketch.
Specifically, a training sample sketch is input into the first downsampling masked residual module, and after processing in sequence by the downsampling masked residual modules, the first N-1 upsampling masked residual modules, the conditional self-attention module and the last upsampling masked residual module, the last upsampling masked residual module outputs a generated image corresponding to each training sample sketch.
Referring to fig. 3, the processing of the conditional self-attention module is as follows: concatenate the received input feature map with the training sample sketch to obtain a condition feature; map the condition feature with three mapping matrices, composed of trainable parameters, to obtain three mapping feature maps; process the three mapping feature maps to obtain a response map; and add the response map to the input feature map element-wise to obtain an output feature map.
Specifically, the input feature map a received by the conditional self-attention module is the concatenation of the output feature map of the (N-1)th upsampling module and the output feature map of the first downsampling module, with a ∈ R^(C×H×W). The training sample sketch is first resized so that x ∈ R^(1×H×W), and a and x are concatenated along the channel dimension to obtain the condition feature [a, x].
The condition feature is mapped by the three mapping matrices into three new feature spaces, yielding the three mapping feature maps:

f([a,x]) = W_f [a,x]

g([a,x]) = W_g [a,x]

h([a,x]) = W_h [a,x]

where a is the input feature map, x is the training sample sketch scaled to the same resolution as the feature map, f([a,x]), g([a,x]), h([a,x]) are the three mapping feature maps, W_f, W_g, W_h are the mapping matrices, a ∈ R^(C×H×W), x ∈ R^(1×H×W), W_f ∈ R^(D×(C+1)), W_g ∈ R^(D×(C+1)), W_h ∈ R^(C×(C+1)), D = C/8, and C, H, W are the number of channels, the height and the width of the input feature map, respectively. In the disclosed embodiments, W_f, W_g, W_h are not implemented explicitly as matrices; each is instead realized by a 1 × 1 convolutional layer.
Further, the map feature map f ([ a, x ]) and the map feature map g ([ a, x ]) are processed to obtain an attention map, and the attention map and the map feature map g ([ a, x ]) are processed to obtain a response map.
Specifically, the mapping feature map f ([ a, x ]) is transformed, multiplied by the mapping feature map g ([ a, x ]), normalized to obtain an attention map, and multiplied by the mapping feature map h ([ a, x ]) to obtain a response map r.
r=(r1,r2,……,rN)∈RC×N
Wherein, N is H multiplied by W, si,j=f([a,x])Tg([a,x]). Let B be RN×NFor the attention diagram, each element of B is denoted as Bi,jIndicating that the feature map h ([ a, x ] is currently mapped at the time of synthesizing the jth pixel of the response map]) The weight of the ith pixel.
Further, the response map is added to the input feature map element-wise, forming a residual structure and giving the output feature map of the conditional self-attention module:

o_j = γ r_j + a_j

where γ is a trainable weight parameter initialized to 0, o_j is the jth pixel of the output feature map, r_j is the jth pixel of the response map, and a_j is the jth pixel of the input feature map received by the conditional self-attention module.
Through the conditional self-attention module, the generator can gradually learn the long-range dependencies of the image.
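The computation above fits in a short PyTorch sketch. The 1 × 1 convolutions realize W_f, W_g and W_h as stated; resizing the sketch to the feature map's resolution and normalizing s over i follow the definitions of x and β_(i,j) above, while the bilinear resizing mode is an assumption.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ConditionalSelfAttention(nn.Module):
    """Conditional self-attention: concatenate the resized sketch x with the
    input feature map a, map [a, x] with 1x1 convolutions playing the roles of
    W_f, W_g, W_h (D = C/8), and add the attention response back residually."""
    def __init__(self, C):
        super().__init__()
        D = max(C // 8, 1)
        self.f = nn.Conv2d(C + 1, D, 1)   # W_f in R^(D x (C+1))
        self.g = nn.Conv2d(C + 1, D, 1)   # W_g in R^(D x (C+1))
        self.h = nn.Conv2d(C + 1, C, 1)   # W_h in R^(C x (C+1))
        self.gamma = nn.Parameter(torch.zeros(1))  # trainable weight, init 0

    def forward(self, a, sketch):
        B, C, H, W = a.shape
        x = F.interpolate(sketch, size=(H, W), mode='bilinear',
                          align_corners=False)      # sketch scaled to (H, W)
        ax = torch.cat([a, x], dim=1)               # condition feature [a, x]
        f = self.f(ax).flatten(2)                   # B x D x N, with N = H*W
        g = self.g(ax).flatten(2)                   # B x D x N
        h = self.h(ax).flatten(2)                   # B x C x N
        s = torch.bmm(f.transpose(1, 2), g)         # s[i, j] = f_i^T g_j
        beta = F.softmax(s, dim=1)                  # attention map, sums to 1 over i
        r = torch.bmm(h, beta).view(B, C, H, W)     # r_j = sum_i beta[i, j] h_i
        return self.gamma * r + a                   # o_j = gamma * r_j + a_j
```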
And S3, inputting the training sample sketch, the corresponding real image and the generated image into the discriminator to calculate the loss functions, and calculating the training objective function from the loss functions.
In the embodiments of the present disclosure, the loss function comprises an adversarial loss function L_adv(G;D), a reconstruction loss function L_L1(G) and a feature matching loss function L_fm(G):

L_adv(G;D) = Σ_(k=1)^(N_D) ( E_(x,y)[log D_k(x,y)] + E_x[log(1 − D_k(x, G(x)))] )

L_L1(G) = E_(x,y)[ ‖y − G(x)‖_1 ]

L_fm(G) = E_(x,y)[ Σ_(k=1)^(N_D) (1/N_Q) Σ_(q∈Q) (1/n_q) ‖D_k^(q)(x,y) − D_k^(q)(x,G(x))‖_1 ]

where x is the training sample sketch, y is the real image, N_D is the number of sub-discrimination networks, G(x) is the generated image corresponding to the training sample sketch, D_k(x,y) is the output of the kth sub-discrimination network given the training sample sketch and the real image, D_k(x,G(x)) is the output of the kth sub-discrimination network given the training sample sketch and the generated image, E_(x,y) denotes expectation over the (x,y) data distribution, E_x denotes expectation over the data distribution of x, Q is the set of selected feature layers of the sub-discrimination networks, N_Q is the number of selected feature layers of each sub-discrimination network, n_q is the number of elements of the qth feature layer, D_k^(q)(x,G(x)) is the intermediate output feature map of the qth layer of the kth sub-discrimination network given the generated image, and D_k^(q)(x,y) is the intermediate output feature map of the qth layer of the kth sub-discrimination network given the real image.
The adversarial loss function L_adv(G;D) drives the adversarial training of the generative adversarial network (i.e., the adversarial generative model) to ensure the realism of the generated image. The reconstruction loss function L_L1(G) pulls the generated image closer to the real image. The feature matching loss function L_fm(G) pulls the generated image closer to the real image in feature space.
Further, the training objective function of the adversarial generative model can be calculated from L_adv(G;D), L_L1(G) and L_fm(G):

min_G ( max_D L_adv(G;D) + λ L_L1(G) + μ L_fm(G) )

where λ and μ are the weights of the reconstruction loss function L_L1(G) and the feature matching loss function L_fm(G), respectively; for example, λ = 100.0 and μ = 1.0 were used in the experiments.
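As an illustration, the losses above can be written against the (outputs, features) interface of the discriminator sketch given earlier. The binary cross-entropy form of the adversarial term is an assumption consistent with the log-likelihood formulas above, and F.l1_loss averages over elements, which supplies the 1/n_q normalization.

```python
import torch
import torch.nn.functional as F

def discriminator_loss(D, x, y, fake):
    """Adversarial term for the discriminator, summed over the N_D heads."""
    real_outs, _ = D(x, y)
    fake_outs, _ = D(x, fake.detach())              # do not backprop into G here
    return (sum(F.binary_cross_entropy_with_logits(o, torch.ones_like(o))
                for o in real_outs)
            + sum(F.binary_cross_entropy_with_logits(o, torch.zeros_like(o))
                  for o in fake_outs))

def generator_loss(D, x, y, fake, lam=100.0, mu=1.0):
    """L_adv + lam * L_L1 + mu * L_fm, with lam = 100.0 and mu = 1.0 as in the
    experiments reported above."""
    real_outs, real_feats = D(x, y)
    fake_outs, fake_feats = D(x, fake)
    l_adv = sum(F.binary_cross_entropy_with_logits(o, torch.ones_like(o))
                for o in fake_outs)                 # fool every sub-network
    l_l1 = F.l1_loss(fake, y)                       # reconstruction loss L_L1
    l_fm = sum(F.l1_loss(ff, rf.detach())           # feature matching loss L_fm
               for ff, rf in zip(fake_feats, real_feats))
    return l_adv + lam * l_l1 + mu * l_fm
```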
S4, training the parameters of the generator and the discriminator according to the training objective function to minimize the training objective function.
The parameters of the generator and the discriminator are trained alternately: first the parameters of the discriminator are fixed and the generator is trained for one iteration; then, with the generator fixed, the discriminator is trained for one iteration; and these two steps are repeated alternately.
Further, for better convergence of the adversarial generative model, operation S4 is divided into three phases, respectively comprising the following operations:

training the parameters of the adversarial generative model other than the conditional self-attention module according to the training objective function, i.e., training the model without the conditional self-attention module; fixing the parameters other than those of the conditional self-attention module and training the parameters of the conditional self-attention module, i.e., training only the conditional self-attention module; and training all trainable parameters of the model simultaneously so as to minimize the training objective function.
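A minimal sketch of this three-phase schedule, assuming the Generator sketch above (with its csam attribute) and a hypothetical run_phase helper that performs some number of the alternating generator/discriminator updates described earlier:

```python
def set_trainable(module, flag):
    """Toggle requires_grad for all parameters of a module."""
    for p in module.parameters():
        p.requires_grad_(flag)

def staged_training(G, D, run_phase):
    # Phase 1: train everything except the conditional self-attention module.
    set_trainable(G, True); set_trainable(D, True); set_trainable(G.csam, False)
    run_phase(G, D)
    # Phase 2: fix the rest and train only the conditional self-attention module.
    set_trainable(G, False); set_trainable(D, False); set_trainable(G.csam, True)
    run_phase(G, D)
    # Phase 3: train all trainable parameters simultaneously.
    set_trainable(G, True); set_trainable(D, True)
    run_phase(G, D)
```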
And S5, generating a target generated image corresponding to the target sketch using the trained generator.
After the adversarial generative model of the embodiments of the present disclosure has been trained through operations S1-S4 above, the target generated image corresponding to a target sketch can be produced by the generator of the trained model.
The superiority of the disclosed method is illustrated below by comparing the sketch-based image generation method of the present disclosure with pix2pix and SketchyGAN, two well-performing existing methods.
IS, FID and KID are three commonly used metrics for evaluating the quality of generated images. A higher IS score indicates higher realism and diversity of the generated images; lower FID and KID values indicate higher realism. Table 3 reports the IS, FID and KID scores of the disclosed method and of pix2pix and SketchyGAN. The disclosed method outperforms pix2pix and SketchyGAN even when either the conditional self-attention module or the multi-scale discriminator is removed, and far outperforms them when both are used together. This also demonstrates that both the conditional self-attention module and the multi-scale discriminator effectively improve the generation quality of the adversarial generative model.
TABLE 3

Model | IS | FID | KID
---|---|---|---
pix2pix | 2.55±0.20 | 605.97±13.95 | 3.05±0.08
SketchyGAN | 2.75±0.17 | 479.09±15.25 | 2.11±0.09
Disclosed method (without the conditional self-attention module) | 3.73±0.14 | 451.58±14.91 | 1.88±0.07
Disclosed method (without the multi-scale discriminator) | 2.83±0.15 | 409.36±17.04 | 1.37±0.03
Disclosed method | 2.99±0.21 | 333.19±16.75 | 0.91±0.05
In summary, the sketch-based image generation method provided by the present disclosure introduces a conditional self-attention module and a multi-scale discriminator into a conditional generative adversarial network. The conditional self-attention module lets the adversarial generative model directly learn long-range dependency information of the image, and the multi-scale discriminator ensures the realism of local texture and the integrity of the global structure of the generated image, giving the model strong robustness.
The above embodiments further describe the objects, aspects and advantages of the present disclosure in detail. It should be understood that they are merely illustrative of the present disclosure and are not intended to limit it; any modifications, equivalents, improvements and the like made within the spirit and principles of the present disclosure shall fall within its scope.
Claims (10)
1. A sketch-based image generation method, comprising:
s1, constructing an adversarial generative model comprising a generator and a discriminator, wherein the generator comprises downsampling masked residual modules, residual modules, upsampling masked residual modules and a conditional self-attention module, the discriminator comprises a plurality of sub-discrimination networks of different depths, and the conditional self-attention module is configured to learn the long-range dependencies of the input feature map fed into it;

s2, inputting one or more training sample sketches into the generator to generate a generated image corresponding to each training sample sketch;

s3, inputting the training sample sketch, the corresponding real image and the generated image into the discriminator to calculate a loss function, and calculating a training objective function from the loss function;

s4, training the parameters of the generator and the discriminator according to the training objective function so as to minimize the training objective function;

and S5, generating a target generated image corresponding to a target sketch with the trained generator.
2. The sketch-based image generation method of claim 1, wherein the number of the downsampling masked residual modules equals the number of the upsampling masked residual modules, both being N; the downsampling masked residual modules, the first N-1 upsampling masked residual modules, the conditional self-attention module and the last upsampling masked residual module are connected in sequence; the outputs of the first to (N-1)th downsampling masked residual modules are further connected to the Nth to second upsampling masked residual modules, respectively; and step S2 comprises:

inputting the training sample sketch into the first downsampling masked residual module, and, after processing in sequence by the downsampling masked residual modules, the first N-1 upsampling masked residual modules, the conditional self-attention module and the last upsampling masked residual module, outputting from the last upsampling masked residual module a generated image corresponding to each training sample sketch.
3. The sketch-based image generation method of claim 1, wherein the processing of the conditional self-attention module comprises:

concatenating the received input feature map with the training sample sketch to obtain a condition feature;

mapping the condition feature with three mapping matrices, composed of trainable parameters, to obtain three mapping feature maps;

processing the three mapping feature maps to obtain a response map;

and adding the response map to the input feature map to obtain an output feature map.
4. The sketch-based image generation method of claim 3, wherein the mapping feature maps are respectively:

f([a,x]) = W_f [a,x]

g([a,x]) = W_g [a,x]

h([a,x]) = W_h [a,x]

where a is the input feature map, x is the training sample sketch scaled to the same resolution as the feature map, f([a,x]), g([a,x]), h([a,x]) are the three mapping feature maps, W_f, W_g, W_h are the mapping matrices, a ∈ R^(C×H×W), x ∈ R^(1×H×W), W_f ∈ R^(D×(C+1)), W_g ∈ R^(D×(C+1)), W_h ∈ R^(C×(C+1)), D = C/8, and C, H and W are the number of channels, the height and the width of the input feature map, respectively.
5. The sketch-based image generation method of claim 4, wherein processing the three mapping feature maps to obtain a response map comprises:

processing the mapping feature map f([a,x]) and the mapping feature map g([a,x]) to obtain an attention map;

and processing the attention map with the mapping feature map h([a,x]) to obtain the response map.
7. The sketch-based image generation method of claim 6, wherein the output feature map is:

o_j = γ r_j + a_j

where γ is a trainable weight parameter initialized to 0, o_j is the jth pixel of the output feature map, r_j is the jth pixel of the response map, and a_j is the jth pixel of the input feature map received by the conditional self-attention module.
8. The sketch-based image generation method of claim 1, wherein the discriminator is composed of more than one sub-discrimination network built from convolutional layers, the number of convolutional layers of the different sub-discrimination networks is different, and the hyper-parameters of each sub-discrimination network are the same.
9. The sketch-based image generation method of claim 1, wherein the loss function comprises an adversarial loss function L_adv(G;D), a reconstruction loss function L_L1(G) and a feature matching loss function L_fm(G), wherein:

L_adv(G;D) = Σ_(k=1)^(N_D) ( E_(x,y)[log D_k(x,y)] + E_x[log(1 − D_k(x, G(x)))] )

L_L1(G) = E_(x,y)[ ‖y − G(x)‖_1 ]

L_fm(G) = E_(x,y)[ Σ_(k=1)^(N_D) (1/N_Q) Σ_(q∈Q) (1/n_q) ‖D_k^(q)(x,y) − D_k^(q)(x,G(x))‖_1 ]

and the training objective function is

min_G ( max_D L_adv(G;D) + λ L_L1(G) + μ L_fm(G) )

wherein x is the training sample sketch, y is the real image, N_D is the number of the sub-discrimination networks and is an integer greater than 1, G(x) is the generated image corresponding to the training sample sketch, D_k(x,y) is the output of the kth sub-discrimination network given the training sample sketch and the real image, D_k(x,G(x)) is the output of the kth sub-discrimination network given the training sample sketch and the generated image, E_(x,y) denotes expectation over the (x,y) data distribution, E_x denotes expectation over the data distribution of x, Q is the set of selected feature layers of the sub-discrimination networks, N_Q is the number of selected feature layers of each sub-discrimination network, n_q is the number of elements of the qth feature layer, D_k^(q)(x,G(x)) is the intermediate output feature map of the qth layer of the kth sub-discrimination network given the generated image, D_k^(q)(x,y) is the intermediate output feature map of the qth layer of the kth sub-discrimination network given the real image, λ is the weight of the reconstruction loss function L_L1(G), and μ is the weight of the feature matching loss function L_fm(G).
10. The sketch-based image generation method of claim 1, wherein step S4 comprises:

training the parameters of the adversarial generative model other than the conditional self-attention module according to the training objective function;

fixing the parameters other than those of the conditional self-attention module and training the parameters of the conditional self-attention module;

and training all parameters of the adversarial generative model simultaneously so as to minimize the training objective function.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910909387.5A CN110659727B (en) | 2019-09-24 | 2019-09-24 | Sketch-based image generation method |
Publications (2)
Publication Number | Publication Date |
---|---|
CN110659727A (en) | 2020-01-07
CN110659727B (en) | 2022-05-13
Family
ID=69039033
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910909387.5A (granted as CN110659727B, active) | Sketch-based image generation method | 2019-09-24 | 2019-09-24
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110659727B (en) |
Families Citing this family (17)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113313133A (en) * | 2020-02-25 | 2021-08-27 | 武汉Tcl集团工业研究院有限公司 | Training method for generating countermeasure network and animation image generation method |
CN111428761B (en) * | 2020-03-11 | 2023-03-28 | 深圳先进技术研究院 | Image feature visualization method, image feature visualization device and electronic equipment |
CN111382845B (en) * | 2020-03-12 | 2022-09-02 | 成都信息工程大学 | Template reconstruction method based on self-attention mechanism |
CN111489405B (en) * | 2020-03-21 | 2022-09-16 | 复旦大学 | Face sketch synthesis system for generating confrontation network based on condition enhancement |
CN111489287B (en) * | 2020-04-10 | 2024-02-09 | 腾讯科技(深圳)有限公司 | Image conversion method, device, computer equipment and storage medium |
CN113592724A (en) * | 2020-04-30 | 2021-11-02 | 北京金山云网络技术有限公司 | Target face image restoration method and device |
CN111508069B (en) * | 2020-05-22 | 2023-03-21 | 南京大学 | Three-dimensional face reconstruction method based on single hand-drawn sketch |
CN112132172A (en) * | 2020-08-04 | 2020-12-25 | 绍兴埃瓦科技有限公司 | Model training method, device, equipment and medium based on image processing |
CN112070658B (en) * | 2020-08-25 | 2024-04-16 | 西安理工大学 | Deep learning-based Chinese character font style migration method |
CN112149802B (en) * | 2020-09-17 | 2022-08-09 | 广西大学 | Image content conversion method with consistent semantic structure |
CN112862110B (en) * | 2021-02-11 | 2024-01-30 | 脸萌有限公司 | Model generation method and device and electronic equipment |
CN112949553A (en) * | 2021-03-22 | 2021-06-11 | 陈懋宁 | Face image restoration method based on self-attention cascade generation countermeasure network |
CN112837215B (en) * | 2021-03-31 | 2022-10-18 | 电子科技大学 | Image shape transformation method based on generation countermeasure network |
CN113205521A (en) * | 2021-04-23 | 2021-08-03 | 复旦大学 | Image segmentation method of medical image data |
CN113269256B (en) * | 2021-05-26 | 2024-08-27 | 广州密码营地信息科技有限公司 | Construction method and application of MiSrc-GAN medical image model |
CN113823296A (en) * | 2021-06-15 | 2021-12-21 | 腾讯科技(深圳)有限公司 | Voice data processing method and device, computer equipment and storage medium |
CN114299218A (en) * | 2021-12-13 | 2022-04-08 | 吉林大学 | System for searching real human face based on hand-drawing sketch |
Patent Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109145992A (en) * | 2018-08-27 | 2019-01-04 | 西安电子科技大学 | Hyperspectral image classification method combining a cooperative generative adversarial network with joint spatial-spectral features
CN109978165A (en) * | 2019-04-04 | 2019-07-05 | 重庆大学 | Generative adversarial network method incorporating a self-attention mechanism
Non-Patent Citations (2)
Title |
---|
Han Zhang et al.; Self-Attention Generative Adversarial Networks; arXiv; 2019-01-14; pp. 1-10 *
Wengling Chen et al.; SketchyGAN: Towards Diverse and Realistic Sketch to Image Synthesis; arXiv; 2018-04-12; pp. 1-19 *
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
GR01 | Patent grant | |