CN112307714A - Character style migration method based on double-stage deep network - Google Patents
- Publication number
- CN112307714A (application CN202011210655.3A, filed 2020)
- Authority
- CN
- China
- Prior art keywords
- picture
- network
- stylized
- font
- loss
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/10—Text processing
- G06F40/103—Formatting, i.e. changing of presentation of documents
- G06F40/109—Font handling; Temporal or kinetic typography
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/214—Generating training patterns; Bootstrap methods, e.g. bagging or boosting
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
Abstract
A character style migration method based on a two-stage deep network. First, a training data set A and a training data set B are constructed. A de-stylization network is then trained on data set A to obtain a de-stylization network model. Next, a font migration network is trained on the de-stylized pictures produced by that model together with data set B, yielding a font migration network model; this model migrates a font picture to be converted into a target reference font picture. Finally, a texture migration network is trained on the target reference font pictures and data set A to obtain a texture migration network model, and this model produces the final result of character style migration. This staged design achieves an excellent character style migration effect.
Description
Technical Field
The invention belongs to the field of deep learning and image style migration, and particularly relates to a character style migration method based on a two-stage deep network.
Background
Style migration of images is the task of transferring the style of one image onto another to synthesize a new artistic image. In recent years, with the continuing development of artificial intelligence and the global creative industries, demand for style migration of text images has grown: people want to generate more artistic fonts and apply them to design and promotion in commerce, culture, and other industries.
Style migration of text images differs from that of ordinary images in that it involves both font migration and texture migration: the former transforms the typeface of characters while preserving their content, and the latter transforms their stylistic appearance. Manually synthesizing text images with a specific font and texture consumes a great deal of time and effort, so automating text style migration efficiently has attracted wide attention. However, existing character style migration methods are limited to direct, single-stage conversion, in which both the font and the texture are migrated at once in the same stage, and the results are often unsatisfactory.
Disclosure of Invention
The invention aims to overcome the above problems in the prior art by providing a character style migration method based on a two-stage deep network that achieves a better migration effect.
In order to achieve the above purpose, the invention provides the following technical scheme:
a character style migration method based on a two-stage deep network sequentially comprises the following steps:
Step 1: construct a training data set A and a training data set B, where training data set A comprises stylized text pictures with various textures and their corresponding de-stylized text pictures, and training data set B comprises de-stylized text pictures in a reference font and in various other fonts;
Step 2: construct a de-stylization network and train it on training data set A to obtain a de-stylization network model that removes the texture style from stylized text pictures;
Step 3: construct a font migration network and train it on the de-stylized pictures produced by the de-stylization network model together with training data set B, obtaining a font migration network model that converts between fonts; then use this model to migrate a font picture to be converted into a target reference font picture;
Step 4: construct a texture migration network and train it on the target reference font pictures generated in Step 3 together with training data set A, obtaining a texture migration network model that renders stylized texture onto font pictures; finally, use this model to obtain the final result of character style migration.
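The four steps above chain three trained models at inference time. A minimal sketch of that wiring, where the function and parameter names are hypothetical (the patent does not name them) and identity stubs stand in for the trained networks:

```python
import numpy as np

def run_two_stage_pipeline(stylized_img, destylize, transfer_font,
                           transfer_texture, target_font_label, style_img):
    """Chain the three trained models: de-stylization, then font
    migration (stage 1), then texture migration (stage 2)."""
    plain = destylize(stylized_img)                       # Step 2 model
    ref_font = transfer_font(plain, target_font_label)    # Step 3 model
    return transfer_texture(ref_font, style_img)          # Step 4 model

# Identity stubs stand in for the trained networks in this sketch.
img = np.zeros((64, 64, 3))
out = run_two_stage_pipeline(img,
                             destylize=lambda x: x,
                             transfer_font=lambda x, c: x,
                             transfer_texture=lambda x, s: x,
                             target_font_label=0,
                             style_img=img)
```

In a real system each callable would be a trained network's forward pass; the point is only that the stages are composed sequentially rather than merged into one conversion.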
In Step 2, the de-stylization network comprises an encoder E_X, an encoder E_Y, and a decoder G_X. Training the de-stylization network on training data set A proceeds through the following steps:
2.1: the de-stylization network randomly selects an image pair (x, y) from training data set A and inputs it to the encoders E_X and E_Y, where y is a stylized text picture with texture and x is the de-stylized text picture corresponding to y;
2.2: the encoders E_X and E_Y map x and y into a shared feature space and encode them into respective feature maps; the feature loss L_feat is computed from these feature maps, and the encoders E_X and E_Y are trained with the objective of minimizing L_feat;
2.3: the feature map is passed to the decoder G_X to generate a reconstructed de-stylized text picture; the pixel loss L_pix is then computed from the reconstructed picture and the picture x, and the decoder G_X is trained with the objective of minimizing L_pix, so that the reconstructed de-stylized text picture is sufficiently close to picture x.
The de-stylization network also includes a discriminator D_X, and training the network on data set A further comprises:
2.4: the reconstructed de-stylized text picture is input to the discriminator D_X, which judges its authenticity; the adversarial loss L_adv is computed, and the network is optimized with an Adam optimizer.
The total loss function optimized by the de-stylization network is:

L_1 = λ_feat·L_feat + λ_pix·L_pix + λ_adv·L_adv

L_feat = E_{x,y}[ ||S_X(E_Y(y)) - z||_1 ]

z = S_X(G_X(x))

L_pix = E_{x,y}[ ||G_X(E_Y(y)) - x||_1 ]

L_adv = E_x[D_X(x)] - E_y[D_X(G_X(E_Y(y)))] - λ_gp·E_x̂[ (||∇_x̂ D_X(x̂)||_2 - 1)^2 ]

In the above formulas, L_feat, L_pix, and L_adv are the feature loss, pixel loss, and adversarial loss, respectively, and λ_feat, λ_pix, and λ_adv are their hyper-parameters; S_X is a sub-network of G_X used to extract the content feature z; λ_gp is the gradient penalty factor; and x̂ is sampled uniformly along straight lines between picture x and the reconstructed de-stylized text picture G_X(E_Y(y)).
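As a concrete illustration of L_feat and L_pix, the following sketch evaluates the two 1-norm terms on toy arrays. The adversarial term L_adv is omitted because it requires the discriminator D_X; the hyper-parameter defaults mirror the embodiment's values λ_feat = λ_pix = 100, and the function names are illustrative, not from the patent:

```python
import numpy as np

def l1_dist(a, b):
    """Mean absolute difference, standing in for the ||.||_1 terms."""
    return np.abs(a - b).mean()

def destylization_loss(feat_y, z, recon, x, lam_feat=100.0, lam_pix=100.0):
    """Partial L_1 objective: lam_feat * L_feat + lam_pix * L_pix, where
    feat_y = S_X(E_Y(y)), z = S_X(G_X(x)), recon = G_X(E_Y(y))."""
    L_feat = l1_dist(feat_y, z)   # feature loss
    L_pix = l1_dist(recon, x)     # pixel loss
    return lam_feat * L_feat + lam_pix * L_pix

feat_y, z = np.ones((4, 4)), np.zeros((4, 4))   # L_feat = 1
recon = x = np.zeros((8, 8))                    # L_pix = 0 (perfect reconstruction)
total = destylization_loss(feat_y, z, recon, x)
```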
In Step 3, the font migration network comprises a generator G and a discriminator D. Training the font migration network on the de-stylized pictures produced by the de-stylization network model together with training data set B proceeds through the following steps:
3.1: the font migration network randomly selects a picture x from training data set B and inputs it to the generator G, which generates a fake picture G(x, c) from picture x and a target font label c;
3.2: on the one hand, the fake picture G(x, c) is fed back into the generator G to produce a reconstructed picture G(G(x, c), c'), with the de-stylized picture from the de-stylization network model serving as the target font picture supervising the reconstruction; the generator's font classification loss L_f and reconstruction loss L_rec are then computed, and the generator G is trained with the objective of minimizing L_f and L_rec. On the other hand, the fake picture G(x, c) is input to the discriminator D, which judges its authenticity and the font domain it belongs to; the discriminator's font classification loss L_r is computed, and the discriminator D is trained with the objective of minimizing L_r.
The loss functions optimized by the font migration network are:

L_2 = L_D + L_G

L_D = -L_adv + λ_1·L_r

L_G = L_adv + λ_1·L_f + λ_2·L_rec

L_r = E_{x,c'}[ -log D(c' | x) ]

L_f = E_{x,c}[ -log D(c | G(x, c)) ]

L_rec = E_{x,c,c'}[ ||x - G(G(x, c), c')||_1 ]

In the above formulas, L_D is the discriminator loss and L_G is the generator loss; L_adv, L_r, L_f, and L_rec are the adversarial loss, the discriminator's font classification loss, the generator's font classification loss, and the reconstruction loss, respectively; λ_1, λ_2, and λ_gp are the hyper-parameters of the font classification loss, the reconstruction loss, and the gradient penalty of the adversarial loss; x̂ is sampled uniformly along straight lines between real picture samples and fake pictures G(x, c); and D(c' | x) is the probability the discriminator D assigns to the real picture sample x belonging to its original font domain c'.
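To make the classification and reconstruction terms concrete, here is a toy evaluation of the -log D(·|·) term behind L_r and L_f, and of the 1-norm cycle term L_rec. The fixed probability vector standing in for the discriminator output is purely illustrative:

```python
import numpy as np

def font_cls_loss(domain_probs, target_domain):
    """-log D(c|.) for one sample: the cost of the discriminator not
    assigning the picture to font domain `target_domain`."""
    return -np.log(domain_probs[target_domain])

def cycle_rec_loss(x, x_cycled):
    """L_rec: 1-norm between input x and the cycled result G(G(x,c), c')."""
    return np.abs(x - x_cycled).mean()

probs = np.array([0.2, 0.5, 0.3])     # stand-in for D's domain distribution
loss_r = font_cls_loss(probs, 1)      # -log 0.5
loss_rec = cycle_rec_loss(np.ones((4, 4)), np.ones((4, 4)))  # perfect cycle
```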
In step 3.1, the fake picture G(x, c) is generated as follows: the picture x and the target font label c are first feature-mapped and fused, and the result is then passed into a deep convolutional network for training.
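The text does not spell out the exact fusion operation. A common conditional-generator scheme, assumed here for illustration, broadcasts a one-hot font label over the spatial grid and concatenates it to the image channels:

```python
import numpy as np

def fuse_font_label(x, c, num_fonts):
    """Tile a one-hot encoding of font label c over the H x W grid and
    append it to the channels of image x (shape (H, W, C))."""
    h, w, _ = x.shape
    onehot = np.zeros(num_fonts, dtype=x.dtype)
    onehot[c] = 1.0
    label_maps = np.broadcast_to(onehot, (h, w, num_fonts))
    return np.concatenate([x, label_maps], axis=-1)

# A 1-channel 8x8 image with 5 font domains yields a 6-channel input.
fused = fuse_font_label(np.zeros((8, 8, 1)), c=2, num_fonts=5)
```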
In Step 4, the texture migration network comprises an encoder f, a decoder g, and an AdaIN adaptive normalization layer between them. Both are built with reference to the VGG-19 network structure: the encoder f takes the first L layers of a pre-trained VGG-19 network, and the decoder g mirrors the encoder's structure, with all pooling layers replaced by upsampling layers;
the training of the texture migration network by using the target reference font picture and the training data set A generated in the third step sequentially comprises the following steps:
4.1, firstly, mapping a font picture c and a texture style picture s to a feature space by using an encoder f to obtain f (c) and f(s), and then performing feature transformation on the font picture c and the texture style picture s by using an AdaIN self-adaptive normalization layer to obtain a feature map t ═ AdaIN (f (c) and f (s));
4.2, mapping the feature map t back to the original feature space by a decoder g to obtain a stylized result map g (t);
4.3, inputting the stylized result graph g (t) and the texture style picture s into an encoder f, and realizing the training of the texture migration network through the optimization of the loss function.
In step 4.3, the loss function is:

L_3 = L_c + λ·L_s

L_c = ||f(g(t)) - t||_2

L_s = Σ_{i=1..L} ( ||μ(φ_i(g(t))) - μ(φ_i(s))||_2 + ||σ(φ_i(g(t))) - σ(φ_i(s))||_2 )

In the above formulas, L_c is the content loss, L_s is the style loss, λ is the style loss hyper-parameter, φ_i denotes the i-th layer of the encoder f, and μ and σ are the per-channel mean and standard deviation of a feature map.
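A toy numpy version of the two losses, treating feature maps as (H, W, C) arrays. The φ_i outputs are passed in as plain lists so no actual VGG network is needed; this is a sketch of the formulas only, not the patent's implementation:

```python
import numpy as np

def mean_std(feat, eps=1e-5):
    """Per-channel mean and standard deviation over the spatial dims."""
    return feat.mean(axis=(0, 1)), np.sqrt(feat.var(axis=(0, 1)) + eps)

def content_loss(f_gt, t):
    """L_c = ||f(g(t)) - t||_2."""
    return np.sqrt(((f_gt - t) ** 2).sum())

def style_loss(out_feats, style_feats):
    """L_s: sum over layers of the mean/std mismatches (Euclidean)."""
    total = 0.0
    for fo, fs in zip(out_feats, style_feats):
        mu_o, sd_o = mean_std(fo)
        mu_s, sd_s = mean_std(fs)
        total += np.sqrt(((mu_o - mu_s) ** 2).sum())
        total += np.sqrt(((sd_o - sd_s) ** 2).sum())
    return total

t = np.zeros((2, 2, 3))
Lc = content_loss(np.ones((2, 2, 3)), t)           # sqrt(12)
feats = [np.random.rand(4, 4, 2) for _ in range(3)]
Ls = style_loss(feats, feats)                      # identical statistics
```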
In step 4.1, the feature transformation performed by the AdaIN adaptive normalization layer is:

AdaIN(f(c), f(s)) = σ(f(s)) · ( (f(c) - μ(f(c))) / σ(f(c)) ) + μ(f(s))

In the above formula, μ and σ are the per-channel mean and standard deviation, respectively.
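The AdaIN transformation can be sketched directly in numpy; channels are assumed to be the last axis, and the ε guard against division by zero is a standard implementation detail not specified in the patent:

```python
import numpy as np

def adain(content_feat, style_feat, eps=1e-5):
    """Shift and scale content features channel-wise so they match the
    style features' mean and standard deviation."""
    mu_c = content_feat.mean(axis=(0, 1), keepdims=True)
    sd_c = np.sqrt(content_feat.var(axis=(0, 1), keepdims=True) + eps)
    mu_s = style_feat.mean(axis=(0, 1), keepdims=True)
    sd_s = np.sqrt(style_feat.var(axis=(0, 1), keepdims=True) + eps)
    return sd_s * (content_feat - mu_c) / sd_c + mu_s

rng = np.random.default_rng(0)
c_feat = rng.normal(0.0, 1.0, (16, 16, 4))   # content statistics
s_feat = rng.normal(3.0, 2.0, (16, 16, 4))   # style statistics
t = adain(c_feat, s_feat)                    # t carries the style stats
```

After the transformation, each channel of t has (up to ε) the mean and standard deviation of the corresponding style channel, which is exactly what the decoder g is trained to turn back into a stylized image.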
Compared with the prior art, the invention has the following beneficial effects:
The character style migration method based on a two-stage deep network first constructs training data sets A and B; trains a de-stylization network on data set A to obtain a model that removes texture from stylized text pictures; trains a font migration network with that model's de-stylized output and data set B to obtain a model that converts between fonts, and uses it to migrate a font picture to be converted into a target reference font picture; and finally trains a texture migration network on the target reference font pictures and data set A to obtain a model that renders stylized texture, producing the final migration result. In this way the font and the texture are migrated in separate stages: font migration in the first stage, then texture migration in the second. This staged design achieves a better character style migration effect than single-stage conversion.
Drawings
FIG. 1 is an overall flow chart of the present invention.
Fig. 2 is a schematic diagram of a training data set a in the present invention.
Fig. 3 is a schematic diagram of a training data set B in the present invention.
FIG. 4 is a schematic diagram of a de-stylized network of the present invention.
Fig. 5 is a schematic structural diagram of a font migration network according to the present invention.
FIG. 6 is a schematic structural diagram of a texture migration network according to the present invention.
Detailed Description
The present invention will be further described with reference to the following detailed description and accompanying drawings.
Referring to figs. 1 to 6, the character style migration method based on a two-stage deep network proceeds through Steps 1 to 4 exactly as set out in the Disclosure section above; the de-stylization network, the font migration network, the texture migration network, their training procedures, and their loss functions are as already described there.
The principle of the invention is illustrated as follows:
the invention provides a character style migration method based on a two-stage depth network, which is based on a de-stylized network consisting of two encoders, a decoder and a discriminator and realizes de-stylized processing of textured characters by optimizing characteristic loss, pixel loss and countermeasure loss; based on a font migration network of a generator and a discriminator, realizing the first-stage migration of the character fonts by optimizing the countermeasure loss and the font classification loss; based on a texture migration network of an encoder and a decoder with an AdaIN self-adaptive normalization layer, feature transformation is carried out through mean values and variances, content loss and style loss are optimized, and second-stage migration of character textures is achieved. The style migration character image obtained by the method has higher artistic effect, has wide application in the field of visual design, can be used for various aspects such as artistic image design, culture and commercial image propaganda, drawing text processing and the like, is not only suitable for digital and letter images, but also has better performance in the aspect of Chinese character migration.
Discriminator D_X: to make the de-stylized reconstruction result more accurate, the invention adds a discriminator D_X to the de-stylization network to judge the authenticity of the reconstructed picture.
Font classification loss of the discriminator, L_r = E_{x,c'}[ -log D(c' | x) ]: the input font image x should be converted into the output font image and correctly classified into the target font domain c. D(c' | x) is the probability distribution with which the discriminator classifies a real sample into its original font domain c'; the discriminator D is trained to minimize this loss.
Font classification loss of the generator, L_f = E_{x,c}[ -log D(c | G(x, c)) ]: this loss optimizes the generator G so that the pictures it generates are classified into the target font domain c by the discriminator D.
Reconstruction loss, L_rec = E_{x,c,c'}[ ||x - G(G(x, c), c')||_1 ]: since the adversarial and classification losses alone do not guarantee that the generator G changes only the font-related information of the input picture while preserving its textual content (G could otherwise alter content merely to trick the discriminator D), the generated picture G(x, c) is fed back into the generator, and the cycled result G(G(x, c), c') is required to be as consistent as possible with the original picture x, enforced with a 1-norm constraint.
For the adversarial loss L_adv, the WGAN method with a gradient penalty is adopted to alleviate the model collapse problem, namely:

L_adv = E_x[D(x)] - E_{x,c}[D(G(x, c))] - λ_gp·E_x̂[ (||∇_x̂ D(x̂)||_2 - 1)^2 ]

where x̂ is sampled uniformly along straight lines between real samples and fake pictures G(x, c).
the texture migration network defines two kinds of penalties: content loss LcAnd style loss Ls. Content loss employing network output imagesExpressing Euclidean distance from the AdaIN layer output feature diagram, and aiming at enabling the final output content of the model to be close to the AdaIN layer output feature diagram t sufficiently so as to accelerate convergence speed; and the style loss is obtained by coding the image generating the result by a coder again, acquiring the mean value and the variance of the feature map of each layer of the VGG network, and performing Euclidean distance summation on the mean value and the variance of the layer corresponding to the real style map.
Example 1:
referring to fig. 1, a text style migration method based on a two-stage deep network is sequentially performed according to the following steps:
1. Construct a training data set A and a training data set B following the data sets of Yang S., Liu J., Wang W., et al., "TET-GAN: Text Effects Transfer via Stylization and Destylization", 2018: training data set A comprises stylized text pictures with various textures and their corresponding de-stylized text pictures, and training data set B comprises de-stylized text pictures in a reference font and in various other fonts (see figs. 2 and 3);
2. Construct a de-stylization network comprising an encoder E_X, an encoder E_Y, a decoder G_X, and a discriminator D_X, where the last several layers of the encoders E_X and E_Y share weights. The network structure adopted by the de-stylization network is shown in Table 1.
Table 1: de-stylization network structure
The total loss function optimized by the de-stylization network is:

L_1 = λ_feat·L_feat + λ_pix·L_pix + λ_adv·L_adv

where L_feat, L_pix, and L_adv are the feature loss, pixel loss, and adversarial loss, respectively, and λ_feat, λ_pix, and λ_adv are their hyper-parameters;
3. Referring to fig. 4, training the de-stylization network on training data set A specifically comprises:
(1) the de-stylization network randomly selects an image pair (x, y) from training data set A and inputs it to the encoders E_X and E_Y, where y is a stylized text picture with texture and x is the de-stylized text picture corresponding to y;
(2) the encoders E_X and E_Y map x and y into a shared feature space and encode them into respective feature maps; the feature loss L_feat is computed from these feature maps, and the encoders E_X and E_Y are trained with the objective of minimizing L_feat.
The encoder's task is to bring its result close to the ground truth of the content feature. With S_X denoting a sub-network of G_X, the content feature used for guidance is defined as z = S_X(G_X(x)); L_feat guides E_Y to remove texture elements from the text image while retaining the core font information, and is defined as
L_feat = E_{x,y}[ ||S_X(E_Y(y)) - z||_1 ];
(3) the feature map is passed to the decoder G_X to generate a reconstructed de-stylized text picture; the pixel loss L_pix is then computed from the reconstructed picture and the picture x, and the decoder G_X is trained with the objective of minimizing L_pix, so that the reconstructed picture is sufficiently close to picture x.
The de-stylization network must make its reconstruction close to picture x, so a 1-norm pixel constraint is applied; the pixel loss is defined as:
L_pix = E_{x,y}[ ||G_X(E_Y(y)) - x||_1 ]
(4) the reconstructed de-stylized text picture is input to the discriminator D_X, which judges its authenticity, and the adversarial loss L_adv is computed (the purpose of the adversarial loss is to guide G_X and E_Y to confuse D_X):

L_adv = E_x[D_X(x)] - E_y[D_X(G_X(E_Y(y)))] - λ_gp·E_x̂[ (||∇_x̂ D_X(x̂)||_2 - 1)^2 ]

where λ_gp is the gradient penalty factor and x̂ is sampled uniformly along straight lines between picture x and the reconstructed de-stylized text picture G_X(E_Y(y)).
The network is optimized with an Adam optimizer, with the learning rate set to 0.0002 and λ_feat = λ_pix = 100, λ_gp = 10, λ_adv = 1. This finally yields the trained encoder E_Y and decoder G_X of the de-stylization model, whose generated image G_X(E_Y(y)) is sufficiently close to the de-stylized picture x.
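For reference, one Adam update step with the embodiment's stated learning rate of 0.0002; the β and ε values are the common defaults, which the patent does not state, and the whole function is an illustrative sketch rather than the patent's training code:

```python
import numpy as np

def adam_step(theta, grad, m, v, t, lr=2e-4, b1=0.9, b2=0.999, eps=1e-8):
    """Single Adam update: the biased first/second moment estimates m and v
    are updated, bias-corrected, and used to scale the gradient step."""
    m = b1 * m + (1 - b1) * grad
    v = b2 * v + (1 - b2) * grad ** 2
    m_hat = m / (1 - b1 ** t)        # bias correction, step index t >= 1
    v_hat = v / (1 - b2 ** t)
    theta = theta - lr * m_hat / (np.sqrt(v_hat) + eps)
    return theta, m, v

theta, m, v = np.zeros(3), np.zeros(3), np.zeros(3)
theta, m, v = adam_step(theta, np.ones(3), m, v, t=1)  # first step moves ~ -lr
```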
4. a font migration network is constructed, comprising a generator G and a discriminator D. The generator G consists of 2 convolution layers, 6 residual layers and 2 deconvolution layers, with normalization applied throughout. The overall network flow is: the generator G first downsamples the input by a factor of 4 through the convolution layers, then passes it through 6 residual blocks whose output dimensions equal their input dimensions, then upsamples by a factor of 4 with transposed convolutions, and finally applies a size-preserving convolution with tanh as the output. The network structure adopted by the generator G is shown in Table 2:
table 2 structural table of generator G
Each convolution layer uses 4 × 4 kernels with stride 2, so every convolution halves the spatial dimension. The normalization layer applies instance normalization (IN), computing the mean within each image channel over H × W; since the generated result depends mainly on the individual image instance, batch normalization (BN) over the whole batch is unsuitable for image stylization, whereas normalizing over H × W speeds up model convergence and keeps each image instance independent. The activation function is LeakyReLU: because it keeps a small gradient for negative inputs, its derivative is never zero, which reduces the occurrence of silent ("dead") neurons, allows gradient-based learning, and avoids the problem that neurons stop learning once a ReLU enters the negative interval.
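One downsampling stage of the generator as described (4 × 4 kernel, stride 2, instance normalization, LeakyReLU) might be sketched in PyTorch as follows; `padding=1` is an assumption so that the spatial size halves exactly:

```python
import torch.nn as nn

def down_block(in_ch, out_ch):
    """One generator downsampling stage: 4x4 conv with stride 2 (halves
    H and W), instance normalization over each HxW channel map, then
    LeakyReLU so negative inputs keep a small non-zero gradient."""
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, kernel_size=4, stride=2, padding=1),
        nn.InstanceNorm2d(out_ch),
        nn.LeakyReLU(0.2, inplace=True),
    )
```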
Furthermore, to help avoid the overfitting problem, the desired feature mapping is no longer fit directly by stacked layers; instead, the layers are explicitly fit to a residual mapping. If the desired mapping is H(x), the stacked non-linear layers fit the mapping F(x) = H(x) − x. Assuming the residual mapping F(x) = H(x) − x is easier to optimize than the desired mapping H(x) itself, then in the extreme case where the desired mapping is the identity, the residual network only needs to fit F(x) = 0 while a plain network would have to fit H(x) = x, and the former is clearly easier to optimize.
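A residual block in this spirit can be sketched as follows (the layer composition inside the block is an assumption for illustration; the essential point is that the block learns F(x) and outputs F(x) + x):

```python
import torch.nn as nn

class ResidualBlock(nn.Module):
    """Fits the residual F(x) = H(x) - x; the forward pass returns
    F(x) + x, so an identity mapping only requires F(x) -> 0."""
    def __init__(self, ch):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(ch, ch, 3, 1, 1),
            nn.InstanceNorm2d(ch),
            nn.LeakyReLU(0.2, inplace=True),
            nn.Conv2d(ch, ch, 3, 1, 1),
            nn.InstanceNorm2d(ch),
        )

    def forward(self, x):
        return x + self.body(x)   # H(x) = F(x) + x
```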
The discriminator D adopts a PatchGAN structure, classifying local image patches as real or fake, and uses no normalization layers; the output of Conv1 represents the predicted probability of the target font, the output of Conv2 represents the real/fake judgment of the image, and the two heads are in parallel;
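A sketch of such a two-headed PatchGAN discriminator is below. The trunk depth, channel widths and `n_fonts` are illustrative assumptions, since the discriminator's table of layer parameters is not reproduced here:

```python
import torch.nn as nn

class PatchDiscriminator(nn.Module):
    """Two-headed PatchGAN sketch: a shared trunk without normalization
    layers, then two parallel convolutions -- one head predicts the font
    class ("Conv1"), the other scores real/fake per local patch ("Conv2")."""
    def __init__(self, in_ch=3, base=64, n_fonts=10):
        super().__init__()
        self.trunk = nn.Sequential(
            nn.Conv2d(in_ch, base, 4, 2, 1), nn.LeakyReLU(0.2),
            nn.Conv2d(base, base * 2, 4, 2, 1), nn.LeakyReLU(0.2),
        )
        self.cls_head = nn.Conv2d(base * 2, n_fonts, 4, 1, 1)  # font class
        self.adv_head = nn.Conv2d(base * 2, 1, 4, 1, 1)        # real/fake patches

    def forward(self, x):
        h = self.trunk(x)
        return self.adv_head(h), self.cls_head(h)
```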
the loss function to be optimized adopted by the font migration network is as follows:
L_2 = L_D + L_G

L_D = −L_adv + λ_1 L_r

L_G = L_adv + λ_1 L_f + λ_2 L_rec

In the above formulas, L_D is the discriminator loss and L_G is the generator loss; L_adv, L_r, L_f and L_rec are respectively the adversarial loss, the discriminator's font classification loss, the generator's font classification loss and the reconstruction loss; λ_1 and λ_2 are respectively the font-classification-loss and reconstruction-loss hyper-parameters;
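The classification and reconstruction terms map directly onto cross-entropy and L1 losses; the following PyTorch sketch is illustrative (function names are not from the patent):

```python
import torch.nn.functional as F

def cls_loss_real(logits_real, c_orig):
    """L_r = E[-log D(c'|x)]: the discriminator should classify real
    pictures into their original font domain c'."""
    return F.cross_entropy(logits_real, c_orig)

def cls_loss_fake(logits_fake, c_target):
    """L_f = E[-log D(c|G(x,c))]: the generated picture should be
    classified as the target font domain c."""
    return F.cross_entropy(logits_fake, c_target)

def rec_loss(x, x_rec):
    """L_rec = E[||x - G(G(x,c), c')||_1]: cycle reconstruction term."""
    return F.l1_loss(x_rec, x)
```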
5. referring to fig. 5, a font migration network is trained by using de-stylized pictures obtained by the de-stylized network model and a training data set B to obtain a font migration network model for realizing conversion and migration of multiple fonts, which specifically includes:
(1) the font migration network randomly selects a picture x from training data set B and inputs it into the generator G; the generator G performs feature mapping and fusion on the picture x and the target font label c, then passes the result into a deep convolutional network to generate a fake picture G(x, c);
(2) on the one hand, the fake picture G(x, c) is fed back into the generator G to produce a reconstructed picture G(G(x, c), c'); the de-stylized picture from the de-stylized network model is used as the target font picture to supervise the reconstruction, ensuring that the picture content is preserved during conversion and that only the domain-specific part changes; the generator's font classification loss L_f and the reconstruction loss L_rec are computed, and the generator G is trained with the objective of minimizing L_f and L_rec. On the other hand, the fake picture G(x, c) is input into the discriminator D to judge its authenticity and the font domain to which it belongs; the discriminator's font classification loss L_r is computed, and the discriminator D is trained with the objective of minimizing L_r. The losses are as follows:
L_adv = E_x[ D(x) ] − E_{x,c}[ D(G(x,c)) ] − λ_gp E_x̂[ ( || ∇_x̂ D(x̂) ||_2 − 1 )^2 ]

L_r = E_{x,c'}[ −log D(c'|x) ]

L_f = E_{x,c}[ −log D(c|G(x,c)) ]

L_rec = E_{x,c}[ || x − G(G(x,c), c') ||_1 ]
in the above formulas, λ_gp is the penalty factor of the adversarial loss, x̂ is sampled uniformly along the straight line between the real picture sample and the fake picture G(x, c), and D(c'|x) is the probability distribution with which the discriminator D classifies the real picture sample into its original font domain c';
the model is trained with an Adam optimizer with parameters β_1 = 0.5 and β_2 = 0.999. For data augmentation, images are flipped horizontally with probability 0.5. One generator update is performed after every 5 discriminator updates. The batch size of all trials is set to 16. All models are trained with a learning rate of 0.0001 for the first 10 epochs, and the learning rate is decayed linearly to 0 over the next 10 epochs;
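The schedule above (constant learning rate for 10 epochs, linear decay over the next 10, horizontal flips with probability 0.5, five discriminator updates per generator update) could be sketched as follows; this is an illustrative reading of the text, not the patent's code:

```python
import torch

def lr_at_epoch(epoch, base_lr=1e-4, const_epochs=10, decay_epochs=10):
    """Constant lr for the first 10 epochs, then linear decay to 0
    over the next 10 (epochs are 0-indexed here)."""
    if epoch < const_epochs:
        return base_lr
    frac = (epoch - const_epochs + 1) / decay_epochs
    return base_lr * max(0.0, 1.0 - frac)

def maybe_flip(batch, p=0.5):
    """Augmentation: horizontally flip the whole batch with probability p."""
    if torch.rand(()).item() < p:
        return torch.flip(batch, dims=[-1])
    return batch

# Adam with the stated betas (dummy parameter for illustration)
opt = torch.optim.Adam([torch.zeros(1, requires_grad=True)],
                       lr=1e-4, betas=(0.5, 0.999))

# inside the training loop: five discriminator updates per generator update
# if (step + 1) % 5 == 0: update_generator(...)
```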
6. migrating a certain font picture to be converted into a target reference font picture by using the obtained font migration network model;
7. a texture migration network is constructed, comprising an encoder f, a decoder g and an AdaIN adaptive normalization layer between them. The encoder f and decoder g are built with reference to the VGG-19 network structure: the encoder f takes the relu1_1 to relu4_1 part of a pre-trained VGG-19 network, and the decoder g is a mirror of the encoder f, except that all pooling layers are replaced with upsampling layers. The specific structure of the texture migration network is shown in Table 3:
table 3 texture migration network structure table
The convolution kernels of the network's convolution layers are all 3 × 3 with stride 1, the window size of the MaxPool max-pooling layer is 2 × 2, and the upsampling layers use nearest-neighbor interpolation;
8. referring to fig. 6, a texture migration network is trained by using a target reference font picture and a training data set a to obtain a texture migration network model for implementing stylized texture rendering on a font picture, which specifically includes:
(1) first, the encoder f maps the font picture c and the texture style picture s into the feature space, yielding f(c) and f(s); the AdaIN adaptive normalization layer then performs the feature transformation on them, producing the feature map t = AdaIN(f(c), f(s)):

AdaIN(f(c), f(s)) = σ(f(s)) · ( f(c) − μ(f(c)) ) / σ(f(c)) + μ(f(s))

in the above formula, σ and μ are respectively the standard deviation and the mean of each image channel;
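A minimal AdaIN implementation consistent with this description, computing per-channel statistics over H × W (the small `eps` for numerical stability is an assumption):

```python
import torch

def adain(content_feat, style_feat, eps=1e-5):
    """AdaIN: normalize the content feature per channel over HxW, then
    rescale by the style feature's std and shift by its mean."""
    c_mean = content_feat.mean(dim=(2, 3), keepdim=True)
    c_std = content_feat.std(dim=(2, 3), keepdim=True) + eps
    s_mean = style_feat.mean(dim=(2, 3), keepdim=True)
    s_std = style_feat.std(dim=(2, 3), keepdim=True)
    return s_std * (content_feat - c_mean) / c_std + s_mean
```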
(2) mapping the feature map t back to the original feature space by a decoder g to obtain a stylized result map g (t);
(3) inputting the stylized result graph g (t) and the texture style picture s into an encoder f, and realizing the training of the texture migration network through the optimization of a loss function, wherein the loss function is as follows:
L_3 = L_c + λ L_s

L_c = || f(g(t)) − t ||_2

L_s = Σ_i || μ(φ_i(g(t))) − μ(φ_i(s)) ||_2 + Σ_i || σ(φ_i(g(t))) − σ(φ_i(s)) ||_2

in the above formulas, L_c is the content loss, L_s is the style loss, λ is the style-loss hyper-parameter, φ_i denotes the i-th layer of the encoder f, and σ and μ are respectively the standard deviation and the mean of each image channel;
the Adam optimizer is selected for loss optimization, and the batch size is set to 8.
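The content and style losses can be sketched as follows (illustrative PyTorch; `phi_layers` stands for the chosen relu1_1 to relu4_1 activations of the encoder f, passed in as callables):

```python
import torch

def content_loss(f, g, t):
    """L_c = ||f(g(t)) - t||_2: re-encode the stylized result g(t) and
    compare with the AdaIN target feature t."""
    return torch.norm(f(g(t)) - t, p=2)

def style_loss(phi_layers, g_t, s):
    """L_s: match mean and std of encoder activations phi_i between the
    stylized result g(t) and the style picture s."""
    loss = 0.0
    for phi in phi_layers:
        a, b = phi(g_t), phi(s)
        loss = loss + torch.norm(a.mean(dim=(2, 3)) - b.mean(dim=(2, 3)), p=2)
        loss = loss + torch.norm(a.std(dim=(2, 3)) - b.std(dim=(2, 3)), p=2)
    return loss
```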
Claims (10)
1. A character style migration method based on a two-stage deep network is characterized in that:
the method comprises the following steps in sequence:
the method comprises the following steps of firstly, constructing a training data set A and a training data set B, wherein the training data set A comprises stylized character pictures with various textures and de-stylized character pictures corresponding to the stylized character pictures, and the training data set B comprises a reference font and de-stylized character pictures with other various fonts;
constructing a de-stylized network, and training the de-stylized network by adopting a training data set A to obtain a de-stylized network model for de-stylizing the stylized character and image with texture;
thirdly, constructing a font migration network, training the font migration network by using de-stylized pictures obtained by the de-stylized network model and a training data set B to obtain a font migration network model for realizing conversion and migration of various fonts, and then migrating a certain font picture to be converted into a target reference font picture by using the model;
and step four, constructing a texture migration network, training the texture migration network by using the target reference font picture and the training data set A generated in the step three to obtain a texture migration network model for realizing stylized texture rendering of the font picture, and finally obtaining a final result of character style migration by using the model.
2. The text style migration method based on the two-stage deep network as claimed in claim 1, wherein:
in step two, the de-stylized network includes an encoder E_X, an encoder E_Y and a decoder G_X, and training the de-stylized network with training data set A comprises the following steps in order:

2.1 the de-stylized network randomly selects an image pair (x, y) from training data set A and inputs it into encoders E_X and E_Y, where y is a stylized character picture with texture and x is the de-stylized character picture corresponding to y;

2.2 encoders E_X and E_Y map x and y into a shared feature space and encode them into respective feature maps; the feature loss L_feat is computed from the feature maps, and encoders E_X and E_Y are trained with the objective of minimizing L_feat;

2.3 the feature map is passed to the decoder G_X to generate a reconstructed de-stylized text picture; the pixel loss L_pix is then computed from the reconstructed picture and picture x, and the decoder G_X is trained with the objective of minimizing L_pix, so that the reconstructed de-stylized text picture is sufficiently close to picture x.
3. The text style migration method based on the two-stage deep network as claimed in claim 2, wherein:
the de-stylized network further includes a discriminator D_X, and training the de-stylized network with training data set A further comprises:

2.4 the reconstructed de-stylized character picture is input into the discriminator D_X to judge its authenticity, the adversarial loss L_adv is computed, and optimization is performed with an Adam optimizer.
4. The method for text style migration based on the two-stage deep network as claimed in claim 3, wherein:
the de-stylized network employs a total loss function to be optimized as:
L_1 = λ_feat L_feat + λ_pix L_pix + λ_adv L_adv

L_feat = E_{x,y}[ || S_X(E_Y(y)) − z ||_1 ]

z = S_X(G_X(x))

L_pix = E_{x,y}[ || G_X(E_Y(y)) − x ||_1 ]

in the above formulas, L_feat, L_pix and L_adv are respectively the feature loss, the pixel loss and the adversarial loss; λ_feat, λ_pix and λ_adv are respectively the feature-loss, pixel-loss and adversarial-loss hyper-parameters; S_X is a sub-network of G_X; z is the content feature; λ_gp is the penalty factor; and x̂ is sampled uniformly along the straight line between picture x and the reconstructed de-stylized text picture G_X(E_Y(y)).
5. The character style migration method based on the two-stage deep network as claimed in any one of claims 1-4, wherein:
in the third step, the font migration network comprises a generator G and a discriminator D, and training the font migration network with the de-stylized pictures obtained from the de-stylized network model and training data set B comprises the following steps in order:

3.1, the font migration network randomly selects a picture x from training data set B and inputs it into the generator G, and the generator G generates a fake picture G(x, c) from the picture x and the target font label c;

3.2, on the one hand, the fake picture G(x, c) is fed back into the generator G to produce a reconstructed picture G(G(x, c), c'), the de-stylized picture from the de-stylized network model being used as the target font picture to supervise the reconstruction; the generator's font classification loss L_f and reconstruction loss L_rec are then computed, and the generator G is trained with the objective of minimizing L_f and L_rec; on the other hand, the fake picture G(x, c) is input into the discriminator D to judge its authenticity and the font domain to which it belongs, the discriminator's font classification loss L_r is computed, and the discriminator D is trained with the objective of minimizing L_r.
6. The text style migration method based on the two-stage deep network as claimed in claim 5, wherein:
the loss function to be optimized adopted by the font migration network is as follows:
L_2 = L_D + L_G

L_D = −L_adv + λ_1 L_r

L_G = L_adv + λ_1 L_f + λ_2 L_rec

L_r = E_{x,c'}[ −log D(c'|x) ]

L_f = E_{x,c}[ −log D(c|G(x,c)) ]

L_rec = E_{x,c}[ || x − G(G(x,c), c') ||_1 ]

in the above formulas, L_D is the discriminator loss and L_G is the generator loss; L_adv, L_r, L_f and L_rec are respectively the adversarial loss, the discriminator's font classification loss, the generator's font classification loss and the reconstruction loss; λ_1, λ_2 and λ_gp are respectively the font-classification-loss hyper-parameter, the reconstruction-loss hyper-parameter and the penalty factor of the adversarial loss; x̂ is sampled uniformly along the straight line between the real picture sample and the fake picture G(x, c); and D(c'|x) is the probability distribution with which the discriminator D classifies the real picture sample into its original font domain c'.
7. The text style migration method based on the two-stage deep network as claimed in claim 5, wherein:
in step 3.1, the fake picture G(x, c) is generated as follows: the picture x and the target font label c are first subjected to feature mapping and fusion, and the result is then passed into a deep convolutional network for training.
8. The character style migration method based on the two-stage deep network as claimed in any one of claims 1-4, wherein:
in the fourth step, the texture migration network comprises an encoder f, a decoder g and an AdaIN adaptive normalization layer between them; the encoder f and decoder g are built with reference to the VGG-19 network structure, the encoder f taking the first L layers of a pre-trained VGG-19 network, and the decoder g being a mirror of the encoder f with all pooling layers replaced by upsampling layers;
the training of the texture migration network by using the target reference font picture and the training data set A generated in the third step sequentially comprises the following steps:
4.1, the encoder f first maps the font picture c and the texture style picture s into the feature space to obtain f(c) and f(s); the AdaIN adaptive normalization layer then performs the feature transformation on them to obtain the feature map t = AdaIN(f(c), f(s));
4.2, mapping the feature map t back to the original feature space by a decoder g to obtain a stylized result map g (t);
4.3, inputting the stylized result graph g (t) and the texture style picture s into an encoder f, and realizing the training of the texture migration network through the optimization of the loss function.
9. The text style migration method based on the two-stage deep network as claimed in claim 8, wherein:
in step 4.3, the loss function is:
L_3 = L_c + λ L_s

L_c = || f(g(t)) − t ||_2

L_s = Σ_i || μ(φ_i(g(t))) − μ(φ_i(s)) ||_2 + Σ_i || σ(φ_i(g(t))) − σ(φ_i(s)) ||_2

in the above formulas, L_c is the content loss, L_s is the style loss, λ is the style-loss hyper-parameter, φ_i denotes the i-th layer of the encoder f, and σ and μ are respectively the standard deviation and the mean of each image channel.
10. The text style migration method based on the two-stage deep network as claimed in claim 8, wherein:
in step 4.1, the feature transformation formula of the AdaIN adaptive normalization layer is:

AdaIN(f(c), f(s)) = σ(f(s)) · ( f(c) − μ(f(c)) ) / σ(f(c)) + μ(f(s))

in the above formula, σ and μ are respectively the standard deviation and the mean of each image channel.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202011210655.3A CN112307714B (en) | 2020-11-03 | 2020-11-03 | Text style migration method based on dual-stage depth network |
Publications (2)
Publication Number | Publication Date |
---|---|
CN112307714A true CN112307714A (en) | 2021-02-02 |
CN112307714B CN112307714B (en) | 2024-03-08 |
Family
ID=74332675
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202011210655.3A Active CN112307714B (en) | 2020-11-03 | 2020-11-03 | Text style migration method based on dual-stage depth network |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN112307714B (en) |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2019025909A1 (en) * | 2017-08-01 | 2019-02-07 | 3M Innovative Properties Company | Neural style transfer for image varietization and recognition |
CN110443864A (en) * | 2019-07-24 | 2019-11-12 | 北京大学 | A kind of characters in a fancy style body automatic generation method based on single phase a small amount of sample learning |
CN110503598A (en) * | 2019-07-30 | 2019-11-26 | 西安理工大学 | The font style moving method of confrontation network is generated based on condition circulation consistency |
IT201900002557A1 (en) * | 2019-02-22 | 2020-08-22 | Univ Bologna Alma Mater Studiorum | IMAGE-BASED CODING METHOD AND SYSTEM |
Non-Patent Citations (2)
Title |
---|
ZHANG Jinglei; HOU Yawei: "Image style transfer based on improved cycle generative adversarial network", Journal of Electronics & Information Technology, no. 05 *
WANG Xiaohong; LU Hui; MA Xiangcai: "Stylized calligraphy image generation based on generative adversarial networks", Packaging Engineering, no. 11, 10 June 2020 (2020-06-10) *
Cited By (14)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112966470A (en) * | 2021-02-23 | 2021-06-15 | 北京三快在线科技有限公司 | Character generation method and device, storage medium and electronic equipment |
CN113421318A (en) * | 2021-06-30 | 2021-09-21 | 合肥高维数据技术有限公司 | Font style migration method and system based on multitask generation countermeasure network |
CN113421318B (en) * | 2021-06-30 | 2022-10-28 | 合肥高维数据技术有限公司 | Font style migration method and system based on multitask generation countermeasure network |
CN113553932A (en) * | 2021-07-14 | 2021-10-26 | 同济大学 | Calligraphy character erosion repairing method based on style migration |
CN113553932B (en) * | 2021-07-14 | 2022-05-13 | 同济大学 | Calligraphy character erosion repairing method based on style migration |
CN113554549A (en) * | 2021-07-27 | 2021-10-26 | 深圳思谋信息科技有限公司 | Text image generation method and device, computer equipment and storage medium |
CN113554549B (en) * | 2021-07-27 | 2024-03-29 | 深圳思谋信息科技有限公司 | Text image generation method, device, computer equipment and storage medium |
CN113807430A (en) * | 2021-09-15 | 2021-12-17 | 网易(杭州)网络有限公司 | Model training method and device, computer equipment and storage medium |
CN113807430B (en) * | 2021-09-15 | 2023-08-08 | 网易(杭州)网络有限公司 | Model training method, device, computer equipment and storage medium |
CN114240735B (en) * | 2021-11-17 | 2024-03-19 | 西安电子科技大学 | Arbitrary style migration method, system, storage medium, computer equipment and terminal |
CN114240735A (en) * | 2021-11-17 | 2022-03-25 | 西安电子科技大学 | Method, system, storage medium, computer device and terminal for transferring any style |
CN114399427A (en) * | 2022-01-07 | 2022-04-26 | 福州大学 | Character effect migration method based on cyclic generation countermeasure network |
CN114399427B (en) * | 2022-01-07 | 2024-06-28 | 福州大学 | Word effect migration method based on loop generation countermeasure network |
CN115310405A (en) * | 2022-07-21 | 2022-11-08 | 北京汉仪创新科技股份有限公司 | Font replacement method, system, device and medium based on countermeasure generation network |
Also Published As
Publication number | Publication date |
---|---|
CN112307714B (en) | 2024-03-08 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||