CN112307714A - Character style migration method based on double-stage deep network - Google Patents

Character style migration method based on double-stage deep network

Info

Publication number
CN112307714A
Authority
CN
China
Prior art keywords
picture
network
stylized
font
loss
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202011210655.3A
Other languages
Chinese (zh)
Other versions
CN112307714B (en)
Inventor
陈金泽
李龙
吕奕杭
廖志寰
朱安娜
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Wuhan University of Technology WUT
Original Assignee
Wuhan University of Technology WUT
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Wuhan University of Technology WUT filed Critical Wuhan University of Technology WUT
Priority to CN202011210655.3A priority Critical patent/CN112307714B/en
Publication of CN112307714A publication Critical patent/CN112307714A/en
Application granted granted Critical
Publication of CN112307714B publication Critical patent/CN112307714B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/10Text processing
    • G06F40/103Formatting, i.e. changing of presentation of documents
    • G06F40/109Font handling; Temporal or kinetic typography
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Health & Medical Sciences (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Computing Systems (AREA)
  • Molecular Biology (AREA)
  • Biophysics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Biomedical Technology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Image Processing (AREA)

Abstract

A character style migration method based on a two-stage deep network. A training data set A and a training data set B are first constructed. The de-stylized network is then trained on data set A to obtain a de-stylized network model. Next, a font migration network is trained on the de-stylized pictures produced by that model together with data set B, yielding a font migration network model that migrates a given font picture to a target reference font picture. Finally, a texture migration network is trained on the target reference font pictures and data set A, and the resulting texture migration network model produces the final character style migration result. The design achieves an excellent character style migration effect.

Description

Character style migration method based on double-stage deep network
Technical Field
The invention belongs to the field of deep learning and image style migration, and particularly relates to a character style migration method based on a two-stage deep network.
Background
Style migration of images refers to the task of transferring the style of one image onto another to synthesize a new artistic image. In recent years, with the continuous development of artificial intelligence technology and the global creative industry, style migration of text images has become a growing demand: people hope to generate more artistic fonts and apply them to design and publicity in industries such as commerce and culture.
Style migration of text images differs from that of ordinary images in that it involves both font migration and texture migration: the former transforms the font of characters while keeping their content, and the latter transforms the stylistic appearance of the characters. Manually synthesizing text images with a specific font and texture consumes a great deal of time and effort, so realizing text style migration with an automatic and efficient method has attracted wide attention. However, existing character style migration methods are limited to direct single-stage conversion, i.e. both the font and the texture of the characters are migrated at once in the same stage, and the results are often unsatisfactory.
Disclosure of Invention
The invention aims to overcome the problems in the prior art and provide a character style migration method based on a two-stage deep network with better migration effect.
In order to achieve the above purpose, the invention provides the following technical scheme:
a character style migration method based on a two-stage deep network sequentially comprises the following steps:
the method comprises the following steps of firstly, constructing a training data set A and a training data set B, wherein the training data set A comprises stylized character pictures with various textures and de-stylized character pictures corresponding to the stylized character pictures, and the training data set B comprises a reference font and de-stylized character pictures with other various fonts;
constructing a de-stylized network, and training the de-stylized network by adopting a training data set A to obtain a de-stylized network model for de-stylizing the stylized character and image with texture;
thirdly, constructing a font migration network, training the font migration network by using de-stylized pictures obtained by the de-stylized network model and a training data set B to obtain a font migration network model for realizing conversion and migration of various fonts, and then migrating a certain font picture to be converted into a target reference font picture by using the model;
and step four, constructing a texture migration network, training the texture migration network by using the target reference font picture and the training data set A generated in the step three to obtain a texture migration network model for realizing stylized texture rendering of the font picture, and finally obtaining a final result of character style migration by using the model.
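At inference time the two-stage pipeline described in steps three and four can be summarized by the following minimal sketch (PyTorch is assumed, and font_net / texture_net are placeholder names for the trained font migration and texture migration models; the patent does not prescribe an API):

```python
import torch

def migrate_character_style(source_font_img, target_font_label,
                            texture_style_img, font_net, texture_net):
    """Two-stage inference: stage one migrates the glyph to the reference font,
    stage two renders the stylized texture onto the migrated glyph."""
    with torch.no_grad():
        # Stage 1: font migration conditioned on the target font label c
        reference_font_img = font_net(source_font_img, target_font_label)
        # Stage 2: texture migration renders the texture style picture onto it
        stylized_result = texture_net(reference_font_img, texture_style_img)
    return stylized_result
```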
In step two, the de-stylized network includes an encoder E_X, an encoder E_Y and a decoder G_X, and training the de-stylized network with training data set A comprises the following steps in sequence:
2.1, the de-stylized network randomly selects an image pair (x, y) from training data set A and inputs it to the encoders E_X and E_Y, where y is a stylized character picture with texture and x is the de-stylized character picture corresponding to y;
2.2, the encoders E_X and E_Y map x and y to a shared feature space and encode them into respective feature maps; the feature loss L_feat is computed from the feature maps, and the encoders E_X and E_Y are trained with the objective of minimizing L_feat;
2.3, the feature map is passed to the decoder G_X to generate a reconstructed de-stylized character picture; the pixel loss L_pix is then computed from the reconstructed de-stylized character picture and picture x, and the decoder G_X is trained with the objective of minimizing L_pix so that the reconstructed de-stylized character picture is sufficiently close to picture x.
The de-stylized network also includes a discriminator D_X, and training the de-stylized network with training data set A further comprises:
2.4, the reconstructed de-stylized character picture is input to the discriminator D_X to judge its authenticity, the adversarial loss L_adv is computed, and optimization is performed with an Adam optimizer.
The total loss function optimized by the de-stylized network is:
L_1 = λ_feat·L_feat + λ_pix·L_pix + λ_adv·L_adv
L_feat = E_{x,y}[ ||S_X(E_Y(y)) − z||_1 ]
z = S_X(G_X(x))
L_pix = E_{x,y}[ ||G_X(E_Y(y)) − x||_1 ]
L_adv = E_x[ D_X(x) ] − E_y[ D_X(G_X(E_Y(y))) ] − λ_gp·E_x̂[ (||∇_x̂ D_X(x̂)||_2 − 1)^2 ]
In the above formulas, L_feat, L_pix and L_adv are the feature loss, the pixel loss and the adversarial loss respectively; λ_feat, λ_pix and λ_adv are the corresponding hyper-parameters; S_X is the sharing layer of G_X; z is the content feature; λ_gp is the penalty factor; and x̂ is sampled uniformly along the straight line between picture x and the reconstructed de-stylized character picture G_X(E_Y(y)).
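As an illustration only, the feature and pixel losses above could be computed as in the following PyTorch sketch; E_Y, G_X and S_X are assumed to be compatible nn.Module callables, and the adversarial term L_adv is added separately through a WGAN-GP critic (see the gradient-penalty sketch later in the description):

```python
import torch

def destylization_content_losses(x, y, E_Y, G_X, S_X):
    """Feature loss ||S_X(E_Y(y)) - z||_1 with z = S_X(G_X(x)), and pixel loss
    ||G_X(E_Y(y)) - x||_1, following the formulas above."""
    z = S_X(G_X(x))                          # content feature of the plain glyph x
    L_feat = (S_X(E_Y(y)) - z).abs().mean()  # 1-norm feature loss
    x_rec = G_X(E_Y(y))                      # reconstructed de-stylized picture
    L_pix = (x_rec - x).abs().mean()         # 1-norm pixel loss
    return L_feat, L_pix

# Total objective, with the weights used in the embodiment below
# (lambda_feat = lambda_pix = 100, lambda_adv = 1):
# L1 = 100 * L_feat + 100 * L_pix + 1 * L_adv
```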
In step three, the font migration network comprises a generator G and a discriminator D, and training the font migration network with the de-stylized pictures produced by the de-stylized network model and training data set B comprises the following steps in sequence:
3.1, the font migration network randomly selects a picture x from training data set B and inputs it to the generator G, which generates a false picture G(x, c) from picture x and a target font label c;
3.2, on the one hand, the false picture G(x, c) is fed back into the generator G to produce a reconstructed picture G(G(x, c), c'), with the de-stylized picture from the de-stylized network model serving as the target font picture supervising the reconstruction; the font classification loss L_f of the generator and the reconstruction loss L_rec are then computed, and the generator G is trained with the objective of minimizing L_f and L_rec. On the other hand, the false picture G(x, c) is input to the discriminator D to judge its authenticity and the font domain to which it belongs, the font classification loss L_r of the discriminator is computed, and the discriminator D is trained with the objective of minimizing L_r.
The loss functions optimized by the font migration network are:
L_2 = L_D + L_G
L_D = −L_adv + λ_1·L_r
L_G = L_adv + λ_1·L_f + λ_2·L_rec
L_adv = E_x[ D(x) ] − E_{x,c}[ D(G(x, c)) ] − λ_gp·E_x̂[ (||∇_x̂ D(x̂)||_2 − 1)^2 ]
L_r = E_{x,c'}[ −log D(c'|x) ]
L_f = E_{x,c}[ −log D(c|G(x, c)) ]
L_rec = E_{x,c}[ ||x − G(G(x, c), c')||_1 ]
In the above formulas, L_D is the discriminator loss and L_G is the generator loss; L_adv, L_r, L_f and L_rec are the adversarial loss, the font classification loss of the discriminator, the font classification loss of the generator and the reconstruction loss respectively; λ_1, λ_2 and λ_gp are the font classification loss hyper-parameter, the reconstruction loss hyper-parameter and the penalty factor of the adversarial loss; x̂ is sampled uniformly along the straight line between a real picture sample and the false picture G(x, c); and D(c'|x) is the probability distribution with which the discriminator D assigns the real picture sample to its original font domain c'.
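For illustration, the generator and discriminator objectives above could be assembled as in the following sketch; it assumes the discriminator returns both an adversarial score and font-class logits, omits the gradient-penalty term of L_adv for brevity, and uses illustrative hyper-parameter defaults rather than values fixed by the patent:

```python
import torch
import torch.nn.functional as F

def font_migration_losses(x, c, c_org, G, D, lambda1=1.0, lambda2=10.0):
    """Sketch of L_D and L_G for the font migration stage.
    x: input glyph batch, c: target font labels, c_org: original font labels.
    D(img) is assumed to return (adversarial score, font-class logits)."""
    fake = G(x, c)                                  # false picture G(x, c)
    src_real, cls_real = D(x)
    src_fake, cls_fake = D(fake)

    L_r = F.cross_entropy(cls_real, c_org)          # -log D(c'|x) on real pictures
    L_f = F.cross_entropy(cls_fake, c)              # -log D(c|G(x,c)) on fakes
    L_rec = (x - G(fake, c_org)).abs().mean()       # ||x - G(G(x,c), c')||_1

    # Wasserstein adversarial term (gradient penalty omitted here)
    L_adv = src_real.mean() - src_fake.mean()
    L_D = -L_adv + lambda1 * L_r                    # discriminator objective
    L_G = L_adv + lambda1 * L_f + lambda2 * L_rec   # generator objective
    return L_D, L_G
```

In practice the two objectives are minimized in alternating steps, detaching the false pictures when updating the discriminator, as described in the embodiment below.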
In step 3.1, the false picture G(x, c) is generated as follows: the picture x and the target font label c are first feature-mapped and fused, and the fused result is then passed into a deep convolutional network for training.
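One common way to realize this fusion, assumed here for illustration (the patent does not fix the exact mechanism), is to tile a one-hot encoding of the target font label spatially and concatenate it with the image channels before the generator:

```python
import torch

def fuse_font_label(x, c, num_fonts):
    """Concatenate a spatially tiled one-hot font label c with the picture x.
    x: (N, C, H, W) image batch; c: (N,) tensor of target font indices."""
    onehot = torch.zeros(x.size(0), num_fonts, device=x.device)
    onehot.scatter_(1, c.long().unsqueeze(1), 1.0)
    label_map = onehot.view(x.size(0), num_fonts, 1, 1).expand(
        -1, -1, x.size(2), x.size(3))
    return torch.cat([x, label_map], dim=1)   # input to the convolutional generator G
```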
In step four, the texture migration network comprises an encoder f, a decoder g and an AdaIN adaptive normalization layer between them. The encoder f and the decoder g are built with reference to the VGG-19 network structure: the encoder f uses the first L layers of a pre-trained VGG-19 network, and the decoder g mirrors the encoder f with all pooling layers replaced by upsampling layers;
the training of the texture migration network by using the target reference font picture and the training data set A generated in the third step sequentially comprises the following steps:
4.1, a font picture c and a texture style picture s are first mapped to the feature space by the encoder f to obtain f(c) and f(s), and the AdaIN adaptive normalization layer then performs a feature transformation on them to obtain the feature map t = AdaIN(f(c), f(s));
4.2, mapping the feature map t back to the original feature space by a decoder g to obtain a stylized result map g (t);
4.3, inputting the stylized result graph g (t) and the texture style picture s into an encoder f, and realizing the training of the texture migration network through the optimization of the loss function.
In step 4.3, the loss function is:
L_3 = L_c + λ·L_s
L_c = ||f(g(t)) − t||_2
L_s = Σ_{i=1}^{L} ( ||μ(φ_i(g(t))) − μ(φ_i(s))||_2 + ||σ(φ_i(g(t))) − σ(φ_i(s))||_2 )
In the above formulas, L_c is the content loss, L_s is the style loss, λ is the style loss hyper-parameter, φ_i denotes the i-th layer of the encoder f, and σ and μ are the variance and mean of each image channel respectively.
In step 4.1, the feature transformation performed by the AdaIN adaptive normalization layer is:
AdaIN(f(c), f(s)) = σ(f(s)) · ( (f(c) − μ(f(c))) / σ(f(c)) ) + μ(f(s))
In the above formula, σ and μ are the variance and mean of each image channel respectively.
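For illustration, the AdaIN transform and the losses above can be sketched as follows; enc_stages is assumed to be the list of encoder stages ending at the chosen VGG-19 layers, g is the mirrored decoder, and the style weight is illustrative:

```python
import torch
import torch.nn.functional as F

def adain(content_feat, style_feat, eps=1e-5):
    """Align the channel-wise mean/std of the content feature map with the style's."""
    c_mean = content_feat.mean(dim=(2, 3), keepdim=True)
    c_std = content_feat.std(dim=(2, 3), keepdim=True) + eps
    s_mean = style_feat.mean(dim=(2, 3), keepdim=True)
    s_std = style_feat.std(dim=(2, 3), keepdim=True) + eps
    return s_std * (content_feat - c_mean) / c_std + s_mean

def texture_migration_loss(c_img, s_img, enc_stages, g, style_weight=10.0):
    """L3 = L_c + lambda * L_s: content loss between f(g(t)) and t, style loss
    matching per-layer means and variances of the output and the style picture."""
    def encode_all(img):                      # activations after every encoder stage
        feats, h = [], img
        for stage in enc_stages:
            h = stage(h)
            feats.append(h)
        return feats
    t = adain(encode_all(c_img)[-1], encode_all(s_img)[-1])
    out = g(t)                                # stylized result picture g(t)
    out_feats, sty_feats = encode_all(out), encode_all(s_img)
    L_c = F.mse_loss(out_feats[-1], t)        # content loss against the AdaIN map t
    L_s = sum(F.mse_loss(o.mean(dim=(2, 3)), s.mean(dim=(2, 3))) +
              F.mse_loss(o.std(dim=(2, 3)), s.std(dim=(2, 3)))
              for o, s in zip(out_feats, sty_feats))
    return L_c + style_weight * L_s
```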
Compared with the prior art, the invention has the beneficial effects that:
the invention relates to a character style migration method based on a two-stage depth network, which comprises the steps of firstly constructing a training data set A and a training data set B, then training a de-stylized network by adopting the training data set A to obtain a de-stylized network model for de-stylizing a stylized character image with textures, then training a font migration network by utilizing the de-stylized network model to obtain a font migration network model for realizing conversion and migration of various fonts, migrating a certain font image to be converted into a target reference font image by utilizing the model, finally training the texture migration network by utilizing the target reference font image and the training data set A to obtain a texture migration network model for realizing stylized texture rendering of the font image, and obtaining a final character migration result by utilizing the model The texture is transferred in stages, namely the first stage of character font transfer is carried out, and then the second stage of character texture transfer is carried out, so that a better character style transfer effect can be obtained. Therefore, the invention can obtain better character style migration effect.
Drawings
FIG. 1 is an overall flow chart of the present invention.
Fig. 2 is a schematic diagram of a training data set a in the present invention.
Fig. 3 is a schematic diagram of a training data set B in the present invention.
FIG. 4 is a schematic diagram of a de-stylized network of the present invention.
Fig. 5 is a schematic structural diagram of a font migration network according to the present invention.
FIG. 6 is a schematic structural diagram of a texture migration network according to the present invention.
Detailed Description
The present invention will be further described with reference to the following detailed description and accompanying drawings.
Referring to fig. 1 to 6, a text style migration method based on a two-stage deep network sequentially includes the following steps:
the method comprises the following steps of firstly, constructing a training data set A and a training data set B, wherein the training data set A comprises stylized character pictures with various textures and de-stylized character pictures corresponding to the stylized character pictures, and the training data set B comprises a reference font and de-stylized character pictures with other various fonts;
constructing a de-stylized network, and training the de-stylized network by adopting a training data set A to obtain a de-stylized network model for de-stylizing the stylized character and image with texture;
thirdly, constructing a font migration network, training the font migration network by using de-stylized pictures obtained by the de-stylized network model and a training data set B to obtain a font migration network model for realizing conversion and migration of various fonts, and then migrating a certain font picture to be converted into a target reference font picture by using the model;
and step four, constructing a texture migration network, training the texture migration network by using the target reference font picture and the training data set A generated in the step three to obtain a texture migration network model for realizing stylized texture rendering of the font picture, and finally obtaining a final result of character style migration by using the model.
In step two, the de-stylized network includes an encoder E_X, an encoder E_Y and a decoder G_X, and training the de-stylized network with training data set A comprises the following steps in sequence:
2.1, the de-stylized network randomly selects an image pair (x, y) from training data set A and inputs it to the encoders E_X and E_Y, where y is a stylized character picture with texture and x is the de-stylized character picture corresponding to y;
2.2, the encoders E_X and E_Y map x and y to a shared feature space and encode them into respective feature maps; the feature loss L_feat is computed from the feature maps, and the encoders E_X and E_Y are trained with the objective of minimizing L_feat;
2.3, the feature map is passed to the decoder G_X to generate a reconstructed de-stylized character picture; the pixel loss L_pix is then computed from the reconstructed de-stylized character picture and picture x, and the decoder G_X is trained with the objective of minimizing L_pix so that the reconstructed de-stylized character picture is sufficiently close to picture x.
The de-stylized network also includes a discriminator D_X, and training the de-stylized network with training data set A further comprises:
2.4, the reconstructed de-stylized character picture is input to the discriminator D_X to judge its authenticity, the adversarial loss L_adv is computed, and optimization is performed with an Adam optimizer.
The total loss function optimized by the de-stylized network is:
L_1 = λ_feat·L_feat + λ_pix·L_pix + λ_adv·L_adv
L_feat = E_{x,y}[ ||S_X(E_Y(y)) − z||_1 ]
z = S_X(G_X(x))
L_pix = E_{x,y}[ ||G_X(E_Y(y)) − x||_1 ]
L_adv = E_x[ D_X(x) ] − E_y[ D_X(G_X(E_Y(y))) ] − λ_gp·E_x̂[ (||∇_x̂ D_X(x̂)||_2 − 1)^2 ]
In the above formulas, L_feat, L_pix and L_adv are the feature loss, the pixel loss and the adversarial loss respectively; λ_feat, λ_pix and λ_adv are the corresponding hyper-parameters; S_X is the sharing layer of G_X; z is the content feature; λ_gp is the penalty factor; and x̂ is sampled uniformly along the straight line between picture x and the reconstructed de-stylized character picture G_X(E_Y(y)).
In step three, the font migration network comprises a generator G and a discriminator D, and training the font migration network with the de-stylized pictures produced by the de-stylized network model and training data set B comprises the following steps in sequence:
3.1, the font migration network randomly selects a picture x from training data set B and inputs it to the generator G, which generates a false picture G(x, c) from picture x and a target font label c;
3.2, on the one hand, the false picture G(x, c) is fed back into the generator G to produce a reconstructed picture G(G(x, c), c'), with the de-stylized picture from the de-stylized network model serving as the target font picture supervising the reconstruction; the font classification loss L_f of the generator and the reconstruction loss L_rec are then computed, and the generator G is trained with the objective of minimizing L_f and L_rec. On the other hand, the false picture G(x, c) is input to the discriminator D to judge its authenticity and the font domain to which it belongs, the font classification loss L_r of the discriminator is computed, and the discriminator D is trained with the objective of minimizing L_r.
The loss functions optimized by the font migration network are:
L_2 = L_D + L_G
L_D = −L_adv + λ_1·L_r
L_G = L_adv + λ_1·L_f + λ_2·L_rec
L_adv = E_x[ D(x) ] − E_{x,c}[ D(G(x, c)) ] − λ_gp·E_x̂[ (||∇_x̂ D(x̂)||_2 − 1)^2 ]
L_r = E_{x,c'}[ −log D(c'|x) ]
L_f = E_{x,c}[ −log D(c|G(x, c)) ]
L_rec = E_{x,c}[ ||x − G(G(x, c), c')||_1 ]
In the above formulas, L_D is the discriminator loss and L_G is the generator loss; L_adv, L_r, L_f and L_rec are the adversarial loss, the font classification loss of the discriminator, the font classification loss of the generator and the reconstruction loss respectively; λ_1, λ_2 and λ_gp are the font classification loss hyper-parameter, the reconstruction loss hyper-parameter and the penalty factor of the adversarial loss; x̂ is sampled uniformly along the straight line between a real picture sample and the false picture G(x, c); and D(c'|x) is the probability distribution with which the discriminator D assigns the real picture sample to its original font domain c'.
In step 3.1, the false picture G(x, c) is generated as follows: the picture x and the target font label c are first feature-mapped and fused, and the fused result is then passed into a deep convolutional network for training.
In step four, the texture migration network comprises an encoder f, a decoder g and an AdaIN adaptive normalization layer between them. The encoder f and the decoder g are built with reference to the VGG-19 network structure: the encoder f uses the first L layers of a pre-trained VGG-19 network, and the decoder g mirrors the encoder f with all pooling layers replaced by upsampling layers;
the training of the texture migration network by using the target reference font picture and the training data set A generated in the third step sequentially comprises the following steps:
4.1, a font picture c and a texture style picture s are first mapped to the feature space by the encoder f to obtain f(c) and f(s), and the AdaIN adaptive normalization layer then performs a feature transformation on them to obtain the feature map t = AdaIN(f(c), f(s));
4.2, mapping the feature map t back to the original feature space by a decoder g to obtain a stylized result map g (t);
4.3, inputting the stylized result graph g (t) and the texture style picture s into an encoder f, and realizing the training of the texture migration network through the optimization of the loss function.
In step 4.3, the loss function is:
L_3 = L_c + λ·L_s
L_c = ||f(g(t)) − t||_2
L_s = Σ_{i=1}^{L} ( ||μ(φ_i(g(t))) − μ(φ_i(s))||_2 + ||σ(φ_i(g(t))) − σ(φ_i(s))||_2 )
In the above formulas, L_c is the content loss, L_s is the style loss, λ is the style loss hyper-parameter, φ_i denotes the i-th layer of the encoder f, and σ and μ are the variance and mean of each image channel respectively.
In step 4.1, the feature transformation performed by the AdaIN adaptive normalization layer is:
AdaIN(f(c), f(s)) = σ(f(s)) · ( (f(c) − μ(f(c))) / σ(f(c)) ) + μ(f(s))
In the above formula, σ and μ are the variance and mean of each image channel respectively.
The principle of the invention is illustrated as follows:
the invention provides a character style migration method based on a two-stage depth network, which is based on a de-stylized network consisting of two encoders, a decoder and a discriminator and realizes de-stylized processing of textured characters by optimizing characteristic loss, pixel loss and countermeasure loss; based on a font migration network of a generator and a discriminator, realizing the first-stage migration of the character fonts by optimizing the countermeasure loss and the font classification loss; based on a texture migration network of an encoder and a decoder with an AdaIN self-adaptive normalization layer, feature transformation is carried out through mean values and variances, content loss and style loss are optimized, and second-stage migration of character textures is achieved. The style migration character image obtained by the method has higher artistic effect, has wide application in the field of visual design, can be used for various aspects such as artistic image design, culture and commercial image propaganda, drawing text processing and the like, is not only suitable for digital and letter images, but also has better performance in the aspect of Chinese character migration.
Discriminator D_X: to make the de-stylized reconstruction more accurate, the invention adds a discriminator D_X to the de-stylized network, which judges the authenticity of the reconstructed picture.
Font classification loss of the discriminator, L_r = E_{x,c'}[ −log D(c'|x) ]: the input font image x should be converted to the output font image y and correctly classified into the target font domain c. D(c'|x) denotes the probability distribution with which the discriminator assigns a real sample to its original font domain c', and the goal of the discriminator D is to minimize this loss.
Font classification loss of the generator, L_f = E_{x,c}[ −log D(c|G(x, c)) ]: this loss optimizes the generator G so that the pictures it generates are classified into the target font domain c by the discriminator D.
Reconstruction loss, L_rec = E_{x,c}[ ||x − G(G(x, c), c')||_1 ]: to ensure that the generator G changes only the font-related information of the input picture while preserving its character content (rather than merely tricking the discriminator D), the generated G(x, c) is fed back into the generator G to produce G(G(x, c), c'), which should be as consistent as possible with picture x; a 1-norm is used to constrain this loss.
For the adversarial loss L_adv, a WGAN method with gradient penalty is adopted to alleviate the model collapse problem, namely:
L_adv = E_x[ D(x) ] − E_{x,c}[ D(G(x, c)) ] − λ_gp·E_x̂[ (||∇_x̂ D(x̂)||_2 − 1)^2 ]
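A minimal sketch of the gradient-penalty term used in this WGAN formulation (the critic is any module that outputs a scalar score per image; names and defaults are illustrative):

```python
import torch

def gradient_penalty(critic, real, fake, lambda_gp=10.0):
    """Penalize the critic's gradient norm at points sampled uniformly on the
    straight lines between real pictures and generated pictures."""
    alpha = torch.rand(real.size(0), 1, 1, 1, device=real.device)
    x_hat = (alpha * real + (1.0 - alpha) * fake.detach()).requires_grad_(True)
    scores = critic(x_hat)
    grads = torch.autograd.grad(outputs=scores.sum(), inputs=x_hat,
                                create_graph=True)[0]
    grad_norm = grads.flatten(start_dim=1).norm(2, dim=1)
    return lambda_gp * ((grad_norm - 1.0) ** 2).mean()
```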
the texture migration network defines two kinds of penalties: content loss LcAnd style loss Ls. Content loss employing network output imagesExpressing Euclidean distance from the AdaIN layer output feature diagram, and aiming at enabling the final output content of the model to be close to the AdaIN layer output feature diagram t sufficiently so as to accelerate convergence speed; and the style loss is obtained by coding the image generating the result by a coder again, acquiring the mean value and the variance of the feature map of each layer of the VGG network, and performing Euclidean distance summation on the mean value and the variance of the layer corresponding to the real style map.
Example 1:
referring to fig. 1, a text style migration method based on a two-stage deep network is sequentially performed according to the following steps:
1. reference documents: yang S, Liu J, Wang W, et al. TET-GAN: Text Effects Transfer Via Stylation and Destylation [ J ].2018. constructing a training data set A and a training data set B, wherein the training data set A comprises stylized character pictures with various textures and de-stylized character pictures corresponding to the stylized character pictures, and the training data set B comprises de-stylized character pictures with a reference font and other various fonts (see fig. 2 and 3);
2. Construct a de-stylized network comprising an encoder E_X, an encoder E_Y, a decoder G_X and a discriminator D_X, where the encoders E_X and E_Y share the weights of their last layers; the network structure adopted by the de-stylized network is shown in Table 1:
Table 1: de-stylized network architecture
The de-stylized network employs a total loss function to be optimized as:
L_1 = λ_feat·L_feat + λ_pix·L_pix + λ_adv·L_adv
In the above formula, L_feat, L_pix and L_adv are the feature loss, the pixel loss and the adversarial loss respectively, and λ_feat, λ_pix and λ_adv are the corresponding hyper-parameters;
3. referring to fig. 4, training the de-stylized network using the training data set a specifically includes:
(1) The de-stylized network randomly selects an image pair (x, y) from training data set A and inputs it to the encoders E_X and E_Y, where y is a stylized character picture with texture and x is the de-stylized character picture corresponding to y;
(2) The encoders E_X and E_Y map x and y to a shared feature space and encode them into respective feature maps; the feature loss L_feat is computed from the feature maps, and the encoders E_X and E_Y are trained with the objective of minimizing L_feat.
The task of the encoders is to bring their result close to the ground truth of the content feature. Letting S_X denote the sharing layer of G_X, the content feature used for guidance is defined as z = S_X(G_X(x)); L_feat guides E_Y to strip the texture feature elements from the character image while retaining the core font information, and is defined as
L_feat = E_{x,y}[ ||S_X(E_Y(y)) − z||_1 ];
(3) The feature map is passed to the decoder G_X to generate a reconstructed de-stylized character picture; the pixel loss L_pix is then computed from the reconstructed picture and picture x, and the decoder G_X is trained with the objective of minimizing L_pix so that the reconstructed de-stylized character picture is sufficiently close to picture x.
The de-stylized network needs the generated reconstruction to be close to picture x, so a 1-norm is used as the pixel loss constraint, where the pixel loss is defined as:
L_pix = E_{x,y}[ ||G_X(E_Y(y)) − x||_1 ]
(4) The reconstructed de-stylized character picture is input to the discriminator D_X to judge its authenticity, and the adversarial loss L_adv is computed (in this network the adversarial loss is defined so as to guide G_X and E_Y to confuse D_X):
L_adv = E_x[ D_X(x) ] − E_y[ D_X(G_X(E_Y(y))) ] − λ_gp·E_x̂[ (||∇_x̂ D_X(x̂)||_2 − 1)^2 ]
In the above formula, λ_gp is the penalty factor and x̂ is sampled uniformly along the straight line between picture x and the reconstructed de-stylized character picture G_X(E_Y(y));
The losses are optimized with an Adam optimizer, with the learning rate set to 0.0002 and λ_feat = λ_pix = 100, λ_gp = 10, λ_adv = 1. This finally yields an encoder E_Y and a decoder G_X usable for de-stylization, whose generated images G_X(E_Y(y)) are sufficiently close to the de-stylized picture x;
4. Construct the font migration network, which comprises a generator G and a discriminator D. The generator G contains 2 convolutional layers, 6 residual layers and 2 deconvolutional layers, with normalization applied throughout. The overall flow of the generator G is: downsample the feature maps by a factor of 4, pass them through 6 residual blocks that keep the dimensions unchanged, upsample by a factor of 4 with a transposed convolution, and finally apply a size-preserving convolution with tanh as the output. The network structure adopted by the generator G is shown in Table 2:
table 2 structural table of generator G
Figure BDA0002758623170000132
Each convolutional layer uses 4 x 4 kernels with a stride of 2, so each convolution halves the spatial dimensions. The normalization layer performs instance normalization (IN) within each image channel, computing the statistics over H x W; since the generated result depends mainly on an individual image instance, batch normalization (BN) over the whole batch is unsuitable for image stylization, whereas normalizing over H x W speeds up model convergence and keeps image instances independent of one another. The activation function is LeakyReLU: because its output has a small gradient for negative inputs, the derivative is never zero, which reduces silent neurons, allows gradient-based learning, and avoids the problem of ReLU units that stop learning once their input falls into the negative interval.
Furthermore, to help avoid over-fitting, the desired feature mapping is no longer fitted directly by the stacked layers; instead they explicitly fit a residual mapping. If the desired mapping is H(x), the stacked non-linear layers fit F(x) = H(x) − x. The assumption is that the residual mapping is easier to optimize than the original one, i.e. fitting F(x) = H(x) − x is easier than fitting H(x) directly; in the extreme case where the desired mapping is the identity, the residual network only has to fit F(x) = 0, whereas a plain network would have to fit F(x) = x, and the former is clearly easier to optimize.
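A residual block of this kind could look like the following sketch (the use of instance normalization and LeakyReLU matches the description above, but the channel width and exact layer sizes are illustrative):

```python
import torch.nn as nn

class ResidualBlock(nn.Module):
    """Stacked convolutions fit the residual F(x) = H(x) - x; the skip connection
    adds x back, so the block outputs H(x) = F(x) + x."""
    def __init__(self, channels=256):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(channels, channels, kernel_size=3, padding=1, bias=False),
            nn.InstanceNorm2d(channels, affine=True),
            nn.LeakyReLU(0.01, inplace=True),
            nn.Conv2d(channels, channels, kernel_size=3, padding=1, bias=False),
            nn.InstanceNorm2d(channels, affine=True),
        )

    def forward(self, x):
        return x + self.body(x)
```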
The discriminator D adopts a PatchGAN structure and classifies local image patches as real or fake, without using a normalization layer; the output of Conv1 represents the predicted probability of the target font, the output of Conv2 represents the real/fake judgment of the image, and the two output branches are in parallel;
the loss function to be optimized adopted by the font migration network is as follows:
L_2 = L_D + L_G
L_D = −L_adv + λ_1·L_r
L_G = L_adv + λ_1·L_f + λ_2·L_rec
In the above formulas, L_D is the discriminator loss and L_G is the generator loss; L_adv, L_r, L_f and L_rec are the adversarial loss, the font classification loss of the discriminator, the font classification loss of the generator and the reconstruction loss respectively; λ_1 and λ_2 are the font classification loss hyper-parameter and the reconstruction loss hyper-parameter;
5. referring to fig. 5, a font migration network is trained by using de-stylized pictures obtained by the de-stylized network model and a training data set B to obtain a font migration network model for realizing conversion and migration of multiple fonts, which specifically includes:
(1) The font migration network randomly selects a picture x from training data set B and inputs it to the generator G; the generator G feature-maps and fuses the picture x with the target font label c and passes the result into a deep convolutional network for training, generating a false picture G(x, c);
(2) On the one hand, the false picture G(x, c) is fed back into the generator G to produce a reconstructed picture G(G(x, c), c'), with the de-stylized picture from the de-stylized network model serving as the target font picture supervising the reconstruction; this ensures that the picture content is preserved during conversion and only the domain-specific part is changed. The font classification loss L_f of the generator and the reconstruction loss L_rec are then computed, and the generator G is trained with the objective of minimizing L_f and L_rec. On the other hand, the false picture G(x, c) is input to the discriminator D to judge its authenticity and the font domain to which it belongs, the font classification loss L_r of the discriminator is computed, and the discriminator D is trained with the objective of minimizing L_r. The losses are as follows:
L_adv = E_x[ D(x) ] − E_{x,c}[ D(G(x, c)) ] − λ_gp·E_x̂[ (||∇_x̂ D(x̂)||_2 − 1)^2 ]
L_r = E_{x,c'}[ −log D(c'|x) ]
L_f = E_{x,c}[ −log D(c|G(x, c)) ]
L_rec = E_{x,c}[ ||x − G(G(x, c), c')||_1 ]
In the above formulas, λ_gp is the penalty factor of the adversarial loss, x̂ is sampled uniformly along the straight line between a real picture sample and the false picture G(x, c), and D(c'|x) is the probability distribution with which the discriminator D assigns the real picture sample to its original font domain c';
The model is trained with an Adam optimizer using β_1 = 0.5 and β_2 = 0.999; images are flipped horizontally with probability 0.5 for data augmentation; one generator update is performed after every 5 discriminator updates; the batch size for all experiments is set to 16; all models are trained with a learning rate of 0.0001 for the first 10 epochs, and the learning rate is decayed linearly to 0 over the next 10 epochs;
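The update schedule described above can be sketched as follows; d_loss_fn and g_loss_fn stand for the L_D and L_G computations defined earlier, and the data loader is assumed to yield (picture, original-font label, target-font label) batches:

```python
import torch

def train_font_migration(G, D, loader, d_loss_fn, g_loss_fn,
                         epochs=20, lr=1e-4, n_critic=5):
    """Adam(beta1=0.5, beta2=0.999); constant lr for the first half of training,
    then (approximately) linear decay to 0; one generator step per n_critic
    discriminator steps."""
    opt_g = torch.optim.Adam(G.parameters(), lr=lr, betas=(0.5, 0.999))
    opt_d = torch.optim.Adam(D.parameters(), lr=lr, betas=(0.5, 0.999))
    step = 0
    for epoch in range(epochs):
        decay = 1.0 if epoch < epochs // 2 else 1.0 - (epoch - epochs // 2) / (epochs // 2)
        for group in opt_g.param_groups + opt_d.param_groups:
            group["lr"] = lr * decay
        for x, c_org, c_trg in loader:
            opt_d.zero_grad()
            d_loss_fn(G, D, x, c_org, c_trg).backward()   # discriminator objective L_D
            opt_d.step()
            step += 1
            if step % n_critic == 0:                      # 1 G update per 5 D updates
                opt_g.zero_grad()
                g_loss_fn(G, D, x, c_org, c_trg).backward()   # generator objective L_G
                opt_g.step()
```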
6. migrating a certain font picture to be converted into a target reference font picture by using the obtained font migration network model;
7. Construct the texture migration network, which comprises an encoder f, a decoder g and an AdaIN adaptive normalization layer between them. The encoder f and the decoder g are built with reference to the VGG-19 network structure: the encoder f uses the relu1_1 to relu4_1 portion of a pre-trained VGG-19 network, and the decoder g mirrors the encoder f with all pooling layers replaced by upsampling layers. The specific structure of the texture migration network is shown in Table 3:
table 3 texture migration network structure table
Figure BDA0002758623170000161
The convolution kernels of the network's convolutional layers are all 3 × 3 with a stride of 1, the window size of the MaxPool layers is 2 × 2, and the upsampling layers use nearest-neighbour interpolation;
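A single decoder stage consistent with this description might look like the following sketch (channel widths are illustrative; the pooling layer of the corresponding VGG block is replaced by nearest-neighbour upsampling):

```python
import torch.nn as nn

decoder_stage = nn.Sequential(
    nn.Upsample(scale_factor=2, mode="nearest"),          # replaces the MaxPool layer
    nn.Conv2d(512, 256, kernel_size=3, stride=1, padding=1),
    nn.ReLU(inplace=True),
)
```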
8. referring to fig. 6, a texture migration network is trained by using a target reference font picture and a training data set a to obtain a texture migration network model for implementing stylized texture rendering on a font picture, which specifically includes:
(1) A font picture c and a texture style picture s are first mapped to the feature space by the encoder f to obtain f(c) and f(s); the AdaIN adaptive normalization layer then performs a feature transformation on them to obtain the feature map t = AdaIN(f(c), f(s)):
AdaIN(f(c), f(s)) = σ(f(s)) · ( (f(c) − μ(f(c))) / σ(f(c)) ) + μ(f(s))
In the above formula, σ and μ are the variance and mean of each image channel respectively;
(2) mapping the feature map t back to the original feature space by a decoder g to obtain a stylized result map g (t);
(3) inputting the stylized result graph g (t) and the texture style picture s into an encoder f, and realizing the training of the texture migration network through the optimization of a loss function, wherein the loss function is as follows:
L_3 = L_c + λ·L_s
L_c = ||f(g(t)) − t||_2
L_s = Σ_{i=1}^{L} ( ||μ(φ_i(g(t))) − μ(φ_i(s))||_2 + ||σ(φ_i(g(t))) − σ(φ_i(s))||_2 )
In the above formulas, L_c is the content loss, L_s is the style loss, λ is the style loss hyper-parameter, φ_i denotes the i-th layer of the encoder f, and σ and μ are the variance and mean of each image channel respectively;
optimization loss the Adam optimizer is selected and the batch size is set to 8.

Claims (10)

1. A character style migration method based on a two-stage deep network is characterized in that:
the method comprises the following steps in sequence:
the method comprises the following steps of firstly, constructing a training data set A and a training data set B, wherein the training data set A comprises stylized character pictures with various textures and de-stylized character pictures corresponding to the stylized character pictures, and the training data set B comprises a reference font and de-stylized character pictures with other various fonts;
constructing a de-stylized network, and training the de-stylized network by adopting a training data set A to obtain a de-stylized network model for de-stylizing the stylized character and image with texture;
thirdly, constructing a font migration network, training the font migration network by using de-stylized pictures obtained by the de-stylized network model and a training data set B to obtain a font migration network model for realizing conversion and migration of various fonts, and then migrating a certain font picture to be converted into a target reference font picture by using the model;
and step four, constructing a texture migration network, training the texture migration network by using the target reference font picture and the training data set A generated in the step three to obtain a texture migration network model for realizing stylized texture rendering of the font picture, and finally obtaining a final result of character style migration by using the model.
2. The text style migration method based on the two-stage deep network as claimed in claim 1, wherein:
in step two, the de-stylized network includes an encoder EXEncoder EYAnd decoder GXThe method for training the de-stylized network by adopting the training data set A sequentially comprises the following steps:
2.1 de-stylized network randomly selects an image pair (x, y) from training data set A, and inputs them to encoder EXAnd EYWherein y is a stylized character picture with texture, and x is a de-stylized character picture corresponding to y;
2.2 encoder EXAnd EYMapping x and y to shared characteristic space, coding to generate respective characteristic diagram, and calculating characteristic loss L according to the characteristic diagramfeatAnd with LfeatTraining encoder E with minimum optimization for targetXAnd EY
2.3 transmitting the characteristic diagram to a decoder GXGenerating a reconstructed de-stylized text picture, and then calculating pixel loss L from the reconstructed de-stylized text picture and picture xpixAnd with LpixTraining decoder G with minimum for goal optimizationXThe reconstructed de-stylized text picture is brought sufficiently close to picture x.
3. The text style migration method based on the two-stage deep network as claimed in claim 2, wherein:
the de-stylized network also includes a discriminator DXThe training of the de-stylized network using the training data set a further comprises:
2.4 inputting the reconstructed de-stylized character picture into a discriminator DXDetermining the authenticity of the product, and calculating the resistance loss LadvAnd optimized using an Adam optimizer.
4. The method for text style migration based on the two-stage deep network as claimed in claim 3, wherein:
the de-stylized network employs a total loss function to be optimized as:
L1=λfeatLfeatpixLpixadvLadv
Lfeat=Ex,y[||SX(EY(y))-z||1]
z=SX(GX(x))
Lpix=Ex,y[||GX(EY(y))-x||1]
Figure FDA0002758623160000021
in the above formula, Lfeat、Lpix、LadvCharacteristic loss, pixel loss and contrast loss, λ, respectivelyfeat、λpix、λadvRespectively, characteristic loss, pixel loss, and superparametric of countermeasures loss, SXIs GXZ is a content feature, λgpIn order to be a penalty factor,
Figure FDA0002758623160000022
for de-stylized text picture G along picture x and reconstructedX(EY(y)) are uniformly sampled.
5. The character style migration method based on the two-stage deep network as claimed in any one of claims 1-4, wherein:
in the third step, the font migration network comprises a generator G and a discriminator D, and the font migration network trained by the de-stylized picture and the training data set B obtained by using the de-stylized network model sequentially comprises the following steps:
3.1, randomly selecting a picture x from a training data set B by a font migration network, inputting the picture x into a generator G, and generating a false picture G (x, c) by the generator G according to the picture x and a target font label c;
3.2, on one hand, the false picture G (x, c) is input into the generator G again to generate a reconstructed picture G (G (x, c)), the de-stylized picture of the de-stylized network model is taken as a target font picture to supervise in the reconstruction process, and then the font classification loss L of the generator is calculatedfAnd reconstruction loss LrecWith Lf、LrecOn the other hand, the false picture G (x, c) is input into a discriminator D to discriminate the authenticity of the false picture G and the font domain to which the picture belongs, and the font classification loss L of the discriminator is calculatedrAnd with LrAnd optimally training the discriminator D with the minimum as a target.
6. The text style migration method based on the two-stage deep network as claimed in claim 5, wherein:
the loss function to be optimized adopted by the font migration network is as follows:
L2=LD+LG
LD=-Ladv1Lr
LG=Ladv1Lf2Lrec
Figure FDA0002758623160000031
Lr=Ex,c'[-logD(c'|x)]
Lf=Ex,c'[-logD(c|G(x,c))]
Lrec=Ex,c[||x-G(G(x,c),c')||1]
in the above formula, LDFor discriminator loss, LGFor generator losses, Ladv、Lr、Lf、LrecRespectively, the countermeasure loss, the font classification loss of the discriminator, the font classification loss of the generator, the reconstruction loss, lambda1、λ2、λgpRespectively representing a font classification loss hyper-parameter, a reconstruction loss hyper-parameter and a penalty function for resisting loss,
Figure FDA0002758623160000032
to sample evenly along the straight line between the real picture sample and the false picture G (x, c), D (c '| x) is the probability distribution that the discriminator D attributes the real picture sample to the original font domain c'.
7. The text style migration method based on the two-stage deep network as claimed in claim 5, wherein:
in step 3.1, the generation method of the false picture G (x, c) is as follows: firstly, the image x and the target font label c are subjected to feature mapping and fusion, and then the image x and the target font label c are transmitted into a deep convolutional network for training.
8. The character style migration method based on the two-stage deep network as claimed in any one of claims 1-4, wherein:
in the fourth step, the texture migration network comprises an encoder f, a decoder g and an AdaIN self-adaptive normalization layer positioned between the encoder f and the decoder g, wherein the encoder f and the decoder g are constructed by taking a VGG-19 network structure as a reference, the encoder f selects a front L layer of a pre-trained VGG-19 network, the decoder g is a symmetric structure of the encoder f, and all pooling layers are replaced by upper sampling layers;
the training of the texture migration network by using the target reference font picture and the training data set A generated in the third step sequentially comprises the following steps:
4.1, a font picture c and a texture style picture s are first mapped to the feature space by the encoder f to obtain f(c) and f(s), and the AdaIN adaptive normalization layer then performs a feature transformation on them to obtain the feature map t = AdaIN(f(c), f(s));
4.2, mapping the feature map t back to the original feature space by a decoder g to obtain a stylized result map g (t);
4.3, inputting the stylized result graph g (t) and the texture style picture s into an encoder f, and realizing the training of the texture migration network through the optimization of the loss function.
9. The text style migration method based on the two-stage deep network as claimed in claim 8, wherein:
in step 4.3, the loss function is:
L3=Lc+λLs
Lc=||f(g(t))-t||2
Figure FDA0002758623160000041
in the above formula, LcFor content loss, LsFor style loss, λ is the hyper-parameter of style loss, φiIn the i-th layer of the encoder f, σ and μ are the variance and mean of each image channel, respectively.
10. The text style migration method based on the two-stage deep network as claimed in claim 8, wherein:
in step 4.1, the feature transformation formula of the AdaIN adaptive normalization layer is as follows:
Figure FDA0002758623160000051
in the above formula, σ and μ are the variance and mean of each image channel, respectively.
CN202011210655.3A 2020-11-03 2020-11-03 Text style migration method based on dual-stage depth network Active CN112307714B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011210655.3A CN112307714B (en) 2020-11-03 2020-11-03 Text style migration method based on dual-stage depth network

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011210655.3A CN112307714B (en) 2020-11-03 2020-11-03 Text style migration method based on dual-stage depth network

Publications (2)

Publication Number Publication Date
CN112307714A true CN112307714A (en) 2021-02-02
CN112307714B CN112307714B (en) 2024-03-08

Family

ID=74332675

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011210655.3A Active CN112307714B (en) 2020-11-03 2020-11-03 Text style migration method based on dual-stage depth network

Country Status (1)

Country Link
CN (1) CN112307714B (en)

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112966470A (en) * 2021-02-23 2021-06-15 北京三快在线科技有限公司 Character generation method and device, storage medium and electronic equipment
CN113421318A (en) * 2021-06-30 2021-09-21 合肥高维数据技术有限公司 Font style migration method and system based on multitask generation countermeasure network
CN113553932A (en) * 2021-07-14 2021-10-26 同济大学 Calligraphy character erosion repairing method based on style migration
CN113554549A (en) * 2021-07-27 2021-10-26 深圳思谋信息科技有限公司 Text image generation method and device, computer equipment and storage medium
CN113807430A (en) * 2021-09-15 2021-12-17 网易(杭州)网络有限公司 Model training method and device, computer equipment and storage medium
CN114240735A (en) * 2021-11-17 2022-03-25 西安电子科技大学 Method, system, storage medium, computer device and terminal for transferring any style
CN114399427A (en) * 2022-01-07 2022-04-26 福州大学 Character effect migration method based on cyclic generation countermeasure network
CN115310405A (en) * 2022-07-21 2022-11-08 北京汉仪创新科技股份有限公司 Font replacement method, system, device and medium based on countermeasure generation network

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2019025909A1 (en) * 2017-08-01 2019-02-07 3M Innovative Properties Company Neural style transfer for image varietization and recognition
CN110443864A (en) * 2019-07-24 2019-11-12 北京大学 A kind of characters in a fancy style body automatic generation method based on single phase a small amount of sample learning
CN110503598A (en) * 2019-07-30 2019-11-26 西安理工大学 The font style moving method of confrontation network is generated based on condition circulation consistency
IT201900002557A1 (en) * 2019-02-22 2020-08-22 Univ Bologna Alma Mater Studiorum IMAGE-BASED CODING METHOD AND SYSTEM

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2019025909A1 (en) * 2017-08-01 2019-02-07 3M Innovative Properties Company Neural style transfer for image varietization and recognition
IT201900002557A1 (en) * 2019-02-22 2020-08-22 Univ Bologna Alma Mater Studiorum IMAGE-BASED CODING METHOD AND SYSTEM
CN110443864A (en) * 2019-07-24 2019-11-12 北京大学 A kind of characters in a fancy style body automatic generation method based on single phase a small amount of sample learning
CN110503598A (en) * 2019-07-30 2019-11-26 西安理工大学 The font style moving method of confrontation network is generated based on condition circulation consistency

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Zhang Jinglei; Hou Yawei: "Image style transfer based on an improved cycle generative adversarial network", Journal of Electronics & Information Technology, no. 05 *
Wang Xiaohong; Lu Hui; Ma Xiangcai: "Stylized calligraphy image generation based on generative adversarial networks", Packaging Engineering, no. 11, 10 June 2020 (2020-06-10) *

Cited By (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112966470A (en) * 2021-02-23 2021-06-15 北京三快在线科技有限公司 Character generation method and device, storage medium and electronic equipment
CN113421318A (en) * 2021-06-30 2021-09-21 合肥高维数据技术有限公司 Font style migration method and system based on multitask generation countermeasure network
CN113421318B (en) * 2021-06-30 2022-10-28 合肥高维数据技术有限公司 Font style migration method and system based on multitask generation countermeasure network
CN113553932A (en) * 2021-07-14 2021-10-26 同济大学 Calligraphy character erosion repairing method based on style migration
CN113553932B (en) * 2021-07-14 2022-05-13 同济大学 Calligraphy character erosion repairing method based on style migration
CN113554549A (en) * 2021-07-27 2021-10-26 深圳思谋信息科技有限公司 Text image generation method and device, computer equipment and storage medium
CN113554549B (en) * 2021-07-27 2024-03-29 深圳思谋信息科技有限公司 Text image generation method, device, computer equipment and storage medium
CN113807430A (en) * 2021-09-15 2021-12-17 网易(杭州)网络有限公司 Model training method and device, computer equipment and storage medium
CN113807430B (en) * 2021-09-15 2023-08-08 网易(杭州)网络有限公司 Model training method, device, computer equipment and storage medium
CN114240735B (en) * 2021-11-17 2024-03-19 西安电子科技大学 Arbitrary style migration method, system, storage medium, computer equipment and terminal
CN114240735A (en) * 2021-11-17 2022-03-25 西安电子科技大学 Method, system, storage medium, computer device and terminal for transferring any style
CN114399427A (en) * 2022-01-07 2022-04-26 福州大学 Character effect migration method based on cyclic generation countermeasure network
CN114399427B (en) * 2022-01-07 2024-06-28 福州大学 Word effect migration method based on loop generation countermeasure network
CN115310405A (en) * 2022-07-21 2022-11-08 北京汉仪创新科技股份有限公司 Font replacement method, system, device and medium based on countermeasure generation network

Also Published As

Publication number Publication date
CN112307714B (en) 2024-03-08

Similar Documents

Publication Publication Date Title
CN112307714A (en) Character style migration method based on double-stage deep network
AU2020100710A4 (en) A method for sentiment analysis of film reviews based on deep learning and natural language processing
CN109657156B (en) Individualized recommendation method based on loop generation countermeasure network
CN111079532B (en) Video content description method based on text self-encoder
CN111091045A (en) Sign language identification method based on space-time attention mechanism
Gai et al. New image denoising algorithm via improved deep convolutional neural network with perceptive loss
WO2022252272A1 (en) Transfer learning-based method for improved vgg16 network pig identity recognition
CN108830913B (en) Semantic level line draft coloring method based on user color guidance
CN110110323B (en) Text emotion classification method and device and computer readable storage medium
CN109934158B (en) Video emotion recognition method based on local enhanced motion history map and recursive convolutional neural network
CN114240735B (en) Arbitrary style migration method, system, storage medium, computer equipment and terminal
CN113780249B (en) Expression recognition model processing method, device, equipment, medium and program product
CN113095314B (en) Formula identification method, device, storage medium and equipment
CN114066871B (en) Method for training new coronal pneumonia focus area segmentation model
CN109711411B (en) Image segmentation and identification method based on capsule neurons
CN116468938A (en) Robust image classification method on label noisy data
CN114359631A (en) Target classification and positioning method based on coding-decoding weak supervision network model
CN113378949A (en) Dual-generation confrontation learning method based on capsule network and mixed attention
CN111768326A (en) High-capacity data protection method based on GAN amplification image foreground object
Shariff et al. Artificial (or) fake human face generator using generative adversarial network (GAN) machine learning model
WO2024060839A1 (en) Object operation method and apparatus, computer device, and computer storage medium
CN109284765A (en) The scene image classification method of convolutional neural networks based on negative value feature
CN117611838A (en) Multi-label image classification method based on self-adaptive hypergraph convolutional network
CN115280329A (en) Method and system for query training
CN111339734A (en) Method for generating image based on text

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant