CN112307714B - Text style migration method based on dual-stage depth network - Google Patents


Info

Publication number
CN112307714B
CN112307714B (application CN202011210655.3A)
Authority
CN
China
Prior art keywords
picture
stylized
network
font
loss
Prior art date
Legal status
Active
Application number
CN202011210655.3A
Other languages
Chinese (zh)
Other versions
CN112307714A (en)
Inventor
陈金泽
李龙
吕奕杭
廖志寰
朱安娜
Current Assignee
Wuhan University of Technology WUT
Original Assignee
Wuhan University of Technology WUT
Priority date
Filing date
Publication date
Application filed by Wuhan University of Technology WUT filed Critical Wuhan University of Technology WUT
Priority to CN202011210655.3A
Publication of CN112307714A
Application granted
Publication of CN112307714B


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/10 Text processing
    • G06F40/103 Formatting, i.e. changing of presentation of documents
    • G06F40/109 Font handling; Temporal or kinetic typography
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Health & Medical Sciences (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Computing Systems (AREA)
  • Molecular Biology (AREA)
  • Biophysics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Biomedical Technology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Image Processing (AREA)

Abstract

A text style migration method based on a dual-stage deep network. A training data set A and a training data set B are first constructed. A de-stylized network is then trained on data set A to obtain a de-stylized network model. Next, a font migration network is trained using the de-stylized pictures produced by that model together with data set B, yielding a font migration network model that migrates a text picture to be converted into a target reference-font picture. Finally, a texture migration network is trained using the target reference-font picture and data set A, and the resulting texture migration network model produces the final result of text style migration. The design achieves an excellent text style migration effect.

Description

Text style migration method based on dual-stage depth network
Technical Field
The invention belongs to the field of deep learning and image style migration, and particularly relates to a text style migration method based on a dual-stage deep network.
Background
Style migration of images refers to the task of transferring the style of one image onto another to synthesize a new artistic image. In recent years, with the continuous development of artificial intelligence technology and the global creative industry, style migration of text images has become a real demand: people want to generate more artistic fonts for use in the design and promotion of business, culture, and other industries.
Style migration of a text image differs from that of an ordinary image in that it involves two aspects: font migration and texture migration of the text. The former converts text of the same content between fonts, while the latter converts the stylistic appearance of the text. Manually synthesizing text images with a specific font and texture consumes a great deal of time and effort, so realizing text style migration with an automatic, efficient method is of wide interest. However, existing text style migration methods are limited to single-stage direct conversion, i.e. the font and the texture of the text are migrated at once in the same stage, and the results are often unsatisfactory.
Disclosure of Invention
The invention aims to overcome the problems in the prior art and to provide a text style migration method based on a dual-stage deep network with a better migration effect.
In order to achieve the above object, the present invention provides the following technical solutions:
A text style migration method based on a dual-stage deep network comprises the following steps in sequence:
Step 1: construct a training data set A and a training data set B, where data set A comprises stylized text pictures with various textures and the corresponding de-stylized text pictures, and data set B comprises text pictures in a reference font and in various other fonts;
Step 2: construct a de-stylized network and train it with training data set A to obtain a de-stylized network model for de-stylizing textured stylized text images;
Step 3: first construct a font migration network, train it with the de-stylized pictures obtained from the de-stylized network model and training data set B to obtain a font migration network model for converting and migrating multiple fonts, and then use the model to migrate a font picture to be converted into a target reference-font picture;
Step 4: first construct a texture migration network, then train it with the target reference-font picture generated in Step 3 and training data set A to obtain a texture migration network model for stylized texture rendering of the font picture, and finally use the model to obtain the final result of text style migration.
In the second step, the de-stylized network includes an encoder E_X, an encoder E_Y and a decoder G_X, and training the de-stylized network with training data set A comprises the following steps in sequence:
2.1. The de-stylized network randomly selects an image pair (x, y) from training data set A and inputs the images to the encoders E_X and E_Y respectively, where y is a textured stylized text picture and x is the de-stylized text picture corresponding to y;
2.2. The encoders E_X and E_Y map x and y into a shared feature space and encode them into respective feature maps; the feature loss L_feat is computed from the feature maps, and the encoders E_X and E_Y are trained with the objective of minimizing L_feat;
2.3. The feature map is passed to the decoder G_X to generate a reconstructed de-stylized text picture; the pixel loss L_pix is then computed from the reconstructed de-stylized text picture and the picture x, and the decoder G_X is trained with the objective of minimizing L_pix so that the reconstructed de-stylized text picture is sufficiently close to the picture x.
The de-stylized network further comprises a discriminator D_X, and training the de-stylized network with training data set A further comprises:
2.4. The reconstructed de-stylized text picture is input to the discriminator D_X to judge its authenticity; the adversarial loss L_adv is computed and optimized with the Adam optimizer.
The total loss function to be optimized by the de-stylized network is:
L_1 = λ_feat·L_feat + λ_pix·L_pix + λ_adv·L_adv
L_feat = E_{x,y}[||S_X(E_Y(y)) - z||_1]
z = S_X(G_X(x))
L_pix = E_{x,y}[||G_X(E_Y(y)) - x||_1]
where L_feat, L_pix and L_adv are respectively the feature loss, the pixel loss and the adversarial loss; λ_feat, λ_pix and λ_adv are respectively their hyper-parameters; S_X denotes the front layers of G_X; z is the content feature; λ_gp is the penalty coefficient; and x̂ is sampled uniformly along straight lines between the image x and the reconstructed de-stylized text picture G_X(E_Y(y)).
In the third step, the font migration network comprises a generator G and a discriminator D, and training the font migration network with the de-stylized pictures obtained from the de-stylized network model and training data set B comprises the following steps in sequence:
3.1. The font migration network randomly selects a picture x from training data set B and inputs it into the generator G, and the generator G generates a fake picture G(x, c) from the picture x and the target font label c;
3.2. On the one hand, the fake picture G(x, c) is input into the generator G again to generate a reconstructed picture G(G(x, c), c'); during reconstruction the de-stylized picture from the de-stylized network model is used as the target font picture for supervision; the font classification loss L_f of the generator and the reconstruction loss L_rec are then computed, and the generator G is trained with the objective of minimizing L_f and L_rec. On the other hand, the fake picture G(x, c) is input into the discriminator D to judge whether the picture is real and which font domain it belongs to; the font classification loss L_r of the discriminator is computed, and the discriminator D is trained with the objective of minimizing L_r.
The loss functions to be optimized by the font migration network are:
L_2 = L_D + L_G
L_D = -L_adv + λ_1·L_r
L_G = L_adv + λ_1·L_f + λ_2·L_rec
L_r = E_{x,c'}[-log D(c'|x)]
L_f = E_{x,c}[-log D(c|G(x, c))]
L_rec = E_{x,c}[||x - G(G(x, c), c')||_1]
where L_D is the discriminator loss; L_G is the generator loss; L_adv, L_r, L_f and L_rec are respectively the adversarial loss, the font classification loss of the discriminator, the font classification loss of the generator and the reconstruction loss; λ_1, λ_2 and λ_gp are respectively the hyper-parameters of the font classification loss, of the reconstruction loss and of the penalty term of the adversarial loss; x̂ is sampled uniformly along the straight line between a real picture sample and the fake picture G(x, c); and D(c'|x) is the probability distribution with which the discriminator D classifies a real picture sample into its original font domain c'.
In step 3.1, the fake picture G(x, c) is generated as follows: the picture x and the target font label c are feature-mapped and fused, and the result is fed into a deep convolutional network for training.
In the fourth step, the texture migration network comprises an encoder f, a decoder g, and an AdaIN adaptive normalization layer between the encoder f and the decoder g, where the encoder f and the decoder g are built on the VGG-19 network structure: the encoder f takes the first L layers of a pre-trained VGG-19 network, and the decoder g is a mirror of the encoder f with all pooling layers replaced by upsampling layers;
the training texture migration network by utilizing the target reference font picture and the training data set A generated in the step three sequentially comprises the following steps:
4.1, firstly, mapping a font picture c and a texture style picture s to a feature space by adopting an encoder f to obtain f (c) and f(s), and then carrying out feature transformation on the font picture c and the texture style picture s by an AdaIN self-adaptive normalization layer to obtain a feature map t=AdaIN (f (c), f (s));
4.2, mapping the feature map t back to the original feature space by adopting a decoder g to obtain a stylized result map g (t);
and 4.3, inputting the stylized result graph g (t) and the texture style graph s into the encoder f, and realizing training of a texture migration network through optimization of a loss function.
In step 4.3, the loss function is:
L_3 = L_c + λ·L_s
L_c = ||f(g(t)) - t||_2
L_s = Σ_{i=1}^{L} (||μ(φ_i(g(t))) - μ(φ_i(s))||_2 + ||σ(φ_i(g(t))) - σ(φ_i(s))||_2)
where L_c is the content loss, L_s is the style loss, λ is the hyper-parameter of the style loss, φ_i is the i-th layer of the encoder f, and σ and μ are respectively the variance and the mean of each image channel.
In step 4.1, the feature transformation formula of the AdaIN adaptive normalization layer is:
AdaIN(f(c), f(s)) = σ(f(s))·((f(c) - μ(f(c)))/σ(f(c))) + μ(f(s))
where σ and μ are respectively the variance and the mean of each image channel.
Compared with the prior art, the invention has the following beneficial effects:
In the text style migration method based on a dual-stage deep network, training data sets A and B are first constructed; a de-stylized network is then trained on data set A to obtain a de-stylized network model for de-stylizing textured stylized text images; a font migration network is next trained with the de-stylized pictures produced by that model and data set B to obtain a font migration network model for converting and migrating multiple fonts, which migrates a text picture to be converted into a target reference-font picture; finally, a texture migration network is trained with the target reference-font picture and data set A to obtain a texture migration network model for stylized texture rendering of the font picture, which produces the final result of text style migration. By performing font migration in the first stage and texture migration in the second stage, a better text style migration effect is obtained.
Drawings
Fig. 1 is an overall flow chart of the present invention.
Fig. 2 is a schematic diagram of a training data set a according to the present invention.
Fig. 3 is a schematic diagram of a training data set B according to the present invention.
Fig. 4 is a schematic diagram of the structure of the de-stylized network according to the present invention.
Fig. 5 is a schematic diagram of a font migration network according to the present invention.
FIG. 6 is a schematic diagram of a texture migration network according to the present invention.
Detailed Description
The invention is further described below with reference to the detailed description and the accompanying drawings.
Referring to figs. 1-6, a text style migration method based on a dual-stage deep network comprises the following steps in sequence:
Step 1: construct a training data set A and a training data set B, where data set A comprises stylized text pictures with various textures and the corresponding de-stylized text pictures, and data set B comprises text pictures in a reference font and in various other fonts;
Step 2: construct a de-stylized network and train it with training data set A to obtain a de-stylized network model for de-stylizing textured stylized text images;
Step 3: first construct a font migration network, train it with the de-stylized pictures obtained from the de-stylized network model and training data set B to obtain a font migration network model for converting and migrating multiple fonts, and then use the model to migrate a font picture to be converted into a target reference-font picture;
Step 4: first construct a texture migration network, then train it with the target reference-font picture generated in Step 3 and training data set A to obtain a texture migration network model for stylized texture rendering of the font picture, and finally use the model to obtain the final result of text style migration.
In the second step, the de-stylized network includes an encoder E_X, an encoder E_Y and a decoder G_X, and training the de-stylized network with training data set A comprises the following steps in sequence:
2.1. The de-stylized network randomly selects an image pair (x, y) from training data set A and inputs the images to the encoders E_X and E_Y respectively, where y is a textured stylized text picture and x is the de-stylized text picture corresponding to y;
2.2. The encoders E_X and E_Y map x and y into a shared feature space and encode them into respective feature maps; the feature loss L_feat is computed from the feature maps, and the encoders E_X and E_Y are trained with the objective of minimizing L_feat;
2.3. The feature map is passed to the decoder G_X to generate a reconstructed de-stylized text picture; the pixel loss L_pix is then computed from the reconstructed de-stylized text picture and the picture x, and the decoder G_X is trained with the objective of minimizing L_pix so that the reconstructed de-stylized text picture is sufficiently close to the picture x.
The de-stylized network further comprises a discriminator D_X, and training the de-stylized network with training data set A further comprises:
2.4. The reconstructed de-stylized text picture is input to the discriminator D_X to judge its authenticity; the adversarial loss L_adv is computed and optimized with the Adam optimizer.
The total loss function to be optimized by the de-stylized network is:
L_1 = λ_feat·L_feat + λ_pix·L_pix + λ_adv·L_adv
L_feat = E_{x,y}[||S_X(E_Y(y)) - z||_1]
z = S_X(G_X(x))
L_pix = E_{x,y}[||G_X(E_Y(y)) - x||_1]
where L_feat, L_pix and L_adv are respectively the feature loss, the pixel loss and the adversarial loss; λ_feat, λ_pix and λ_adv are respectively their hyper-parameters; S_X denotes the front layers of G_X; z is the content feature; λ_gp is the penalty coefficient; and x̂ is sampled uniformly along straight lines between the image x and the reconstructed de-stylized text picture G_X(E_Y(y)).
In the third step, the font migration network comprises a generator G and a discriminator D, and training the font migration network with the de-stylized pictures obtained from the de-stylized network model and training data set B comprises the following steps in sequence:
3.1. The font migration network randomly selects a picture x from training data set B and inputs it into the generator G, and the generator G generates a fake picture G(x, c) from the picture x and the target font label c;
3.2. On the one hand, the fake picture G(x, c) is input into the generator G again to generate a reconstructed picture G(G(x, c), c'); during reconstruction the de-stylized picture from the de-stylized network model is used as the target font picture for supervision; the font classification loss L_f of the generator and the reconstruction loss L_rec are then computed, and the generator G is trained with the objective of minimizing L_f and L_rec. On the other hand, the fake picture G(x, c) is input into the discriminator D to judge whether the picture is real and which font domain it belongs to; the font classification loss L_r of the discriminator is computed, and the discriminator D is trained with the objective of minimizing L_r.
The loss functions to be optimized by the font migration network are:
L_2 = L_D + L_G
L_D = -L_adv + λ_1·L_r
L_G = L_adv + λ_1·L_f + λ_2·L_rec
L_r = E_{x,c'}[-log D(c'|x)]
L_f = E_{x,c}[-log D(c|G(x, c))]
L_rec = E_{x,c}[||x - G(G(x, c), c')||_1]
where L_D is the discriminator loss; L_G is the generator loss; L_adv, L_r, L_f and L_rec are respectively the adversarial loss, the font classification loss of the discriminator, the font classification loss of the generator and the reconstruction loss; λ_1, λ_2 and λ_gp are respectively the hyper-parameters of the font classification loss, of the reconstruction loss and of the penalty term of the adversarial loss; x̂ is sampled uniformly along the straight line between a real picture sample and the fake picture G(x, c); and D(c'|x) is the probability distribution with which the discriminator D classifies a real picture sample into its original font domain c'.
In step 3.1, the fake picture G(x, c) is generated as follows: the picture x and the target font label c are feature-mapped and fused, and the result is fed into a deep convolutional network for training.
In the fourth step, the texture migration network comprises an encoder f, a decoder g, and an AdaIN adaptive normalization layer between the encoder f and the decoder g, where the encoder f and the decoder g are built on the VGG-19 network structure: the encoder f takes the first L layers of a pre-trained VGG-19 network, and the decoder g is a mirror of the encoder f with all pooling layers replaced by upsampling layers;
the training texture migration network by utilizing the target reference font picture and the training data set A generated in the step three sequentially comprises the following steps:
4.1, firstly, mapping a font picture c and a texture style picture s to a feature space by adopting an encoder f to obtain f (c) and f(s), and then carrying out feature transformation on the font picture c and the texture style picture s by an AdaIN self-adaptive normalization layer to obtain a feature map t=AdaIN (f (c), f (s));
4.2, mapping the feature map t back to the original feature space by adopting a decoder g to obtain a stylized result map g (t);
and 4.3, inputting the stylized result graph g (t) and the texture style graph s into the encoder f, and realizing training of a texture migration network through optimization of a loss function.
In step 4.3, the loss function is:
L_3 = L_c + λ·L_s
L_c = ||f(g(t)) - t||_2
L_s = Σ_{i=1}^{L} (||μ(φ_i(g(t))) - μ(φ_i(s))||_2 + ||σ(φ_i(g(t))) - σ(φ_i(s))||_2)
where L_c is the content loss, L_s is the style loss, λ is the hyper-parameter of the style loss, φ_i is the i-th layer of the encoder f, and σ and μ are respectively the variance and the mean of each image channel.
In step 4.1, the feature transformation formula of the AdaIN adaptive normalization layer is:
AdaIN(f(c), f(s)) = σ(f(s))·((f(c) - μ(f(c)))/σ(f(c))) + μ(f(s))
where σ and μ are respectively the variance and the mean of each image channel.
The principle of the invention is explained as follows:
the invention provides a character style migration method based on a dual-stage depth network, which is based on a de-stylized network formed by two encoders, a decoder and a discriminator, and realizes de-stylized processing of textured characters by optimizing characteristic loss, pixel loss and contrast loss; based on a font migration network of a generator and a discriminator, realizing first-stage migration of the character fonts by optimizing the countermeasures and the font classification losses; based on a texture migration network of an encoder and a decoder and an AdaIN self-adaptive normalization layer, the content loss and the style loss are optimized by carrying out feature transformation through mean and variance, and the second stage migration of the text texture is realized. The style migration text image obtained by the invention has higher artistic effect, has wide application in the field of visual design, can be used for a plurality of aspects such as artistic image design, cultural commercial image propaganda, painting text processing and the like, is not only suitable for digital and letter images, but also has better expression in the aspect of Chinese characters.
Discriminator D_X: to make the de-stylized reconstruction result more accurate, the invention adds a discriminator D_X to the de-stylized network for judging the authenticity of the reconstructed picture.
Font classification loss of the discriminator: L_r = E_{x,c'}[-log D(c'|x)]. We expect the input font image x to be converted into the output font image y and correctly classified into the target font domain c; D(c'|x) denotes the probability distribution with which the discriminator classifies a real sample into its original font domain c', and the objective of the discriminator D is to minimize this part of the loss as well.
Font classification loss of the generator: L_f = E_{x,c}[-log D(c|G(x, c))]. This loss function is used to optimize the generator G so that the pictures generated by the generator G can be classified into the target font domain c by the discriminator D.
Reconstruction loss: L_rec = E_{x,c}[||x - G(G(x, c), c')||_1]. To ensure that the generator G changes only the font-related information of the input picture without altering its text content (rather than fooling the discriminator D by other means), the generated G(x, c) is input into the generator G again to obtain the picture G(G(x, c), c'), which should remain as consistent as possible with the picture x; the constraint is therefore imposed with the 1-norm.
For the adversarial loss L_adv, the WGAN method is adopted to alleviate the mode-collapse problem, namely:
L_adv = E_x[D(x)] - E_{x,c}[D(G(x, c))] - λ_gp·E_x̂[(||∇_x̂ D(x̂)||_2 - 1)^2]
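As an illustration of the gradient-penalty term, a minimal PyTorch sketch follows; the function name and arguments are assumptions for illustration, not part of the patent:

```python
import torch

def gradient_penalty(discriminator, real, fake, lambda_gp=10.0):
    """WGAN-GP term: sample x_hat uniformly on the straight line between
    real and fake samples and push the gradient norm of D toward 1."""
    alpha = torch.rand(real.size(0), 1, 1, 1, device=real.device)
    x_hat = (alpha * real + (1.0 - alpha) * fake).requires_grad_(True)
    d_out = discriminator(x_hat)
    grads = torch.autograd.grad(outputs=d_out, inputs=x_hat,
                                grad_outputs=torch.ones_like(d_out),
                                create_graph=True)[0]
    grads = grads.reshape(grads.size(0), -1)
    return lambda_gp * ((grads.norm(2, dim=1) - 1.0) ** 2).mean()
```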
texture migration networks define two losses: content loss L c And style loss L s . The content loss is represented by the Euclidean distance between the network output image and the AdaIN layer output feature image, so that the final output content of the model is sufficiently close to the AdaIN layer output feature image t, and the convergence speed is increased; and (3) carrying out encoder encoding on the generated image again after the style loss, obtaining the mean value and variance of the characteristic images of each layer of the VGG network, and carrying out Euclidean distance summation on the mean value and variance of the corresponding layer of the real style image.
Example 1:
referring to fig. 1, a text style migration method based on a dual-stage deep network sequentially comprises the following steps:
1. reference is made to: constructing a training data set A and a training data set B, wherein the training data set A comprises stylized literal pictures with various textures and corresponding stylized literal pictures, and the training data set B comprises reference fonts and other stylized literal pictures with various fonts (see figures 2 and 3);
2. Construct a de-stylized network comprising an encoder E_X, an encoder E_Y, a decoder G_X and a discriminator D_X, where the encoders E_X and E_Y share weights in their last several layers; the network structure adopted by the de-stylized network is shown in Table 1:
Table 1: De-stylized network structure
The total loss function to be optimized by the de-stylized network is:
L_1 = λ_feat·L_feat + λ_pix·L_pix + λ_adv·L_adv
where L_feat, L_pix and L_adv are respectively the feature loss, the pixel loss and the adversarial loss, and λ_feat, λ_pix and λ_adv are their respective hyper-parameters;
3. Referring to fig. 4, train the de-stylized network with training data set A, specifically:
(1) The de-stylized network randomly selects an image pair (x, y) from training data set A and inputs the images to the encoders E_X and E_Y respectively, where y is a textured stylized text picture and x is the de-stylized text picture corresponding to y;
(2) The encoders E_X and E_Y map x and y into a shared feature space and encode them into respective feature maps; the feature loss L_feat is computed from the feature maps, and the encoders E_X and E_Y are trained with the objective of minimizing L_feat;
The task of the encoders is to bring the result closer to the ground truth of the content feature. Let S_X denote the front layers of G_X used for guidance; the guiding content feature is then defined as z = S_X(G_X(x)). L_feat guides E_Y to remove the texture feature elements from the text image while preserving the core font information, and is defined as
L_feat = E_{x,y}[||S_X(E_Y(y)) - z||_1];
(3) The feature map is passed to the decoder G_X to generate a reconstructed de-stylized text picture; the pixel loss L_pix is then computed from the reconstructed de-stylized text picture and the picture x, and the decoder G_X is trained with the objective of minimizing L_pix so that the reconstructed de-stylized text picture is sufficiently close to the picture x;
The de-stylized network needs the resulting reconstruction to approximate the picture x, so the pixel loss is constrained with the 1-norm and defined as:
L_pix = E_{x,y}[||G_X(E_Y(y)) - x||_1]
(4) The reconstructed de-stylized text picture is input to the discriminator D_X to judge its authenticity, and the adversarial loss L_adv is computed (in this network the adversarial loss is defined so as to guide G_X and E_Y to confuse D_X):
L_adv = E_x[D_X(x)] - E_{x,y}[D_X(G_X(E_Y(y)))] - λ_gp·E_x̂[(||∇_x̂ D_X(x̂)||_2 - 1)^2]
where λ_gp is the penalty coefficient and x̂ is sampled uniformly along straight lines between the image x and the reconstructed de-stylized text picture G_X(E_Y(y));
The losses are optimized with the Adam optimizer, with the learning rate set to 0.0002 and λ_feat = λ_pix = 100, λ_gp = 10, λ_adv = 1, finally yielding an encoder E_Y and a decoder G_X usable for de-stylization, such that the generated image G_X(E_Y(y)) is sufficiently close to the de-stylized picture x.
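A minimal PyTorch sketch of one such optimization step under the losses above (the adversarial term is omitted for brevity; it would follow the WGAN form given earlier). The module names E_X, E_Y, G_X and S_X mirror the patent's notation, but their internals and the step function itself are assumptions:

```python
import torch
import torch.nn.functional as F

def destylize_step(E_X, E_Y, G_X, S_X, optimizer, x, y,
                   lam_feat=100.0, lam_pix=100.0):
    """One training step: x is a de-stylized text picture, y its textured version."""
    z = S_X(G_X(x)).detach()                # guiding content feature z = S_X(G_X(x))
    loss_feat = F.l1_loss(S_X(E_Y(y)), z)   # L_feat, 1-norm in feature space
    loss_pix = F.l1_loss(G_X(E_Y(y)), x)    # L_pix, 1-norm in pixel space
    loss = lam_feat * loss_feat + lam_pix * loss_pix
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

# usage sketch: optimizer = torch.optim.Adam(trainable_params, lr=2e-4)
```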
4. Construct a font migration network comprising a generator G and a discriminator D. The generator G comprises 2 convolutional layers, 6 residual layers and 2 transposed-convolution layers, all with normalization. The overall flow of the generator G is: downsample the input by a factor of 4, pass it through 6 residual blocks at constant dimension, upsample it by a factor of 4 with transposed convolutions, and finally produce the output through a dimension-preserving convolution followed by tanh. The network structure adopted by the generator G is shown in Table 2:
Table 2: Structure of the generator G
The convolution kernel size of each convolutional layer is 4×4 with stride 2, so each convolution operation halves the spatial dimension. The normalization layers use instance normalization (IN), computing the mean over the H×W extent of each channel of a single image; because the generation result depends mainly on a single image instance, batch normalization (BN) over the whole batch is not suitable for image stylization, while normalizing over H×W accelerates model convergence and keeps the image instances independent of one another. The activation function is LeakyReLU: since the function keeps a small gradient for negative inputs, its derivative is never zero, which reduces the occurrence of silent neurons, allows gradient-based learning, and avoids the problem of ReLU neurons no longer learning once they enter the negative interval.
Furthermore, to help avoid the over-fitting problem, the stacked layers are not used to fit the desired feature mapping directly; instead, they explicitly fit a residual mapping. If the desired feature mapping is H(x), the stacked nonlinear layers fit the mapping F(x) = H(x) - x, on the assumption that the residual mapping is easier to optimize than the original, i.e. that fitting F(x) = H(x) - x is easier than fitting H(x) directly. In the extreme case where the desired mapping is an identity mapping, the residual network only has to fit F(x) = 0 while a plain network has to fit H(x) = x, and the former is clearly easier to optimize.
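A minimal PyTorch sketch of such a residual block, assuming 3×3 convolutions and instance normalization; the layer widths are illustrative, not taken from Table 2:

```python
import torch.nn as nn

class ResidualBlock(nn.Module):
    """Stacked layers fit the residual F(x) = H(x) - x; the skip connection
    adds x back, so an identity mapping only requires fitting F(x) = 0."""
    def __init__(self, channels):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(channels, channels, 3, stride=1, padding=1, bias=False),
            nn.InstanceNorm2d(channels, affine=True),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, 3, stride=1, padding=1, bias=False),
            nn.InstanceNorm2d(channels, affine=True),
        )

    def forward(self, x):
        return x + self.body(x)
```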
The discriminator D adopts a PatchGAN structure to classify local image patches as real or fake, and uses no normalization layers. The output of Conv1 represents the predicted probability of the target font and the output of Conv2 represents whether the picture is real; the two heads are parallel;
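A minimal PyTorch sketch of such a two-headed PatchGAN discriminator; the depth, channel widths and input size are assumptions for illustration:

```python
import torch.nn as nn

class PatchDiscriminator(nn.Module):
    """Conv trunk without normalization; conv2 outputs a per-patch real/fake
    map, conv1 outputs font-class logits. The two heads are parallel."""
    def __init__(self, num_fonts, in_ch=3, base=64, img_size=256):
        super().__init__()
        layers, ch = [], in_ch
        for i in range(6):                      # six 4x4 stride-2 convolutions
            out_ch = base * 2 ** min(i, 4)
            layers += [nn.Conv2d(ch, out_ch, 4, stride=2, padding=1),
                       nn.LeakyReLU(0.01)]
            ch = out_ch
        self.trunk = nn.Sequential(*layers)
        k = img_size // 2 ** 6                  # spatial size left after the trunk
        self.conv2 = nn.Conv2d(ch, 1, 3, padding=1)            # real/fake patch map
        self.conv1 = nn.Conv2d(ch, num_fonts, k, bias=False)   # target-font logits

    def forward(self, x):
        h = self.trunk(x)
        return self.conv2(h), self.conv1(h).flatten(1)
```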
the loss function to be optimized adopted by the font migration network is as follows:
L 2 =L D +L G
L D =-L adv1 L r
L G =L adv1 L f2 L rec
in the above, L D For discriminator loss, L G Generator loss, L adv 、L r 、L f 、L rec The counterloss, the font classification loss of the discriminator, the font classification loss of the generator, the reconstruction loss, lambda respectively 1 、λ 2 The super parameters are the super parameters of font classification loss and reconstruction loss respectively;
5. Referring to fig. 5, train the font migration network with the de-stylized pictures obtained from the de-stylized network model and training data set B to obtain a font migration network model that converts and migrates multiple fonts, specifically:
(1) The font migration network randomly selects a picture x from training data set B and inputs it into the generator G; the generator G feature-maps and fuses the picture x with the target font label c, passes the result into a deep convolutional network for training, and generates the fake picture G(x, c);
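A minimal PyTorch sketch of this fusion, assuming StarGAN-style conditioning in which the label is broadcast spatially and concatenated with the image channels; the patent does not spell out the exact fusion, so this is one plausible reading:

```python
import torch

def fuse_label(x, c, num_fonts):
    """x: images (B, C, H, W); c: target font labels (B,) as integer indices.
    Broadcast a one-hot label map over the spatial dims and concatenate it
    channel-wise; the result is fed into the generator's conv trunk."""
    onehot = torch.zeros(x.size(0), num_fonts, device=x.device)
    onehot.scatter_(1, c.unsqueeze(1), 1.0)
    label_map = onehot[:, :, None, None].expand(-1, -1, x.size(2), x.size(3))
    return torch.cat([x, label_map], dim=1)
```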
(2) On the one hand, the fake picture G(x, c) is input into the generator G again to generate the reconstructed picture G(G(x, c), c'); during reconstruction the de-stylized picture from the de-stylized network model is used as the target font picture for supervision, which ensures that the picture content is preserved during conversion and that only the domain-specific part is changed; the font classification loss L_f of the generator and the reconstruction loss L_rec are computed, and the generator G is trained with the objective of minimizing L_f and L_rec. On the other hand, the fake picture G(x, c) is input into the discriminator D to judge whether the picture is real and which font domain it belongs to; the font classification loss L_r of the discriminator is computed, and the discriminator D is trained with the objective of minimizing L_r:
L_r = E_{x,c'}[-log D(c'|x)]
L_f = E_{x,c}[-log D(c|G(x, c))]
L_rec = E_{x,c}[||x - G(G(x, c), c')||_1]
where λ_gp is the hyper-parameter of the penalty term of the adversarial loss, x̂ is sampled uniformly along the straight line between a real picture sample and the fake picture G(x, c), and D(c'|x) is the probability distribution with which the discriminator D classifies a real picture sample into its original font domain c';
the training of the model adopts the parameter beta 1 =0.5,β 2 Adam optimizer of =0.999, to increase data, flip the image at a probability level of 0.5, perform 1 generator update after 5 arbiter updates, set the batch size for all trials to 16, train all models in the first 10 epochs with a learning rate of 0.0001, and linearly attenuate the learning rate to 0 in the next 10 epochs;
6. Use the obtained font migration network model to migrate a font picture to be converted into the target reference-font picture;
7. Construct a texture migration network comprising an encoder f, a decoder g and an AdaIN adaptive normalization layer between the encoder f and the decoder g. The encoder f and the decoder g are built on the VGG-19 network structure: the encoder f takes the relu1_1 to relu4_1 part of a pre-trained VGG-19 network, and the decoder g is a mirror of the encoder f with all pooling layers replaced by upsampling layers. The specific structure of the texture migration network is shown in Table 3:
Table 3: Texture migration network structure
The convolution kernels of the network's convolutional layers are all 3×3 with stride 1, the window size of the MaxPool max-pooling layers is 2×2, and the upsampling layers use the nearest-neighbour interpolation algorithm;
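A minimal PyTorch sketch of such an encoder/decoder pair, assuming torchvision's VGG-19 (where relu4_1 is feature index 20) and a decoder whose widths mirror VGG-19 in reverse; the exact layer counts are assumptions where Table 3 is not reproduced here:

```python
import torch.nn as nn
from torchvision import models

def build_encoder():
    """Encoder f: pre-trained VGG-19 layers up to relu4_1, frozen."""
    vgg = models.vgg19(weights=models.VGG19_Weights.IMAGENET1K_V1)
    enc = vgg.features[:21]                 # conv1_1 ... relu4_1
    for p in enc.parameters():
        p.requires_grad_(False)
    return enc

def build_decoder():
    """Decoder g: mirror of the encoder, pooling replaced by nearest upsampling."""
    def block(cin, cout):
        return [nn.Conv2d(cin, cout, 3, stride=1, padding=1), nn.ReLU(inplace=True)]
    def up():
        return nn.Upsample(scale_factor=2, mode='nearest')
    return nn.Sequential(
        *block(512, 256), up(),
        *block(256, 256), *block(256, 256), *block(256, 128), up(),
        *block(128, 128), *block(128, 64), up(),
        *block(64, 64), nn.Conv2d(64, 3, 3, stride=1, padding=1),
    )
```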
8. Referring to fig. 6, train the texture migration network with the target reference-font picture and training data set A to obtain a texture migration network model that performs stylized texture rendering of the font picture, specifically:
(1) The encoder f first maps a font picture c and a texture style picture s into the feature space to obtain f(c) and f(s), which are then feature-transformed by the AdaIN adaptive normalization layer to obtain the feature map t = AdaIN(f(c), f(s)):
AdaIN(f(c), f(s)) = σ(f(s))·((f(c) - μ(f(c)))/σ(f(c))) + μ(f(s))
where σ and μ are respectively the variance and the mean of each image channel;
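A minimal PyTorch sketch of this transformation, taking σ as the channel-wise standard deviation with a small epsilon added for numerical stability:

```python
import torch

def adain(content_feat, style_feat, eps=1e-5):
    """t = sigma(f(s)) * (f(c) - mu(f(c))) / sigma(f(c)) + mu(f(s)),
    with mu/sigma computed per channel over the spatial dimensions."""
    c_mu = content_feat.mean(dim=(2, 3), keepdim=True)
    c_sigma = content_feat.std(dim=(2, 3), keepdim=True) + eps
    s_mu = style_feat.mean(dim=(2, 3), keepdim=True)
    s_sigma = style_feat.std(dim=(2, 3), keepdim=True)
    return s_sigma * (content_feat - c_mu) / c_sigma + s_mu
```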
(2) The decoder g maps the feature map t back to the original feature space to obtain the stylized result map g(t);
(3) The stylized result map g(t) and the texture style map s are input into the encoder f, and the texture migration network is trained by optimizing the following loss function:
L_3 = L_c + λ·L_s
L_c = ||f(g(t)) - t||_2
L_s = Σ_{i=1}^{L} (||μ(φ_i(g(t))) - μ(φ_i(s))||_2 + ||σ(φ_i(g(t))) - σ(φ_i(s))||_2)
where L_c is the content loss, L_s is the style loss, λ is the hyper-parameter of the style loss, φ_i is the i-th layer of the encoder f, and σ and μ are respectively the variance and the mean of each image channel;
optimization loss selects Adam optimizer and sets the batch size to 8.

Claims (3)

1. A text style migration method based on a dual-stage deep network, characterized in that:
the method comprises the following steps in sequence:
Step 1: construct a training data set A and a training data set B, where data set A comprises stylized text pictures with various textures and the corresponding de-stylized text pictures, and data set B comprises text pictures in a reference font and in various other fonts;
Step 2: construct a de-stylized network and train it with training data set A to obtain a de-stylized network model for de-stylizing textured stylized text images, where the de-stylized network comprises an encoder E_X, an encoder E_Y, a decoder G_X and a discriminator D_X, and training the de-stylized network with training data set A comprises the following steps in sequence:
2.1. the de-stylized network randomly selects an image pair (x, y) from training data set A and inputs the images to the encoders E_X and E_Y respectively, where y is a textured stylized text picture and x is the de-stylized text picture corresponding to y;
2.2. the encoders E_X and E_Y map x and y into a shared feature space and encode them into respective feature maps; the feature loss L_feat is computed from the feature maps, and the encoders E_X and E_Y are trained with the objective of minimizing L_feat;
2.3. the feature map is passed to the decoder G_X to generate a reconstructed de-stylized text picture; the pixel loss L_pix is then computed from the reconstructed de-stylized text picture and the picture x, and the decoder G_X is trained with the objective of minimizing L_pix;
2.4. the reconstructed de-stylized text picture is input to the discriminator D_X to judge its authenticity, the adversarial loss L_adv is computed, and the model is optimized with the Adam optimizer;
the total loss function to be optimized by the de-stylized network is:
L_1 = λ_feat·L_feat + λ_pix·L_pix + λ_adv·L_adv
L_feat = E_{x,y}[||S_X(E_Y(y)) - z||_1]
z = S_X(G_X(x))
L_pix = E_{x,y}[||G_X(E_Y(y)) - x||_1]
where L_feat, L_pix and L_adv are respectively the feature loss, the pixel loss and the adversarial loss; λ_feat, λ_pix and λ_adv are respectively their hyper-parameters; S_X denotes the front layers of G_X; z is the content feature; λ_gp is the penalty coefficient; and x̂ is sampled uniformly along straight lines between the image x and the reconstructed de-stylized text picture G_X(E_Y(y));
Step 3: construct a font migration network, train it with the de-stylized pictures obtained from the de-stylized network model and training data set B to obtain a font migration network model for converting and migrating multiple fonts, and use the model to migrate a font picture to be converted into a target reference-font picture, where the font migration network comprises a generator G and a discriminator D, and training the font migration network with the de-stylized pictures obtained from the de-stylized network model and training data set B comprises the following steps in sequence:
3.1. the font migration network randomly selects a picture x from training data set B and inputs it into the generator G, and the generator G generates a fake picture G(x, c) from the picture x and the target font label c;
3.2. on the one hand, the fake picture G(x, c) is input into the generator G again to generate a reconstructed picture G(G(x, c), c'), with the de-stylized picture from the de-stylized network model used as the target font picture for supervision during reconstruction; the font classification loss L_f of the generator and the reconstruction loss L_rec are computed, and the generator G is trained with the objective of minimizing L_f and L_rec; on the other hand, the fake picture G(x, c) is input into the discriminator D to judge whether the picture is real and which font domain it belongs to, the font classification loss L_r of the discriminator is computed, and the discriminator D is trained with the objective of minimizing L_r;
the loss functions to be optimized by the font migration network are:
L_2 = L_D + L_G
L_D = -L_adv + λ_1·L_r
L_G = L_adv + λ_1·L_f + λ_2·L_rec
L_r = E_{x,c'}[-log D(c'|x)]
L_f = E_{x,c}[-log D(c|G(x, c))]
L_rec = E_{x,c}[||x - G(G(x, c), c')||_1]
where L_D is the discriminator loss; L_G is the generator loss; L_adv, L_r, L_f and L_rec are respectively the adversarial loss, the font classification loss of the discriminator, the font classification loss of the generator and the reconstruction loss; λ_1, λ_2 and λ_gp are respectively the hyper-parameters of the font classification loss, of the reconstruction loss and of the penalty term of the adversarial loss; x̂ is sampled uniformly along the straight line between a real picture sample and the fake picture G(x, c); and D(c'|x) is the probability distribution with which the discriminator D classifies a real picture sample into its original font domain c';
Step 4: construct a texture migration network, train it with the target reference-font picture generated in Step 3 and training data set A to obtain a texture migration network model for stylized texture rendering of the font picture, and finally use the model to obtain the final result of text style migration, where the texture migration network comprises an encoder f, a decoder g and an AdaIN adaptive normalization layer between the encoder f and the decoder g, the encoder f and the decoder g are built on the VGG-19 network structure, the encoder f takes the first L layers of a pre-trained VGG-19 network, and the decoder g is a mirror of the encoder f with all pooling layers replaced by upsampling layers;
training the texture migration network with the target reference-font picture generated in Step 3 and training data set A comprises the following steps in sequence:
4.1. the encoder f first maps a font picture c and a texture style picture s into the feature space to obtain f(c) and f(s), which are then feature-transformed by the AdaIN adaptive normalization layer to obtain the feature map t = AdaIN(f(c), f(s));
4.2. the decoder g maps the feature map t back to the original feature space to obtain the stylized result map g(t);
4.3. the stylized result map g(t) and the texture style map s are input into the encoder f, and the texture migration network is trained by optimizing the following loss functions:
L_3 = L_c + λ·L_s
L_c = ||f(g(t)) - t||_2
L_s = Σ_{i=1}^{L} (||μ(φ_i(g(t))) - μ(φ_i(s))||_2 + ||σ(φ_i(g(t))) - σ(φ_i(s))||_2)
where L_c is the content loss, L_s is the style loss, λ is the hyper-parameter of the style loss, φ_i is the i-th layer of the encoder f, and σ and μ are respectively the variance and the mean of each image channel.
2. The text style migration method based on a dual-stage deep network according to claim 1, characterized in that:
in step 3.1, the method for generating the dummy pictures G (x, c) includes: the image x and the target font label c are subjected to feature mapping and fusion, and then are transmitted into a deep convolutional network for training.
3. The text style migration method based on a dual-stage deep network according to claim 1, characterized in that:
in step 4.1, the characteristic transformation formula of the AdaIN adaptive normalization layer is as follows:
in the above equation, σ and μ are the variance and the mean of each image channel, respectively.
CN202011210655.3A 2020-11-03 2020-11-03 Text style migration method based on dual-stage depth network Active CN112307714B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011210655.3A CN112307714B (en) 2020-11-03 2020-11-03 Text style migration method based on dual-stage depth network

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011210655.3A CN112307714B (en) 2020-11-03 2020-11-03 Text style migration method based on dual-stage depth network

Publications (2)

Publication Number Publication Date
CN112307714A CN112307714A (en) 2021-02-02
CN112307714B true CN112307714B (en) 2024-03-08

Family

ID=74332675

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011210655.3A Active CN112307714B (en) 2020-11-03 2020-11-03 Text style migration method based on dual-stage depth network

Country Status (1)

Country Link
CN (1) CN112307714B (en)

Families Citing this family (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112966470A (en) * 2021-02-23 2021-06-15 北京三快在线科技有限公司 Character generation method and device, storage medium and electronic equipment
CN113421318B (en) * 2021-06-30 2022-10-28 合肥高维数据技术有限公司 Font style migration method and system based on multitask generation countermeasure network
CN113553932B (en) * 2021-07-14 2022-05-13 同济大学 Calligraphy character erosion repairing method based on style migration
CN113554549B (en) * 2021-07-27 2024-03-29 深圳思谋信息科技有限公司 Text image generation method, device, computer equipment and storage medium
CN113807430B (en) * 2021-09-15 2023-08-08 网易(杭州)网络有限公司 Model training method, device, computer equipment and storage medium
CN114240735B (en) * 2021-11-17 2024-03-19 西安电子科技大学 Arbitrary style migration method, system, storage medium, computer equipment and terminal
CN115310405A (en) * 2022-07-21 2022-11-08 北京汉仪创新科技股份有限公司 Font replacement method, system, device and medium based on countermeasure generation network

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2019025909A1 (en) * 2017-08-01 2019-02-07 3M Innovative Properties Company Neural style transfer for image varietization and recognition
CN110443864A * 2019-07-24 2019-11-12 北京大学 Automatic artistic font generation method based on single-stage few-shot learning
CN110503598A * 2019-07-30 2019-11-26 西安理工大学 Font style transfer method based on conditional cycle-consistent generative adversarial networks
IT201900002557A1 (en) * 2019-02-22 2020-08-22 Univ Bologna Alma Mater Studiorum IMAGE-BASED CODING METHOD AND SYSTEM

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2019025909A1 (en) * 2017-08-01 2019-02-07 3M Innovative Properties Company Neural style transfer for image varietization and recognition
IT201900002557A1 (en) * 2019-02-22 2020-08-22 Univ Bologna Alma Mater Studiorum IMAGE-BASED CODING METHOD AND SYSTEM
CN110443864A * 2019-07-24 2019-11-12 北京大学 Automatic artistic font generation method based on single-stage few-shot learning
CN110503598A * 2019-07-30 2019-11-26 西安理工大学 Font style transfer method based on conditional cycle-consistent generative adversarial networks

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Stylized calligraphy image generation based on generative adversarial networks; 王晓红, 卢辉, 麻祥才; 包装工程 (Packaging Engineering); 2020-06-10 (No. 11); full text *

Also Published As

Publication number Publication date
CN112307714A (en) 2021-02-02

Similar Documents

Publication Publication Date Title
CN112307714B (en) Text style migration method based on dual-stage depth network
CN108510456B (en) Sketch simplification method of deep convolutional neural network based on perception loss
Ye et al. Deep learning hierarchical representations for image steganalysis
Gai et al. New image denoising algorithm via improved deep convolutional neural network with perceptive loss
CN110110323B (en) Text emotion classification method and device and computer readable storage medium
CN111091045A (en) Sign language identification method based on space-time attention mechanism
CN111079532A (en) Video content description method based on text self-encoder
CN109934158B (en) Video emotion recognition method based on local enhanced motion history map and recursive convolutional neural network
CN111986075B (en) Style migration method for target edge clarification
Zsolnai-Fehér et al. Gaussian material synthesis
CN113077388B (en) Data-augmented deep semi-supervised over-limit learning image classification method and system
CN110321805B (en) Dynamic expression recognition method based on time sequence relation reasoning
CN111553837A (en) Artistic text image generation method based on neural style migration
CN112561791B (en) Image style migration based on optimized AnimeGAN
CN111696046A (en) Watermark removing method and device based on generating type countermeasure network
CN114240735B (en) Arbitrary style migration method, system, storage medium, computer equipment and terminal
CN113780249B (en) Expression recognition model processing method, device, equipment, medium and program product
CN114581341A (en) Image style migration method and system based on deep learning
Salem et al. Semantic image inpainting using self-learning encoder-decoder and adversarial loss
WO2024060839A1 (en) Object operation method and apparatus, computer device, and computer storage medium
CN111160327B (en) Expression recognition method based on lightweight convolutional neural network
CN109284765A (en) The scene image classification method of convolutional neural networks based on negative value feature
CN111339734B (en) Method for generating image based on text
CN111667006A (en) Method for generating family font based on AttGan model
CN115862015A (en) Training method and device of character recognition system, and character recognition method and device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant