CN112307714B - Text style migration method based on dual-stage depth network - Google Patents


Info

Publication number
CN112307714B
CN112307714B (application CN202011210655.3A)
Authority
CN
China
Prior art keywords
picture
stylized
network
font
loss
Prior art date
Legal status
Active
Application number
CN202011210655.3A
Other languages
Chinese (zh)
Other versions
CN112307714A (en)
Inventor
陈金泽
李龙
吕奕杭
廖志寰
朱安娜
Current Assignee
Wuhan University of Technology WUT
Original Assignee
Wuhan University of Technology WUT
Priority date
Filing date
Publication date
Application filed by Wuhan University of Technology WUT filed Critical Wuhan University of Technology WUT
Priority to CN202011210655.3A
Publication of CN112307714A
Application granted
Publication of CN112307714B


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/10 Text processing
    • G06F40/103 Formatting, i.e. changing of presentation of documents
    • G06F40/109 Font handling; Temporal or kinetic typography
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Health & Medical Sciences (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Computing Systems (AREA)
  • Molecular Biology (AREA)
  • Biophysics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Biomedical Technology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Image Processing (AREA)

Abstract

A text style migration method based on a dual-stage deep network. A training data set A and a training data set B are first constructed. A de-stylized network is then trained on data set A to obtain a de-stylized network model. Next, a font migration network is trained using the de-stylized pictures produced by that model together with data set B, yielding a font migration network model that migrates a text picture to be converted into a target reference-font picture. Finally, a texture migration network is trained using the target reference-font picture and data set A, and the resulting texture migration network model produces the final result of text style migration. The design achieves an excellent text style migration effect.

Description

Text style migration method based on dual-stage depth network
Technical Field
The invention belongs to the field of deep learning and image style migration, and particularly relates to a text style migration method based on a dual-stage deep network.
Background
Style migration of images refers to the task of transferring the style of one image onto another to synthesize a new artistic image. In recent years, with the continuous development of artificial intelligence technology and the global creative industry, style migration of text images has become a real demand: people want to generate more artistic fonts for use in the design and promotion of business, culture, and other industries.
Style migration of a text image differs from that of an ordinary image in that it involves two aspects: font migration and texture migration of the text. The former converts text of the same content between fonts, while the latter converts the stylistic appearance of the text. Manually synthesizing text images with a specific font and texture consumes a great deal of time and effort, so realizing text style migration with an automatic, efficient method is of wide interest. However, existing text style migration methods are limited to single-stage direct conversion, i.e. the font and the texture of the text are migrated at once in the same stage, and the results are often unsatisfactory.
Disclosure of Invention
The invention aims to overcome the problems in the prior art and to provide a text style migration method based on a dual-stage deep network with a better migration effect.
In order to achieve the above object, the present invention provides the following technical solutions:
A text style migration method based on a dual-stage deep network comprises the following steps in sequence:
Step 1: construct a training data set A and a training data set B, where data set A comprises stylized text pictures with various textures and the corresponding de-stylized text pictures, and data set B comprises text pictures in a reference font and in various other fonts;
Step 2: construct a de-stylized network and train it with training data set A to obtain a de-stylized network model for de-stylizing textured stylized text images;
Step 3: first construct a font migration network, train it with the de-stylized pictures obtained from the de-stylized network model and training data set B to obtain a font migration network model for converting and migrating multiple fonts, and then use the model to migrate a font picture to be converted into a target reference-font picture;
Step 4: first construct a texture migration network, then train it with the target reference-font picture generated in Step 3 and training data set A to obtain a texture migration network model for stylized texture rendering of the font picture, and finally use the model to obtain the final result of text style migration.
In the second step, the de-stylized network includes an encoder E_X, an encoder E_Y and a decoder G_X, and training the de-stylized network with training data set A comprises the following steps in sequence:
2.1. The de-stylized network randomly selects an image pair (x, y) from training data set A and inputs the images to the encoders E_X and E_Y respectively, where y is a textured stylized text picture and x is the de-stylized text picture corresponding to y;
2.2. The encoders E_X and E_Y map x and y into a shared feature space and encode them into respective feature maps; the feature loss L_feat is computed from the feature maps, and the encoders E_X and E_Y are trained with the objective of minimizing L_feat;
2.3. The feature map is passed to the decoder G_X to generate a reconstructed de-stylized text picture; the pixel loss L_pix is then computed from the reconstructed de-stylized text picture and the picture x, and the decoder G_X is trained with the objective of minimizing L_pix so that the reconstructed de-stylized text picture is sufficiently close to the picture x.
The de-stylized network further comprises a discriminator D_X, and training the de-stylized network with training data set A further comprises:
2.4. The reconstructed de-stylized text picture is input to the discriminator D_X to judge its authenticity; the adversarial loss L_adv is computed and optimized with the Adam optimizer.
The total loss function to be optimized by the de-stylized network is:
L_1 = λ_feat·L_feat + λ_pix·L_pix + λ_adv·L_adv
L_feat = E_{x,y}[||S_X(E_Y(y)) - z||_1]
z = S_X(G_X(x))
L_pix = E_{x,y}[||G_X(E_Y(y)) - x||_1]
where L_feat, L_pix and L_adv are respectively the feature loss, the pixel loss and the adversarial loss; λ_feat, λ_pix and λ_adv are respectively their hyper-parameters; S_X denotes the front layers of G_X; z is the content feature; λ_gp is the penalty coefficient; and x̂ is sampled uniformly along straight lines between the image x and the reconstructed de-stylized text picture G_X(E_Y(y)).
In the third step, the font migration network comprises a generator G and a discriminator D, and training the font migration network with the de-stylized pictures obtained from the de-stylized network model and training data set B comprises the following steps in sequence:
3.1. The font migration network randomly selects a picture x from training data set B and inputs it into the generator G, and the generator G generates a fake picture G(x, c) from the picture x and the target font label c;
3.2. On the one hand, the fake picture G(x, c) is input into the generator G again to generate a reconstructed picture G(G(x, c), c'); during reconstruction the de-stylized picture from the de-stylized network model is used as the target font picture for supervision; the font classification loss L_f of the generator and the reconstruction loss L_rec are then computed, and the generator G is trained with the objective of minimizing L_f and L_rec. On the other hand, the fake picture G(x, c) is input into the discriminator D to judge whether the picture is real and which font domain it belongs to; the font classification loss L_r of the discriminator is computed, and the discriminator D is trained with the objective of minimizing L_r.
The loss functions to be optimized by the font migration network are:
L_2 = L_D + L_G
L_D = -L_adv + λ_1·L_r
L_G = L_adv + λ_1·L_f + λ_2·L_rec
L_r = E_{x,c'}[-log D(c'|x)]
L_f = E_{x,c}[-log D(c|G(x, c))]
L_rec = E_{x,c}[||x - G(G(x, c), c')||_1]
where L_D is the discriminator loss; L_G is the generator loss; L_adv, L_r, L_f and L_rec are respectively the adversarial loss, the font classification loss of the discriminator, the font classification loss of the generator and the reconstruction loss; λ_1, λ_2 and λ_gp are respectively the hyper-parameters of the font classification loss, of the reconstruction loss and of the penalty term of the adversarial loss; x̂ is sampled uniformly along the straight line between a real picture sample and the fake picture G(x, c); and D(c'|x) is the probability distribution with which the discriminator D classifies a real picture sample into its original font domain c'.
In step 3.1, the fake picture G(x, c) is generated as follows: the picture x and the target font label c are feature-mapped and fused, and the result is fed into a deep convolutional network for training.
In the fourth step, the texture migration network comprises an encoder f, a decoder g, and an AdaIN adaptive normalization layer between the encoder f and the decoder g, where the encoder f and the decoder g are built on the VGG-19 network structure: the encoder f takes the first L layers of a pre-trained VGG-19 network, and the decoder g is a mirror of the encoder f with all pooling layers replaced by upsampling layers;
the training texture migration network by utilizing the target reference font picture and the training data set A generated in the step three sequentially comprises the following steps:
4.1, firstly, mapping a font picture c and a texture style picture s to a feature space by adopting an encoder f to obtain f (c) and f(s), and then carrying out feature transformation on the font picture c and the texture style picture s by an AdaIN self-adaptive normalization layer to obtain a feature map t=AdaIN (f (c), f (s));
4.2, mapping the feature map t back to the original feature space by adopting a decoder g to obtain a stylized result map g (t);
and 4.3, inputting the stylized result graph g (t) and the texture style graph s into the encoder f, and realizing training of a texture migration network through optimization of a loss function.
In step 4.3, the loss function is:
L_3 = L_c + λ·L_s
L_c = ||f(g(t)) - t||_2
L_s = Σ_{i=1}^{L} (||μ(φ_i(g(t))) - μ(φ_i(s))||_2 + ||σ(φ_i(g(t))) - σ(φ_i(s))||_2)
where L_c is the content loss, L_s is the style loss, λ is the hyper-parameter of the style loss, φ_i is the i-th layer of the encoder f, and σ and μ are respectively the variance and the mean of each image channel.
In step 4.1, the feature transformation formula of the AdaIN adaptive normalization layer is:
AdaIN(f(c), f(s)) = σ(f(s))·((f(c) - μ(f(c)))/σ(f(c))) + μ(f(s))
where σ and μ are respectively the variance and the mean of each image channel.
Compared with the prior art, the invention has the following beneficial effects:
In the text style migration method based on a dual-stage deep network, training data sets A and B are first constructed; a de-stylized network is then trained on data set A to obtain a de-stylized network model for de-stylizing textured stylized text images; a font migration network is next trained with the de-stylized pictures produced by that model and data set B to obtain a font migration network model for converting and migrating multiple fonts, which migrates a text picture to be converted into a target reference-font picture; finally, a texture migration network is trained with the target reference-font picture and data set A to obtain a texture migration network model for stylized texture rendering of the font picture, which produces the final result of text style migration. By performing font migration in the first stage and texture migration in the second stage, a better text style migration effect is obtained.
Drawings
Fig. 1 is an overall flow chart of the present invention.
Fig. 2 is a schematic diagram of a training data set a according to the present invention.
Fig. 3 is a schematic diagram of a training data set B according to the present invention.
Fig. 4 is a schematic diagram of the structure of the de-stylized network according to the present invention.
Fig. 5 is a schematic diagram of a font migration network according to the present invention.
FIG. 6 is a schematic diagram of a texture migration network according to the present invention.
Detailed Description
The invention is further described below with reference to the detailed description and the accompanying drawings.
Referring to figs. 1-6, a text style migration method based on a dual-stage deep network comprises the following steps in sequence:
Step 1: construct a training data set A and a training data set B, where data set A comprises stylized text pictures with various textures and the corresponding de-stylized text pictures, and data set B comprises text pictures in a reference font and in various other fonts;
Step 2: construct a de-stylized network and train it with training data set A to obtain a de-stylized network model for de-stylizing textured stylized text images;
Step 3: first construct a font migration network, train it with the de-stylized pictures obtained from the de-stylized network model and training data set B to obtain a font migration network model for converting and migrating multiple fonts, and then use the model to migrate a font picture to be converted into a target reference-font picture;
Step 4: first construct a texture migration network, then train it with the target reference-font picture generated in Step 3 and training data set A to obtain a texture migration network model for stylized texture rendering of the font picture, and finally use the model to obtain the final result of text style migration.
In the second step, the de-stylized network includes an encoder E_X, an encoder E_Y and a decoder G_X, and training the de-stylized network with training data set A comprises the following steps in sequence:
2.1. The de-stylized network randomly selects an image pair (x, y) from training data set A and inputs the images to the encoders E_X and E_Y respectively, where y is a textured stylized text picture and x is the de-stylized text picture corresponding to y;
2.2. The encoders E_X and E_Y map x and y into a shared feature space and encode them into respective feature maps; the feature loss L_feat is computed from the feature maps, and the encoders E_X and E_Y are trained with the objective of minimizing L_feat;
2.3. The feature map is passed to the decoder G_X to generate a reconstructed de-stylized text picture; the pixel loss L_pix is then computed from the reconstructed de-stylized text picture and the picture x, and the decoder G_X is trained with the objective of minimizing L_pix so that the reconstructed de-stylized text picture is sufficiently close to the picture x.
The de-stylized network further comprises a discriminator D_X, and training the de-stylized network with training data set A further comprises:
2.4. The reconstructed de-stylized text picture is input to the discriminator D_X to judge its authenticity; the adversarial loss L_adv is computed and optimized with the Adam optimizer.
The total loss function to be optimized by the de-stylized network is:
L_1 = λ_feat·L_feat + λ_pix·L_pix + λ_adv·L_adv
L_feat = E_{x,y}[||S_X(E_Y(y)) - z||_1]
z = S_X(G_X(x))
L_pix = E_{x,y}[||G_X(E_Y(y)) - x||_1]
where L_feat, L_pix and L_adv are respectively the feature loss, the pixel loss and the adversarial loss; λ_feat, λ_pix and λ_adv are respectively their hyper-parameters; S_X denotes the front layers of G_X; z is the content feature; λ_gp is the penalty coefficient; and x̂ is sampled uniformly along straight lines between the image x and the reconstructed de-stylized text picture G_X(E_Y(y)).
In the third step, the font migration network comprises a generator G and a discriminator D, and training the font migration network with the de-stylized pictures obtained from the de-stylized network model and training data set B comprises the following steps in sequence:
3.1. The font migration network randomly selects a picture x from training data set B and inputs it into the generator G, and the generator G generates a fake picture G(x, c) from the picture x and the target font label c;
3.2. On the one hand, the fake picture G(x, c) is input into the generator G again to generate a reconstructed picture G(G(x, c), c'); during reconstruction the de-stylized picture from the de-stylized network model is used as the target font picture for supervision; the font classification loss L_f of the generator and the reconstruction loss L_rec are then computed, and the generator G is trained with the objective of minimizing L_f and L_rec. On the other hand, the fake picture G(x, c) is input into the discriminator D to judge whether the picture is real and which font domain it belongs to; the font classification loss L_r of the discriminator is computed, and the discriminator D is trained with the objective of minimizing L_r.
The loss functions to be optimized by the font migration network are:
L_2 = L_D + L_G
L_D = -L_adv + λ_1·L_r
L_G = L_adv + λ_1·L_f + λ_2·L_rec
L_r = E_{x,c'}[-log D(c'|x)]
L_f = E_{x,c}[-log D(c|G(x, c))]
L_rec = E_{x,c}[||x - G(G(x, c), c')||_1]
where L_D is the discriminator loss; L_G is the generator loss; L_adv, L_r, L_f and L_rec are respectively the adversarial loss, the font classification loss of the discriminator, the font classification loss of the generator and the reconstruction loss; λ_1, λ_2 and λ_gp are respectively the hyper-parameters of the font classification loss, of the reconstruction loss and of the penalty term of the adversarial loss; x̂ is sampled uniformly along the straight line between a real picture sample and the fake picture G(x, c); and D(c'|x) is the probability distribution with which the discriminator D classifies a real picture sample into its original font domain c'.
In step 3.1, the fake picture G(x, c) is generated as follows: the picture x and the target font label c are feature-mapped and fused, and the result is fed into a deep convolutional network for training.
In the fourth step, the texture migration network comprises an encoder f, a decoder g, and an AdaIN adaptive normalization layer between the encoder f and the decoder g, where the encoder f and the decoder g are built on the VGG-19 network structure: the encoder f takes the first L layers of a pre-trained VGG-19 network, and the decoder g is a mirror of the encoder f with all pooling layers replaced by upsampling layers;
the training texture migration network by utilizing the target reference font picture and the training data set A generated in the step three sequentially comprises the following steps:
4.1, firstly, mapping a font picture c and a texture style picture s to a feature space by adopting an encoder f to obtain f (c) and f(s), and then carrying out feature transformation on the font picture c and the texture style picture s by an AdaIN self-adaptive normalization layer to obtain a feature map t=AdaIN (f (c), f (s));
4.2, mapping the feature map t back to the original feature space by adopting a decoder g to obtain a stylized result map g (t);
and 4.3, inputting the stylized result graph g (t) and the texture style graph s into the encoder f, and realizing training of a texture migration network through optimization of a loss function.
In step 4.3, the loss function is:
L_3 = L_c + λ·L_s
L_c = ||f(g(t)) - t||_2
L_s = Σ_{i=1}^{L} (||μ(φ_i(g(t))) - μ(φ_i(s))||_2 + ||σ(φ_i(g(t))) - σ(φ_i(s))||_2)
where L_c is the content loss, L_s is the style loss, λ is the hyper-parameter of the style loss, φ_i is the i-th layer of the encoder f, and σ and μ are respectively the variance and the mean of each image channel.
In step 4.1, the feature transformation formula of the AdaIN adaptive normalization layer is:
AdaIN(f(c), f(s)) = σ(f(s))·((f(c) - μ(f(c)))/σ(f(c))) + μ(f(s))
where σ and μ are respectively the variance and the mean of each image channel.
The principle of the invention is explained as follows:
the invention provides a character style migration method based on a dual-stage depth network, which is based on a de-stylized network formed by two encoders, a decoder and a discriminator, and realizes de-stylized processing of textured characters by optimizing characteristic loss, pixel loss and contrast loss; based on a font migration network of a generator and a discriminator, realizing first-stage migration of the character fonts by optimizing the countermeasures and the font classification losses; based on a texture migration network of an encoder and a decoder and an AdaIN self-adaptive normalization layer, the content loss and the style loss are optimized by carrying out feature transformation through mean and variance, and the second stage migration of the text texture is realized. The style migration text image obtained by the invention has higher artistic effect, has wide application in the field of visual design, can be used for a plurality of aspects such as artistic image design, cultural commercial image propaganda, painting text processing and the like, is not only suitable for digital and letter images, but also has better expression in the aspect of Chinese characters.
Discriminator D_X: to make the de-stylized reconstruction result more accurate, the invention adds a discriminator D_X to the de-stylized network for judging the authenticity of the reconstructed picture.
Font classification loss of the discriminator: L_r = E_{x,c'}[-log D(c'|x)]. We expect the input font image x to be converted into the output font image y and correctly classified into the target font domain c; D(c'|x) denotes the probability distribution with which the discriminator classifies a real sample into its original font domain c', and the objective of the discriminator D is to minimize this part of the loss as well.
Font classification loss of the generator: L_f = E_{x,c}[-log D(c|G(x, c))]. This loss function is used to optimize the generator G so that the pictures generated by the generator G can be classified into the target font domain c by the discriminator D.
Reconstruction loss: L_rec = E_{x,c}[||x - G(G(x, c), c')||_1]. To ensure that the generator G changes only the font-related information of the input picture without altering its text content (rather than fooling the discriminator D by other means), the generated G(x, c) is input into the generator G again to obtain the picture G(G(x, c), c'), which should remain as consistent as possible with the picture x; the constraint is therefore imposed with the 1-norm.
For the adversarial loss L_adv, the WGAN method is adopted to alleviate the mode-collapse problem, namely:
L_adv = E_x[D(x)] - E_{x,c}[D(G(x, c))] - λ_gp·E_x̂[(||∇_x̂ D(x̂)||_2 - 1)^2]
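As an illustration of the gradient-penalty term, a minimal PyTorch sketch follows; the function name and arguments are assumptions for illustration, not part of the patent:

```python
import torch

def gradient_penalty(discriminator, real, fake, lambda_gp=10.0):
    """WGAN-GP term: sample x_hat uniformly on the straight line between
    real and fake samples and push the gradient norm of D toward 1."""
    alpha = torch.rand(real.size(0), 1, 1, 1, device=real.device)
    x_hat = (alpha * real + (1.0 - alpha) * fake).requires_grad_(True)
    d_out = discriminator(x_hat)
    grads = torch.autograd.grad(outputs=d_out, inputs=x_hat,
                                grad_outputs=torch.ones_like(d_out),
                                create_graph=True)[0]
    grads = grads.reshape(grads.size(0), -1)
    return lambda_gp * ((grads.norm(2, dim=1) - 1.0) ** 2).mean()
```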
texture migration networks define two losses: content loss L c And style loss L s . The content loss is represented by the Euclidean distance between the network output image and the AdaIN layer output feature image, so that the final output content of the model is sufficiently close to the AdaIN layer output feature image t, and the convergence speed is increased; and (3) carrying out encoder encoding on the generated image again after the style loss, obtaining the mean value and variance of the characteristic images of each layer of the VGG network, and carrying out Euclidean distance summation on the mean value and variance of the corresponding layer of the real style image.
Example 1:
referring to fig. 1, a text style migration method based on a dual-stage deep network sequentially comprises the following steps:
1. reference is made to: constructing a training data set A and a training data set B, wherein the training data set A comprises stylized literal pictures with various textures and corresponding stylized literal pictures, and the training data set B comprises reference fonts and other stylized literal pictures with various fonts (see figures 2 and 3);
2. Construct a de-stylized network comprising an encoder E_X, an encoder E_Y, a decoder G_X and a discriminator D_X, where the encoders E_X and E_Y share weights in their last several layers; the network structure adopted by the de-stylized network is shown in Table 1:
Table 1: De-stylized network structure
The total loss function to be optimized by the de-stylized network is:
L_1 = λ_feat·L_feat + λ_pix·L_pix + λ_adv·L_adv
where L_feat, L_pix and L_adv are respectively the feature loss, the pixel loss and the adversarial loss, and λ_feat, λ_pix and λ_adv are their respective hyper-parameters;
3. Referring to fig. 4, train the de-stylized network with training data set A, specifically:
(1) The de-stylized network randomly selects an image pair (x, y) from training data set A and inputs the images to the encoders E_X and E_Y respectively, where y is a textured stylized text picture and x is the de-stylized text picture corresponding to y;
(2) The encoders E_X and E_Y map x and y into a shared feature space and encode them into respective feature maps; the feature loss L_feat is computed from the feature maps, and the encoders E_X and E_Y are trained with the objective of minimizing L_feat;
The task of the encoders is to bring the result closer to the ground truth of the content feature. Let S_X denote the front layers of G_X used for guidance; the guiding content feature is then defined as z = S_X(G_X(x)). L_feat guides E_Y to remove the texture feature elements from the text image while preserving the core font information, and is defined as
L_feat = E_{x,y}[||S_X(E_Y(y)) - z||_1];
(3) The feature map is passed to the decoder G_X to generate a reconstructed de-stylized text picture; the pixel loss L_pix is then computed from the reconstructed de-stylized text picture and the picture x, and the decoder G_X is trained with the objective of minimizing L_pix so that the reconstructed de-stylized text picture is sufficiently close to the picture x;
The de-stylized network needs the resulting reconstruction to approximate the picture x, so the pixel loss is constrained with the 1-norm and defined as:
L_pix = E_{x,y}[||G_X(E_Y(y)) - x||_1]
(4) The reconstructed de-stylized text picture is input to the discriminator D_X to judge its authenticity, and the adversarial loss L_adv is computed (in this network the adversarial loss is defined so as to guide G_X and E_Y to confuse D_X):
L_adv = E_x[D_X(x)] - E_{x,y}[D_X(G_X(E_Y(y)))] - λ_gp·E_x̂[(||∇_x̂ D_X(x̂)||_2 - 1)^2]
where λ_gp is the penalty coefficient and x̂ is sampled uniformly along straight lines between the image x and the reconstructed de-stylized text picture G_X(E_Y(y));
The losses are optimized with the Adam optimizer, with the learning rate set to 0.0002 and λ_feat = λ_pix = 100, λ_gp = 10, λ_adv = 1, finally yielding an encoder E_Y and a decoder G_X usable for de-stylization, such that the generated image G_X(E_Y(y)) is sufficiently close to the de-stylized picture x.
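A minimal PyTorch sketch of one such optimization step under the losses above (the adversarial term is omitted for brevity; it would follow the WGAN form given earlier). The module names E_X, E_Y, G_X and S_X mirror the patent's notation, but their internals and the step function itself are assumptions:

```python
import torch
import torch.nn.functional as F

def destylize_step(E_X, E_Y, G_X, S_X, optimizer, x, y,
                   lam_feat=100.0, lam_pix=100.0):
    """One training step: x is a de-stylized text picture, y its textured version."""
    z = S_X(G_X(x)).detach()                # guiding content feature z = S_X(G_X(x))
    loss_feat = F.l1_loss(S_X(E_Y(y)), z)   # L_feat, 1-norm in feature space
    loss_pix = F.l1_loss(G_X(E_Y(y)), x)    # L_pix, 1-norm in pixel space
    loss = lam_feat * loss_feat + lam_pix * loss_pix
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

# usage sketch: optimizer = torch.optim.Adam(trainable_params, lr=2e-4)
```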
4. Construct a font migration network comprising a generator G and a discriminator D. The generator G comprises 2 convolutional layers, 6 residual layers and 2 transposed-convolution layers, all with normalization. The overall flow of the generator G is: downsample the input by a factor of 4, pass it through 6 residual blocks at constant dimension, upsample it by a factor of 4 with transposed convolutions, and finally produce the output through a dimension-preserving convolution followed by tanh. The network structure adopted by the generator G is shown in Table 2:
Table 2: Structure of the generator G
The convolution kernel size of each convolutional layer is 4×4 with stride 2, so each convolution operation halves the spatial dimension. The normalization layers use instance normalization (IN), computing the mean over the H×W extent of each channel of a single image; because the generation result depends mainly on a single image instance, batch normalization (BN) over the whole batch is not suitable for image stylization, while normalizing over H×W accelerates model convergence and keeps the image instances independent of one another. The activation function is LeakyReLU: since the function keeps a small gradient for negative inputs, its derivative is never zero, which reduces the occurrence of silent neurons, allows gradient-based learning, and avoids the problem of ReLU neurons no longer learning once they enter the negative interval.
Furthermore, to help avoid the over-fitting problem, the stacked layers are not used to fit the desired feature mapping directly; instead, they explicitly fit a residual mapping. If the desired feature mapping is H(x), the stacked nonlinear layers fit the mapping F(x) = H(x) - x, on the assumption that the residual mapping is easier to optimize than the original, i.e. that fitting F(x) = H(x) - x is easier than fitting H(x) directly. In the extreme case where the desired mapping is an identity mapping, the residual network only has to fit F(x) = 0 while a plain network has to fit H(x) = x, and the former is clearly easier to optimize.
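A minimal PyTorch sketch of such a residual block, assuming 3×3 convolutions and instance normalization; the layer widths are illustrative, not taken from Table 2:

```python
import torch.nn as nn

class ResidualBlock(nn.Module):
    """Stacked layers fit the residual F(x) = H(x) - x; the skip connection
    adds x back, so an identity mapping only requires fitting F(x) = 0."""
    def __init__(self, channels):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(channels, channels, 3, stride=1, padding=1, bias=False),
            nn.InstanceNorm2d(channels, affine=True),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, 3, stride=1, padding=1, bias=False),
            nn.InstanceNorm2d(channels, affine=True),
        )

    def forward(self, x):
        return x + self.body(x)
```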
The discriminator D adopts a PatchGAN structure to classify local image patches as real or fake, and uses no normalization layers. The output of Conv1 represents the predicted probability of the target font and the output of Conv2 represents whether the picture is real; the two heads are parallel;
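A minimal PyTorch sketch of such a two-headed PatchGAN discriminator; the depth, channel widths and input size are assumptions for illustration:

```python
import torch.nn as nn

class PatchDiscriminator(nn.Module):
    """Conv trunk without normalization; conv2 outputs a per-patch real/fake
    map, conv1 outputs font-class logits. The two heads are parallel."""
    def __init__(self, num_fonts, in_ch=3, base=64, img_size=256):
        super().__init__()
        layers, ch = [], in_ch
        for i in range(6):                      # six 4x4 stride-2 convolutions
            out_ch = base * 2 ** min(i, 4)
            layers += [nn.Conv2d(ch, out_ch, 4, stride=2, padding=1),
                       nn.LeakyReLU(0.01)]
            ch = out_ch
        self.trunk = nn.Sequential(*layers)
        k = img_size // 2 ** 6                  # spatial size left after the trunk
        self.conv2 = nn.Conv2d(ch, 1, 3, padding=1)            # real/fake patch map
        self.conv1 = nn.Conv2d(ch, num_fonts, k, bias=False)   # target-font logits

    def forward(self, x):
        h = self.trunk(x)
        return self.conv2(h), self.conv1(h).flatten(1)
```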
the loss function to be optimized adopted by the font migration network is as follows:
L 2 =L D +L G
L D =-L adv1 L r
L G =L adv1 L f2 L rec
in the above, L D For discriminator loss, L G Generator loss, L adv 、L r 、L f 、L rec The counterloss, the font classification loss of the discriminator, the font classification loss of the generator, the reconstruction loss, lambda respectively 1 、λ 2 The super parameters are the super parameters of font classification loss and reconstruction loss respectively;
5. Referring to fig. 5, train the font migration network with the de-stylized pictures obtained from the de-stylized network model and training data set B to obtain a font migration network model that converts and migrates multiple fonts, specifically:
(1) The font migration network randomly selects a picture x from training data set B and inputs it into the generator G; the generator G feature-maps and fuses the picture x with the target font label c, passes the result into a deep convolutional network for training, and generates the fake picture G(x, c);
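A minimal PyTorch sketch of this fusion, assuming StarGAN-style conditioning in which the label is broadcast spatially and concatenated with the image channels; the patent does not spell out the exact fusion, so this is one plausible reading:

```python
import torch

def fuse_label(x, c, num_fonts):
    """x: images (B, C, H, W); c: target font labels (B,) as integer indices.
    Broadcast a one-hot label map over the spatial dims and concatenate it
    channel-wise; the result is fed into the generator's conv trunk."""
    onehot = torch.zeros(x.size(0), num_fonts, device=x.device)
    onehot.scatter_(1, c.unsqueeze(1), 1.0)
    label_map = onehot[:, :, None, None].expand(-1, -1, x.size(2), x.size(3))
    return torch.cat([x, label_map], dim=1)
```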
(2) On the one hand, the fake picture G(x, c) is input into the generator G again to generate the reconstructed picture G(G(x, c), c'); during reconstruction the de-stylized picture from the de-stylized network model is used as the target font picture for supervision, which ensures that the picture content is preserved during conversion and that only the domain-specific part is changed; the font classification loss L_f of the generator and the reconstruction loss L_rec are computed, and the generator G is trained with the objective of minimizing L_f and L_rec. On the other hand, the fake picture G(x, c) is input into the discriminator D to judge whether the picture is real and which font domain it belongs to; the font classification loss L_r of the discriminator is computed, and the discriminator D is trained with the objective of minimizing L_r:
L_r = E_{x,c'}[-log D(c'|x)]
L_f = E_{x,c}[-log D(c|G(x, c))]
L_rec = E_{x,c}[||x - G(G(x, c), c')||_1]
where λ_gp is the hyper-parameter of the penalty term of the adversarial loss, x̂ is sampled uniformly along the straight line between a real picture sample and the fake picture G(x, c), and D(c'|x) is the probability distribution with which the discriminator D classifies a real picture sample into its original font domain c';
the training of the model adopts the parameter beta 1 =0.5,β 2 Adam optimizer of =0.999, to increase data, flip the image at a probability level of 0.5, perform 1 generator update after 5 arbiter updates, set the batch size for all trials to 16, train all models in the first 10 epochs with a learning rate of 0.0001, and linearly attenuate the learning rate to 0 in the next 10 epochs;
6. Use the obtained font migration network model to migrate a font picture to be converted into the target reference-font picture;
7. Construct a texture migration network comprising an encoder f, a decoder g and an AdaIN adaptive normalization layer between the encoder f and the decoder g. The encoder f and the decoder g are built on the VGG-19 network structure: the encoder f takes the relu1_1 to relu4_1 part of a pre-trained VGG-19 network, and the decoder g is a mirror of the encoder f with all pooling layers replaced by upsampling layers. The specific structure of the texture migration network is shown in Table 3:
Table 3: Texture migration network structure
The convolution kernels of the network's convolutional layers are all 3×3 with stride 1, the window size of the MaxPool max-pooling layers is 2×2, and the upsampling layers use the nearest-neighbour interpolation algorithm;
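A minimal PyTorch sketch of such an encoder/decoder pair, assuming torchvision's VGG-19 (where relu4_1 is feature index 20) and a decoder whose widths mirror VGG-19 in reverse; the exact layer counts are assumptions where Table 3 is not reproduced here:

```python
import torch.nn as nn
from torchvision import models

def build_encoder():
    """Encoder f: pre-trained VGG-19 layers up to relu4_1, frozen."""
    vgg = models.vgg19(weights=models.VGG19_Weights.IMAGENET1K_V1)
    enc = vgg.features[:21]                 # conv1_1 ... relu4_1
    for p in enc.parameters():
        p.requires_grad_(False)
    return enc

def build_decoder():
    """Decoder g: mirror of the encoder, pooling replaced by nearest upsampling."""
    def block(cin, cout):
        return [nn.Conv2d(cin, cout, 3, stride=1, padding=1), nn.ReLU(inplace=True)]
    def up():
        return nn.Upsample(scale_factor=2, mode='nearest')
    return nn.Sequential(
        *block(512, 256), up(),
        *block(256, 256), *block(256, 256), *block(256, 128), up(),
        *block(128, 128), *block(128, 64), up(),
        *block(64, 64), nn.Conv2d(64, 3, 3, stride=1, padding=1),
    )
```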
8. Referring to fig. 6, train the texture migration network with the target reference-font picture and training data set A to obtain a texture migration network model that performs stylized texture rendering of the font picture, specifically:
(1) The encoder f first maps a font picture c and a texture style picture s into the feature space to obtain f(c) and f(s), which are then feature-transformed by the AdaIN adaptive normalization layer to obtain the feature map t = AdaIN(f(c), f(s)):
AdaIN(f(c), f(s)) = σ(f(s))·((f(c) - μ(f(c)))/σ(f(c))) + μ(f(s))
where σ and μ are respectively the variance and the mean of each image channel;
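A minimal PyTorch sketch of this transformation, taking σ as the channel-wise standard deviation with a small epsilon added for numerical stability:

```python
import torch

def adain(content_feat, style_feat, eps=1e-5):
    """t = sigma(f(s)) * (f(c) - mu(f(c))) / sigma(f(c)) + mu(f(s)),
    with mu/sigma computed per channel over the spatial dimensions."""
    c_mu = content_feat.mean(dim=(2, 3), keepdim=True)
    c_sigma = content_feat.std(dim=(2, 3), keepdim=True) + eps
    s_mu = style_feat.mean(dim=(2, 3), keepdim=True)
    s_sigma = style_feat.std(dim=(2, 3), keepdim=True)
    return s_sigma * (content_feat - c_mu) / c_sigma + s_mu
```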
(2) The decoder g maps the feature map t back to the original feature space to obtain the stylized result map g(t);
(3) The stylized result map g(t) and the texture style map s are input into the encoder f, and the texture migration network is trained by optimizing the following loss function:
L_3 = L_c + λ·L_s
L_c = ||f(g(t)) - t||_2
L_s = Σ_{i=1}^{L} (||μ(φ_i(g(t))) - μ(φ_i(s))||_2 + ||σ(φ_i(g(t))) - σ(φ_i(s))||_2)
where L_c is the content loss, L_s is the style loss, λ is the hyper-parameter of the style loss, φ_i is the i-th layer of the encoder f, and σ and μ are respectively the variance and the mean of each image channel;
optimization loss selects Adam optimizer and sets the batch size to 8.

Claims (3)

1. A text style migration method based on a dual-stage deep network, characterized in that:
the method comprises the following steps in sequence:
Step 1: construct a training data set A and a training data set B, where data set A comprises stylized text pictures with various textures and the corresponding de-stylized text pictures, and data set B comprises text pictures in a reference font and in various other fonts;
Step 2: construct a de-stylized network and train it with training data set A to obtain a de-stylized network model for de-stylizing textured stylized text images, where the de-stylized network comprises an encoder E_X, an encoder E_Y, a decoder G_X and a discriminator D_X, and training the de-stylized network with training data set A comprises the following steps in sequence:
2.1. the de-stylized network randomly selects an image pair (x, y) from training data set A and inputs the images to the encoders E_X and E_Y respectively, where y is a textured stylized text picture and x is the de-stylized text picture corresponding to y;
2.2. the encoders E_X and E_Y map x and y into a shared feature space and encode them into respective feature maps; the feature loss L_feat is computed from the feature maps, and the encoders E_X and E_Y are trained with the objective of minimizing L_feat;
2.3. the feature map is passed to the decoder G_X to generate a reconstructed de-stylized text picture; the pixel loss L_pix is then computed from the reconstructed de-stylized text picture and the picture x, and the decoder G_X is trained with the objective of minimizing L_pix;
2.4. the reconstructed de-stylized text picture is input to the discriminator D_X to judge its authenticity, the adversarial loss L_adv is computed, and the model is optimized with the Adam optimizer;
the total loss function to be optimized by the de-stylized network is:
L_1 = λ_feat·L_feat + λ_pix·L_pix + λ_adv·L_adv
L_feat = E_{x,y}[||S_X(E_Y(y)) - z||_1]
z = S_X(G_X(x))
L_pix = E_{x,y}[||G_X(E_Y(y)) - x||_1]
where L_feat, L_pix and L_adv are respectively the feature loss, the pixel loss and the adversarial loss; λ_feat, λ_pix and λ_adv are respectively their hyper-parameters; S_X denotes the front layers of G_X; z is the content feature; λ_gp is the penalty coefficient; and x̂ is sampled uniformly along straight lines between the image x and the reconstructed de-stylized text picture G_X(E_Y(y));
Step 3: construct a font migration network, train it with the de-stylized pictures obtained from the de-stylized network model and training data set B to obtain a font migration network model for converting and migrating multiple fonts, and use the model to migrate a font picture to be converted into a target reference-font picture, where the font migration network comprises a generator G and a discriminator D, and training the font migration network with the de-stylized pictures obtained from the de-stylized network model and training data set B comprises the following steps in sequence:
3.1. the font migration network randomly selects a picture x from training data set B and inputs it into the generator G, and the generator G generates a fake picture G(x, c) from the picture x and the target font label c;
3.2. on the one hand, the fake picture G(x, c) is input into the generator G again to generate a reconstructed picture G(G(x, c), c'), with the de-stylized picture from the de-stylized network model used as the target font picture for supervision during reconstruction; the font classification loss L_f of the generator and the reconstruction loss L_rec are computed, and the generator G is trained with the objective of minimizing L_f and L_rec; on the other hand, the fake picture G(x, c) is input into the discriminator D to judge whether the picture is real and which font domain it belongs to, the font classification loss L_r of the discriminator is computed, and the discriminator D is trained with the objective of minimizing L_r;
the loss functions to be optimized by the font migration network are:
L_2 = L_D + L_G
L_D = -L_adv + λ_1·L_r
L_G = L_adv + λ_1·L_f + λ_2·L_rec
L_r = E_{x,c'}[-log D(c'|x)]
L_f = E_{x,c}[-log D(c|G(x, c))]
L_rec = E_{x,c}[||x - G(G(x, c), c')||_1]
where L_D is the discriminator loss; L_G is the generator loss; L_adv, L_r, L_f and L_rec are respectively the adversarial loss, the font classification loss of the discriminator, the font classification loss of the generator and the reconstruction loss; λ_1, λ_2 and λ_gp are respectively the hyper-parameters of the font classification loss, of the reconstruction loss and of the penalty term of the adversarial loss; x̂ is sampled uniformly along the straight line between a real picture sample and the fake picture G(x, c); and D(c'|x) is the probability distribution with which the discriminator D classifies a real picture sample into its original font domain c';
Step 4: construct a texture migration network, train it with the target reference-font picture generated in Step 3 and training data set A to obtain a texture migration network model for stylized texture rendering of the font picture, and finally use the model to obtain the final result of text style migration, where the texture migration network comprises an encoder f, a decoder g and an AdaIN adaptive normalization layer between the encoder f and the decoder g, the encoder f and the decoder g are built on the VGG-19 network structure, the encoder f takes the first L layers of a pre-trained VGG-19 network, and the decoder g is a mirror of the encoder f with all pooling layers replaced by upsampling layers;
training the texture migration network with the target reference-font picture generated in Step 3 and training data set A comprises the following steps in sequence:
4.1. the encoder f first maps a font picture c and a texture style picture s into the feature space to obtain f(c) and f(s), which are then feature-transformed by the AdaIN adaptive normalization layer to obtain the feature map t = AdaIN(f(c), f(s));
4.2. the decoder g maps the feature map t back to the original feature space to obtain the stylized result map g(t);
4.3. the stylized result map g(t) and the texture style map s are input into the encoder f, and the texture migration network is trained by optimizing the following loss functions:
L_3 = L_c + λ·L_s
L_c = ||f(g(t)) - t||_2
L_s = Σ_{i=1}^{L} (||μ(φ_i(g(t))) - μ(φ_i(s))||_2 + ||σ(φ_i(g(t))) - σ(φ_i(s))||_2)
where L_c is the content loss, L_s is the style loss, λ is the hyper-parameter of the style loss, φ_i is the i-th layer of the encoder f, and σ and μ are respectively the variance and the mean of each image channel.
2. The text style migration method based on a dual-stage deep network according to claim 1, characterized in that:
in step 3.1, the method for generating the dummy pictures G (x, c) includes: the image x and the target font label c are subjected to feature mapping and fusion, and then are transmitted into a deep convolutional network for training.
3. The text style migration method based on a dual-stage deep network according to claim 1, characterized in that:
in step 4.1, the characteristic transformation formula of the AdaIN adaptive normalization layer is as follows:
in the above equation, σ and μ are the variance and the mean of each image channel, respectively.
CN202011210655.3A 2020-11-03 2020-11-03 Text style migration method based on dual-stage depth network Active CN112307714B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011210655.3A CN112307714B (en) 2020-11-03 2020-11-03 Text style migration method based on dual-stage depth network

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011210655.3A CN112307714B (en) 2020-11-03 2020-11-03 Text style migration method based on dual-stage depth network

Publications (2)

Publication Number Publication Date
CN112307714A CN112307714A (en) 2021-02-02
CN112307714B true CN112307714B (en) 2024-03-08

Family

ID=74332675

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011210655.3A Active CN112307714B (en) 2020-11-03 2020-11-03 Text style migration method based on dual-stage depth network

Country Status (1)

Country Link
CN (1) CN112307714B (en)

Families Citing this family (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112966470A (en) * 2021-02-23 2021-06-15 北京三快在线科技有限公司 Character generation method and device, storage medium and electronic equipment
CN113421318B (en) * 2021-06-30 2022-10-28 合肥高维数据技术有限公司 Font style migration method and system based on multitask generation countermeasure network
CN113553932B (en) * 2021-07-14 2022-05-13 同济大学 Calligraphy character erosion repairing method based on style migration
CN113554549B (en) * 2021-07-27 2024-03-29 深圳思谋信息科技有限公司 Text image generation method, device, computer equipment and storage medium
CN113807430B (en) * 2021-09-15 2023-08-08 网易(杭州)网络有限公司 Model training method, device, computer equipment and storage medium
CN114240735B (en) * 2021-11-17 2024-03-19 西安电子科技大学 Arbitrary style migration method, system, storage medium, computer equipment and terminal
CN115310405A (en) * 2022-07-21 2022-11-08 北京汉仪创新科技股份有限公司 Font replacement method, system, device and medium based on countermeasure generation network

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2019025909A1 (en) * 2017-08-01 2019-02-07 3M Innovative Properties Company Neural style transfer for image varietization and recognition
CN110443864A * 2019-07-24 2019-11-12 北京大学 Automatic artistic font generation method based on single-stage few-shot learning
CN110503598A * 2019-07-30 2019-11-26 西安理工大学 Font style transfer method based on conditional cycle-consistent generative adversarial networks
IT201900002557A1 (en) * 2019-02-22 2020-08-22 Univ Bologna Alma Mater Studiorum IMAGE-BASED CODING METHOD AND SYSTEM

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2019025909A1 (en) * 2017-08-01 2019-02-07 3M Innovative Properties Company Neural style transfer for image varietization and recognition
IT201900002557A1 (en) * 2019-02-22 2020-08-22 Univ Bologna Alma Mater Studiorum IMAGE-BASED CODING METHOD AND SYSTEM
CN110443864A * 2019-07-24 2019-11-12 北京大学 Automatic artistic font generation method based on single-stage few-shot learning
CN110503598A * 2019-07-30 2019-11-26 西安理工大学 Font style transfer method based on conditional cycle-consistent generative adversarial networks

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Stylized calligraphy image generation based on generative adversarial networks; 王晓红, 卢辉, 麻祥才; 包装工程 (Packaging Engineering); 2020-06-10 (No. 11); full text *

Also Published As

Publication number Publication date
CN112307714A (en) 2021-02-02

Similar Documents

Publication Publication Date Title
CN112307714B (en) Text style migration method based on dual-stage depth network
CN108510456B (en) Sketch simplification method of deep convolutional neural network based on perception loss
Ye et al. Deep learning hierarchical representations for image steganalysis
Gai et al. New image denoising algorithm via improved deep convolutional neural network with perceptive loss
CN110110323B (en) Text emotion classification method and device and computer readable storage medium
CN111091045A (en) Sign language identification method based on space-time attention mechanism
CN111079532A (en) Video content description method based on text self-encoder
CN109934158B (en) Video emotion recognition method based on local enhanced motion history map and recursive convolutional neural network
CN111986075B (en) Style migration method for target edge clarification
Zsolnai-Fehér et al. Gaussian material synthesis
CN113077388B (en) Data-augmented deep semi-supervised over-limit learning image classification method and system
CN110321805B (en) Dynamic expression recognition method based on time sequence relation reasoning
CN111553837A (en) Artistic text image generation method based on neural style migration
CN112561791B (en) Image style migration based on optimized AnimeGAN
CN111696046A (en) Watermark removing method and device based on generating type countermeasure network
CN114240735B (en) Arbitrary style migration method, system, storage medium, computer equipment and terminal
CN113780249B (en) Expression recognition model processing method, device, equipment, medium and program product
CN114581341A (en) Image style migration method and system based on deep learning
Salem et al. Semantic image inpainting using self-learning encoder-decoder and adversarial loss
WO2024060839A1 (en) Object operation method and apparatus, computer device, and computer storage medium
CN111160327B (en) Expression recognition method based on lightweight convolutional neural network
CN109284765A (en) The scene image classification method of convolutional neural networks based on negative value feature
CN111339734B (en) Method for generating image based on text
CN111667006A (en) Method for generating family font based on AttGan model
CN115862015A (en) Training method and device of character recognition system, and character recognition method and device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant