CN114399427B - Word effect migration method based on a cycle generative adversarial network

Word effect migration method based on a cycle generative adversarial network

Info

Publication number: CN114399427B (application number CN202210018956.9A; application publication CN114399427A)
Authority: CN (China)
Original language: Chinese (zh)
Inventors: 牛玉贞 (Niu Yuzhen), 李悦洲 (Li Yuezhou), 陈沛祥 (Chen Peixiang)
Applicant and current assignee: Fuzhou University
Legal status: Active (granted)

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 3/00 Geometric image transformations in the plane of the image
    • G06T 3/04 Context-preserving transformations, e.g. by using an importance map
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/045 Combinations of networks
    • G06N 3/08 Learning methods


Abstract

The invention relates to a word effect migration method based on a cycle generative adversarial network, comprising the following steps. S1: processing the data set to obtain glyph image and word effect image pairs, and preprocessing each image to obtain a training data set; S2: designing a word effect style migration network based on a cycle generative adversarial network, where the network consists of a word effect removal sub-network and a word effect migration sub-network; S3: designing loss functions for training the network designed in step S2; S4: training the word effect style migration network using the training data set; S5: inputting a new target word effect image and the glyph image to be migrated into the trained word effect migration network, and outputting an image with the target word effect after word effect migration. The invention can realize word effect style migration and generate high-quality, strongly structured word effect migration images.

Description

Word effect migration method based on a cycle generative adversarial network
Technical Field
The invention relates to the field of image processing and computer vision, in particular to a word effect migration method based on a cycle generative adversarial network.
Background
In recent years, style migration technology has developed rapidly: new network models have been proposed, new mathematical methods have been introduced, and more and more style migration methods have emerged. For an artistic image, everyone has a unique insight into its artistic style. It is therefore difficult to describe and express the artistic style of an artistic image in a unified way; after all, the concept and definition of artistic style are ambiguous. Converting the style of a content image into the artistic style of a given style image is therefore a problem that is difficult even to describe precisely. The development of computer vision and the upgrading of hardware have made this problem tractable, and many researchers have begun to invest in research on style migration.
Font special effects are a combination of visual elements, such as the outlines, colors and textures of characters, that can significantly improve the artistry of text. They are widely used in the design industry, but because of their particular complexity they are usually created manually. Word effect style migration is a subtask of style migration; it is a technique that applies a font special effect to different glyphs. Since the glyphs involved in word effect style migration are highly structured, most general style migration methods are not suitable for it, but word effect style migration does allow a reference standard image to be designed. In recent years, new attempts at style migration have been made; however, the lack of data limits the performance of style migration models. In 2020, Yang et al. proposed a new word effect data set, the TE141K data set, together with a baseline word effect migration method based on generative adversarial networks (TET-GAN), which greatly promoted the development of the word effect style migration task.
Word effect style migration has important practical significance. For Chinese characters, whose number is huge, manually producing word effect images for all commonly used characters would consume enormous labor and time. A word effect style migration algorithm solves exactly this problem: a designer only needs to create a few representative word effect samples, and characters in other fonts with the word effect applied can then be generated automatically by the algorithm, thereby avoiding repetitive labor and saving a great deal of time.
The invention provides a word effect style migration method based on a cycle generative adversarial network. The method trains the network to complete the two tasks of word effect migration and word effect removal simultaneously, so that the network learns to decouple and recombine the glyph features and the word effect features of characters and thus completes high-quality word effect style migration. The network is trained in a cycle-generation adversarial manner, and an attention mechanism is introduced to strengthen the model's supervision of the glyph edge structure. The proposed word effect style migration method can effectively complete the word effect style migration task, and the generated word effect images are improved in both visual effect and performance metrics.
Disclosure of Invention
The invention aims to provide a word effect migration method based on a cycle generative adversarial network, which can realize high-quality word effect style migration.
In order to achieve the above purpose, the technical scheme of the invention is as follows: a word effect migration method based on a cycle generative adversarial network, comprising the following steps:
Step S1, processing the data set to obtain glyph image and word effect image pairs, and preprocessing each image to obtain a training data set;
Step S2, designing a word effect style migration network based on a cycle generative adversarial network, where the network consists of a word effect removal sub-network and a word effect migration sub-network;
Step S3, designing loss functions for training the network designed in step S2;
Step S4, training the word effect style migration network based on the cycle generative adversarial network using the training data set;
Step S5, inputting a new target word effect image and the glyph image to be migrated into the trained word effect style migration network, and outputting an image with the target word effect after word effect migration.
In an embodiment of the present invention, the step S1 is specifically implemented as follows:
Step S11, splitting each image in the original input data set into a separate glyph image and a word effect image of the same size, and combining the glyph image and the word effect image that come from the same original image into a glyph image and word effect image pair;
Step S12, applying the same random flip to the two images of each glyph image and word effect image pair and center-cropping them to H×W, so that the correspondence between the glyph image and the word effect image of a pair is maintained;
Step S13, normalizing all H×W glyph images and word effect images: given an image I(i, j), the normalized image Î(i, j) is computed pixel by pixel, where (i, j) denotes the position of the pixel.
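As an illustration only, the following is a minimal PyTorch-style sketch of this pair preprocessing. The crop size of 256 and the normalization to [-1, 1] (which matches the tanh output of the generators described later) are assumptions; the granted text only specifies a shared random flip, an H×W center crop and a normalization step.

```python
import random
import torchvision.transforms.functional as TF
from PIL import Image

def preprocess_pair(glyph_img: Image.Image, effect_img: Image.Image, size: int = 256):
    """Apply the same random flip to both images of a glyph/word-effect pair,
    center-crop them to size x size, and normalize pixel values.

    The crop size and the [-1, 1] value range are illustrative assumptions."""
    if random.random() < 0.5:                      # identical random horizontal flip
        glyph_img, effect_img = TF.hflip(glyph_img), TF.hflip(effect_img)
    glyph_img = TF.center_crop(glyph_img, [size, size])
    effect_img = TF.center_crop(effect_img, [size, size])
    # to_tensor maps pixel values to [0, 1]; shift and scale to [-1, 1]
    normalize = lambda im: TF.to_tensor(im) * 2.0 - 1.0
    return normalize(glyph_img), normalize(effect_img)
```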
In an embodiment of the present invention, the step S2 is specifically implemented as follows:
Step S21, designing the generation network and the discrimination network of the word effect removal sub-network; the generation network is used to generate the word-effect-removed image, and the discrimination network is used to estimate the probability that an image is a real image;
Step S22, designing the generation network and the discrimination network of the word effect migration sub-network; the generation network is used to generate the word effect migration image, and the discrimination network is used to estimate the probability that an image is a real image.
In an embodiment of the present invention, the step S21 is specifically implemented as follows:
Step S211, designing the generation network of the word effect removal sub-network, where the input of the generation network is a glyph image and word effect image pair, the glyph image being denoted A_s and the word effect image A_e; the generation network is composed of encoders E_x and E_y and a shared layer S_1, which extract image features, and a generator G_1, which reconstructs the features into the word-effect-removed image;
Step S212, designing the encoders E_y and E_x, whose inputs are A_e and A_s respectively; the encoder E_y consists of five convolution blocks, each composed in sequence of an activation layer, a convolution layer, a normalization layer and an attention module; the activation layer uses the LeakyReLU activation function, the convolution layer uses a convolution with a 4×4 kernel, a stride of 2 and a padding of 1, the normalization layer uses instance normalization, and the attention module uses the convolutional block attention module (CBAM); inputting an image into the encoder E_y yields an image feature of size C×H×W; the encoder E_x has the same structure as the encoder E_y, and inputting an image into the encoder E_x likewise yields a C×H×W image feature;
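A minimal PyTorch sketch of such an encoder follows. The channel widths, the LeakyReLU slope and the CBAM reduction ratio are assumptions, since step S212 only fixes the block layout (activation, 4×4 stride-2 convolution, instance normalization, CBAM); the class and function names are illustrative.

```python
import torch
import torch.nn as nn

class CBAM(nn.Module):
    """Minimal convolutional block attention module: channel then spatial attention."""
    def __init__(self, channels: int, reduction: int = 16, kernel_size: int = 7):
        super().__init__()
        self.mlp = nn.Sequential(nn.Linear(channels, channels // reduction), nn.ReLU(),
                                 nn.Linear(channels // reduction, channels))
        self.spatial = nn.Conv2d(2, 1, kernel_size, padding=kernel_size // 2)

    def forward(self, x):
        b, c, _, _ = x.shape
        gate = torch.sigmoid(self.mlp(x.mean(dim=(2, 3))) + self.mlp(x.amax(dim=(2, 3))))
        x = x * gate.view(b, c, 1, 1)                                    # channel attention
        s = torch.cat([x.mean(1, keepdim=True), x.amax(1, keepdim=True)], dim=1)
        return x * torch.sigmoid(self.spatial(s))                        # spatial attention

def conv_block(in_ch: int, out_ch: int) -> nn.Sequential:
    """Activation -> 4x4 stride-2 convolution -> instance norm -> CBAM, as in step S212."""
    return nn.Sequential(nn.LeakyReLU(0.2),
                         nn.Conv2d(in_ch, out_ch, kernel_size=4, stride=2, padding=1),
                         nn.InstanceNorm2d(out_ch),
                         CBAM(out_ch))

class Encoder(nn.Module):
    """Five stacked convolution blocks (E_x, E_y and E_z share this layout)."""
    def __init__(self, in_ch: int = 3, base: int = 64):
        super().__init__()
        chs = [in_ch, base, base * 2, base * 4, base * 8, base * 8]
        self.blocks = nn.Sequential(*[conv_block(chs[i], chs[i + 1]) for i in range(5)])

    def forward(self, img):
        return self.blocks(img)   # C x H x W feature at 1/32 of the input resolution
```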
Step S213, designing the shared layer S_1, which is a network with a U-Net structure; the feature of size C×H×W obtained in step S212 from either encoder can be used as the input f_0 of the shared layer S_1; f_0 passes through three convolution blocks identical to those in step S212 to obtain, in turn, a feature f_1 of size C×H/2×W/2, a feature f_2 of size C×H/4×W/4 and a feature f_3 of size C×H/8×W/8; f_3 then passes through a deconvolution block composed in sequence of an activation layer, a deconvolution layer, a normalization layer and an attention module to obtain a feature f_4 of size C×H/4×W/4; the activation layer in the deconvolution block uses the ReLU activation function, the deconvolution layer uses a deconvolution with a 4×4 kernel, a stride of 2 and a padding of 1, the normalization layer uses instance normalization, and the attention module uses the convolutional block attention module (CBAM); the feature f_2 and the feature f_4 are concatenated in the channel dimension to obtain a feature of size 2C×H/4×W/4, which passes through the above deconvolution block to obtain a feature f_5 of size C×H/2×W/2; the feature f_1 and the feature f_5 are concatenated in the channel dimension to obtain a feature of size 2C×H/2×W/2, which passes through the above deconvolution block to obtain a feature f_6 of size C×H×W; the feature f_0 and the feature f_6 are concatenated in the channel dimension to obtain the output feature f_s of size 2C×H×W; when the input feature comes from the encoder E_y, the output feature f_s is written S_1(E_y(·)), and when the input feature comes from the encoder E_x, it is written S_1(E_x(·));
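The shared layers S_1 and S_2 can be sketched as follows, reusing the CBAM module and the conv_block helper from the encoder sketch above; as in step S213, three downsampling blocks are followed by three upsampling blocks with skip connections realized by channel concatenation. This is a sketch under those assumptions, not the authors' exact implementation.

```python
import torch
import torch.nn as nn

def deconv_block(in_ch: int, out_ch: int) -> nn.Sequential:
    """ReLU -> 4x4 stride-2 transposed convolution -> instance norm -> CBAM (step S213)."""
    return nn.Sequential(nn.ReLU(),
                         nn.ConvTranspose2d(in_ch, out_ch, kernel_size=4, stride=2, padding=1),
                         nn.InstanceNorm2d(out_ch),
                         CBAM(out_ch))           # CBAM as defined in the encoder sketch

class SharedLayer(nn.Module):
    """U-Net-style shared layer (S_1 / S_2); the output keeps a skip to the input f_0."""
    def __init__(self, ch: int):
        super().__init__()
        self.d1, self.d2, self.d3 = conv_block(ch, ch), conv_block(ch, ch), conv_block(ch, ch)
        self.u1 = deconv_block(ch, ch)            # f3 -> f4
        self.u2 = deconv_block(2 * ch, ch)        # cat(f2, f4) -> f5
        self.u3 = deconv_block(2 * ch, ch)        # cat(f1, f5) -> f6

    def forward(self, f0):
        f1 = self.d1(f0)                                   # C x H/2 x W/2
        f2 = self.d2(f1)                                   # C x H/4 x W/4
        f3 = self.d3(f2)                                   # C x H/8 x W/8
        f4 = self.u1(f3)                                   # C x H/4 x W/4
        f5 = self.u2(torch.cat([f2, f4], dim=1))           # C x H/2 x W/2
        f6 = self.u3(torch.cat([f1, f5], dim=1))           # C x H   x W
        return torch.cat([f0, f6], dim=1)                  # f_s: 2C x H x W
```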
Step S214, designing the generator G_1; the input of the generator G_1 is a feature generated in step S213; the generator G_1 consists of five deconvolution blocks identical to those in step S213, a deconvolution layer with a 3×3 kernel, a stride of 1 and a padding of 1, and a tanh activation function; each deconvolution block halves the number of channels of the feature and doubles its width and height; when the input is the feature S_1(E_y(A_e)) of size 2C×H×W, the five deconvolution blocks produce a feature of size C/16×32H×32W, and this feature finally passes through the deconvolution layer and the tanh activation function to obtain a 3-channel word-effect-removed image of the same size as the input word effect image A_e; when the input is the feature S_1(E_x(A_s)) of size 2C×H×W, the five deconvolution blocks produce a feature of size C/16×32H×32W, and this feature finally passes through the deconvolution layer and the tanh activation function to obtain a 3-channel glyph reconstruction image of the same size as the input glyph image A_s;
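A matching decoder sketch for G_1 (and, with a wider input, G_2) is given below; it reuses the deconv_block helper from the shared-layer sketch, and the final 3×3 stride-1 layer with tanh follows step S214. The class name and default arguments are illustrative assumptions.

```python
import torch.nn as nn

class Generator(nn.Module):
    """Five deconvolution blocks that halve the channels and double the spatial size,
    followed by a 3x3 stride-1 layer and tanh producing a 3-channel image (step S214)."""
    def __init__(self, in_ch: int):
        super().__init__()
        chs = [in_ch // (2 ** i) for i in range(6)]        # in_ch, /2, /4, /8, /16, /32
        self.up = nn.Sequential(*[deconv_block(chs[i], chs[i + 1]) for i in range(5)])
        self.out = nn.Sequential(
            nn.ConvTranspose2d(chs[-1], 3, kernel_size=3, stride=1, padding=1),
            nn.Tanh())

    def forward(self, feature):
        return self.out(self.up(feature))                  # image values in [-1, 1]
```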
Step S215, designing the discrimination network of the word effect removal sub-network, which consists of a discriminator D_1; the discriminator D_1 adopts the structure of the Markovian discriminator in PatchGAN; the discriminator D_1 consists of 5 convolution blocks, each composed in sequence of a convolution layer, an instance normalization layer and an activation layer, where the convolution layer uses a convolution with a 4×4 kernel, a stride of 2 and a padding of 1, the normalization layer uses instance normalization, and the activation layer uses the LeakyReLU activation function; the input of the discriminator D_1 is either the result of concatenating A_e and the generated word-effect-removed image in the channel dimension, namely a fake sample, or the result of concatenating A_e and A_s in the channel dimension, namely a real sample; the output is an N×N×1 tensor, where N = 32, and the mean of this tensor is used to estimate the probability that the image is a real image and to compute the loss.
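A sketch of such a Markovian (PatchGAN) discriminator, usable for both D_1 and D_2, follows. The channel widths are assumptions, and the final 1-channel projection is left without normalization or activation (a common PatchGAN convention), whereas the granted text describes all five blocks uniformly; N = 32 corresponds to a 1024×1024 input and changes with the input resolution.

```python
import torch
import torch.nn as nn

class PatchDiscriminator(nn.Module):
    """Five 4x4 stride-2 convolutions with instance norm and LeakyReLU,
    producing an N x N x 1 patch score map."""
    def __init__(self, in_ch: int = 6, base: int = 64):
        # in_ch = total channels of the concatenated input (6 for D_1, 9 for D_2).
        super().__init__()
        layers, ch = [], in_ch
        for out_ch in [base, base * 2, base * 4, base * 8]:
            layers += [nn.Conv2d(ch, out_ch, 4, 2, 1),
                       nn.InstanceNorm2d(out_ch),
                       nn.LeakyReLU(0.2)]
            ch = out_ch
        layers.append(nn.Conv2d(ch, 1, 4, 2, 1))           # final patch score map
        self.net = nn.Sequential(*layers)

    def forward(self, cond, img):
        # For D_1: cond = A_e, img = A_s (real) or the generated word-effect-removed image (fake).
        return self.net(torch.cat([cond, img], dim=1))

# The scalar used in the losses is the mean over all patch scores:
# score = PatchDiscriminator()(A_e, candidate).mean()
```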
In an embodiment of the present invention, the step S22 is specifically implemented as follows:
Step S221, designing the generation network of the word effect migration sub-network, whose inputs are the glyph image A_s to be migrated and a target word effect image B_e; the generation network is composed of the encoders E_x and E_z, a shared layer S_2 and a generator G_2, where the encoders and the shared layers are used to extract image features and the generator reconstructs the features into the word effect migration image;
Step S222, designing the encoder E_z; E_z has the same structure as the encoders E_x and E_y in step S212, and inputting the target word effect image B_e into this encoder yields an image feature of size C×H×W;
Step S223, designing the shared layer S_2, which is a network with a U-Net structure; the structure of the shared layer S_2 is the same as that of the shared layer S_1 in step S213; its input is the feature of size C×H×W output by the encoder in step S222, and its output is a feature of size 2C×H×W;
Step S224, designing the generator G_2; the structure of the generator G_2 is the same as that of the generator G_1 in step S214, but its input is the feature obtained by concatenating the output feature of the shared layer S_1 and the output feature of the shared layer S_2 in the channel dimension, of size 4C×H×W, and its output is a 3-channel word effect migration image of the same size as the input image B_e;
Step S225, designing the discrimination network of the word effect migration sub-network, which consists of a discriminator D_2; the discriminator D_2 adopts the structure of the Markovian discriminator in PatchGAN and has the same structure as the discriminator D_1 in step S215; the input of the discriminator D_2 is either the result of concatenating A_s, B_e and the word effect migration image generated by the generator G_2 in the channel dimension, namely a fake sample, or the result of concatenating A_s, B_e and the standard word effect migration image A_e in the channel dimension, namely a real sample; the output is an N×N×1 tensor, where N = 32, and the mean of this tensor is used to estimate the probability that the image is a real image and to compute the loss.
In an embodiment of the present invention, the step S3 is specifically implemented as follows:
Step S31, designing the total optimization target of the whole network, which is:
min_{E,G} max_{D} (L_rec + L_desty + L_sty)
where L_rec denotes the loss function for training image encoding and reconstruction, L_desty denotes the loss function for training the word effect removal sub-network, L_sty denotes the loss function for training the word effect migration sub-network, and E, G and D denote the encoders (E_x, E_y, E_z), the generators (G_1, G_2) and the discriminators (D_1, D_2) respectively;
Step S32, designing the loss function for training image encoding and reconstruction; L_rec is computed as:
L_rec = λ_rec · 𝔼[ || G_1(S_1(E_x(x))) − x ||_1 ]
where x denotes a glyph image, G_1(S_1(E_x(x))) denotes the encoded and reconstructed image, λ_rec denotes a weight, E_x denotes the encoder E_x, S_1 denotes the shared layer S_1, G_1 denotes the generator G_1, and ||·||_1 denotes the L1 norm;
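As a one-line sketch, this reconstruction loss can be computed as below; the value λ_rec = 1.0 is an illustrative assumption, since the granted text does not give the weight.

```python
import torch.nn.functional as F

def reconstruction_loss(E_x, S1, G1, x, lambda_rec: float = 1.0):
    """L_rec = lambda_rec * || G_1(S_1(E_x(x))) - x ||_1 (step S32)."""
    return lambda_rec * F.l1_loss(G1(S1(E_x(x))), x)
```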
Step S33, designing the loss function for training the word effect removal sub-network; L_desty is computed as:
L_desty = λ_dfeat · L_dfeat + λ_dpix · L_dpix + λ_dadv · L_dadv
where L_dfeat denotes the feature loss, L_dpix denotes the word effect removal pixel loss, L_dadv denotes the word effect removal conditional adversarial loss, and λ_dfeat, λ_dpix and λ_dadv are the weights of the three parts;
The feature loss guides the encoders to remove the word effect from the word effect image and retain only the glyph information; L_dfeat is computed as:
L_dfeat = 𝔼[ || S_1(E_y(y)) − S_1(E_x(x)) ||_1 ]
where x denotes a glyph image, y denotes a word effect image, E_x and E_y denote the two encoders E_x and E_y, S_1 denotes the shared layer S_1, and ||·||_1 denotes the L1 norm;
The word effect removal pixel loss guides the generator to produce a word-effect-removed image that is closer to the content glyph image; L_dpix is computed as:
L_dpix = 𝔼[ || G_1(S_1(E_y(y))) − x ||_1 ]
where x denotes a glyph image, G_1 denotes the generator G_1, y denotes a word effect image, G_1(S_1(E_y(y))) denotes the word-effect-removed image, and ||·||_1 denotes the L1 norm;
The word effect removal conditional adversarial loss is used to improve the quality of the generated result, enabling the generator to learn to "fool" the discriminator while the discriminator learns to distinguish the real image from the generated image; the conditional adversarial loss of WGAN-GP is adopted, and L_dadv is computed as:
L_dadv = 𝔼[ D_1(y, x) ] − 𝔼[ D_1(y, G_1(S_1(E_y(y)))) ] − λ_gp · 𝔼[ ( || ∇_x̂ D_1(y, x̂) ||_2 − 1 )² ]
where λ_gp is the weight of the gradient penalty, x̂ is the result of fusing the glyph image and the generated word-effect-removed image, ∇_x̂ D_1(y, x̂) denotes the gradient of the discriminator D_1 with respect to x̂, and x̂ is computed as:
x̂ = ε · x + (1 − ε) · G_1(S_1(E_y(y)))
where ε denotes a fusion weight parameter;
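The conditional WGAN-GP terms of steps S33 and S34 can be sketched as below; the gradient-penalty weight of 10 and the uniform sampling of the fusion weight ε are the usual WGAN-GP choices and are assumptions here, not values stated in the granted text.

```python
import torch

def gradient_penalty(D, cond, real, fake, lambda_gp: float = 10.0):
    """Penalty on x_hat = eps * real + (1 - eps) * fake, pushing ||grad D|| towards 1."""
    eps = torch.rand(real.size(0), 1, 1, 1, device=real.device)
    x_hat = (eps * real + (1.0 - eps) * fake).requires_grad_(True)
    scores = D(cond, x_hat).mean(dim=(1, 2, 3))
    grads, = torch.autograd.grad(scores.sum(), x_hat, create_graph=True)
    return lambda_gp * ((grads.flatten(1).norm(2, dim=1) - 1.0) ** 2).mean()

def critic_loss(D, cond, real, fake):
    """Discriminator objective (written as a loss to minimize); the fake image is
    detached so the generator weights are untouched in this step."""
    fake = fake.detach()
    return D(cond, fake).mean() - D(cond, real).mean() + gradient_penalty(D, cond, real, fake)

def adversarial_g_loss(D, cond, fake):
    """Generator side of the conditional adversarial loss."""
    return -D(cond, fake).mean()
```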
Step S34, designing the loss function for training the word effect migration sub-network; L_sty is computed as:
L_sty = λ_spix · L_spix + λ_sadv · L_sadv
where L_spix denotes the word effect migration pixel loss, L_sadv denotes the word effect migration conditional adversarial loss, and λ_spix and λ_sadv are the weights of the two parts;
The word effect migration pixel loss guides the generator to produce an image that is closer to the target word effect image; L_spix is computed as:
L_spix = 𝔼[ || G_2(concat(S_1(E_x(x)), S_2(E_z(z)))) − y ||_1 ]
where x denotes a glyph image, y denotes the reference image that corresponds to x and carries the target word effect, z denotes a target word effect image, E_x denotes the encoder E_x, E_z denotes the style encoder E_z, S_1 and S_2 denote the two shared layers S_1 and S_2, G_2 denotes the word effect image generator G_2, and G_2(concat(S_1(E_x(x)), S_2(E_z(z)))) denotes the generated word effect migration image;
The word effect migration conditional adversarial loss L_sadv also follows the design in WGAN-GP and is computed as:
L_sadv = 𝔼[ D_2(x, z, y) ] − 𝔼[ D_2(x, z, G_2(concat(S_1(E_x(x)), S_2(E_z(z))))) ] − λ_gp · 𝔼[ ( || ∇_ŷ D_2(x, z, ŷ) ||_2 − 1 )² ]
where λ_gp is the weight of the gradient penalty, ŷ is the result of fusing the standard word effect image and the generated word effect image, ∇_ŷ D_2(x, z, ŷ) denotes the gradient of the discriminator D_2 with respect to ŷ, and ŷ is computed as:
ŷ = ε · y + (1 − ε) · G_2(concat(S_1(E_x(x)), S_2(E_z(z))))
where ε denotes a fusion weight parameter.
In an embodiment of the present invention, the step S4 is specifically implemented as follows:
Step S41, randomly selecting two image pairs with the same word effect from the data set, where the glyph image and the word effect image of the first pair are denoted A_s and A_e respectively, and the glyph image and the word effect image of the second pair are denoted B_s and B_e respectively;
Step S42, training image encoding and reconstruction; the glyph image A_s is input and passes through the encoder E_x, the shared layer S_1 and the generator G_1 to generate the encoded and reconstructed image; the loss in step S32 is calculated, the gradients of the parameters in E_x, S_1 and G_1 are computed by back propagation, and the parameters are updated with the Adam optimization method;
Step S43, training the discrimination network of the word effect removal sub-network; the word effect image A_e is input and passes through the encoder E_y, the shared layer S_1 and the generator G_1 to obtain the word-effect-removed image; the word effect removal conditional adversarial loss L_dadv is calculated, the gradients of the parameters in the discriminator D_1 are computed by back propagation, and the parameters are updated with the Adam optimization method;
Step S44, training the generation network of the word effect removal sub-network; the word effect image A_e is input to obtain the feature S_1(E_y(A_e)) through the encoder E_y and the shared layer S_1, and the word-effect-removed image is then obtained through the generator G_1; the glyph image A_s is input to obtain the feature S_1(E_x(A_s)) through the encoder E_x and the shared layer S_1; the total loss function L_desty of the word effect removal sub-network is calculated, the gradients of the parameters in E_y, S_1 and G_1 are computed by back propagation, and the parameters are updated with the Adam optimization method;
Step S45, training the word effect migration sub-network in two stages, where each stage trains the discrimination network and the generation network of the word effect migration sub-network; in the first stage of training the discrimination network, the glyph image A_s is input to obtain the feature S_1(E_x(A_s)) through the encoder E_x and the shared layer S_1, the word effect image B_e is input to obtain the feature S_2(E_z(B_e)) through the encoder E_z and the shared layer S_2, the two features are concatenated in the channel dimension as the input of the generator G_2 to generate the word effect migration image, the word effect migration conditional adversarial loss L_sadv is calculated, the gradients of the parameters in the discriminator D_2 are computed by back propagation, and the parameters are updated with the Adam optimization method; in the first stage of training the generation network, the input images are the same as for training the discrimination network, the word effect migration image is generated, the total loss function L_sty of the word effect migration sub-network is calculated, the gradients of the parameters in E_x, E_z, S_1, S_2 and G_2 are computed by back propagation, and the parameters are updated with the Adam optimization method;
Step S46, training the second stage of the discrimination network of the word effect migration sub-network; the glyph image B_s is input to obtain the feature S_1(E_x(B_s)) through the encoder E_x and the shared layer S_1, the word effect migration image generated in step S45 is input to obtain a feature through the encoder E_z and the shared layer S_2, the two features are concatenated in the channel dimension as the input of the generator G_2 to generate a word effect migration image, the word effect migration conditional adversarial loss L_sadv is calculated, the gradients of the parameters in the discriminator D_2 are computed by back propagation, and the parameters are updated with the Adam optimization method; in the second stage of training the generation network of the word effect migration sub-network, the input images are the same as for training the discriminator, a word effect migration image is generated, the total loss function L_sty of the word effect migration sub-network is calculated, the gradients of the parameters in E_x, E_z, S_1, S_2 and G_2 are computed by back propagation, and the parameters are updated with the Adam optimization method;
Step S47, the above steps constitute one complete iteration for two image pairs; the whole training process requires two hundred and fifty iterations; in each iteration, several groups of two image pairs with the same word effect are randomly sampled and used as a batch for training; the first hundred iterations do not perform the second-stage training of step S46, and the remaining one hundred and fifty iterations perform all of the training steps.
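Putting steps S41 to S45 together, one training iteration can be sketched as follows, using the modules and loss helpers from the earlier sketches. The optimizer hyper-parameters and the unit loss weights are assumptions, and the second-stage cycle training of step S46 is omitted for brevity.

```python
import itertools
import torch
import torch.nn.functional as F

# Assumed to be instantiated from the earlier sketches:
# E_x, E_y, E_z (Encoder), S1, S2 (SharedLayer), G1, G2 (Generator), D1, D2 (PatchDiscriminator).
opt_G = torch.optim.Adam(itertools.chain(E_x.parameters(), E_y.parameters(), E_z.parameters(),
                                         S1.parameters(), S2.parameters(),
                                         G1.parameters(), G2.parameters()),
                         lr=2e-4, betas=(0.5, 0.9))
opt_D = torch.optim.Adam(itertools.chain(D1.parameters(), D2.parameters()),
                         lr=2e-4, betas=(0.5, 0.9))

def train_step(A_s, A_e, B_s, B_e):
    # S42: glyph encoding and reconstruction.
    loss_rec = reconstruction_loss(E_x, S1, G1, A_s)

    # S43: discriminator of the word effect removal sub-network.
    removed = G1(S1(E_y(A_e)))
    opt_D.zero_grad(); critic_loss(D1, A_e, A_s, removed).backward(); opt_D.step()

    # S44: generator branch of the word effect removal sub-network (unit weights assumed).
    loss_desty = (F.l1_loss(S1(E_y(A_e)), S1(E_x(A_s)))      # feature loss  L_dfeat
                  + F.l1_loss(removed, A_s)                   # pixel loss    L_dpix
                  + adversarial_g_loss(D1, A_e, removed))     # adversarial   L_dadv

    # S45, first stage: migrate the word effect of B_e onto the glyph A_s; A_e is the reference.
    cond = torch.cat([A_s, B_e], dim=1)
    migrated = G2(torch.cat([S1(E_x(A_s)), S2(E_z(B_e))], dim=1))
    opt_D.zero_grad(); critic_loss(D2, cond, A_e, migrated).backward(); opt_D.step()
    loss_sty = F.l1_loss(migrated, A_e) + adversarial_g_loss(D2, cond, migrated)

    # Joint update of the encoders, shared layers and generators.
    opt_G.zero_grad(); (loss_rec + loss_desty + loss_sty).backward(); opt_G.step()
```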
In an embodiment of the present invention, the step S5 is specifically implemented as follows:
The glyph image to be migrated is input to obtain the glyph feature through the encoder E_x and the shared layer S_1; the target word effect image is input to obtain the word effect feature through the encoder E_z and the shared layer S_2; the two features are concatenated in the channel dimension as the input of the generator G_2, generating a word effect migration image that has the word effect of the word effect image and the glyph of the glyph image.
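At inference time, step S5 reduces to a single forward pass, as sketched below; the rescaling of the tanh output back to [0, 1] for saving matches the [-1, 1] normalization assumed in the preprocessing sketch.

```python
import torch

@torch.no_grad()
def apply_word_effect(E_x, E_z, S1, S2, G2, glyph_img, target_effect_img):
    """Step S5: produce an image with the glyph of glyph_img and the word effect of
    target_effect_img using the trained migration sub-network."""
    glyph_feat = S1(E_x(glyph_img))
    effect_feat = S2(E_z(target_effect_img))
    out = G2(torch.cat([glyph_feat, effect_feat], dim=1))
    return (out + 1.0) / 2.0        # map the [-1, 1] tanh output back to [0, 1]
```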
Compared with the prior art, the invention has the following beneficial effects: the invention aims to solve the problem that traditional style migration models cannot separate glyphs and word effects well or generate high-quality word effect migration images. The invention provides a word effect style migration method based on a cycle generative adversarial network that trains the network to complete the two tasks of word effect migration and word effect removal simultaneously, so that the network learns to decouple and recombine the glyph features and the word effect features of characters. The network is trained in a cycle-generation adversarial manner, and an attention mechanism is introduced to strengthen the model's supervision of the glyph edge structure, ensuring that the network can generate word effect migration images of high quality and strong structure.
Drawings
FIG. 1 is a flow chart of a method according to an embodiment of the invention.
FIG. 2 is a word effect migration model according to an embodiment of the present invention.
Fig. 3 is a training encoding and reconstruction process according to an embodiment of the present invention.
Fig. 4 is a training word effect removal process according to an embodiment of the present invention.
Fig. 5 is a training word effect migration sub-network process according to an embodiment of the invention.
FIG. 6 is a cycle-generation adversarial training process in accordance with an embodiment of the present invention.
Detailed Description
The technical scheme of the invention is specifically described below with reference to the accompanying drawings.
The invention provides a word effect style migration method based on a cycle generative adversarial network, which, as shown in figure 1, comprises the following steps:
Step S1, processing the data set to obtain glyph image and word effect image pairs, and preprocessing each image to obtain a training data set;
Step S2, designing a word effect style migration network based on a cycle generative adversarial network, where the network consists of a word effect removal sub-network and a word effect migration sub-network;
Step S3, designing loss functions for training the network designed in step S2;
Step S4, training the word effect style migration network based on the cycle generative adversarial network using the training data set;
Step S5, inputting a new target word effect image and the glyph image to be migrated into the trained word effect style migration network, and outputting an image with the target word effect after word effect migration.
Further, step S1 includes the steps of:
Step S11, splitting each image in the original input data set into a separate glyph image and a word effect image of the same size, and combining the glyph image and the word effect image that come from the same original image into a glyph image and word effect image pair.
Step S12, applying the same random flip to the two images of each glyph image and word effect image pair and center-cropping them to H×W, so that the correspondence between the glyph image and the word effect image of a pair is maintained.
Step S13, normalizing all H×W glyph images and word effect images: given an image I(i, j), the normalized image Î(i, j) is computed pixel by pixel, where (i, j) denotes the position of the pixel.
Further, step S2 includes the steps of:
Step S21, designing the generation network and the discrimination network of the word effect removal sub-network; the generation network is used to generate the word-effect-removed image, and the discrimination network is used to estimate the probability that an image is a real image.
Step S22, designing the generation network and the discrimination network of the word effect migration sub-network; the generation network is used to generate the word effect migration image, and the discrimination network is used to estimate the probability that an image is a real image.
Further, step S21 includes the steps of:
Step S211, designing the generation network of the word effect removal sub-network, where the input of the network is a glyph image and word effect image pair, the glyph image being denoted A_s and the word effect image A_e. The generation network is composed of encoders E_x and E_y and a shared layer S_1, which extract image features, and a generator G_1, which reconstructs the features into the word-effect-removed image.
Step S212, designing the encoders E_y and E_x, whose inputs are A_e and A_s respectively. The encoder E_y consists of five convolution blocks, each composed in sequence of an activation layer, a convolution layer, a normalization layer and an attention module. The activation layer uses the LeakyReLU activation function, the convolution layer uses a convolution with a 4×4 kernel, a stride of 2 and a padding of 1, the normalization layer uses instance normalization, and the attention module uses the convolutional block attention module (CBAM). Inputting an image into the encoder E_y yields an image feature of size C×H×W; the encoder E_x has the same structure as the encoder E_y, and inputting an image into the encoder E_x likewise yields a C×H×W image feature.
Step S213, designing the shared layer S_1, which is a network with a U-Net structure. The feature of size C×H×W obtained in step S212 from either encoder can be used as the input f_0 of the shared layer S_1. f_0 passes through three convolution blocks identical to those in step S212 to obtain, in turn, a feature f_1 of size C×H/2×W/2, a feature f_2 of size C×H/4×W/4 and a feature f_3 of size C×H/8×W/8. f_3 then passes through a deconvolution block composed in sequence of an activation layer, a deconvolution layer, a normalization layer and an attention module to obtain a feature f_4 of size C×H/4×W/4. The activation layer in the deconvolution block uses the ReLU activation function, the deconvolution layer uses a deconvolution with a 4×4 kernel, a stride of 2 and a padding of 1, the normalization layer uses instance normalization, and the attention module uses the convolutional block attention module (CBAM). The feature f_2 and the feature f_4 are concatenated in the channel dimension to obtain a feature of size 2C×H/4×W/4, which passes through the above deconvolution block to obtain a feature f_5 of size C×H/2×W/2. The feature f_1 and the feature f_5 are concatenated in the channel dimension to obtain a feature of size 2C×H/2×W/2, which passes through the above deconvolution block to obtain a feature f_6 of size C×H×W. The feature f_0 and the feature f_6 are concatenated in the channel dimension to obtain the output feature f_s of size 2C×H×W. When the input feature comes from the encoder E_y, the output feature f_s is written S_1(E_y(·)), and when the input feature comes from the encoder E_x, it is written S_1(E_x(·)).
Step S214, designing the generator G_1. The input of the generator G_1 is a feature generated in step S213. The generator G_1 consists of five deconvolution blocks identical to those in step S213, a deconvolution layer with a 3×3 kernel, a stride of 1 and a padding of 1, and a tanh activation function. Each deconvolution block halves the number of channels of the feature and doubles its width and height. When the input is the feature S_1(E_y(A_e)) of size 2C×H×W, the five deconvolution blocks produce a feature of size C/16×32H×32W, and this feature finally passes through the deconvolution layer and the tanh activation function to obtain a 3-channel word-effect-removed image of the same size as the input word effect image A_e. When the input is the feature S_1(E_x(A_s)) of size 2C×H×W, the five deconvolution blocks produce a feature of size C/16×32H×32W, and this feature finally passes through the deconvolution layer and the tanh activation function to obtain a 3-channel glyph reconstruction image of the same size as the input glyph image A_s.
Step S215, designing the discrimination network of the word effect removal sub-network, which consists of a discriminator D_1; the discriminator D_1 adopts the structure of the Markovian discriminator in PatchGAN. The discriminator D_1 consists of 5 convolution blocks, each composed in sequence of a convolution layer, an instance normalization layer and an activation layer, where the convolution layer uses a convolution with a 4×4 kernel, a stride of 2 and a padding of 1, the normalization layer uses instance normalization, and the activation layer uses the LeakyReLU activation function. The input of the discriminator D_1 is either the result of concatenating A_e and the generated word-effect-removed image in the channel dimension (i.e. a fake sample) or the result of concatenating A_e and A_s in the channel dimension (i.e. a real sample); the output is an N×N×1 tensor, where N = 32, and the mean of this tensor is used to estimate the probability that the image is a real image and to compute the loss.
Further, step S22 includes the steps of:
Step S221, designing the generation network of the word effect migration sub-network, whose inputs are the glyph image A_s to be migrated and a target word effect image B_e. The generation network is composed of the encoders E_x and E_z, a shared layer S_2 and a generator G_2, where the encoders and the shared layers are used to extract image features and the generator reconstructs the features into the word effect migration image.
Step S222, designing the encoder E_z. E_z has the same structure as the encoders E_x and E_y in step S212, and inputting the target word effect image B_e into this encoder yields an image feature of size C×H×W.
Step S223, designing the shared layer S_2, which is a network with a U-Net structure. The structure of the shared layer S_2 is the same as that of the shared layer S_1 in step S213; its input is the feature of size C×H×W output by the encoder in step S222, and its output is a feature of size 2C×H×W.
Step S224, designing the generator G_2. The structure of the generator G_2 is the same as that of the generator G_1 in step S214, but its input is the feature obtained by concatenating the output feature of the shared layer S_1 and the output feature of the shared layer S_2 in the channel dimension, of size 4C×H×W, and its output is a 3-channel word effect migration image of the same size as the input image B_e.
Step S225, designing the discrimination network of the word effect migration sub-network, which consists of a discriminator D_2; the discriminator D_2 adopts the structure of the Markovian discriminator in PatchGAN and has the same structure as the discriminator D_1 in step S215. The input of the discriminator D_2 is either the result of concatenating A_s, B_e and the word effect migration image generated by the generator G_2 in the channel dimension (i.e. a fake sample) or the result of concatenating A_s, B_e and the standard word effect migration image A_e in the channel dimension (i.e. a real sample); the output is an N×N×1 tensor, where N = 32, and the mean of this tensor is used to estimate the probability that the image is a real image and to compute the loss.
Further, step S3 includes the steps of:
Step S31, designing the total optimization target of the whole network. The optimization target is:
min_{E,G} max_{D} (L_rec + L_desty + L_sty)
where L_rec denotes the loss function for training image encoding and reconstruction, L_desty denotes the loss function for training the word effect removal sub-network, L_sty denotes the loss function for training the word effect migration sub-network, and E, G and D denote the encoders (E_x, E_y, E_z), the generators (G_1, G_2) and the discriminators (D_1, D_2) respectively.
Step S32, designing the loss function for training image encoding and reconstruction. L_rec is computed as:
L_rec = λ_rec · 𝔼[ || G_1(S_1(E_x(x))) − x ||_1 ]
where x denotes a glyph image, G_1(S_1(E_x(x))) denotes the encoded and reconstructed image, λ_rec denotes a weight, E_x denotes the encoder E_x, S_1 denotes the shared layer S_1, G_1 denotes the generator G_1, and ||·||_1 denotes the L1 norm.
Step S33, designing the loss function for training the word effect removal sub-network. L_desty is computed as:
L_desty = λ_dfeat · L_dfeat + λ_dpix · L_dpix + λ_dadv · L_dadv
where L_dfeat denotes the feature loss, L_dpix denotes the word effect removal pixel loss, L_dadv denotes the word effect removal conditional adversarial loss, and λ_dfeat, λ_dpix and λ_dadv are the weights of the three parts.
The feature loss guides the encoders to remove the word effect from the word effect image and retain only the glyph information. L_dfeat is computed as:
L_dfeat = 𝔼[ || S_1(E_y(y)) − S_1(E_x(x)) ||_1 ]
where x denotes a glyph image, y denotes a word effect image, E_x and E_y denote the two encoders E_x and E_y, S_1 denotes the shared layer S_1, and ||·||_1 denotes the L1 norm.
The word effect removal pixel loss guides the generator to produce a word-effect-removed image that is closer to the content glyph image. L_dpix is computed as:
L_dpix = 𝔼[ || G_1(S_1(E_y(y))) − x ||_1 ]
where x denotes a glyph image, G_1 denotes the generator G_1, y denotes a word effect image, G_1(S_1(E_y(y))) denotes the word-effect-removed image, and ||·||_1 denotes the L1 norm.
The word effect removal conditional adversarial loss is used to improve the quality of the generated result, enabling the generator to learn to "fool" the discriminator while the discriminator learns to distinguish the real image from the generated image; the conditional adversarial loss of WGAN-GP is adopted, and L_dadv is computed as:
L_dadv = 𝔼[ D_1(y, x) ] − 𝔼[ D_1(y, G_1(S_1(E_y(y)))) ] − λ_gp · 𝔼[ ( || ∇_x̂ D_1(y, x̂) ||_2 − 1 )² ]
where λ_gp is the weight of the gradient penalty, x̂ is the result of fusing the glyph image and the generated word-effect-removed image, ∇_x̂ D_1(y, x̂) denotes the gradient of the discriminator D_1 with respect to x̂, and x̂ is computed as:
x̂ = ε · x + (1 − ε) · G_1(S_1(E_y(y)))
where ε denotes a fusion weight parameter.
Step S34, designing the loss function for training the word effect migration sub-network. L_sty is computed as:
L_sty = λ_spix · L_spix + λ_sadv · L_sadv
where L_spix denotes the word effect migration pixel loss, L_sadv denotes the word effect migration conditional adversarial loss, and λ_spix and λ_sadv are the weights of the two parts.
The word effect migration pixel loss guides the generator to produce an image that is closer to the target word effect image. L_spix is computed as:
L_spix = 𝔼[ || G_2(concat(S_1(E_x(x)), S_2(E_z(z)))) − y ||_1 ]
where x denotes a glyph image, y denotes the reference image that corresponds to x and carries the target word effect, z denotes a target word effect image, E_x denotes the encoder E_x, E_z denotes the style encoder E_z, S_1 and S_2 denote the two shared layers S_1 and S_2, G_2 denotes the word effect image generator G_2, and G_2(concat(S_1(E_x(x)), S_2(E_z(z)))) denotes the generated word effect migration image.
The word effect migration conditional adversarial loss L_sadv also follows the design in WGAN-GP and is computed as:
L_sadv = 𝔼[ D_2(x, z, y) ] − 𝔼[ D_2(x, z, G_2(concat(S_1(E_x(x)), S_2(E_z(z))))) ] − λ_gp · 𝔼[ ( || ∇_ŷ D_2(x, z, ŷ) ||_2 − 1 )² ]
where λ_gp is the weight of the gradient penalty, ŷ is the result of fusing the standard word effect image and the generated word effect image, ∇_ŷ D_2(x, z, ŷ) denotes the gradient of the discriminator D_2 with respect to ŷ, and ŷ is computed as:
ŷ = ε · y + (1 − ε) · G_2(concat(S_1(E_x(x)), S_2(E_z(z))))
where ε denotes a fusion weight parameter.
Further, step S4 includes the steps of:
Step S41, two image pairs with the same word effect are randomly selected from the data set, where the glyph image and the word effect image of the first pair are denoted A_s and A_e respectively, and the glyph image and the word effect image of the second pair are denoted B_s and B_e respectively.
Step S42, training image encoding and reconstruction. The glyph image A_s is input and passes through the encoder E_x, the shared layer S_1 and the generator G_1 to generate the encoded and reconstructed image; the loss in step S32 is calculated, the gradients of the parameters in E_x, S_1 and G_1 are computed by back propagation, and the parameters are updated with the Adam optimization method.
Step S43, training the discrimination network of the word effect removal sub-network. The word effect image A_e is input and passes through the encoder E_y, the shared layer S_1 and the generator G_1 to obtain the word-effect-removed image; the word effect removal conditional adversarial loss L_dadv is calculated, the gradients of the parameters in the discriminator D_1 are computed by back propagation, and the parameters are updated with the Adam optimization method.
Step S44, training the generation network of the word effect removal sub-network. The word effect image A_e is input to obtain the feature S_1(E_y(A_e)) through the encoder E_y and the shared layer S_1, and the word-effect-removed image is then obtained through the generator G_1; the glyph image A_s is input to obtain the feature S_1(E_x(A_s)) through the encoder E_x and the shared layer S_1. The total loss function L_desty of the word effect removal sub-network is calculated, the gradients of the parameters in E_y, S_1 and G_1 are computed by back propagation, and the parameters are updated with the Adam optimization method.
Step S45, training the word effect migration sub-network in two stages, where each stage trains the discrimination network and the generation network of the word effect migration sub-network. In the first stage of training the discrimination network, the glyph image A_s is input to obtain the feature S_1(E_x(A_s)) through the encoder E_x and the shared layer S_1, the word effect image B_e is input to obtain the feature S_2(E_z(B_e)) through the encoder E_z and the shared layer S_2, the two features are concatenated in the channel dimension as the input of the generator G_2 to generate the word effect migration image, the word effect migration conditional adversarial loss L_sadv is calculated, the gradients of the parameters in the discriminator D_2 are computed by back propagation, and the parameters are updated with the Adam optimization method. In the first stage of training the generation network, the input images are the same as for training the discrimination network, the word effect migration image is generated, the total loss function L_sty of the word effect migration sub-network is calculated, the gradients of the parameters in E_x, E_z, S_1, S_2 and G_2 are computed by back propagation, and the parameters are updated with the Adam optimization method.
Step S46, training the second stage of the discrimination network of the word effect migration sub-network. The glyph image B_s is input to obtain the feature S_1(E_x(B_s)) through the encoder E_x and the shared layer S_1, the word effect migration image generated in step S45 is input to obtain a feature through the encoder E_z and the shared layer S_2, the two features are concatenated in the channel dimension as the input of the generator G_2 to generate a word effect migration image, the word effect migration conditional adversarial loss L_sadv is calculated, the gradients of the parameters in the discriminator D_2 are computed by back propagation, and the parameters are updated with the Adam optimization method. In the second stage of training the generation network of the word effect migration sub-network, the input images are the same as for training the discriminator, a word effect migration image is generated, the total loss function L_sty of the word effect migration sub-network is calculated, the gradients of the parameters in E_x, E_z, S_1, S_2 and G_2 are computed by back propagation, and the parameters are updated with the Adam optimization method.
Step S47, the above steps constitute one complete iteration for two image pairs; the whole training process requires two hundred and fifty iterations; in each iteration, several groups of two image pairs with the same word effect are randomly sampled and used as a batch for training; the first hundred iterations do not perform the second-stage training of step S46, and the remaining one hundred and fifty iterations perform all of the training steps.
Further, step S5 is implemented as follows:
The glyph image to be migrated is input to obtain the glyph feature through the encoder E_x and the shared layer S_1; the target word effect image is input to obtain the word effect feature through the encoder E_z and the shared layer S_2; the two features are concatenated in the channel dimension as the input of the generator G_2, generating a word effect migration image that has the word effect of the word effect image and the glyph of the glyph image.
The above is a preferred embodiment of the present invention, and all changes made according to the technical solution of the present invention belong to the protection scope of the present invention when the generated functional effects do not exceed the scope of the technical solution of the present invention.

Claims (2)

1. A word effect migration method based on a cycle generative adversarial network, comprising the following steps:
Step S1, processing the data set to obtain glyph image and word effect image pairs, and preprocessing each image to obtain a training data set;
Step S2, designing a word effect style migration network based on a cycle generative adversarial network, where the network consists of a word effect removal sub-network and a word effect migration sub-network;
Step S3, designing loss functions for training the network designed in step S2;
Step S4, training the word effect style migration network based on the cycle generative adversarial network using the training data set;
Step S5, inputting a new target word effect image and the glyph image to be migrated into the trained word effect style migration network, and outputting an image with the target word effect after word effect migration;
the step S1 is specifically implemented as follows:
Step S11, splitting each image in the original input data set into a separate glyph image and a word effect image of the same size, and combining the glyph image and the word effect image that come from the same original image into a glyph image and word effect image pair;
Step S12, applying the same random flip to the two images of each glyph image and word effect image pair and center-cropping them to H×W, so that the correspondence between the glyph image and the word effect image of a pair is maintained;
Step S13, normalizing all H×W glyph images and word effect images: given an image I(i, j), the normalized image Î(i, j) is computed pixel by pixel, where (i, j) denotes the position of the pixel;
the step S2 is specifically implemented as follows:
Step S21, designing the generation network and the discrimination network of the word effect removal sub-network; the generation network is used to generate the word-effect-removed image, and the discrimination network is used to estimate the probability that an image is a real image;
Step S22, designing the generation network and the discrimination network of the word effect migration sub-network; the generation network is used to generate the word effect migration image, and the discrimination network is used to estimate the probability that an image is a real image;
the step S21 is specifically implemented as follows:
Step S211, designing the generation network of the word effect removal sub-network, where the input of the generation network is a glyph image and word effect image pair, the glyph image being denoted A_s and the word effect image A_e; the generation network is composed of encoders E_x and E_y and a shared layer S_1, which extract image features, and a generator G_1, which reconstructs the features into the word-effect-removed image;
Step S212, designing the encoders E_y and E_x, whose inputs are A_e and A_s respectively; the encoder E_y consists of five convolution blocks, each composed in sequence of an activation layer, a convolution layer, a normalization layer and an attention module; the activation layer uses the LeakyReLU activation function, the convolution layer uses a convolution with a 4×4 kernel, a stride of 2 and a padding of 1, the normalization layer uses instance normalization, and the attention module uses the convolutional block attention module (CBAM); inputting an image into the encoder E_y yields an image feature of size C×H×W; the encoder E_x has the same structure as the encoder E_y, and inputting an image into the encoder E_x likewise yields a C×H×W image feature;
Step S213, designing the shared layer S_1, which is a network with a U-Net structure; the feature of size C×H×W obtained in step S212 from either encoder is used as the input f_0 of the shared layer S_1; f_0 passes through three convolution blocks identical to those in step S212 to obtain, in turn, a feature f_1 of size C×H/2×W/2, a feature f_2 of size C×H/4×W/4 and a feature f_3 of size C×H/8×W/8; f_3 then passes through a deconvolution block composed in sequence of an activation layer, a deconvolution layer, a normalization layer and an attention module to obtain a feature f_4 of size C×H/4×W/4; the activation layer in the deconvolution block uses the ReLU activation function, the deconvolution layer uses a deconvolution with a 4×4 kernel, a stride of 2 and a padding of 1, the normalization layer uses instance normalization, and the attention module uses the convolutional block attention module (CBAM); the feature f_2 and the feature f_4 are concatenated in the channel dimension to obtain a feature of size 2C×H/4×W/4, which passes through the above deconvolution block to obtain a feature f_5 of size C×H/2×W/2; the feature f_1 and the feature f_5 are concatenated in the channel dimension to obtain a feature of size 2C×H/2×W/2, which passes through the above deconvolution block to obtain a feature f_6 of size C×H×W; the feature f_0 and the feature f_6 are concatenated in the channel dimension to obtain the output feature f_s of size 2C×H×W; when the input feature comes from the encoder E_y, the output feature f_s is written S_1(E_y(·)), and when the input feature comes from the encoder E_x, it is written S_1(E_x(·));
Step S214, designing the generator G_1; the input of the generator G_1 is a feature generated in step S213; the generator G_1 consists of five deconvolution blocks identical to those in step S213, a deconvolution layer with a 3×3 kernel, a stride of 1 and a padding of 1, and a tanh activation function; each deconvolution block halves the number of channels of the feature and doubles its width and height; when the input is the feature S_1(E_y(A_e)) of size 2C×H×W, the five deconvolution blocks produce a feature of size C/16×32H×32W, and this feature finally passes through the deconvolution layer and the tanh activation function to obtain a 3-channel word-effect-removed image of the same size as the input word effect image A_e; when the input is the feature S_1(E_x(A_s)) of size 2C×H×W, the five deconvolution blocks produce a feature of size C/16×32H×32W, and this feature finally passes through the deconvolution layer and the tanh activation function to obtain a 3-channel glyph reconstruction image of the same size as the input glyph image A_s;
Step S215, designing the discrimination network of the word effect removal sub-network, which consists of a discriminator D_1; the discriminator D_1 adopts the structure of the Markovian discriminator in PatchGAN; the discriminator D_1 consists of 5 convolution blocks, each composed in sequence of a convolution layer, an instance normalization layer and an activation layer, where the convolution layer uses a convolution with a 4×4 kernel, a stride of 2 and a padding of 1, the normalization layer uses instance normalization, and the activation layer uses the LeakyReLU activation function; the input of the discriminator D_1 is either the result of concatenating A_e and the generated word-effect-removed image in the channel dimension, namely a fake sample, or the result of concatenating A_e and A_s in the channel dimension, namely a real sample; the output is an N×N×1 tensor, where N = 32, and the mean of this tensor is used to estimate the probability that the image is a real image and to compute the loss;
The step S22 is specifically implemented as follows:
Step S221, designing a generating network of a word effect migration subnet, wherein the inputs of the generating network are a word pattern image A s to be migrated and a target word effect image B e; the generating network is composed of an encoder E x、Ez, a sharing layer S 2 and a generator G 2, wherein the encoder and the sharing layer are used for extracting image features, and the generator reconstructs the features into word-effect migration images
Step S222, designing the encoder E z,Ez, and the encoders E x and E y in step S212 to have the same structure, and inputting the target byte image B e into the encoder to obtain image characteristics with the size of C×H×W
Step S223, designing the shared layer S2; S2 is a network with a Unet structure and has the same structure as the shared layer S1 in step S213; its input is the feature Ez(Be) of size C×H×W output by the encoder in step S222, and its output is a feature S2(Ez(Be)) of size 2C×H×W;
Step S224, designing the generator G2; the structure of the generator G2 is the same as that of the generator G1 in step S214, but its input is the feature obtained by concatenating, in the channel dimension, the output feature S1(Ex(As)) of the shared layer S1 and the output feature S2(Ez(Be)) of the shared layer S2, giving a feature of size 4C×H×W; the output is a 3-channel word effect migration image of the same size as the input image Be;
Step S225, designing the discrimination network of the word effect migration sub-network; the discrimination network consists of a discriminator D2, which adopts the structure of the Markov discriminator in PatchGAN and has the same structure as the discriminator D1 in step S215; the input of the discriminator D2 is either the result of concatenating As, Be and the word effect migration image generated by the generator G2 in the channel dimension, namely a fake sample, or the result of concatenating As, Be and the standard word effect migration image Ae in the channel dimension, namely a real sample; the output is an N×N×1 tensor, where N=32, and the probability that the image is a real image is judged, and the loss is calculated, from the mean value of this tensor;
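As a usage illustration of the sketches above, D2's conditional inputs can be assembled as follows; the random tensors stand in for real data and the 256×256 resolution is an assumption.

```python
import torch

# PatchDiscriminator is the sketch above; in_ch=9 because the condition (6 channels)
# and the candidate image (3 channels) are concatenated along the channel axis.
A_s  = torch.randn(1, 3, 256, 256)   # glyph image to be migrated
B_e  = torch.randn(1, 3, 256, 256)   # target word effect image
A_e  = torch.randn(1, 3, 256, 256)   # standard word effect migration image
fake = torch.randn(1, 3, 256, 256)   # word effect migration image produced by G2

D2 = PatchDiscriminator(in_ch=9)
cond = torch.cat([A_s, B_e], dim=1)  # condition: glyph image + target word effect image
fake_score = D2(cond, fake)          # fake sample: (A_s, B_e, generated image)
real_score = D2(cond, A_e)           # real sample: (A_s, B_e, A_e)
```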
The step S3 is specifically implemented as follows:
Step S31, designing the total optimization objective of the whole network; the optimization objective is:

$$\min_{E,G}\max_{D}\ \mathcal{L}(E,G,D)=\mathcal{L}_{rec}+\mathcal{L}_{d}+\mathcal{L}_{s}$$

where $\mathcal{L}_{rec}$ denotes the loss function for training image encoding and reconstruction, $\mathcal{L}_{d}$ denotes the loss function for training the word effect removal sub-network, $\mathcal{L}_{s}$ denotes the loss function for training the word effect migration sub-network, and E, G and D denote the encoders (Ex, Ey, Ez), the generators (G1, G2) and the discriminators (D1, D2), respectively;
Step S32, designing the loss function for training image encoding and reconstruction; $\mathcal{L}_{rec}$ is calculated as:

$$\mathcal{L}_{rec}=\lambda_{rec}\left\|G_1(S_1(E_x(x)))-x\right\|_1$$

where x denotes a glyph image, G1(S1(Ex(x))) denotes the encoded and reconstructed image, λrec denotes a weight, Ex denotes the encoder Ex, S1 denotes the shared layer S1, G1 denotes the generator G1, and ‖·‖1 denotes the L1 norm;
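A minimal sketch of this reconstruction loss, assuming the encoder, shared layer and generator are callable modules as in the sketches above (the λrec value is a placeholder):

```python
import torch.nn.functional as F

def reconstruction_loss(x, Ex, S1, G1, lambda_rec=1.0):
    # L_rec = lambda_rec * || G1(S1(Ex(x))) - x ||_1
    return lambda_rec * F.l1_loss(G1(S1(Ex(x))), x)
```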
Step S33, designing the loss function for training the word effect removal sub-network; $\mathcal{L}_{d}$ is calculated as:

$$\mathcal{L}_{d}=\lambda_{dfeat}\,\mathcal{L}_{dfeat}+\lambda_{dpix}\,\mathcal{L}_{dpix}+\lambda_{dadv}\,\mathcal{L}_{dadv}$$

where $\mathcal{L}_{dfeat}$ denotes the feature loss, $\mathcal{L}_{dpix}$ denotes the word effect removal pixel loss, $\mathcal{L}_{dadv}$ denotes the word effect removal conditional adversarial loss, and λdfeat, λdpix and λdadv are the weights of the three parts;
The feature loss guides the encoder to remove the word effect from the word effect image and retain only the glyph information; $\mathcal{L}_{dfeat}$ is calculated as:

$$\mathcal{L}_{dfeat}=\left\|S_1(E_y(y))-S_1(E_x(x))\right\|_1$$

where x denotes a glyph image, y denotes a word effect image, Ex and Ey denote the two encoders Ex and Ey, S1 denotes the shared layer S1, and ‖·‖1 denotes the L1 norm;
The word effect removal pixel loss guides the generator to generate a word effect removal image whose content is closer to the glyph image; $\mathcal{L}_{dpix}$ is calculated as:

$$\mathcal{L}_{dpix}=\left\|G_1(S_1(E_y(y)))-x\right\|_1$$

where x denotes a glyph image, y denotes a word effect image, G1 denotes the generator G1, G1(S1(Ey(y))) denotes the word effect removal image, and ‖·‖1 denotes the L1 norm;
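The two losses above can be sketched in the same style as the reconstruction loss (again assuming the modules are callables; the weights λdfeat and λdpix are applied outside these helpers):

```python
import torch.nn.functional as F

def removal_feature_loss(x, y, Ex, Ey, S1):
    # L_dfeat = || S1(Ey(y)) - S1(Ex(x)) ||_1 : de-styled features of the word
    # effect image y should match the features of the corresponding glyph image x.
    return F.l1_loss(S1(Ey(y)), S1(Ex(x)))

def removal_pixel_loss(x, y, Ey, S1, G1):
    # L_dpix = || G1(S1(Ey(y))) - x ||_1 : the word effect removal image should
    # match the glyph image pixel by pixel.
    return F.l1_loss(G1(S1(Ey(y))), x)
```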
The word effect removal conditional adversarial loss is used to improve the quality of the generated result: it enables the generator to learn to "fool" the discriminator, and enables the discriminator to distinguish the real image from the generated image; the conditional adversarial loss in WGAN-GP is adopted, and $\mathcal{L}_{dadv}$ is calculated as:

$$\mathcal{L}_{dadv}=\mathbb{E}\big[D_1(y,G_1(S_1(E_y(y))))\big]-\mathbb{E}\big[D_1(y,x)\big]+\lambda_{gp}\,\mathbb{E}\Big[\big(\left\|\nabla_{\hat{x}}D_1(y,\hat{x})\right\|_2-1\big)^2\Big]$$

where λgp is the weight of the gradient penalty, $\hat{x}$ is the result of fusing the glyph image and the generated word effect removal image, $\nabla_{\hat{x}}D_1(y,\hat{x})$ denotes the gradient of the discriminator D1 with respect to $\hat{x}$, and $\hat{x}$ is calculated as:

$$\hat{x}=\epsilon\,x+(1-\epsilon)\,G_1(S_1(E_y(y)))$$

where ε denotes a fusion weight parameter;
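A hedged sketch of the conditional WGAN-GP terms follows. Sampling ε uniformly per example, the default λgp of 10, and the critic/generator sign convention are the usual WGAN-GP choices and are assumptions here; the patent only states the loss components in words.

```python
import torch

def gradient_penalty(D, cond, real, fake, lambda_gp=10.0):
    # Penalty on the fused sample x_hat; eps is the "fusion weight parameter".
    eps = torch.rand(real.size(0), 1, 1, 1, device=real.device)
    x_hat = (eps * real + (1.0 - eps) * fake).requires_grad_(True)
    grad = torch.autograd.grad(D(cond, x_hat), x_hat, create_graph=True)[0]
    return lambda_gp * ((grad.flatten(1).norm(2, dim=1) - 1.0) ** 2).mean()

def d_adv_loss(D, cond, real, fake, lambda_gp=10.0):
    # Critic side of the conditional WGAN-GP loss (minimized by the discriminator).
    fake = fake.detach()
    return D(cond, fake) - D(cond, real) + gradient_penalty(D, cond, real, fake, lambda_gp)

def g_adv_loss(D, cond, fake):
    # Generator side: raise the critic score of generated samples.
    return -D(cond, fake)
```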
Step S34, designing the loss function for training the word effect migration sub-network; $\mathcal{L}_{s}$ is calculated as:

$$\mathcal{L}_{s}=\lambda_{spix}\,\mathcal{L}_{spix}+\lambda_{sadv}\,\mathcal{L}_{sadv}$$

where $\mathcal{L}_{spix}$ denotes the word effect migration pixel loss, $\mathcal{L}_{sadv}$ denotes the word effect migration conditional adversarial loss, and λspix and λsadv are the weights of the two parts;
The word effect migration pixel loss guides the generator to generate an image that is closer to the target word effect image; $\mathcal{L}_{spix}$ is calculated as:

$$\mathcal{L}_{spix}=\left\|G_2(\mathrm{concat}(S_1(E_x(x)),S_2(E_z(z))))-y\right\|_1$$

where x denotes a glyph image, y denotes the reference image corresponding to x and having the target word effect, z denotes the target word effect image, Ex denotes the encoder Ex, Ez denotes the style encoder Ez, S1 and S2 denote the two shared layers S1 and S2, G2 denotes the word effect image generator G2, and G2(concat(S1(Ex(x)), S2(Ez(z)))) denotes the generated word effect migration image;
The word effect migration conditional adversarial loss $\mathcal{L}_{sadv}$ also uses the design in WGAN-GP and is calculated as:

$$\mathcal{L}_{sadv}=\mathbb{E}\big[D_2(x,z,G_2(\mathrm{concat}(S_1(E_x(x)),S_2(E_z(z)))))\big]-\mathbb{E}\big[D_2(x,z,y)\big]+\lambda_{gp}\,\mathbb{E}\Big[\big(\left\|\nabla_{\hat{y}}D_2(x,z,\hat{y})\right\|_2-1\big)^2\Big]$$

where λgp is the weight of the gradient penalty, $\hat{y}$ is the result of fusing the standard word effect image and the generated word effect image, $\nabla_{\hat{y}}D_2(x,z,\hat{y})$ denotes the gradient of the discriminator D2 with respect to $\hat{y}$, and $\hat{y}$ is calculated as:

$$\hat{y}=\epsilon\,y+(1-\epsilon)\,G_2(\mathrm{concat}(S_1(E_x(x)),S_2(E_z(z))))$$

where ε denotes a fusion weight parameter;
The step S4 is specifically implemented as follows:
Step S41, randomly selecting from the data set two pairs of images having the same word effect, wherein the font image and the word effect image of the first image pair are denoted As and Ae respectively, and the font image and the word effect image of the second image pair are denoted Bs and Be respectively;
Step S42, training image encoding and reconstruction; the glyph image As is input and, through the encoder Ex, the shared layer S1 and the generator G1, the encoded and reconstructed image G1(S1(Ex(As))) is generated; the loss $\mathcal{L}_{rec}$ of step S32 is calculated, the gradients of the parameters in Ex, S1 and G1 are calculated by the back propagation method, and the parameters are updated by the Adam optimization method;
Step S43, training the discrimination network of the word effect removal sub-network; the word effect image Ae is input and the word effect removal image G1(S1(Ey(Ae))) is obtained through the encoder Ey, the shared layer S1 and the generator G1; the word effect removal conditional adversarial loss $\mathcal{L}_{dadv}$ is calculated, the gradients of the parameters in the discriminator D1 are calculated by the back propagation method, and the parameters are updated by the Adam optimization method;
Step S44, training the generation network of the word effect removal sub-network; the word effect image Ae is input, the feature S1(Ey(Ae)) is obtained through the encoder Ey and the shared layer S1, and the word effect removal image G1(S1(Ey(Ae))) is obtained through the generator G1; the glyph image As is input and the feature S1(Ex(As)) is obtained through the encoder Ex and the shared layer S1; the total loss function $\mathcal{L}_{d}$ of the word effect removal sub-network is calculated, the gradients of the parameters in Ey, S1 and G1 are calculated by the back propagation method, and the parameters are updated by the Adam optimization method;
Step S45, training the word effect migration sub-network in two stages, where each stage trains the discrimination network and the generation network of the word effect migration sub-network; first stage of training the discrimination network of the word effect migration sub-network: the font image As is input and the feature S1(Ex(As)) is obtained through the encoder Ex and the shared layer S1; the word effect image Be is input and the feature S2(Ez(Be)) is obtained through the encoder Ez and the shared layer S2; S1(Ex(As)) and S2(Ez(Be)) are concatenated in the channel dimension as the input of the generator G2 to generate a word effect migration image; the word effect migration conditional adversarial loss $\mathcal{L}_{sadv}$ is calculated, the gradients of the parameters in the discriminator D2 are calculated by the back propagation method, and the parameters are updated by the Adam optimization method; first stage of training the generation network of the word effect migration sub-network: the input images are the same as for training the discrimination network, the word effect migration image is generated, the total loss function $\mathcal{L}_{s}$ of the word effect migration sub-network is calculated, the gradients of the parameters in Ex, Ez, S1, S2 and G2 are calculated by the back propagation method, and the parameters are updated by the Adam optimization method;
Step S46, second stage of training the discrimination network of the word effect migration sub-network: the font image Bs is input and the feature S1(Ex(Bs)) is obtained through the encoder Ex and the shared layer S1; the word effect migration image generated in step S45 is input and its feature is obtained through the encoder Ez and the shared layer S2; the two features are concatenated in the channel dimension as the input of the generator G2 to generate a word effect migration image; the word effect migration conditional adversarial loss $\mathcal{L}_{sadv}$ is calculated, the gradients of the parameters in the discriminator D2 are calculated by the back propagation method, and the parameters are updated by the Adam optimization method; second stage of training the generation network of the word effect migration sub-network: the input images are the same as for training the discrimination network, a word effect migration image is generated, the total loss function $\mathcal{L}_{s}$ of the word effect migration sub-network is calculated, the gradients of the parameters in Ex, Ez, S1, S2 and G2 are calculated by the back propagation method, and the parameters are updated by the Adam optimization method;
Step S47, the above steps constitute one complete iteration over the two image pairs; the whole training process requires two hundred and fifty iterations; in each iteration, several groups of two image pairs with the same word effect are randomly sampled and used as a batch for training; the first one hundred iterations do not perform the second-stage training of step S46, and the remaining one hundred and fifty iterations perform all the steps.
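For illustration, one full iteration of steps S42 to S46 can be condensed as below, reusing the modules and loss helpers sketched earlier. Using a single pair of Adam optimizers over all generator-side and all discriminator-side parameters, and the loss weights shown, are simplifying assumptions; the patent updates each step's own sub-modules separately and does not fix the weights here.

```python
import torch
import torch.nn.functional as F

def train_iteration(A_s, A_e, B_s, B_e, nets, opt_g, opt_d, do_stage_two,
                    l_feat=1.0, l_pix=10.0, l_adv=1.0):
    Ex, Ey, Ez, S1, S2, G1, G2, D1, D2 = nets

    # S42: image encoding and reconstruction.
    opt_g.zero_grad(); reconstruction_loss(A_s, Ex, S1, G1).backward(); opt_g.step()

    # S43: word effect removal sub-network, discriminator D1.
    fake_As = G1(S1(Ey(A_e)))
    opt_d.zero_grad(); d_adv_loss(D1, A_e, A_s, fake_As).backward(); opt_d.step()

    # S44: word effect removal sub-network, generation network.
    fake_As = G1(S1(Ey(A_e)))
    loss_d = (l_feat * removal_feature_loss(A_s, A_e, Ex, Ey, S1)
              + l_pix * removal_pixel_loss(A_s, A_e, Ey, S1, G1)
              + l_adv * g_adv_loss(D1, A_e, fake_As))
    opt_g.zero_grad(); loss_d.backward(); opt_g.step()

    # S45, stage 1: migrate A_s with the style of B_e; the result should match A_e.
    cond = torch.cat([A_s, B_e], 1)
    fake_Ae = G2(torch.cat([S1(Ex(A_s)), S2(Ez(B_e))], 1))
    opt_d.zero_grad(); d_adv_loss(D2, cond, A_e, fake_Ae).backward(); opt_d.step()
    fake_Ae = G2(torch.cat([S1(Ex(A_s)), S2(Ez(B_e))], 1))
    loss_s = l_pix * F.l1_loss(fake_Ae, A_e) + l_adv * g_adv_loss(D2, cond, fake_Ae)
    opt_g.zero_grad(); loss_s.backward(); opt_g.step()

    # S46, stage 2 (skipped for the first hundred iterations): migrate B_s with the
    # stage-1 output as the target word effect image; the result should match B_e.
    if do_stage_two:
        style = fake_Ae.detach()
        cond = torch.cat([B_s, style], 1)
        fake_Be = G2(torch.cat([S1(Ex(B_s)), S2(Ez(style))], 1))
        opt_d.zero_grad(); d_adv_loss(D2, cond, B_e, fake_Be).backward(); opt_d.step()
        fake_Be = G2(torch.cat([S1(Ex(B_s)), S2(Ez(style))], 1))
        loss_s = l_pix * F.l1_loss(fake_Be, B_e) + l_adv * g_adv_loss(D2, cond, fake_Be)
        opt_g.zero_grad(); loss_s.backward(); opt_g.step()
```

In such a setup, opt_g would typically be an Adam optimizer over all encoder, shared-layer and generator parameters and opt_d an Adam optimizer over D1 and D2, with train_iteration called once per sampled batch and do_stage_two enabled only after the first hundred iterations.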
2. The method for word effect migration based on loop generation countermeasure network according to claim 1, wherein the step S5 is specifically implemented as follows:
Inputting the font image to be migrated, and obtaining the font features through the encoder Ex and the shared layer S1; inputting the target word effect image, and obtaining the word effect features through the encoder Ez and the shared layer S2; concatenating the font features and the word effect features in the channel dimension as the input of the generator G2, and generating a word effect migration image having the word effect of the word effect image and the glyph of the font image.
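A sketch of this inference step, assuming the trained modules from the earlier sketches:

```python
import torch

@torch.no_grad()
def migrate(glyph_img, target_effect_img, Ex, Ez, S1, S2, G2):
    f_glyph = S1(Ex(glyph_img))                  # font features of the image to be migrated
    f_style = S2(Ez(target_effect_img))          # word effect features of the target image
    return G2(torch.cat([f_glyph, f_style], 1))  # migrated image: target effect, source glyph
```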
CN202210018956.9A 2022-01-07 2022-01-07 Word effect migration method based on loop generation countermeasure network Active CN114399427B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210018956.9A CN114399427B (en) 2022-01-07 2022-01-07 Word effect migration method based on loop generation countermeasure network

Publications (2)

Publication Number Publication Date
CN114399427A CN114399427A (en) 2022-04-26
CN114399427B true CN114399427B (en) 2024-06-28

Family

ID=81229329

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210018956.9A Active CN114399427B (en) 2022-01-07 2022-01-07 Word effect migration method based on loop generation countermeasure network

Country Status (1)

Country Link
CN (1) CN114399427B (en)

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10621760B2 (en) * 2018-06-15 2020-04-14 Adobe Inc. Synthesizing new font glyphs from partial observations
CN110097615B (en) * 2018-12-06 2021-01-19 北京大学 Stylized and de-stylized artistic word editing method and system
CN110570346B (en) * 2019-08-19 2022-11-04 西安理工大学 Method for performing style migration on calligraphy based on cyclic generation countermeasure network
CN112633430B (en) * 2020-12-25 2022-10-14 同济大学 Chinese font style migration method
CN113792743A (en) * 2021-08-24 2021-12-14 西安理工大学 Ancient book Chinese character image denoising method based on progressive confrontation generation

Patent Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112307714A (en) * 2020-11-03 2021-02-02 武汉理工大学 Character style migration method based on double-stage deep network

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Font style transfer method based on generative adversarial networks; Bai Haijuan; Zhou Wei; Wang Cunrui; Wang Lei; Journal of Dalian Minzu University; 2019-05-15 (No. 03); pp. 250-256 *

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant