CN111667006A - Method for generating family font based on AttGan model
- Publication number
- CN111667006A CN111667006A CN202010508917.8A CN202010508917A CN111667006A CN 111667006 A CN111667006 A CN 111667006A CN 202010508917 A CN202010508917 A CN 202010508917A CN 111667006 A CN111667006 A CN 111667006A
- Authority
- CN
- China
- Prior art keywords
- font
- image
- family
- data set
- attgan
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/214—Generating training patterns; Bootstrap methods, e.g. bagging or boosting
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/20—Image preprocessing
- G06V10/25—Determination of region of interest [ROI] or a volume of interest [VOI]
Abstract
A method for generating a family font based on an AttGan model belongs to the technical field of computer graphics processing. Built on an encoder-decoder structure, the method combines attribute classification constraints, reconstruction learning, and adversarial learning, so that a single model can modify several attributes of interest while leaving other regions unchanged. An attribute classification constraint layer is added to the AttGan network model; by modifying parameter attribute values of the family font, such as character weight and character height, the method realizes smooth interpolation between different fonts and generates transition font images with controllable style. The parameter attribute values determine the generated font style, so a complete font family can be constructed. This family font generation method greatly reduces designers' workload, shortens the design cycle, relieves designers of a large amount of repetitive work, and improves work efficiency.
Description
Technical Field
The invention relates to a method for generating family fonts by interpolation, and in particular to a method for generating a family font based on an AttGan model, belonging to the technical field of computer graphics processing.
Background
At present, Chinese character recognition has been studied extensively in China, and with the application of deep learning it has reached a high level of accuracy. Chinese character generation, by contrast, has received comparatively little attention at home and abroad, and remains considerably challenging: Chinese characters come in many styles, have complex glyph structures, and exist in enormous numbers. For this reason, much of the literature on font-generation algorithms focuses on English and Latin characters, whose character sets are many times smaller than the Chinese character set and whose style variation is far more limited. Font generation remains an active research direction, and because of the complex structure and large number of Chinese characters, generating Chinese fonts is considerably harder than generating English ones.
In recent years, generative models based on Generative Adversarial Networks (GANs) have become popular and achieved remarkable performance. However, the generation process of a plain GAN is uncontrollable and unpredictable, fine details are handled poorly, and the produced fonts often contain blurring and ghosting artifacts. Inspired by the success of GANs on generation tasks, the "pix2pix" framework learns a mapping from input to output images using conditional generation. In face generation in particular, GAN-based models generate new images under parameter control, such as face images with controllable hair color, age, and gender. The AttGan model was designed primarily for the facial attribute editing task, which aims to generate new images by manipulating single or multiple attributes of interest (e.g., hair color, expression, beard, and age), and has made remarkable progress on the traditional tasks of face recognition, large-scale labeled data sets, and face attribute prediction.
The design threshold of a font family is high. Although some capable designers can independently complete the design of a single font, completing the design of a whole font family is far more difficult. A font family is a product made for typesetting, in which different blocks such as headlines, side labels, body text, emphasis, references, and annotations are laid out; characters with different appearances, divided into different levels, are what give a page its layout. How to use computer technology to quickly and automatically assist font family generation has therefore become an urgent problem.
Disclosure of Invention
In order to solve the problems in the prior art, the present application provides a method for generating a family font based on an AttGan model. Corresponding parameters are interpolated between the codes of different font images in the family to control the generated font style and obtain intermediate font images. A number of samples of fonts in different styles, including fine data set samples and coarse data set samples, are input; after model training is finished, attribute labels of the fonts, such as weight, height, width, and counter (middle palace), are used to edit font images without supervision. Reconstruction error, label classification error, and adversarial loss are used in the training stage to realize re-editing of the font attributes corresponding to the labels, so that the parameter attribute values determine the generated font style and a complete font family can be constructed.
The technical scheme adopted by the invention is as follows: a method for generating a family font based on an AttGan model comprises the following steps:
step 1, data set preparation and preprocessing: processing the fine data set sample and the coarse data set sample of the target font to generate a fine data set sample picture and a coarse data set sample picture, and normalizing them to 256 × 256.
Step 2, labeling part of the parameter attribute values of the fine data set sample picture and the coarse data set sample picture, labeling the remaining parameter attribute values in a semi-supervised manner, and constructing an attribute classifier C.
Step 3, adopting an encoder G_enc and a decoder G_dec as two basic sub-networks, together with the attribute classifier C and a discriminator D, and using an Adversarial Loss and a Reconstruction Loss as loss functions to construct a training model based on the AttGan network.
Step 4, inputting the fine data set sample picture and the coarse data set sample picture in turn into the AttGan-based training model, training the model and optimizing and adjusting its parameters, and generating transition font images of the target font style by interpolation, through modifying the parameter attribute values of the fonts in the pictures.
Step 5, calculating the similarity between the original image of the target font and the image generated after interpolation by cosine similarity, to verify the uniformity of the font family.
Step 6, combining the transition font images generated in step 4 with the original images to obtain a complete family font image set covering the GB2312 character library, and vectorizing the family font image set to generate a computer font file.
In step 4, the fine data set sample pictures and coarse data set sample pictures are input in turn into the encoder G_enc. The encoder G_enc comprises 5 down-sampling layers; each layer applies convolution, batch normalization, and LeakyReLU activation, and encoding yields a vector. This vector is concatenated with the font parameter attribute value embedding vector to obtain a 64-dimensional embedded vector, which is sent to the decoder G_dec. The decoder G_dec comprises 5 up-sampling layers; each layer applies deconvolution, batch normalization, and ReLU activation, and finally a transition font image of the target font style is output. The encoder G_enc and the decoder G_dec use symmetric U-Net skip connections.
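The down/up-sampling geometry of the encoder and decoder described above can be checked with a small sketch. It tracks only the spatial sizes produced by 5 stride-2 down-sampling layers and 5 stride-2 up-sampling layers starting from 256 × 256 input; channel counts are not specified in the description and are omitted here.

```python
def downsample_shapes(size=256, layers=5):
    """Spatial size after each stride-2 convolution in the encoder G_enc."""
    shapes = [size]
    for _ in range(layers):
        shapes.append(shapes[-1] // 2)
    return shapes

def upsample_shapes(size, layers=5):
    """Spatial size after each stride-2 deconvolution in the decoder G_dec."""
    shapes = [size]
    for _ in range(layers):
        shapes.append(shapes[-1] * 2)
    return shapes

enc = downsample_shapes()          # [256, 128, 64, 32, 16, 8]
dec = upsample_shapes(enc[-1])     # [8, 16, 32, 64, 128, 256]
# U-Net skip connections pair encoder and decoder layers of equal spatial
# size (128 with 128, 64 with 64, ...), which is why a symmetric 5+5 design
# is required for the skips to line up.
```

The symmetry also shows why the decoder must mirror the encoder exactly: an asymmetric stack would leave skip connections with mismatched feature-map sizes.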
The discriminator D adopts a convolutional neural network structure. Its inputs are the real target font image and the interpolated image output by the decoder G_dec. The discriminator D consists of 5 cascaded Conv-LN/IN-LeakyReLU blocks followed by a two-layer fully connected neural network.
The attribute classifier C and the discriminator D share all convolutional layers.
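A minimal PyTorch sketch of this shared-trunk design follows. The description fixes only the 5 cascaded Conv-IN-LeakyReLU blocks and the two-layer fully connected heads; the channel width `ch=16`, the 64-unit hidden layer, and the choice of instance norm over layer norm are illustrative assumptions, not values from the patent.

```python
import torch
import torch.nn as nn

class SharedTrunk(nn.Module):
    """Five cascaded Conv-InstanceNorm-LeakyReLU blocks, shared by D and C."""
    def __init__(self, ch=16):
        super().__init__()
        blocks, in_ch = [], 1
        for i in range(5):
            out_ch = ch * (2 ** i)
            blocks += [nn.Conv2d(in_ch, out_ch, 4, stride=2, padding=1),
                       nn.InstanceNorm2d(out_ch),
                       nn.LeakyReLU(0.2)]
            in_ch = out_ch
        self.net = nn.Sequential(*blocks)

    def forward(self, x):
        return self.net(x).flatten(1)

class DiscriminatorAndClassifier(nn.Module):
    """Discriminator D (real/fake) and attribute classifier C share all
    convolutional layers; each has its own two-layer fully connected head."""
    def __init__(self, n_attrs=7, ch=16):
        super().__init__()
        self.trunk = SharedTrunk(ch)
        feat = ch * 16 * 8 * 8   # 256 channels x 8 x 8 after five halvings of 256
        self.d_head = nn.Sequential(nn.Linear(feat, 64), nn.LeakyReLU(0.2),
                                    nn.Linear(64, 1))
        self.c_head = nn.Sequential(nn.Linear(feat, 64), nn.LeakyReLU(0.2),
                                    nn.Linear(64, n_attrs))

    def forward(self, x):
        h = self.trunk(x)            # one trunk pass serves both heads
        return self.d_head(h), self.c_head(h)
```

Sharing the trunk halves the convolutional parameter count relative to two separate networks, which is exactly the motivation the description gives for making C and D share all convolutional layers.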
In the font parameter attribute value embedding part, a CNN-style encoder converts the selected font images into 64-dimensional parameter attribute value embedding vectors; all images are concatenated along the depth dimension and input into the style encoder to train the model. Font images are synthesized by predicting attribute values, realizing smooth interpolation between different fonts and generating transition font images of the target font style. The training model is optimized with the Adam optimizer, with parameters β1 = 0.5, β2 = 0.999, and lr = 0.0002.
The parameter attribute value refers to a common attribute of the font, and preferably, the parameter attribute value includes Serif, Cursive, Display, Italic, Strong, Thin and Wide of the font.
Further, to verify the performance of the model in the present application on font family generation, the input fine data set samples and coarse data set samples are converted into pictures, normalized, resized to 256 × 256, and converted into binary images, ensuring that the position and size of each Chinese character are consistent between the original and target distributions. When a font image is input to the neural network, in order to eliminate the adverse effect of singular sample data, the input image is first normalized by mapping its gray values into the range 0-255; a certain amount of noise is then added to the normalized data before it is input into the model.
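The preprocessing described above can be sketched in NumPy, assuming a grayscale input array. Nearest-neighbour resampling and the binarization threshold of 128 are illustrative stand-ins, not the patent's exact procedure.

```python
import numpy as np

def preprocess_glyph(img, size=256, threshold=128):
    """Resize a grayscale glyph to size x size (nearest neighbour),
    rescale gray values into 0-255, and binarize the result."""
    img = np.asarray(img, dtype=np.float64)
    h, w = img.shape
    # Nearest-neighbour sampling grid for the resize
    rows = (np.arange(size) * h // size).clip(0, h - 1)
    cols = (np.arange(size) * w // size).clip(0, w - 1)
    resized = img[np.ix_(rows, cols)]
    # Map gray values into [0, 255]
    lo, hi = resized.min(), resized.max()
    scale = (hi - lo) if hi > lo else 1.0
    normalized = (resized - lo) / scale * 255.0
    # Binarize so stroke position and size stay consistent across samples
    return (normalized >= threshold).astype(np.uint8) * 255

glyph = np.random.randint(0, 256, (300, 200))   # stand-in for a rendered glyph
out = preprocess_glyph(glyph)
```

The training-time noise injection mentioned in the text would be applied after this step, e.g. by adding small Gaussian perturbations to the normalized array before feeding it to the model.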
Further, in step 2, the attribute editing problem can be defined as a learning process of the encoder and decoder. From the labeled pictures, the model learns to label the unlabeled pictures; this labeling process is unsupervised. The aim is to edit an original picture x^a carrying attributes a so as to produce a realistic image carrying the target attributes b. To achieve this goal, an attribute classifier C is used to constrain the generated image so that the desired attributes b are correctly obtained.
Further, a picture x^a with n attributes a = [a1, ..., an] becomes, after passing through the encoder, the latent representation z = G_enc(x^a).
Further, z is to be converted to carry a target attribute vector b = [b1, ..., bn]; the decoder is then applied, as shown in the following formula: x^b = G_dec(z, b).
The attribute classifier section should predict that the generated image x^b correctly obtains the new attributes b. Thus, an attribute classifier is deployed to restrict the generator to the desired attributes; with the labeled style embedding vector θ, the constraint is that C(G_dec(G_enc(x^a), b)) ≈ b.
further, in step 3, in the training model, an encoder and a decoder architecture are used as generators, the encoder inputs a plurality of font pictures with different styles, the size of the font pictures is 256 × 256, and the encoder G is used for encoding the font picturesencThe method comprises the following steps of (1) including 5 down-sampling layers, wherein each layer is subjected to convolution, batch normalization, LeakyReLu activation and coding to obtain a vector; connecting the vector obtained by encoding with the font style embedded vector with the label in the step two, wherein the label of each attribute is a scalar which is equivalent to extending the scalar into a tensor, and the tensor is extended into a decoder GdecThe size of each layer of the image is 64-dimensional, the attribute value of each image parameter comprises Serif, Curive, Display, Italic, Strong, Thin, Wide and the like, so that the network can better predict the font and combine with the target parameter style to generate a new font during training. The vectors are then fed to a decoder GdecDecoder GdecDuring training, the optimizer uses adam with parameters set to β 1= 0.5, β 2= 0.999, and lr = 0.0002.
Further, in order to reduce the number of neural network parameters, the attribute classifier and the discriminator share all convolutional layers; in order to reduce the loss of character detail in the encoding process, symmetric U-Net skip connections are used between the encoder and the decoder. The discriminator in step 3 and the attribute classifier constructed in step 2 adopt a convolutional neural network structure: 5 cascaded Conv-LN/IN-LeakyReLU blocks followed by two fully connected layers. The discriminator takes as input the real font image and the font image generated after predicting and changing the parameters, distinguishes the difference between them, and uses the Adversarial Loss of formula 1:

L_adv = E_x[log D(x)] + E_y[log(1 − D(y))]    (1)

where x is a real font image and y a generated one.
Furthermore, assuming that the reference fonts obey a certain distribution, the generator is trained through a game with the discriminator D. In the training stage of the model, the generator tries to produce realistic results that deceive the discriminator, while the discriminator aims to distinguish the generated results from the real ones; L_adv represents the discrimination penalty of the discriminator between the generated image and the real image.
Further, in order to accurately describe the similarity of the generated image and the real image in pixel space, a pixel matching loss function is introduced, as shown in formula 2:

L_pixel = E[‖y_t − y‖1]    (2)

where y_t is the target font image and y the generated image.
further, in step 5, the digital image contains more features, which is the range of application of the cosine similarity algorithm, so that simple comparison is performed on the generated font image by using the cosine similarity, where u and v represent the article A, B, n (u) represents a feature attribute set of u positive feedback, and n (v) is a feature attribute set of v positive feedback, respectively, in the cosine similarity formula shown in formula 3.
Further, to illustrate the effectiveness of the family fonts generated by the interpolation method, the similarity between the original image and the image generated after interpolation is calculated by cosine similarity. A picture can be represented by an n-dimensional feature vector; given picture feature vectors u and v, formula 4 is:

cos(u, v) = (u · v) / (‖u‖ ‖v‖)    (4)
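The vector form of cosine similarity, and the averaging over a character set used later in the verification step, can be sketched directly in NumPy (the three-dimensional feature vectors below are toy examples):

```python
import numpy as np

def cosine_similarity(u, v):
    """cos(u, v) = (u . v) / (||u|| ||v||) for n-dimensional feature vectors."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

def family_uniformity(originals, generated):
    """Mean cosine similarity over corresponding glyph feature vectors,
    used as the final similarity score for a generated family."""
    sims = [cosine_similarity(u, v) for u, v in zip(originals, generated)]
    return sum(sims) / len(sims)

a = np.array([1.0, 0.0, 1.0])
b = np.array([1.0, 0.0, 1.0])
c = np.array([0.0, 1.0, 0.0])
print(cosine_similarity(a, b))  # 1.0 (identical direction)
print(cosine_similarity(a, c))  # 0.0 (orthogonal)
```

A family whose transition fonts all score close to 1.0 against their originals would, under this measure, be judged uniform.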
The invention has the beneficial effects that: the method for generating a family font based on an AttGan model is built on an encoder-decoder structure and combines attribute classification constraints, reconstruction learning, and adversarial learning, so that a single model can modify several attributes of interest while leaving other regions unchanged. An attribute classification constraint layer is added to the AttGan network model; by modifying parameter attribute values of the family font, such as character weight and character height, smooth interpolation between different fonts is realized and transition font images with controllable style are generated. The parameter attribute values determine the generated font style, so a complete font family can be constructed. This family font generation method greatly reduces designers' workload, shortens the design cycle, relieves designers of a large amount of repetitive work, and improves work efficiency.
Drawings
FIG. 1 is a flow chart of a method for generating a family font based on an AttGan model.
Fig. 2 is a network configuration diagram of a method of generating a family font based on the AttGan model.
FIG. 3 is the effect diagram of the English font generation in the embodiment.
FIG. 4 is a diagram of effects generated by Chinese fonts in the embodiment.
Detailed Description
The technical solutions in the embodiments of the present invention will be described clearly and completely with reference to the accompanying drawings in the embodiments of the present invention, and it should be understood that the described examples are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments of the present invention without making any creative effort, shall fall within the protection scope of the present invention.
Fig. 1 shows a flowchart of a method for generating a family font based on an AttGan model, and fig. 2 shows a network structure diagram of a method for generating a family font based on an AttGan model, wherein the method for generating a family font comprises the following specific implementation steps:
First, the bold data set and the fine data set of the target font provided by a designer are converted into picture format and resized, uniformly scaling them to 256 × 256; a certain amount of noise is added to the normalized data, which is then input into the encoder G_enc.
A training model based on the AttGan network is constructed (as shown in fig. 2). When training the neural network, a font image is input into the encoder G_enc, an image y with the target font style is generated, and the target font image and the generated font image are input into the discriminator to discriminate authenticity and calculate the loss function, as shown in equation (1). The discriminator wants the probability that a network-generated font image is judged fake to be as large as possible, while the generating network wants the probability that its font image is judged real to be as large as possible. The generating network therefore minimizes the loss function, the discriminating network maximizes it, and the network parameters are adjusted accordingly.
Calculating the discrimination loss between the generated image and the real image by the discriminator as shown in the following formula:
The pixel matching loss function calculates the L1 distance between y and the target font, as shown in the following formula:

L_pixel = E[‖y_t − y‖1]    (2)

where y_t is the target font image.
The font reconstruction loss function is calculated through a sigmoid cross-entropy loss function, as shown in the following formula:

L_rec = E[−(x log x̂ + (1 − x) log(1 − x̂))]

where x is the original glyph image and x̂ = G_dec(G_enc(x), a) is its reconstruction under the original attributes a.
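A per-pixel sigmoid cross-entropy reconstruction loss of this kind can be sketched in NumPy, assuming binarized glyphs with values in {0, 1} and a reconstruction with values in (0, 1); the clipping constant is a standard numerical-stability device, not part of the patent.

```python
import numpy as np

def sigmoid_cross_entropy(x, x_hat, eps=1e-7):
    """Mean per-pixel sigmoid cross entropy between an original binary
    glyph x and its reconstruction x_hat (probabilities in (0, 1))."""
    x_hat = np.clip(x_hat, eps, 1.0 - eps)   # avoid log(0)
    return float(np.mean(-(x * np.log(x_hat) + (1 - x) * np.log(1 - x_hat))))

x = np.array([1.0, 0.0, 1.0, 0.0])
perfect = sigmoid_cross_entropy(x, x)        # near zero for an exact copy
worst = sigmoid_cross_entropy(x, 1.0 - x)    # large for an inverted glyph
```

Minimizing this quantity drives the decoder to reproduce the input glyph when decoding under its own attributes, which is the role L_rec plays in the total loss below.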
Finally, the three losses are combined by weighted summation; the total loss function L is:

L = λ1·L_rec + λ2·L_pixel + L_adv

where λ1 = 100 and λ2 = 10.
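The weighted sum is trivial to express in code; the helper below uses the weights given in the description (λ1 = 100, λ2 = 10) as defaults, with the individual loss values passed in as plain numbers for illustration.

```python
def total_loss(l_rec, l_pixel, l_adv, lam1=100.0, lam2=10.0):
    """L = lambda1 * L_rec + lambda2 * L_pixel + L_adv, with the weights
    lambda1 = 100 and lambda2 = 10 from the description."""
    return lam1 * l_rec + lam2 * l_pixel + l_adv

# e.g. total_loss(0.01, 0.1, 0.5) == 100*0.01 + 10*0.1 + 0.5 == 2.5
```

The large weight on L_rec reflects the design choice that the generator must above all preserve the character's identity; style fidelity (L_pixel) and realism (L_adv) are weighted progressively lower.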
After network training is finished, different data sets and the labeled style embedding vectors are input into the generator of the network, the training process is iterated continuously, the font images produced after the parameters are changed are predicted, and the final images are generated as the discriminator and the attribute classifier distinguish the differences from the source images.
Furthermore, in the verification part, the same one hundred characters are selected as the input of the neural network and rendered as two kinds of font images, thin and bold. The thin font images are first input into the trained model to generate font images in the bold style, a regular-weight font image is generated by interpolation, and the average cosine similarity between the generated images and the original font images is taken as the final similarity.
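Generating the intermediate "regular" weight amounts to interpolating between the two endpoint attribute vectors before decoding. The attribute ordering below follows claim 5 (Serif, Cursive, Display, Italic, Strong, Thin, Wide); the particular endpoint vectors are illustrative, not values from the patent.

```python
import numpy as np

def interpolate_attributes(b_thin, b_bold, t):
    """Linear interpolation between two attribute vectors: t = 0 reproduces
    the thin style, t = 1 the bold style, and intermediate t yields a
    transition font attribute vector to feed into the decoder."""
    return (1.0 - t) * np.asarray(b_thin, float) + t * np.asarray(b_bold, float)

# Ordering: [Serif, Cursive, Display, Italic, Strong, Thin, Wide]
b_thin = np.array([0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0])   # Thin = 1
b_bold = np.array([0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 1.0])   # Strong = 1, Wide = 1
# A "regular" weight halfway between the two styles:
b_regular = interpolate_attributes(b_thin, b_bold, 0.5)
```

Sweeping t over, say, 0.0, 0.25, 0.5, 0.75, 1.0 and decoding each vector yields the graded sequence of weights that makes up the family.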
In this embodiment, an English character data set experiment and a Chinese character data set experiment are carried out separately. The English character data set is rendered into thin and bold font images, normalized to 256 × 256, and input in turn into the AttGan-based training model; after 1000 iterations, clear English font images are generated (as shown in fig. 3). Similarly, the Chinese character data set is processed into thin and bold font images and input in turn into the training model, yielding the generation results for the Chinese character set (as shown in fig. 4).
The method for generating a family font based on an AttGan model is built on an encoder-decoder structure and combines attribute classification constraints, reconstruction learning, and adversarial learning, so that a single model can modify several attributes of interest while leaving other regions unchanged. An attribute classification constraint layer is added to the AttGan network model; by modifying parameter attribute values of the family font, such as character weight and character height, smooth interpolation between different fonts is realized and transition font images with controllable style are generated, so that the parameter attribute values determine the generated font style and a complete font family can be constructed.
Claims (5)
1. A method for generating a family font based on an AttGan model is characterized by comprising the following steps:
step 1, data set preparation and preprocessing: processing the fine data set sample and the coarse data set sample of the target font to generate a fine data set sample picture and a coarse data set sample picture, and normalizing them to 256 × 256;
step 2, marking partial parameter attribute values of the fine data set sample picture and the coarse data set sample picture, marking residual parameter attribute values by adopting a semi-supervised learning mode, and constructing an attribute classifier C;
step 3, adopting an encoder G_enc and a decoder G_dec as two basic sub-networks, together with the attribute classifier C and a discriminator D, and using an Adversarial Loss and a Reconstruction Loss as loss functions to construct a training model based on the AttGan network;
step 4, sequentially inputting the fine data set sample picture and the coarse data set sample picture into a training model based on an AttGan network, training, optimizing and adjusting parameters of the model, and generating a transition font image of a target font style through interpolation by modifying parameter attribute values of fonts in the pictures;
step 5, calculating the similarity between the original image of the target font and the image generated after interpolation through cosine similarity, and further explaining the uniformity of the font family;
and 6, combining the transition font image generated in the step 4 with the original image to obtain a complete family font image set of the GB2312 font library, and performing vectorization operation on the family font image set to generate a computer font file.
2. The method of claim 1, wherein in step 4 the fine data set sample pictures and the coarse data set sample pictures are input in turn into the encoder G_enc; the encoder G_enc comprises 5 down-sampling layers, each applying convolution, batch normalization, and LeakyReLU activation, and encoding yields a vector; the vector obtained by encoding is concatenated with the font parameter attribute value embedding vector to obtain a 64-dimensional embedded vector, which is sent to the decoder G_dec; the decoder G_dec comprises 5 up-sampling layers, each applying deconvolution, batch normalization, and ReLU activation, and finally outputs a transition font image of the target font style;
the encoder G_enc and the decoder G_dec use symmetric U-Net skip connections.
3. The method for generating family fonts based on the AttGan model as claimed in claim 1, wherein the discriminator D adopts a convolutional neural network structure; the discriminator D takes as input the real target font image and the interpolated image output by the decoder G_dec, and consists of 5 cascaded Conv-LN/IN-LeakyReLU blocks followed by a two-layer fully connected neural network structure;
the attribute classifier C and the discriminator D share all convolutional layers.
4. The method of claim 2, wherein in the font parameter attribute value embedding part, a CNN-style encoder converts the selected glyph images into 64-dimensional parameter attribute value embedding vectors; all images are concatenated along the depth dimension and input into the style encoder to train the training model; glyph images are synthesized by predicting attribute values, realizing smooth interpolation between different fonts and generating a transition glyph image of the target font style;
the training model is optimized using an Adam optimizer, with the Adam parameters set to β1 = 0.5, β2 = 0.999, and lr = 0.0002.
5. The method of claim 1, wherein the parameter attribute values comprise Serif, Cursive, Display, Italic, Strong, Thin, and Wide of the font.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010508917.8A CN111667006A (en) | 2020-06-06 | 2020-06-06 | Method for generating family font based on AttGan model |
Publications (1)
Publication Number | Publication Date |
---|---|
CN111667006A true CN111667006A (en) | 2020-09-15 |
Family
ID=72386857
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202010508917.8A Pending CN111667006A (en) | 2020-06-06 | 2020-06-06 | Method for generating family font based on AttGan model |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN111667006A (en) |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107644006A (en) * | 2017-09-29 | 2018-01-30 | 北京大学 | A kind of Chinese script character library automatic generation method based on deep neural network |
US20190138860A1 (en) * | 2017-11-08 | 2019-05-09 | Adobe Inc. | Font recognition using adversarial neural network training |
CN110211203A (en) * | 2019-06-10 | 2019-09-06 | 大连民族大学 | The method of the Chinese character style of confrontation network is generated based on condition |
CN110503598A (en) * | 2019-07-30 | 2019-11-26 | 西安理工大学 | The font style moving method of confrontation network is generated based on condition circulation consistency |
Non-Patent Citations (1)
Title |
---|
Bai Haijuan; Zhou Wei; Wang Cunrui; Wang Lei: "Font style transfer method based on generative adversarial networks", Journal of Dalian Minzu University *
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
EP4044132A1 (en) * | 2021-04-30 | 2022-08-17 | Beijing Baidu Netcom Science Technology Co., Ltd. | Method and apparatus for training adversarial network model, method and apparatus for building character library, and device |
CN116614637A (en) * | 2023-07-19 | 2023-08-18 | 腾讯科技(深圳)有限公司 | Data processing method, device, equipment and readable storage medium |
CN116614637B (en) * | 2023-07-19 | 2023-09-12 | 腾讯科技(深圳)有限公司 | Data processing method, device, equipment and readable storage medium |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||