CN110443864B - Automatic artistic font generation method based on single-stage small-amount sample learning - Google Patents
Automatic artistic font generation method based on single-stage small-amount sample learning
- Publication number
- CN110443864B CN110443864B CN201910670478.8A CN201910670478A CN110443864B CN 110443864 B CN110443864 B CN 110443864B CN 201910670478 A CN201910670478 A CN 201910670478A CN 110443864 B CN110443864 B CN 110443864B
- Authority
- CN
- China
- Prior art keywords
- style
- font
- artistic
- network model
- local
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T11/00—2D [Two Dimensional] image generation
- G06T11/001—Texturing; Colouring; Generation of texture or colour
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T11/00—2D [Two Dimensional] image generation
- G06T11/20—Drawing from basic elements, e.g. lines or circles
- G06T11/206—Drawing of charts or graphs
Landscapes
- Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- Theoretical Computer Science (AREA)
- General Physics & Mathematics (AREA)
- General Health & Medical Sciences (AREA)
- Molecular Biology (AREA)
- Biophysics (AREA)
- Computational Linguistics (AREA)
- Data Mining & Analysis (AREA)
- Evolutionary Computation (AREA)
- Artificial Intelligence (AREA)
- Biomedical Technology (AREA)
- Computing Systems (AREA)
- General Engineering & Computer Science (AREA)
- Life Sciences & Earth Sciences (AREA)
- Mathematical Physics (AREA)
- Software Systems (AREA)
- Health & Medical Sciences (AREA)
- Processing Or Creating Images (AREA)
- Image Processing (AREA)
Abstract
The invention discloses an automatic artistic font generation method based on single-stage small-amount sample modeling learning, which comprises: establishing a network model (AGS-Net); pre-training the network model (AGS-Net) on an existing, complete set of synthesized artistic font libraries, so that the model can extract style features from the style reference set input, extract content features from the content reference set input, and synthesize stylized characters of a specified style and content; fine-tuning the network model AGS-Net on a designer-made artistic font library containing only a small number of samples; and generating a complete artistic special-effect font library with the trained network model AGS-Net. The network model of the invention has a small scale and parameter count. The technical scheme of the invention can be extended to any writing system, is not limited to specific languages, and achieves the best automatic synthesis effect for artistic fonts.
Description
Technical Field
The invention relates to an automatic generating system of artistic special-effect fonts based on a small number of samples, which models the glyph shapes and texture styles of artistic special-effect fonts with an artificial neural network and realizes style transfer by training the model on data, and belongs to the fields of artificial intelligence and computer vision.
Background
With the rapid development of computer technology and the mobile internet, computer font libraries, especially artistic font libraries, are more and more common in daily life. However, the design and production of these font libraries are mainly completed by professional manufacturers. Although many printed font libraries are already available, people increasingly demand more personalized artistic fonts. Yet compared with ordinary printed fonts, designing a set of artistic fonts is far more time-consuming and labor-intensive because of its complexity. In recent years, the development of artificial intelligence technology has made it possible for a computer to automatically complete font design and font library generation.
However, the related technology of designing and producing a complete word stock from only a small number of samples is not yet mature. For languages with a relatively small number of characters (e.g., English), it is relatively easy to design a set of fonts that is stylistically consistent with a given sample. But for languages with a large number of characters (e.g., Chinese), doing this manually becomes very time-consuming. In addition, taking Chinese as an example, many characters contain very complex shapes and structures; together with artistic styles of widely varying forms, manual design can hardly ensure consistency of shape, structure, texture and other stylistic attributes.
Currently, many scholars attempt to automatically generate word stocks using artificial intelligence techniques. The literature (Shumeet Baluja. Learning typographic style. CoRR, abs/1603.04000, 2016.) is one of the early deep-learning-based English word stock generation methods. The literature (Yue Jiang, Zhouhui Lian, Yingmin Tang, and Jianguo Xiao. SCFont: Structure-guided Chinese font generation via deep stacked networks. 2019.) proposes an end-to-end multi-stage stacked model to synthesize a high-quality Chinese handwritten word library. However, neither method can be applied to the generation of artistic special-effect style fonts; they apply only to printed-style fonts. The literature (Samaneh Azadi, Matthew Fisher, Vladimir G. Kim, Zhaowen Wang, Eli Shechtman, and Trevor Darrell. Multi-content GAN for few-shot font transfer. In Proceedings of the IEEE CVPR, pages 7564-7573, 2018.) proposes an innovative multi-stage model, MC-GAN (Multi-Content GAN), for few-shot artistic font generation, but that method can only be applied to the 26 English capital letters and cannot be used for writing systems containing many more characters, such as Chinese; moreover, the model's parameter count is very large and it is difficult to train.
Disclosure of Invention
In order to overcome the defects of the prior art, the invention provides an automatic artistic font generation method and system based on single-stage small-amount sample modeling learning. Using this system, a user only needs to provide a small number of stylized reference samples sharing the same artistic special-effect style (for example, only 5 letter reference pictures for English, or only 30 Chinese character reference pictures for Chinese) and specify the characters to be generated; the system then automatically generates the specified characters in that style, and can further generate all other stylized characters of the whole word stock. The method solves the two problems of the existing multi-stage model MC-GAN, namely that it can only be applied to the 26 English capital letters and cannot be used for writing systems such as Chinese, and that its parameter count is very large and it is difficult to train, and it achieves a better generation effect.
The technical scheme adopted by the invention is as follows:
a artistic font automatic generation method based on single-stage small-amount sample modeling learning is characterized in that a network model (AGS-Net) is established, and a network model (AGS-Net) is pre-trained aiming at an existing complete synthesized artistic font word library, so that the model can extract style characteristics from style reference set input, extract content characteristics from content reference set input and synthesize stylized characters of a specified style and content; fine-tuning the network model AGS-Net by using an artistic font word library designed by a designer with only a small number of samples; and generating a complete artistic special effect font library through the trained network model AGS-Net.
The automatic generating method of the artistic special effect font mainly comprises the following steps:
the first step is as follows: establishing a network model AGS-Net; the method mainly comprises the following operations:
1) defining characters with artistic special effect fonts, and decomposing the characters into styles and contents;
2) establishing a mapping function that maps a content reference picture x_c and a style reference picture set X_s = {x_1, x_2, x_3, …} containing a plurality of pictures of the same style to a stylized glyph picture y, such that the mapped picture y contains the content of x_c and the style of X_s;
3) designing a deep neural network model AGS-Net, realizing the extraction of character style characteristics and character content characteristics and the generation of stylized font pictures;
4) designing the network loss functions of the deep neural network model AGS-Net, which comprise an adversarial loss function, an L1 loss function, a contextual loss function (Contextual Loss), and a local texture fine-tuning loss function.
In step 1), the characters of the artistic special-effect font are defined and decomposed into style and content. The style of an artistic special-effect character comprises a glyph style and a color texture style, where the glyph style is the typographic style of the character, such as stroke thickness and serifs, and the color texture style represents the stroke color and texture variation of the character; the content of an artistic special-effect character indicates which letter or Chinese character it is.
Establishing a mapping function in the step 2), which comprises the following operations:
the generation task is defined as a mapping function. Mapping function refers a piece of content to picture xcStylized picture reference set X composed of reference pictures with same styles={x1,x2,x3…, mapped to a stylized glyph image y, the mapped y containing the specified content (x)cContent) and style (X)sIs the desired target result).
The content reference picture x_c used is a standard black-and-white picture carrying little style information (e.g., the Arial font in English). The reason for using a reference set of several pictures of the same style instead of a single stylized picture is that each stylized character picture is made up of its content and its style, so the content and style need to be decoupled in some way.
Given a reference set of stylized pictures, the mapping function extracts the common features of these pictures, namely the style information, and ignores their content information. For one artistic font style, the style reference set R_s = {r_1, r_2, …, r_n} contains n stylized pictures, where n is small (a few stylized pictures; in this embodiment, n = 5 for English and n = 30 for Chinese). In one forward propagation of the network model, m (m < n) pictures are randomly selected from R_s as the stylized picture input set X_s and input into the deep neural network model.
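For illustration only, the following is a minimal Python sketch (not part of the claimed invention) of how the style input set for one forward propagation could be sampled from the style reference set R_s; the function name `sample_style_input` and the channel-concatenation convention are assumptions introduced for the sketch, not taken from the patent.

```python
import random
import torch

def sample_style_input(style_reference_set, m):
    """Randomly pick m of the n style reference pictures (m < n) for one forward pass.

    style_reference_set: list of n tensors of shape (3, H, W), all sharing one artistic style.
    Returns a tensor of shape (m*3, H, W) obtained by channel-wise concatenation, one common
    way to feed several reference pictures to a style encoder at once.
    """
    assert m < len(style_reference_set)
    chosen = random.sample(style_reference_set, m)
    return torch.cat(chosen, dim=0)

# Illustrative usage: n = 30 Chinese reference glyphs, m = 6 picked per forward propagation.
refs = [torch.rand(3, 64, 64) for _ in range(30)]
x_s = sample_style_input(refs, m=6)   # shape (18, 64, 64)
```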
Step 3) realizing the mapping of the step 2) by establishing a deep neural network model AGS-Net;
based on the mapping function established in the step 2), the invention realizes the mapping function by establishing the deep neural network AGS-Net so as to realize the extraction of style and content characteristics and the generation of stylized pictures. The network model AGS-Net established by the invention is based on generation of a countermeasure network and comprises a generator G and three discriminators (shown in figure 2). The generator contains two encoders and two decoders. The two encoders have almost the same structure, are respectively responsible for the encoding of styles and contents, and are composed of a plurality of convolutional layers. The input of the style encoder is several style pictures randomly selected from a style reference set; the input to the content encoder is a stylish (e.g., Arial font in english or bold in chinese) character picture. For both decoders, one responsible for glyph style synthesis and the other responsible for color texture style synthesis, consists of multiple upsampled convolutional layers. In both decoders we use skip connections (fig. 3). The structure of the glyph style decoder is similar to the method in the literature (Phillip Isola, et. al. image-to-image transformation with conditional adaptation network. in Proceedings of the IEEE conference on computer vision and pattern recognition, pages 1125-1134, 2017.), and the input of each layer is the output of the previous layer and the concatenation of the layer characteristics corresponding to two encoders. Then, the font decoder outputs a font gray image ygrayTexture style decoders are structurally similar, but each layer input is additionally connected in series with corresponding layer features of a glyph style decoder, and the purpose of skip connection is to fully utilize features of different scales. At the end of the texture decoder, a convolution operation is performed after concatenation of the previous feature and the grayscale image. Discrimination of network model parts by three discriminators through cooperation of two decoders to ultimately obtain a stylized image y.Consists of the following components: shape discriminator DshaFor discriminating font gray image ygrayTrue or false; texture discriminator DtexResponsible for discriminating the genuineness of the stylized image y, and a local discriminator DlocalAnd the system is responsible for judging the truth of the local small block area of the stylized image y. Discriminating true or false means determining whether the input sample is a true sample in the data or a false sample generated by the model. The structure of the discriminator is composed of a plurality of layers of downsampled convolutional layers, and the output is a score map (score map) which represents the probability that each region of the input image is true. In the invention, the discriminator in a pix2pix model can be adopted as the discriminator.
Step 4) establishing a loss function of the deep neural network model AGS-Net, and training the model to obtain a trained model;
in the training process of the AGS-Net network model, the loss function of the network model mainly comprises 4 parts: a penalty for confrontation function, a L1 penalty function, a context penalty (Contextual Loss), and a local texture trim penalty function. The network model AGS-Net of the present invention applies countermeasure loss to the generated font gradation image and stylized image, respectively, based on generating a countermeasure network. The false samples against the loss are the images synthesized by the generator when the training characters are in the reference set RsIf so, the real sample of the discriminator is the real target (ground route) picture of the target stylized image y, otherwise, the real sample is the ground route picture of the target stylized image y from RsRandomly selects an image. To stabilize the training process, we used the L1 loss function, which was also applied to the glyph grayscale image and the stylized image, respectively. If the character is in the reference set, then there is a true target picture, the L1 penalty is not 0, otherwise 0 the scene penalty is proposed by Mechrez et al (Roey Mechrez, Itamar Talmi, and Lihi Zelnik-Manor.2018.the texture loss for image transformation with non-aligned data. in Proceedings of the ECCV.768-783.). Since L1 loss requires data spatial alignment, and scene loss does not, we will complement L1 loss. As with the L1 penalty, whether the scene penalty is 0 depends on whether the character is in the reference set. In addition, we propose local texture fine tuning loss to balance the number of true and false samples and focus on local texture generationAnd (5) effect. As shown in FIG. 4, we randomly cut out small blocks from the whole picture and input them to the local discriminator Dlocal. And taking small blocks cut from the stylized input picture as real samples, adding Gaussian (Gaussian) blur to the real small block samples to obtain false samples, and cutting the small blocks from the synthesized image to be also used as the false samples. In this way, we create a bridge between real and blurred samples, enabling the local arbiter to cause the generator to generate more real, less noisy and better detailed samples.
The second step is that: collecting or synthesizing an artistic special effect font database data set;
specifically, in this step, for the English font, the present invention uses the existing artistic special effect font proposed by MC-GAN (Samaneh Azadi, Matthew Fisher, Vladimir G Kim, Zhaowen Wang, Eli Shechtman, and Trevor Darrell. Multi-content GAN for the raw-shot font transfer. in Proceedings of the IEEE CVPR, pages 7564 and 7573,2018.) which contains the author synthesized artistic special effect font data set and the designer designed artistic special effect font data set; for Chinese font, the present invention utilizes black and white font with only font style to synthesize artistic font library by adding color and texture special effect to the font part. In addition, the invention collects the Chinese art special effect font data set.
The third step: the network model (AGS-Net) proposed by the invention is pre-trained on an artistic special effect font database, so that the network can preliminarily learn the mapping from a style reference set and a specified character to a target character. After pre-training, obtaining a preliminarily trained network model;
the fourth step: utilizing the preliminarily trained network model, aiming at a specially designed artistic special effect style font, carrying out fine adjustment on the network model on a small amount of samples to obtain a trained network model;
in order to stabilize the training process, the model training of the present invention is divided into two steps, pre-training (third step) and fine-tuning (fourth step). In the third step, the invention pre-trains the network model on the character picture collected and synthesized in the first step. The model generates all the font pictures, in English numberOn the set of data, the model generates a picture of 26 capital-letters special effect characters available, and the characters all have real target pictures. On the Chinese data set, the model generates 500 common Chinese character special effect font pictures, and the Chinese characters have real target pictures. In the pre-training process, all loss functions are not 0, and in the fourth step, the invention carries out fine adjustment on the network model aiming at the artistic special effect font. Using the pre-trained model in the second step, training the network model again, and generating all characters in the training process, wherein only a few characters are the stylized reference set RsThe characters in (1) have real target pictures.
The fifth step: and generating stylized characters with the style and the content by inputting a small number of sample pictures (such as 5 pictures in English and 30 pictures in Chinese) and a stylized character picture which are stylized by a professional design artistic special effect by using the trained model. In this step, the invention uses the trained network obtained in the fourth step to input a content picture and the reference set of the stroke-formatted picture in the fourth step into the model, so as to obtain the synthetic special effect font picture with the target content and style.
The sixth step: repeating the fifth step over the whole word stock, inputting a different content picture each time, to generate the whole artistic special-effect font library. In this step, the fifth step is repeated: different content pictures, together with the same style reference set as in the fifth step, are input into the network model to generate special-effect glyph pictures with different contents, thereby obtaining the whole artistic special-effect stylized word stock.
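For illustration only, a minimal sketch of the fifth and sixth steps combined: with a fixed style reference set, every content picture in the word stock is pushed through the trained generator once. This reuses the hypothetical `sample_style_input` helper from the earlier sketch; it is not the patent's own code.

```python
import torch

@torch.no_grad()
def generate_word_stock(generator, content_pictures, style_reference_set, m=6):
    """Repeat the fifth step for every character: fixed style reference set, varying content picture.

    content_pictures: dict mapping each character to its style-free content picture of shape (1, 3, H, W).
    Returns a dict mapping each character to its synthesized artistic special-effect picture.
    """
    generator.eval()
    stock = {}
    for char, x_c in content_pictures.items():
        x_s = sample_style_input(style_reference_set, m).unsqueeze(0)  # (1, m*3, H, W)
        _, y = generator(x_s, x_c)                                     # keep only the stylized RGB output
        stock[char] = y
    return stock
```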
Compared with the prior art, the invention has the beneficial effects that:
the invention provides an automatic artistic font generation method based on a single-stage small amount of samples, which comprises the steps of establishing a network model (AGS-Net), pre-training the network model (AGS-Net) aiming at the existing complete synthesized artistic font word library, and enabling the model to extract style characteristics from style reference set input, extract content characteristics from content reference set input and synthesize stylized characters with specified style and content; fine-tuning the network model AGS-Net by using an artistic font word library designed by a designer with only a small number of samples; and generating a complete artistic special effect font library through the trained network model AGS-Net. The loss function of the model comprises a new loss function for optimizing the local texture special effect. The technical scheme of the invention can be expanded on any language system and is not limited to specific languages. The network model of the invention has smaller scale and parameter quantity. The invention achieves the best effect of automatically synthesizing the artistic fonts at present.
Drawings
Fig. 1 is a flow chart of the automatic artistic special-effect font generation method, which mainly includes six steps. The first step: designing and establishing the network model AGS-Net. The second step: collecting or synthesizing an artistic special-effect font database data set. The third step: pre-training the proposed network model (AGS-Net) on the artistic special-effect font database data set, so that the network preliminarily learns the mapping from a style reference set and a designated character to the target character. The fourth step: using the preliminarily trained network model, fine-tuning it on a small number of samples of a professionally designed artistic special-effect style font to obtain the trained network model. The fifth step: using the trained model, inputting a small number of professionally designed artistic special-effect stylized samples and a content picture to generate the corresponding stylized character. The sixth step: repeating the fifth step over the whole word stock, inputting a different content picture each time, to generate the whole artistic special-effect font library.
FIG. 2 is a schematic structural diagram of the network model of the present invention, which includes a generator and three discriminators. The generator takes as input a content reference picture x_c and a style reference input set X_s, and the outputs of the decoders are sent to the discriminators for discrimination. The shape discriminator D_sha discriminates whether the glyph grayscale image y_gray is real or fake; the texture discriminator D_tex is responsible for discriminating whether the stylized image y is real or fake; and the local discriminator D_local is responsible for discriminating whether local small patch regions of the stylized image y are real or fake, where real or fake means whether the input sample is a real sample from the data or a fake sample generated by the model. The result is represented by the output score map. Each discriminator is composed of several convolutional layers. Arrows indicate data flow.
Fig. 3 is a schematic diagram of the skip-connection structure of the generator in the network model adopted in the embodiment of the present invention, showing that the features of corresponding convolutional layers are concatenated and input to the next layer.
Fig. 4 is a schematic diagram of the operations involved in the local discriminator and the local texture fine-tuning loss according to an embodiment of the present invention, including the local discriminator, patch cropping and the Gaussian blur operation, with arrows indicating data flow.
FIG. 5 is a schematic of the synthesized data sets used to pre-train the network, where (a) is the English data set and (b) is the Chinese data set.
FIG. 6 is a graph illustrating the comparison of the effects of different stylized reference set sizes and stylized input set sizes.
FIG. 7 is a graph comparing the impact of key modules of the network model used in the embodiments of the present invention.
FIG. 8 is a comparison of English stylized character images generated using the models of the present invention and a prior art method.
FIG. 9 is a comparison of a Chinese stylized character image generated using the model of the present invention with an existing model.
Detailed Description
The technical solutions in the embodiments of the present invention are clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it should be understood that the described examples are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
The invention provides an automatic generating method of artistic fonts, which synthesizes a whole word stock by utilizing a small amount of stylized character pictures. The general flow chart of the process is shown in figure 1.
A. Network model structure
As shown in fig. 2, the network model consists of one generator and three discriminators (a shape, a texture and a local discriminator). The generator consists of two encoders (a content encoder and a style encoder) and two decoders (a shape decoder and a texture decoder). As shown in fig. 3, the generator of the network model uses a skip-connection structure to ensure that content and style features are fully utilized. The two encoders are identical except for the channel dimension of the first layer's input; each is composed of 6 convolutional layers containing instance normalization and non-linear activation functions. The two decoders each consist of 6 upsampled convolutional layers (each containing instance normalization and a non-linear activation function), except that the texture decoder has one additional convolutional layer at the end. Both the shape and texture discriminators adopt the patch-based generative adversarial network (PatchGAN) structure, producing a score map of size 14x14x1. As shown in fig. 4, a local discriminator is proposed: small patches are cropped from the picture and Gaussian blur is applied to obtain real and fake patch samples, respectively. The spatial size of each patch is 32x32x3.
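For illustration only, a minimal PyTorch sketch of a down-sampling patch discriminator of the kind described above; the exact kernel sizes and strides are assumptions chosen so that a 64x64 input yields a 14x14x1 score map, and they need not match the patent's configuration.

```python
import torch.nn as nn

class PatchDiscriminator(nn.Module):
    """Down-sampling convolutional discriminator that outputs a score map; each entry scores one image region."""

    def __init__(self, in_ch=3, base=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(in_ch, base, 4, 2, 1), nn.LeakyReLU(0.2),                                     # 64 -> 32
            nn.Conv2d(base, 2 * base, 4, 2, 1), nn.InstanceNorm2d(2 * base), nn.LeakyReLU(0.2),     # 32 -> 16
            nn.Conv2d(2 * base, 4 * base, 4, 1, 1), nn.InstanceNorm2d(4 * base), nn.LeakyReLU(0.2), # 16 -> 15
            nn.Conv2d(4 * base, 1, 4, 1, 1),                                                        # 15 -> 14
        )

    def forward(self, x):
        return self.net(x)   # (B, 1, 14, 14) score map for a 64x64 input
```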
B. Loss function
The loss function includes four parts: an adversarial loss function, an L1 loss function, a contextual loss function, and a local texture fine-tuning loss function, expressed as formula 1:
L(G) = L_adv + L_1 + L_CX + L_local (formula 1)
where G denotes the generator and L(G) is the loss function of the generator; L_adv is the adversarial loss function; L_1 is the L1 loss function; L_CX is the contextual loss function; and L_local is the local texture fine-tuning loss function.
the first part of the antagonistic loss is represented by the formulae 2 to 4:
where L(D_sha) denotes the loss function of the shape discriminator and L(D_tex) denotes the loss function of the texture discriminator; the remaining symbols in formulas 2 and 3 denote a picture randomly selected from the stylized reference set and its grayscale-processed version; y_gray is the output of the shape decoder, y is the output of the texture decoder, and λ_advsha and λ_advtex are balance weights.
The second part, the L1 loss function, is expressed by formulas 5 to 7:
L_1 = λ_1gray · L_1gray + λ_1tex · L_1tex (formula 7)
where y_gray is the output of the shape decoder and y is the output of the texture decoder; their respective ground-truth target pictures appear in formulas 5 and 6; λ_1gray and λ_1tex are balance weights.
The third part, the contextual loss function, is expressed by formulas 8 to 10:
L_CX = λ_cxgray · L_CXgray + λ_cxtex · L_CXtex (formula 10)
where CX denotes the contextual similarity (Roey Mechrez, Itamar Talmi, and Lihi Zelnik-Manor. 2018. The contextual loss for image transformation with non-aligned data. In Proceedings of the European Conference on Computer Vision (ECCV), 768-783.), Φ_l(·) denotes extracting layer-l VGG19 features (Karen Simonyan and Andrew Zisserman. 2014. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556 (2014)) from the shape or texture picture, l denotes the l-th layer of VGG19, N denotes the number of VGG19 layers used, and λ_cxgray and λ_cxtex denote the weights.
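For illustration only, a compact re-implementation sketch of the contextual (CX) similarity of Mechrez et al. between two VGG19 feature maps, written in PyTorch; the bandwidth h and the choice of feature layers are assumptions, and this is not the patent's own code.

```python
import torch
import torch.nn.functional as F

def contextual_loss(feat_gen, feat_tgt, h=0.5, eps=1e-5):
    """Contextual (CX) loss between two feature maps of shape (B, C, H, W), following Mechrez et al. 2018."""
    b, c, _, _ = feat_gen.shape
    x = feat_gen.reshape(b, c, -1)                          # generated feature vectors, (B, C, Nx)
    y = feat_tgt.reshape(b, c, -1)                          # target feature vectors, (B, C, Ny)
    y_mu = y.mean(dim=2, keepdim=True)
    x, y = F.normalize(x - y_mu, dim=1), F.normalize(y - y_mu, dim=1)   # center on target mean, unit length
    cos = torch.bmm(x.transpose(1, 2), y)                   # (B, Nx, Ny) cosine similarities
    d = 1.0 - cos                                           # cosine distances d_ij
    d_rel = d / (d.min(dim=2, keepdim=True).values + eps)   # distances relative to the nearest target feature
    w = torch.exp((1.0 - d_rel) / h)
    cx = w / w.sum(dim=2, keepdim=True)                     # normalized affinities CX_ij
    cx_score = cx.max(dim=1).values.mean(dim=1)             # CX(X, Y): mean over targets of the best match
    return (-torch.log(cx_score + eps)).mean()

# L_CXgray and L_CXtex of formula 10 would sum this loss over the chosen VGG19 feature layers of
# (y_gray, its target) and (y, its target) respectively, weighted by lambda_cxgray and lambda_cxtex.
```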
As shown in fig. 4, the fourth part, the local texture fine-tuning loss, is expressed by formulas 11 and 12:
where L(D_local) denotes the loss function of the local discriminator, p_real denotes local patches cropped from real samples, p_blur denotes the patches p_real after Gaussian blur processing, p_y denotes local patches cropped from the output y of the texture decoder, and λ_local is a balance weight.
Therefore, the complete objective loss function is a min-max game between the generator and the three discriminators, expressed as formula 13.
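For illustration only, a minimal sketch of one alternating update of this min-max game in PyTorch; the balance weights λ, the contextual term and the conditioning of the discriminators are omitted for brevity, and the patch preparation reuses the hypothetical helpers from the earlier sketches, so this is not the patent's training code.

```python
import torch
import torch.nn.functional as F

def train_step(G, D_sha, D_tex, D_local, opt_g, opt_d, batch):
    """One alternating update of the min-max game (balance weights and the contextual term omitted)."""
    bce = torch.nn.BCEWithLogitsLoss()
    x_s, x_c, t_gray, t_rgb = batch                 # style input set, content picture, gray/RGB targets
    y_gray, y = G(x_s, x_c)

    def adv_d(D, real, fake):                       # discriminator side of the adversarial loss
        pr, pf = D(real), D(fake)
        return bce(pr, torch.ones_like(pr)) + bce(pf, torch.zeros_like(pf))

    def adv_g(D, fake):                             # generator side of the adversarial loss
        pf = D(fake)
        return bce(pf, torch.ones_like(pf))

    # real / fake patches for the local discriminator (first reference picture used as the real texture source)
    p_real, p_fake = local_discriminator_samples(x_s[:, :3], y)

    opt_d.zero_grad()
    d_loss = (adv_d(D_sha, t_gray, y_gray.detach())
              + adv_d(D_tex, t_rgb, y.detach())
              + adv_d(D_local, p_real, p_fake.detach()))
    d_loss.backward()
    opt_d.step()

    opt_g.zero_grad()
    g_loss = (adv_g(D_sha, y_gray) + adv_g(D_tex, y) + adv_g(D_local, crop_patches(y))
              + F.l1_loss(y_gray, t_gray) + F.l1_loss(y, t_rgb))
    g_loss.backward()
    opt_g.step()
    return d_loss.item(), g_loss.item()
```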
C. Data set
First, a special-effect font data set is collected and synthesized. As shown in fig. 5, for English characters the data set proposed by MC-GAN is used. For Chinese, a special-effect data set is synthesized: 246 printed Chinese fonts are collected, and 10 different gradient and texture color maps are applied to each printed font, resulting in 2,460 synthesized special-effect fonts. In addition, 35 artistic special-effect fonts designed by designers are collected. All data set pictures are of size 64x64x3.
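For illustration only, a minimal sketch of how a synthetic special-effect glyph could be produced from a black-and-white printed glyph and a gradient or texture color map, in the spirit of the Chinese data-set synthesis described above; the tensor layouts and the white background value are assumptions.

```python
def synthesize_effect_glyph(glyph_gray, texture_rgb, background=1.0):
    """Paint a gradient/texture color map into the strokes of a black-and-white glyph picture.

    glyph_gray:  tensor of shape (1, H, W), 0 inside strokes and 1 on the background.
    texture_rgb: tensor of shape (3, H, W) holding the gradient or texture color map.
    Returns a (3, H, W) synthetic special-effect glyph on a plain background.
    """
    stroke_mask = 1.0 - glyph_gray               # 1 inside strokes, 0 outside
    return stroke_mask * texture_rgb + (1.0 - stroke_mask) * background
```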
As shown in fig. 6, the main parameters involved in the invention, the style reference set size and the style input set size, are compared. The results show that the more few-shot samples are visible (i.e., the larger the style reference set), the better the effect, while the size of the style input set used in one forward propagation has less impact.
As shown in FIG. 7, the main modules of the invention, namely the skip connections in the generator, the contextual loss and the local texture fine-tuning loss, are compared. The results show that the skip connections have a large influence on the color characteristics of the synthesized stylized picture, the contextual loss has a large influence on the overall structure of the glyph, and the local texture fine-tuning loss effectively reduces noise in the synthesized results and improves the richness of local texture details.
As shown in fig. 8 and 9, the invention can be applied well to both the Chinese and English writing systems and has very good extension capability. The results show that the artistic special-effect glyph pictures synthesized by the method have the highest quality. The invention realizes, for the first time, single-stage automatic generation of an artistic special-effect font library from a small number of samples.
It is noted that the disclosed embodiments are intended to aid in further understanding of the invention, but those skilled in the art will appreciate that: various substitutions and modifications are possible without departing from the spirit and scope of the invention and appended claims. Therefore, the invention should not be limited to the embodiments disclosed, but the scope of the invention is defined by the appended claims.
Claims (7)
1. An automatic artistic font generation method based on single-stage small-amount sample modeling learning is characterized by establishing a network model AGS-Net, pre-training the network model AGS-Net by utilizing an existing artistic font word library, extracting style characteristics from a style reference set, extracting content characteristics from a content reference set, and synthesizing stylized characters with specified style and content; fine-tuning the network model AGS-Net by using the artistic font library with only a small number of samples, and generating a complete artistic special effect font library by using the trained network model AGS-Net; the method comprises the following steps:
the first step is as follows: establishing a deep neural network model AGS-Net based on the generated countermeasure network; the following operations are specifically executed:
1) defining characters of artistic special effect fonts and decomposing the characters into styles and contents;
2) establishing a mapping function that maps a content reference picture x_c and a style reference picture set X_s = {x_1, x_2, x_3, …} containing a plurality of style reference pictures to a stylized glyph picture y, such that the mapped picture y contains the content of x_c and the style of X_s;
3) designing a deep neural network model AGS-Net, extracting character style characteristics and character content characteristics, and generating a stylized font picture;
the network model AGS-Net is based on a generation countermeasure network and comprises a generator G and three discriminators; the generator comprises two encoders and two decoders; the three discriminators are respectively: a shape discriminator, a texture discriminator, and a local discriminator;
the two encoders are composed of a plurality of convolution layers and are respectively used for style encoding and content encoding; the input of the style encoder is a plurality of style pictures randomly selected from a style reference set; the input of the content encoder is a character picture without style;
the two decoders are a font decoder and a texture decoder and are respectively used for font style synthesis and color texture style synthesis, and stylized images y are obtained through the two decoders;
the shape discriminator D_sha is used for discriminating whether the glyph grayscale image y_gray is real or fake, that is, judging whether the input sample is a real sample from the data or a fake sample generated by the model; the texture discriminator D_tex is used for discriminating whether the stylized image y is real or fake; the local discriminator D_local is used for discriminating whether local small patch regions of the stylized image y are real or fake; the discriminators are each composed of multi-layer downsampled convolutional layers, and the output is a score map representing the probability that each region of the input image is real;
4) establishing the network loss function of the network model AGS-Net, wherein the network loss function comprises an adversarial loss function, an L1 loss function, a contextual loss function and a local texture fine-tuning loss function; training the network model AGS-Net to obtain a trained model; the method specifically comprises the following steps:
applying adversarial losses to the generated glyph grayscale image and the stylized image, respectively, based on the generative adversarial network;
the fake samples for the adversarial loss are images synthesized by the generator; when the training character is in the reference set R_s, the real sample of the discriminator is the real target picture of the target stylized image y, otherwise an image randomly selected from R_s is used;
the L1 loss function is applied to the glyph grayscale image and the stylized image, respectively; if the character is in the reference set, a real target picture exists, the loss of L1 is not 0, otherwise, the loss is 0;
the contextual loss function supplements the L1 loss, and whether the contextual loss is 0 depends on whether the character is in the reference set;
the local texture fine tuning loss function is used for balancing the number of true and false samples and paying attention to the generation effect of the local texture;
the local texture fine-tuning loss function is expressed by formulas 11 and 12:
wherein L(D_local) denotes the loss function of the local discriminator; L_local denotes the local texture fine-tuning loss function; p_real denotes local patches cropped from real samples; p_blur denotes the patches p_real after Gaussian blur processing; p_y denotes local patches cropped from the output y of the texture decoder; and λ_local is a balance weight;
the second step is that: collecting or synthesizing an artistic special effect font database data set;
the third step: pre-training a network model AGS-Net on an artistic special effect font database data set, so that mapping from a style reference set and a designated character to a target character is preliminarily learned, and a preliminarily trained network model is obtained;
all glyph pictures are generated during pre-training, and all of them have real target pictures; in the pre-training process, none of the loss functions is 0;
the fourth step: utilizing the preliminarily trained network model to perform fine adjustment on the network model aiming at a small number of sample pictures of artistic special effect style fonts to obtain a trained network model AGS-Net; all characters are generated in the training process, and only the stylized reference set R of the artistic special effect style font is usedsThe characters in the Chinese character list have real target pictures;
the fifth step: inputting a small number of sample pictures of the artistic special effect style font and a style-free character picture in the fourth step by using the trained model to obtain a synthesized special effect font picture of the target content and style, namely generating stylized characters with the style and the content;
through the steps, the automatic generation of the artistic fonts based on the single-stage small-amount sample modeling learning is realized.
2. The method for automatically generating artistic fonts based on single-stage small-amount sample modeling learning as claimed in claim 1, wherein the discriminator can adopt a discriminator in a pix2pix model.
3. The method of claim 1, wherein both decoders use skip connections.
4. The method of claim 1, wherein both decoders are composed of a plurality of upsampled convolutional layers; the input of each layer in the glyph decoder is the concatenation of the previous layer's output and the corresponding-layer features of the two encoders; the output of the glyph decoder is a glyph grayscale image y_gray;

each layer of the texture style decoder additionally concatenates the corresponding-layer features of the glyph style decoder; at the end of the texture decoder, a convolution operation is performed after concatenating the previous features with the grayscale image.
5. The method for automatically generating artistic fonts based on single-stage small-amount sample modeling learning as claimed in claim 1, wherein the loss function of the network model AGS-Net is expressed by formula 1:
L(G) = L_adv + L_1 + L_CX + L_local (formula 1)
wherein G denotes the generator and L(G) is the loss function of the generator; L_adv is the adversarial loss function; L_1 is the L1 loss function; L_CX is the contextual loss function; and L_local is the local texture fine-tuning loss function;
the adversarial loss function is expressed by formulas 2 to 4:
wherein L(D_sha) denotes the loss function of the shape discriminator and L(D_tex) denotes the loss function of the texture discriminator; the remaining symbols in formulas 2 and 3 denote a picture randomly selected from the stylized reference set and its grayscale-processed version; y_gray is the output of the shape decoder, y is the output of the texture decoder, and λ_advsha and λ_advtex are balance weights;
the L1 loss function is expressed as formula 5 to formula 7:
L_1 = λ_1gray · L_1gray + λ_1tex · L_1tex (formula 7)
wherein y_gray is the output of the shape decoder and y is the output of the texture decoder; their respective true target pictures appear in formulas 5 and 6; λ_1gray and λ_1tex are balance weights;
the contextual loss function is expressed by formulas 8 to 10:
L_CX = λ_cxgray · L_CXgray + λ_cxtex · L_CXtex (formula 10)
wherein CX is the contextual similarity; Φ_l(·) denotes extracting VGG19 features from the shape or texture picture; l denotes the l-th layer of VGG19; N denotes the number of VGG19 layers used; λ_cxgray and λ_cxtex denote the weights;
6. The method for automatically generating artistic fonts based on single-stage small-amount sample modeling learning as claimed in claim 1, wherein the style reference set R_s = {r_1, r_2, …, r_n} contains n stylized pictures, where n specifically takes the following values: n = 5 for English characters and n = 30 for Chinese characters.
7. The method for automatically generating artistic fonts based on single-stage small-amount sample modeling learning as claimed in claim 1, wherein the fifth step is repeatedly performed: the whole artistic special-effect font library is generated by inputting a different content picture from the font library each time.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910670478.8A CN110443864B (en) | 2019-07-24 | 2019-07-24 | Automatic artistic font generation method based on single-stage small-amount sample learning |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910670478.8A CN110443864B (en) | 2019-07-24 | 2019-07-24 | Automatic artistic font generation method based on single-stage small-amount sample learning |
Publications (2)
Publication Number | Publication Date |
---|---|
CN110443864A CN110443864A (en) | 2019-11-12 |
CN110443864B true CN110443864B (en) | 2021-03-02 |
Family
ID=68431267
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910670478.8A Active CN110443864B (en) | 2019-07-24 | 2019-07-24 | Automatic artistic font generation method based on single-stage small-amount sample learning |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110443864B (en) |
Families Citing this family (10)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111046915B (en) * | 2019-11-20 | 2022-04-15 | 武汉理工大学 | Method for generating style characters |
CN111242241A (en) * | 2020-02-17 | 2020-06-05 | 南京理工大学 | Method for amplifying etched character recognition network training sample |
CN112307714B (en) * | 2020-11-03 | 2024-03-08 | 武汉理工大学 | Text style migration method based on dual-stage depth network |
CN112364838B (en) * | 2020-12-09 | 2023-04-07 | 佛山市南海区广工大数控装备协同创新研究院 | Method for improving handwriting OCR performance by utilizing synthesized online text image |
CN112734627B (en) * | 2020-12-24 | 2023-07-11 | 北京达佳互联信息技术有限公司 | Training method of image style migration model, image style migration method and device |
CN112861806B (en) * | 2021-03-17 | 2023-08-22 | 网易(杭州)网络有限公司 | Font data processing method and device based on generation countermeasure network |
CN113962192B (en) * | 2021-04-28 | 2022-11-15 | 江西师范大学 | Method and device for generating Chinese character font generation model and Chinese character font generation method and device |
CN113140018B (en) * | 2021-04-30 | 2023-06-20 | 北京百度网讯科技有限公司 | Method for training countermeasure network model, method for establishing word stock, device and equipment |
CN113140017B (en) * | 2021-04-30 | 2023-09-15 | 北京百度网讯科技有限公司 | Method for training countermeasure network model, method for establishing word stock, device and equipment |
CN113792526B (en) * | 2021-09-09 | 2024-02-09 | 北京百度网讯科技有限公司 | Training method of character generation model, character generation method, device, equipment and medium |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108090516A (en) * | 2017-12-27 | 2018-05-29 | 第四范式(北京)技术有限公司 | Automatically generate the method and system of the feature of machine learning sample |
CN109146989A (en) * | 2018-07-10 | 2019-01-04 | 华南理工大学 | A method of birds and flowers characters in a fancy style image is generated by building neural network |
CN109635883A (en) * | 2018-11-19 | 2019-04-16 | 北京大学 | The Chinese word library generation method of the structural information guidance of network is stacked based on depth |
Family Cites Families (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108510569B (en) * | 2018-01-26 | 2020-11-03 | 北京大学 | Multichannel-based artistic word generation method and system |
-
2019
- 2019-07-24 CN CN201910670478.8A patent/CN110443864B/en active Active
Patent Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108090516A (en) * | 2017-12-27 | 2018-05-29 | 第四范式(北京)技术有限公司 | Automatically generate the method and system of the feature of machine learning sample |
CN109146989A (en) * | 2018-07-10 | 2019-01-04 | 华南理工大学 | A method of birds and flowers characters in a fancy style image is generated by building neural network |
CN109635883A (en) * | 2018-11-19 | 2019-04-16 | 北京大学 | The Chinese word library generation method of the structural information guidance of network is stacked based on depth |
Non-Patent Citations (2)
Title |
---|
Multi-Content GAN for Few-Shot Font Style Transfer; Samaneh Azadi et al.; arXiv:1712.00516v1 [cs.CV]; 2017-12-01; full text *
A two-stage artistic font rendering method based on CGAN networks; Ye Wujian et al.; Journal of Guangdong University of Technology; 2019-05-31; Vol. 36, No. 3; full text *
Also Published As
Publication number | Publication date |
---|---|
CN110443864A (en) | 2019-11-12 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN110443864B (en) | Automatic artistic font generation method based on single-stage small-amount sample learning | |
CN109635883B (en) | Chinese character library generation method based on structural information guidance of deep stack network | |
CN111242841B (en) | Image background style migration method based on semantic segmentation and deep learning | |
CN113255813B (en) | Multi-style image generation method based on feature fusion | |
CN111553837A (en) | Artistic text image generation method based on neural style migration | |
JP2016057918A (en) | Image processing device, image processing method, and program | |
CN113343705A (en) | Text semantic based detail preservation image generation method and system | |
CN110114776A (en) | Use the system and method for the character recognition of full convolutional neural networks | |
CN112364838B (en) | Method for improving handwriting OCR performance by utilizing synthesized online text image | |
CN113222875B (en) | Image harmonious synthesis method based on color constancy | |
CN115620010A (en) | Semantic segmentation method for RGB-T bimodal feature fusion | |
CN113657404B (en) | Image processing method of Dongba pictograph | |
US12100071B2 (en) | System and method for generating images of the same style based on layout | |
US20220301106A1 (en) | Training method and apparatus for image processing model, and image processing method and apparatus | |
CN116797868A (en) | Text image generation method and diffusion generation model training method | |
Ko et al. | SKFont: skeleton-driven Korean font generator with conditional deep adversarial networks | |
Li et al. | Freepih: Training-free painterly image harmonization with diffusion model | |
CN113096133A (en) | Method for constructing semantic segmentation network based on attention mechanism | |
CN114037644B (en) | Artistic word image synthesis system and method based on generation countermeasure network | |
CN110909161B (en) | English word classification method based on density clustering and visual similarity | |
CN116229442B (en) | Text image synthesis and instantiation weight transfer learning method | |
CN118587326B (en) | Method for generating prompt-decoupling text-to-large scene remote sensing image | |
CN113128456B (en) | Pedestrian re-identification method based on combined picture generation | |
CN117593755B (en) | Method and system for recognizing gold text image based on skeleton model pre-training | |
CN112652030B (en) | Color space position layout recommendation method based on specific scene |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
GR01 | Patent grant |