CN110443864B - Automatic artistic font generation method based on single-stage small-amount sample learning - Google Patents
Automatic artistic font generation method based on single-stage small-amount sample learning
- Publication number
- CN110443864B CN110443864B CN201910670478.8A CN201910670478A CN110443864B CN 110443864 B CN110443864 B CN 110443864B CN 201910670478 A CN201910670478 A CN 201910670478A CN 110443864 B CN110443864 B CN 110443864B
- Authority
- CN
- China
- Prior art keywords
- style
- font
- artistic
- network model
- local
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T11/00—2D [Two Dimensional] image generation
- G06T11/001—Texturing; Colouring; Generation of texture or colour
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T11/00—2D [Two Dimensional] image generation
- G06T11/20—Drawing from basic elements, e.g. lines or circles
- G06T11/206—Drawing of charts or graphs
Landscapes
- Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- Theoretical Computer Science (AREA)
- General Physics & Mathematics (AREA)
- General Health & Medical Sciences (AREA)
- Molecular Biology (AREA)
- Biophysics (AREA)
- Computational Linguistics (AREA)
- Data Mining & Analysis (AREA)
- Evolutionary Computation (AREA)
- Artificial Intelligence (AREA)
- Biomedical Technology (AREA)
- Computing Systems (AREA)
- General Engineering & Computer Science (AREA)
- Life Sciences & Earth Sciences (AREA)
- Mathematical Physics (AREA)
- Software Systems (AREA)
- Health & Medical Sciences (AREA)
- Processing Or Creating Images (AREA)
- Image Processing (AREA)
Abstract
The invention discloses an automatic artistic font generation method based on single-stage small-amount sample modeling learning, which comprises: establishing a network model (AGS-Net); pre-training the network model (AGS-Net) on an existing, complete set of synthesized artistic font libraries, so that the model can extract style features from the style reference set input, extract content features from the content reference set input, and synthesize stylized characters of a specified style and content; fine-tuning the network model AGS-Net on a designer-made artistic font library containing only a small number of samples; and generating a complete artistic special-effect font library with the trained network model AGS-Net. The network model of the invention has a small scale and parameter count. The technical scheme of the invention can be extended to any writing system, is not limited to specific languages, and achieves the best automatic synthesis effect for artistic fonts.
Description
Technical Field
The invention relates to an automatic generating system of artistic special-effect fonts based on a small number of samples, which models the glyph shapes and texture styles of artistic special-effect fonts with an artificial neural network and realizes style transfer by training the model on data, and belongs to the fields of artificial intelligence and computer vision.
Background
With the rapid development of computer technology and the mobile internet, computer font libraries, especially artistic font libraries, are more and more common in daily life. However, the design and production of these font libraries are mainly completed by professional manufacturers. Although many printed font libraries are already available, people increasingly demand more personalized artistic fonts. Yet compared with ordinary printed fonts, designing a set of artistic fonts is far more time-consuming and labor-intensive because of its complexity. In recent years, the development of artificial intelligence technology has made it possible for a computer to automatically complete font design and font library generation.
However, the related technology of designing and producing a complete word stock from only a small number of samples is not yet mature. For languages with a relatively small number of characters (e.g., English), it is relatively easy to design a set of fonts that is stylistically consistent with a given sample. But for languages with a large number of characters (e.g., Chinese), doing this manually becomes very time-consuming. In addition, taking Chinese as an example, many characters contain very complex shapes and structures; together with artistic styles of widely varying forms, manual design can hardly ensure consistency of shape, structure, texture and other stylistic attributes.
Currently, many scholars attempt to automatically generate word stocks using artificial intelligence techniques. The literature (Shumeet Baluja. Learning typographic style. CoRR, abs/1603.04000, 2016.) is one of the early deep-learning-based English word stock generation methods. The literature (Yue Jiang, Zhouhui Lian, Yingmin Tang, and Jianguo Xiao. SCFont: Structure-guided Chinese font generation via deep stacked networks. 2019.) proposes an end-to-end multi-stage stacked model to synthesize a high-quality Chinese handwritten word library. However, neither method can be applied to the generation of artistic special-effect style fonts; they apply only to printed-style fonts. The literature (Samaneh Azadi, Matthew Fisher, Vladimir G. Kim, Zhaowen Wang, Eli Shechtman, and Trevor Darrell. Multi-content GAN for few-shot font transfer. In Proceedings of the IEEE CVPR, pages 7564-7573, 2018.) proposes an innovative multi-stage model, MC-GAN (Multi-Content GAN), for few-shot artistic font generation, but that method can only be applied to the 26 English capital letters and cannot be used for writing systems containing many more characters, such as Chinese; moreover, the model's parameter count is very large and it is difficult to train.
Disclosure of Invention
In order to overcome the defects of the prior art, the invention provides an automatic artistic font generation method and system based on single-stage small-amount sample modeling learning. Using this system, a user only needs to provide a small number of stylized reference samples sharing the same artistic special-effect style (for example, only 5 letter reference pictures for English, or only 30 Chinese character reference pictures for Chinese) and specify the characters to be generated; the system then automatically generates the specified characters in that style, and can further generate all other stylized characters of the whole word stock. The method solves the two problems of the existing multi-stage model MC-GAN, namely that it can only be applied to the 26 English capital letters and cannot be used for writing systems such as Chinese, and that its parameter count is very large and it is difficult to train, and it achieves a better generation effect.
The technical scheme adopted by the invention is as follows:
a artistic font automatic generation method based on single-stage small-amount sample modeling learning is characterized in that a network model (AGS-Net) is established, and a network model (AGS-Net) is pre-trained aiming at an existing complete synthesized artistic font word library, so that the model can extract style characteristics from style reference set input, extract content characteristics from content reference set input and synthesize stylized characters of a specified style and content; fine-tuning the network model AGS-Net by using an artistic font word library designed by a designer with only a small number of samples; and generating a complete artistic special effect font library through the trained network model AGS-Net.
The automatic generating method of the artistic special effect font mainly comprises the following steps:
the first step is as follows: establishing a network model AGS-Net; the method mainly comprises the following operations:
1) defining characters with artistic special effect fonts, and decomposing the characters into styles and contents;
2) establishing a mapping function that maps a content reference picture x_c and a style reference picture set X_s = {x_1, x_2, x_3, …} containing a plurality of pictures of the same style to a stylized glyph picture y, such that the mapped picture y contains the content of x_c and the style of X_s;
3) designing a deep neural network model AGS-Net, realizing the extraction of character style characteristics and character content characteristics and the generation of stylized font pictures;
4) designing the network loss functions of the deep neural network model AGS-Net, which comprise an adversarial loss function, an L1 loss function, a contextual loss function (Contextual Loss), and a local texture fine-tuning loss function.
In step 1), the characters of the artistic special-effect font are defined and decomposed into style and content. The style of an artistic special-effect character comprises a glyph style and a color texture style, where the glyph style is the typographic style of the character, such as stroke thickness and serifs, and the color texture style represents the stroke color and texture variation of the character; the content of an artistic special-effect character indicates which letter or Chinese character it is.
Establishing a mapping function in the step 2), which comprises the following operations:
the generation task is defined as a mapping function. Mapping function refers a piece of content to picture xcStylized picture reference set X composed of reference pictures with same styles={x1,x2,x3…, mapped to a stylized glyph image y, the mapped y containing the specified content (x)cContent) and style (X)sIs the desired target result).
The content reference picture x_c used is a standard black-and-white picture carrying little style information (e.g., the Arial font in English). The reason for using a reference set of several pictures of the same style instead of a single stylized picture is that each stylized character picture is made up of its content and its style, so the content and style need to be decoupled in some way.
Given a reference set of stylized pictures, the mapping function extracts the common features of these pictures, namely the style information, and ignores their content information. For one artistic font style, the style reference set R_s = {r_1, r_2, …, r_n} contains n stylized pictures, where n is small (a few stylized pictures; in this embodiment, n = 5 for English and n = 30 for Chinese). In one forward propagation of the network model, m (m < n) pictures are randomly selected from R_s as the stylized picture input set X_s and input into the deep neural network model.
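For illustration only, the following is a minimal Python sketch (not part of the claimed invention) of how the style input set for one forward propagation could be sampled from the style reference set R_s; the function name `sample_style_input` and the channel-concatenation convention are assumptions introduced for the sketch, not taken from the patent.

```python
import random
import torch

def sample_style_input(style_reference_set, m):
    """Randomly pick m of the n style reference pictures (m < n) for one forward pass.

    style_reference_set: list of n tensors of shape (3, H, W), all sharing one artistic style.
    Returns a tensor of shape (m*3, H, W) obtained by channel-wise concatenation, one common
    way to feed several reference pictures to a style encoder at once.
    """
    assert m < len(style_reference_set)
    chosen = random.sample(style_reference_set, m)
    return torch.cat(chosen, dim=0)

# Illustrative usage: n = 30 Chinese reference glyphs, m = 6 picked per forward propagation.
refs = [torch.rand(3, 64, 64) for _ in range(30)]
x_s = sample_style_input(refs, m=6)   # shape (18, 64, 64)
```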
Step 3) realizing the mapping of the step 2) by establishing a deep neural network model AGS-Net;
based on the mapping function established in the step 2), the invention realizes the mapping function by establishing the deep neural network AGS-Net so as to realize the extraction of style and content characteristics and the generation of stylized pictures. The network model AGS-Net established by the invention is based on generation of a countermeasure network and comprises a generator G and three discriminators (shown in figure 2). The generator contains two encoders and two decoders. The two encoders have almost the same structure, are respectively responsible for the encoding of styles and contents, and are composed of a plurality of convolutional layers. The input of the style encoder is several style pictures randomly selected from a style reference set; the input to the content encoder is a stylish (e.g., Arial font in english or bold in chinese) character picture. For both decoders, one responsible for glyph style synthesis and the other responsible for color texture style synthesis, consists of multiple upsampled convolutional layers. In both decoders we use skip connections (fig. 3). The structure of the glyph style decoder is similar to the method in the literature (Phillip Isola, et. al. image-to-image transformation with conditional adaptation network. in Proceedings of the IEEE conference on computer vision and pattern recognition, pages 1125-1134, 2017.), and the input of each layer is the output of the previous layer and the concatenation of the layer characteristics corresponding to two encoders. Then, the font decoder outputs a font gray image ygrayTexture style decoders are structurally similar, but each layer input is additionally connected in series with corresponding layer features of a glyph style decoder, and the purpose of skip connection is to fully utilize features of different scales. At the end of the texture decoder, a convolution operation is performed after concatenation of the previous feature and the grayscale image. Discrimination of network model parts by three discriminators through cooperation of two decoders to ultimately obtain a stylized image y.Consists of the following components: shape discriminator DshaFor discriminating font gray image ygrayTrue or false; texture discriminator DtexResponsible for discriminating the genuineness of the stylized image y, and a local discriminator DlocalAnd the system is responsible for judging the truth of the local small block area of the stylized image y. Discriminating true or false means determining whether the input sample is a true sample in the data or a false sample generated by the model. The structure of the discriminator is composed of a plurality of layers of downsampled convolutional layers, and the output is a score map (score map) which represents the probability that each region of the input image is true. In the invention, the discriminator in a pix2pix model can be adopted as the discriminator.
Step 4) establishing a loss function of the deep neural network model AGS-Net, and training the model to obtain a trained model;
in the training process of the AGS-Net network model, the loss function of the network model mainly comprises 4 parts: a penalty for confrontation function, a L1 penalty function, a context penalty (Contextual Loss), and a local texture trim penalty function. The network model AGS-Net of the present invention applies countermeasure loss to the generated font gradation image and stylized image, respectively, based on generating a countermeasure network. The false samples against the loss are the images synthesized by the generator when the training characters are in the reference set RsIf so, the real sample of the discriminator is the real target (ground route) picture of the target stylized image y, otherwise, the real sample is the ground route picture of the target stylized image y from RsRandomly selects an image. To stabilize the training process, we used the L1 loss function, which was also applied to the glyph grayscale image and the stylized image, respectively. If the character is in the reference set, then there is a true target picture, the L1 penalty is not 0, otherwise 0 the scene penalty is proposed by Mechrez et al (Roey Mechrez, Itamar Talmi, and Lihi Zelnik-Manor.2018.the texture loss for image transformation with non-aligned data. in Proceedings of the ECCV.768-783.). Since L1 loss requires data spatial alignment, and scene loss does not, we will complement L1 loss. As with the L1 penalty, whether the scene penalty is 0 depends on whether the character is in the reference set. In addition, we propose local texture fine tuning loss to balance the number of true and false samples and focus on local texture generationAnd (5) effect. As shown in FIG. 4, we randomly cut out small blocks from the whole picture and input them to the local discriminator Dlocal. And taking small blocks cut from the stylized input picture as real samples, adding Gaussian (Gaussian) blur to the real small block samples to obtain false samples, and cutting the small blocks from the synthesized image to be also used as the false samples. In this way, we create a bridge between real and blurred samples, enabling the local arbiter to cause the generator to generate more real, less noisy and better detailed samples.
The second step is that: collecting or synthesizing an artistic special effect font database data set;
specifically, in this step, for the English font, the present invention uses the existing artistic special effect font proposed by MC-GAN (Samaneh Azadi, Matthew Fisher, Vladimir G Kim, Zhaowen Wang, Eli Shechtman, and Trevor Darrell. Multi-content GAN for the raw-shot font transfer. in Proceedings of the IEEE CVPR, pages 7564 and 7573,2018.) which contains the author synthesized artistic special effect font data set and the designer designed artistic special effect font data set; for Chinese font, the present invention utilizes black and white font with only font style to synthesize artistic font library by adding color and texture special effect to the font part. In addition, the invention collects the Chinese art special effect font data set.
The third step: the network model (AGS-Net) proposed by the invention is pre-trained on an artistic special effect font database, so that the network can preliminarily learn the mapping from a style reference set and a specified character to a target character. After pre-training, obtaining a preliminarily trained network model;
the fourth step: utilizing the preliminarily trained network model, aiming at a specially designed artistic special effect style font, carrying out fine adjustment on the network model on a small amount of samples to obtain a trained network model;
in order to stabilize the training process, the model training of the present invention is divided into two steps, pre-training (third step) and fine-tuning (fourth step). In the third step, the invention pre-trains the network model on the character picture collected and synthesized in the first step. The model generates all the font pictures, in English numberOn the set of data, the model generates a picture of 26 capital-letters special effect characters available, and the characters all have real target pictures. On the Chinese data set, the model generates 500 common Chinese character special effect font pictures, and the Chinese characters have real target pictures. In the pre-training process, all loss functions are not 0, and in the fourth step, the invention carries out fine adjustment on the network model aiming at the artistic special effect font. Using the pre-trained model in the second step, training the network model again, and generating all characters in the training process, wherein only a few characters are the stylized reference set RsThe characters in (1) have real target pictures.
The fifth step: and generating stylized characters with the style and the content by inputting a small number of sample pictures (such as 5 pictures in English and 30 pictures in Chinese) and a stylized character picture which are stylized by a professional design artistic special effect by using the trained model. In this step, the invention uses the trained network obtained in the fourth step to input a content picture and the reference set of the stroke-formatted picture in the fourth step into the model, so as to obtain the synthetic special effect font picture with the target content and style.
The sixth step: repeating the fifth step over the whole word stock, inputting a different content picture each time, to generate the whole artistic special-effect font library. In this step, the fifth step is repeated: different content pictures, together with the same style reference set as in the fifth step, are input into the network model to generate special-effect glyph pictures with different contents, thereby obtaining the whole artistic special-effect stylized word stock.
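For illustration only, a minimal sketch of the fifth and sixth steps combined: with a fixed style reference set, every content picture in the word stock is pushed through the trained generator once. This reuses the hypothetical `sample_style_input` helper from the earlier sketch; it is not the patent's own code.

```python
import torch

@torch.no_grad()
def generate_word_stock(generator, content_pictures, style_reference_set, m=6):
    """Repeat the fifth step for every character: fixed style reference set, varying content picture.

    content_pictures: dict mapping each character to its style-free content picture of shape (1, 3, H, W).
    Returns a dict mapping each character to its synthesized artistic special-effect picture.
    """
    generator.eval()
    stock = {}
    for char, x_c in content_pictures.items():
        x_s = sample_style_input(style_reference_set, m).unsqueeze(0)  # (1, m*3, H, W)
        _, y = generator(x_s, x_c)                                     # keep only the stylized RGB output
        stock[char] = y
    return stock
```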
Compared with the prior art, the invention has the beneficial effects that:
the invention provides an automatic artistic font generation method based on a single-stage small amount of samples, which comprises the steps of establishing a network model (AGS-Net), pre-training the network model (AGS-Net) aiming at the existing complete synthesized artistic font word library, and enabling the model to extract style characteristics from style reference set input, extract content characteristics from content reference set input and synthesize stylized characters with specified style and content; fine-tuning the network model AGS-Net by using an artistic font word library designed by a designer with only a small number of samples; and generating a complete artistic special effect font library through the trained network model AGS-Net. The loss function of the model comprises a new loss function for optimizing the local texture special effect. The technical scheme of the invention can be expanded on any language system and is not limited to specific languages. The network model of the invention has smaller scale and parameter quantity. The invention achieves the best effect of automatically synthesizing the artistic fonts at present.
Drawings
Fig. 1 is a flow chart of the automatic artistic special-effect font generation method, which mainly includes six steps. The first step: designing and establishing the network model AGS-Net. The second step: collecting or synthesizing an artistic special-effect font database data set. The third step: pre-training the proposed network model (AGS-Net) on the artistic special-effect font database data set, so that the network preliminarily learns the mapping from a style reference set and a designated character to the target character. The fourth step: using the preliminarily trained network model, fine-tuning it on a small number of samples of a professionally designed artistic special-effect style font to obtain the trained network model. The fifth step: using the trained model, inputting a small number of professionally designed artistic special-effect stylized samples and a content picture to generate the corresponding stylized character. The sixth step: repeating the fifth step over the whole word stock, inputting a different content picture each time, to generate the whole artistic special-effect font library.
FIG. 2 is a schematic structural diagram of the network model of the present invention, which includes a generator and three discriminators. The generator takes as input a content reference picture x_c and a style reference input set X_s, and the outputs of the decoders are sent to the discriminators for discrimination. The shape discriminator D_sha discriminates whether the glyph grayscale image y_gray is real or fake; the texture discriminator D_tex is responsible for discriminating whether the stylized image y is real or fake; and the local discriminator D_local is responsible for discriminating whether local small patch regions of the stylized image y are real or fake, where real or fake means whether the input sample is a real sample from the data or a fake sample generated by the model. The result is represented by the output score map. Each discriminator is composed of several convolutional layers. Arrows indicate data flow.
Fig. 3 is a schematic diagram of the skip-connection structure of the generator in the network model adopted in the embodiment of the present invention, showing that the features of corresponding convolutional layers are concatenated and input to the next layer.
Fig. 4 is a schematic diagram of the operations involved in the local discriminator and the local texture fine-tuning loss according to an embodiment of the present invention, including the local discriminator, patch cropping and the Gaussian blur operation, with arrows indicating data flow.
FIG. 5 is a schematic of the synthesized data sets used to pre-train the network, where (a) is the English data set and (b) is the Chinese data set.
FIG. 6 is a graph illustrating the comparison of the effects of different stylized reference set sizes and stylized input set sizes.
FIG. 7 is a graph comparing the impact of key modules of the network model used in the embodiments of the present invention.
FIG. 8 is a comparison of English stylized character images generated using the models of the present invention and a prior art method.
FIG. 9 is a comparison of a Chinese stylized character image generated using the model of the present invention with an existing model.
Detailed Description
The technical solutions in the embodiments of the present invention are clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it should be understood that the described examples are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
The invention provides an automatic generating method of artistic fonts, which synthesizes a whole word stock by utilizing a small amount of stylized character pictures. The general flow chart of the process is shown in figure 1.
A. Network model structure
As shown in fig. 2, the network model consists of one generator and three discriminators (a shape, a texture and a local discriminator). The generator consists of two encoders (a content encoder and a style encoder) and two decoders (a shape decoder and a texture decoder). As shown in fig. 3, the generator of the network model uses a skip-connection structure to ensure that content and style features are fully utilized. The two encoders are identical except for the channel dimension of the first layer's input; each is composed of 6 convolutional layers containing instance normalization and non-linear activation functions. The two decoders each consist of 6 upsampled convolutional layers (each containing instance normalization and a non-linear activation function), except that the texture decoder has one additional convolutional layer at the end. Both the shape and texture discriminators adopt the patch-based generative adversarial network (PatchGAN) structure, producing a score map of size 14x14x1. As shown in fig. 4, a local discriminator is proposed: small patches are cropped from the picture and Gaussian blur is applied to obtain real and fake patch samples, respectively. The spatial size of each patch is 32x32x3.
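For illustration only, a minimal PyTorch sketch of a down-sampling patch discriminator of the kind described above; the exact kernel sizes and strides are assumptions chosen so that a 64x64 input yields a 14x14x1 score map, and they need not match the patent's configuration.

```python
import torch.nn as nn

class PatchDiscriminator(nn.Module):
    """Down-sampling convolutional discriminator that outputs a score map; each entry scores one image region."""

    def __init__(self, in_ch=3, base=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(in_ch, base, 4, 2, 1), nn.LeakyReLU(0.2),                                     # 64 -> 32
            nn.Conv2d(base, 2 * base, 4, 2, 1), nn.InstanceNorm2d(2 * base), nn.LeakyReLU(0.2),     # 32 -> 16
            nn.Conv2d(2 * base, 4 * base, 4, 1, 1), nn.InstanceNorm2d(4 * base), nn.LeakyReLU(0.2), # 16 -> 15
            nn.Conv2d(4 * base, 1, 4, 1, 1),                                                        # 15 -> 14
        )

    def forward(self, x):
        return self.net(x)   # (B, 1, 14, 14) score map for a 64x64 input
```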
B. Loss function
The loss function includes four parts: an adversarial loss function, an L1 loss function, a contextual loss function, and a local texture fine-tuning loss function, expressed as formula 1:
L(G) = L_adv + L_1 + L_CX + L_local (formula 1)
where G denotes the generator and L(G) is the loss function of the generator; L_adv is the adversarial loss function; L_1 is the L1 loss function; L_CX is the contextual loss function; and L_local is the local texture fine-tuning loss function.
the first part of the antagonistic loss is represented by the formulae 2 to 4:
where L(D_sha) denotes the loss function of the shape discriminator and L(D_tex) denotes the loss function of the texture discriminator; the remaining symbols in formulas 2 and 3 denote a picture randomly selected from the stylized reference set and its grayscale-processed version; y_gray is the output of the shape decoder, y is the output of the texture decoder, and λ_advsha and λ_advtex are balance weights.
The second part, the L1 loss function, is expressed by formulas 5 to 7:
L_1 = λ_1gray · L_1gray + λ_1tex · L_1tex (formula 7)
where y_gray is the output of the shape decoder and y is the output of the texture decoder; their respective ground-truth target pictures appear in formulas 5 and 6; λ_1gray and λ_1tex are balance weights.
The third part, the contextual loss function, is expressed by formulas 8 to 10:
L_CX = λ_cxgray · L_CXgray + λ_cxtex · L_CXtex (formula 10)
where CX denotes the contextual similarity (Roey Mechrez, Itamar Talmi, and Lihi Zelnik-Manor. 2018. The contextual loss for image transformation with non-aligned data. In Proceedings of the European Conference on Computer Vision (ECCV), 768-783.), Φ_l(·) denotes extracting layer-l VGG19 features (Karen Simonyan and Andrew Zisserman. 2014. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556 (2014)) from the shape or texture picture, l denotes the l-th layer of VGG19, N denotes the number of VGG19 layers used, and λ_cxgray and λ_cxtex denote the weights.
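For illustration only, a compact re-implementation sketch of the contextual (CX) similarity of Mechrez et al. between two VGG19 feature maps, written in PyTorch; the bandwidth h and the choice of feature layers are assumptions, and this is not the patent's own code.

```python
import torch
import torch.nn.functional as F

def contextual_loss(feat_gen, feat_tgt, h=0.5, eps=1e-5):
    """Contextual (CX) loss between two feature maps of shape (B, C, H, W), following Mechrez et al. 2018."""
    b, c, _, _ = feat_gen.shape
    x = feat_gen.reshape(b, c, -1)                          # generated feature vectors, (B, C, Nx)
    y = feat_tgt.reshape(b, c, -1)                          # target feature vectors, (B, C, Ny)
    y_mu = y.mean(dim=2, keepdim=True)
    x, y = F.normalize(x - y_mu, dim=1), F.normalize(y - y_mu, dim=1)   # center on target mean, unit length
    cos = torch.bmm(x.transpose(1, 2), y)                   # (B, Nx, Ny) cosine similarities
    d = 1.0 - cos                                           # cosine distances d_ij
    d_rel = d / (d.min(dim=2, keepdim=True).values + eps)   # distances relative to the nearest target feature
    w = torch.exp((1.0 - d_rel) / h)
    cx = w / w.sum(dim=2, keepdim=True)                     # normalized affinities CX_ij
    cx_score = cx.max(dim=1).values.mean(dim=1)             # CX(X, Y): mean over targets of the best match
    return (-torch.log(cx_score + eps)).mean()

# L_CXgray and L_CXtex of formula 10 would sum this loss over the chosen VGG19 feature layers of
# (y_gray, its target) and (y, its target) respectively, weighted by lambda_cxgray and lambda_cxtex.
```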
As shown in fig. 4, the fourth part, the local texture fine-tuning loss, is expressed by formulas 11 and 12:
where L(D_local) denotes the loss function of the local discriminator, p_real denotes local patches cropped from real samples, p_blur denotes the patches p_real after Gaussian blur processing, p_y denotes local patches cropped from the output y of the texture decoder, and λ_local is a balance weight.
Therefore, the complete objective loss function is a min-max game between the generator and the three discriminators, expressed as formula 13.
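For illustration only, a minimal sketch of one alternating update of this min-max game in PyTorch; the balance weights λ, the contextual term and the conditioning of the discriminators are omitted for brevity, and the patch preparation reuses the hypothetical helpers from the earlier sketches, so this is not the patent's training code.

```python
import torch
import torch.nn.functional as F

def train_step(G, D_sha, D_tex, D_local, opt_g, opt_d, batch):
    """One alternating update of the min-max game (balance weights and the contextual term omitted)."""
    bce = torch.nn.BCEWithLogitsLoss()
    x_s, x_c, t_gray, t_rgb = batch                 # style input set, content picture, gray/RGB targets
    y_gray, y = G(x_s, x_c)

    def adv_d(D, real, fake):                       # discriminator side of the adversarial loss
        pr, pf = D(real), D(fake)
        return bce(pr, torch.ones_like(pr)) + bce(pf, torch.zeros_like(pf))

    def adv_g(D, fake):                             # generator side of the adversarial loss
        pf = D(fake)
        return bce(pf, torch.ones_like(pf))

    # real / fake patches for the local discriminator (first reference picture used as the real texture source)
    p_real, p_fake = local_discriminator_samples(x_s[:, :3], y)

    opt_d.zero_grad()
    d_loss = (adv_d(D_sha, t_gray, y_gray.detach())
              + adv_d(D_tex, t_rgb, y.detach())
              + adv_d(D_local, p_real, p_fake.detach()))
    d_loss.backward()
    opt_d.step()

    opt_g.zero_grad()
    g_loss = (adv_g(D_sha, y_gray) + adv_g(D_tex, y) + adv_g(D_local, crop_patches(y))
              + F.l1_loss(y_gray, t_gray) + F.l1_loss(y, t_rgb))
    g_loss.backward()
    opt_g.step()
    return d_loss.item(), g_loss.item()
```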
C. Data set
First, a special-effect font data set is collected and synthesized. As shown in fig. 5, for English characters the data set proposed by MC-GAN is used. For Chinese, a special-effect data set is synthesized: 246 printed Chinese fonts are collected, and 10 different gradient and texture color maps are applied to each printed font, resulting in 2,460 synthesized special-effect fonts. In addition, 35 artistic special-effect fonts designed by designers are collected. All data set pictures are of size 64x64x3.
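For illustration only, a minimal sketch of how a synthetic special-effect glyph could be produced from a black-and-white printed glyph and a gradient or texture color map, in the spirit of the Chinese data-set synthesis described above; the tensor layouts and the white background value are assumptions.

```python
def synthesize_effect_glyph(glyph_gray, texture_rgb, background=1.0):
    """Paint a gradient/texture color map into the strokes of a black-and-white glyph picture.

    glyph_gray:  tensor of shape (1, H, W), 0 inside strokes and 1 on the background.
    texture_rgb: tensor of shape (3, H, W) holding the gradient or texture color map.
    Returns a (3, H, W) synthetic special-effect glyph on a plain background.
    """
    stroke_mask = 1.0 - glyph_gray               # 1 inside strokes, 0 outside
    return stroke_mask * texture_rgb + (1.0 - stroke_mask) * background
```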
As shown in fig. 6, the main parameters involved in the invention, the style reference set size and the style input set size, are compared. The results show that the more few-shot samples are visible (i.e., the larger the style reference set), the better the effect, while the size of the style input set used in one forward propagation has less impact.
As shown in FIG. 7, the main modules of the invention, namely the skip connections in the generator, the contextual loss and the local texture fine-tuning loss, are compared. The results show that the skip connections have a large influence on the color characteristics of the synthesized stylized picture, the contextual loss has a large influence on the overall structure of the glyph, and the local texture fine-tuning loss effectively reduces noise in the synthesized results and improves the richness of local texture details.
As shown in fig. 8 and 9, the invention can be applied well to both the Chinese and English writing systems and has very good extension capability. The results show that the artistic special-effect glyph pictures synthesized by the method have the highest quality. The invention realizes, for the first time, single-stage automatic generation of an artistic special-effect font library from a small number of samples.
It is noted that the disclosed embodiments are intended to aid in further understanding of the invention, but those skilled in the art will appreciate that: various substitutions and modifications are possible without departing from the spirit and scope of the invention and appended claims. Therefore, the invention should not be limited to the embodiments disclosed, but the scope of the invention is defined by the appended claims.
Claims (7)
1. An automatic artistic font generation method based on single-stage small-amount sample modeling learning is characterized by establishing a network model AGS-Net, pre-training the network model AGS-Net by utilizing an existing artistic font word library, extracting style characteristics from a style reference set, extracting content characteristics from a content reference set, and synthesizing stylized characters with specified style and content; fine-tuning the network model AGS-Net by using the artistic font library with only a small number of samples, and generating a complete artistic special effect font library by using the trained network model AGS-Net; the method comprises the following steps:
the first step is as follows: establishing a deep neural network model AGS-Net based on the generated countermeasure network; the following operations are specifically executed:
1) defining characters of artistic special effect fonts and decomposing the characters into styles and contents;
2) establishing a mapping function that maps a content reference picture x_c and a style reference picture set X_s = {x_1, x_2, x_3, …} containing a plurality of style reference pictures to a stylized glyph picture y, such that the mapped picture y contains the content of x_c and the style of X_s;
3) designing a deep neural network model AGS-Net, extracting character style characteristics and character content characteristics, and generating a stylized font picture;
the network model AGS-Net is based on a generation countermeasure network and comprises a generator G and three discriminators; the generator comprises two encoders and two decoders; the three discriminators are respectively: a shape discriminator, a texture discriminator, and a local discriminator;
the two encoders are composed of a plurality of convolution layers and are respectively used for style encoding and content encoding; the input of the style encoder is a plurality of style pictures randomly selected from a style reference set; the input of the content encoder is a character picture without style;
the two decoders are a font decoder and a texture decoder and are respectively used for font style synthesis and color texture style synthesis, and stylized images y are obtained through the two decoders;
the shape discriminator D_sha is used for discriminating whether the glyph grayscale image y_gray is real or fake, that is, judging whether the input sample is a real sample from the data or a fake sample generated by the model; the texture discriminator D_tex is used for discriminating whether the stylized image y is real or fake; the local discriminator D_local is used for discriminating whether local small patch regions of the stylized image y are real or fake; the discriminators are each composed of multi-layer downsampled convolutional layers, and the output is a score map representing the probability that each region of the input image is real;
4) establishing the network loss function of the network model AGS-Net, wherein the network loss function comprises an adversarial loss function, an L1 loss function, a contextual loss function and a local texture fine-tuning loss function; training the network model AGS-Net to obtain a trained model; the method specifically comprises the following steps:
applying adversarial losses to the generated glyph grayscale image and the stylized image, respectively, based on the generative adversarial network;
the fake samples for the adversarial loss are images synthesized by the generator; when the training character is in the reference set R_s, the real sample of the discriminator is the real target picture of the target stylized image y, otherwise an image randomly selected from R_s is used;
the L1 loss function is applied to the glyph grayscale image and the stylized image, respectively; if the character is in the reference set, a real target picture exists, the loss of L1 is not 0, otherwise, the loss is 0;
the contextual loss function supplements the L1 loss, and whether the contextual loss is 0 depends on whether the character is in the reference set;
the local texture fine tuning loss function is used for balancing the number of true and false samples and paying attention to the generation effect of the local texture;
the local texture fine-tuning loss function is expressed by formulas 11 and 12:
wherein L(D_local) denotes the loss function of the local discriminator; L_local denotes the local texture fine-tuning loss function; p_real denotes local patches cropped from real samples; p_blur denotes the patches p_real after Gaussian blur processing; p_y denotes local patches cropped from the output y of the texture decoder; and λ_local is a balance weight;
the second step is that: collecting or synthesizing an artistic special effect font database data set;
the third step: pre-training a network model AGS-Net on an artistic special effect font database data set, so that mapping from a style reference set and a designated character to a target character is preliminarily learned, and a preliminarily trained network model is obtained;
all glyph pictures are generated during pre-training, and all of them have real target pictures; in the pre-training process, none of the loss functions is 0;
the fourth step: utilizing the preliminarily trained network model to perform fine adjustment on the network model aiming at a small number of sample pictures of artistic special effect style fonts to obtain a trained network model AGS-Net; all characters are generated in the training process, and only the stylized reference set R of the artistic special effect style font is usedsThe characters in the Chinese character list have real target pictures;
the fifth step: inputting a small number of sample pictures of the artistic special effect style font and a style-free character picture in the fourth step by using the trained model to obtain a synthesized special effect font picture of the target content and style, namely generating stylized characters with the style and the content;
through the steps, the automatic generation of the artistic fonts based on the single-stage small-amount sample modeling learning is realized.
2. The method for automatically generating artistic fonts based on single-stage small-amount sample modeling learning as claimed in claim 1, wherein the discriminator can adopt a discriminator in a pix2pix model.
3. The method of claim 1, wherein both decoders use skip connections.
4. The method of claim 1, wherein both decoders are composed of a plurality of upsampled convolutional layers; the input of each layer in the glyph decoder is the concatenation of the previous layer's output and the corresponding-layer features of the two encoders; the output of the glyph decoder is a glyph grayscale image y_gray;

each layer of the texture style decoder additionally concatenates the corresponding-layer features of the glyph style decoder; at the end of the texture decoder, a convolution operation is performed after concatenating the previous features with the grayscale image.
5. The method for automatically generating artistic fonts based on single-stage small-amount sample modeling learning as claimed in claim 1, wherein the loss function of the network model AGS-Net is expressed by formula 1:
L(G) = L_adv + L_1 + L_CX + L_local (formula 1)
wherein G denotes the generator and L(G) is the loss function of the generator; L_adv is the adversarial loss function; L_1 is the L1 loss function; L_CX is the contextual loss function; and L_local is the local texture fine-tuning loss function;
the adversarial loss function is expressed by formulas 2 to 4:
wherein L(D_sha) denotes the loss function of the shape discriminator and L(D_tex) denotes the loss function of the texture discriminator; the remaining symbols in formulas 2 and 3 denote a picture randomly selected from the stylized reference set and its grayscale-processed version; y_gray is the output of the shape decoder, y is the output of the texture decoder, and λ_advsha and λ_advtex are balance weights;
the L1 loss function is expressed as formula 5 to formula 7:
L_1 = λ_1gray · L_1gray + λ_1tex · L_1tex (formula 7)
wherein y_gray is the output of the shape decoder and y is the output of the texture decoder; their respective true target pictures appear in formulas 5 and 6; λ_1gray and λ_1tex are balance weights;
the contextual loss function is expressed by formulas 8 to 10:
L_CX = λ_cxgray · L_CXgray + λ_cxtex · L_CXtex (formula 10)
wherein CX is the contextual similarity; Φ_l(·) denotes extracting VGG19 features from the shape or texture picture; l denotes the l-th layer of VGG19; N denotes the number of VGG19 layers used; λ_cxgray and λ_cxtex denote the weights;
6. The method for automatically generating artistic fonts based on single-stage small-amount sample modeling learning as claimed in claim 1, wherein the style reference set R_s = {r_1, r_2, …, r_n} contains n stylized pictures, where n specifically takes the following values: n = 5 for English characters and n = 30 for Chinese characters.
7. The method for automatically generating artistic fonts based on single-stage small-amount sample modeling learning as claimed in claim 1, wherein the fifth step is repeatedly performed: the whole artistic special-effect font library is generated by inputting a different content picture from the font library each time.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910670478.8A CN110443864B (en) | 2019-07-24 | 2019-07-24 | Automatic artistic font generation method based on single-stage small-amount sample learning |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910670478.8A CN110443864B (en) | 2019-07-24 | 2019-07-24 | Automatic artistic font generation method based on single-stage small-amount sample learning |
Publications (2)
Publication Number | Publication Date |
---|---|
CN110443864A CN110443864A (en) | 2019-11-12 |
CN110443864B true CN110443864B (en) | 2021-03-02 |
Family
ID=68431267
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910670478.8A Active CN110443864B (en) | 2019-07-24 | 2019-07-24 | Automatic artistic font generation method based on single-stage small-amount sample learning |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110443864B (en) |
Families Citing this family (10)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111046915B (en) * | 2019-11-20 | 2022-04-15 | 武汉理工大学 | Method for generating style characters |
CN111242241A (en) * | 2020-02-17 | 2020-06-05 | 南京理工大学 | Method for amplifying etched character recognition network training sample |
CN112307714B (en) * | 2020-11-03 | 2024-03-08 | 武汉理工大学 | Text style migration method based on dual-stage depth network |
CN112364838B (en) * | 2020-12-09 | 2023-04-07 | 佛山市南海区广工大数控装备协同创新研究院 | Method for improving handwriting OCR performance by utilizing synthesized online text image |
CN112734627B (en) * | 2020-12-24 | 2023-07-11 | 北京达佳互联信息技术有限公司 | Training method of image style migration model, image style migration method and device |
CN112861806B (en) * | 2021-03-17 | 2023-08-22 | 网易(杭州)网络有限公司 | Font data processing method and device based on generation countermeasure network |
CN113962192B (en) * | 2021-04-28 | 2022-11-15 | 江西师范大学 | Method and device for generating Chinese character font generation model and Chinese character font generation method and device |
CN113140018B (en) * | 2021-04-30 | 2023-06-20 | 北京百度网讯科技有限公司 | Method for training countermeasure network model, method for establishing word stock, device and equipment |
CN113140017B (en) * | 2021-04-30 | 2023-09-15 | 北京百度网讯科技有限公司 | Method for training countermeasure network model, method for establishing word stock, device and equipment |
CN113792526B (en) * | 2021-09-09 | 2024-02-09 | 北京百度网讯科技有限公司 | Training method of character generation model, character generation method, device, equipment and medium |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108090516A (en) * | 2017-12-27 | 2018-05-29 | 第四范式(北京)技术有限公司 | Automatically generate the method and system of the feature of machine learning sample |
CN109146989A (en) * | 2018-07-10 | 2019-01-04 | 华南理工大学 | A method of birds and flowers characters in a fancy style image is generated by building neural network |
CN109635883A (en) * | 2018-11-19 | 2019-04-16 | 北京大学 | The Chinese word library generation method of the structural information guidance of network is stacked based on depth |
Family Cites Families (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108510569B (en) * | 2018-01-26 | 2020-11-03 | 北京大学 | Multichannel-based artistic word generation method and system |
-
2019
- 2019-07-24 CN CN201910670478.8A patent/CN110443864B/en active Active
Patent Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108090516A (en) * | 2017-12-27 | 2018-05-29 | 第四范式(北京)技术有限公司 | Automatically generate the method and system of the feature of machine learning sample |
CN109146989A (en) * | 2018-07-10 | 2019-01-04 | 华南理工大学 | A method of birds and flowers characters in a fancy style image is generated by building neural network |
CN109635883A (en) * | 2018-11-19 | 2019-04-16 | 北京大学 | The Chinese word library generation method of the structural information guidance of network is stacked based on depth |
Non-Patent Citations (2)
Title |
---|
Multi-Content GAN for Few-Shot Font Style Transfer; Samaneh Azadi et al.; arXiv:1712.00516v1 [cs.CV]; 2017-12-01; full text *
A two-stage artistic font rendering method based on CGAN networks; Ye Wujian et al.; Journal of Guangdong University of Technology; 2019-05-31; Vol. 36, No. 3; full text *
Also Published As
Publication number | Publication date |
---|---|
CN110443864A (en) | 2019-11-12 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN110443864B (en) | Automatic artistic font generation method based on single-stage small-amount sample learning | |
CN109635883B (en) | Chinese character library generation method based on structural information guidance of deep stack network | |
CN111242841B (en) | Image background style migration method based on semantic segmentation and deep learning | |
CN113255813B (en) | Multi-style image generation method based on feature fusion | |
CN111553837A (en) | Artistic text image generation method based on neural style migration | |
JP2016057918A (en) | Image processing device, image processing method, and program | |
CN113343705A (en) | Text semantic based detail preservation image generation method and system | |
CN110114776A (en) | Use the system and method for the character recognition of full convolutional neural networks | |
CN112364838B (en) | Method for improving handwriting OCR performance by utilizing synthesized online text image | |
CN113222875B (en) | Image harmonious synthesis method based on color constancy | |
CN115620010A (en) | Semantic segmentation method for RGB-T bimodal feature fusion | |
CN113657404B (en) | Image processing method of Dongba pictograph | |
US12100071B2 (en) | System and method for generating images of the same style based on layout | |
US20220301106A1 (en) | Training method and apparatus for image processing model, and image processing method and apparatus | |
CN116797868A (en) | Text image generation method and diffusion generation model training method | |
Ko et al. | SKFont: skeleton-driven Korean font generator with conditional deep adversarial networks | |
Li et al. | Freepih: Training-free painterly image harmonization with diffusion model | |
CN113096133A (en) | Method for constructing semantic segmentation network based on attention mechanism | |
CN114037644B (en) | Artistic word image synthesis system and method based on generation countermeasure network | |
CN110909161B (en) | English word classification method based on density clustering and visual similarity | |
CN116229442B (en) | Text image synthesis and instantiation weight transfer learning method | |
CN118587326B (en) | Method for generating prompt-decoupling text-to-large scene remote sensing image | |
CN113128456B (en) | Pedestrian re-identification method based on combined picture generation | |
CN117593755B (en) | Method and system for recognizing gold text image based on skeleton model pre-training | |
CN112652030B (en) | Color space position layout recommendation method based on specific scene |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
GR01 | Patent grant |