CN117058266A - Calligraphy character generation method based on skeleton and contour


Info

Publication number
CN117058266A
Authority
CN
China
Prior art keywords
image
skeleton
contour
loss
style
Prior art date
Legal status
Granted
Application number
CN202311313408.XA
Other languages
Chinese (zh)
Other versions
CN117058266B (en)
Inventor
曾锦山
章燕
汪叶飞
熊佳鹭
汪蕊
Current Assignee
Jiangxi Normal University
Original Assignee
Jiangxi Normal University
Priority date
Filing date
Publication date
Application filed by Jiangxi Normal University
Priority to CN202311313408.XA
Publication of CN117058266A
Application granted
Publication of CN117058266B
Legal status: Active
Anticipated expiration


Classifications

    • G PHYSICS
      • G06 COMPUTING; CALCULATING OR COUNTING
        • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
          • G06T 11/00 2D [Two Dimensional] image generation
        • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
          • G06N 3/00 Computing arrangements based on biological models
            • G06N 3/02 Neural networks
              • G06N 3/04 Architecture, e.g. interconnection topology
                • G06N 3/044 Recurrent networks, e.g. Hopfield networks
                • G06N 3/0475 Generative networks
              • G06N 3/08 Learning methods
                • G06N 3/094 Adversarial learning
        • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
          • G06V 30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
            • G06V 30/10 Character recognition
              • G06V 30/19 Recognition using electronic means
                • G06V 30/191 Design or setup of recognition systems or techniques; Extraction of features in feature space; Clustering techniques; Blind source separation
                  • G06V 30/1918 Fusion techniques, i.e. combining data from various sources, e.g. sensor fusion
              • G06V 30/32 Digital ink
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
      • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
        • Y02T CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
          • Y02T 10/00 Road transport of goods or passengers
            • Y02T 10/10 Internal combustion engine [ICE] based vehicles
              • Y02T 10/40 Engine management systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Molecular Biology (AREA)
  • Evolutionary Computation (AREA)
  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Software Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • Mathematical Physics (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Multimedia (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a method for generating calligraphy characters based on skeleton and contour, comprising the following steps. Step 1: build a model; the model uses the CycleGAN model as the backbone network, the CycleGAN model contains two generative adversarial networks, and the model further includes the modules Con, Ske, IPaD, and SCF. Step 2: train the model; Chinese character images in the source-domain style are fed to the model as original images, the first generative adversarial network converts the original images into target-style images, the second generative adversarial network converts the target-style images output by the first network into reconstructed images, and during training the model is optimized by computing the loss of the whole model. Step 3: use the optimized model for automatic calligraphy font generation. The invention introduces an effective skeleton-contour fusion module to fuse skeleton information and contour information, and achieves a high-quality rendering of content and style even in the absence of accurately paired font samples.

Description

A calligraphy character generation method based on skeleton and contour

Technical field

The invention belongs to the technical field of computer vision, and specifically relates to a calligraphy character generation method based on skeleton and contour.

Background

Chinese calligraphy is an art form based on Chinese characters, written mainly with a brush. In recent years, with the rapid development of artificial intelligence, research on the automatic generation of Chinese calligraphy has gradually emerged, devoted to the digital preservation and inheritance of cultural heritage and to building a widely applicable database of Chinese calligraphy texts. However, the automatic generation of calligraphic Chinese characters is technically quite challenging, mainly in two respects: 1. the shapes of calligraphic characters are highly diverse, and the overall shapes of different calligraphy fonts also differ greatly; 2. most calligraphic characters are traditional characters, whose structures are more complex than those of simplified characters.

To address these two challenges, existing Chinese character generation methods usually treat the task as an image-to-image translation problem. In the prior art, some approaches use the Pix2Pix model for Chinese font generation, building a deep neural network that generates calligraphy characters directly from standard font characters. Another prior approach builds an effective calligraphy generation model, LF-Font, which extracts content and style representations by exploiting paired characters and components. However, these models require paired data for training, and collecting large numbers of paired samples is often impractical and laborious, especially for certain font generation problems such as ancient calligraphy fonts. As a result, the prior art struggles to obtain enough paired fonts in small-sample settings, and these models can hardly produce accurate and reliable results.

To solve the data pairing problem, some practitioners use the CycleGAN model to generate Chinese fonts from unpaired data, such as the deformable generative model DG-Font. This technique introduces certain stroke encodings to alleviate mode collapse; some prior work also proposes semi-supervised variants that use a small number of paired samples as supervision, others use multiple square-block transformations to capture the glyph structure of Chinese characters, and still others use the contours of Chinese characters to obtain global information.

Although these supervised, unsupervised, and self-supervised models are very effective for generating general Chinese fonts, their results remain unsatisfactory when applied to Chinese calligraphy generation, because of the diverse shapes of Chinese characters and the very different styles of different calligraphy fonts. In particular, it is difficult to produce a high-quality rendering of content and style, which is the key to Chinese calligraphy generation. Some of the above techniques still require a certain amount of paired data to provide important supervision for the generated results, but collecting that much paired data is very difficult. Meanwhile, using only the skeleton or only the contour of characters often yields defects in the style or content of the generated font, which still cannot meet the needs of automatic Chinese calligraphy font generation.

Summary of the invention

The purpose of the present invention is to provide a calligraphy character generation method based on skeleton and contour, to solve the technical problem in the prior art that, without sufficient paired-font supervision, generated Chinese calligraphy fonts can hardly achieve a high-quality rendering of content and style.

The method for generating calligraphy characters based on skeleton and contour comprises the following steps.

Step 1: build a model; the model uses the CycleGAN model as the backbone network, and the CycleGAN model contains two generative adversarial networks.

Step 2: train the model. The Chinese character font style fed to the model is the source-domain style, and Chinese character images in that style are the source-domain images, collected as training samples. The font style of the calligraphy font images to be generated is the target style, and calligraphy font images in the target style are the target-domain images, collected to form a calligraphy dataset. During training, a source-domain image is fed to the model as the original image; the first generative adversarial network converts the original image into a target-style image, and the second generative adversarial network converts the target-style image output by the first network into a reconstructed image. The font style of the target-style image should match the target style, and the font style of the reconstructed image should match the source-domain style. The model is optimized during training by computing the loss of the whole model, with the objective of minimizing that loss.

Step 3: obtain the optimized model for automatic calligraphy font generation.

Both generative adversarial networks include the contour extraction module Con, the skeleton extraction module Ske, and the skeleton-contour fusion module SCF; the model further includes the inexact paired data module IPaD.

In Step 2, both generative adversarial networks extract skeleton information and contour information through the skeleton extraction module Ske and the contour extraction module Con respectively, fuse the skeleton and contour information through the skeleton-contour fusion module SCF, concatenate the fused features with the image fed to the generator inside the generator, and let the corresponding generator process them to generate the output image.

The inexact paired data module IPaD automatically recognizes the characters in the calligraphy dataset and records them as recognition labels, and then performs inexact pairing in the calligraphy dataset according to the target-style image; during pairing, incorrect recognition labels are allowed for the calligraphy samples involved, which yields the inexact paired data.

The loss of the whole model comprises the first adversarial loss L_advy, the second adversarial loss L_advx, the cycle-consistency loss L_cyc, the skeleton-consistency loss L_ske, the contour-consistency loss L_con, and the inexact pairing loss L_inex.

Preferably, in Step 1, the first generative adversarial network comprises a generator G_y and a discriminator D_y, and the second generative adversarial network comprises a generator G_x and a discriminator D_x. Generator G_y converts the original image into a target-style image, and discriminator D_y judges whether the font style of the generated target-style image is consistent with that of the target-domain images. The second generative adversarial network runs the reverse process to reconstruct the output of the first: generator G_x converts the target-style image into a reconstructed image in the source-domain style, and discriminator D_x judges whether the font style of the reconstructed image is consistent with that of the source-domain images.

Preferably, in Step 2, in the first generative adversarial network, the source-domain image x serving as the input original image is processed by the skeleton extraction module Ske and the contour extraction module Con, which extract the skeleton information sx and the contour information cx respectively; the two are fused by the skeleton-contour fusion module SCF. The original image x is fed to generator G_y, which during processing concatenates x at the channel level with the skeleton feature E_asx and contour feature E_bcx produced by the SCF module, and generates the target-style image ŷ. Target-domain images y are collected to form the target-domain dataset Y. The target-style image ŷ and a target-domain image y from Y are each fed to discriminator D_y, which judges whether the results it returns for the two are consistent, thereby assessing the authenticity of the target-style image ŷ.

Preferably, after the skeleton information and contour information of a given Chinese character are fed to the skeleton-contour fusion module SCF, the SCF module first passes them through the corresponding skeleton encoder and contour encoder to produce the skeleton feature E_sx and the contour feature E_cx; the encoded features E_sx and E_cx are then added to obtain the feature E_scx, and the SoftMax function is applied to obtain the normalized feature Z. Based on Z, the attention weight formula computes the weight a_c of the skeleton feature E_sx and the weight b_c of the contour feature E_cx. Finally, the computed weights a_c and b_c are multiplied by the corresponding features E_sx and E_cx to obtain the weighted skeleton feature E_asx and the weighted contour feature E_bcx. The computation is described as follows:

$$a_c=\frac{e^{A_cZ}}{e^{A_cZ}+e^{B_cZ}},\qquad b_c=\frac{e^{B_cZ}}{e^{A_cZ}+e^{B_cZ}},\qquad E_{asx}=a_c\cdot E_{sx},\qquad E_{bcx}=b_c\cdot E_{cx}$$

where the subscript c denotes computation on channel c of the corresponding quantity, and A and B are two matrices of learnable parameters.

Preferably, in the second generative adversarial network, the target-style image ŷ is processed by the skeleton extraction module Ske and the contour extraction module Con to extract the corresponding skeleton information sŷ and contour information cŷ, which are fused by the skeleton-contour fusion module SCF. The target-style image ŷ is fed to generator G_x, which during processing concatenates ŷ at the channel level with the corresponding skeleton and contour features produced by the SCF module, and reconstructs a reconstructed image x̂ consistent with the source-domain style. Source-domain images x are collected to form the source-domain dataset X; the reconstructed image x̂ and a source-domain image x from X are fed to discriminator D_x, which judges whether the results it returns for the two are consistent, thereby assessing the authenticity of the reconstructed image x̂.

Preferably, in Step 2, in the first generative adversarial network of the CycleGAN model, discriminator D_y computes the difference in font style between the target-style image and the target-domain images, i.e., the first adversarial loss L_advy, which is used to optimize generator G_y. The input of the second generative adversarial network is based on the output of generator G_y in the first network; discriminator D_x in the second network computes the difference in font style between the source-domain image and the reconstructed image, i.e., the second adversarial loss L_advx, which is used to optimize generator G_x.

The cycle-consistency loss L_cyc, the skeleton-consistency loss L_ske, the contour-consistency loss L_con, and the inexact pairing loss L_inex all serve to optimize generators G_x and G_y. The cycle-consistency loss L_cyc is the loss between the source-domain-style original image x and the reconstructed image x̂; the skeleton-consistency loss L_ske is the loss between the skeleton information sx of the original image x and the skeleton information extracted from the reconstructed image x̂; the contour-consistency loss L_con is the loss between the contour information cx of the original image x and the contour information extracted from the reconstructed image x̂; the inexact pairing loss L_inex is the loss between the inexact paired data y_inex and the target-style image ŷ_inex matched to the inexact paired data.

Preferably, the second adversarial loss L_advx, the first adversarial loss L_advy, the cycle-consistency loss L_cyc, the skeleton-consistency loss L_ske, the contour-consistency loss L_con, and the inexact pairing loss L_inex are computed, in that order, as follows:

$$L_{advx}=\mathbb{E}_{x\sim X}\left[\log D_x(x)\right]+\mathbb{E}_{\hat{x}\sim\hat{X}}\left[\log\left(1-D_x(\hat{x})\right)\right]$$

$$L_{advy}=\mathbb{E}_{y\sim Y}\left[\log D_y(y)\right]+\mathbb{E}_{\hat{y}\sim\hat{Y}}\left[\log\left(1-D_y(\hat{y})\right)\right]$$

$$L_{cyc}=\mathbb{E}_{x\sim X,\,\hat{x}\sim\hat{X}}\left[\left\|x-\hat{x}\right\|_1\right]$$

$$L_{ske}=\mathbb{E}_{x\sim X,\,\hat{x}\sim\hat{X}}\left[\left\|Ske(x)-Ske(\hat{x})\right\|_1\right]$$

$$L_{con}=\mathbb{E}_{x\sim X,\,\hat{x}\sim\hat{X}}\left[\left\|Con(x)-Con(\hat{x})\right\|_1\right]$$

$$L_{inex}=\mathbb{E}_{y_{inex}\sim Y_{inex},\,\hat{y}_{inex}\sim\hat{Y}_{inex}}\left[\left\|y_{inex}-\hat{y}_{inex}\right\|_1\right]$$

where E_{x∼X}[·] denotes the expectation of the bracketed quantity over the distribution of source-domain images x in the source-domain dataset X, and E_{x̂∼X̂}[·] the expectation over the distribution of reconstructed images x̂ in the set X̂ of reconstructed images; D_x(x) denotes the probability that discriminator D_x identifies the source-domain image x as a source-domain image, and log(1 − D_x(x̂)) corresponds to the probability that D_x identifies the reconstructed image x̂ as not being a source-domain image. E_{y∼Y}[·] denotes the expectation over the distribution of target-domain images y in the target-domain dataset Y, and E_{ŷ∼Ŷ}[·] the expectation over the distribution of target-style images ŷ in the set Ŷ of target-style images; D_y(y) denotes the probability that discriminator D_y identifies the target-domain image y as a target-domain image, and log(1 − D_y(ŷ)) corresponds to the probability that D_y identifies the target-style image ŷ as not being a target-domain image. E_{x∼X, x̂∼X̂}[‖·‖₁] denotes the expectation of the 1-norm of the bracketed quantity over the distributions of x in X and x̂ in X̂; Ske(x) and Ske(x̂) denote the results of processing the source-domain image x and the reconstructed image x̂ with the skeleton extraction module Ske, and Con(x) and Con(x̂) the results of processing them with the contour extraction module Con. X̂ denotes the set of reconstructed images x̂, Y_inex the set of inexact paired data y_inex, ŷ_inex the target-style image matched to the inexact paired data, and Ŷ_inex the set of such images; E_{y_inex∼Y_inex, ŷ_inex∼Ŷ_inex}[‖·‖₁] denotes the expectation of the 1-norm over the distributions of y_inex in Y_inex and ŷ_inex in Ŷ_inex.

Preferably, the model loss L of the whole model is computed as:

$$L=L_{advx}+L_{advy}+\lambda_{cyc}L_{cyc}+\lambda_{ske}L_{ske}+\lambda_{con}L_{con}+\lambda_{inex}L_{inex}$$

where λ_cyc, λ_ske, λ_con, and λ_inex are four tunable hyperparameters corresponding to the cycle-consistency loss L_cyc, the skeleton-consistency loss L_ske, the contour-consistency loss L_con, and the inexact pairing loss L_inex respectively, each representing the weight of the corresponding loss in the whole model loss.

The present invention has the following advantages. Calligraphy fonts are more complex than ordinary fonts, exhibiting many calligraphic style features such as connected strokes, stroke sharpness, and stroke thickness, which are hard to characterize with skeletons, stroke encodings, or other components alone. This scheme therefore introduces contours to represent these style features. Since contour information alone cannot determine the content of a character, an effective skeleton-contour fusion module is introduced to fuse skeleton and contour information. The scheme also lets the inexact paired data module IPaD automatically recognize the characters in the calligraphy dataset and record them as recognition labels, yielding an inexact paired dataset, which is used to compute an image-level loss between each generated image and the corresponding inexact paired image. With these technical features, the scheme jointly exploits skeleton and contour information, generates Chinese calligraphy fonts automatically without requiring a large number of paired samples, and achieves a high-quality rendering of content and style.

Brief description of the drawings

Figure 1 is the model flow chart of the calligraphy character generation method based on skeleton and contour of the present invention.

Figure 2 is a schematic diagram of the workflow of the skeleton-contour fusion module SCF of the present invention.

Figure 3 compares the Chinese character generation results of the present invention with those of the prior art.

Figure 4 compares the regular script font with the calligraphy fonts of Yu Youren and Zhu Suiliang.

Figure 5 shows the results of the present invention converting four groups of Chinese characters in regular script into the calligraphy fonts of Bada Shanren, Huang Tingjian, Zhu Suiliang, and Master Hongyi respectively.

Detailed description of the embodiments

The specific embodiments of the present invention are described in further detail below through the description of examples with reference to the accompanying drawings, to help those skilled in the art gain a more complete, accurate, and thorough understanding of the inventive concept and technical solution of the present invention.

As shown in Figures 1 and 2, the present invention provides a calligraphy character generation method based on skeleton and contour, comprising the following steps.

Step 1: build the model.

The CycleGAN model, i.e., the cycle-consistent generative adversarial model, is an unsupervised learning model. It contains two generative adversarial networks: the first comprises a generator G_y and a discriminator D_y, and the second comprises a generator G_x and a discriminator D_x. In this scheme, generator G_y converts the original image into a target-style image, and discriminator D_y judges whether the font style of the generated target-style image is consistent with that of the target-domain images, i.e., judges the authenticity of the target-style image. The second generative adversarial network runs the reverse process to reconstruct the output of the first: generator G_x converts the target-style image into a reconstructed image in the source-domain style, and discriminator D_x judges whether the font style of the reconstructed image is consistent with that of the source-domain images, i.e., judges the authenticity of the reconstructed image.

Each generator in the above generative adversarial networks consists of an encoder, a converter, and a decoder. In the first generative adversarial network of the CycleGAN model, discriminator D_y computes the difference in font style between the target-style image and the target-domain images; the loss of discriminator D_y combined with the loss of generator G_y forms the first adversarial loss L_advy, used to optimize generator G_y. The input of the second generative adversarial network is based on the output of generator G_y in the first network; discriminator D_x computes the difference in font style between the source-domain image and the reconstructed image, and its loss combined with the loss of generator G_x forms the second adversarial loss L_advx, used to optimize generator G_x. During training, discriminators D_y and D_x are generally trained first, and the corresponding generators are then optimized with the two adversarial losses obtained from the discriminators. Training the generators in a conventional CycleGAN model is essentially the process of minimizing these two adversarial losses. Training can also alternate between the discriminators and the generators.
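A minimal PyTorch sketch of one such alternating training step is given below for illustration. The one-layer stand-in networks, the optimizer settings, and the cycle-loss weight of 10.0 are placeholder assumptions rather than the patent's implementation; only the alternation and the loss structure follow the description above.

```python
import torch
import torch.nn as nn

# One-layer stand-ins so the loop runs; the patent's generators actually
# consist of an encoder, a converter, and a decoder.
class G(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv = nn.Conv2d(1, 1, 3, padding=1)
    def forward(self, img):
        return torch.sigmoid(self.conv(img))

class D(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv = nn.Conv2d(1, 1, 3, padding=1)
    def forward(self, img):
        # probability that img belongs to this discriminator's "real" domain
        return torch.sigmoid(self.conv(img).mean(dim=(1, 2, 3)))

G_y, G_x, D_y, D_x = G(), G(), D(), D()
opt_G = torch.optim.Adam(list(G_y.parameters()) + list(G_x.parameters()), lr=2e-4)
opt_D = torch.optim.Adam(list(D_y.parameters()) + list(D_x.parameters()), lr=2e-4)
eps = 1e-8

x = torch.rand(4, 1, 64, 64)  # source-domain batch (e.g. regular script)
y = torch.rand(4, 1, 64, 64)  # target-domain batch (calligraphy samples)

# 1) Discriminator step: maximize log D(real) + log(1 - D(fake)).
with torch.no_grad():
    y_hat = G_y(x)
    x_hat = G_x(y_hat)
loss_D = -(torch.log(D_y(y) + eps).mean()
           + torch.log(1 - D_y(y_hat) + eps).mean()
           + torch.log(D_x(x) + eps).mean()
           + torch.log(1 - D_x(x_hat) + eps).mean())
opt_D.zero_grad(); loss_D.backward(); opt_D.step()

# 2) Generator step: minimize log(1 - D(fake)) plus the cycle term.
y_hat = G_y(x)
x_hat = G_x(y_hat)
loss_adv = (torch.log(1 - D_y(y_hat) + eps).mean()
            + torch.log(1 - D_x(x_hat) + eps).mean())
loss_cyc = (x - x_hat).abs().mean()   # L1 cycle-consistency
loss_G = loss_adv + 10.0 * loss_cyc   # 10.0 is an assumed weight
opt_G.zero_grad(); loss_G.backward(); opt_G.step()
```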

In this scheme, the CycleGAN model serves as the base model, so that the two mappings between the source domain and the target domain can be learned; CycleGAN introduces a cycle-consistency loss that helps overcome the limitation of paired data. The model built in this scheme uses CycleGAN as the backbone network; it contains two generative adversarial networks, and both include the contour extraction module Con, the skeleton extraction module Ske, and the skeleton-contour fusion module SCF. In the contour extraction module, since calligraphy feature images are usually represented in grayscale, contour information can easily be extracted with the well-known Canny operator. The skeleton extraction module adopts an existing rule-based skeleton scheme (the extraction method disclosed in the paper: Jie Zhou, Yefei Wang, Yiyang Yuan, Qing Huang, and Jinshan Zeng, "SGCE-Font: Skeleton guided channel expansion for Chinese font generation," arXiv preprint arXiv:2211.14475, 2022) to extract skeleton information effectively. In addition, the model has an inexact paired data module IPaD, which uses an existing Chinese character recognition (CCR) method (for example, the recognition method disclosed in the paper: Jinshan Zeng, Ruiying Xu, Yu Wu, Hongwei Li, and Jiaxing Lu, "Zero-shot Chinese character recognition with stroke and radical-level decompositions," in Proceedings of the International Joint Conference on Neural Networks, 2023) to automatically recognize the characters in the calligraphy dataset and record them as recognition labels; after a target-style image is generated, similarity pairing is performed according to that image. The IPaD module differs from the prior art in that incorrect recognition labels are allowed for the calligraphy samples involved during pairing, i.e., a paired result may be a Chinese character that is similar to but different from the original image. Although some calligraphy characters are thus recognized incorrectly, they can still provide important reference information for the related calligraphy characters.
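As an illustration of the two extraction modules, the following sketch uses OpenCV's Canny operator for the contour, as the description suggests; for the skeleton it substitutes scikit-image's morphological skeletonize as a simple stand-in for the rule-based SGCE-Font scheme, whose exact procedure is not reproduced here. The thresholds, the binarization cutoff, and the file name are illustrative assumptions.

```python
import cv2
import numpy as np
from skimage.morphology import skeletonize

def extract_contour(glyph_gray: np.ndarray) -> np.ndarray:
    """Contour map via the Canny operator; the 100/200 hysteresis
    thresholds are illustrative choices, not values from the patent."""
    return cv2.Canny(glyph_gray, 100, 200)

def extract_skeleton(glyph_gray: np.ndarray) -> np.ndarray:
    """One-pixel-wide skeleton. The patent adopts the rule-based scheme
    of SGCE-Font; morphological skeletonization is only a stand-in."""
    ink = glyph_gray < 128  # assume dark strokes on a light background
    return skeletonize(ink).astype(np.uint8) * 255

glyph = cv2.imread("glyph.png", cv2.IMREAD_GRAYSCALE)  # hypothetical sample
contour_map = extract_contour(glyph)
skeleton_map = extract_skeleton(glyph)
```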

Compared with simplified Chinese fonts, calligraphy fonts are more complex, exhibiting many calligraphic style features such as connected strokes, stroke sharpness, and stroke thickness, which are hard to characterize with skeletons, stroke encodings, or other components alone. Contours are therefore introduced to represent these style features. Since contour information alone cannot determine the content of a character, an effective skeleton-contour fusion module is introduced to fuse skeleton and contour information. The architecture of the skeleton-contour fusion module is shown in Figure 2.

Step 2: train the model.

The above model integrates the skeleton-contour fusion module SCF with the inexact paired data module IPaD. The proposed model fuses the skeleton and contour information of Chinese characters, providing comprehensive structural supervision.

The basic training workflow is as follows. The Chinese character font style fed to the model is the source-domain style, and Chinese character images in that style are the source-domain images, collected as training samples. The font style of the calligraphy font images to be generated is the target style, and calligraphy font images in the target style are the target-domain images, collected to form the calligraphy dataset. During training, a source-domain image is fed to the model as the original image; the first generative adversarial network converts it into a target-style image, and the second generative adversarial network converts the target-style image output by the first network into a reconstructed image. The font style of the target-style image should match the target style, and the font style of the reconstructed image should match the source-domain style. The model is optimized during training by computing the loss of the whole model, with the objective of minimizing that loss. Meanwhile, the inexact paired data module IPaD automatically recognizes the characters in the calligraphy dataset and records them as recognition labels, and then performs inexact pairing in the calligraphy dataset according to the target-style image; that is, incorrect recognition labels are allowed for the calligraphy samples involved during pairing.
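The inexact pairing can be pictured as a label-indexed lookup, sketched below under assumptions: `recognize` stands for the cited CCR method as an assumed black box, and the matching policy (first glyph per label) is a placeholder. The point is that mislabeled samples simply stay in the pool, which is what makes the pairing inexact.

```python
from collections import defaultdict

def build_inexact_pairs(calligraphy_imgs, source_labels, recognize):
    """calligraphy_imgs: target-domain calligraphy glyph images.
    source_labels: the character contained in each source/generated image.
    recognize: assumed CCR black box mapping a glyph image to a character
    label; its occasional mistakes are tolerated, which is exactly what
    makes the resulting pairing inexact."""
    by_label = defaultdict(list)
    for img in calligraphy_imgs:
        by_label[recognize(img)].append(img)  # label may be wrong
    pairs = {}
    for ch in source_labels:
        if by_label[ch]:
            pairs[ch] = by_label[ch][0]  # similar but possibly different glyph
    return pairs
```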

Specifically, in the first generative adversarial network, the source-domain image x serving as the input original image is processed by the skeleton extraction module Ske and the contour extraction module Con, which extract the skeleton information sx and the contour information cx respectively; the two are fused by the skeleton-contour fusion module SCF. The SCF module is a form of cross-attention module. After the skeleton and contour information of a given Chinese character are fed to it, the SCF module first passes them through the relevant encoders (the corresponding skeleton encoder and contour encoder) to produce the skeleton feature E_sx and the contour feature E_cx; the encoded features E_sx and E_cx are then added to obtain the feature E_scx, and the SoftMax function is applied to obtain the normalized feature Z. Based on Z, the attention weight formula computes the weight a_c of the skeleton feature E_sx and the weight b_c of the contour feature E_cx. Finally, the computed weights a_c and b_c are multiplied by the corresponding features E_sx and E_cx to obtain the weighted skeleton feature E_asx and the weighted contour feature E_bcx, computed as follows:

$$a_c=\frac{e^{A_cZ}}{e^{A_cZ}+e^{B_cZ}},\qquad b_c=\frac{e^{B_cZ}}{e^{A_cZ}+e^{B_cZ}},\qquad E_{asx}=a_c\cdot E_{sx},\qquad E_{bcx}=b_c\cdot E_{cx}$$

where the subscript c denotes computation on channel c of the corresponding quantity, and A and B are two matrices of learnable parameters.
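The fusion rule can be written compactly in PyTorch, as in the sketch below. The encoder depth, the channel width, and the per-channel global average taken before the SoftMax are assumptions; only the element-wise sum, the normalization, the two-branch weighting with learnable matrices A and B, and the re-weighting of E_sx and E_cx follow the description above.

```python
import torch
import torch.nn as nn

class SCF(nn.Module):
    """Skeleton-contour fusion: splits a per-channel unit weight budget
    (a_c + b_c = 1) between the skeleton and contour branches."""
    def __init__(self, ch: int = 64):
        super().__init__()
        self.enc_s = nn.Conv2d(1, ch, 3, padding=1)  # skeleton encoder stub
        self.enc_c = nn.Conv2d(1, ch, 3, padding=1)  # contour encoder stub
        self.A = nn.Linear(ch, ch, bias=False)       # learnable matrix A
        self.B = nn.Linear(ch, ch, bias=False)       # learnable matrix B

    def forward(self, skel, cont):
        E_sx, E_cx = self.enc_s(skel), self.enc_c(cont)
        E_scx = E_sx + E_cx                               # element-wise sum
        Z = torch.softmax(E_scx.mean(dim=(2, 3)), dim=1)  # normalized feature
        ea, eb = self.A(Z).exp(), self.B(Z).exp()
        a = (ea / (ea + eb))[..., None, None]             # weight a_c
        b = (eb / (ea + eb))[..., None, None]             # weight b_c = 1 - a_c
        return a * E_sx, b * E_cx                         # E_asx, E_bcx

scf = SCF()
sx = torch.rand(2, 1, 64, 64)  # skeleton map of a character
cx = torch.rand(2, 1, 64, 64)  # contour map of the same character
E_asx, E_bcx = scf(sx, cx)
```

By construction a_c + b_c = 1 on every channel, so the module divides a unit attention budget between the skeleton branch and the contour branch.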

The original image x is fed to generator G_y, which during processing concatenates x at the channel level with the skeleton feature E_asx and contour feature E_bcx produced by the skeleton-contour fusion module SCF, and generates the target-style image ŷ; discriminator D_y then assesses the authenticity of ŷ, i.e., the target-style image ŷ and a target-domain image are each fed to discriminator D_y, which judges whether the results it returns for the two are consistent.
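Continuing the SCF sketch above, the channel-level splicing that precedes the generator body can be pictured as a plain concatenation; the first generator layer shown is a hypothetical stub sized to accept the widened input.

```python
x = torch.rand(2, 1, 64, 64)                  # original source-domain image
gen_in = torch.cat([x, E_asx, E_bcx], dim=1)  # splice along the channel axis
first_layer = nn.Conv2d(1 + 64 + 64, 64, 3, padding=1)  # assumed head of G_y
hidden = first_layer(gen_in)                  # the generator proceeds from here
```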

Then, in the second generative adversarial network, the target-style image ŷ is processed by the skeleton extraction module Ske and the contour extraction module Con to extract the corresponding skeleton information and contour information, which are fused by the skeleton-contour fusion module SCF. The target-style image ŷ is fed to generator G_x, which during processing concatenates ŷ at the channel level with the corresponding skeleton and contour features produced by the SCF module, and reconstructs a reconstructed image x̂ consistent with the source-domain style. Discriminator D_x then assesses the authenticity of x̂: the reconstructed image x̂ and images from the source-domain dataset X are fed to discriminator D_x, which judges whether the results it returns for the two are consistent.

According to the workflow described above, the model loss proposed in this scheme has six main components: the cycle-consistency loss L_cyc, the skeleton-consistency loss L_ske, the contour-consistency loss L_con, the inexact pairing loss L_inex, and the two adversarial losses L_advx and L_advy. Of the two adversarial losses, L_advx corresponds to generator G_x and discriminator D_x, and L_advy corresponds to generator G_y and discriminator D_y; the cycle-consistency loss L_cyc is the loss between the source-domain-style original image x and the reconstructed image x̂. The two adversarial losses and the cycle-consistency loss are the loss functions of the CycleGAN model itself; during training, the model is optimized by minimizing these loss functions, completing the corresponding model training.

Since this scheme also extracts the skeleton and contour information of the images, the model additionally has a contour-consistency loss and a skeleton-consistency loss. The skeleton-consistency loss L_ske is the loss between the skeleton information sx of the original image x and the skeleton information extracted from the reconstructed image x̂; the contour-consistency loss L_con is the loss between the contour information cx of the original image x and the contour information extracted from the reconstructed image x̂. Finally, since the scheme applies the inexact paired data module IPaD to the target-domain dataset to inexactly pair the calligraphy dataset, the loss also includes the inexact pairing loss.

If a target-style image ŷ generated by generator G_y cannot be exactly paired with a target-domain image, inexact pairing is performed, i.e., incorrect recognition labels are allowed for the calligraphy samples involved, which yields the inexact paired data y_inex; the corresponding target-style image is then the target-style image ŷ_inex matched to the inexact paired data y_inex. The inexact pairing loss L_inex is the loss between y_inex and ŷ_inex. Among the losses above, the cycle-consistency loss L_cyc, the skeleton-consistency loss L_ske, the contour-consistency loss L_con, and the inexact pairing loss L_inex are all used to optimize generators G_y and G_x. The loss functions are computed as follows:

$$L_{advx}=\mathbb{E}_{x\sim X}\left[\log D_x(x)\right]+\mathbb{E}_{\hat{x}\sim\hat{X}}\left[\log\left(1-D_x(\hat{x})\right)\right]$$

$$L_{advy}=\mathbb{E}_{y\sim Y}\left[\log D_y(y)\right]+\mathbb{E}_{\hat{y}\sim\hat{Y}}\left[\log\left(1-D_y(\hat{y})\right)\right]$$

$$L_{cyc}=\mathbb{E}_{x\sim X,\,\hat{x}\sim\hat{X}}\left[\left\|x-\hat{x}\right\|_1\right]$$

$$L_{ske}=\mathbb{E}_{x\sim X,\,\hat{x}\sim\hat{X}}\left[\left\|Ske(x)-Ske(\hat{x})\right\|_1\right]$$

$$L_{con}=\mathbb{E}_{x\sim X,\,\hat{x}\sim\hat{X}}\left[\left\|Con(x)-Con(\hat{x})\right\|_1\right]$$

$$L_{inex}=\mathbb{E}_{y_{inex}\sim Y_{inex},\,\hat{y}_{inex}\sim\hat{Y}_{inex}}\left[\left\|y_{inex}-\hat{y}_{inex}\right\|_1\right]$$

where E_{x∼X}[·] denotes the expectation of the bracketed quantity over the distribution of source-domain images x in the source-domain dataset X, and E_{x̂∼X̂}[·] the expectation over the distribution of reconstructed images x̂ in the set X̂ of reconstructed images. D_x(x) denotes the probability that discriminator D_x identifies the source-domain image x as a source-domain image: the smaller the loss of discriminator D_x, the larger log D_x(x) and the smaller the second adversarial loss. log(1 − D_x(x̂)) corresponds to the probability that D_x identifies the reconstructed image x̂ as not being a source-domain image: as generator G_x is optimized during training, a smaller generator loss indicates a smaller font-style difference between the reconstructed image x̂ and the source-domain image x, so log(1 − D_x(x̂)) shrinks and the probability that D_x identifies x̂ correctly falls, which increases the loss of discriminator D_x while the second adversarial loss decreases. E_{y∼Y}[·] denotes the expectation over the distribution of target-domain images y in the target-domain dataset Y, and E_{ŷ∼Ŷ}[·] the expectation over the distribution of target-style images ŷ in the set Ŷ of target-style images. D_y(y) denotes the probability that discriminator D_y identifies the target-domain image y as a target-domain image: the smaller the loss of discriminator D_y, the larger log D_y(y) and the smaller the first adversarial loss. log(1 − D_y(ŷ)) corresponds to the probability that D_y identifies the target-style image ŷ as not being a target-domain image: as generator G_y is optimized during training, a smaller generator loss indicates a smaller font-style difference between the target-style image ŷ and the target-domain image y, so log(1 − D_y(ŷ)) shrinks and the probability that D_y identifies ŷ correctly falls, which increases the loss of discriminator D_y while the first adversarial loss decreases. E_{x∼X, x̂∼X̂}[‖·‖₁] denotes the expectation of the 1-norm of the bracketed quantity over the distributions of x in X and x̂ in X̂; Ske(x) and Ske(x̂) denote the results of processing the source-domain image x and the reconstructed image x̂ with the skeleton extraction module Ske, and Con(x) and Con(x̂) the results of processing them with the contour extraction module Con. X̂ denotes the set of reconstructed images x̂, Y_inex the set of inexact paired data y_inex, ŷ_inex the target-style image matched to the inexact paired data, and Ŷ_inex the set of such images; E_{y_inex∼Y_inex, ŷ_inex∼Ŷ_inex}[‖·‖₁] denotes the expectation of the 1-norm over the distributions of y_inex in Y_inex and ŷ_inex in Ŷ_inex.
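Reusing the stand-in modules from the earlier sketches, the six losses can be assembled as follows; Ske and Con stand for the extraction modules and (y_inex, ŷ_inex) for an inexactly matched pair, all placeholders rather than the patent's implementation.

```python
def model_losses(x, y, y_inex, y_hat_inex,
                 G_y, G_x, D_y, D_x, Ske, Con, eps=1e-8):
    """Assembles the six losses; Ske/Con are the extraction modules and
    (y_inex, y_hat_inex) an inexactly matched pair, all placeholders."""
    y_hat = G_y(x)      # target-style image ŷ
    x_hat = G_x(y_hat)  # reconstructed image x̂
    L_advy = (torch.log(D_y(y) + eps)
              + torch.log(1 - D_y(y_hat) + eps)).mean()
    L_advx = (torch.log(D_x(x) + eps)
              + torch.log(1 - D_x(x_hat) + eps)).mean()
    L_cyc = (x - x_hat).abs().mean()             # L1 cycle-consistency
    L_ske = (Ske(x) - Ske(x_hat)).abs().mean()   # skeleton consistency
    L_con = (Con(x) - Con(x_hat)).abs().mean()   # contour consistency
    L_inex = (y_inex - y_hat_inex).abs().mean()  # inexact pairing
    return L_advx, L_advy, L_cyc, L_ske, L_con, L_inex
```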

In the CycleGAN model, the relationship between the loss L of the whole model (this model, i.e., the model of the calligraphy character generation method based on skeleton, contour, and inexact paired data, is abbreviated SCI-Font, where S, C, and I correspond to the skeleton extraction module Ske, the contour extraction module Con, and the inexact paired data module IPaD respectively), the losses of all generators G, and the losses of all discriminators D can be described by the following expression:

$$\min_{G}\max_{D}\;L(G,D)$$

where min over G and max over D indicates that the larger the loss of the discriminators D in the model, the smaller the loss of the generators G. The expression means that, over the range of the losses of all generators G and of all discriminators D, a point is sought at which L attains its minimum: there the losses of all generators G are minimal, the losses of all discriminators D are maximal, and L reaches its optimum. Based on this relationship, the model is optimized during training using the model loss fed back in training, reducing the model loss.

Combining the other loss functions, the model loss L of the whole model is computed as:

$$L=L_{advx}+L_{advy}+\lambda_{cyc}L_{cyc}+\lambda_{ske}L_{ske}+\lambda_{con}L_{con}+\lambda_{inex}L_{inex}$$

where λ_cyc, λ_ske, λ_con, and λ_inex are four tunable hyperparameters corresponding to the cycle-consistency loss L_cyc, the skeleton-consistency loss L_ske, the contour-consistency loss L_con, and the inexact pairing loss L_inex respectively, each representing the weight of its loss in the whole model loss; the hyperparameters are optimized and an optimal set is selected to improve learning performance.
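Continuing the loss sketch above, the weighted sum is then straightforward; the λ values and the identity stand-ins shown are assumed, not taken from the patent.

```python
y_inex = torch.rand(4, 1, 64, 64)  # inexactly paired calligraphy batch
y_hat_inex = G_y(x)                # generated glyphs matched to y_inex
Ske = Con = lambda t: t            # identity stand-ins for the modules
lam_cyc, lam_ske, lam_con, lam_inex = 10.0, 1.0, 1.0, 1.0  # assumed weights

L_advx, L_advy, L_cyc, L_ske, L_con, L_inex = model_losses(
    x, y, y_inex, y_hat_inex, G_y, G_x, D_y, D_x, Ske, Con)
L_total = (L_advx + L_advy + lam_cyc * L_cyc + lam_ske * L_ske
           + lam_con * L_con + lam_inex * L_inex)
```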

Step 3: obtain the optimized model for automatic calligraphy font generation.

Based on the above model and training method, the model of this scheme fuses the skeleton and contour information of Chinese characters and uses them as explicit representations to strengthen the latent content-style representation produced by the decoder, which effectively captures the content and style characteristics of calligraphy fonts. Noting the difficulty of collecting paired data, automatic Chinese character recognition techniques are used to generate an inexact paired dataset that further supervises model performance; inexact paired data better supervise the glyph differences between the source and target domains. Although some calligraphy characters are recognized incorrectly, they still provide important reference information for the related calligraphy characters, all of which gives important technical support for generating the content of calligraphy characters.

Font generation experiments were conducted with the method provided by the present invention and compared with other existing font generation techniques. The comparison of the generated fonts is shown in Figure 3, where the results of the different generation methods are arranged from top to bottom. The Chinese characters used are divided into three groups from left to right, each group containing four different characters: from left to right, Liu Gongquan's calligraphy font generated from regular script, Yan Zhenqing's calligraphy font generated from regular script, and Ouyang Xiu's calligraphy font generated from regular script. The circles marked in the figure indicate defects that appeared during generation, and the characters enclosed in boxes indicate that the shapes of the generated characters are inaccurate, i.e., mode collapse occurred. The second-to-last row uses the method and model of the present invention (abbreviated SCI-Font); the figure shows that this method generates calligraphy characters well. Figure 4 compares the regular script font with the calligraphy fonts of Yu Youren and Zhu Suiliang: the strokes and styles of the same characters vary greatly across calligraphy fonts, and there are also simplified/traditional variations, reflecting the complex strokes and varied styles of calligraphy fonts. Figure 5 shows the results of applying this method to convert four groups of regular-script characters into the calligraphy fonts of Bada Shanren, Huang Tingjian, Zhu Suiliang, and Master Hongyi respectively. Among the four groups, "悼" in the first group, "秉" in the second, "蜀" in the third, and "郝" in the fourth all produced characters different from the input characters, while the calligraphy font style met the requirements, demonstrating the inexact pairing phenomenon in this method.

The present invention has been described above by way of example with reference to the accompanying drawings. The specific implementation of the present invention is evidently not limited to the above manner: any non-substantive improvement made using the inventive concept and technical solution of the present invention, or any direct application of that concept and solution to other situations without improvement, falls within the protection scope of the present invention.

Claims (8)

1. A calligraphy character generation method based on skeleton and contour, comprising the following steps:

Step 1: build a model; the model uses the CycleGAN model as its backbone network, and the CycleGAN model comprises two generative adversarial networks;

Step 2: train the model; the Chinese character font style input to the model is the source-domain style, and a Chinese character image in the source-domain style is a source-domain image; source-domain images are collected as training samples; the font style of the calligraphy images to be generated is the target style, and a calligraphy image in the target style is a target-domain image; target-domain images are collected to form a calligraphy dataset; during training, a source-domain image is input to the model as the original image, the first generative adversarial network converts the original image into a target-style image, and the second generative adversarial network converts the target-style image output by the first generative adversarial network into a reconstructed image; the font style of the target-style image should be consistent with the target style, and the font style of the reconstructed image should be consistent with the source-domain style; during training, the model is optimized by computing the loss of the entire model, the optimization objective being to minimize this loss;

Step 3: obtain the optimized model for automatic generation of calligraphy fonts;

characterized in that: both generative adversarial networks include a contour extraction module Con, a skeleton extraction module Ske and a skeleton-contour fusion module SCF, and the model further includes an inexact paired data module IPaD;

in Step 2, both generative adversarial networks extract skeleton information and contour information through the skeleton extraction module Ske and the contour extraction module Con respectively, fuse the skeleton and contour information through the skeleton-contour fusion module SCF, concatenate the fused result inside the generator with the image input to the generator, and the corresponding generator then processes the result to produce the generated image;

the inexact paired data module IPaD automatically recognizes the characters in the calligraphy dataset and records them as recognition labels, and then performs inexact pairing within the calligraphy dataset according to the target-style image; during pairing, wrong recognition labels are allowed for the relevant calligraphy data, thereby obtaining inexact paired data;

the loss of the entire model includes the first adversarial loss L_advy, the second adversarial loss L_advx, the cycle consistency loss L_cyc, the skeleton consistency loss L_ske, the contour consistency loss L_con and the inexact pairing loss L_inex.
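The IPaD pairing described in claim 1 can be sketched as follows. The recognizer callable `recognize` and the `{character: image}` layout of the source set are assumptions, since the claim only requires that each calligraphy sample carry a (possibly wrong) recognition label and be paired by that label.

```python
# Hypothetical IPaD pairing: labels come from an off-the-shelf recognizer and
# may be wrong; pairing simply trusts them, which makes the pairs "inexact".
from collections import defaultdict

def build_inexact_pairs(calligraphy_images, source_images, recognize):
    """recognize(img) -> str is an assumed character-recognition callable;
    source_images is an assumed {character: image} mapping."""
    by_label = defaultdict(list)
    for img in calligraphy_images:
        by_label[recognize(img)].append(img)    # wrong labels are kept on purpose
    pairs = []
    for char, src in source_images.items():
        for target in by_label.get(char, []):   # may match a misrecognized glyph
            pairs.append((src, target))         # (x, y_inex) pairs used by L_inex
    return pairs
```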
2. The calligraphy character generation method based on skeleton and contour according to claim 1, characterized in that: in Step 1, the first generative adversarial network comprises a generator G_y and a discriminator D_y, and the second generative adversarial network comprises a generator G_x and a discriminator D_x; the generator G_y converts the original image into a target-style image, and the discriminator D_y judges whether the font style of the generated target-style image is consistent with that of the target-domain images; the second generative adversarial network applies the reverse process to reconstruct the output of the first, i.e., the generator G_x converts the target-style image into a reconstructed image in the source-domain style, and the discriminator D_x judges whether the font style of the reconstructed image is consistent with that of the source-domain images.

3. The calligraphy character generation method based on skeleton and contour according to claim 2, characterized in that: in Step 2, in the first generative adversarial network, the source-domain image x, as the input original image, is processed by the skeleton extraction module Ske and the contour extraction module Con to extract skeleton information sx and contour information cx, which are fused through the skeleton-contour fusion module SCF; the original image x is input to the generator G_y, which during processing concatenates, at the channel level, the original image x with the weighted skeleton feature E_asx and the weighted contour feature E_bcx produced by the SCF module, and generates the target-style image ŷ; target-domain images y are collected to form the target-domain dataset Y; the target-style image ŷ and a target-domain image y from Y are separately input to the discriminator D_y, and whether the results returned by D_y for the two are consistent is used to assess the realism of the target-style image ŷ.
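The data flow of claims 2-3 (together with the reverse pass of claim 5 below) can be summarized in a short PyTorch sketch; G_y, G_x, D_y, D_x, ske, con and scf are placeholder modules, and only the channel-level concatenation and cycle wiring follow the claim text.

```python
import torch

def forward_cycle(x, G_y, G_x, D_y, D_x, ske, con, scf):
    """One pass x -> y_hat -> x_hat through both GANs; all module arguments
    are placeholders for the networks named in the claims."""
    e_asx, e_bcx = scf(ske(x), con(x))                  # weighted skeleton/contour
    y_hat = G_y(torch.cat([x, e_asx, e_bcx], dim=1))    # channel-level concat
    e_asy, e_bcy = scf(ske(y_hat), con(y_hat))
    x_hat = G_x(torch.cat([y_hat, e_asy, e_bcy], dim=1))
    return y_hat, x_hat, D_y(y_hat), D_x(x_hat)         # images + realism scores
```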
4. The calligraphy character generation method based on skeleton and contour according to claim 3, characterized in that: after the skeleton information and contour information of a given Chinese character are input to the skeleton-contour fusion module SCF, the SCF module first feeds them into the corresponding skeleton encoder and contour encoder to produce the skeleton feature E_sx and the contour feature E_cx; the encoded skeleton feature E_sx and contour feature E_cx are then added to obtain the feature E_scx, and the SoftMax function is applied to obtain the normalized feature z; based on z, the attention weight formula computes the weight a_c of the skeleton feature E_sx and the weight b_c of the contour feature E_cx; finally, the computed weights a_c and b_c are multiplied with the corresponding skeleton feature E_sx and contour feature E_cx to obtain the weighted skeleton feature E_asx and the weighted contour feature E_bcx. The computation is described as

a_c = e^{A_c z} / (e^{A_c z} + e^{B_c z}),  b_c = e^{B_c z} / (e^{A_c z} + e^{B_c z}),  a_c + b_c = 1,

E_asx = a_c · E_sx,  E_bcx = b_c · E_cx,

where the subscript c in a_c, b_c and z denotes the computation on channel c, and A and B are two matrices of learnable parameters (A_c and B_c denoting their c-th rows).

5. The calligraphy character generation method based on skeleton and contour according to claim 4, characterized in that: in the second generative adversarial network, the target-style image ŷ is again processed by the skeleton extraction module Ske and the contour extraction module Con to extract the corresponding skeleton information sŷ and contour information cŷ, which are fused through the skeleton-contour fusion module SCF; ŷ is input to the generator G_x, which during processing concatenates ŷ at the channel level with the corresponding weighted skeleton feature and weighted contour feature obtained from the SCF module, and reconstructs a reconstructed image x̂ consistent with the source-domain style; source-domain images x are collected to form the source-domain dataset X; the reconstructed image x̂ and a source-domain image x from X are input to the discriminator D_x, and whether the results returned by D_x for the two are consistent is used to assess the realism of the reconstructed image x̂.
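A PyTorch sketch of the SCF fusion of claim 4 follows, under the weight formula reconstructed above. The single-convolution encoders and the global average pooling before the SoftMax are assumptions, since the patent does not specify the encoder architecture.

```python
import torch
import torch.nn as nn

class SCF(nn.Module):
    """Skeleton-contour fusion: per-channel softmax weights over two branches."""
    def __init__(self, channels: int):
        super().__init__()
        self.ske_enc = nn.Conv2d(1, channels, 3, padding=1)  # stand-in encoders
        self.con_enc = nn.Conv2d(1, channels, 3, padding=1)
        self.A = nn.Linear(channels, channels, bias=False)   # learnable matrix A
        self.B = nn.Linear(channels, channels, bias=False)   # learnable matrix B

    def forward(self, skeleton, contour):
        e_sx, e_cx = self.ske_enc(skeleton), self.con_enc(contour)
        e_scx = e_sx + e_cx                                  # element-wise sum
        z = e_scx.mean(dim=(2, 3)).softmax(dim=1)            # normalized feature z
        logits = torch.stack([self.A(z), self.B(z)], dim=0)  # shape (2, N, C)
        a, b = logits.softmax(dim=0)                         # a_c + b_c = 1
        e_asx = a[..., None, None] * e_sx                    # weighted skeleton
        e_bcx = b[..., None, None] * e_cx                    # weighted contour
        return e_asx, e_bcx
```

The softmax over the stacked logits computes exactly e^{Az}/(e^{Az}+e^{Bz}) and e^{Bz}/(e^{Az}+e^{Bz}) per channel, matching the reconstructed weight formula.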
6. The calligraphy character generation method based on skeleton and contour according to claim 5, characterized in that: in Step 2, in the first generative adversarial network of the CycleGAN model, the discriminator D_y computes the difference in font style between the target-style image and the target-domain image, i.e., the first adversarial loss L_advy, which is used to optimize the generator G_y; the input of the second generative adversarial network is based on the output of the generator G_y of the first, and the discriminator D_x of the second generative adversarial network computes the difference in font style between the source-domain image and the reconstructed image, i.e., the second adversarial loss L_advx, which is used to optimize the generator G_x;

the cycle consistency loss L_cyc, the skeleton consistency loss L_ske, the contour consistency loss L_con and the inexact pairing loss L_inex all jointly optimize the generator G_x and the generator G_y; the cycle consistency loss L_cyc is the loss between the source-domain original image x and the reconstructed image x̂; the skeleton consistency loss L_ske is the loss between the skeleton information sx of the original image x and the skeleton information sx̂ extracted from the reconstructed image x̂; the contour consistency loss L_con is the loss between the contour information cx of the original image x and the contour information cx̂ extracted from the reconstructed image x̂; the inexact pairing loss L_inex is the loss between the inexact paired data y_inex and the target-style image ŷ_inex corresponding to the inexact paired data.
7. The calligraphy character generation method based on skeleton and contour according to claim 6, characterized in that: the second adversarial loss L_advx, the first adversarial loss L_advy, the cycle consistency loss L_cyc, the skeleton consistency loss L_ske, the contour consistency loss L_con and the inexact pairing loss L_inex are computed, in that order, as:

L_advx = E_{x~X}[log D_x(x)] + E_{x̂~X̂}[log(1 − D_x(x̂))]

L_advy = E_{y~Y}[log D_y(y)] + E_{ŷ~Ŷ}[log(1 − D_y(ŷ))]

L_cyc = E_{x~X, x̂~X̂}[‖x − x̂‖₁]

L_ske = E_{x~X, x̂~X̂}[‖Ske(x) − Ske(x̂)‖₁]

L_con = E_{x~X, x̂~X̂}[‖Con(x) − Con(x̂)‖₁]

L_inex = E_{y_inex~Y_inex, ŷ_inex~Ŷ_inex}[‖y_inex − ŷ_inex‖₁]

where E_{x~X}[ ] denotes the expectation of the bracketed quantity under the distribution of source-domain images x in the source-domain dataset X, and E_{x̂~X̂}[ ] the expectation under the distribution of reconstructed images x̂ in the set X̂ of reconstructed images; D_x(x) denotes the probability that the discriminator D_x identifies the source-domain image x as a source-domain image, and 1 − D_x(x̂) the probability that it identifies the reconstructed image x̂ as not being a source-domain image; E_{y~Y}[ ] denotes the expectation under the distribution of target-domain images y in the target-domain dataset Y, and E_{ŷ~Ŷ}[ ] the expectation under the distribution of target-style images ŷ in the set Ŷ of target-style images; D_y(y) denotes the probability that the discriminator D_y identifies the target-domain image y as a target-domain image, and 1 − D_y(ŷ) the probability that it identifies the target-style image ŷ as not being a target-domain image; E_{x~X, x̂~X̂}[‖·‖₁] denotes the expectation of the 1-norm of the bracketed quantity under the distributions of x in X and x̂ in X̂; Ske(x) and Ske(x̂) denote the results of processing x and x̂ with the skeleton extraction module Ske, and Con(x) and Con(x̂) the results of processing x and x̂ with the contour extraction module Con; Y_inex denotes the set of inexact paired data y_inex, ŷ_inex denotes the target-style image corresponding to the inexact paired data, Ŷ_inex denotes the set of such target-style images, and E_{y_inex~Y_inex, ŷ_inex~Ŷ_inex}[‖·‖₁] denotes the expectation of the 1-norm under the distributions of y_inex in Y_inex and ŷ_inex in Ŷ_inex.
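The six losses of claim 7 translate directly into code. The sketch below assumes sigmoid-output discriminators and writes the adversarial terms as binary cross-entropy, the negated form of the log-likelihood objective above; minimizing the BCE is equivalent to maximizing the adversarial objective.

```python
import torch
import torch.nn.functional as F

def l1(a: torch.Tensor, b: torch.Tensor) -> torch.Tensor:
    """Empirical expectation of the 1-norm; used by L_cyc, L_ske, L_con, L_inex."""
    return (a - b).abs().mean()

def d_loss(d_real: torch.Tensor, d_fake: torch.Tensor) -> torch.Tensor:
    """Negated form of E[log D(real)] + E[log(1 - D(fake))]; assumes the
    discriminators end in a sigmoid so their outputs lie in (0, 1)."""
    return (F.binary_cross_entropy(d_real, torch.ones_like(d_real)) +
            F.binary_cross_entropy(d_fake, torch.zeros_like(d_fake)))

# Per-batch consistency terms, with ske/con as in the earlier sketch:
#   loss_cyc  = l1(x, x_hat)
#   loss_ske  = l1(ske(x), ske(x_hat))
#   loss_con  = l1(con(x), con(x_hat))
#   loss_inex = l1(y_inex, y_hat_inex)
```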
8. The calligraphy character generation method based on skeleton and contour according to claim 7, characterized in that: the model loss L of the entire model is computed as

L = L_advx + L_advy + λ_cyc · L_cyc + λ_ske · L_ske + λ_con · L_con + λ_inex · L_inex,

where λ_cyc, λ_ske, λ_con and λ_inex are four tunable hyperparameters corresponding to the cycle consistency loss L_cyc, the skeleton consistency loss L_ske, the contour consistency loss L_con and the inexact pairing loss L_inex respectively, representing the weight of the corresponding loss in the overall model loss.
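The total objective of claim 8 then combines the six terms. The default weights below are placeholders, since the patent only states that the four hyperparameters are tunable.

```python
def total_loss(l_advx, l_advy, l_cyc, l_ske, l_con, l_inex,
               lam_cyc=10.0, lam_ske=1.0, lam_con=1.0, lam_inex=1.0):
    """Weighted sum of claim 8; the lambda defaults are hypothetical."""
    return (l_advx + l_advy + lam_cyc * l_cyc + lam_ske * l_ske
            + lam_con * l_con + lam_inex * l_inex)
```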
CN202311313408.XA 2023-10-11 2023-10-11 A calligraphy character generation method based on skeleton and outline Active CN117058266B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311313408.XA CN117058266B (en) 2023-10-11 2023-10-11 A calligraphy character generation method based on skeleton and outline

Publications (2)

Publication Number Publication Date
CN117058266A (en) 2023-11-14
CN117058266B (en) 2023-12-26

Family

ID=88655783

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311313408.XA Active CN117058266B (en) 2023-10-11 2023-10-11 A calligraphy character generation method based on skeleton and outline

Country Status (1)

Country Link
CN (1) CN117058266B (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109408776A (en) * 2018-10-09 2019-03-01 西华大学 A kind of calligraphy font automatic generating calculation based on production confrontation network
CN109746916A (en) * 2019-01-28 2019-05-14 武汉科技大学 A method and system for robot writing calligraphy
US20210390686A1 (en) * 2020-06-15 2021-12-16 Dalian University Of Technology Unsupervised content-preserved domain adaptation method for multiple ct lung texture recognition
CN116823983A (en) * 2023-06-15 2023-09-29 西北大学 One-to-many style handwriting picture generation method based on style collection mechanism

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
WANG Xiaohong; LU Hui; MA Xiangcai: "Stylized Calligraphy Image Generation Based on Generative Adversarial Networks" (基于生成对抗网络的风格化书法图像生成), Packaging Engineering (包装工程), no. 11 *

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117830074A (en) * 2023-12-20 2024-04-05 广州美术学院 A Chinese character font generation method based on font outline information
CN118036555A (en) * 2024-04-11 2024-05-14 江西师范大学 Low-sample font generation method based on skeleton transfer and structure contrast learning
CN118537660A (en) * 2024-07-19 2024-08-23 南通理工学院 Tobacco leaf detection method integrating main pulse characteristics and edge characteristics
CN118537660B (en) * 2024-07-19 2024-10-18 南通理工学院 Tobacco leaf detection method integrating main pulse characteristics and edge characteristics
CN118799892A (en) * 2024-09-12 2024-10-18 南昌大学 A method and system for generating Chinese characters in calligraphic style
CN118799892B (en) * 2024-09-12 2025-03-18 南昌大学 A method and system for generating Chinese characters in calligraphic style

Also Published As

Publication number Publication date
CN117058266B (en) 2023-12-26

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant