CN117058266B - Calligraphy character generation method based on skeleton and contour - Google Patents

Calligraphy character generation method based on skeleton and contour

Info

Publication number
CN117058266B
CN117058266B (application CN202311313408.XA)
Authority
CN
China
Prior art keywords
image
skeleton
loss
style
target
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202311313408.XA
Other languages
Chinese (zh)
Other versions
CN117058266A (en)
Inventor
曾锦山
章燕
汪叶飞
熊佳鹭
汪蕊
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Jiangxi Normal University
Original Assignee
Jiangxi Normal University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Jiangxi Normal University filed Critical Jiangxi Normal University
Priority to CN202311313408.XA priority Critical patent/CN117058266B/en
Publication of CN117058266A publication Critical patent/CN117058266A/en
Application granted granted Critical
Publication of CN117058266B publication Critical patent/CN117058266B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G — PHYSICS
        • G06 — COMPUTING; CALCULATING OR COUNTING
            • G06T — IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
                • G06T 11/00 — 2D [Two Dimensional] image generation
            • G06N — COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
                • G06N 3/00 — Computing arrangements based on biological models
                    • G06N 3/02 — Neural networks
                        • G06N 3/04 — Architecture, e.g. interconnection topology
                            • G06N 3/044 — Recurrent networks, e.g. Hopfield networks
                            • G06N 3/0475 — Generative networks
                        • G06N 3/08 — Learning methods
                            • G06N 3/094 — Adversarial learning
            • G06V — IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
                • G06V 30/00 — Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
                    • G06V 30/10 — Character recognition
                        • G06V 30/19 — Recognition using electronic means
                            • G06V 30/191 — Design or setup of recognition systems or techniques; Extraction of features in feature space; Clustering techniques; Blind source separation
                                • G06V 30/1918 — Fusion techniques, i.e. combining data from various sources, e.g. sensor fusion
                        • G06V 30/32 — Digital ink
    • Y — GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
        • Y02 — TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
            • Y02T — CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
                • Y02T 10/00 — Road transport of goods or passengers
                    • Y02T 10/10 — Internal combustion engine [ICE] based vehicles
                        • Y02T 10/40 — Engine management systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Molecular Biology (AREA)
  • Evolutionary Computation (AREA)
  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Software Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • Mathematical Physics (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Multimedia (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a calligraphy character generation method based on a skeleton and a contour, which comprises the following steps. Step one, establish a model: the model takes a CycleGAN model as its backbone network, the CycleGAN model comprising two generative adversarial networks; the model further comprises a contour extraction module Con, a skeleton extraction module Ske, an inexact paired data module IPaD and a skeleton-contour fusion module SCF. Step two, train the model: a source-domain-style Chinese character image is input as the original image; the first generative adversarial network converts the original image into a target-style image, the second generative adversarial network converts the target-style image output by the first network into a reconstructed image, and the model is optimized by computing the loss of the whole model during training. Step three, obtain the optimized model for automatically generating calligraphy fonts. The invention introduces an effective skeleton-contour fusion module to fuse skeleton information with contour information, and achieves high-quality content and style representation even when accurately paired font samples are lacking.

Description

Calligraphy character generation method based on skeleton and contour
Technical Field
The invention belongs to the technical field of computer vision, and particularly relates to a calligraphy character generation method based on a skeleton and a contour.
Background
Chinese calligraphy is an art form built on Chinese characters and is traditionally written with a brush. In recent years, with the rapid development of artificial intelligence, research on the automatic generation of Chinese calligraphy has grown, aiming at the digital protection and inheritance of this cultural heritage and at building widely applicable databases of Chinese calligraphy text. Automatically generating calligraphy characters, however, is technically challenging in two main respects: 1. calligraphy characters take diverse shapes, and the overall forms of different calligraphy fonts differ greatly; 2. most calligraphy characters are traditional characters, whose structure is more complex than that of simplified characters.
To address these two challenges, existing methods generally treat Chinese character generation as an image-to-image translation problem. One prior technique generates Chinese fonts with a Pix2Pix model, building a deep neural network that produces calligraphy characters directly from standard-font characters. Another prior technique builds an efficient font generation model, LF-Font, which extracts content and style representations from paired characters and components. These models, however, require paired data for training, and collecting large numbers of paired samples is often impractical and burdensome, especially for font generation problems such as ancient calligraphy fonts; with only small samples available, the prior art struggles to obtain enough paired fonts and therefore to produce accurate and reliable results.
To avoid the pairing requirement, some technicians employ the CycleGAN model to generate Chinese fonts from unpaired data, such as the deformable generative model DG-Font. Some prior techniques use a small number of paired samples as supervision and propose semi-supervised variants; others capture the structure of Chinese characters through square-block transformations; still others use the contour of Chinese characters to obtain global information.
Although these supervised, unsupervised and self-supervised models are effective for general Chinese font generation, they remain unsatisfactory when applied to Chinese calligraphy generation, because the shapes of Chinese characters are diverse and the styles of different calligraphy pieces vary widely; in particular, it is difficult to produce a high-quality representation of content and style, which is the key to calligraphy generation. Some of the above techniques still require a certain amount of paired data to supervise the result, yet collecting that amount of paired data is very difficult. Using only the skeleton or only the contour of a character leaves defects in the style or content of the generated fonts and cannot meet the requirements of automatic Chinese calligraphy font generation.
Disclosure of Invention
The invention aims to provide a calligraphy character generation method based on a skeleton and a contour, which solves the technical problem in the prior art that, with insufficient paired-character supervision, it is difficult to generate Chinese calligraphy characters with a high-quality representation of content and style.
The skeleton- and contour-based calligraphy character generation method comprises the following steps.
Step one, establishing a model; the model takes a CycleGAN model as its backbone network, and the CycleGAN model comprises two generative adversarial networks.
Step two, training the model. The font style of the Chinese character images input to the model is the source-domain style; Chinese character images of the source-domain style are acquired as training samples. The font style of the calligraphy font images to be generated is the target style; calligraphy font images of the target style, i.e. target-domain images, are acquired to form a calligraphy dataset. During training a source-domain image is input to the model as the original image; the first generative adversarial network converts the original image into a target-style image, and the second generative adversarial network converts the target-style image output by the first network into a reconstructed image. The font style of the target-style image is consistent with the target style, and the font style of the reconstructed image is consistent with the source-domain style. The model is optimized by computing the loss of the whole model during training, the optimization objective being to minimize that loss.
Step three, obtaining an optimized model for automatically generating calligraphy fonts.
Both generative adversarial networks comprise a contour extraction module Con, a skeleton extraction module Ske and a skeleton-contour fusion module SCF, and the model further comprises an inexact paired data module IPaD.
In step two, the two generative adversarial networks extract skeleton information and contour information through the skeleton extraction module Ske and the contour extraction module Con respectively, fuse the two through the skeleton-contour fusion module SCF, splice the fused features with the image input to the generator at the channel level, and let the corresponding generator process the result to generate an image.
The inexact paired data module IPaD automatically recognizes the characters in the calligraphy dataset and records them as recognition labels; inexact pairing is performed within the calligraphy dataset according to the target-style image, and wrong recognition labels are allowed for the relevant calligraphy data during pairing, thereby obtaining inexact paired data.
The loss of the whole model comprises the first adversarial loss $L_{advy}$, the second adversarial loss $L_{advx}$, the cycle consistency loss $L_{cyc}$, the skeleton consistency loss $L_{ske}$, the contour consistency loss $L_{con}$ and the inexact pairing loss $L_{inex}$.
Preferably, in step one, the first generative adversarial network comprises a constructed generator one $G_y$ and discriminator one $D_y$, and the second generative adversarial network comprises a constructed generator two $G_x$ and discriminator two $D_x$. Generator one $G_y$ converts the original image into the target-style image, and discriminator one $D_y$ judges whether the font style of the generated target-style image is consistent with that of the target-domain images. The second generative adversarial network runs the reverse process to reconstruct the output of the first network: generator two $G_x$ converts the target-style image into a reconstructed image of the source-domain style, and discriminator two $D_x$ judges whether the font style of the reconstructed image is consistent with that of the source-domain images.
Preferably, in step two, in the first generative adversarial network the source-domain image $x$, taken as the input original image, is processed by the skeleton extraction module Ske and the contour extraction module Con, which extract the skeleton information $s_x$ and the contour information $c_x$ respectively; the two are fused by the skeleton-contour fusion module SCF. The original image $x$ is input to generator one $G_y$; during processing, generator one $G_y$ splices the original image $x$ with the weighted skeleton features $E_{asx}$ and contour features $E_{bcx}$ produced by the SCF module at the channel level, and generates the target-style image $\hat{y}$ after processing. Target-domain images $y$ are acquired to compose the target-domain dataset $Y$. The target-style image $\hat{y}$ and a target-domain image $y$ from the dataset $Y$ are input separately to discriminator one $D_y$, and whether the results returned by discriminator one $D_y$ for the two are consistent is judged, thereby evaluating the authenticity of the target-style image $\hat{y}$.
Preferably, after the skeleton information and contour information given for a Chinese character are input to the skeleton-contour fusion module SCF, the SCF module first feeds them into the corresponding skeleton encoder and contour encoder to generate the skeleton features $E_{sx}$ and contour features $E_{cx}$. The encoded skeleton features $E_{sx}$ and contour features $E_{cx}$ are then added to obtain the features $E_{scx}$, and a SoftMax function yields the normalized feature $c_Z$. Based on the normalized feature $c_Z$, the attention-weight formula computes the corresponding weight $a_c$ of the skeleton features $E_{sx}$ and weight $b_c$ of the contour features $E_{cx}$. Finally, the computed weights $a_c$ and $b_c$ are multiplied by the corresponding skeleton features $E_{sx}$ and contour features $E_{cx}$ to obtain the weighted skeleton features $E_{asx}$ and weighted contour features $E_{bcx}$. The computation is described as follows:

$$a_c = \frac{e^{A_c c_Z}}{e^{A_c c_Z} + e^{B_c c_Z}}, \qquad b_c = \frac{e^{B_c c_Z}}{e^{A_c c_Z} + e^{B_c c_Z}}, \qquad E_{asx} = a_c \cdot E_{sx}, \qquad E_{bcx} = b_c \cdot E_{cx}$$

wherein the subscript $c$ in $a_c$, $b_c$ and $c_Z$ indicates that the computation is performed on channel $c$, and $A$ and $B$ are two learnable parameter matrices (so that $a_c + b_c = 1$ on each channel).
Preferably, in the second generative adversarial network, the skeleton extraction module Ske and the contour extraction module Con extract from the target-style image $\hat{y}$ the corresponding skeleton information $s_{\hat{y}}$ and contour information $c_{\hat{y}}$, which are fused by the skeleton-contour fusion module SCF. The target-style image $\hat{y}$ is input to generator two $G_x$; during processing, generator two $G_x$ splices the target-style image $\hat{y}$ with the corresponding weighted skeleton features and contour features produced by the SCF module at the channel level, and generates the reconstructed image $\hat{x}$, consistent with the source-domain style. Source-domain images $x$ are acquired to compose the source-domain dataset $X$. The reconstructed image $\hat{x}$ and a source-domain image $x$ from the dataset $X$ are input to discriminator two $D_x$, and whether the results returned by discriminator two $D_x$ for the two are consistent is judged, thereby evaluating the authenticity of the reconstructed image $\hat{x}$.
Preferably, in step two, discriminator one $D_y$ in the first generative adversarial network of the CycleGAN model computes the difference in font style between the target-style image and the target-domain image, i.e. the first adversarial loss $L_{advy}$, used to optimize generator one $G_y$. The input of the second generative adversarial network is generated from the output of generator one $G_y$ in the first network; discriminator two $D_x$ in the second network computes the difference in font style between the source-domain image and the reconstructed image, i.e. the second adversarial loss $L_{advx}$, used to optimize generator two $G_x$.
The cycle consistency loss $L_{cyc}$, skeleton consistency loss $L_{ske}$, contour consistency loss $L_{con}$ and inexact pairing loss $L_{inex}$ all serve to optimize generator two $G_x$ and generator one $G_y$. The cycle consistency loss $L_{cyc}$ is the loss between the source-domain-style original image $x$ and the reconstructed image $\hat{x}$; the skeleton consistency loss $L_{ske}$ is the loss between the skeleton information $s_x$ of the original image $x$ and the skeleton information $s_{\hat{x}}$ extracted from the reconstructed image $\hat{x}$; the contour consistency loss $L_{con}$ is the loss between the contour information $c_x$ of the original image $x$ and the contour information $c_{\hat{x}}$ extracted from the reconstructed image $\hat{x}$; the inexact pairing loss $L_{inex}$ is the loss between the inexact paired data $y_{inex}$ and the target-style image $\hat{y}_{inex}$ corresponding to the inexact paired data.
Preferably, the second adversarial loss $L_{advx}$, the first adversarial loss $L_{advy}$, the cycle consistency loss $L_{cyc}$, the skeleton consistency loss $L_{ske}$, the contour consistency loss $L_{con}$ and the inexact pairing loss $L_{inex}$ are given by:

$$L_{advx} = \mathbb{E}_{x \sim X}[\log D_x(x)] + \mathbb{E}_{\hat{x} \sim \hat{X}}[\log(1 - D_x(\hat{x}))]$$

$$L_{advy} = \mathbb{E}_{y \sim Y}[\log D_y(y)] + \mathbb{E}_{\hat{y} \sim \hat{Y}}[\log(1 - D_y(\hat{y}))]$$

$$L_{cyc} = \mathbb{E}_{x \sim X,\, \hat{x} \sim \hat{X}}\big[\lVert x - \hat{x} \rVert_1\big]$$

$$L_{ske} = \mathbb{E}_{x \sim X,\, \hat{x} \sim \hat{X}}\big[\lVert Ske(x) - Ske(\hat{x}) \rVert_1\big]$$

$$L_{con} = \mathbb{E}_{x \sim X,\, \hat{x} \sim \hat{X}}\big[\lVert Con(x) - Con(\hat{x}) \rVert_1\big]$$

$$L_{inex} = \mathbb{E}_{y_{inex} \sim Y_{inex},\, \hat{y}_{inex} \sim \hat{Y}_{inex}}\big[\lVert y_{inex} - \hat{y}_{inex} \rVert_1\big]$$

wherein $\mathbb{E}_{x \sim X}[\cdot]$ denotes the expected value over source-domain images $x$ drawn from the source-domain dataset $X$, and $\mathbb{E}_{\hat{x} \sim \hat{X}}[\cdot]$ the expected value over reconstructed images $\hat{x}$ drawn from the set $\hat{X}$ of reconstructed images; $D_x(x)$ is the probability that discriminator two $D_x$ identifies the source-domain image $x$ as a source-domain image, and $1 - D_x(\hat{x})$ the probability that discriminator two $D_x$ identifies the reconstructed image $\hat{x}$ as not a source-domain image. Likewise, $\mathbb{E}_{y \sim Y}[\cdot]$ denotes the expected value over target-domain images $y$ from the target-domain dataset $Y$, and $\mathbb{E}_{\hat{y} \sim \hat{Y}}[\cdot]$ the expected value over target-style images $\hat{y}$ from the generated set $\hat{Y}$; $D_y(y)$ is the probability that discriminator one $D_y$ identifies the target-domain image $y$ as a target-domain image, and $1 - D_y(\hat{y})$ the probability that discriminator one $D_y$ identifies the target-style image $\hat{y}$ as not a target-domain image. $\lVert \cdot \rVert_1$ denotes the $\ell_1$ norm; $Ske(x)$ and $Ske(\hat{x})$ are the results of the skeleton extraction module Ske applied to the source-domain image $x$ and the reconstructed image $\hat{x}$ respectively, and $Con(x)$ and $Con(\hat{x})$ the results of the contour extraction module Con applied to the same images; $Y_{inex}$ is the set of inexact paired data $y_{inex}$, and $\hat{Y}_{inex}$ the set of target-style images $\hat{y}_{inex}$ corresponding to the inexact paired data.
Preferably, the model loss $L$ of the entire model is given by:

$$L = L_{advx} + L_{advy} + \lambda_{cyc} L_{cyc} + \lambda_{ske} L_{ske} + \lambda_{con} L_{con} + \lambda_{inex} L_{inex}$$

wherein $\lambda_{cyc}$, $\lambda_{ske}$, $\lambda_{con}$ and $\lambda_{inex}$ are hyperparameters corresponding to the cycle consistency loss $L_{cyc}$, skeleton consistency loss $L_{ske}$, contour consistency loss $L_{con}$ and inexact pairing loss $L_{inex}$ respectively, representing the weight of each loss in the loss of the whole model.
The invention has the following advantages. Calligraphy fonts are comparatively complex: they contain multiple handwriting-style features such as connected strokes, stroke sharpness and stroke thickness, which are difficult to characterize using skeletons, stroke codes or other components alone. The present scheme therefore introduces contours to represent these style features. Since contour information alone cannot determine the content of a character, an effective skeleton-contour fusion module is introduced to fuse skeleton information with contour information. The scheme also lets the inexact paired data module IPaD automatically recognize the characters in the calligraphy dataset, record them as recognition labels, and obtain an inexact paired dataset, which is used to compute image-level losses between each generated image and its corresponding inexact paired image. With these technical features, the scheme makes comprehensive use of skeleton and contour information, can automatically generate Chinese calligraphy fonts without large numbers of paired samples, and achieves a high-quality representation of content and style.
Drawings
FIG. 1 is a model flow chart of a method for generating calligraphy characters based on a skeleton and a contour.
Fig. 2 is a schematic workflow diagram of the skeleton-contour fusion module SCF according to the present invention.
FIG. 3 is a graph comparing the results of the present invention with the results of the prior art.
Fig. 4 is a diagram comparing the regular script font with the Youjun and Zhu Suiliang calligraphy fonts.
Fig. 5 is an effect diagram of the invention converting four different regular-script Chinese characters into the calligraphy fonts of Bada Shanren, Huang Tingjian, Kansui and Hongyi.
Detailed Description
The following detailed description of embodiments of the invention, given by way of example only and with reference to the accompanying drawings, is intended to help those skilled in the art gain a more complete, accurate and thorough understanding of the inventive concept and technical solution of the invention.
As shown in figs. 1-2, the invention provides a calligraphy character generation method based on a skeleton and a contour, comprising the following steps.
Step one, building a model.
The CycleGAN model, i.e. the cycle-consistent generative adversarial model, is an unsupervised learning model. The CycleGAN model contains two generative adversarial networks: the first comprises a constructed generator one $G_y$ and discriminator one $D_y$, and the second comprises a constructed generator two $G_x$ and discriminator two $D_x$. In this scheme, generator one $G_y$ converts the original image into the target-style image, and discriminator one $D_y$ judges whether the font style of the generated target-style image is consistent with that of the target-domain images, i.e. judges the authenticity of the target-style image. The second generative adversarial network runs the reverse process to reconstruct the output of the first: generator two $G_x$ converts the target-style image into a reconstructed image of the source-domain style, and discriminator two $D_x$ judges whether the font style of the reconstructed image is consistent with that of the source-domain images, i.e. judges the authenticity of the reconstructed image.
Each generator in the above generative adversarial networks comprises an encoder, a converter and a decoder. In the first network of the CycleGAN model, discriminator one $D_y$ computes the difference in font style between the target-style image and the target-domain image; the loss of discriminator one $D_y$, combined with the loss of generator one $G_y$, forms the first adversarial loss $L_{advy}$, used to optimize generator one $G_y$. The input of the second network is generated from the output of generator one $G_y$ in the first network; discriminator two $D_x$ in the second network computes the difference in font style between the source-domain image and the reconstructed image, and the loss of discriminator two $D_x$ combined with that of generator two $G_x$ forms the second adversarial loss $L_{advx}$, used to optimize generator two $G_x$. During training the discriminators $D_y$ and $D_x$ are generally trained first, and the corresponding generators are then optimized on the basis of the two adversarial losses obtained from the discriminators. Training the generators in a conventional CycleGAN model is essentially a process of minimizing the two adversarial losses above; the training process may also alternate between training the discriminators and the generators.
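The alternating scheme can be summarized in code. The following minimal PyTorch-style sketch of one training iteration is an illustration, not the patent's implementation; it assumes the discriminators output sigmoid probabilities and shows only the adversarial and cycle terms (the skeleton, contour and inexact-pairing terms appear in the sketches further below; all names are illustrative).

```python
import torch
import torch.nn.functional as F

def train_step(x, y, G_y, G_x, D_y, D_x, opt_D, opt_G):
    """One alternating update. x: source-style batch, y: calligraphy batch.
    Discriminators are assumed to output sigmoid probabilities."""
    y_hat = G_y(x)        # source -> target style
    x_hat = G_x(y_hat)    # target style -> reconstructed source

    # 1) Update discriminators on real vs. generated images (generators frozen).
    opt_D.zero_grad()
    d_loss = sum(
        F.binary_cross_entropy(d_real, torch.ones_like(d_real))
        + F.binary_cross_entropy(d_fake, torch.zeros_like(d_fake))
        for d_real, d_fake in [(D_y(y), D_y(y_hat.detach())),
                               (D_x(x), D_x(x_hat.detach()))]
    )
    d_loss.backward()
    opt_D.step()

    # 2) Update generators: fool both discriminators and keep the cycle closed.
    opt_G.zero_grad()
    g_loss = (F.binary_cross_entropy(D_y(y_hat), torch.ones_like(D_y(y_hat)))
              + F.binary_cross_entropy(D_x(x_hat), torch.ones_like(D_x(x_hat)))
              + F.l1_loss(x_hat, x))          # cycle-consistency term
    g_loss.backward()
    opt_G.step()
```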
In this scheme the CycleGAN model is used as the base model, so two mappings between the source domain and the target domain can be learned, and the CycleGAN model can introduce a cycle consistency loss to help overcome the limitation of paired data. The model established by this scheme takes the CycleGAN model as the backbone network; the CycleGAN model contains two generative adversarial networks, both of which include a contour extraction module Con, a skeleton extraction module Ske and a skeleton-contour fusion module SCF. In the contour extraction module, since calligraphy feature images are usually expressed in grayscale, contour information can easily be extracted with the well-known Canny operator. In the skeleton extraction module, an existing skeleton scheme with simple rules is adopted to extract skeleton information effectively (such as the extraction method disclosed in Jie Zhou, Yelei Wang, Yiyang Yuan, Qing Huang, and Jinhan Zeng, "SGCE-Font: Skeleton Guided Channel Expansion for Chinese Font Generation," arXiv preprint arXiv:2211.14475, 2022). The model is additionally provided with an inexact paired data module IPaD, which uses an existing Chinese Character Recognition method (CCR for short; for example the recognition method disclosed in Jinhan Zeng, Ruiying Xu, Yu Wu, Hongwei Li and Jiaxing Lu, "Zero-shot Chinese Character Recognition with Stroke- and Radical-level Decompositions," in Proceedings of the International Joint Conference on Neural Networks, 2023) to automatically recognize the characters in the calligraphy dataset and record them as recognition labels, and performs similarity pairing according to the target-style image after it is generated. The inexact paired data module IPaD differs from the prior art in that wrong recognition labels are allowed for the relevant calligraphy data during pairing, i.e. the pairing result may be a Chinese character similar to, but different from, the original image. Although some calligraphy characters are recognized wrongly, they can still provide important reference information for the related calligraphy characters.
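Both extractors operate on grayscale glyph images. A minimal sketch follows, using OpenCV's Canny operator for Con as stated above, and scikit-image's morphological `skeletonize` as a stand-in for the cited skeleton scheme (whose exact thinning rules may differ); the thresholds are illustrative.

```python
import cv2
import numpy as np
from skimage.morphology import skeletonize

def extract_contour(img_gray: np.ndarray) -> np.ndarray:
    # Canny edge map of the glyph; img_gray must be uint8, thresholds illustrative.
    return cv2.Canny(img_gray, 100, 200)

def extract_skeleton(img_gray: np.ndarray) -> np.ndarray:
    # Binarize (dark ink on light paper), then thin strokes to a 1-pixel skeleton.
    binary = img_gray < 128
    return skeletonize(binary).astype(np.uint8) * 255
```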
Calligraphy fonts are more complex than simplified Chinese fonts: they contain multiple handwriting-style features such as connected strokes, stroke sharpness and thickness, which are difficult to characterize using skeletons, stroke codes or other components alone. Contours are therefore introduced to represent these style features. Since contour information alone cannot determine the content of a character, an effective skeleton-contour fusion module is introduced to fuse skeleton information with contour information. The framework of the skeleton-contour fusion module is shown in fig. 2.
Step two, training the model.
The above model integrates the skeleton-contour fusion module SCF with the inexact paired data module IPaD. The proposed model fuses the skeleton and contour information of Chinese characters and provides comprehensive structural supervision information.
The basic training workflow is as follows. The font style of the Chinese character images input to the model is the source-domain style, and Chinese character images of the source-domain style are acquired as training samples; the font style of the calligraphy font images to be generated is the target style, and calligraphy font images of the target style, i.e. target-domain images, are acquired to form the calligraphy dataset. During training a source-domain image is input to the model as the original image; the first generative adversarial network converts the original image into a target-style image, and the second generative adversarial network converts the target-style image output by the first network into a reconstructed image. The font style of the target-style image is consistent with the target style, and that of the reconstructed image with the source-domain style. The model is optimized by computing the loss of the whole model during training, the optimization objective being to minimize that loss. Meanwhile, the inexact paired data module IPaD automatically recognizes the characters in the calligraphy dataset, records them as recognition labels, and performs inexact pairing within the calligraphy dataset according to the target-style image; wrong recognition labels are allowed for the relevant calligraphy data during pairing.
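Stripped to its essentials, the inexact pairing is a label lookup: recognize the generated glyph with a CCR model and fetch any calligraphy sample carrying that (possibly wrong) label. A hedged sketch, with `recognize` standing in for the cited CCR method and all names illustrative:

```python
import random

def inexact_pair(generated_img, calligraphy_set, recognize):
    """Return a calligraphy sample whose recognized label matches the generated
    image's predicted label; the label may be wrong, which is tolerated.
    calligraphy_set: list of (image, label) pairs, labels produced offline by CCR."""
    label = recognize(generated_img)                       # predicted character class
    candidates = [img for img, lab in calligraphy_set if lab == label]
    return random.choice(candidates) if candidates else None
```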
Specifically, in the first generative adversarial network, the source-domain image $x$, taken as the input original image, is processed by the skeleton extraction module Ske and the contour extraction module Con, which extract the skeleton information $s_x$ and contour information $c_x$ respectively; the two are fused by the skeleton-contour fusion module SCF. The skeleton-contour fusion module SCF is a cross-attention module. After the skeleton information and contour information of a Chinese character are input to the SCF module, it first feeds them into the related encoders (i.e. the corresponding skeleton encoder and contour encoder) to generate the skeleton features $E_{sx}$ and contour features $E_{cx}$. The encoded skeleton features $E_{sx}$ and contour features $E_{cx}$ are then added to obtain the features $E_{scx}$, and a SoftMax function yields the normalized feature $c_Z$. Based on the normalized feature $c_Z$, the attention-weight formula computes the corresponding weight $a_c$ of the skeleton features $E_{sx}$ and weight $b_c$ of the contour features $E_{cx}$. Finally, the computed weights $a_c$ and $b_c$ are multiplied by the corresponding skeleton features $E_{sx}$ and contour features $E_{cx}$ to obtain the weighted skeleton features $E_{asx}$ and weighted contour features $E_{bcx}$. The computation is described as follows:

$$a_c = \frac{e^{A_c c_Z}}{e^{A_c c_Z} + e^{B_c c_Z}}, \qquad b_c = \frac{e^{B_c c_Z}}{e^{A_c c_Z} + e^{B_c c_Z}}, \qquad E_{asx} = a_c \cdot E_{sx}, \qquad E_{bcx} = b_c \cdot E_{cx}$$

wherein the subscript $c$ in $a_c$, $b_c$ and $c_Z$ indicates that the computation is performed on channel $c$, and $A$ and $B$ are two learnable parameter matrices (so that $a_c + b_c = 1$ on each channel).
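The fusion just described can be made concrete in code. Below is a minimal PyTorch sketch of an SCF-style module; the encoder depth, the channel width, and the pooling of $E_{scx}$ to a per-channel vector $c_Z$ are illustrative assumptions, not details fixed by the patent.

```python
import torch
import torch.nn as nn

class SCF(nn.Module):
    """Skeleton-contour fusion: encode both inputs, derive per-channel
    attention weights a_c, b_c with a two-way softmax over learnable
    projections A and B, and reweight the features."""
    def __init__(self, channels: int = 64):
        super().__init__()
        self.skel_enc = nn.Sequential(nn.Conv2d(1, channels, 3, padding=1), nn.ReLU())
        self.cont_enc = nn.Sequential(nn.Conv2d(1, channels, 3, padding=1), nn.ReLU())
        self.A = nn.Linear(channels, channels)   # learnable matrix A
        self.B = nn.Linear(channels, channels)   # learnable matrix B

    def forward(self, skel, cont):
        E_s = self.skel_enc(skel)                  # skeleton features E_sx
        E_c = self.cont_enc(cont)                  # contour features E_cx
        z = (E_s + E_c).mean(dim=(2, 3))           # E_scx pooled per channel -> c_Z
        logits = torch.stack([self.A(z), self.B(z)], dim=0)
        w = torch.softmax(logits, dim=0)           # a_c + b_c = 1 on each channel
        a = w[0].unsqueeze(-1).unsqueeze(-1)       # weights a_c, broadcast over H, W
        b = w[1].unsqueeze(-1).unsqueeze(-1)       # weights b_c
        return a * E_s, b * E_c                    # E_asx, E_bcx
```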
The original image $x$ is input to generator one $G_y$; during processing, generator one $G_y$ splices the original image $x$ with the weighted skeleton features $E_{asx}$ and contour features $E_{bcx}$ obtained from the SCF module at the channel level, and generates the target-style image $\hat{y}$ after processing. Discriminator one $D_y$ then evaluates the authenticity of the target-style image $\hat{y}$: the target-style image $\hat{y}$ and the target-domain image are input separately to discriminator one $D_y$, and whether the results returned by discriminator one $D_y$ for the two are consistent is judged.
Then, in the second generative adversarial network, the skeleton extraction module Ske and the contour extraction module Con extract from the target-style image $\hat{y}$ the corresponding skeleton information $s_{\hat{y}}$ and contour information $c_{\hat{y}}$, which are fused by the skeleton-contour fusion module SCF. The target-style image $\hat{y}$ is input to generator two $G_x$; during processing, generator two $G_x$ splices the target-style image $\hat{y}$ with the corresponding weighted skeleton and contour features produced by the SCF module at the channel level, and generates the reconstructed image $\hat{x}$, consistent with the source-domain style. Discriminator two $D_x$ then evaluates the authenticity of the reconstructed image $\hat{x}$: the reconstructed image $\hat{x}$ and a source-domain image from the source-domain dataset $X$ are input to discriminator two $D_x$, and whether the results returned by discriminator two $D_x$ for the two are consistent is judged.
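Putting the pieces together, one forward cycle with the channel-level splicing might look as follows; the generators are assumed to accept the concatenated tensor, and the extractors are assumed to be tensor-valued versions of the ones sketched earlier (all names illustrative):

```python
import torch

def forward_cycle(x, G_y, G_x, scf, extract_skeleton, extract_contour):
    # First GAN: source image -> target-style image.
    E_asx, E_bcx = scf(extract_skeleton(x), extract_contour(x))
    y_hat = G_y(torch.cat([x, E_asx, E_bcx], dim=1))   # channel-level splice

    # Second GAN: target-style image -> reconstructed source-style image.
    E_as, E_bc = scf(extract_skeleton(y_hat), extract_contour(y_hat))
    x_hat = G_x(torch.cat([y_hat, E_as, E_bc], dim=1))
    return y_hat, x_hat
```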
According to the workflow described above, the model loss proposed by this scheme comprises six main components: the cycle consistency loss $L_{cyc}$, the skeleton consistency loss $L_{ske}$, the contour consistency loss $L_{con}$, the inexact pairing loss $L_{inex}$, and the two adversarial losses $L_{advx}$ and $L_{advy}$. Of the two adversarial losses, $L_{advx}$ is the second adversarial loss, corresponding to generator two $G_x$ and discriminator two $D_x$, and $L_{advy}$ is the first adversarial loss, corresponding to generator one $G_y$ and discriminator one $D_y$. The cycle consistency loss $L_{cyc}$ is the loss between the source-domain-style original image $x$ and the reconstructed image $\hat{x}$. The two adversarial losses and the cycle consistency loss are loss functions already present in the CycleGAN model; minimizing these loss functions during training optimizes the model and completes the corresponding model training.
Because this scheme also extracts the skeleton information and contour information of the images, the model additionally has a contour consistency loss and a skeleton consistency loss. The skeleton consistency loss $L_{ske}$ is the loss between the skeleton information $s_x$ of the original image $x$ and the skeleton information $s_{\hat{x}}$ extracted from the reconstructed image $\hat{x}$; the contour consistency loss $L_{con}$ is the loss between the contour information $c_x$ of the original image $x$ and the contour information $c_{\hat{x}}$ extracted from the reconstructed image $\hat{x}$. Finally, since the inexact paired data module IPaD performs inexact pairing on the target-domain dataset, the loss also includes an inexact pairing loss.
If the target-style image $\hat{y}$ generated by generator one $G_y$ cannot be paired exactly with a target-domain image, inexact pairing is performed, i.e. wrong recognition labels are allowed for the relevant calligraphy data, yielding the inexact paired data $y_{inex}$; the corresponding target-style image is then denoted $\hat{y}_{inex}$. The inexact pairing loss $L_{inex}$ is the loss between the inexact paired data $y_{inex}$ and the corresponding target-style image $\hat{y}_{inex}$. Among the above losses, the cycle consistency loss $L_{cyc}$, skeleton consistency loss $L_{ske}$, contour consistency loss $L_{con}$ and inexact pairing loss $L_{inex}$ are all used to optimize generator one $G_y$ and generator two $G_x$. The loss functions are computed as follows:

$$L_{advx} = \mathbb{E}_{x \sim X}[\log D_x(x)] + \mathbb{E}_{\hat{x} \sim \hat{X}}[\log(1 - D_x(\hat{x}))]$$

$$L_{advy} = \mathbb{E}_{y \sim Y}[\log D_y(y)] + \mathbb{E}_{\hat{y} \sim \hat{Y}}[\log(1 - D_y(\hat{y}))]$$

$$L_{cyc} = \mathbb{E}_{x \sim X,\, \hat{x} \sim \hat{X}}\big[\lVert x - \hat{x} \rVert_1\big]$$

$$L_{ske} = \mathbb{E}_{x \sim X,\, \hat{x} \sim \hat{X}}\big[\lVert Ske(x) - Ske(\hat{x}) \rVert_1\big]$$

$$L_{con} = \mathbb{E}_{x \sim X,\, \hat{x} \sim \hat{X}}\big[\lVert Con(x) - Con(\hat{x}) \rVert_1\big]$$

$$L_{inex} = \mathbb{E}_{y_{inex} \sim Y_{inex},\, \hat{y}_{inex} \sim \hat{Y}_{inex}}\big[\lVert y_{inex} - \hat{y}_{inex} \rVert_1\big]$$

wherein $\mathbb{E}_{x \sim X}[\cdot]$ denotes the expected value over source-domain images $x$ drawn from the source-domain dataset $X$, and $\mathbb{E}_{\hat{x} \sim \hat{X}}[\cdot]$ the expected value over reconstructed images $\hat{x}$ drawn from the set $\hat{X}$ of reconstructed images. $D_x(x)$ is the probability that discriminator two $D_x$ identifies the source-domain image $x$ as a source-domain image: the smaller the loss of discriminator two $D_x$, the larger $D_x(x)$, and the smaller the second adversarial loss. $1 - D_x(\hat{x})$ is the probability that discriminator two $D_x$ identifies the reconstructed image $\hat{x}$ as not a source-domain image; the training process optimizes generator two $G_x$, and the smaller the loss of generator two $G_x$, the smaller the difference in font style between the reconstructed image $\hat{x}$ and the source-domain image $x$, the smaller $\log(1 - D_x(\hat{x}))$ becomes, and the smaller the probability that discriminator two $D_x$ recognizes the reconstructed image $\hat{x}$ correctly, which increases the loss of discriminator two $D_x$ while decreasing the second adversarial loss. Likewise, $\mathbb{E}_{y \sim Y}[\cdot]$ denotes the expected value over target-domain images $y$ from the target-domain dataset $Y$, and $\mathbb{E}_{\hat{y} \sim \hat{Y}}[\cdot]$ the expected value over target-style images $\hat{y}$ from the generated set $\hat{Y}$. $D_y(y)$ is the probability that discriminator one $D_y$ identifies the target-domain image $y$ as a target-domain image: the smaller the loss of discriminator one $D_y$, the larger $D_y(y)$, and the smaller the first adversarial loss. $1 - D_y(\hat{y})$ is the probability that discriminator one $D_y$ identifies the target-style image $\hat{y}$ as not a target-domain image; the training process optimizes generator one $G_y$, and the smaller the loss of generator one $G_y$, the smaller the difference in font style between the target-style image $\hat{y}$ and the target-domain image $y$, the smaller $\log(1 - D_y(\hat{y}))$ becomes, and the smaller the probability that discriminator one $D_y$ recognizes the target-style image $\hat{y}$ correctly, which increases the loss of discriminator one $D_y$ while decreasing the first adversarial loss. $\lVert \cdot \rVert_1$ denotes the $\ell_1$ norm; $Ske(x)$ and $Ske(\hat{x})$ are the results of the skeleton extraction module Ske applied to the source-domain image $x$ and the reconstructed image $\hat{x}$ respectively, and $Con(x)$ and $Con(\hat{x})$ the results of the contour extraction module Con applied to the same images; $Y_{inex}$ is the set of inexact paired data $y_{inex}$, and $\hat{Y}_{inex}$ the set of target-style images $\hat{y}_{inex}$ corresponding to the inexact paired data.
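For reference, the six terms reduce to binary cross-entropy for the adversarial parts and $\ell_1$ distances for the rest. A hedged sketch, treating the extraction modules Ske and Con as black-box callables and assuming sigmoid discriminator outputs (function names are illustrative):

```python
import torch
import torch.nn.functional as F

def model_losses(x, y, x_hat, y_hat, y_inex, Ske, Con, D_x, D_y):
    real, fake = torch.ones_like, torch.zeros_like
    L_advx = (F.binary_cross_entropy(D_x(x), real(D_x(x)))
              + F.binary_cross_entropy(D_x(x_hat), fake(D_x(x_hat))))
    L_advy = (F.binary_cross_entropy(D_y(y), real(D_y(y)))
              + F.binary_cross_entropy(D_y(y_hat), fake(D_y(y_hat))))
    L_cyc  = F.l1_loss(x, x_hat)              # ||x - x_hat||_1
    L_ske  = F.l1_loss(Ske(x), Ske(x_hat))    # skeleton consistency
    L_con  = F.l1_loss(Con(x), Con(x_hat))    # contour consistency
    L_inex = F.l1_loss(y_inex, y_hat)         # inexact pairing loss
    return L_advx, L_advy, L_cyc, L_ske, L_con, L_inex
```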
In the CycleGAN model, the relationship between the model loss $L$ of the whole model (the model of the calligraphy character generation method based on skeleton, contour and inexact paired data, abbreviated SCI-Font, where S, C and I correspond in turn to the skeleton extraction module Ske, the contour extraction module Con and the inexact paired data module IPaD), the losses of all generators $G$ and the losses of all discriminators $D$ can be described by the following expression:

$$G^{*} = \arg\min_{G}\max_{D} L(G, D)$$

wherein, the larger the loss of the discriminators $D$ in the model, the smaller the loss of the generators $G$; the expression means finding the values that minimize $L(G, D)$ under the maximizing discriminators. At that point the loss of all generators $G$ reaches its minimum, the loss of all discriminators $D$ reaches its maximum, and $L(G, D)$ attains the optimal solution $G^{*}$. Following this relationship, the model is optimized with the model loss fed back during training, so that the model loss decreases.
Combining the loss functions above, the model loss $L$ of the entire model is given by:

$$L = L_{advx} + L_{advy} + \lambda_{cyc} L_{cyc} + \lambda_{ske} L_{ske} + \lambda_{con} L_{con} + \lambda_{inex} L_{inex}$$

wherein $\lambda_{cyc}$, $\lambda_{ske}$, $\lambda_{con}$ and $\lambda_{inex}$ are four adjustable hyperparameters of the model, corresponding to the cycle consistency loss $L_{cyc}$, skeleton consistency loss $L_{ske}$, contour consistency loss $L_{con}$ and inexact pairing loss $L_{inex}$ respectively, and representing the weight of each loss in the loss of the whole model; the hyperparameters are optimized and an optimal group of hyperparameters is selected to improve learning performance and effect.
Step three, obtaining an optimized model for automatically generating calligraphy fonts.
Based on the above model and training scheme, the model of this scheme fuses the skeleton and contour information of Chinese characters and uses the result as an explicit representation to strengthen the latent content-style representation generated by the decoder, so that the content and style characteristics of calligraphy fonts can be captured effectively. Noting the difficulty of collecting paired data, some automatic Chinese character recognition techniques are used to generate an inexact paired dataset that further supervises model performance; the inexact paired data better supervise the font differences between the source domain and the target domain. Although some calligraphy characters are recognized wrongly, they can still provide important reference information for the related calligraphy characters. All of this provides important technical support for generating the content of calligraphy characters.
The method provided by the invention was applied to a font generation experiment, and the generated fonts were compared with those of other existing font generation techniques, as shown in fig. 3. The results of the different generation methods are arranged from top to bottom; the Chinese characters used are divided into three groups from left to right, each group containing four different Chinese characters: from left to right, calligraphy fonts of Liu Gongquan generated from regular script, calligraphy fonts of Yan Zhenqing generated from regular script, and calligraphy fonts of Ouyang Xiu generated from regular script. The circles in the figure mark defect errors in the generated fonts, and the boxed Chinese characters mark generated fonts whose shape is inaccurate, where a mode collapse phenomenon occurs. The penultimate row uses the method and model of the invention (SCI-Font for short). The figure shows that the method generates calligraphy characters well. Fig. 4 compares the regular script font with the Youjun and Zhu Suiliang calligraphy fonts; it can be seen that the strokes and styles of the same Chinese character vary greatly across calligraphy fonts, including variation between simplified and traditional forms, which shows how complex the strokes and styles of calligraphy fonts are. Fig. 5 shows the effect of converting four different regular-script Chinese characters into the calligraphy fonts of Bada Shanren, Huang Tingjian, Kansui and Hongyi respectively; among the four fonts, the first group 'mourn', the second group 'in order', the third group 'Shu' and the fourth group 'in order' generate Chinese characters different from the input characters while the calligraphy font style meets the requirements, indicating that the inexact pairing phenomenon exists in the method.
While the invention has been described above with reference to the accompanying drawings, it will be apparent that the invention is not limited to the above embodiments; as long as various insubstantial modifications of the inventive concept and technical solution are adopted, or the concept and solution are applied to other occasions without modification, all fall within the protection scope of the invention.

Claims (6)

1. A calligraphy character generation method based on a skeleton and a contour, comprising the following steps:

step one, establishing a model; the model takes a CycleGAN model as its backbone network, and the CycleGAN model comprises two generative adversarial networks;

step two, training the model; the font style of the Chinese character images input to the model is the source-domain style, and Chinese character images of the source-domain style are acquired as training samples; the font style of the calligraphy font images to be generated is the target style, and calligraphy font images of the target style, i.e. target-domain images, are acquired to form a calligraphy dataset; during training a source-domain image is input to the model as the original image, the first generative adversarial network converts the original image into a target-style image, and the second generative adversarial network converts the target-style image output by the first network into a reconstructed image; the font style of the target-style image is consistent with the target style, and the font style of the reconstructed image is consistent with the source-domain style; the model is optimized by computing the loss of the whole model during training, the optimization objective being to minimize that loss;

step three, obtaining an optimized model for automatically generating calligraphy fonts;

characterized in that: both generative adversarial networks comprise a contour extraction module Con, a skeleton extraction module Ske and a skeleton-contour fusion module SCF, and the model further comprises an inexact paired data module IPaD;

in step two, the two generative adversarial networks extract skeleton information and contour information through the skeleton extraction module Ske and the contour extraction module Con respectively, fuse the two through the skeleton-contour fusion module SCF, splice the fused features with the image input to the generator at the channel level, and the corresponding generator processes the result to generate an image;

the inexact paired data module IPaD automatically recognizes the characters in the calligraphy dataset and records them as recognition labels; inexact pairing is performed within the calligraphy dataset according to the target-style image, and wrong recognition labels are allowed for the relevant calligraphy data during pairing, thereby obtaining inexact paired data;

the loss of the whole model comprises the first adversarial loss $L_{advy}$, the second adversarial loss $L_{advx}$, the cycle consistency loss $L_{cyc}$, the skeleton consistency loss $L_{ske}$, the contour consistency loss $L_{con}$ and the inexact pairing loss $L_{inex}$;

in step one, the first generative adversarial network comprises a constructed generator one $G_y$ and discriminator one $D_y$, and the second generative adversarial network comprises a constructed generator two $G_x$ and discriminator two $D_x$; generator one $G_y$ converts the original image into the target-style image, and discriminator one $D_y$ judges whether the font style of the generated target-style image is consistent with that of the target-domain images; the second generative adversarial network runs the reverse process to reconstruct the output of the first network, i.e. generator two $G_x$ converts the target-style image into a reconstructed image of the source-domain style, and discriminator two $D_x$ judges whether the font style of the reconstructed image is consistent with that of the source-domain images;

in step two, in the first generative adversarial network, the source-domain image $x$, taken as the input original image, is processed by the skeleton extraction module Ske and the contour extraction module Con, which extract the skeleton information $s_x$ and the contour information $c_x$ respectively, and the two are fused by the skeleton-contour fusion module SCF; the original image $x$ is input to generator one $G_y$, and during processing generator one $G_y$ splices the original image $x$ with the weighted skeleton features $E_{asx}$ and contour features $E_{bcx}$ produced by the SCF module at the channel level and generates the target-style image $\hat{y}$ after processing; target-domain images $y$ are acquired to compose the target-domain dataset $Y$; the target-style image $\hat{y}$ and a target-domain image $y$ from the dataset $Y$ are input separately to discriminator one $D_y$, and whether the results returned by discriminator one $D_y$ for the two are consistent is judged, thereby evaluating the authenticity of the target-style image $\hat{y}$.
2. The calligraphy character generation method based on a skeleton and a contour according to claim 1, characterized in that: after the skeleton information and contour information given for a Chinese character are input to the skeleton-contour fusion module SCF, the SCF module first feeds them into the corresponding skeleton encoder and contour encoder to generate the skeleton features $E_{sx}$ and contour features $E_{cx}$; the encoded skeleton features $E_{sx}$ and contour features $E_{cx}$ are then added to obtain the features $E_{scx}$, and a SoftMax function yields the normalized feature $c_Z$; based on the normalized feature $c_Z$, the attention-weight formula computes the corresponding weight $a_c$ of the skeleton features $E_{sx}$ and weight $b_c$ of the contour features $E_{cx}$; finally, the computed weights $a_c$ and $b_c$ are multiplied by the corresponding skeleton features $E_{sx}$ and contour features $E_{cx}$ to obtain the weighted skeleton features $E_{asx}$ and weighted contour features $E_{bcx}$; the computation is described as follows:

$$a_c = \frac{e^{A_c c_Z}}{e^{A_c c_Z} + e^{B_c c_Z}}, \qquad b_c = \frac{e^{B_c c_Z}}{e^{A_c c_Z} + e^{B_c c_Z}}, \qquad E_{asx} = a_c \cdot E_{sx}, \qquad E_{bcx} = b_c \cdot E_{cx}$$

wherein the subscript $c$ in $a_c$, $b_c$ and $c_Z$ indicates that the computation is performed on channel $c$, and $A$ and $B$ are two learnable parameter matrices.
3. The calligraphy character generation method based on a skeleton and a contour according to claim 2, characterized in that: in the second generative adversarial network, the skeleton extraction module Ske and the contour extraction module Con extract from the target-style image $\hat{y}$ the corresponding skeleton information $s_{\hat{y}}$ and contour information $c_{\hat{y}}$, which are fused by the skeleton-contour fusion module SCF; the target-style image $\hat{y}$ is input to generator two $G_x$, and during processing generator two $G_x$ splices the target-style image $\hat{y}$ with the corresponding weighted skeleton and contour features produced by the SCF module at the channel level and generates the reconstructed image $\hat{x}$, consistent with the source-domain style; source-domain images $x$ are acquired to compose the source-domain dataset $X$; the reconstructed image $\hat{x}$ and a source-domain image $x$ from the dataset $X$ are input to discriminator two $D_x$, and whether the results returned by discriminator two $D_x$ for the two are consistent is judged, thereby evaluating the authenticity of the reconstructed image $\hat{x}$.
4. The calligraphy character generation method based on a skeleton and a contour according to claim 3, characterized in that: in step two, discriminator one $D_y$ in the first generative adversarial network of the CycleGAN model computes the difference in font style between the target-style image and the target-domain image, i.e. the first adversarial loss $L_{advy}$, used to optimize generator one $G_y$; the input of the second generative adversarial network is generated from the output of generator one $G_y$ in the first network, and discriminator two $D_x$ in the second network computes the difference in font style between the source-domain image and the reconstructed image, i.e. the second adversarial loss $L_{advx}$, used to optimize generator two $G_x$;

the cycle consistency loss $L_{cyc}$, skeleton consistency loss $L_{ske}$, contour consistency loss $L_{con}$ and inexact pairing loss $L_{inex}$ all serve to optimize generator two $G_x$ and generator one $G_y$; the cycle consistency loss $L_{cyc}$ is the loss between the source-domain-style original image $x$ and the reconstructed image $\hat{x}$; the skeleton consistency loss $L_{ske}$ is the loss between the skeleton information $s_x$ of the original image $x$ and the skeleton information $s_{\hat{x}}$ extracted from the reconstructed image $\hat{x}$; the contour consistency loss $L_{con}$ is the loss between the contour information $c_x$ of the original image $x$ and the contour information $c_{\hat{x}}$ extracted from the reconstructed image $\hat{x}$; the inexact pairing loss $L_{inex}$ is the loss between the inexact paired data $y_{inex}$ and the target-style image $\hat{y}_{inex}$ corresponding to the inexact paired data.
5. The handwriting word generation method based on skeleton and outline according to claim 4, wherein: the second adversarial loss $L_{adv_x}$, the first adversarial loss $L_{adv_y}$, the cycle consistency loss $L_{cyc}$, the skeleton consistency loss $L_{ske}$, the contour consistency loss $L_{con}$ and the inexact pairing loss $L_{inex}$ are formulated as follows:

$$L_{adv_x} = \mathbb{E}_{x \sim X}\big[\log D_x(x)\big] + \mathbb{E}_{\hat{x} \sim \hat{X}}\big[\log\big(1 - D_x(\hat{x})\big)\big]$$

$$L_{adv_y} = \mathbb{E}_{y \sim Y}\big[\log D_y(y)\big] + \mathbb{E}_{\hat{y} \sim \hat{Y}}\big[\log\big(1 - D_y(\hat{y})\big)\big]$$

$$L_{cyc} = \mathbb{E}_{x \sim X,\, \hat{x} \sim \hat{X}}\big[\lVert x - \hat{x} \rVert_1\big]$$

$$L_{ske} = \mathbb{E}_{x \sim X,\, \hat{x} \sim \hat{X}}\big[\lVert Ske(x) - Ske(\hat{x}) \rVert_1\big]$$

$$L_{con} = \mathbb{E}_{x \sim X,\, \hat{x} \sim \hat{X}}\big[\lVert Con(x) - Con(\hat{x}) \rVert_1\big]$$

$$L_{inex} = \mathbb{E}_{y_{inex} \sim Y_{inex},\, \hat{y}_{inex} \sim \hat{Y}_{inex}}\big[\lVert y_{inex} - \hat{y}_{inex} \rVert_1\big]$$

wherein $\mathbb{E}_{x \sim X}[\cdot]$ denotes the expectation over source domain images $x$ drawn from the source domain dataset $X$, and $\mathbb{E}_{\hat{x} \sim \hat{X}}[\cdot]$ the expectation over reconstructed images $\hat{x}$ drawn from the set of reconstructed images $\hat{X}$; $D_x(x)$ is the probability with which discriminator two $D_x$ identifies the source domain image $x$ as a source domain image, and $1 - D_x(\hat{x})$ the probability with which discriminator two $D_x$ identifies the reconstructed image $\hat{x}$ as not being a source domain image; $\mathbb{E}_{y \sim Y}[\cdot]$ denotes the expectation over target domain images $y$ drawn from the target domain dataset $Y$, and $\mathbb{E}_{\hat{y} \sim \hat{Y}}[\cdot]$ the expectation over target style images $\hat{y}$ drawn from the set of generated images $\hat{Y}$; $D_y(y)$ is the probability with which discriminator one $D_y$ identifies the target domain image $y$ as a target domain image, and $1 - D_y(\hat{y})$ the probability with which discriminator one $D_y$ identifies the target style image $\hat{y}$ as not being a target domain image; $\lVert \cdot \rVert_1$ denotes the $\ell_1$ norm; $Ske(x)$ and $Ske(\hat{x})$ denote the results of processing the source domain image $x$ and the reconstructed image $\hat{x}$ with the skeleton extraction module Ske, and $Con(x)$ and $Con(\hat{x})$ the corresponding results of the contour extraction module Con; $Y_{inex}$ denotes the set of inexact pairing data $y_{inex}$, and $\hat{Y}_{inex}$ the set of the corresponding target style images $\hat{y}_{inex}$.
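The four $\ell_1$-based terms above translate directly into code. The following minimal sketch assumes image tensors and the Ske/Con modules named in the claims; the function names and the mean reduction are illustrative.

```python
import torch.nn.functional as F

def cycle_loss(x, x_hat):
    # L_cyc: l1 distance between the source image and its reconstruction
    return F.l1_loss(x_hat, x)

def skeleton_loss(x, x_hat, Ske):
    # L_ske: l1 distance between skeletons of the original and the reconstruction
    return F.l1_loss(Ske(x_hat), Ske(x))

def contour_loss(x, x_hat, Con):
    # L_con: l1 distance between contours of the original and the reconstruction
    return F.l1_loss(Con(x_hat), Con(x))

def inexact_pairing_loss(y_inex, y_hat_inex):
    # L_inex: l1 distance between inexact pairing data and the generated style image
    return F.l1_loss(y_hat_inex, y_inex)
```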
6. The handwriting word generation method based on skeleton and outline according to claim 5, wherein: the model loss $L$ of the entire model is formulated as follows:

$$L = L_{adv_x} + L_{adv_y} + \lambda_{cyc} L_{cyc} + \lambda_{ske} L_{ske} + \lambda_{con} L_{con} + \lambda_{inex} L_{inex}$$

wherein $\lambda_{cyc}$, $\lambda_{ske}$, $\lambda_{con}$ and $\lambda_{inex}$ correspond to the cycle consistency loss $L_{cyc}$, the skeleton consistency loss $L_{ske}$, the contour consistency loss $L_{con}$ and the inexact pairing loss $L_{inex}$ respectively, and each represents the weight of the corresponding loss in the overall model loss.
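Putting the pieces together, the overall objective is a weighted sum. The sketch below mirrors the formula in claim 6; the default lambda values are hypothetical placeholders, since the patent leaves the weights as tunable parameters.

```python
def total_loss(l_adv_x, l_adv_y, l_cyc, l_ske, l_con, l_inex,
               lam_cyc=10.0, lam_ske=1.0, lam_con=1.0, lam_inex=1.0):
    # weighted sum of all six terms; the lam_* values here are illustrative only
    return (l_adv_x + l_adv_y
            + lam_cyc * l_cyc + lam_ske * l_ske
            + lam_con * l_con + lam_inex * l_inex)
```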
CN202311313408.XA 2023-10-11 2023-10-11 Handwriting word generation method based on skeleton and outline Active CN117058266B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311313408.XA CN117058266B (en) 2023-10-11 2023-10-11 Handwriting word generation method based on skeleton and outline

Publications (2)

Publication Number Publication Date
CN117058266A (en) 2023-11-14
CN117058266B (en) 2023-12-26

Family

ID=88655783

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311313408.XA Active CN117058266B (en) 2023-10-11 2023-10-11 Handwriting word generation method based on skeleton and outline

Country Status (1)

Country Link
CN (1) CN117058266B (en)

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111739076B * 2020-06-15 2022-09-30 Dalian University of Technology Unsupervised content protection domain adaptation method for multiple CT lung texture recognition

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109408776A * 2018-10-09 2019-03-01 Xihua University Automatic calligraphy font generation algorithm based on generative adversarial networks
CN109746916A * 2019-01-28 2019-05-14 Wuhan University of Science and Technology Method and system for robotic calligraphy writing
CN116823983A * 2023-06-15 2023-09-29 Northwest University One-to-many style handwriting picture generation method based on style collection mechanism

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Stylized Calligraphy Image Generation Based on Generative Adversarial Networks; Wang Xiaohong; Lu Hui; Ma Xiangcai; Packaging Engineering (11); full text *


Similar Documents

Publication Publication Date Title
CN108829677B (en) Multi-modal attention-based automatic image title generation method
CN110609891B (en) Visual dialog generation method based on context awareness graph neural network
CN110490946B (en) Text image generation method based on cross-modal similarity and antagonism network generation
CN113657115B (en) Multi-mode Mongolian emotion analysis method based on ironic recognition and fine granularity feature fusion
CN112651940B (en) Collaborative visual saliency detection method based on dual-encoder generation type countermeasure network
CN113449801B (en) Image character behavior description generation method based on multi-level image context coding and decoding
CN114092742B (en) Multi-angle-based small sample image classification device and method
CN114359938B (en) Form identification method and device
CN113076465A (en) Universal cross-modal retrieval model based on deep hash
Zhou et al. Attention transfer network for nature image matting
Zhang et al. Multi-modal language analysis with hierarchical interaction-level and selection-level attentions
CN112633100B (en) Behavior recognition method, behavior recognition device, electronic equipment and storage medium
CN111859925B (en) Emotion analysis system and method based on probability emotion dictionary
CN113538472A (en) Vector field guided refinement segmentation method based on coding-decoding network
CN117058266B (en) Handwriting word generation method based on skeleton and outline
CN112163605A (en) Multi-domain image translation method based on attention network generation
Huang et al. Video frame prediction with dual-stream deep network emphasizing motions and content details
CN116823983A (en) One-to-many style handwriting picture generation method based on style collection mechanism
CN116775855A (en) Automatic TextRank Chinese abstract generation method based on Bi-LSTM
CN110929013A (en) Image question-answer implementation method based on bottom-up entry and positioning information fusion
CN112084319B (en) Relational network video question-answering system and method based on actions
CN114842301A (en) Semi-supervised training method of image annotation model
CN114298022A (en) Subgraph matching method for large-scale complex semantic network
CN103793720A (en) Method and system for positioning eyes
CN115587909A (en) Judicial text data amplification method based on generating type confrontation network

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant