CN115240201B - Chinese character generation method for alleviating network mode collapse problem by using Chinese character skeleton information - Google Patents

Info

- Publication number: CN115240201B
- Application number: CN202211146858.XA
- Other versions: CN115240201A (Chinese)
- Authority: CN (China)
- Prior art keywords: image, source domain, skeleton, style, network
- Legal status: Active (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
- Inventors: 曾锦山, 周杰, 徐瑞英, 程诺, 黄箐
- Current and original assignee: Jiangxi Normal University (the listed assignees may be inaccurate; Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list)
- Application filed by Jiangxi Normal University
- Priority: CN202211146858.XA (the priority date is an assumption and is not a legal conclusion)
- Granted as CN115240201B; earlier published as CN115240201A

Classifications

  • G06V30/141 — Character recognition; image acquisition using multiple overlapping images; image stitching
  • G06N3/084 — Neural network learning methods; backpropagation, e.g. using gradient descent
  • G06N3/088 — Neural network learning methods; non-supervised learning, e.g. competitive learning
  • G06T3/4038 — Geometric image transformations; image mosaicing, e.g. composing plane images from plane sub-images
  • G06V10/82 — Image or video recognition or understanding using pattern recognition or machine learning using neural networks
  • G06V30/19147 — Character recognition; obtaining sets of training patterns; bootstrap methods, e.g. bagging or boosting
  • G06T2200/32 — Indexing scheme for image data processing or generation involving image mosaicing


Abstract

The invention discloses a Chinese character generation method that uses Chinese character skeleton information to alleviate network mode collapse. The method comprises the following steps. Step one: extract the corresponding skeleton image from a source domain image, concatenate the source domain image with its skeleton image, input the result to a generator to produce a target-style image, and feed that image to a discriminator that judges whether it is real or fake. Step two: extract the corresponding skeleton image from the target-style image, concatenate the two, input the result to another generator to produce a source-domain-style image, and feed that image to another discriminator for judgment. Step three: extract a skeleton image from the image reconstructed by the second generator and compute a pixel-level loss between it and the source-domain skeleton extracted in step one; this loss is back-propagated as part of the network gradient and used to optimize the model during training.

Description

Chinese character generation method for alleviating network mode collapse problem by utilizing Chinese character skeleton information
Technical Field
The invention belongs to the technical field of computer vision, and particularly relates to a Chinese character generation method for alleviating the problem of network mode collapse by utilizing Chinese character skeleton information.
Background
Chinese character generation is a difficult task: Chinese glyphs are complex, the set of commonly used characters is large, and producing a full font library takes a long time. Early on, practitioners first extracted explicit features of Chinese characters, such as strokes and components, and then generated new characters with traditional machine learning methods. The quality of this up-front feature extraction strongly affects the results, and because the features were usually crafted by hand, the process was very time-consuming and labor-intensive.
Some recent methods improve network performance by introducing paired datasets, but in practice paired data is hard to obtain, especially for ancient-text restoration and handwriting generation. Moreover, paired datasets are built by manually partitioning a given dataset, which demands substantial manpower and resources. To ease the difficulty of obtaining paired data for Chinese character generation, some prior work has explored this direction, but those methods rely heavily on additional training steps or additional labels. Extra training steps raise the training cost of the neural network, and extra labels are made by hand, which again consumes considerable effort. Furthermore, the unpaired models in current use share a common problem: mode collapse.
Some methods have begun to address mode collapse in Chinese character generation from several angles. For example, the AAAI paper on reducing mode collapse in Chinese font generation via stroke encoding (StrokeGAN) adds a one-hot stroke encoding, but that encoding only records whether a given stroke occurs in a character and ignores the relationship between the strokes and the character as a whole. Characters built from an identical set of strokes, such as the pair both glossed here as 'already', are therefore indistinguishable under this encoding; the character glossed as 'king' gives another such example, again because the underlying strokes composing the confusable characters are identical. Other work, such as the paper on self-supervised Chinese font generation based on block transformation, divides a character into four parts and lets the network learn the spatial structure among them; however, the spatial structure information learned this way is shallow, and no constraint is imposed on stroke details.
Disclosure of Invention
The invention aims to provide a Chinese character generation method that uses Chinese character skeleton information to alleviate network mode collapse, solving the technical problem of mode collapse during network generation in the prior art while keeping Chinese character generation fast and inexpensive.
The Chinese character generation method for alleviating the network mode collapse problem by utilizing the Chinese character skeleton information comprises the following steps:
Step one: from a source domain image x, extract the corresponding source domain skeleton image s_x; concatenate x and s_x and input the result to the generator G to generate a target-style image ŷ; feed ŷ to the discriminator D_Y, which judges whether ŷ is real or fake.
Step two: from the target-style image ŷ, extract the corresponding target-style skeleton image s_ŷ; concatenate s_ŷ with ŷ and input the result to another generator F to generate a source-domain-style image x̂; feed x̂ to another discriminator D_X for judgment.
Step three: from the source-domain-style image x̂ reconstructed by the generator F, extract its skeleton image s_x̂; compute a pixel-level loss between s_x̂ and the source domain skeleton image s_x extracted in step one. This loss is back-propagated as part of the network gradient and used to optimize the model during training.
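The three steps can be sketched end-to-end as follows. This is a minimal illustration, not the patent's actual networks: the names `splice`, `cycle_step`, the identity stand-in generators, and the channel-mean skeleton extractor are all assumptions made here for demonstration.

```python
import numpy as np

def splice(rgb, skeleton):
    """Concatenate an RGB image (3, H, W) with a single-channel skeleton
    (H, W) along the channel axis, giving the four-channel generator input."""
    return np.concatenate([rgb, skeleton[None]], axis=0)

def cycle_step(x, extract_skeleton, G, F):
    """One X -> Y -> X pass of the three steps.
    x: source image (3, H, W); G, F: generators (4, H, W) -> (3, H, W);
    extract_skeleton: (3, H, W) -> (H, W). Returns the skeleton loss."""
    s_x = extract_skeleton(x)          # step 1: source-domain skeleton
    y_fake = G(splice(x, s_x))         # step 1: target-style image
    s_y = extract_skeleton(y_fake)     # step 2: target-style skeleton
    x_rec = F(splice(y_fake, s_y))     # step 2: reconstructed source-style image
    s_rec = extract_skeleton(x_rec)    # step 3: skeleton of the reconstruction
    # step 3: pixel-level skeleton loss, back-propagated during training
    return np.abs(s_rec.astype(float) - s_x.astype(float)).mean()

# Sanity check with identity "generators" that simply drop the extra channel:
# a perfect reconstruction gives zero skeleton loss.
drop_skeleton_channel = lambda x4: x4[:3]
channel_mean = lambda img: img.mean(axis=0)   # stand-in skeleton extractor
x = np.random.rand(3, 8, 8)
loss = cycle_step(x, channel_mean, drop_skeleton_channel, drop_skeleton_channel)
```

In training, real generators and a real skeleton extractor replace the stand-ins, and the returned loss joins the adversarial and cycle-consistency losses in the gradient back-propagation.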
Preferably, in step one, the source domain image x is an RGB three-channel image and every skeleton image is a single-channel grayscale image. The concatenation joins the three RGB channels of x with the single gray channel of the extracted source domain skeleton s_x into four channels of information, which are fed to the generator G of the network to generate the RGB three-channel target-style image ŷ.
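This channel splicing can be sketched with NumPy as below; the channel-first (channels, height, width) layout and the function name are assumptions made here for illustration, since the patent fixes no memory layout.

```python
import numpy as np

def splice_channels(rgb, gray_skeleton):
    """Join the three RGB channels of an image with the single gray channel
    of its skeleton into one four-channel array (channel-first layout)."""
    assert rgb.shape[0] == 3 and rgb.shape[1:] == gray_skeleton.shape
    return np.concatenate([rgb, gray_skeleton[None]], axis=0)

x = np.random.rand(3, 64, 64).astype(np.float32)   # RGB source domain image
s_x = np.zeros((64, 64), dtype=np.float32)         # single-channel skeleton
four_channel = splice_channels(x, s_x)             # shape (4, 64, 64)
```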
Preferably, in step two, the three RGB channels of the target-style image ŷ are likewise joined with the single gray channel of the extracted target-style skeleton s_ŷ into four channels of information, which are fed to the generator F of the network to generate the RGB three-channel source-domain-style image x̂.
Preferably, in step three, the error value between the skeleton of the source domain image and the skeleton of the reconstructed source-domain-style image is computed and back-propagated as part of the network gradient to optimize the model during training; this error value is the pixel-level loss. An optimized pixel-level loss below a set threshold indicates that the reconstructed image x̂ is similar to the source domain image x of step one at the skeleton level.
Preferably, the cycle-generation network used by the method comprises a skeleton extraction and integration module, a reconstruction font-generation module, two generators, two discriminators, and a skeleton loss calculation module.
Preferably, the skeleton extraction and integration module extracts a source domain skeleton image from the input source domain image, splices the extracted skeleton image and the source domain image along the channel dimension into four channels of information, and feeds them to the generator G of the network to generate a target-style image.
Preferably, the reconstruction font-generation module extracts the corresponding target-style skeleton image from the target-style image produced by the generator G, splices the generated target-style image with that skeleton image, and passes the resulting four channels of information to the generator F to generate a source-domain-style image.
Preferably, the two generators are the generator F, which produces source-domain-style images, and the generator G, which produces target-style images. Each generator takes as input the spliced four-channel image, passes it through a series of convolutional layers, and outputs a three-channel image.
Preferably, the two discriminator modules judge whether an input image is a real image or a fake image produced by the network. Discriminator and generator stand in an adversarial relationship, so that each drives the other to improve.
Preferably, the model is optimized during training by calculating an error value between the skeleton of the source domain image and the skeleton of the reconstructed source domain style image as part of a network gradient back pass.
The invention has the following advantages:
1. Using the spatial structure information of the skeleton alleviates mode collapse during network generation; compared with stroke information or segmented local spatial information, skeleton information provides more comprehensive global information and also constrains the network's rendering of stroke details.
2. The CycleGAN network and its cycle-generation idea remove the need for paired datasets.
3. Skeleton information is obtained with an automatic skeleton extraction algorithm, so no manual feature engineering is required.
4. Because skeleton extraction is cheap, a complete set of Chinese fonts can be generated easily, addressing the high cost of Chinese character generation.
5. The method extends easily to other network models and is highly general.
Drawings
FIG. 1 is a flow chart of a Chinese character generation method for alleviating the network mode collapse problem based on Chinese character skeleton information according to the present invention.
FIG. 2 is a diagram of a skeleton extraction integration module according to the present invention.
FIG. 3 is a schematic diagram of a module for generating a font by reconstruction according to the present invention.
FIG. 4 is a diagram of a module for calculating skeletal loss according to the present invention.
FIG. 5 is a diagram of the font generation results of each model.
FIG. 6 is a diagram of font generation results on Attention GAN with and without the method of the present invention.
FIG. 7 is a diagram of font generation results on FUNIT with and without the method of the present invention.
FIG. 8 is a diagram of font generation results on SQ-GAN with and without the method of the present invention.
FIG. 9 is a diagram of font generation results on StrokeGAN with and without the method of the present invention.
FIG. 10 is a diagram of font generation results on UGATIT with and without the method of the present invention.
Attention GAN, FUNIT, SQ-GAN, stroke GAN and UGATIT in the attached drawings are English abbreviation of corresponding models.
Detailed Description
The following detailed description, taken together with the accompanying drawings, is intended to give those skilled in the art a complete and accurate understanding of the inventive concept and technical solutions of the present invention.
The first embodiment is as follows:
as shown in FIGS. 1-4, the present invention provides a method for generating Chinese characters by using Chinese character skeleton information to alleviate the problem of network mode collapse, comprising the following steps.
Step one: from a source domain image x, extract the corresponding source domain skeleton image s_x; concatenate x and s_x and input the result to the generator G to generate a target-style image ŷ; feed ŷ to the discriminator D_Y, which judges whether ŷ is real or fake.
The source domain image x is an RGB three-channel image and every skeleton image is a single-channel grayscale image. The concatenation joins the three RGB channels of x with the single gray channel of s_x into four channels of information, which are fed to the generator G of the network; the generated target-style image ŷ is likewise an RGB three-channel image.
Step two: following the idea of a cycle-generation network, extract from the target-style image ŷ the corresponding target-style skeleton image s_ŷ; concatenate s_ŷ with ŷ and input the result to another generator F to generate a source-domain-style image x̂, which is fed to another discriminator D_X for judgment.
The concatenation is analogous to step one: the three RGB channels of ŷ are joined with the single gray channel of s_ŷ into four channels of information, which are fed to the generator F of the network; the generated source-domain-style image x̂ is also an RGB three-channel image.
Step three: from the source-domain-style image x̂ reconstructed by the generator F, extract its skeleton image s_x̂, and compute a pixel-level loss between s_x̂ and the source domain skeleton image s_x extracted in step one, ensuring that after optimization the reconstructed x̂ is similar to the source domain image x at the skeleton level. An optimized pixel-level loss below a set threshold indicates that the reconstruction is skeleton-consistent with x. This step computes the error value between the two skeletons as part of the network gradient back-propagation and uses it to optimize the model during training.
The method adopts a cycle-generation network in order to achieve self-supervision without a paired dataset. The X -> Y -> X process realizes a pseudo-pairing of the network, so training can proceed on large amounts of unpaired data. In practice, most available target-font data is unpaired with the font to be converted, as in ancient-text restoration and handwriting generation. Existing methods that require paired datasets cannot use such data directly without extensive preprocessing, whereas the cycle-generation idea lets the model be trained on unpaired datasets.
The cycle-generation network used by the method comprises a skeleton extraction and integration module, a reconstruction font-generation module, two generators, two discriminators, and a skeleton loss calculation module. The function and implementation of each module are as follows.
Skeleton extraction and integration module: extracts the input source domain image into a source domain skeleton image and splices the extracted skeleton image with the source domain image along the channel dimension. The skeleton image is a single-channel grayscale image; the splicing joins the three RGB channels of the source domain image with the single channel of the extracted skeleton into four channels of information, which are fed to the generator G of the network to generate a target-style image.
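The patent does not name its automatic skeleton extraction algorithm. Zhang-Suen thinning is a common choice for reducing a binary glyph to a one-pixel-wide skeleton, and is sketched below as one plausible instantiation, an assumption rather than the patent's actual extractor.

```python
import numpy as np

def zhang_suen_thin(img):
    """Zhang-Suen thinning: reduce a binary image (1 = ink, 0 = background)
    to a one-pixel-wide skeleton by repeatedly deleting boundary pixels."""
    img = np.pad(img.astype(np.uint8), 1)  # zero border simplifies indexing
    changed = True
    while changed:
        changed = False
        for step in (0, 1):
            to_delete = []
            for i in range(1, img.shape[0] - 1):
                for j in range(1, img.shape[1] - 1):
                    if img[i, j] == 0:
                        continue
                    # neighbours P2..P9, clockwise starting from north
                    p = [img[i-1, j], img[i-1, j+1], img[i, j+1],
                         img[i+1, j+1], img[i+1, j], img[i+1, j-1],
                         img[i, j-1], img[i-1, j-1]]
                    b = sum(p)                                  # non-zero neighbours
                    a = sum(p[k] == 0 and p[(k + 1) % 8] == 1   # 0 -> 1 transitions
                            for k in range(8))
                    if not (2 <= b <= 6 and a == 1):
                        continue
                    if step == 0 and p[0]*p[2]*p[4] == 0 and p[2]*p[4]*p[6] == 0:
                        to_delete.append((i, j))
                    elif step == 1 and p[0]*p[2]*p[6] == 0 and p[0]*p[4]*p[6] == 0:
                        to_delete.append((i, j))
            for i, j in to_delete:  # delete simultaneously after each scan
                img[i, j] = 0
            changed = changed or bool(to_delete)
    return img[1:-1, 1:-1]

# A 3-pixel-thick bar thins toward its one-pixel medial line.
glyph = np.zeros((5, 12), dtype=np.uint8)
glyph[1:4, 1:11] = 1
skeleton = zhang_suen_thin(glyph)
```

Library implementations (for example, a morphological skeletonize routine from an image-processing package) would normally be used in practice instead of this didactic loop.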
Reconstruction font-generation module: extracts the corresponding target-style skeleton image from the target-style image produced by the generator G. Using the same splicing method as the skeleton extraction and integration module, the single channel of the target-style skeleton is spliced onto the RGB channels of the generated target-style image to form four channels of information, which are then passed to the generator F to generate a source-domain-style image.
Two generators: the generator F, which produces source-domain-style images, and the generator G, which produces target-style images. Both take as input the spliced four-channel image, i.e. the four channels of information produced by the skeleton extraction module, and pass it through a series of convolutional layers to output a three-channel image.
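As a shape-level illustration of the four-channel-in, three-channel-out mapping: a real generator is a deep convolutional network, but a single per-pixel 1×1 "convolution" (an assumption made purely for illustration) already shows the channel arithmetic.

```python
import numpy as np

def toy_generator(x4, w):
    """Mix the four input channels into three output channels per pixel.
    w has shape (3, 4): (out_channels, in_channels)."""
    return np.einsum('oc,chw->ohw', w, x4)

x4 = np.random.rand(4, 16, 16)   # spliced four-channel input
w = np.random.rand(3, 4)
y = toy_generator(x4, w)         # three-channel, RGB-shaped output
```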
Two discriminator modules: judge whether an input image is a real image or a fake image produced by the network. They are the discriminator D_X, which judges whether source-domain-style images are real or fake, and the discriminator D_Y, which does the same for target-style images. Discriminator and generator stand in an adversarial relationship that optimizes both: the generator tries to produce images that fool the discriminator, while the discriminator tries to classify incoming images correctly as real or fake.
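The adversarial relationship can be written as a pair of losses. Least-squares GAN losses are shown here as one common choice; the patent does not specify which adversarial loss it uses, so this is an assumption for illustration.

```python
import numpy as np

def d_loss(d_real, d_fake):
    """Discriminator objective: push scores on real images toward 1 and
    scores on generated (fake) images toward 0."""
    return ((d_real - 1.0) ** 2).mean() + (d_fake ** 2).mean()

def g_loss(d_fake):
    """Generator objective: fool the discriminator by pushing its scores
    on generated images toward 1."""
    return ((d_fake - 1.0) ** 2).mean()

perfect_d = d_loss(np.ones(8), np.zeros(8))   # discriminator is never wrong
fooled_d = g_loss(np.ones(8))                 # generator fully fools it
```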
Skeleton loss calculation module: computes the error value between the skeleton of the source domain image and the skeleton of the reconstructed source-domain-style image as part of the network gradient back-propagation, optimizing the model during training. Because the method uses a cycle-generation network trained on unpaired data, the dataset contains no one-to-one image pairs; training still requires gradient back-propagation, so the network must compute a loss value to supply those gradients.
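A minimal sketch of the pixel-level skeleton loss follows. Mean absolute error is assumed here; the patent only specifies "pixel-level loss", so the exact norm is an assumption.

```python
import numpy as np

def skeleton_loss(sk_source, sk_reconstructed):
    """Pixel-level error between the source-domain skeleton and the skeleton
    extracted from the reconstructed image; used as part of the gradient
    back-propagation during training."""
    a = sk_source.astype(np.float64)
    b = sk_reconstructed.astype(np.float64)
    return np.abs(a - b).mean()

s_src = np.zeros((32, 32))
s_rec = np.zeros((32, 32))
s_rec[10, 10] = 1.0              # one mismatched skeleton pixel
```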
FIG. 5 shows the font generation results of each model. The following figures show the effect of the method on other models (SK denotes the use of our method).
FIG. 6 shows our method applied to Attention GAN: its input is expanded to four channels of information and the rest of the network is unchanged. Our method yields a large improvement on Attention GAN.
FIG. 7 shows our method applied to FUNIT, which decomposes content and style; the skeleton information is added to the content module, and the model's generation quality improves noticeably.
FIG. 8 shows our method applied to SQ-GAN: its input is expanded to four channels of information and the rest of the network is unchanged. Our method yields a large improvement on SQ-GAN.
FIG. 9 shows our method applied to StrokeGAN: its input is expanded to four channels of information and the rest of the network is unchanged. Our method yields a large improvement on StrokeGAN.
FIG. 10 shows our method applied to UGATIT: its input is expanded to four channels of information and the rest of the network is unchanged. Our method yields a large improvement on UGATIT.
The method's skeleton loss calculation module computes the pixel-level loss between skeleton images. This not only optimizes the generators used by the method; the spatial structure information of the skeleton also alleviates mode collapse during network generation. Compared with stroke information or segmented local spatial information, skeleton information provides more comprehensive global information and also constrains the network's rendering of stroke details.
The invention has been described above with reference to the accompanying drawings. Its implementation is clearly not limited to the manner described: various insubstantial modifications of the inventive concept and technical solution, and direct applications of them to other settings without modification, all fall within the scope of the invention.

Claims (8)

1. A Chinese character generation method for alleviating the network mode collapse problem by utilizing Chinese character skeleton information, characterized by comprising the following steps:
Step one: from a source domain image x, extract the corresponding source domain skeleton image s_x; concatenate x and s_x and input the result to the generator G to generate a target-style image ŷ; feed ŷ to the discriminator D_Y, which judges whether ŷ is real or fake.
In step one, the source domain image x is an RGB three-channel image and every skeleton image is a single-channel grayscale image; the concatenation joins the three RGB channels of x with the single gray channel of the extracted skeleton s_x into four channels of information, which are fed to the generator G of the network to generate the RGB three-channel target-style image ŷ.
Step two: from the target-style image ŷ, extract the corresponding target-style skeleton image s_ŷ; concatenate s_ŷ with ŷ and input the result to another generator F to generate a source-domain-style image x̂; feed x̂ to another discriminator D_X for judgment.
In step two, the three RGB channels of ŷ are joined with the single gray channel of the extracted target-style skeleton s_ŷ into four channels of information, which are fed to the generator F of the network to generate the RGB three-channel source-domain-style image x̂.
step three, a source-domain-style skeleton image is extracted from the source-domain-style image reconstructed by the generator; a pixel-level loss is calculated between the extracted source-domain-style skeleton image and the source domain skeleton image extracted in step one, and this loss is back-propagated as part of the network gradient and used to optimize the model during training.
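The pixel-level skeleton loss of step three can be sketched as a mean absolute error between the two skeleton maps (the patent does not fix the exact norm; the L1 choice here is an assumption):

```python
import numpy as np

def skeleton_pixel_loss(reconstructed_skeleton: np.ndarray,
                        source_skeleton: np.ndarray) -> float:
    """Pixel-level error between the skeleton extracted from the reconstructed
    source-domain-style image and the skeleton of the original source domain
    image. A mean absolute (L1) norm is assumed for illustration."""
    assert reconstructed_skeleton.shape == source_skeleton.shape
    return float(np.abs(reconstructed_skeleton - source_skeleton).mean())

perfect = skeleton_pixel_loss(np.ones((4, 4)), np.ones((4, 4)))
worst = skeleton_pixel_loss(np.ones((4, 4)), np.zeros((4, 4)))
print(perfect, worst)  # 0.0 1.0
```

During training this scalar would be added to the generators' objective so its gradient flows back through the reconstruction path.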
2. The method for generating Chinese characters using Chinese character skeleton information to alleviate network mode collapse problems of claim 1, wherein: in the third step, an error value between the skeleton of the source domain image and the skeleton of the reconstructed source-domain-style image is calculated and back-propagated as part of the network gradient to optimize the model during training; this error value is the pixel-level loss, and a pixel-level loss below the set loss threshold after optimization indicates that the reconstructed source-domain-style image is similar, at the skeleton level, to the source domain image of step one.
3. The method for generating Chinese characters utilizing Chinese character skeleton information to alleviate network mode collapse problems of any of claims 1-2, wherein: the cycle generation network used by the method comprises a skeleton extraction and integration module, a font reconstruction and generation module, two generators, two discriminators and a skeleton loss calculation module.
4. The method for generating Chinese characters according to claim 3, wherein: the skeleton extraction and integration module extracts a source domain skeleton image from the input source domain image, splices the extracted source domain skeleton image with the source domain image in the channel dimension, and feeds the combined four-channel information into the generator of the network to generate a target-style image.
5. The method of claim 3, wherein: the font reconstruction and generation module extracts a corresponding target-style skeleton image from the target-style image produced by the first generator, splices the generated target-style image with the target-style skeleton image, and passes the resulting four-channel information into the other generator to generate a source-domain-style image.
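The reconstruction path of claims 4 and 5 can be sketched end to end with stand-in components (the skeleton extractor and both generators below are placeholders for trained modules; all names are ours):

```python
import numpy as np

def splice(img3: np.ndarray, skel1: np.ndarray) -> np.ndarray:
    """Channel-wise splice: RGB (H, W, 3) + grayscale skeleton (H, W) -> (H, W, 4)."""
    return np.concatenate([img3, skel1[..., None]], axis=2)

def extract_skeleton(img3: np.ndarray) -> np.ndarray:
    """Placeholder skeleton extractor: the patent uses a dedicated skeleton
    extraction module; here we just threshold the channel mean."""
    return (img3.mean(axis=2) > 0.5).astype(np.float32)

def stub_generator(spliced4: np.ndarray) -> np.ndarray:
    """Stand-in for a trained generator: drops the skeleton channel and
    returns the remaining 3-channel image unchanged."""
    return spliced4[..., :3]

def cycle_reconstruct(source_img: np.ndarray) -> np.ndarray:
    # Step one: splice source image with its skeleton, generate target style.
    target_style = stub_generator(splice(source_img, extract_skeleton(source_img)))
    # Step two: splice target-style image with its skeleton, reconstruct source style.
    return stub_generator(splice(target_style, extract_skeleton(target_style)))

src = np.random.default_rng(0).random((32, 32, 3)).astype(np.float32)
rec = cycle_reconstruct(src)
print(rec.shape)  # (32, 32, 3)
```

With these identity-like stubs the reconstruction equals the input exactly; in the trained network the two generators instead map between styles, and the skeleton loss of claim 2 constrains the reconstruction at the skeleton level.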
6. The method of claim 3, wherein: the two generators are a generator for generating source-domain-style images and a generator for generating target-style images; the input of each generator is the four-channel image produced by splicing, which passes through a series of convolution layers, and the output of each generator is a three-channel image.
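The four-channel-in, three-channel-out interface of claim 6 can be illustrated with a toy generator built from 1x1 convolutions (a deliberately minimal sketch; the patent's generators use a deeper series of convolution layers, and the hidden width of 16 is invented here):

```python
import numpy as np

rng = np.random.default_rng(0)

def conv1x1(x: np.ndarray, weight: np.ndarray) -> np.ndarray:
    """A 1x1 convolution is a per-pixel linear map over channels:
    x (H, W, C_in) @ weight (C_in, C_out) -> (H, W, C_out)."""
    return x @ weight

# Toy generator weights: 4-channel spliced input -> hidden -> RGB output.
w1 = rng.standard_normal((4, 16))
w2 = rng.standard_normal((16, 3))

def toy_generator(spliced: np.ndarray) -> np.ndarray:
    h = np.maximum(conv1x1(spliced, w1), 0.0)   # ReLU nonlinearity
    return np.tanh(conv1x1(h, w2))              # RGB output bounded in (-1, 1)

x = rng.standard_normal((64, 64, 4))            # spliced four-channel input
y = toy_generator(x)
print(y.shape)  # (64, 64, 3)
```

The spatial size is preserved and only the channel count changes, matching the claim's description of the generators' input and output.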
7. The method of claim 6, wherein: the two discriminator modules judge whether an input image is a real image or a fake image generated by the network; the discriminators and the generators stand in an adversarial relation and mutually optimize each other's capabilities.
8. The method of claim 3, wherein: the skeleton loss calculation module calculates an error value between the skeleton of the source domain image and the skeleton of the reconstructed source-domain-style image, which is back-propagated as part of the network gradient and used to optimize the model during training.
CN202211146858.XA 2022-09-21 2022-09-21 Chinese character generation method for alleviating network mode collapse problem by using Chinese character skeleton information Active CN115240201B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211146858.XA CN115240201B (en) 2022-09-21 2022-09-21 Chinese character generation method for alleviating network mode collapse problem by using Chinese character skeleton information

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211146858.XA CN115240201B (en) 2022-09-21 2022-09-21 Chinese character generation method for alleviating network mode collapse problem by using Chinese character skeleton information

Publications (2)

Publication Number Publication Date
CN115240201A CN115240201A (en) 2022-10-25
CN115240201B true CN115240201B (en) 2022-12-23

Family

ID=83682194

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211146858.XA Active CN115240201B (en) 2022-09-21 2022-09-21 Chinese character generation method for alleviating network mode collapse problem by using Chinese character skeleton information

Country Status (1)

Country Link
CN (1) CN115240201B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116129207B (en) * 2023-04-18 2023-08-04 江西师范大学 Image data processing method for attention of multi-scale channel
CN117078921B (en) * 2023-10-16 2024-01-23 江西师范大学 Self-supervision small sample Chinese character generation method based on multi-scale edge information

Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109408776A (en) * 2018-10-09 2019-03-01 西华大学 A kind of calligraphy font automatic generating calculation based on production confrontation network
CN110033054B (en) * 2019-03-14 2021-05-25 上海交通大学 Personalized handwriting migration method and system based on collaborative stroke optimization
CN111859852A (en) * 2019-04-26 2020-10-30 普天信息技术有限公司 Training device and method for Chinese character style migration model
CN112036137A (en) * 2020-08-27 2020-12-04 哈尔滨工业大学(深圳) Deep learning-based multi-style calligraphy digital ink simulation method and system
CN113657397B (en) * 2021-08-17 2023-07-11 北京百度网讯科技有限公司 Training method for circularly generating network model, method and device for establishing word stock
CN114742714A (en) * 2021-10-29 2022-07-12 天津大学 Chinese character image restoration algorithm based on skeleton extraction and antagonistic learning

Also Published As

Publication number Publication date
CN115240201A (en) 2022-10-25

Similar Documents

Publication Publication Date Title
CN115240201B (en) Chinese character generation method for alleviating network mode collapse problem by using Chinese character skeleton information
CN108985181B (en) End-to-end face labeling method based on detection segmentation
CN112966684A (en) Cooperative learning character recognition method under attention mechanism
CN112884758B (en) Defect insulator sample generation method and system based on style migration method
CN109255826B (en) Chinese training image generation method, device, computer equipment and storage medium
CN110046116B (en) Tensor filling method, device, equipment and storage medium
CN111401156B (en) Image identification method based on Gabor convolution neural network
CN115131560A (en) Point cloud segmentation method based on global feature learning and local feature discrimination aggregation
CN117058266B (en) Handwriting word generation method based on skeleton and outline
CN106227836B (en) Unsupervised joint visual concept learning system and unsupervised joint visual concept learning method based on images and characters
CN114972847A (en) Image processing method and device
CN112037239A (en) Text guidance image segmentation method based on multi-level explicit relation selection
CN115908639A (en) Transformer-based scene image character modification method and device, electronic equipment and storage medium
CN113836319A (en) Knowledge completion method and system for fusing entity neighbors
Ling et al. A facial expression recognition system for smart learning based on YOLO and vision transformer
CN114529450B (en) Face image super-resolution method based on improved depth iteration cooperative network
US11734389B2 (en) Method for generating human-computer interactive abstract image
US20230154077A1 (en) Training method for character generation model, character generation method, apparatus and storage medium
CN115100451A (en) Data expansion method for monitoring oil leakage of hydraulic pump
Liu et al. Facial landmark detection using generative adversarial network combined with autoencoder for occlusion
Fan et al. Image inpainting based on structural constraint and multi-scale feature fusion
CN114332491A (en) Saliency target detection algorithm based on feature reconstruction
Xiao et al. CTNet: hybrid architecture based on CNN and transformer for image inpainting detection
CN113065407A (en) Financial bill seal erasing method based on attention mechanism and generation countermeasure network
CN117875362B (en) Distributed training method and device for large model and electronic equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
CB03 Change of inventor or designer information

Inventor after: Zeng Jinshan

Inventor after: Zhou Jie

Inventor after: Xu Ruiying

Inventor after: Cheng Nuo

Inventor after: Huang Jing

Inventor before: Zeng Jinshan

Inventor before: Zhou Jie

Inventor before: Xu Ruiying

Inventor before: Cheng Nuo

Inventor before: Huang Jing

GR01 Patent grant
GR01 Patent grant