CN114255159A - Handwritten text image generation method and device, electronic equipment and storage medium - Google Patents


Info

Publication number
CN114255159A
CN114255159A
Authority
CN
China
Prior art keywords
handwritten text
text image
image
content
style
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111571764.2A
Other languages
Chinese (zh)
Inventor
赵坤
杨争艳
吴嘉嘉
Current Assignee
University of Science and Technology of China USTC
iFlytek Co Ltd
Original Assignee
iFlytek Co Ltd
Priority date
Filing date
Publication date
Application filed by iFlytek Co Ltd filed Critical iFlytek Co Ltd
Priority claimed from CN202111571764.2A
Publication of CN114255159A
Legal status: Pending

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06T: IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 3/00: Geometric image transformations in the plane of the image
    • G06T 3/04: Context-preserving transformations, e.g. by using an importance map
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00: Pattern recognition
    • G06F 18/20: Analysing
    • G06F 18/21: Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F 18/214: Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00: Computing arrangements based on biological models
    • G06N 3/02: Neural networks
    • G06N 3/04: Architecture, e.g. interconnection topology
    • G06N 3/045: Combinations of networks
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06T: IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 11/00: 2D [Two Dimensional] image generation
    • G06T 11/60: Editing figures and text; Combining figures or text


Abstract

The invention provides a handwritten text image generation method and apparatus, an electronic device, and a storage medium. The method comprises: acquiring a content image and a reference handwritten text image; and performing style migration on the content image based on a reference writing style contained in the reference handwritten text image to obtain a target handwritten text image, the target handwritten text image having the same text content as the content image. The method effectively decouples the reference writing style information of the reference handwritten text image and can therefore generate handwritten text images in the styles of different writers. In addition, it can generate handwritten text images containing multiple lines of text; compared with approaches that can only generate single handwritten characters and then splice them into words or sentences, it markedly improves both the efficiency and the quality of handwritten text image generation.

Description

Handwritten text image generation method and device, electronic equipment and storage medium
Technical Field
The present invention relates to the field of image processing technologies, and in particular, to a method and an apparatus for generating a handwritten text image, an electronic device, and a storage medium.
Background
With the development of image generation technology, the generation of handwritten text images has been receiving increasingly wide attention. Handwritten text image generation methods fall into two broad categories: online generation and offline generation. Compared with online generation, offline generation requires only simple data acquisition and can quickly obtain large amounts of data for handwriting generation.
Existing offline handwritten text image generation methods mainly synthesize handwriting data for single characters: given single-character images handwritten by a user, a generative model learns the user's handwriting style and generates handwritten images.
Such methods can only generate single handwritten characters and cannot generate text-line images at the word or sentence level. In addition, a single model can only learn the handwriting style of a single user; it is difficult to apply at scale to handwriting generation for many users, so generalization is poor.
Disclosure of Invention
The invention provides a method and an apparatus for generating a handwritten text image, an electronic device, and a storage medium, which are intended to overcome the defects of the prior art that only single handwritten characters can be generated, that word-level or sentence-level text-line images cannot be generated, and that generalization is poor.
The invention provides a method for generating a handwritten text image, which comprises the following steps:
acquiring a content image and a reference handwritten text image;
and carrying out style migration on the content image based on a reference writing style contained in the reference handwritten text image to obtain a target handwritten text image, wherein the target handwritten text image has the same text content as the content image.
According to a method for generating a handwritten text image provided by the present invention, the performing style migration on the content image based on a reference writing style included in the reference handwritten text image to obtain a target handwritten text image includes:
extracting content features of the content image to obtain content features;
performing writing style characteristic extraction on the reference handwritten text image to obtain writing style characteristics, wherein the writing style characteristics are used for representing the reference writing style;
and generating the target handwritten text image based on the content features and the writing style features.
According to a method for generating a handwritten text image provided by the present invention, generating the target handwritten text image based on the content feature and the writing style feature includes:
performing feature migration on the content features based on the writing style features to obtain migration features;
and generating the target handwritten text image based on the migration characteristics.
According to the method for generating a handwritten text image provided by the present invention, performing feature migration on the content features based on the writing style features to obtain migration features comprises the following steps:
performing feature migration on the content features based on a multi-layer residual network to obtain the migration features, wherein the normalization parameters of each layer of the multi-layer residual network are determined based on the writing style features.
According to a method for generating a handwritten text image provided by the present invention, the performing style migration on the content image based on a reference writing style included in the reference handwritten text image to obtain a target handwritten text image includes:
based on a handwritten text image generation model, applying a reference writing style contained in the reference handwritten text image, and performing style migration on the content image to obtain a target handwritten text image;
the handwritten text image generation model is obtained by training based on a sample content image, a sample handwritten text image and a label handwritten text image.
According to the method for generating the handwritten text image, the handwritten text image generation model is obtained by training based on the following steps:
based on an initial model, applying a sample writing style contained in the sample handwritten text image, and carrying out style migration on the sample content image to obtain a predicted handwritten text image;
and training the initial model to obtain the handwritten text image generation model based on at least one of the difference of the predicted handwritten text image and the sample content image in the text content, the difference of the predicted handwritten text image and the label handwritten text image in the image characteristics and the difference of the predicted handwritten text image and the sample handwritten text image in the writing style.
According to a method for generating a handwritten text image provided by the present invention, training the initial model based on the difference in image features between the predicted handwritten text image and the label handwritten text image includes:
discriminating the predicted handwritten text image and the label handwritten text image respectively based on a discrimination model to obtain discrimination results for the predicted handwritten text image and the label handwritten text image;
and performing adversarial training on the initial model and the discrimination model based on the discrimination results.
The present invention also provides a handwritten text image generating apparatus, including:
an image acquisition unit configured to acquire a content image and a reference handwritten text image;
and the style migration unit is used for carrying out style migration on the content image based on a reference writing style contained in the reference handwritten text image to obtain a target handwritten text image, and the text content of the target handwritten text image is the same as that of the content image.
The present invention also provides an electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor implements the steps of the handwritten text image generation method as described in any of the above when executing the program.
The present invention also provides a non-transitory computer readable storage medium having stored thereon a computer program which, when executed by a processor, performs the steps of the handwritten text image generation method as described in any of the above.
The invention also provides a computer program product comprising a computer program which, when executed by a processor, carries out the steps of the method for generating a handwritten text image as described in any of the above.
According to the handwritten text image generation method and apparatus, electronic device, and storage medium, the target handwritten text image obtained through style migration has the same text content as the content image and the same or a similar writing style as the reference handwritten text image. The method effectively decouples the reference writing style information of the reference handwritten text image and can generate handwritten text images in the styles of different writers; in addition, it can generate handwritten text images containing multiple lines of text, and compared with approaches that can only generate single handwritten characters and then splice them into words or sentences, it effectively improves the efficiency and quality of handwritten text image generation.
Drawings
In order to illustrate the technical solutions of the present invention or of the prior art more clearly, the drawings needed for describing the embodiments or the prior art are briefly introduced below. Obviously, the drawings described below show some embodiments of the present invention, and those skilled in the art can derive other drawings from them without creative effort.
FIG. 1 is a flow chart of a handwritten text image generation method provided by the present invention;
FIG. 2 is a flowchart illustrating step 120 of the handwritten text image generation method provided in the present invention;
FIG. 3 is a flowchart illustrating a method for generating a target handwritten text image according to the present invention;
FIG. 4 is a schematic diagram of training data for a handwritten text image generation model provided by the present invention;
FIG. 5 is a flow chart of a training method for generating a model of a handwritten text image according to the present invention;
FIG. 6 is a schematic flow chart of the adversarial training of the initial model provided by the present invention;
FIG. 7 is a schematic structural diagram of a model for generating a handwritten text image according to the present invention;
FIG. 8 is a schematic structural diagram of a handwritten text image generating apparatus provided in the present invention;
fig. 9 is a schematic structural diagram of an electronic device provided by the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention clearer, the technical solutions of the present invention will be clearly and completely described below with reference to the accompanying drawings, and it is obvious that the described embodiments are some, but not all embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
With the development of image generation technology, handwritten text image generation is attracting increasingly wide attention. For example, in the field of artistic creation, a user can convert text in a standard typeface (such as the Song typeface) into text with a specific handwriting style, meeting the demand for personalized display. In the field of information security, a classification model trained on a mixture of generated handwritten text images and real handwritten text images can distinguish genuine handwritten text images from forged ones, and can therefore be used to detect forged signatures and raise the security level of signatures. In the field of multi-language text recognition, large-scale synthetic handwriting images for a specific language can be produced from a small number of handwriting samples and used as training data for text recognition, improving the accuracy of handwritten text recognition.
Current handwritten text image synthesis schemes fall into two categories: online handwriting synthesis and offline handwriting synthesis. Online handwriting synthesis generates a series of character trajectory coordinate points, imitating the human process of writing characters, and combines the trajectory points into the generated characters; this approach requires online trajectory-point data in order to learn the writer's style. Offline handwriting synthesis directly generates a handwritten text image; compared with online handwritten text image synthesis, data acquisition is simple and a large amount of data can be obtained quickly for handwriting generation.
Existing handwritten image generation techniques mainly generate handwriting for single characters, such as the Chinese characters "中" and "国": a single-character image in a standard typeface is input, and the corresponding handwritten character image is output through an encoder-decoder convolutional network.
This prior technical scheme has two problems. (1) Characters are highly structured, abstract data, writing styles differ greatly between people, and learning the rules by which character strokes vary is difficult; to simplify model learning, most existing schemes synthesize handwriting for single characters and then splice the single characters together into words or sentences, so the connections between characters look stiff and the visual effect is poor. (2) Existing schemes do not effectively decouple the writing style information of the handwritten image; they learn the handwriting style of only one person or a limited number of people, generalize poorly, and can hardly allow a single model to generate handwritten text images for different people.
In view of the foregoing problems, embodiments of the present invention provide a method for generating a handwritten text image, which can generate a handwritten text image in units of text lines. Fig. 1 is a schematic flow chart of a method for generating a handwritten text image, as shown in fig. 1, the method includes:
step 110, a content image and a reference handwritten text image are obtained.
In particular, the content image may be an image containing text in a standard typeface (e.g., the Song typeface, Times New Roman, etc.). The content image may be obtained by rendering given text content in a standard typeface, or the standard-typeface text may be captured by an image acquisition device such as a scanner, mobile phone, or camera; this is not specifically limited in this embodiment of the present invention.
The standard-typeface text here may be a single character, or a text line containing multiple characters, for example a word or a sentence.
The reference handwritten text image may be an image containing handwritten text, and may specifically be captured from the handwritten text by an image acquisition device such as a scanner, mobile phone, or camera.
The reference handwritten text image may be a handwritten text image from any writer, for example a handwritten text image written by Zhang San or by Li Si, and the text content contained in the reference handwritten text image need not be related to the text content contained in the content image.
Step 120, style migration is performed on the content image based on the reference writing style contained in the reference handwritten text image to obtain a target handwritten text image, where the target handwritten text image has the same text content as the content image.
Specifically, the reference writing style included in the reference handwritten text image is the writing style corresponding to the handwritten text in the image. Since the handwritten text in a reference handwritten text image is the handwritten text of a writer and each person has a specific writing style, the reference writing style contained in the reference handwritten text image may be the writing style of the writer corresponding to the reference handwritten text image.
The reference writing style may be determined using a style extraction model, for example a convolutional neural network (CNN) model that extracts the reference writing style contained in the reference handwritten text image.
Style migration is then performed on the content image according to the reference writing style to obtain the target handwritten text image. Style migration means transferring the reference writing style onto the content image, that is, changing only the writing style of the text content without changing the text content itself, for example converting the writing style from the Song typeface to Zhang San's handwriting.
Style migration may be carried out in several ways. First, feature vectors of the content domain and the style domain may be extracted separately, the feature vectors of the two domains fused through style transfer learning, and the image reconstructed from the fused feature vectors to obtain the target handwritten text image. Alternatively, a trained style migration model may directly transfer the style of the content image. Or a writing-style mapping may be applied to the content image using the reference handwritten text image: a migration matrix of the reference handwritten text image relative to the content image is first computed, and the target handwritten text image is then generated through the migration matrix. Of course, other style migration methods are also possible; this is not specifically limited in this embodiment of the present invention.
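As a concrete illustration of the extract-fuse-reconstruct pipeline described above, here is a minimal toy sketch in Python/NumPy. The encoder, fusion, and decoder bodies are crude numerical stand-ins for the CNNs the patent describes; all function names, array shapes, and operations are illustrative assumptions, not taken from the patent.

```python
import numpy as np

rng = np.random.default_rng(0)

def content_encoder(content_img):
    # Stand-in for a CNN: downsample the image into a spatial feature map.
    h, w = content_img.shape
    feat = content_img.reshape(h // 4, 4, w // 4, 4).mean(axis=(1, 3))
    return feat[None, ...]  # (C=1, H/4, W/4) content feature map

def style_encoder(ref_imgs):
    # Stand-in for a CNN + global average pooling: each reference image
    # is reduced to a fixed-length vector; N samples are averaged.
    vecs = [img.reshape(-1)[:512] for img in ref_imgs]
    return np.mean(vecs, axis=0)  # fixed-length style vector

def fuse_and_decode(content_feat, style_vec):
    # Stand-in for style-conditioned residual blocks + decoder:
    # scale/shift the content features with values derived from the style,
    # then upsample back to image resolution.
    gamma, beta = style_vec.mean() + 1.0, style_vec.std()
    migrated = gamma * content_feat + beta
    return np.kron(migrated[0], np.ones((4, 4)))

content = rng.random((32, 128))                          # rendered text line
references = [rng.random((32, 128)) for _ in range(4)]   # N style samples

target = fuse_and_decode(content_encoder(content), style_encoder(references))
print(target.shape)  # same resolution as the content image
```

The point of the sketch is the data flow: content and style are encoded independently (decoupled), and only the fusion step mixes them, which is what lets one model serve many writers.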
The target handwritten text image is the handwritten image that the user desires to obtain. The target handwritten text image obtained through style migration is the same as the text content of the content image, and is the same as or similar to the writing style of the reference handwritten text image.
The text content of the target handwritten text image may also be a single character, or a text line containing a plurality of characters, corresponding to the content image, for example, a word or a sentence, or the like.
It should be noted that the method for generating a handwritten text image provided by the embodiment of the present invention is applicable to Chinese, English, or other languages.
According to the method for generating a handwritten text image, the target handwritten text image obtained through style migration has the same text content as the content image and the same or a similar writing style as the reference handwritten text image. The method effectively decouples the reference writing style information of the reference handwritten text image and can generate handwritten text images in the styles of different writers; in addition, it can generate handwritten text images containing multiple lines of text, and compared with approaches that can only generate single handwritten characters and then splice them into words or sentences, it effectively improves the efficiency and quality of handwritten text image generation.
Based on the foregoing embodiment, fig. 2 is a schematic flowchart of step 120 in the method for generating a handwritten text image, and as shown in fig. 2, step 120 specifically includes:
step 121, extracting content characteristics of the content image to obtain content characteristics;
step 122, performing writing style characteristic extraction on the reference handwritten text image to obtain writing style characteristics, wherein the writing style characteristics are used for representing a reference writing style;
and step 123, generating a target handwritten text image based on the content characteristics and the writing style characteristics.
Specifically, to perform style migration on the content image according to the reference writing style and obtain the target handwritten text image, the content features and the writing style features can first be obtained separately by feature extraction, and the target handwritten text image then generated based on the content features and the writing style features.
The content features may characterize the standard-typeface text content contained in the content image, which may be, for example, "today is a good day" or "How are you". Specifically, a content encoder can map the content image from image space to feature space to extract its content features, yielding a content feature vector. The content encoder may be a deep learning neural network, for example a fully convolutional network based on VGG19, or a residual network (ResNet), etc.
The writing style features are used to characterize the reference writing style, for example Zhang San's writing style; more specifically, they can capture attributes such as typeface shape, character size, character spacing, how neatly or cursively the characters are written, stroke characteristics, and pen-stroke tendencies.
A style encoder can likewise map the reference handwritten text image from image space to feature space to extract its writing style features, yielding a writing style feature vector. The style encoder may also be a deep learning neural network, for example a fully convolutional network based on VGG19 with a global average pooling layer added after the last convolution layer, outputting a fixed-length writing style feature vector (e.g., of dimension 512). The input to the style encoder may be N samples drawn randomly from handwritten text images collected from the same writer, from which the writing style feature vector is extracted.
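The role of the global average pooling layer described above, producing a style vector of the same fixed length no matter how long the handwritten text line is, can be illustrated with a small NumPy sketch. The 512-channel feature maps and their spatial sizes below are assumptions chosen only to match the 512-dimensional vector mentioned above.

```python
import numpy as np

def global_average_pool(feature_map):
    # feature_map: (channels, H, W) output of the last conv layer.
    # GAP averages over the spatial dimensions, so inputs of any width
    # yield a style vector of the same fixed dimension.
    return feature_map.mean(axis=(1, 2))

# Hypothetical final-conv feature maps for two reference text lines
# of different widths (512 channels, per the 512-d vector above).
short_line = np.random.rand(512, 4, 20)
long_line = np.random.rand(512, 4, 77)

v1 = global_average_pool(short_line)
v2 = global_average_pool(long_line)
print(v1.shape, v2.shape)  # both (512,)

# Vectors from N samples of the same writer can simply be averaged.
style_vec = np.mean([v1, v2], axis=0)
```

This is why the style encoder can ingest variable-length text lines while still emitting a fixed-size style representation for the generator.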
After the content features and the writing style features are obtained, the target handwritten text image can be generated from them. For example, the content features and the writing style features can be fused to obtain fusion features, and the target handwritten text image generated from the fusion features; or feature migration can be performed on the content features according to the writing style features, and the target handwritten text image generated from the resulting migration features.
According to the method provided by the embodiment of the invention, the content image and the reference handwritten text image are respectively subjected to feature extraction, and then the target handwritten text image is generated according to the obtained content features and writing style features. The writing style information is effectively decoupled, so that the handwritten text images of different writers can be generated.
Based on any of the above embodiments, fig. 3 is a schematic flow chart of a target handwritten text image generation method provided by the present invention, and as shown in fig. 3, step 123 specifically includes:
step 1231, performing feature migration on the content features based on the writing style features to obtain migration features;
step 1232, generating a target handwritten text image based on the migration features.
Specifically, since the target handwritten text image desired by the user includes both the text content represented by the content feature and the writing style represented by the writing style feature, feature migration of the content feature may be considered, so that the obtained migration feature has both the content feature and the style feature, and then image reconstruction is performed according to the migration feature, so as to obtain the target handwritten text image.
Specifically, when feature migration is performed, the distribution of writing style within the content features is changed so that the writing style features are combined with the content features, yielding the migration features. For example, the writing style distribution in the content features can be changed by adjusting parameters of the style migration algorithm, achieving the effect of writing style migration: during feature migration, parameters for further feature extraction and style migration of the content features may be generated from the writing style features, and feature extraction and style migration performed using those parameters, so that the processed content features are fused with the writing style features and the migration features are obtained.
Based on any of the above embodiments, step 1231 specifically includes:
and performing feature migration on the content features based on the multilayer residual error networks to obtain migration features, wherein the normalization parameters of each layer of residual error network in the multilayer residual error networks are determined based on the writing style features.
In particular, a multi-layer residual network based on a fully convolutional structure can be used to perform feature migration on the content features. The content feature serves as the input to the first layer of the residual network, providing the text content information for generating the target handwritten text image.
During style migration, normalization may be applied before each convolution layer of the residual network is computed; the style features are embedded into each layer of the residual network through this normalization. Specifically, the style features may be used to generate the normalization parameters.
The normalization may be Batch Normalization (BN), Adaptive Instance Normalization (AdaIN), or another corresponding normalization method.
Taking batch normalization as an example, the writing style feature vector is used to generate the scale factor γ_s and the bias factor β_s of the BN layer; these parameters are learnable and are adjusted separately for each layer of the residual network. The BN-transformed feature z can be expressed by the following formula:

z = γ_s · (x − μ) / σ + β_s

where γ_s is the scale factor, β_s is the bias factor, x is the output of the residual-network layer, μ is the mean, and σ is the standard deviation.
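A minimal sketch of this style-conditioned normalization, where γ_s and β_s are generated from the writing style feature vector. The linear mappings W_gamma and W_beta below are hypothetical stand-ins for whatever learned layers actually produce the parameters, so everything except the normalization formula itself is an illustrative assumption.

```python
import numpy as np

def style_conditioned_norm(x, style_vec, W_gamma, b_gamma, W_beta, b_beta,
                           eps=1e-5):
    # Generate per-channel scale/bias from the writing style vector
    # (W_gamma, W_beta stand in for learned linear layers).
    gamma_s = W_gamma @ style_vec + b_gamma   # per-channel scale factor
    beta_s = W_beta @ style_vec + b_beta      # per-channel bias factor
    # Normalize the layer output x over its spatial dimensions,
    # then apply the style-derived affine transform.
    mu = x.mean(axis=(1, 2), keepdims=True)
    sigma = x.std(axis=(1, 2), keepdims=True)
    return gamma_s[:, None, None] * (x - mu) / (sigma + eps) \
        + beta_s[:, None, None]

rng = np.random.default_rng(1)
C, D = 8, 16                      # channels, style-vector dimension
x = rng.normal(size=(C, 6, 24))   # residual-layer output (C, H, W)
s = rng.normal(size=D)            # writing style feature vector
z = style_conditioned_norm(x, s, rng.normal(size=(C, D)), np.ones(C),
                           rng.normal(size=(C, D)), np.zeros(C))
print(z.shape)
```

Because γ_s and β_s are recomputed from each reference writer's style vector, the same residual network can re-style content features for arbitrarily many writers without retraining.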
According to the method provided by the embodiment of the invention, the writing style characteristics are embedded in each layer of the residual error network in a normalization processing mode, so that the writing style characteristic migration of the content characteristics is realized.
Based on any of the above embodiments, step 120 specifically includes:
based on the handwritten text image generation model, performing style migration on the content image by applying a reference writing style contained in the reference handwritten text image to obtain a target handwritten text image;
the handwritten text image generation model is obtained by training based on the sample content image, the sample handwritten text image and the label handwritten text image.
Specifically, the target handwritten text image may be generated by performing style migration on the content image by using a reference writing style included in the reference handwritten text image through the handwritten text image generation model.
Before step 120 is performed, an initial model may be trained on a constructed training data set to obtain the handwritten text image generation model. The training data set may include sample content images as shown on the left of FIG. 4, sample handwritten text images as shown in the middle of FIG. 4, and label handwritten text images as shown on the right of FIG. 4. Within each group of sample content image, sample handwritten text image, and label handwritten text image: the label handwritten text image and the sample handwritten text image are handwritten by the same writer, or written in the same writing style by different writers; the text content of the label handwritten text image is the same as that of the sample content image; and the text content of the sample handwritten text image is different from that of the sample content image.
The handwritten text image generation model can be obtained by performing supervised training on the initial model by taking the sample content image and the sample handwritten text image as training samples and taking the label handwritten text image as a training label. The handwritten text image generation model can learn the mapping relation between the sample content images and the sample handwritten text images to the label handwritten text images in the training process, so that the handwritten images with the same text content and the same or similar writing style are generated.
Preferably, an image discrimination model may be introduced, and the initial model and the discrimination model may undergo adversarial training based on a generative adversarial network, so as to obtain the handwritten text image generation model.
Based on any of the above embodiments, fig. 5 is a schematic flow chart of a training method for the handwritten text image generation model provided by the present invention. As shown in fig. 5, the training method includes:
step 510, based on the initial model, applying a sample writing style contained in the sample handwritten text image, and performing style migration on the sample content image to obtain a predicted handwritten text image;
step 520, training the initial model based on at least one of the difference of the predicted handwritten text image and the sample content image in the text content, the difference of the predicted handwritten text image and the label handwritten text image in the image characteristics, and the difference of the predicted handwritten text image and the sample handwritten text image in the writing style, so as to obtain a handwritten text image generation model.
Specifically, based on the initial model, a style migration may be performed on the sample content image by applying a sample writing style included in the sample handwritten text image, so as to obtain a predicted handwritten text image.
Then, in order to generate the handwritten images with the same text content and the same or similar writing style, at least one of the following three types of loss functions can be adopted in the iterative optimization process, so that the loss of the predicted handwritten text image is minimized.
In order to ensure that the text content of the predicted handwritten text image is consistent with the text content of the input content image, whether the content of the predicted handwritten text image is correct can be judged from the difference between the text content of the predicted handwritten text image and that of the sample content image. The smaller the difference, the higher the accuracy of the predicted text image content; the larger the difference, the lower the accuracy. Through continuous iterative optimization, the text content of the predicted text image is constrained to be consistent with that of the content image.
In one embodiment, a pre-trained OCR model may be employed to measure the difference in text content between the predicted handwritten text image and the sample content image. And sending the predicted handwritten text image into a pre-trained OCR model to obtain the text content of the predicted handwritten text image, and comparing the difference between the text content of the predicted handwritten text image and the text content of the sample content image to obtain the loss of the initial model on the text content so as to judge whether the text content of the predicted text image is accurate.
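As a minimal sketch of such a content loss (an assumption for illustration: the frozen OCR model is taken to emit per-step logits; the patent does not fix the exact loss form, and a CTC loss would also fit):

```python
import numpy as np

def ocr_content_loss(ocr_logits, target_ids):
    """Mean negative log-likelihood of the target character ids under
    per-step logits (T, V) produced by the frozen, pre-trained OCR model
    on the predicted handwritten text image."""
    z = ocr_logits - ocr_logits.max(axis=1, keepdims=True)  # stable softmax
    log_probs = z - np.log(np.exp(z).sum(axis=1, keepdims=True))
    return float(-log_probs[np.arange(len(target_ids)), target_ids].mean())
```

A confident OCR reading of the correct characters drives this loss towards zero, which is exactly the constraint the training seeks.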
In order to ensure that the writing style of the predicted handwritten image conforms to the real writing style of the corresponding writer, the similarity between the predicted handwritten text image and the real handwritten image can be judged by measuring the difference between the image features of the predicted handwritten text image and the label handwritten text image. The smaller the difference, the greater the similarity between the predicted text image and the real handwritten image; the larger the difference, the smaller the similarity. Through continuous iterative optimization, the writing styles of the predicted text image and the real handwritten image are constrained to become more and more similar.
In order to ensure that the writing style of the predicted handwritten text image is consistent with that of the input sample handwritten text image, whether the writing style of the predicted handwritten text image is correct can be judged by measuring the difference between the writing styles of the two images. The larger the difference, the lower the accuracy of the writing style of the predicted handwritten text image; the smaller the difference, the higher the accuracy.
Specifically, a pre-trained Vgg19 network can be used as a style sensor, and the difference of the predicted handwritten text image and the sample handwritten text image in the writing style can be measured by calculating the distance loss between the predicted image and the sample image in the middle layer features of the style sensor network.
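That style loss reduces to an L1 distance over feature lists; a minimal sketch, assuming the VGG19 intermediate-layer activations have already been extracted for both images:

```python
import numpy as np

def style_perceptron_loss(pred_feats, sample_feats):
    """Mean L1 distance between intermediate-layer features of the style
    perceptron for the predicted and the sample handwritten text images.
    Each argument is a list of feature arrays, one per chosen layer."""
    losses = [np.abs(fp - fs).mean() for fp, fs in zip(pred_feats, sample_feats)]
    return float(np.mean(losses))
```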
According to the method provided by the embodiment of the invention, the initial model is trained by adopting at least one of the three types of loss functions, and the parameters of the initial model are adjusted by continuous iterative optimization so as to improve the effect and accuracy of generating the handwritten image, thereby obtaining the final handwritten text image generation model.
Based on any of the above embodiments, fig. 6 is a schematic flowchart of performing adversarial training on the initial model provided by the present invention. As shown in fig. 6, in step 520, training the initial model based on the difference between the predicted handwritten text image and the label handwritten text image in image features includes:
step 521, based on the discrimination model, discriminating the predicted handwritten text image and the label handwritten text image respectively to obtain discrimination results of the predicted handwritten text image and the label handwritten text image;
and 522, performing adversarial training on the initial model and the discrimination model based on the discrimination results.
Specifically, when a handwritten text image generation model obtained by conventional supervised training is applied to image reconstruction, the resulting handwritten text image may suffer from distortion. In view of this, the embodiment of the invention performs adversarial training by combining the initial model with a discrimination model.
In the adversarial training, the initial model may be regarded as a generator and the discrimination model as a discriminator. The initial model, as the generator, performs style migration on the input sample content image and outputs a predicted handwritten text image.

The discrimination model, as the discriminator, judges whether an input image is a predicted handwritten text image generated by the generator or a real label handwritten text image. In this process, the generator and the discriminator play a game against each other: the generator tries to output predicted handwritten text images as similar as possible to the label handwritten text images, so that the discriminator can hardly distinguish the two, while the discriminator tries to output discrimination results consistent with the actual nature of the input image, so as to achieve a more accurate and reliable discrimination effect.
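This game can be sketched with standard non-saturating GAN objectives (an assumption for illustration — the patent does not specify the exact adversarial loss):

```python
import numpy as np

def _sigmoid(v):
    return 1.0 / (1.0 + np.exp(-v))

def discriminator_loss(d_real, d_fake):
    """Discriminator objective: score label handwritten images (d_real,
    raw logits) as real and generator outputs (d_fake) as fake."""
    return float(-(np.log(_sigmoid(d_real))
                   + np.log(1.0 - _sigmoid(d_fake))).mean())

def generator_loss(d_fake):
    """Generator objective: push the discriminator's score on predicted
    handwritten text images towards 'real'."""
    return float(-np.log(_sigmoid(d_fake)).mean())
```

The two losses pull in opposite directions, which is what drives the generator's outputs towards the distribution of the label images.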
According to the method provided by the embodiment of the invention, the adversarial training ensures the capability of the handwritten text image generation model to perform style migration on the content image, makes the writing style of the handwritten text image output by the model closer to that of a real image, and thereby ensures the naturalness and fidelity of the generated handwritten text image.
Based on any of the above embodiments, fig. 7 is a schematic structural diagram of a model for generating a handwritten text image according to an embodiment of the present invention, and as shown in fig. 7, the model includes:
content encoder and genre encoder: the content encoder is a Vgg 19-based full convolution neural network for extracting content features of textual content images, given an input content image C1Obtaining a content vector E via a content encoderc(ii) a The style encoder is also a full convolution neural network based on Vgg19, and a global average pooling layer is added after the last convolution layer, so that a fixed-length (dimension 512) feature vector is output. The input data of the style encoder is acquired from the same handwriting person to the handwriting text image, and N samples are randomly sampled to extract a handwriting style vector Es
The generator is composed of multiple layers of residual networks; style features are embedded into the BN layer of each residual convolution in the generator by means of conditional BN. Meanwhile, the generator adopts a full convolutional network structure, can accept a content vector of any length as input, and, combined with a style vector carrying the writer's style, generates a text image in that handwriting style.
The discriminator is used for measuring the difference between the distributions of the generated text images and the label images; through iterative optimization, the pictures generated by the generator are constrained to conform to the handwriting style of the corresponding target image.
The style perceptron adopts a pre-trained VGG19 network model, and measures the style difference between a generated picture and a target image by calculating the L1 distance loss between the characteristics of the generated image and the target image in the middle layer of the style perceptron network.
A pre-trained OCR model is used to judge whether the content of the generated text image is correct.
In the model training process, the network is iteratively optimized by calculating three loss functions: the discriminator loss, the style perceptron loss, and the OCR model recognition loss.
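The three losses are combined into one training objective; a minimal sketch (the weights are illustrative — the patent does not specify them):

```python
def total_loss(adv_loss, style_loss, ocr_loss,
               w_adv=1.0, w_style=1.0, w_ocr=1.0):
    """Weighted combination of the three losses driving iterative
    optimization: discriminator (adversarial), style perceptron, and
    OCR recognition."""
    return w_adv * adv_loss + w_style * style_loss + w_ocr * ocr_loss
```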
In the model application stage, a text line picture rendered by a given standard font and any handwritten picture are used as the input of the model to generate a handwritten style image of given character content information.
The following describes a handwritten text image generation apparatus according to the present invention, and the handwritten text image generation apparatus described below and the handwritten text image generation method described above may be referred to in correspondence with each other.
Based on any of the above embodiments, fig. 8 is a schematic structural diagram of a handwritten text image generation device provided in an embodiment of the present invention, and as shown in fig. 8, the device includes:
an image acquisition unit 810 for acquiring a content image and a reference handwritten text image;
a style migration unit 820, configured to perform style migration on the content image based on a reference writing style included in the reference handwritten text image, so as to obtain a target handwritten text image, where text content of the target handwritten text image is the same as that of the content image.
According to the device provided by the embodiment of the invention, the target handwritten text image obtained through style migration has the same text content as the content image and the same or a similar writing style to the reference handwritten text image. The device effectively decouples the reference writing style information of the reference handwritten text image and can generate handwritten text images of different writers; in addition, it can generate handwritten text images comprising multiple lines of text, which, compared with generating only single handwritten characters and then splicing them into words or sentences, effectively improves the generation efficiency and quality of the handwritten text images.
Based on any of the above embodiments, the style migration unit 820 is further configured to:
extracting content features of the content image to obtain content features;
performing writing style characteristic extraction on the reference handwritten text image to obtain writing style characteristics, wherein the writing style characteristics are used for representing the reference writing style;
and generating the target handwritten text image based on the content features and the writing style features.
Based on any of the above embodiments, the style migration unit 820 is further configured to:
performing feature migration on the content features based on the writing style features to obtain migration features;
and generating the target handwritten text image based on the migration characteristics.
Based on any of the above embodiments, the style migration unit 820 is further configured to:
and performing feature migration on the content features based on a plurality of layers of residual error networks to obtain migration features, wherein the normalization parameters of each layer of residual error network in the plurality of layers of residual error networks are determined based on the writing style features.
Based on any of the above embodiments, the style migration unit 820 is further configured to:
based on a handwritten text image generation model, applying a reference writing style contained in the reference handwritten text image, and performing style migration on the content image to obtain a target handwritten text image;
the model is trained based on the sample content image, the sample handwritten text image and the label handwritten text image.
Based on any of the above embodiments, the handwritten text image generation apparatus provided in the embodiments of the present invention further includes a model training unit, where the model training unit is configured to:
based on an initial model, applying a sample writing style contained in the sample handwritten text image, and carrying out style migration on the sample content image to obtain a predicted handwritten text image;
and training the initial model to obtain the handwritten text image generation model based on at least one of the difference of the predicted handwritten text image and the sample content image in the text content, the difference of the predicted handwritten text image and the label handwritten text image in the image characteristics and the difference of the predicted handwritten text image and the sample handwritten text image in the writing style.
Based on any of the above embodiments, the model training unit is further configured to:
discriminating the predicted handwritten text image and the label handwritten text image respectively based on a discrimination model to obtain discrimination results of the predicted handwritten text image and the label handwritten text image;

and performing adversarial training on the initial model and the discrimination model based on the discrimination results.
Fig. 9 illustrates a physical structure diagram of an electronic device, and as shown in fig. 9, the electronic device may include: a processor (processor)910, a communication Interface (Communications Interface)920, a memory (memory)930, and a communication bus 940, wherein the processor 910, the communication Interface 920, and the memory 930 communicate with each other via the communication bus 940. Processor 910 may invoke logic instructions in memory 930 to perform a method of handwritten text image generation, the method comprising: acquiring a content image and a reference handwritten text image; and carrying out style migration on the content image based on a reference writing style contained in the reference handwritten text image to obtain a target handwritten text image, wherein the target handwritten text image has the same text content as the content image.
Furthermore, the logic instructions in the memory 930 may be implemented in software functional units and stored in a computer readable storage medium when the logic instructions are sold or used as independent products. Based on such understanding, the technical solution of the present invention may be embodied in the form of a software product, which is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present invention. And the aforementioned storage medium includes: a U-disk, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk or an optical disk, and other various media capable of storing program codes.
In another aspect, the present invention also provides a computer program product, the computer program product comprising a computer program, the computer program being storable on a non-transitory computer-readable storage medium, the computer program, when executed by a processor, being capable of executing the handwritten text image generation method provided by the above methods, the method comprising: acquiring a content image and a reference handwritten text image; and carrying out style migration on the content image based on a reference writing style contained in the reference handwritten text image to obtain a target handwritten text image, wherein the target handwritten text image has the same text content as the content image.
In yet another aspect, the present invention also provides a non-transitory computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements a method for generating a handwritten text image provided by the above methods, the method including: acquiring a content image and a reference handwritten text image; and carrying out style migration on the content image based on a reference writing style contained in the reference handwritten text image to obtain a target handwritten text image, wherein the target handwritten text image has the same text content as the content image.
The above-described embodiments of the apparatus are merely illustrative, and the units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the present embodiment. One of ordinary skill in the art can understand and implement it without inventive effort.
Through the above description of the embodiments, those skilled in the art will clearly understand that each embodiment can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware. With this understanding in mind, the above-described technical solutions may be embodied in the form of a software product, which can be stored in a computer-readable storage medium such as ROM/RAM, magnetic disk, optical disk, etc., and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) to execute the methods described in the embodiments or some parts of the embodiments.
Finally, it should be noted that: the above examples are only intended to illustrate the technical solution of the present invention, but not to limit it; although the present invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions of the embodiments of the present invention.

Claims (10)

1. A method of generating a handwritten text image, comprising:
acquiring a content image and a reference handwritten text image;
and carrying out style migration on the content image based on a reference writing style contained in the reference handwritten text image to obtain a target handwritten text image, wherein the target handwritten text image has the same text content as the content image.
2. The method according to claim 1, wherein performing style migration on the content image based on a reference writing style included in the reference handwritten text image to obtain a target handwritten text image comprises:
extracting content features of the content image to obtain content features;
performing writing style characteristic extraction on the reference handwritten text image to obtain writing style characteristics, wherein the writing style characteristics are used for representing the reference writing style;
and generating the target handwritten text image based on the content features and the writing style features.
3. The method of generating a handwritten text image according to claim 2, wherein said generating said target handwritten text image based on said content features and said writing style features comprises:
performing feature migration on the content features based on the writing style features to obtain migration features;
and generating the target handwritten text image based on the migration characteristics.
4. The method of generating a handwritten text image according to claim 3, wherein said performing feature migration on said content features based on said writing style features to obtain migration features includes:
and performing feature migration on the content features based on a plurality of layers of residual error networks to obtain migration features, wherein the normalization parameters of each layer of residual error network in the plurality of layers of residual error networks are determined based on the writing style features.
5. The method for generating a handwritten text image according to any of claims 1-4, wherein performing style migration on the content image based on a reference writing style contained in the reference handwritten text image to obtain a target handwritten text image includes:
based on a handwritten text image generation model, applying a reference writing style contained in the reference handwritten text image, and performing style migration on the content image to obtain a target handwritten text image;
the handwritten text image generation model is obtained by training based on a sample content image, a sample handwritten text image and a label handwritten text image.
6. The method of generating a handwritten text image according to claim 5, wherein said model for generating a handwritten text image is trained based on the following steps:
based on an initial model, applying a sample writing style contained in the sample handwritten text image, and carrying out style migration on the sample content image to obtain a predicted handwritten text image;
and training the initial model to obtain the handwritten text image generation model based on at least one of the difference of the predicted handwritten text image and the sample content image in the text content, the difference of the predicted handwritten text image and the label handwritten text image in the image characteristics and the difference of the predicted handwritten text image and the sample handwritten text image in the writing style.
7. The method of generating a handwritten text image according to claim 6, wherein said training of said initial model based on differences in image characteristics between said predicted handwritten text image and said tagged handwritten text image comprises:
discriminating the predicted handwritten text image and the label handwritten text image respectively based on a discrimination model to obtain discrimination results of the predicted handwritten text image and the label handwritten text image;

and performing adversarial training on the initial model and the discrimination model based on the discrimination results.
8. A handwritten text image generating apparatus, characterized by comprising:
an image acquisition unit configured to acquire a content image and a reference handwritten text image;
and the style migration unit is used for carrying out style migration on the content image based on a reference writing style contained in the reference handwritten text image to obtain a target handwritten text image, and the text content of the target handwritten text image is the same as that of the content image.
9. An electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, characterized in that the steps of the handwritten text image generation method according to any of claims 1 to 7 are implemented by the processor when executing the program.
10. A non-transitory computer readable storage medium, having stored thereon a computer program, wherein the computer program, when being executed by a processor, is adapted to carry out the steps of the handwritten text image generation method according to any of claims 1 to 7.
CN202111571764.2A 2021-12-21 2021-12-21 Handwritten text image generation method and device, electronic equipment and storage medium Pending CN114255159A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111571764.2A CN114255159A (en) 2021-12-21 2021-12-21 Handwritten text image generation method and device, electronic equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111571764.2A CN114255159A (en) 2021-12-21 2021-12-21 Handwritten text image generation method and device, electronic equipment and storage medium

Publications (1)

Publication Number Publication Date
CN114255159A true CN114255159A (en) 2022-03-29

Family

ID=80796306

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111571764.2A Pending CN114255159A (en) 2021-12-21 2021-12-21 Handwritten text image generation method and device, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN114255159A (en)

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114495118A (en) * 2022-04-15 2022-05-13 华南理工大学 Personalized handwritten character generation method based on countermeasure decoupling
CN114973279A (en) * 2022-06-17 2022-08-30 北京百度网讯科技有限公司 Training method and device for handwritten text image generation model and storage medium
CN114973279B (en) * 2022-06-17 2023-02-17 北京百度网讯科技有限公司 Training method and device for handwritten text image generation model and storage medium
CN116721306A (en) * 2023-05-24 2023-09-08 北京思想天下教育科技有限公司 Online learning content recommendation system based on big data cloud platform
CN116721306B (en) * 2023-05-24 2024-02-02 北京思想天下教育科技有限公司 Online learning content recommendation system based on big data cloud platform
CN116416628A (en) * 2023-06-06 2023-07-11 广州宏途数字科技有限公司 Handwriting font recognition based method and recognition system
CN116704221A (en) * 2023-08-09 2023-09-05 腾讯科技(深圳)有限公司 Image processing method, apparatus, device and computer readable storage medium
CN116704221B (en) * 2023-08-09 2023-10-24 腾讯科技(深圳)有限公司 Image processing method, apparatus, device and computer readable storage medium

Similar Documents

Publication Publication Date Title
CN108984530B (en) Detection method and detection system for network sensitive content
CN110750959B (en) Text information processing method, model training method and related device
CN114255159A (en) Handwritten text image generation method and device, electronic equipment and storage medium
CN108304876B (en) Classification model training method and device and classification method and device
CN111488931B (en) Article quality evaluation method, article recommendation method and corresponding devices
CN111985239A (en) Entity identification method and device, electronic equipment and storage medium
CN110114776A (en) Use the system and method for the character recognition of full convolutional neural networks
CN110968725B (en) Image content description information generation method, electronic device and storage medium
CN111914825B (en) Character recognition method and device and electronic equipment
CN112329476A (en) Text error correction method and device, equipment and storage medium
CN111897954A (en) User comment aspect mining system, method and storage medium
CN114528827A (en) Text-oriented confrontation sample generation method, system, equipment and terminal
CN115130613B (en) False news identification model construction method, false news identification method and device
JP7155625B2 (en) Inspection device, inspection method, program and learning device
CN111523622A (en) Method for simulating handwriting by mechanical arm based on characteristic image self-learning
CN113657115A (en) Multi-modal Mongolian emotion analysis method based on ironic recognition and fine-grained feature fusion
CN114861601B (en) Event joint extraction method based on rotary coding and storage medium
CN112613293A (en) Abstract generation method and device, electronic equipment and storage medium
Addis et al. Printed ethiopic script recognition by using lstm networks
Inunganbi et al. Handwritten Meitei Mayek recognition using three‐channel convolution neural network of gradients and gray
CN112686263B (en) Character recognition method, character recognition device, electronic equipment and storage medium
US20230215066A1 (en) Generating visual feedback
CN116704508A (en) Information processing method and device
CN116579348A (en) False news detection method and system based on uncertain semantic fusion
CN116311322A (en) Document layout element detection method, device, storage medium and equipment

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
TA01 Transfer of patent application right

Effective date of registration: 20230419

Address after: 230026 No. 96, Jinzhai Road, Hefei, Anhui

Applicant after: University of Science and Technology of China

Applicant after: IFLYTEK Co.,Ltd.

Address before: 230088 666 Wangjiang West Road, Hefei hi tech Development Zone, Anhui

Applicant before: IFLYTEK Co.,Ltd.

TA01 Transfer of patent application right