CN109285111B - Font conversion method, device, equipment and computer readable storage medium - Google Patents

Font conversion method, device, equipment and computer readable storage medium

Info

Publication number
CN109285111B
Authority
CN
China
Prior art keywords
font
target
training
picture
text
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201811101699.5A
Other languages
Chinese (zh)
Other versions
CN109285111A (en)
Inventor
刘怡俊
杨培超
叶武剑
翁韶伟
张子文
李学易
王峰
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guangdong University of Technology
Original Assignee
Guangdong University of Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guangdong University of Technology filed Critical Guangdong University of Technology
Priority to CN201811101699.5A priority Critical patent/CN109285111B/en
Publication of CN109285111A publication Critical patent/CN109285111A/en
Application granted granted Critical
Publication of CN109285111B publication Critical patent/CN109285111B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G06T3/04
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02DCLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00Energy efficient computing, e.g. low power processors, power management or thermal management

Abstract

The invention discloses a font conversion method in which a deep learning model is trained in advance from a standard font picture set and target font picture sets, and in the training process the target characters of each target font are given a style embedding block uniquely corresponding to that font. That is, each target character has a unique style embedding block in addition to the character embedding blocks shared by all fonts, so that when the trained deep learning model performs font conversion, different target fonts can be output based on the different style embedding blocks. The trained model then converts standard font pictures into target font text pictures, so that text pictures in multiple target fonts can be obtained. The method improves the efficiency of training models for multiple fonts and the efficiency with which the deep learning model converts to new fonts. The invention also provides a font conversion apparatus, device and computer readable storage medium having the same beneficial effects.

Description

Font conversion method, device, equipment and computer readable storage medium
Technical Field
The present invention relates to the field of image recognition and processing, and in particular, to a method, apparatus, device, and computer readable storage medium for font conversion.
Background
As the variety of computer fonts grows, handwritten fonts are increasingly being added to font libraries, and the fonts of certain celebrities are sought after by many users. Each time a new font is added to a font library, a conversion model must be trained from pictures of that font. In the prior art, a deep learning model is trained to learn each new font before the font is added to the library. In practice, however, generating fonts of only one style is often insufficient; training a model takes a long time, and each additional style of font requires rebuilding the training set and retraining, which costs still more time.
Therefore, how to improve the training conversion efficiency of generating new fonts is a technical problem that needs to be solved by those skilled in the art.
Disclosure of Invention
The invention aims to provide a font conversion method, apparatus, device and computer readable storage medium for improving the efficiency of training conversion models that generate new fonts.
In order to solve the above technical problems, the present invention provides a method for converting fonts, including:
training a deep learning model in advance according to a standard font picture set and a plurality of target font picture sets, and respectively embedding, in the training process, a style embedding block uniquely corresponding to each target font for the target characters of that target font; wherein the target characters of each target font comprise the same character embedding blocks;
obtaining a standard font text picture;
and inputting the standard font text picture into the deep learning model to obtain the target font text picture.
Optionally, the deep learning model is specifically a cGAN network model.
Optionally, the training according to the standard font picture set and the target font picture set to obtain the deep learning model, with style embedding blocks uniquely corresponding to each target font respectively embedded for the target characters of each target font in the training process, specifically includes:
processing the standard font picture set and the target font picture set to obtain labeled binary training data (paired standard-font and target-font pictures);
determining a generator, a discriminator, a first encoder for embedding the style embedding blocks and a second encoder for calculating a loss value, and constructing an original cGAN network model;
and training the original cGAN network model with the binary training data, adjusting parameters according to the conversion results and the loss value until the loss value reaches a first preset condition, thereby determining the cGAN network model.
Optionally, the obtaining the standard font text picture specifically includes:
training in advance to obtain a convolutional neural network model for identifying initial characters;
and receiving an input initial text picture, and inputting the initial text picture into the convolutional neural network model to obtain the standard font text picture.
Optionally, the pre-training is performed to obtain a convolutional neural network model for identifying the initial text, which specifically includes:
determining a common Chinese character picture set and a number corresponding to each Chinese character, and generating a data set for training;
determining an input layer, a convolution layer, a downsampling layer and an output layer of the convolution neural network;
and training the convolutional neural network through the data set until the training parameters reach a second preset condition, and determining the convolutional neural network model.
Optionally, the inputting the initial text picture into the convolutional neural network model to obtain the standard font text picture specifically includes:
inputting the initial text picture into the convolutional neural network model to obtain a text number;
and acquiring and outputting the standard font text picture corresponding to the text number according to a preset correspondence.
In order to solve the above technical problem, the present invention further provides a device for converting fonts, including:
the first training unit is used for training a deep learning model in advance according to a standard font picture set and a target font picture set, and for embedding, in the training process, style embedding blocks uniquely corresponding to each target font for the target characters of each target font; wherein the target characters of each target font comprise the same character embedding blocks;
the first conversion unit is used for obtaining a standard font text picture and inputting it into the deep learning model to obtain a target font text picture.
Optionally, the method further comprises:
the second training unit is used for training in advance to obtain a convolutional neural network model for identifying the initial characters;
the second conversion unit is used for receiving the input initial text picture, and inputting the initial text picture into the convolutional neural network model to obtain the standard font text picture.
In order to solve the above technical problem, the present invention further provides a font conversion device, including:
a memory for storing instructions comprising the steps of the method of font conversion of any of the above;
and the processor is used for executing the instructions.
To solve the above technical problem, the present invention also provides a computer readable storage medium having stored thereon a computer program which, when executed by a processor, implements the steps of the method for font conversion as described in any of the above.
According to the font conversion method provided by the invention, a deep learning model is trained in advance from a standard font picture set and target font picture sets, and in the training process the target characters of each target font are given a style embedding block uniquely corresponding to that font. That is, besides the character embedding blocks shared by all fonts, each target character has its own unique style embedding block, so that during training the deep learning model maps the same character to the same vector through the shared character embedding blocks, while different target fonts can be output based on the different style embedding blocks. The trained deep learning model then converts standard font pictures into target font text pictures, so that text pictures in multiple target fonts can be obtained. With the method provided by the invention, a deep learning model capable of converting fonts of multiple styles is obtained in a single model training, improving the efficiency of training models for multiple fonts; and standard font text pictures are converted into target font text pictures of multiple fonts at one time by the deep learning model, improving the efficiency with which the model converts to new fonts. The invention also provides a font conversion apparatus, device and computer readable storage medium having the above beneficial effects, which are not repeated here.
Drawings
For a clearer description of the embodiments of the invention or of the prior art, the drawings used in describing the embodiments are briefly introduced below. It is apparent that the drawings described below show only some embodiments of the invention, and that a person skilled in the art can derive other drawings from them without inventive effort.
FIG. 1 is a flow chart of a method for converting fonts according to an embodiment of the present invention;
FIG. 2 is a flowchart of a specific implementation of step S10 according to an embodiment of the present invention;
FIG. 3 is a flowchart of another font conversion method according to an embodiment of the present invention;
FIG. 4 is a flowchart of a specific implementation of step S30 according to an embodiment of the present invention;
FIG. 5 is a schematic diagram of a convolutional neural network model provided in an embodiment of the present invention;
FIG. 6 is a flowchart of a specific implementation of step S32 according to an embodiment of the present invention;
FIG. 7 is a schematic diagram of a device for font conversion according to an embodiment of the present invention;
FIG. 8 is a schematic diagram of another apparatus for font conversion according to an embodiment of the present invention;
FIG. 9 is a schematic structural diagram of a font conversion device according to an embodiment of the present invention.
Detailed Description
The core of the invention is to provide a method, a device, equipment and a computer readable storage medium for converting fonts, which are used for improving training conversion efficiency of generating new fonts.
The technical solutions in the embodiments of the present invention are described below clearly and completely with reference to the accompanying drawings. The described embodiments are obviously only some, not all, of the embodiments of the invention. All other embodiments obtained by a person skilled in the art based on these embodiments without inventive effort fall within the scope of the invention.
Fig. 1 is a flowchart of a font conversion method according to an embodiment of the present invention. Fig. 2 is a flowchart of a specific implementation of step S10 according to an embodiment of the present invention.
As shown in fig. 1, the font conversion method includes:
s10: training according to a standard font picture set and a plurality of target font picture sets in advance to obtain a deep learning model, and embedding style embedded blocks uniquely corresponding to each target font for target characters of each target font in the training process.
Wherein the target characters of each target font comprise the same character embedded blocks.
In specific implementations, the deep learning model may be a cGAN network model. Experiments show that the cGAN network model outperforms other models at font learning and conversion. A cGAN (conditional GAN) learns a generative model G that maps an observed image x and a random noise vector z to an output y, i.e., {x, z} → y. Here G(x) denotes the model during training and y the target. The generator G is trained to produce outputs that cannot be distinguished from "real" images by the adversarially trained discriminator D, while D is trained to detect as many of the generator's "fake" images as possible. In other words, the goal of G is to generate images that D cannot tell apart from real ones, and the goal of D is to judge whether a received picture is real; this game is the network's learning process, and when the two sides reach equilibrium, G can generate high-quality pictures.
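For illustration, the adversarial game between G and D can be written as a single training step. The following is a minimal PyTorch sketch, not the patent's exact implementation: G and D are assumed placeholder networks (D returning raw logits), and the λ-weighted L1 term anticipates formula (1) below.

```python
# Minimal cGAN training step (illustrative sketch; G, D and optimizers are placeholders).
import torch
import torch.nn as nn

bce = nn.BCEWithLogitsLoss()  # D is assumed to output raw logits
l1 = nn.L1Loss()

def train_step(G, D, x, y, z, opt_G, opt_D, lam=100.0):
    # Discriminator step: push real pairs toward 1, generated pairs toward 0.
    fake = G(x, z)
    d_real, d_fake = D(x, y), D(x, fake.detach())
    loss_D = bce(d_real, torch.ones_like(d_real)) + \
             bce(d_fake, torch.zeros_like(d_fake))
    opt_D.zero_grad(); loss_D.backward(); opt_D.step()

    # Generator step: fool D, plus L1 reconstruction toward the target picture.
    d_fake = D(x, fake)
    loss_G = bce(d_fake, torch.ones_like(d_fake)) + lam * l1(fake, y)
    opt_G.zero_grad(); loss_G.backward(); opt_G.step()
    return loss_D.item(), loss_G.item()
```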
During training, the target characters of each target font are embedded with a style embedding block uniquely corresponding to that font. Because the character embedding blocks of the target characters are identical across all target fonts, the deep learning model maps identical characters to the same vector during both training and conversion, and then outputs different target characters according to the respective style embedding blocks.
As shown in fig. 2, step S10 may specifically include:
s20: and processing the standard font picture set and the target font picture set to obtain tagged binary training data.
The standard fonts can be regular script, and the standard font picture set can be prepared in advance by itself.
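As an illustration of this step, the two picture sets can be combined into labeled pairs. The sketch below assumes per-font directories of pictures named after each character; the layout and helper names are assumptions, not taken from the patent.

```python
# Illustrative sketch: build labeled (source, target, font_id) training pairs.
from pathlib import Path
from PIL import Image
import numpy as np

def build_pairs(std_dir, target_dirs):
    """std_dir holds standard-font pictures; target_dirs maps
    font_id -> directory of target-font pictures with the same file names."""
    pairs = []
    for font_id, tdir in target_dirs.items():
        for src in Path(std_dir).glob("*.png"):
            tgt = Path(tdir) / src.name
            if not tgt.exists():
                continue
            x = np.asarray(Image.open(src).convert("L"), np.float32) / 255.0
            y = np.asarray(Image.open(tgt).convert("L"), np.float32) / 255.0
            pairs.append((x, y, font_id))  # font_id selects the style embedding
    return pairs
```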
S21: determine a generator, a discriminator, a first encoder for embedding the style embedding blocks and a second encoder for calculating loss values, and construct an original cGAN network model.
Specifically, a cGAN network model is built with a U-Net generator and a convolutional-neural-network discriminator, and a first encoder that embeds the various style categories and a second encoder used for calculating losses are added at the same time.
The first encoder (class encoder) uses a convolutional neural network model to extract and encode style features from random noise. Because ordinary U-Net and GAN models cannot handle the uncertainty in the one-to-many relationship of converting one font into multiple fonts, the embodiment of the invention introduces into the generator a first encoder that embeds multiple style classes, and concatenates a non-trainable Gaussian noise vector, serving as the style embedding block, with the character embedding block before they pass through the decoder. In this way the encoder can still map the same character to the same vector, while the decoder uses both the character embedding and the style embedding to generate the target character.
The second encoder (auxiliary encoder) is similar in principle to the first encoder; it focuses on extracting features from the pictures produced by the generator and subsequently calculating the loss of the generated pictures, and is also used to constrain the cGAN learning process.
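A minimal sketch of the bottleneck concatenation described above, assuming a U-Net-style encoder/decoder pair and a fixed Gaussian style table (module interfaces and dimensions are illustrative assumptions):

```python
# Illustrative generator with a non-trainable Gaussian style embedding per font.
import torch
import torch.nn as nn

class StyleUNetGenerator(nn.Module):
    def __init__(self, encoder, decoder, num_fonts, style_dim=128):
        super().__init__()
        self.encoder = encoder  # assumed: image -> (character embedding, skip features)
        self.decoder = decoder  # assumed: (embedding, skips) -> target-font image
        # Non-trainable Gaussian noise serving as the style embedding blocks.
        self.register_buffer("style_table", torch.randn(num_fonts, style_dim))

    def forward(self, x, font_id):
        char_emb, skips = self.encoder(x)            # same character -> same vector
        style_emb = self.style_table[font_id]        # unique per target font
        z = torch.cat([char_emb, style_emb], dim=1)  # concatenate before the decoder
        return self.decoder(z, skips)
```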
The loss value here is calculated with an L1 regularization function:

L_{L1}(G) = E_{x,y,z}[ ||y − G(x,z)||_1 ]    (1)

Finally, the loss optimization function of the whole cGAN network model is:

G* = arg min_G max_D L_{cGAN}(G, D) + λ·L_{L1}(G)    (2)

wherein x represents the source picture, y represents the target font picture, z represents the Gaussian noise, G represents the function of the generator, D represents the function of the discriminator, L_{cGAN} is the loss function of the cGAN, E_{x,y,z} is the expectation, G* is the target optimization function, and λ is the scale factor.
S22: train the original cGAN network model using the binary training data, and adjust parameters according to the conversion results and the loss value until the loss value reaches a first preset condition, thereby determining the cGAN network model.
The cGAN network model is trained repeatedly using the binary training data obtained in step S20; the quality of the converted text pictures is observed, the loss of the generated characters relative to the corresponding characters of the target font pictures is calculated according to formulas (1) and (2), and the network parameters are repeatedly tuned, tested and optimized. When the value of the loss function meets expectations, the real pictures are compared with the generated pictures and the final cGAN network model for font conversion is determined.
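A loop skeleton for this step, reusing the train_step sketch above; the threshold standing in for the "first preset condition" and the epoch cap are illustrative assumptions.

```python
# Sketch of the S22 loop: tune until the generator loss meets the preset condition.
def train_until_converged(G, D, batches, opt_G, opt_D,
                          loss_threshold=0.05, max_epochs=200):
    for epoch in range(max_epochs):
        g_losses = []
        for x, y, z in batches:                       # binary training data batches
            _, loss_g = train_step(G, D, x, y, z, opt_G, opt_D)
            g_losses.append(loss_g)
        if sum(g_losses) / len(g_losses) < loss_threshold:
            break                                     # loss meets expectations
    return G
```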
S11: obtain the standard font text picture.
The standard font text picture transmitted by another unit is received.
S12: input the standard font text picture into the deep learning model to obtain the target font text picture.
Through the trained deep learning model, characters in multiple fonts are obtained in a single conversion.
Optionally, outputting the target font text picture.
In specific implementation, the target font text picture is displayed on a human-computer interaction interface or sent to a designated location.
The method may further include: receiving the fonts that the user has selected for output, and selecting the output fonts accordingly through the first encoder.
According to the font conversion method provided by the embodiment of the invention, a deep learning model is trained in advance from a standard font picture set and target font picture sets, and in the training process the target characters of each target font are given a style embedding block uniquely corresponding to that font. That is, besides the shared character embedding blocks, each target character has its own unique style embedding block, so that during training the deep learning model maps the same character to the same vector through the shared character embedding blocks, while different target fonts can be output based on the different style embedding blocks. The trained deep learning model then converts standard font pictures into target font text pictures, so that text pictures in multiple target fonts can be obtained. With the method provided by the embodiment of the invention, a deep learning model capable of converting fonts of multiple styles is obtained in a single model training, improving the efficiency of training models for multiple fonts; and standard font text pictures are converted into target font text pictures of multiple fonts at one time by the deep learning model, improving the efficiency with which the model converts to new fonts.
Fig. 3 is a flowchart of another font conversion method according to an embodiment of the present invention. Fig. 4 is a flowchart of a specific implementation of step S30 according to an embodiment of the present invention. Fig. 5 is a schematic diagram of a convolutional neural network model according to an embodiment of the present invention. Fig. 6 is a flowchart of a specific implementation of step S32 according to an embodiment of the present invention.
As shown in fig. 3, based on the above embodiment, in another embodiment, step S11 of the font conversion method specifically includes:
s30: and training in advance to obtain a convolutional neural network model for identifying the initial characters.
The step S30 and the step S10 are not sequentially related.
To further facilitate use, the user may be provided with a way to input text by handwriting and then convert it into another font. This requires first recognizing the handwritten input, i.e., processing and recognizing the picture of the initial text entered by the user. In the traditional field of text image processing, preprocessing such as text localization and character segmentation of Chinese characters is required; character segmentation algorithms mainly include projection and connected-domain analysis. Mainstream Chinese character recognition systems generally extract either structural features or statistical features of Chinese characters. However, because individuals' writing styles and writing environments differ, it is almost impossible to accurately extract the structural features of Chinese characters across varied handwriting. Therefore, Chinese characters are recognized here by classification based on statistical features; the statistical features of Chinese characters in common use are: elastic grid features, directional line-element features, Gabor features and moment features.
In order to improve the accuracy of the recognition of the initial text, as shown in fig. 4, step S30 may specifically include:
s40: and determining a common Chinese character picture set and a number corresponding to each Chinese character, and generating a data set for training.
In a specific implementation, the CASIA-HWDB1.1 handwriting picture set of common Chinese characters and a serial-number label for each common Chinese character can be used to build the data set for training, which is divided into training and test sets at a ratio of 10:1.
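A sketch of the 10:1 split (the shuffle seed is an arbitrary choice):

```python
# Split the labeled sample list into training and test sets at a 10:1 ratio.
import random

def split_dataset(samples, ratio=10, seed=42):
    random.Random(seed).shuffle(samples)
    cut = len(samples) * ratio // (ratio + 1)
    return samples[:cut], samples[cut:]
```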
S41: determine the input layer, convolutional layers, downsampling layers and output layer of the convolutional neural network.
As shown in fig. 5, the convolutional neural network model may include an input layer (input), five convolutional layers (conv1–conv5), four downsampling layers (pool1–pool4) and a fully-connected output layer (output), where the fully-connected layer uses the ReLU activation function and the convolution kernels are all of size 3 × 3.
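The architecture of fig. 5 can be sketched as follows; the channel counts, the 64 × 64 grayscale input size and the class count are illustrative assumptions not given in the patent, and the Dropout layer anticipates the training procedure of step S42 below.

```python
# Sketch of the fig. 5 recognizer: 5 conv layers, 4 pooling layers, FC output.
import torch.nn as nn

def conv_block(cin, cout, pool=True):
    layers = [nn.Conv2d(cin, cout, kernel_size=3, padding=1), nn.ReLU()]
    if pool:
        layers.append(nn.MaxPool2d(2))  # downsampling layer
    return layers

model = nn.Sequential(
    *conv_block(1, 32),                 # conv1 + pool1
    *conv_block(32, 64),                # conv2 + pool2
    *conv_block(64, 128),               # conv3 + pool3
    *conv_block(128, 256),              # conv4 + pool4
    *conv_block(256, 256, pool=False),  # conv5 (no pooling)
    nn.Flatten(),
    nn.Dropout(0.5),                    # dropout during training, per step S42
    nn.Linear(256 * 4 * 4, 3755),       # one output per common Chinese character
)
```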
S42: train the convolutional neural network on the data set until the training parameters reach a second preset condition, and determine the convolutional neural network model.
Specifically, the handwriting font data set can be trained using Dropout (i.e., during training of the deep learning network, neural network units are temporarily dropped from the network with a certain probability). The convolutional neural network model is trained repeatedly; when the number of training iterations reaches a preset threshold or the test accuracy reaches a target value, the training parameters have reached the second preset condition, and the model at that point is taken as the convolutional neural network model for final application.
S31: receive the input initial text picture.
Specifically, the initial text picture handwritten by the user can be received.
S32: input the initial text picture into the convolutional neural network model to obtain the standard font text picture.
In specific implementation, as shown in fig. 6, step S32 may include:
s60: and inputting the initial text picture into a convolutional neural network model to obtain a text number.
S61: and acquiring and outputting the standard font character picture corresponding to the character number according to the preset corresponding relation.
The trained convolutional neural network model outputs the serial number of each character. Since the input and output of the cGAN network model are the same kind of data (pictures), the character serial numbers can be placed in one-to-one correspondence with UNICODE codes, and a functional block is added that generates the corresponding standard-font character picture from the UNICODE code.
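The added functional block can be sketched as below; the font file path, picture size and mapping-table name are illustrative assumptions.

```python
# Sketch: character serial number -> UNICODE code point -> standard-font picture.
from PIL import Image, ImageDraw, ImageFont

def render_standard_char(serial_no, serial_to_unicode, size=64,
                         font_path="simkai.ttf"):
    ch = chr(serial_to_unicode[serial_no])         # look up the UNICODE code point
    img = Image.new("L", (size, size), color=255)  # white grayscale canvas
    font = ImageFont.truetype(font_path, int(size * 0.9))
    ImageDraw.Draw(img).text((size * 0.05, 0), ch, fill=0, font=font)
    return img
```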
The font conversion method provided by this embodiment of the invention thus offers a way to convert one handwritten font into another, expanding the functionality of font conversion.
Based on the above embodiments of the font conversion method, the invention further discloses a corresponding font conversion apparatus.
Fig. 7 is a schematic diagram of a device for font conversion according to an embodiment of the present invention. As shown in fig. 7, the apparatus for font conversion provided in the embodiment of the present invention includes:
the first training unit 701 is configured to train in advance a deep learning model according to a standard font picture set and a target font picture set, and to embed, in the training process, style embedding blocks uniquely corresponding to each target font for the target characters of each target font; wherein the target characters of each target font comprise the same character embedding blocks;
the first conversion unit 702 is configured to obtain a standard font text picture, and input the standard font text picture into the deep learning model to obtain a target font text picture.
Fig. 8 is a schematic diagram of another apparatus for font conversion according to an embodiment of the present invention. As shown in fig. 8, on the basis of the above embodiment, in another embodiment, the apparatus for converting a font further includes:
a second training unit 801, configured to perform training in advance to obtain a convolutional neural network model for identifying an initial text;
the second conversion unit 802 is configured to receive an input initial text image, and input the initial text image into the convolutional neural network model to obtain a standard font text image.
Since the embodiments of the apparatus portion and the embodiments of the method portion correspond to each other, the embodiments of the apparatus portion are referred to the description of the embodiments of the method portion, and are not repeated herein.
Fig. 9 is a schematic structural diagram of a font conversion device according to an embodiment of the present invention. As shown in fig. 9, the font conversion device may vary considerably in configuration or performance and may include one or more processors (central processing units, CPU) 910 (e.g., one or more processors), memory 920, and one or more storage media 930 (e.g., one or more mass storage devices) storing applications 933 or data 932. The memory 920 and the storage medium 930 may be transitory or persistent storage. The program stored on the storage medium 930 may include one or more modules (not shown), each of which may include a series of instruction operations for the device. Further, the processor 910 may be configured to communicate with the storage medium 930 and execute the series of instruction operations in the storage medium 930 on the font conversion device 900.
The font conversion device 900 may also include one or more power supplies 940, one or more wired or wireless network interfaces 950, one or more input/output interfaces 990, and/or one or more operating systems 931, such as Windows Server™, Mac OS X™, Unix™, Linux™, FreeBSD™, etc.
The steps in the font conversion method described above with reference to fig. 1 to 6 are implemented by the font conversion apparatus based on the structure shown in fig. 9.
It will be clearly understood by those skilled in the art that, for convenience and brevity of description, specific working procedures of the font conversion apparatus and the computer readable storage medium described above may refer to corresponding procedures in the foregoing method embodiments, and are not repeated herein.
In the several embodiments provided in the present application, it should be understood that the disclosed method, apparatus, device, and computer readable storage medium may be implemented in other manners. For example, the apparatus embodiments described above are merely illustrative, e.g., the division of modules is merely a logical function division, and there may be additional divisions of actual implementation, e.g., multiple modules or components may be combined or integrated into another system, or some features may be omitted, or not performed. Alternatively, the coupling or direct coupling or communication connection shown or discussed with each other may be an indirect coupling or communication connection via some interfaces, devices or modules, which may be in electrical, mechanical, or other forms. The modules illustrated as separate components may or may not be physically separate, and components shown as modules may or may not be physical modules, i.e., may be located in one place, or may be distributed over a plurality of network modules. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, each functional module in each embodiment of the present application may be integrated into one processing module, or each module may exist alone physically, or two or more modules may be integrated into one module. The integrated modules may be implemented in hardware or in software functional modules.
The integrated modules, if implemented in the form of software functional modules and sold or used as a stand-alone product, may be stored in a computer readable storage medium. Based on such understanding, the technical solution of the present application may be embodied essentially or in a part contributing to the prior art or in whole or in part in the form of a software product stored in a storage medium, including several instructions for causing a computer device (which may be a personal computer, a function calling device, or a network device, etc.) to perform all or part of the steps of the methods of the embodiments of the present application. And the aforementioned storage medium includes: a U-disk, a removable hard disk, a Read-Only Memory (ROM), a random access Memory (Random Access Memory, RAM), a magnetic disk, or an optical disk, or other various media capable of storing program codes.
The method, apparatus, device and computer readable storage medium for font conversion provided by the present invention are described in detail above. In the description, each embodiment is described in a progressive manner, and each embodiment is mainly described by the differences from other embodiments, so that the same similar parts among the embodiments are mutually referred. For the device disclosed in the embodiment, since it corresponds to the method disclosed in the embodiment, the description is relatively simple, and the relevant points refer to the description of the method section. It should be noted that it will be apparent to those skilled in the art that various modifications and adaptations of the invention can be made without departing from the principles of the invention and these modifications and adaptations are intended to be within the scope of the invention as defined in the following claims.
It should also be noted that in this specification, relational terms such as first and second, and the like are used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Moreover, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising one … …" does not exclude the presence of other like elements in a process, method, article, or apparatus that comprises the element.

Claims (8)

1. A method of font conversion, comprising:
training a deep learning model in advance according to a standard font picture set and a plurality of target font picture sets, and respectively embedding, in the training process, a style embedding block uniquely corresponding to each target font for the target characters of that target font; wherein the target characters of each target font comprise the same character embedding blocks;
obtaining a standard font text picture;
inputting the standard font text picture into the deep learning model to obtain a target font text picture;
the deep learning model is specifically a cGAN network model;
the training according to a standard font picture set and a target font picture set to obtain a deep learning model, with style embedding blocks uniquely corresponding to each target font respectively embedded for the target characters of each target font in the training process, specifically comprises:
processing the standard font picture set and the target font picture set to obtain labeled binary training data;
determining a generator, a discriminator, a first encoder for embedding the style embedding blocks and a second encoder for calculating a loss value, and constructing an original cGAN network model;
and training the original cGAN network model with the binary training data, and adjusting parameters according to the conversion results and the loss value until the loss value reaches a first preset condition, thereby determining the cGAN network model.
2. The method according to claim 1, wherein the obtaining standard font text pictures specifically comprises:
training in advance to obtain a convolutional neural network model for identifying initial characters;
and receiving an input initial text picture, and inputting the initial text picture into the convolutional neural network model to obtain the standard font text picture.
3. The method according to claim 2, wherein the training in advance to obtain a convolutional neural network model for identifying initial text specifically comprises:
determining a common Chinese character picture set and a number corresponding to each Chinese character, and generating a data set for training;
determining an input layer, a convolution layer, a downsampling layer and an output layer of the convolution neural network;
and training the convolutional neural network through the data set until the training parameters reach a second preset condition, and determining the convolutional neural network model.
4. The method according to claim 2, wherein said inputting the initial text picture into the convolutional neural network model to obtain the standard font text picture comprises:
inputting the initial text picture into the convolutional neural network model to obtain a text number;
and acquiring and outputting the standard font text picture corresponding to the text number according to a preset correspondence.
5. An apparatus for font conversion, comprising:
the first training unit is used for training a deep learning model in advance according to a standard font picture set and a target font picture set, and for embedding, in the training process, style embedding blocks uniquely corresponding to each target font for the target characters of each target font; wherein the target characters of each target font comprise the same character embedding blocks;
the first conversion unit is used for obtaining a standard font text picture and inputting it into the deep learning model to obtain a target font text picture;
the deep learning model is specifically a cGAN network model;
the training according to a standard font picture set and a target font picture set to obtain a deep learning model, with style embedding blocks uniquely corresponding to each target font respectively embedded for the target characters of each target font in the training process, specifically comprises:
processing the standard font picture set and the target font picture set to obtain labeled binary training data;
determining a generator, a discriminator, a first encoder for embedding the style embedding blocks and a second encoder for calculating a loss value, and constructing an original cGAN network model;
and training the original cGAN network model with the binary training data, and adjusting parameters according to the conversion results and the loss value until the loss value reaches a first preset condition, thereby determining the cGAN network model.
6. The apparatus as recited in claim 5, further comprising:
the second training unit is used for training in advance to obtain a convolutional neural network model for identifying the initial characters;
the second conversion unit is used for receiving the input initial text picture, and inputting the initial text picture into the convolutional neural network model to obtain the standard font text picture.
7. An apparatus for font conversion, comprising:
a memory for storing instructions comprising the steps of the method of font conversion of any of claims 1 to 4;
and the processor is used for executing the instructions.
8. A computer readable storage medium, on which a computer program is stored, characterized in that the computer program, when being executed by a processor, carries out the steps of the method of font conversion according to any of claims 1 to 4.
CN201811101699.5A 2018-09-20 2018-09-20 Font conversion method, device, equipment and computer readable storage medium Active CN109285111B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811101699.5A CN109285111B (en) 2018-09-20 2018-09-20 Font conversion method, device, equipment and computer readable storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201811101699.5A CN109285111B (en) 2018-09-20 2018-09-20 Font conversion method, device, equipment and computer readable storage medium

Publications (2)

Publication Number Publication Date
CN109285111A CN109285111A (en) 2019-01-29
CN109285111B true CN109285111B (en) 2023-05-09

Family

ID=65181771

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811101699.5A Active CN109285111B (en) 2018-09-20 2018-09-20 Font conversion method, device, equipment and computer readable storage medium

Country Status (1)

Country Link
CN (1) CN109285111B (en)

Families Citing this family (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110211032B (en) * 2019-06-06 2021-02-09 北大方正集团有限公司 Chinese character generating method and device and readable storage medium
CN110211203A (en) * 2019-06-10 2019-09-06 大连民族大学 The method of the Chinese character style of confrontation network is generated based on condition
CN110866543B (en) * 2019-10-18 2022-07-15 支付宝(杭州)信息技术有限公司 Picture detection and picture classification model training method and device
CN110852042A (en) * 2019-12-13 2020-02-28 北京华宇信息技术有限公司 Character type conversion method and device
CN111242114B (en) * 2020-01-08 2023-04-07 腾讯科技(深圳)有限公司 Character recognition method and device
CN111402367B (en) * 2020-03-27 2023-09-26 维沃移动通信有限公司 Image processing method and electronic equipment
CN112417959A (en) * 2020-10-19 2021-02-26 上海臣星软件技术有限公司 Picture generation method and device, electronic equipment and computer storage medium
CN112861471A (en) * 2021-02-10 2021-05-28 上海臣星软件技术有限公司 Object display method, device, equipment and storage medium
CN112966470A (en) * 2021-02-23 2021-06-15 北京三快在线科技有限公司 Character generation method and device, storage medium and electronic equipment
CN113807430B (en) * 2021-09-15 2023-08-08 网易(杭州)网络有限公司 Model training method, device, computer equipment and storage medium
CN114330236A (en) * 2021-12-29 2022-04-12 北京字跳网络技术有限公司 Character generation method and device, electronic equipment and storage medium

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2018081089A1 (en) * 2016-10-26 2018-05-03 Deepmind Technologies Limited Processing text sequences using neural networks

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
ES2289999T3 (en) * 2000-02-09 2008-02-16 Amtrol Inc. PROTECTIVE DEVICE FOR A GAS CYLINDER.
US10366595B2 (en) * 2017-03-10 2019-07-30 Turing Video, Inc. Surveillance method and system based on human behavior recognition
CN107506736A (en) * 2017-08-29 2017-12-22 北京大生在线科技有限公司 Online education video fineness picture intercept method based on deep learning
CN108170649B (en) * 2018-01-26 2021-06-01 广东工业大学 Chinese character library generation method and device based on DCGAN deep network

Patent Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2018081089A1 (en) * 2016-10-26 2018-05-03 Deepmind Technologies Limited Processing text sequences using neural networks

Also Published As

Publication number Publication date
CN109285111A (en) 2019-01-29

Similar Documents

Publication Publication Date Title
CN109285111B (en) Font conversion method, device, equipment and computer readable storage medium
RU2691214C1 (en) Text recognition using artificial intelligence
CN107391505B (en) Image processing method and system
JP2667954B2 (en) Apparatus and method for automatic handwriting recognition using static and dynamic parameters
RU2693916C1 (en) Character recognition using a hierarchical classification
CN110428820B (en) Chinese and English mixed speech recognition method and device
CN112819686B (en) Image style processing method and device based on artificial intelligence and electronic equipment
CN108509833B (en) Face recognition method, device and equipment based on structured analysis dictionary
CN111428557A (en) Method and device for automatically checking handwritten signature based on neural network model
CN111046679A (en) Quality information acquisition method and device of translation model and computer equipment
CN112329476A (en) Text error correction method and device, equipment and storage medium
CN112036168A (en) Event subject recognition model optimization method, device and equipment and readable storage medium
CN110874591A (en) Image positioning method, device, equipment and storage medium
CN112667979A (en) Password generation method and device, password identification method and device, and electronic device
CN116311214A (en) License plate recognition method and device
CN113688955B (en) Text recognition method, device, equipment and medium
CN112836019B (en) Public medical health named entity identification and entity linking method and device, electronic equipment and storage medium
CN116361502B (en) Image retrieval method, device, computer equipment and storage medium
Pajot et al. Unsupervised adversarial image inpainting
CN115906861B (en) Sentence emotion analysis method and device based on interaction aspect information fusion
CN115565186B (en) Training method and device for character recognition model, electronic equipment and storage medium
CN114863450B (en) Image processing method, device, electronic equipment and storage medium
CN114970666B (en) Spoken language processing method and device, electronic equipment and storage medium
Sahoo et al. Indian sign language recognition using soft computing techniques
CN114707518A (en) Semantic fragment-oriented target emotion analysis method, device, equipment and medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant