CN114626335A - Character generation method, network training method, device, equipment and storage medium - Google Patents

Character generation method, network training method, device, equipment and storage medium

Info

Publication number
CN114626335A
CN114626335A
Authority
CN
China
Prior art keywords
font style
sample
network
character
information
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210144287.XA
Other languages
Chinese (zh)
Inventor
杨奕骁
陈宸
李宇聪
鞠奇
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tencent Technology Shenzhen Co Ltd
Original Assignee
Tencent Technology Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tencent Technology Shenzhen Co Ltd filed Critical Tencent Technology Shenzhen Co Ltd
Priority to CN202210144287.XA priority Critical patent/CN114626335A/en
Publication of CN114626335A publication Critical patent/CN114626335A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/10Text processing
    • G06F40/103Formatting, i.e. changing of presentation of documents
    • G06F40/109Font handling; Temporal or kinetic typography
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/20Natural language analysis
    • G06F40/253Grammatical analysis; Style critique
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • General Engineering & Computer Science (AREA)
  • Biomedical Technology (AREA)
  • Evolutionary Computation (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Biophysics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Controls And Circuits For Display Device (AREA)
  • Document Processing Apparatus (AREA)

Abstract

The application provides a character generation method, a network training method, an apparatus, a device, and a storage medium for character generation, which can be applied to scenes such as cloud technology, artificial intelligence, intelligent transportation, and driving assistance. The character generation method includes: acquiring at least two candidate characters with different font style information from a character set to be processed; generating font style information corresponding to the at least two candidate characters based on a font style coding network; and generating target characters corresponding to the font style information and character content information based on a character generation network. The font style coding network and the character generation network are obtained by performing character generation training on a preset neural network, and the hidden space of the sample font style information used during character generation training is constrained to a normal distribution. According to the embodiments of the application, the generation quality of the target characters can be improved and the generation cost of the target characters can be reduced.

Description

Character generation method, network training method, device, equipment and storage medium
Technical Field
The application belongs to the technical field of computers, and particularly relates to a character generation method, a network training method, a device, equipment and a storage medium.
Background
In the related art, the style representation of a given font is generally learned from a subset of that font's characters, and a whole set of characters in a new font is then generated.
However, in the related art, some characters of the new font must be prepared in advance and input into a character generation model to learn the style features; the remaining characters are then obtained from the model, finally yielding the whole character set of the new font. Moreover, character generation models in the related art are prone to missing strokes or stroke adhesion, so the generation quality is low.
Disclosure of Invention
In order to solve the above technical problem, the present application provides a text generation method, a network training method, an apparatus, a device and a storage medium.
In one aspect, the present application provides a text generation method, including:
acquiring at least two candidate characters with different font style information from a character set to be processed;
generating font style information corresponding to the at least two candidate characters based on a font style coding network;
generating target characters corresponding to the font style information and the character content information based on a character generation network; the text content information represents the content of the text in the text set to be processed;
the font style coding network and the character generation network are obtained by performing character generation training on a preset neural network, and the hidden space of the sample font style information used in the character generation training process is constrained to be in normal distribution.
In another aspect, the present application provides a network training method for character generation, where the method includes:
extracting a first sample word and a second sample word from the sample word set;
and performing character generation training on a preset neural network based on the sample font style information of the first sample character and the sample character content information of the second sample character, and constraining the hidden space of the sample font style information into normal distribution in the character generation training process to obtain a font style coding network and a character generation network.
In another aspect, an embodiment of the present application provides a text generation apparatus, where the apparatus includes:
the character acquisition module is used for acquiring at least two candidate characters with different font style information from the character set to be processed;
the font style information generating module is used for generating font style information corresponding to the at least two candidate characters based on a font style coding network;
the target character generation module is used for generating target characters corresponding to the font style information and the character content information based on a character generation network; the text content information represents the content of the text in the text set to be processed;
the font style coding network and the character generation network are obtained by performing character generation training on a preset neural network, and the hidden space of the sample font style information used in the character generation training process is constrained to be in normal distribution.
In another aspect, the present application provides a network training apparatus for generating words, the apparatus includes:
the sample character acquisition module is used for extracting a first sample character and a second sample character from the sample character set;
and the training module is used for performing character generation training on a preset neural network based on the sample font style information of the first sample characters and the sample character content information of the second sample characters, and in the character generation training process, the hidden space of the sample font style information is restricted to normal distribution to obtain a font style coding network and a character generation network.
In another aspect, the present application provides an electronic device including a processor and a memory, the memory storing at least one instruction or at least one program that is loaded and executed by the processor to implement the character generation method or the network training method for character generation described above.
In another aspect, the present application provides a computer-readable storage medium storing at least one instruction or at least one program that is loaded and executed by a processor to implement the character generation method or the network training method for character generation described above.
In another aspect, the present application provides a computer program product including a computer program that, when executed by a processor, implements the character generation method or the network training method for character generation described above.
According to the character generation method, the network training method, the apparatus, the device, and the storage medium, the trained font style coding network generates the font style information corresponding to the at least two candidate characters, and the trained character generation network generates the target characters corresponding to the font style information and the character content information. Because the hidden space of the sample font style information is constrained to a normal distribution during training, the font style information is compressed and the distance between different font style information is shortened, so the hidden space of styles varies more smoothly, discontinuities (break points) in the network are avoided, and the generation quality of the target characters is improved. In addition, using the trained font style coding network and character generation network reduces the consumption of system resources during target character generation, thereby reducing the generation cost of the target characters.
Drawings
In order to more clearly illustrate the technical solutions and advantages of the embodiments of the present application or the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, it is obvious that the drawings in the following description are only some embodiments of the present application, and other drawings can be obtained by those skilled in the art without creative efforts.
FIG. 1 is a schematic diagram of an implementation environment of a text generation method according to an exemplary embodiment.
FIG. 2 is a flow diagram illustrating a method of text generation according to an example embodiment.
FIG. 3 is a flow diagram illustrating a process for standardizing style and content according to an exemplary embodiment.
FIG. 4 is a flow diagram illustrating a network training method for character generation according to an exemplary embodiment.
FIG. 5 is a schematic diagram illustrating a neural network according to an exemplary embodiment.
FIG. 6 is a flow diagram illustrating the training of a font style coding network and a character generation network according to an exemplary embodiment.
FIG. 7 is a diagram illustrating one way of obtaining sample font style information in accordance with an illustrative embodiment.
FIG. 8 is a diagram illustrating a normalization process for sample font style information and sample textual content information, according to an illustrative embodiment.
Fig. 9 is a schematic diagram illustrating a target text generated by using the text generation method according to the embodiment of the present application, according to an exemplary embodiment.
FIG. 10 is a graph illustrating a fusion effect comparison, according to an exemplary embodiment.
Fig. 11 is a diagram illustrating a similar font obtained by detecting the font of the target text through a font similarity detection model according to an exemplary embodiment.
FIG. 12 illustrates a text generation apparatus in accordance with an exemplary embodiment.
FIG. 13 illustrates a network training apparatus for word generation according to an example embodiment.
Fig. 14 is a block diagram of a hardware structure of a server for network training of text generation or text generation according to an embodiment of the present application.
Detailed Description
Artificial Intelligence (AI) is a theory, method, technique, and application system that uses a digital computer, or a machine controlled by a digital computer, to simulate, extend, and expand human intelligence, perceive the environment, acquire knowledge, and use that knowledge to obtain the best results. In other words, artificial intelligence is a comprehensive branch of computer science that attempts to understand the essence of intelligence and produce a new kind of intelligent machine that can react in a manner similar to human intelligence. Artificial intelligence studies the design principles and implementation methods of various intelligent machines, so that machines can perceive, reason, and make decisions.
Artificial intelligence is a comprehensive discipline involving a wide range of fields, spanning both hardware-level and software-level technologies. Basic artificial intelligence technologies generally include sensors, dedicated artificial intelligence chips, cloud computing, distributed storage, big data processing, operation/interaction systems, and mechatronics. Artificial intelligence software technologies mainly include computer vision, speech processing, natural language processing, and machine learning/deep learning.
In particular, the embodiment of the application relates to an artificial neural network technology in deep learning.
In order to make the technical solutions better understood by those skilled in the art, the technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only partial embodiments of the present application, but not all embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
It should be noted that the terms "first," "second," and the like in the description, claims, and drawings of this application are used to distinguish between similar elements and not necessarily to describe a particular sequential or chronological order. It is to be understood that the data so used are interchangeable under appropriate circumstances, so that the embodiments of the application described herein can also be carried out in sequences other than those illustrated or described herein. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, so that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
FIG. 1 is a schematic diagram of an implementation environment of a text generation method according to an exemplary embodiment. As shown in fig. 1, the implementation environment may include at least a terminal 01 and a server 02, and the terminal 01 and the server 02 may be directly or indirectly connected through wired or wireless communication, and the application is not limited herein.
In particular, the terminal may be used to collect a set of words to be processed and a sample set of words. Alternatively, the terminal 01 may include, but is not limited to, a mobile phone, a computer, an intelligent voice interaction device, an intelligent appliance, a vehicle-mounted terminal, an aircraft, and the like. The embodiment of the application can be applied to various scenes, including but not limited to cloud technology, artificial intelligence, intelligent transportation, driving assistance and the like.
In particular, the server 02 may be configured to train a font style encoding network and a text generation network, and generate target text based on the font style encoding network and the text generation network. Optionally, the server 02 may be an independent physical server, a server cluster or a distributed system formed by a plurality of physical servers, or a cloud server providing basic cloud computing services such as a cloud service, a cloud database, cloud computing, a cloud function, cloud storage, a network service, cloud communication, a middleware service, a domain name service, a security service, a CDN, a big data and artificial intelligence platform, and the like.
It should be noted that fig. 1 is only an example. In other scenarios, other implementation environments may also be included, for example, the implementation environment may include a terminal, obtain a font style encoding network and a text generation network through terminal training, and generate a target text based on the font style encoding network and the text generation network.
FIG. 2 is a flow diagram illustrating a method of text generation according to an example embodiment. The method may be used in the implementation environment of fig. 1. The present specification provides method steps as described in the examples or flowcharts, but may include more or fewer steps based on routine or non-inventive labor. The order of steps recited in the embodiments is merely one manner of performing the steps in a multitude of orders and does not represent the only order of execution. In practice, the system or server product may be implemented in a sequential or parallel manner (e.g., parallel processor or multi-threaded environment) according to the embodiments or methods shown in the figures. Specifically, as shown in fig. 2, the method may include:
s101, acquiring at least two candidate characters with different font style information from a character set to be processed.
Specifically, the font style information may include, but is not limited to: regular script, Song typeface, cartoon style, boldface, running script, and the like. Optionally, the character set to be processed may include all characters of each kind of font style information, for example, all regular-script characters, all Song-typeface characters, all cartoon-style characters, and so on.
In the embodiment of the present application, at least two candidate texts with different font style information may be obtained from the to-be-processed text set in multiple ways, which is not specifically limited herein.
In one approach, the characters in the character set to be processed may be classified according to font style information to obtain a plurality of font style information categories, each corresponding to a plurality of characters. At least two candidate font style information categories can then be determined from the plurality of categories, and one character extracted from the characters corresponding to each candidate category, yielding the at least two candidate characters. For example, a regular script category and a Song typeface category may be determined from the plurality of font style information categories, one character extracted from the characters of the regular script category and one character extracted from the characters of the Song typeface category, thereby obtaining at least two candidate characters.
In another approach, a preset number of characters may be extracted from the characters corresponding to each of the at least two candidate font style information categories, yielding the at least two candidate characters. For example, a regular script category and a Song typeface category may be determined from the plurality of font style information categories, a preset number of characters extracted from the characters of the regular script category and a preset number of characters extracted from the characters of the Song typeface category, thereby obtaining at least two candidate characters.
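The two sampling schemes above can be sketched as follows. This is an illustrative sketch only: the corpus layout, font names, and function name are assumptions, not the application's implementation.

```python
import random

def sample_candidates(style_to_chars, chosen_styles, per_style=1):
    """Draw `per_style` characters from each chosen font style category,
    producing the candidate characters with different font style information."""
    candidates = []
    for style in chosen_styles:
        pool = style_to_chars[style]
        candidates.extend(random.sample(pool, per_style))  # without replacement
    return candidates

# Hypothetical character set to be processed, grouped by font style category.
corpus = {
    "kaiti": ["永", "和", "九", "年"],   # regular script
    "songti": ["岁", "在", "癸", "丑"],  # Song typeface
}
picks = sample_candidates(corpus, ["kaiti", "songti"], per_style=1)
assert len(picks) == 2  # one candidate per chosen style category
```

Setting `per_style` greater than 1 corresponds to the second approach, where a preset number of characters is drawn per category.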
And S103, generating font style information corresponding to the at least two candidate characters based on the font style coding network.
S105, generating target characters corresponding to the font style information and the character content information based on a character generation network; the text content information represents the content of the text in the text set to be processed.
The font style coding network and the character generation network are obtained by performing character generation training on a preset neural network, and the hidden space of the sample font style information used in the character generation training process is constrained to be in normal distribution.
Illustratively, the text content information may be spatial information of the text content, e.g., a stroke structure of the text.
Illustratively, the hidden space (latent space) may refer to a space where a hidden variable (e.g., noise Z) is located, and the hidden space constraint may be understood as constraining a hidden vector of the hidden space, so that in a text generation process, text with better quality and effect can be generated by constraining the hidden space of the sample font style information in the training process.
In a specific embodiment, in step S103, the at least two candidate characters may be input into the pre-trained font style coding network, which performs feature extraction on them to obtain the font style information corresponding to each candidate character. For example, the font style information extracted for two candidate characters through the pre-trained coding network might be regular script and Song typeface, respectively.
Optionally, the font style information corresponding to the at least two candidate characters may be generated in multiple ways according to the embodiment of the present application, which is not specifically limited herein.
In one aspect, in the step S103, the generating font style information corresponding to each of the at least two candidate characters based on the font style encoding network may include: and mapping the at least two candidate characters to the normal distribution based on the font style coding network to obtain font style information corresponding to the at least two candidate characters.
Because the hidden space of the sample font style information is constrained to a normal distribution during character generation training, after the at least two candidate characters are input to the font style coding network, the network can map them onto the normal distribution and extract the corresponding style feature vectors from it, thereby obtaining the font style information of the at least two candidate characters. Constraining the hidden space to a normal distribution compresses the font style information and shortens the distance between the different font style information of the candidate characters, so the hidden space of styles varies more smoothly, discontinuities (break points) in the network are avoided, and the generation quality of the font style information, and hence of the target characters, is improved.
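One common way to realize this kind of normal-distribution constraint is a VAE-style encoder: predict a mean and log-variance per style vector, sample via the reparameterization trick, and penalize the KL divergence to a standard normal. The patent does not specify its exact loss, so the following NumPy sketch is an assumption illustrating the general technique, not the application's implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

def encode_style(x, W_mu, W_logvar):
    """Map a character feature vector to a Gaussian over the style hidden
    space and draw a style vector via the reparameterization trick."""
    mu = x @ W_mu
    logvar = x @ W_logvar
    eps = rng.standard_normal(mu.shape)
    z = mu + np.exp(0.5 * logvar) * eps  # reparameterized style vector
    return z, mu, logvar

def kl_to_standard_normal(mu, logvar):
    """KL(N(mu, sigma^2) || N(0, I)): the training term that pulls the
    style hidden space toward a normal distribution."""
    return 0.5 * np.sum(np.exp(logvar) + mu**2 - 1.0 - logvar)

d_in, d_s = 8, 4                       # toy input / style-embedding dims
x = rng.standard_normal(d_in)
W_mu = rng.standard_normal((d_in, d_s)) * 0.1
W_logvar = rng.standard_normal((d_in, d_s)) * 0.1
z, mu, logvar = encode_style(x, W_mu, W_logvar)
assert z.shape == (d_s,)
assert kl_to_standard_normal(mu, logvar) >= 0.0  # KL is non-negative
```

Because the penalty pulls every style's posterior toward the same standard normal, styles are packed closer together, which matches the "compressed, smoother hidden space" effect described above.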
In a specific embodiment, in step S105, the content of the text may be extracted from the set of texts to be processed as text content information, and the text content information and the font style information are input to a text generation network trained in advance to generate the target text. The word generation network may include two branches: the system comprises a style branch and a content branch, wherein the style branch is input by font style information obtained by a font style coding network, and the content branch is input by character content information.
In one mode, the text content information of each text can be sequentially extracted from the text set to be processed, and the text content information of each text is sequentially traversed to obtain the text content information of each text and the target text corresponding to the font style information.
In another mode, the text content information of all the texts can be extracted from the text set to be processed, and the text content information of all the texts is traversed in parallel to obtain the target texts corresponding to the text content information and the font style information of all the texts.
In an optional embodiment, the method may further include: and calculating the average value of the font style information based on the font style coding network to obtain target font style information.
Accordingly, in step S105, the generating a target character corresponding to the font style information and the character content information based on the character generation network may include: and generating the target character corresponding to the target font style information and the character content information based on the character generation network.
Specifically, because at least two candidate characters are input into the font style coding network and the network outputs one piece of font style information for each candidate character, the font style information corresponding to the at least two candidate characters can be averaged to obtain the target font style information. The target font style information can be understood as a new style feature; that is, it differs from the font style information of the characters in the character set to be processed. For example, because a character can be processed as an image, and the style information of an image is usually represented by a mean and a variance, averaging the font style information of at least two candidate characters can be understood as averaging over the images corresponding to those characters.
Accordingly, after obtaining the target font style information, the target font style information and the text content information may be input to the text generation network, so as to obtain the target text corresponding to the text content information and the target font style information. In the embodiment of the application, a new style characteristic (namely target font style information) can be obtained by calculating the average value of the font style information corresponding to at least two candidate characters, and the accuracy of generating the target font style information is high. The target font style information and the character content information with higher accuracy are input into a character generation network, so that the generation quality of target characters can be improved; in addition, a whole set of new characters (namely target characters) corresponding to the style information of the target font can be generated through the character generation network, partial characters under a certain new font do not need to be preset, consumption of system resources in the target character generation process can be reduced, and therefore generation cost of the target characters is reduced.
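The averaging step can be illustrated with placeholder style vectors; the values below are made up for demonstration and are not from the application.

```python
import numpy as np

# One style vector per candidate character, as produced by the font style
# coding network (values are hypothetical).
style_kaiti = np.array([1.0, 0.0, 2.0])
style_songti = np.array([0.0, 2.0, 0.0])

# The target font style information is the element-wise mean: a new style
# lying between the source styles in the hidden space.
target_style = np.mean([style_kaiti, style_songti], axis=0)
assert np.allclose(target_style, [0.5, 1.0, 1.0])
```

Because the hidden space was constrained to be smooth during training, a point between two known styles is expected to decode to a plausible new font rather than an artifact.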
FIG. 3 is a flow diagram illustrating a process for standardizing genres and content in accordance with an exemplary embodiment. As shown in fig. 3, in an optional embodiment, the method may further include:
s201, decoupling the font style information.
And S203, carrying out standardization processing on the decoupled font style information to obtain standard font style information.
And S205, carrying out standardization processing on the text content information to obtain standard text content information.
Accordingly, in step S105, the generating a target character corresponding to the font style information and the character content information based on the character generation network may include:
and generating the target character corresponding to the standard font style information and the standard character content information based on the character generation network.
In an alternative embodiment, the character generation network may use a SPADE + AdaIN structure. SPADE (Spatially-Adaptive Normalization) is a generative model for converting segmentation maps into photorealistic images; used as the backbone network, it better preserves the structural information of the characters. AdaIN is short for Adaptive Instance Normalization.
For example, in steps S201 to S203 above, the font style information may be input to a multi-layer perceptron (MLP) for decoupling, and the decoupled font style information normalized via AdaIN to obtain the standard font style information; that is, the decoupled style is injected into each layer of the network through AdaIN, and the style of the generated image is controlled by controlling the feature statistics. An MLP is a feed-forward artificial neural network. Here, decoupling may refer to disentangling the style itself, such as stroke thickness and stroke shape.
In the embodiment of the application, the font style information is decoupled by a plurality of fully connected layers and then fed into each layer of the network.
In text generation, the content branch preserves the stroke structure of the text; this stroke structure (i.e., the text content information) is a key factor. For example, in step S205, the stroke structure of the text may be normalized in spatially adaptive instance normalization (SpatialAdaIN) form to obtain standard text content information; that is, the stroke structure of the text is inserted into the text generation network in SpatialAdaIN form. In this way, the stroke structure of the text (i.e., the originally segmented spatial information) is better retained, improving the generation quality of the target text.
FIG. 4 is a flow diagram illustrating a method for web training for text generation in accordance with an exemplary embodiment. As shown in fig. 4, the method may include:
s301, extracting a first sample character and a second sample character from the sample character set.
Alternatively, the first sample text may be a set of sample words or a single word, and the second sample text may likewise be a set of sample words or a single word.
Illustratively, any word x_ij in the sample word set contains a style attribute S_i ∈ S and a content attribute C_j ∈ C. To represent a style, K characters in a font of style S_i can be taken from the sample word set to form a style reference set R_S = {x_i1, …, x_iK}, i.e. the first sample text, where the corresponding style embedding Z_S ∈ ℝ^{d_s}, d_s is the dimension of the style embedding, and K is a positive integer greater than or equal to 1.
Illustratively, to represent the content attribute C_j, some words may be taken from the sample word set to form a content reference set R_C, i.e. the second sample text.
If characters with different styles but the same content are selected as the content reference set, the difficulty of learning the content through the network increases, because characters of different styles differ greatly in position and stroke form, and the network can hardly abstract a correct stroke structure from them. Therefore, in an exemplary embodiment, in order to reduce the difficulty of the network learning the content, sample text in the Song typeface may be selected from the sample text set as the second sample text; since Song is a standard typeface, it facilitates the model's learning of the glyph structure. Furthermore, in order to further reduce the difficulty of the network learning the content, the words in a single Song-style content reference set R_C may be selected from the sample text set as the second sample text.
And S303, performing character generation training on a preset neural network based on the sample font style information of the first sample characters and the sample character content information of the second sample characters, and constraining the hidden space of the sample font style information into normal distribution in the character generation training process to obtain a font style coding network and a character generation network.
In the embodiment of the application, the sample font style information of the first sample text character and the sample text content information of the second sample text character can be input into a preset neural network for text generation training, and in the text generation training process, the hidden space of the sample font style information is restricted to normal distribution, so that a font style coding network and a text generation network are obtained. In the training process, the hidden space of the sample font style information is constrained to be in normal distribution, so that the font style information is compressed, the distance between different font style information is shortened, the change of the style hidden space is smoother, a network is prevented from encountering a break point, and the training precision of a font style coding network and a character generation network is improved; in addition, based on the sample font style information of the first sample characters and the sample character content information of the second sample characters, character generation training is carried out on the preset neural network, and then a font style coding network and a character generation network can be obtained.
FIG. 5 is a schematic diagram illustrating a neural network according to an exemplary embodiment. As shown in fig. 5, the preset neural network may include a preset font style coding network, a preset character generation network and a preset discrimination network, and the preset neural network as a whole has a generative adversarial network (GAN) structure. In addition, the preset neural network inputs the spatial information of the sample text content information and the sample font style information into each layer of the generator by means of SpatialAdaIN and AdaIN, respectively.
The following describes a process of training a font style encoding network and a character generating network using the preset neural network in fig. 5:
Suppose the first sample text consists of characters in the "Fangzheng Cartoon" font style and the second sample text is a Song-style character used to represent the text content. The first sample text is input into the preset style coding network to obtain sample font style information; the sample font style information is input into the preset text generation network in AdaIN form, and the sample text content information of the second sample text is inserted into the preset text generation network in SpatialAdaIN form. The preset text generation network then outputs a character in the "Fangzheng Cartoon" style, which, together with the corresponding "Fangzheng Cartoon" character in the sample text set, is judged by the preset discriminator network to obtain loss information. The parameters of the networks are continuously adjusted during training, and the training process stops when the loss information satisfies a preset condition, yielding the trained font style coding network and text generation network.
FIG. 6 is a flow diagram illustrating a trained trellis coded network and word generation network in accordance with an exemplary embodiment. As shown in fig. 6, in step S303, performing a character generation training on a preset neural network based on the sample font style information of the first sample character and the sample character content information of the second sample character, and in the character generation training process, constraining a hidden space of the sample font style information to be a normal distribution to obtain a font style coding network and a character generation network, may include:
S3031, mapping the first sample text to a latent normal distribution based on the preset font style coding network to obtain the current sample font style information.
Optionally, in step S3031, the preset font style coding network may be a forward propagation network; that is, the forward propagation network takes the first sample text (i.e., the style reference set R_S) as input and outputs the current sample font style information (i.e., the style embedding vector Z_S).
Illustratively, the style embedding vector Z_S may be randomly sampled. The forward propagation network can map the style reference set R_S to a latent normal distribution and output two vectors μ_s ∈ ℝ^{d_s} and σ_s ∈ ℝ^{d_s}, which represent the parameters μ_s and σ_s of a multivariate normal distribution N(μ_s, σ_s).
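A minimal numpy sketch of this output stage, under the assumption (standard for such encoders, though not spelled out here) that Z_S is drawn via the reparameterization Z_S = μ_s + σ_s · ε with ε ~ N(0, I); the pooling and "linear heads" are hypothetical stand-ins for the learned network:

```python
import numpy as np

rng = np.random.default_rng(0)

def encode_style(reference_set, d_s=128):
    """Toy stand-in for the style encoder: pool the reference set, then
    read off the parameters (mu_s, sigma_s) of a multivariate normal.
    In a real network these would be two learned linear heads."""
    pooled = reference_set.reshape(len(reference_set), -1).mean(axis=0)
    mu_s = pooled[:d_s]
    log_var = pooled[d_s:2 * d_s]
    sigma_s = np.exp(0.5 * log_var)  # strictly positive by construction
    return mu_s, sigma_s

def sample_style_embedding(mu_s, sigma_s):
    """Reparameterization trick: Z_S = mu + sigma * eps, eps ~ N(0, I)."""
    eps = rng.standard_normal(mu_s.shape)
    return mu_s + sigma_s * eps

# K = 4 reference glyphs, flattened to vectors of length >= 2 * d_s
refs = rng.standard_normal((4, 512))
mu_s, sigma_s = encode_style(refs, d_s=128)
z_s = sample_style_embedding(mu_s, sigma_s)
print(z_s.shape)  # (128,)
```

Sampling through μ_s and σ_s rather than emitting Z_S directly is what lets the KL constraint below act on the distribution the embeddings come from.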
And S3033, processing the standard normal distribution and the current sample font style information based on the preset font style coding network to obtain the sample font style information.
In this embodiment of the application, in step S3033, the preset font style coding network may align the current sample font style information with the multivariate standard normal distribution, so as to obtain the sample font style information.
FIG. 7 is a diagram illustrating one way of obtaining sample font style information in accordance with an illustrative embodiment. As shown in fig. 7, in the step S3033, the processing the standard normal distribution and the current sample font style information based on the preset font style coding network to obtain the sample font style information may include:
and S30331, based on the preset font style coding network, randomly acquiring a feature vector corresponding to the font style information of the current sample from the standard normal distribution.
And S30333, based on the preset font style coding network, updating the latent normal distribution through the difference information between the feature vector and the current sample font style information, and taking the updated distribution as the new latent normal distribution.
S30335, based on the preset font style coding network, obtaining current sample font style information in the process of mapping the first sample text to the latent normal distribution, and repeating the process of taking the updated distribution as the latent normal distribution until the difference information satisfies the preset condition.
And S30337, based on the preset font style coding network, using the current sample font style information when the difference information meets the preset condition as the sample font style information.
Illustratively, in step S30331, during training, in order to make the current sample font style information (i.e., the style embedding vector Z_S) approximate the multivariate standard normal distribution, the preset font style coding network may randomly determine a feature vector corresponding to the current sample font style information from the standard normal distribution. In the above steps S30333 to S30337, the preset coding network may calculate difference information between the feature vector and the current sample font style information and use it as a distribution loss value; based on this loss value, the latent normal distribution in the preset font style coding network is adjusted and updated so that it can be used to obtain higher-quality sample font style information. That is, the latent normal distribution is continuously adjusted until the difference information satisfies a preset condition, and the current sample font style information at that point is taken as the sample font style information.
In one approach, the following constraint may be imposed on the preset style encoder:

min KL( N(μ_s, σ_s) ‖ N(0, I) ),

where N represents a multivariate normal distribution and KL refers to the KL divergence, an index measuring how well two probability distributions match: the larger the difference between the two distributions, the larger the KL divergence.
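For a diagonal Gaussian against the standard normal, this KL term has a well-known closed form, KL = ½ Σ (σ² + μ² − 1 − log σ²), which a sketch can verify directly (numpy only; not tied to the patent's implementation):

```python
import numpy as np

def kl_to_standard_normal(mu, sigma):
    """Closed-form KL( N(mu, diag(sigma^2)) || N(0, I) )."""
    var = sigma ** 2
    return 0.5 * np.sum(var + mu ** 2 - 1.0 - np.log(var))

# KL is zero exactly when the encoder already outputs N(0, I) ...
print(kl_to_standard_normal(np.zeros(8), np.ones(8)))  # 0.0
# ... and grows as mu drifts from 0 or sigma from 1.
print(kl_to_standard_normal(np.full(8, 2.0), np.ones(8)))  # 16.0
```

Minimizing this quantity is what pulls the style embeddings toward the zero mean, compressing the style latent space as described below.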
Accordingly, in step S30333, the KL divergence between the feature vector and the current sample font style information may be calculated, and the latent normal distribution in the preset font style coding network may be adjusted and updated based on the KL divergence, so that it can be used to obtain higher-quality sample font style information. That is, the latent normal distribution is continuously adjusted until the KL divergence is smaller than a preset divergence threshold, and the current sample font style information at that point is taken as the sample font style information.
In another mode, in step S30333, the difference information between the feature vector and the current sample font style information may also be calculated by a Maximum Mean Discrepancy (MMD) algorithm, which measures the difference between two distributions.
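A small numpy sketch of MMD with an RBF kernel (the kernel choice and bandwidth are assumptions; the patent only names the measure): the biased estimate is near zero for samples from the same distribution and grows when the distributions differ.

```python
import numpy as np

def rbf_mmd2(x, y, bandwidth=1.0):
    """Biased estimate of squared MMD between sample sets x and y
    under an RBF kernel."""
    def k(a, b):
        d2 = ((a[:, None, :] - b[None, :, :]) ** 2).sum(-1)
        return np.exp(-d2 / (2.0 * bandwidth ** 2))
    return k(x, x).mean() + k(y, y).mean() - 2.0 * k(x, y).mean()

rng = np.random.default_rng(0)
same = rbf_mmd2(rng.standard_normal((200, 4)), rng.standard_normal((200, 4)))
shifted = rbf_mmd2(rng.standard_normal((200, 4)),
                   rng.standard_normal((200, 4)) + 3.0)
print(same < shifted)  # True: matching distributions give a smaller MMD
```

Unlike the KL route, MMD needs only samples from both distributions, not their density parameters.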
In the embodiment of the application, by applying this constraint to the preset font style coding network, the current sample font style information (style embedding vector Z_S) is made to be sampled from a multivariate normal distribution, so the style embedding Z_S lies around the zero mean rather than being chosen arbitrarily in the space ℝ^{d_s}. This compresses the space of the style embedding Z_S, shortens the distance between different styles, makes the change of the style hidden space smoother, prevents the network from encountering break points, and improves the training precision of the font style coding network.
And S3035, generating the current characters corresponding to the sample font style information and the sample character content information based on the preset character generation network.
In the embodiment of the present application, given sample font style information (style S_i) and sample text content (content C_j), the preset character generation network aims to generate the character x_ij corresponding to that style and content. The input to the preset text generation network includes two branches: a content branch and a style branch. The input of the style branch is the sample font style information (the style embedding Z_S) obtained by the preset font style coding network, and the input of the content branch is the content character image, from which the preset character generation network generates the current character.
And S3037, based on the preset discrimination network, discriminating the current characters, the reference characters, the sample font style information and the sample character content information to obtain loss information.
S3039, training the preset font style coding network and the preset character generation network based on the loss information to obtain the font style coding network and the character generation network; the reference characters represent the characters of the sample character content information under the sample font style information.
The preset discrimination network is an important component of the GAN; its objective is to judge whether a given current word looks sufficiently real. If real, the preset discrimination network gives the current word a high score; otherwise, a low score. During training, the preset discrimination network generally considers the current characters generated by the preset character generation network to be insufficiently real and only the reference characters in the sample character set to be real. Thus, the preset character generation network can only fool the preset discrimination network by generating more realistic images. The preset character generation network and the preset discrimination network continuously improve their abilities in this zero-sum game until the current characters generated by the preset character generation network approach the reference characters, so that a higher-quality character generation network can be obtained through training.
The purpose of the preset character generation network is to fool the preset discrimination network; that is, the preset character generation network has no direct supervision information but obtains supervision by means of the preset discrimination network. In an alternative embodiment, 2 preset discrimination networks D1 and D2 with the same structure may be used to discriminate images of different sizes: small-size images give the preset discrimination network a larger receptive field, while large-size images make it pay more attention to details, which can avoid overfitting to a certain extent.
In a specific embodiment, for each discrimination network, a hinge loss may be used as the GAN loss function:

L_D^k = 𝔼_x[ max(0, 1 − D_k(x)) ] + 𝔼_x̂[ max(0, 1 + D_k(x̂)) ],
L_G^k = − 𝔼_x̂[ D_k(x̂) ],

where G, D, E denote the preset character generation network, the preset discrimination network and the preset font style coding network, respectively, and k denotes the index of the preset discrimination network.
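A numpy sketch of the hinge formulation on a handful of hypothetical discriminator scores (illustrative only; the real networks produce these scores from images):

```python
import numpy as np

def d_hinge_loss(d_real, d_fake):
    """Discriminator hinge loss: push D(real) above +1 and D(fake) below -1."""
    return np.mean(np.maximum(0.0, 1.0 - d_real)) + \
           np.mean(np.maximum(0.0, 1.0 + d_fake))

def g_hinge_loss(d_fake):
    """Generator hinge loss: raise the discriminator's score on fakes."""
    return -np.mean(d_fake)

d_real = np.array([1.5, 0.2])   # discriminator scores on reference glyphs
d_fake = np.array([-1.2, 0.4])  # scores on generated glyphs
print(d_hinge_loss(d_real, d_fake))  # 0.4 + 0.7 = 1.1
print(g_hinge_loss(d_fake))          # 0.4
```

Note that real samples already scored above +1 and fakes already below −1 contribute nothing to the discriminator loss, which is what makes the hinge variant less prone to over-training the discriminator.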
In an exemplary embodiment, in order to enable the preset discrimination network to distinguish the content and style of the font and enhance the network quality, in S3037, the input of the preset discrimination network may further include sample font style information and sample text content information in addition to the current text and the reference text.
In another exemplary embodiment, in order to stabilize the training process, a feature matching loss may also be used to align the features of the current text and the reference text in each layer of the preset discrimination network. Let D_k^(t)(x) denote the feature of the input text x at the t-th layer of the k-th discriminator; the feature matching loss can then be expressed as:

L_FM = 𝔼 Σ_{t=1}^{T} (1 / N_t) ‖ D_k^(t)(x) − D_k^(t)(x̂) ‖_1,

where T represents the number of convolutional layers of the preset discrimination network and N_t represents the number of elements of the t-th layer feature of the preset discrimination network.
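In numpy terms the loss is just a per-layer, size-normalized L1 distance between the two feature stacks (the layer shapes below are arbitrary placeholders, not the patent's architecture):

```python
import numpy as np

def feature_matching_loss(feats_real, feats_fake):
    """L1 distance between real and generated discriminator features,
    summed over layers; each layer is normalized by its element count N_t."""
    total = 0.0
    for f_r, f_f in zip(feats_real, feats_fake):
        total += np.abs(f_r - f_f).sum() / f_r.size
    return total

rng = np.random.default_rng(0)
feats_real = [rng.standard_normal((8, 16, 16)) for _ in range(3)]
feats_fake = [f + 0.1 for f in feats_real]  # fakes offset by 0.1 everywhere
print(round(feature_matching_loss(feats_real, feats_fake), 3))  # 0.3
```

Because it compares intermediate features rather than the final real/fake score, this term gives the generator a dense, smoothly varying training signal even when the discriminator's verdict saturates.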
In summary, after each part of the preset neural network is constructed, the overall optimization goal of the whole preset neural network is:

min_{G,E} max_D L_GAN + λ_FM · L_FM + λ_VAE · L_VAE,

where the hyperparameters are λ_FM = 10 and λ_VAE = 0.05, and G, D, E respectively denote the preset character generation network, the preset discrimination network and the preset font style coding network.
The preset neural network of the embodiment of the application can comprise a preset font style coding network, a preset character generation network and a preset discrimination network, the hidden space of the sample font style information is constrained to be normal distribution through the preset font style coding network, so that the font style information is compressed, the distance between different font style information is shortened, the change of the hidden space of the style is smoother, the network is prevented from encountering break points, the training precision of the font style coding network and the character generation network is improved, and the training cost and the training difficulty are reduced; in addition, the current characters corresponding to the sample font style information and the sample character content information are generated through a preset character generation network, the current characters, the reference characters, the sample font style information and the sample character content information are distinguished through a preset distinguishing network to obtain loss information, and a font style coding network and a character generation network are obtained through training on the basis of the loss information, so that the training precision of the font style coding network and the character generation network is further improved, and the training cost is reduced.
FIG. 8 is a diagram illustrating a normalization process for sample font style information and sample textual content information, according to an illustrative embodiment. As shown in fig. 8, in an alternative embodiment, the method may further include:
s401, decoupling the sample font style information.
And S403, standardizing the decoupled sample font style information to obtain standard sample font style information.
S405, standardizing the sample text content information to obtain standard sample text content information.
Accordingly, in step S3035, the generating the current text corresponding to the sample font style information and the sample text content information based on the preset text generating network includes:
and generating the current characters corresponding to the standard sample character content information and the standard sample font style information based on the preset character generation network.
In an alternative embodiment, continuing with FIG. 5, the structure of the preset text generation network may be the SPADE + AdaIN structure. SPADE is a generation model for converting a segmentation map into a photorealistic image; used as the backbone network, it can better preserve the structural information of the characters.
For example, in the above steps S401 to S403, the sample font style information may be input into the MLP for decoupling; the decoupled sample font style information is normalized by means of AdaIN to obtain standard sample font style information, the decoupled style is input into each layer of the network by means of AdaIN, and the style of the generated image is controlled by controlling the feature statistics. The font style information is decoupled by several fully connected layers and then fed into each layer of the network; in this way, the preset character generation network can extract font style information at multiple semantic levels. In addition, SPADE uses AdaIN as its normalization method, which can well preserve the originally segmented spatial information. That is, the quality of font fusion can be improved through the SPADE + AdaIN generator structure.
In the process of generating the characters, the content branch preserves the stroke structure of the characters; this stroke structure (i.e., the text content information) is a key factor. For example, in step S405, the sample text content information may be inserted into the network in SpatialAdaIN form to normalize it and obtain standard sample text content information, so that the stroke structure of the text (i.e., the originally segmented spatial information) is retained and the generation quality of the current text is improved.
In a specific embodiment, the specific process of normalization by means of AdaIN may be as follows:
AdaIN receives a content input x and a style input s, and normalizes by aligning the channel-level (C) mean and standard deviation of x to those of s. AdaIN requires no learned affine parameters; it adaptively computes the affine parameters from the style input, with the following formula:

AdaIN(x, s) = σ(s) · (x − μ(x)) / σ(x) + μ(s),

where the IN in AdaIN means: for each feature channel (C) of each sample (N), the mean and standard deviation are calculated over the spatial dimensions (H, W).
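The formula above can be sketched directly in numpy over NCHW tensors (an illustrative stand-alone version, not the patent's network code):

```python
import numpy as np

def adain(x, s, eps=1e-5):
    """AdaIN over NCHW tensors: per sample and channel, normalize x by its
    own spatial (H, W) statistics, then scale/shift with the statistics
    of the style input s."""
    axes = (2, 3)  # spatial dims H, W
    mu_x = x.mean(axis=axes, keepdims=True)
    sd_x = x.std(axis=axes, keepdims=True)
    mu_s = s.mean(axis=axes, keepdims=True)
    sd_s = s.std(axis=axes, keepdims=True)
    return sd_s * (x - mu_x) / (sd_x + eps) + mu_s

rng = np.random.default_rng(0)
content, style = rng.standard_normal((2, 1, 3, 8, 8))
out = adain(content, style)
# The output inherits the style input's channel-wise statistics.
print(np.allclose(out.mean(axis=(2, 3)), style.mean(axis=(2, 3)), atol=1e-4))
```

Since only first- and second-order statistics move, the spatial arrangement of the content input — here, the strokes — is left untouched.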
In a specific embodiment, the specific process of normalization by the SpatialAdaIN method may be as follows:
SpatialAdaIN is similar to AdaIN in that it has no affine parameters to learn and is an adaptive normalization method. In particular, the mean and standard deviation that SpatialAdaIN computes are pixel-level, not channel-level. Pixel-level statistics increase the capacity of the model and better preserve the spatial information of the image, thereby better preserving the stroke structure of the characters.
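One way to read "pixel-level, not channel-level" is that the statistics are taken across channels at each spatial position, so the modulation parameters form H × W maps rather than per-channel scalars. This is an interpretation, not the patent's stated implementation; a numpy sketch under that assumption:

```python
import numpy as np

def spatial_adain(x, s, eps=1e-5):
    """Sketch of pixel-level adaptive normalization: statistics are taken
    across the channel axis at each spatial position, so every pixel gets
    its own mean/std modulation from the style input s."""
    axis = 1  # channel dim of NCHW
    mu_x = x.mean(axis=axis, keepdims=True)
    sd_x = x.std(axis=axis, keepdims=True)
    mu_s = s.mean(axis=axis, keepdims=True)
    sd_s = s.std(axis=axis, keepdims=True)
    return sd_s * (x - mu_x) / (sd_x + eps) + mu_s

rng = np.random.default_rng(0)
x, s = rng.standard_normal((2, 1, 3, 8, 8))
out = spatial_adain(x, s)
print(out.shape)  # (1, 3, 8, 8)
```

Because each pixel carries its own statistics, the injected signal is spatially varying, which is what lets the content branch's stroke layout survive normalization.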
In one possible embodiment, the sample text content information with low resolution may also be used as an input to the predetermined text generation network. Starting from the characters with low resolution, the preset character generation network can generate the characters with corresponding styles only by slightly adjusting the details such as the positions, the thicknesses and the like of all strokes according to the styles, so that the difficulty of character generation is greatly reduced.
In a possible embodiment, the present application further provides a font quality detection model and a font similarity detection model, which are used to measure the font quality and the innovation degree of the generated target text.
Illustratively, the font quality detection model is mainly used to evaluate whether a glyph is complete, whether it is hard to recognize, and so on. It may be a binary character quality evaluation model in which the positive samples of the data set are high-quality characters generated by the model together with real characters, and the negative samples are low-quality characters generated by the model; tests show that the classification accuracy of the model is 93%, indicating strong quality discrimination capability. Further, in order to avoid the generated characters being too similar to the training characters, a font similarity detection model may be trained to evaluate the similarity between the fonts of the generated characters and those of the training characters, and the evaluation index may be Cosine Similarity.
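The cosine-similarity index itself is a one-liner; the sketch below uses hypothetical 3-dimensional font feature vectors purely for illustration (real font features would come from the trained similarity model):

```python
import numpy as np

def cosine_similarity(a, b):
    """Cosine similarity between two font feature vectors, in [-1, 1]."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

gen = np.array([0.2, 0.9, 0.1])    # hypothetical generated-font features
train = np.array([0.2, 0.9, 0.1])  # identical training-font features
print(round(cosine_similarity(gen, train), 2))  # 1.0 -> too similar, filter out
print(round(cosine_similarity(gen, np.array([0.9, -0.1, 0.3])), 2))
```

A similarity near 1.0 flags a generated font as insufficiently novel, matching the filtering behavior described for the fourth row of FIG. 11.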
In one embodiment, a predetermined number of font styles may be selected, each font containing a predetermined number of common characters (content) as a sample text set. And training the sample character set according to the training process. Fig. 9 is a schematic diagram illustrating a target text generated by using the text generation method according to the embodiment of the present application, according to an exemplary embodiment. As shown in fig. 9, the font style coding network and the character generation network provided by the embodiment of the present application can generate characters with different styles, including square, straight, abstract and artistic characters, and characters with different thickness degrees, and the same character can maintain its own style. On the premise of ensuring the style diversity, the generated characters have good integrity and higher quality, and the problems of stroke missing and combination basically can not occur.
In one possible embodiment, an Empirical Mode Decomposition (EMD) model may also be trained and used to generate text. Table 1 is a table comparing the fusion effect between the text generated by EMD and the target text generated by the method in the embodiment of the present application. FIG. 10 is a graph illustrating a fusion effect comparison, according to an exemplary embodiment.
As shown in Table 1, the target text generated by the method in the embodiment of the present application has a better fusion effect. FID is an abbreviation of Fréchet Inception Distance, a score measuring the distance between generated and real image distributions (lower is better). The yield refers to the ratio of generated characters of good quality to the total generated characters.
TABLE 1 Comparison of fusion effects

Model              FID ↓    Yield ↑
EMD                32.66    63.30%
This application   28.10    91.49%
As shown in fig. 10, the EMD-fused text is prone to font incompleteness problems such as missing and merged strokes: for example, the "true" character in the first line lacks a horizontal stroke, and the "practise" and "solution" characters have merged strokes, producing wrongly written characters. The target text generated by the embodiment of the application rarely exhibits missing or merged strokes, preserves the structure of the characters well, and rarely produces wrong characters, greatly improving character quality.
Fig. 11 is a diagram illustrating a similar font obtained by detecting the font of the target text through a font similarity detection model according to an exemplary embodiment. As shown in fig. 11, it can be seen from the first three lines that the font of the generated target text has a distinct style difference from the fonts of the text in the training set (i.e., the closest font and the next closest font in fig. 11). However, the font of the character displayed in the fourth row is highly similar to the font of the characters in the character library, and the character can be automatically filtered out to screen out improper characters, so that the quality of target character generation is further improved.
In one possible embodiment, a method for generating text, a method for training a network for generating text, and the like, as disclosed herein, wherein font style information, text content information, and the like may be stored on a blockchain.
FIG. 12 illustrates a text generation apparatus in accordance with an exemplary embodiment. As shown in fig. 12, the apparatus may include at least:
the text acquiring module 501 may be configured to acquire at least two candidate texts with different font style information from the to-be-processed text set.
The font style information generating module 503 may be configured to generate font style information corresponding to each of the at least two candidate characters based on the font style encoding network.
A target character generation module 505, configured to generate a target character corresponding to the font style information and the character content information based on a character generation network; the text content information represents the content of the text in the text set to be processed.
The font style coding network and the character generation network are obtained by performing character generation training on a preset neural network, and the hidden space of the sample font style information used in the character generation training process is restricted to be in normal distribution.
In an exemplary embodiment, the font style information generating module 503 may be configured to map the at least two candidate characters into the normal distribution based on the font style coding network, so as to obtain the font style information corresponding to each of the at least two candidate characters.
In an exemplary embodiment, the apparatus may further include:
the target font style information determining module may be configured to calculate an average value of the font style information based on the font style encoding network, so as to obtain target font style information.
Accordingly, the target character generation module 505 may be configured to generate the target characters corresponding to the target font style information and the character content information based on the character generation network.
In an exemplary embodiment, the apparatus may further include:
the first decoupling module can be used for decoupling the font style information.
The first standardization processing module can be used for carrying out standardization processing on the decoupled font style information to obtain standard font style information.
The second standardization processing module can be used for carrying out standardization processing on the character content information to obtain standard character content information.
Accordingly, the target character generation module 505 may be configured to generate the target character corresponding to the standard font style information and the standard character content information based on the character generation network.
FIG. 13 illustrates a network training apparatus for word generation according to an example embodiment. As shown in fig. 13, the apparatus may further include:
a sample text obtaining module 601, configured to extract a first sample text and a second sample text from the sample text set.
The training module 603 is configured to perform text generation training on a preset neural network based on the sample font style information of the first sample text word and the sample text content information of the second sample text word, and constrain a hidden space of the sample font style information to be normal distribution in the text generation training process to obtain a font style coding network and a text generation network.
In an exemplary embodiment, the preset neural network includes a preset font style encoding network, a preset character generation network, and a preset discrimination network, and the training module 603 may include:
A mapping unit, which may be configured to map the first sample character to the latest normal distribution based on the preset font style encoding network, so as to obtain current sample font style information.
The sample font style information generating unit may be configured to process the standard normal distribution and the current sample font style information based on the preset font style encoding network to obtain the sample font style information.
A current character generating unit, which may be configured to generate a current character corresponding to the sample font style information and the sample character content information based on the preset character generation network.
A loss information determining unit, which may be configured to perform discrimination processing on the current character, a reference character, the sample font style information, and the sample character content information based on the preset discrimination network, so as to obtain loss information.
A network generating unit, configured to train the preset font style encoding network and the preset character generating network based on the loss information to obtain the font style encoding network and the character generating network; the reference characters represent the characters of the sample character content information under the sample font style information.
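The discrimination and training steps above can be sketched with standard adversarial (GAN-style) losses. The patent does not disclose its loss formulation or network architectures, so the discriminator below is a stand-in callable scoring (character, style, content) triples in (0, 1), and the non-saturating log losses are an assumption, not the patent's actual loss information:

```python
import math

# Illustrative GAN-style sketch; the discriminator is a stand-in
# callable and the log losses are an assumption, not the patent's
# disclosed loss information.
def generator_loss(discriminator, current_char, style, content):
    """Generator is trained so its output is scored as real."""
    score = discriminator(current_char, style, content)
    return -math.log(max(score, 1e-12))

def discriminator_loss(discriminator, current_char, reference_char,
                       style, content):
    """Discriminator scores the reference character (ground truth)
    high and the generated current character low."""
    real = discriminator(reference_char, style, content)
    fake = discriminator(current_char, style, content)
    return -math.log(max(real, 1e-12)) - math.log(max(1.0 - fake, 1e-12))

# An undecided discriminator (score 0.5 everywhere) for demonstration.
undecided = lambda char, style, content: 0.5
g_loss = generator_loss(undecided, "generated", "style", "content")
d_loss = discriminator_loss(undecided, "generated", "reference",
                            "style", "content")
```

Training alternates between minimizing the discriminator loss and minimizing the generator loss, which jointly updates the preset font style encoding network and the preset character generation network.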
In an exemplary embodiment, the sample font style information generating unit may include:
A feature vector obtaining subunit, configured to randomly obtain, based on the preset font style encoding network, a feature vector corresponding to the current sample font style information from the standard normal distribution.
An updating subunit, configured to determine difference information between the feature vector and the current sample font style information based on the preset font style encoding network, update the latest normal distribution based on the difference information, and take the updated distribution as the new latest normal distribution.
A repeating subunit, which may be configured to repeat, based on the preset font style encoding network, the steps from mapping the first sample character to the latest normal distribution to obtain current sample font style information through taking the updated distribution as the new latest normal distribution, until the difference information satisfies a preset condition.
A sample font style information determining subunit, which may be configured to take, based on the preset font style encoding network, the current sample font style information obtained when the difference information satisfies the preset condition as the sample font style information.
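The subunits above describe an iterative refinement of the latest normal distribution. The update rule is not specified in the patent, so the sketch below parameterizes the distribution by its mean, nudges it toward a feature vector drawn once from the standard normal, and stops once the difference information falls below a preset threshold; all names (`fit_latest_distribution`, `encode`) are illustrative:

```python
import random

# Loose sketch of the iterative refinement: the "latest normal
# distribution" is parameterized by its mean `mu` and nudged toward a
# feature vector sampled from the standard normal until the difference
# information satisfies the preset condition (|diff| < tol). The patent
# leaves the actual update rule unspecified.
def fit_latest_distribution(encode, char, mu=0.0, lr=0.5, tol=1e-4,
                            max_iters=1000, seed=0):
    rng = random.Random(seed)
    feature = rng.gauss(0.0, 1.0)   # feature vector from N(0, 1)
    style = encode(char, mu)        # current sample font style information
    for _ in range(max_iters):
        diff = feature - style      # difference information
        if abs(diff) < tol:         # preset condition satisfied
            return style
        mu += lr * diff             # update the latest normal distribution
        style = encode(char, mu)    # re-map under the updated distribution
    return style

# A stand-in encoder that reads the style straight off the mean; the
# loop then converges geometrically to the sampled feature vector.
style_info = fit_latest_distribution(lambda char, mu: mu, "A")
```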
In an exemplary embodiment, the apparatus may further include:
A second decoupling module, which may be configured to decouple the sample font style information.
A third standardization processing module, which may be configured to perform standardization processing on the decoupled sample font style information to obtain standard sample font style information.
A fourth standardization processing module, which may be configured to perform standardization processing on the sample character content information to obtain standard sample character content information.
Accordingly, the current character generating unit may be configured to generate the current character corresponding to the standard sample character content information and the standard sample font style information based on the preset character generation network.
It should be noted that the apparatus embodiments provided in the embodiments of the present application are based on the same inventive concept as the method embodiments described above.
An embodiment of the present application further provides an electronic device. The electronic device includes a processor and a memory, where the memory stores at least one instruction or at least one program, and the at least one instruction or the at least one program is loaded and executed by the processor to implement the character generation method or the network training method for character generation provided in any of the above embodiments.
Embodiments of the present application further provide a computer-readable storage medium, which may be disposed in a terminal to store at least one instruction or at least one program for implementing the character generation method or the network training method for character generation, where the at least one instruction or the at least one program is loaded and executed by a processor to implement the character generation method or the network training method for character generation provided in the above method embodiments.
Optionally, in this embodiment of the specification, the storage medium may be located in at least one network server among a plurality of network servers of a computer network. Optionally, in this embodiment, the storage medium may include, but is not limited to, various media capable of storing program code, such as a USB flash disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a removable hard disk, a magnetic disk, or an optical disk.
The memory of the embodiments of the present disclosure may be used to store software programs and modules, and the processor executes various functional applications and data processing by running the software programs and modules stored in the memory. The memory may mainly include a program storage area and a data storage area, where the program storage area may store an operating system, application programs required by functions, and the like, and the data storage area may store data created according to use of the device, and the like. Further, the memory may include high-speed random access memory, and may also include non-volatile memory, such as at least one magnetic disk storage device, flash memory device, or other non-volatile solid-state storage device. The memory may further include a memory controller to provide the processor with access to the memory.
Embodiments of the present application also provide a computer program product or computer program, which includes computer instructions stored in a computer-readable storage medium. The processor of a computer device reads the computer instructions from the computer-readable storage medium and executes them, so that the computer device performs the character generation method or the network training method for character generation provided in the above method embodiments.
The character generation method or the network training method for character generation provided in the embodiments of the present application may be executed in a terminal, a computer terminal, a server, or a similar computing device. Taking running on a server as an example, FIG. 14 is a block diagram of a hardware structure of a server for character generation or network training for character generation according to an embodiment of the present application. As shown in FIG. 14, the server 700 may vary considerably depending on configuration or performance, and may include one or more Central Processing Units (CPUs) 710 (the CPU 710 may include, but is not limited to, a processing device such as a microprocessor (MCU) or a programmable logic device (FPGA)), a memory 730 for storing data, and one or more storage media 720 (e.g., one or more mass storage devices) for storing applications 723 or data 722. The memory 730 and the storage medium 720 may be transient storage or persistent storage. The program stored in the storage medium 720 may include one or more modules, each of which may include a series of instruction operations for the server. Still further, the central processor 710 may be configured to communicate with the storage medium 720 to execute the series of instruction operations in the storage medium 720 on the server 700. The server 700 may also include one or more power supplies 760, one or more wired or wireless network interfaces 750, one or more input/output interfaces 740, and/or one or more operating systems 721, such as Windows Server™, Mac OS X™, Unix™, Linux™, FreeBSD™, and the like.
The input/output interface 740 may be used to receive or transmit data via a network. Specific examples of the above network may include a wireless network provided by a communication provider of the server 700. In one example, the input/output interface 740 includes a network interface controller (NIC), which may be connected to other network devices through a base station so as to communicate with the Internet. In one example, the input/output interface 740 may be a Radio Frequency (RF) module, which is used to communicate with the Internet wirelessly.
It will be understood by those skilled in the art that the structure shown in fig. 14 is only an illustration and is not intended to limit the structure of the electronic device. For example, server 700 may also include more or fewer components than shown in FIG. 14, or have a different configuration than shown in FIG. 14.
It should be noted that the order of the above embodiments of the present application is for description only and does not imply any ranking of the embodiments. Particular embodiments have been described above; other embodiments are within the scope of the following claims. In some cases, the actions or steps recited in the claims may be performed in a different order than in the embodiments and still achieve desirable results. In addition, the processes depicted in the accompanying figures do not necessarily require the particular order shown, or sequential order, to achieve desirable results. In some embodiments, multitasking and parallel processing may also be possible or advantageous.
The embodiments in the present specification are described in a progressive manner, and the same and similar parts among the embodiments are referred to each other, and each embodiment focuses on the differences from the other embodiments. In particular, as for the device and server embodiments, since they are substantially similar to the method embodiments, the description is simple, and the relevant points can be referred to the partial description of the method embodiments.
It will be understood by those skilled in the art that all or part of the steps of implementing the above embodiments may be implemented by hardware, or may be implemented by a program instructing relevant hardware, and the program may be stored in a computer-readable storage medium, and the above-mentioned storage medium may be a read-only memory, a magnetic disk or an optical disk, etc.
The present invention is not intended to be limited to the particular embodiments shown and described, but is to be accorded the widest scope consistent with the principles and novel features herein disclosed.

Claims (13)

1. A method for generating words, the method comprising:
acquiring at least two candidate characters with different font style information from a character set to be processed;
generating font style information corresponding to the at least two candidate characters based on a font style coding network;
generating target characters corresponding to the font style information and the character content information based on a character generation network; the character content information represents the content of the characters in the character set to be processed;
the font style coding network and the character generation network are obtained by performing character generation training on a preset neural network, and the hidden space of the sample font style information used in the character generation training process is constrained to a normal distribution.
2. The method of claim 1, wherein generating font style information corresponding to each of the at least two candidate characters based on the font style coding network comprises:
and mapping the at least two candidate characters to the normal distribution based on the font style coding network to obtain font style information corresponding to the at least two candidate characters.
3. The method of claim 1, further comprising:
calculating the average value of the font style information based on the font style coding network to obtain target font style information;
generating the target characters corresponding to the font style information and the character content information based on the character generation network, wherein the generating comprises:
and generating the target characters corresponding to the target font style information and the character content information based on the character generation network.
4. The method according to any one of claims 1 to 3, further comprising:
decoupling said font style information;
standardizing the decoupled font style information to obtain standard font style information;
standardizing the text content information to obtain standard text content information;
correspondingly, the generating the target text corresponding to the font style information and the text content information based on the text generation network includes:
and generating the target characters corresponding to the standard font style information and the standard character content information based on the character generation network.
5. A network training method for generating characters is characterized by comprising the following steps:
extracting a first sample word and a second sample word from the sample word set;
performing character generation training on a preset neural network based on the sample font style information of the first sample character and the sample character content information of the second sample character, and constraining the hidden space of the sample font style information to a normal distribution in the character generation training process, so as to obtain a font style coding network and a character generation network.
6. The method according to claim 5, wherein the preset neural network includes a preset font style coding network, a preset character generation network, and a preset discrimination network, the character generation training is performed on the preset neural network based on the sample font style information of the first sample character and the sample character content information of the second sample character, and in the character generation training process, the hidden space of the sample font style information is constrained to be normally distributed to obtain the font style coding network and the character generation network, including:
mapping the first sample character to the latest normal distribution based on the preset font style coding network to obtain current sample font style information;
processing the standard normal distribution and the current sample font style information based on the preset font style coding network to obtain the sample font style information;
generating the current characters corresponding to the sample font style information and the sample character content information based on the preset character generation network;
based on the preset discrimination network, performing discrimination processing on the current characters, the reference characters, the sample font style information and the sample character content information to obtain loss information;
training the preset font style coding network and the preset character generation network based on the loss information to obtain the font style coding network and the character generation network; the reference characters represent characters of the sample character content information under the sample font style information.
7. The method according to claim 6, wherein the processing the standard normal distribution and the current sample font style information based on the preset font style encoding network to obtain the sample font style information comprises:
based on the preset font style coding network, randomly acquiring a feature vector corresponding to the font style information of the current sample from the standard normal distribution;
determining difference information between the feature vector and the current sample font style information based on the preset font style coding network, updating the latest normal distribution based on the difference information, and taking the updated distribution as the new latest normal distribution;
repeating, based on the preset font style coding network, the steps from mapping the first sample character to the latest normal distribution to obtain the current sample font style information through taking the updated distribution as the new latest normal distribution, until the difference information satisfies the preset condition;
and taking the current sample font style information when the difference information meets the preset condition as the sample font style information based on the preset font style coding network.
8. The method of claim 6, further comprising:
decoupling the sample font style information;
standardizing the decoupled sample font style information to obtain standard sample font style information;
standardizing the sample text content information to obtain standard sample text content information;
correspondingly, the generating the current text corresponding to the sample font style information and the sample text content information based on the preset text generation network includes:
and generating the current characters corresponding to the standard sample character content information and the standard sample font style information based on the preset character generation network.
9. A text generation apparatus, the apparatus comprising:
the character acquisition module is used for acquiring at least two candidate characters with different font style information from the character set to be processed;
the font style information generating module is used for generating font style information corresponding to the at least two candidate characters based on a font style coding network;
the target character generation module is used for generating target characters corresponding to the font style information and the character content information based on a character generation network; the text content information represents the content of the text in the text set to be processed;
the font style coding network and the character generation network are obtained by performing character generation training on a preset neural network, and the hidden space of the sample font style information used in the character generation training process is constrained to be in normal distribution.
10. A network training apparatus for character generation, the apparatus comprising:
the sample character acquisition module is used for extracting a first sample character and a second sample character from the sample character set;
a training module, configured to perform character generation training on a preset neural network based on the sample font style information of the first sample characters and the sample character content information of the second sample characters, and to constrain the hidden space of the sample font style information to a normal distribution in the character generation training process, so as to obtain a font style coding network and a character generation network.
11. An electronic device, comprising a processor and a memory, wherein at least one instruction or at least one program is stored in the memory, and the at least one instruction or the at least one program is loaded by the processor and executed to implement the method for generating words according to any one of claims 1 to 4 or the method for training a network for generating words according to any one of claims 5 to 8.
12. A computer-readable storage medium, having at least one instruction or at least one program stored therein, which is loaded and executed by a processor to implement the method for generating words according to any one of claims 1 to 4 or the method for network training of word generation according to any one of claims 5 to 8.
13. A computer program product comprising a computer program, wherein the computer program, when executed by a processor, implements the method for generating words according to any one of claims 1 to 4 or the method for network training of word generation according to any one of claims 5 to 8.
CN202210144287.XA 2022-02-17 2022-02-17 Character generation method, network training method, device, equipment and storage medium Pending CN114626335A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210144287.XA CN114626335A (en) 2022-02-17 2022-02-17 Character generation method, network training method, device, equipment and storage medium


Publications (1)

Publication Number Publication Date
CN114626335A true CN114626335A (en) 2022-06-14

Family

ID=81899532



Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115147850A (en) * 2022-06-30 2022-10-04 北京百度网讯科技有限公司 Training method of character generation model, character generation method and device thereof
CN115222845A (en) * 2022-08-01 2022-10-21 北京元亦科技有限公司 Method and device for generating style font picture, electronic equipment and medium



Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination