CN116152368A - Font generation method, training method, device and equipment of font generation model - Google Patents

Font generation method, training method, device and equipment of font generation model

Info

Publication number
CN116152368A
CN116152368A (application number CN202211665987.XA)
Authority
CN
China
Prior art keywords
target
font
text
fonts
model
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202211665987.XA
Other languages
Chinese (zh)
Inventor
周敏
王驰
葛铁铮
姜宇宁
许威威
鲍虎军
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhejiang University ZJU
Original Assignee
Zhejiang University ZJU
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhejiang University ZJU filed Critical Zhejiang University ZJU
Priority to CN202211665987.XA priority Critical patent/CN116152368A/en
Publication of CN116152368A publication Critical patent/CN116152368A/en
Pending legal-status Critical Current

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06T: IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 11/00: 2D [Two Dimensional] image generation
    • G06T 11/001: Texturing; Colouring; Generation of texture or colour
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 40/00: Handling natural language data
    • G06F 40/10: Text processing
    • G06F 40/103: Formatting, i.e. changing of presentation of documents
    • G06F 40/109: Font handling; Temporal or kinetic typography
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 20/00: Machine learning

Abstract

Embodiments of the present application provide a font generation method, and a training method, apparatus, and device for a font generation model. The method includes the following steps: acquiring a text image of reference characters of a target font and text images of target characters in multiple preset basic fonts, wherein different basic fonts have different skeleton styles; determining style features of the target font based on the text image of the reference characters, and obtaining a plurality of content features of the target characters based on the text images of the target characters; and mixing the style features with the plurality of content features of the target characters to obtain the target characters corresponding to the target font. The method and apparatus can improve the consistency between the skeleton style of the generated characters and the skeleton style of the reference characters.

Description

Font generation method, training method, device and equipment of font generation model
Technical Field
The present application relates to the field of computer technology, and in particular to a font generation method and a training method, apparatus, and device for a font generation model.
Background
In scenarios such as advertising creative production and web page design, information needs to be conveyed through text, and different fonts affect the overall visual effect or reading experience. Therefore, in practical applications, fonts may need to be generated automatically as required.
In general, the font generation method used when automatically generating fonts is as follows: content features are extracted from a text image of target characters in a certain known font (for example, regular script), style features are extracted from a text image of reference characters of the target font, and after the two features are combined, the target characters corresponding to the target font are generated by a decoder. However, this method suffers from the problem that the skeleton style of the generated characters is inconsistent with the skeleton style of the reference characters.
Disclosure of Invention
Embodiments of the present application provide a font generation method, and a training method, apparatus, and device for a font generation model, which are used to solve the prior-art problem that the skeleton style of generated characters is inconsistent with the skeleton style of the reference characters.
In a first aspect, an embodiment of the present application provides a font generating method, including:
acquiring a text image of reference characters of a target font and text images of target characters in multiple preset basic fonts, wherein different basic fonts have different skeleton styles;
determining style characteristics of the target font based on the text image of the reference characters, and obtaining a plurality of content characteristics of the target characters based on the text images of the target characters;
and mixing the style characteristics with the plurality of content characteristics of the target characters to obtain the target characters corresponding to the target font.
In a second aspect, an embodiment of the present application provides a font generating method, including:
acquiring a generation request sent by a terminal, wherein the generation request is used for requesting to generate target characters corresponding to target fonts;
under the condition that the generation request is acquired, obtaining the target characters corresponding to the target font, wherein the target characters corresponding to the target font are obtained in the following manner: determining style characteristics of the target font based on the text image of the reference characters of the target font, and obtaining a plurality of content characteristics of the target characters based on the text images of the target characters in multiple preset basic fonts, wherein different basic fonts have different skeleton styles; and mixing the style characteristics with the plurality of content characteristics of the target characters to obtain the target characters corresponding to the target font;
and sending the target text corresponding to the target font to the terminal.
In a third aspect, an embodiment of the present application provides a font generating method, including:
sending a generation request to a server in response to a user operation, wherein the generation request is used for requesting generation of target characters corresponding to a target font;
acquiring target characters corresponding to the target fonts sent by the server based on the generation request; the target text corresponding to the target font is obtained by adopting the following mode: determining style characteristics of the target font based on the text image of the reference text of the target font, and obtaining a plurality of content characteristics of the target text based on the text images of the target text of the preset various basic fonts, wherein the skeleton styles of different basic fonts are different; mixing the style characteristics and a plurality of content characteristics of the target characters to obtain target characters corresponding to the target fonts;
and outputting the target text corresponding to the target font.
In a fourth aspect, an embodiment of the present application provides a training method for a font generation model, including:
constructing a basic model, wherein training parameters are arranged in the basic model, and the basic model is used for generating fonts based on a text image of a target text with a certain known font;
iteratively adjusting training parameters of the basic model by utilizing corresponding training samples until a second preset requirement is met, so as to obtain the trained basic model;
adjusting the basic model to a model for font generation based on text images of target characters in multiple preset basic fonts, so as to obtain a font generation model to be trained, wherein different basic fonts have different skeleton styles;
and iteratively adjusting training parameters of the font generating model by utilizing corresponding training samples until the second preset requirement is met, thereby obtaining the trained font generating model.
In a fifth aspect, an embodiment of the present application provides a font generating device, including:
the acquisition module is used for acquiring the character images of the reference characters of the target fonts and the character images of the target characters of the preset multiple basic fonts, wherein the skeleton styles of the different basic fonts are different;
the characteristic module is used for determining style characteristics of the target font based on the character image of the reference character and obtaining a plurality of content characteristics of the target character based on the character image of the target character;
and the mixing module is used for carrying out mixing processing on the style characteristics and the content characteristics of the target characters so as to obtain the target characters corresponding to the target fonts.
In a sixth aspect, an embodiment of the present application provides a font generating device, including:
the acquisition module is used for acquiring a generation request sent by the terminal, wherein the generation request is used for requesting generation of target characters corresponding to a target font;
the obtaining module is used for obtaining the target characters corresponding to the target font under the condition that the generation request is acquired, wherein the target characters corresponding to the target font are obtained in the following manner: determining style characteristics of the target font based on the text image of the reference characters of the target font, and obtaining a plurality of content characteristics of the target characters based on the text images of the target characters in multiple preset basic fonts, wherein different basic fonts have different skeleton styles; and mixing the style characteristics with the plurality of content characteristics of the target characters to obtain the target characters corresponding to the target font;
and the sending module is used for sending the target text corresponding to the target font to the terminal.
In a seventh aspect, an embodiment of the present application provides a font generating device, including:
the sending module is used for responding to the operation of the user and sending a generation request to the server, wherein the generation request is used for requesting to generate target characters corresponding to the target fonts;
the acquisition module is used for acquiring the target characters corresponding to the target font sent by the server based on the generation request, wherein the target characters corresponding to the target font are obtained in the following manner: determining style characteristics of the target font based on the text image of the reference characters of the target font, and obtaining a plurality of content characteristics of the target characters based on the text images of the target characters in multiple preset basic fonts, wherein different basic fonts have different skeleton styles; and mixing the style characteristics with the plurality of content characteristics of the target characters to obtain the target characters corresponding to the target font;
and the output module is used for outputting the target characters corresponding to the target fonts.
In an eighth aspect, an embodiment of the present application provides a training apparatus for a font generation model, including:
the building module is used for building a basic model, training parameters are set in the basic model, and the basic model is used for generating fonts based on the text images of the target text of a certain known font;
the first training module is used for iteratively adjusting training parameters of the basic model by utilizing corresponding training samples until a second preset requirement is met, so that the trained basic model is obtained;
the adjusting module is used for adjusting the basic model to a model for font generation based on text images of target characters in multiple preset basic fonts, so as to obtain a font generation model to be trained, wherein different basic fonts have different skeleton styles;
and the second training module is used for iteratively adjusting training parameters of the font generation model by utilizing corresponding training samples until the second preset requirement is met, so that the trained font generation model is obtained.
In a ninth aspect, embodiments of the present application provide an electronic device, including: a memory, a processor; wherein the memory stores one or more computer instructions which, when executed by the processor, implement the method of any of the first aspects.
In a tenth aspect, embodiments of the present application provide an electronic device, including: a memory, a processor; wherein the memory stores one or more computer instructions which, when executed by the processor, implement the method of any of the second aspects.
In an eleventh aspect, embodiments of the present application provide an electronic device, including: a memory, a processor; wherein the memory stores one or more computer instructions which, when executed by the processor, implement the method of any of the third aspects.
In a twelfth aspect, embodiments of the present application provide an electronic device, including: a memory, a processor; wherein the memory stores one or more computer instructions which, when executed by the processor, implement the method of any of the fourth aspects.
In a thirteenth aspect, embodiments of the present application provide a computer-readable storage medium, on which a computer program is stored, which, when executed, implements the method according to any of the first aspects.
In a fourteenth aspect, embodiments of the present application provide a computer readable storage medium having stored thereon a computer program which, when executed, implements the method according to any of the second aspects.
In a fifteenth aspect, embodiments of the present application provide a computer readable storage medium having stored thereon a computer program which, when executed, implements the method according to any of the third aspects.
In a sixteenth aspect, embodiments of the present application provide a computer readable storage medium having stored thereon a computer program which, when executed, implements the method according to any of the fourth aspects.
Embodiments of the present application also provide a computer program for implementing the method according to any of the first aspects when the computer program is executed by a computer.
Embodiments of the present application also provide a computer program for implementing the method according to any of the second aspects when the computer program is executed by a computer.
Embodiments of the present application also provide a computer program for implementing the method according to any of the third aspects when the computer program is executed by a computer.
Embodiments of the present application also provide a computer program for implementing the method according to any of the fourth aspects, when the computer program is executed by a computer.
According to the embodiments of the present application, a text image of reference characters of a target font and text images of target characters in multiple preset basic fonts can be acquired, wherein different basic fonts have different skeleton styles. Style features of the target font are determined based on the text image of the reference characters, a plurality of content features of the target characters are obtained based on the text images of the target characters, and the style features are mixed with the plurality of content features of the target characters to obtain the target characters corresponding to the target font. Font generation is thus based on text images of the target characters in multiple basic fonts; compared with generation based on only one known font, the number of fonts referenced during generation is increased, which makes the generation result more stable, so the consistency between the skeleton style of the generated characters and the skeleton style of the reference characters can be improved.
Drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present application or in the prior art, the drawings needed in the description of the embodiments or the prior art are briefly introduced below. Obviously, the drawings in the following description show some embodiments of the present application, and a person of ordinary skill in the art can derive other drawings from them without inventive effort.
Fig. 1 is a schematic view of an application scenario in an embodiment of the present application;
FIG. 2 is a schematic flow chart of a font generating method according to an embodiment of the present application;
FIG. 3A shows reference characters of a target font according to an embodiment of the present application;
FIG. 3B shows reference characters of a target font according to another embodiment of the present application;
FIG. 4A shows the effect of a font generated according to an embodiment of the present application;
FIG. 4B shows the effect of a font generated according to another embodiment of the present application;
FIG. 5 is a schematic structural diagram of a basic model according to an embodiment of the present disclosure;
FIG. 6 is a schematic diagram of calculating projection distribution loss according to an embodiment of the present application;
FIG. 7 is a schematic diagram of determining multiple base fonts according to an embodiment of the present application;
FIG. 8 is a schematic diagram of determining corresponding weights of a base font according to an embodiment of the present application;
FIG. 9 is a schematic diagram of a font generation model according to an embodiment of the present application;
FIG. 10 is a schematic diagram of optimizing style features provided by an embodiment of the present application;
FIG. 11 is a flowchart illustrating a font generating method according to another embodiment of the present application;
fig. 12 is a flowchart of a font generating method according to another embodiment of the present application;
FIG. 13 is a flowchart of a training method of a font generation model according to an embodiment of the present application;
fig. 14 is a schematic structural diagram of a font generating device according to an embodiment of the present application;
fig. 15 is a schematic structural diagram of an electronic device according to an embodiment of the present application;
fig. 16 is a schematic structural diagram of a font generating device according to another embodiment of the present application;
fig. 17 is a schematic structural diagram of an electronic device according to another embodiment of the present application;
fig. 18 is a schematic structural diagram of a font generating device according to another embodiment of the present application;
fig. 19 is a schematic structural diagram of an electronic device according to another embodiment of the present disclosure;
FIG. 20 is a schematic structural diagram of a training device for a font generation model according to an embodiment of the present application;
Fig. 21 is a schematic structural diagram of an electronic device according to another embodiment of the present application.
Detailed Description
For the purposes of making the objects, technical solutions and advantages of the embodiments of the present application more clear, the technical solutions of the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is apparent that the described embodiments are some embodiments of the present application, but not all embodiments. All other embodiments, which can be made by one of ordinary skill in the art based on the embodiments herein without making any inventive effort, are intended to be within the scope of the present application.
The terminology used in the embodiments of the application is for the purpose of describing particular embodiments only and is not intended to be limiting of the application. As used in this application and the appended claims, the singular forms "a," "an," and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise, the "plurality" generally includes at least two, but does not exclude the case of at least one.
It should be understood that the term "and/or" used herein merely describes an association relationship between associated objects and indicates that three relationships may exist. For example, "A and/or B" may mean: A exists alone, both A and B exist, or B exists alone. In addition, the character "/" herein generally indicates an "or" relationship between the associated objects.
The words "if", as used herein, may be interpreted as "at … …" or "at … …" or "in response to a determination" or "in response to a detection", depending on the context. Similarly, the phrase "if determined" or "if detected (stated condition or event)" may be interpreted as "when determined" or "in response to determination" or "when detected (stated condition or event)" or "in response to detection (stated condition or event), depending on the context.
It should also be noted that the terms "comprises", "comprising", or any other variation thereof are intended to cover a non-exclusive inclusion, such that a product or system that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such a product or system. Without further limitation, an element preceded by "comprising a ..." does not exclude the presence of other identical elements in a product or system that comprises that element.
In addition, the sequence of steps in the method embodiments described below is only an example and is not strictly limited.
Fig. 1 is a schematic view of an application scenario of the method provided in an embodiment of the present application. As shown in fig. 1, the application scenario may include a terminal 11 and a server 12, which may be communicatively connected. The terminal 11 may be, for example, a mobile phone, a tablet computer, a notebook computer, a desktop computer, or a wearable device, and may execute the font generation method or the training method of the font generation model provided in the embodiments of the present application. The server 12 is configured to provide a background service for the terminal 11; the server 12 may be, for example, a physical server or a cloud server, and may also execute the font generation method or the training method of the font generation model provided in the embodiments of the present application.
In general, content features are obtained from a text image of target characters in a certain known font (for example, regular script), style features are obtained from a text image of reference characters of the target font, and after the two features are combined, the target characters corresponding to the target font are generated by a decoder. However, the skeleton style of characters generated in this way is often inconsistent with the skeleton style of the reference characters.
In order to solve the technical problem that the skeleton style of the generated characters is inconsistent with that of the reference characters, in the embodiments of the present application, a text image of reference characters of the target font and text images of target characters in multiple preset basic fonts can be acquired, wherein different basic fonts have different skeleton styles. Style features of the target font are determined based on the text image of the reference characters, a plurality of content features of the target characters are obtained based on the text images of the target characters, and the style features are mixed with the plurality of content features to obtain the target characters corresponding to the target font.
It should be noted that the skeletons of different characters in the same font differ from character to character, yet they share common characteristics. In the embodiments of the present application, these shared characteristics of the skeletons are recorded as the skeleton style.
It should be noted that the font generation method provided in the embodiments of the present application can be applied to any scenario in which target characters need to be generated based on reference characters, including but not limited to advertising creative production and web page design.
Some embodiments of the present application are described in detail below with reference to the accompanying drawings. The embodiments described below and features of the embodiments may be combined with each other without conflict.
Fig. 2 is a flow chart of a font generating method according to an embodiment of the present application, and the embodiment may be applied to the terminal 11 or the server 12 in fig. 1. As shown in fig. 2, the method of the present embodiment may include:
step 21, acquiring a text image of reference characters of a target font and text images of target characters in multiple preset basic fonts, wherein different basic fonts have different skeleton styles;
step 22, determining style features of the target font based on the text image of the reference characters, and obtaining a plurality of content features of the target characters based on the text images of the target characters;
and step 23, mixing the style features with the plurality of content features of the target characters to obtain the target characters corresponding to the target font.
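For illustration, the three steps above can be sketched in Python (PyTorch) as follows. This is a minimal sketch under stated assumptions: the module names (style_encoder, content_encoder, mixer), the tensor shapes, and the simple averaging fusion are hypothetical stand-ins, not the implementation fixed by this application.

```python
import torch

# Hypothetical sketch of steps 21-23; module names and shapes are assumptions.
def generate_target_character(style_encoder, content_encoder, mixer,
                              ref_image, base_font_images):
    # Step 21 inputs: ref_image is a reference character of the target font
    # (C, H, W); base_font_images holds the target character rendered in each
    # of the K preset basic fonts.
    # Step 22: one style feature from the reference character ...
    style_feat = style_encoder(ref_image.unsqueeze(0))
    # ... and one content feature per basic font.
    content_feats = torch.stack([content_encoder(img.unsqueeze(0))
                                 for img in base_font_images])
    # Step 23: fuse the content features (simple average here) and mix them
    # with the style feature to produce the generated character image.
    fused = content_feats.mean(dim=0)
    return mixer(fused, style_feat)
```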
In this embodiment of the present application, the target font refers to the font to be generated. One or more characters of the target font may be designed in advance by a designer; these characters are recorded as reference characters. Reference characters of one target font may be as shown in fig. 3A, and reference characters of another target font may be as shown in fig. 3B. The target characters refer to characters to be generated in the target font. Based on the text image of the reference characters of the target font, the target characters corresponding to the target font, specifically text images of the target characters in the target font, can be generated by the font generation method provided in the embodiments of the present application.
The multiple preset basic fonts are all known fonts. A known font refers to an existing font, which may also be called a source font; known fonts include, for example, Song (Songti) and regular script. Different basic fonts among the multiple basic fonts have different skeleton styles. In one embodiment, the multiple basic fonts may specifically be known fonts whose skeleton styles are representative, with different basic fonts representing different skeleton styles. Optionally, the basic fonts may be determined manually, that is, several known fonts within a preset range are manually selected as basic fonts. Alternatively, the basic fonts may be determined automatically, an implementation of which is described later.
The specific manner of acquiring the text image of the reference text and the text image of the target text is not limited, and for example, the text images of the reference text and the target text sent by other devices may be received.
In the embodiment of the present application, after obtaining the text images of the reference text and the target text, the style characteristics of the target font may be determined based on the text images of the reference text, and the text images of the target text may be used to obtain a plurality of content characteristics of the target text, and then the style characteristics and the plurality of content characteristics of the target text may be mixed to obtain the target text corresponding to the target font.
In one embodiment, the plurality of content features may first be fused, and the fusion result may then be mixed with the style features to obtain the target characters corresponding to the target font. Alternatively, if the mixing network supports such input, the style features and the plurality of content features may be mixed directly to obtain the target characters corresponding to the target font.
Based on this, step 23 may specifically include: fusing a plurality of content features of the target text to obtain content fusion features; and mixing the style characteristics and the content fusion characteristics to obtain target characters corresponding to the target fonts.
In one embodiment, the content fusion feature may be obtained by averaging a plurality of content features. In this case, the importance of the content features corresponding to the different base fonts is the same at the time of fusion.
In another embodiment, the content fusion feature may be obtained by a weighted sum of the plurality of content features. Based on this, fusing the plurality of content features of the target characters to obtain the content fusion feature may specifically include: performing a weighted summation of the plurality of content features of the target characters based on the weights corresponding to the basic fonts to obtain the content fusion feature. In this case, the content features corresponding to different basic fonts may have different importance during fusion, and the weight corresponding to a basic font reflects the importance of its content features in the fusion. In one embodiment, the weight corresponding to a basic font may be positively correlated with the similarity between that basic font and the target font, so that the more similar a basic font is to the target font, the more its content features contribute to the fusion. The content fusion feature can then better express the skeleton style of the target font, further improving the consistency between the skeleton style of the generated characters and that of the reference characters.
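The two fusion options above can be sketched as follows; this is an illustrative sketch, and the tensor shapes and the normalization of the weights are assumptions rather than details fixed by the application.

```python
import torch

def fuse_content_features(content_feats, weights=None):
    # content_feats: (K, C, H, W), one content feature map per basic font.
    # weights: optional (K,) tensor of per-basic-font weights.
    if weights is None:
        # Equal importance for every basic font: plain averaging.
        return content_feats.mean(dim=0)
    # Weighted summation: normalize so the weights sum to 1, then broadcast.
    w = weights / weights.sum()
    return (w.view(-1, 1, 1, 1) * content_feats).sum(dim=0)
```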
The weight corresponding to the base font can be determined manually, for example, the similarity between the base font and the target font can be determined through human eye observation, so that the weight corresponding to the base font is determined; alternatively, the weights corresponding to the base fonts may be automatically determined, and the specific manner of automatically determining the weights corresponding to the base fonts may be found in the following description.
In one embodiment, font generation may be performed by a font generation model including a style encoder, a content encoder, and a mixer. The style encoder can be used for extracting style features from the text image of the reference characters, the content encoder can be used for extracting content features from the text images of the target characters, and the mixer can be used for mixing the features to generate the target characters corresponding to the target font. It should be noted that the specific manner in which the mixer mixes features is not limited in this application; for example, mixing may be performed by means of Adaptive Instance Normalization (AdaIN).
Based on the above, determining the style features of the target font based on the text image of the reference characters includes: inputting the text image of the reference characters into the style encoder for processing to determine the style features of the target font. Obtaining the plurality of content features of the target characters based on the text images of the target characters includes: separately inputting the text images of the target characters in the multiple basic fonts into the content encoder for processing to obtain the plurality of content features of the target characters. Mixing the style features with the content fusion feature to obtain the target characters corresponding to the target font includes: inputting the style features and the content fusion feature into the mixer for processing to obtain the target characters corresponding to the target font.
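As an illustration of AdaIN-style mixing, the sketch below modulates the (fused) content feature map with per-channel scale and shift parameters derived from the style vector and decodes the result. The layer sizes and the decoder are placeholders assumed for the example; the application does not prescribe this exact structure.

```python
import torch
import torch.nn as nn

class AdaINMixer(nn.Module):
    """Hypothetical mixer: adaptive instance normalization of the fused
    content features, conditioned on the style vector, then decoding."""
    def __init__(self, channels, style_dim):
        super().__init__()
        self.norm = nn.InstanceNorm2d(channels, affine=False)
        self.to_scale_shift = nn.Linear(style_dim, 2 * channels)
        self.decoder = nn.Sequential(        # placeholder decoder
            nn.Conv2d(channels, channels, 3, padding=1), nn.ReLU(),
            nn.Conv2d(channels, 1, 3, padding=1), nn.Tanh())

    def forward(self, content_feat, style_vec):
        gamma, beta = self.to_scale_shift(style_vec).chunk(2, dim=1)
        h = self.norm(content_feat)          # strip the content's own statistics
        # Re-scale and re-shift each channel with style-derived statistics.
        h = (1 + gamma)[..., None, None] * h + beta[..., None, None]
        return self.decoder(h)               # one-channel glyph image
```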
Since the font generation model performs font generation based on the content fusion feature, it may also be referred to as a Content Fusion Module (CFM). Based on the reference characters shown in fig. 3A, the font effect generated with the font generation model provided in the embodiments of the present application may be as shown in fig. 4A; based on the reference characters shown in fig. 3B, the generated font effect may be as shown in fig. 4B.
In one embodiment, when the number of the reference characters is one, the inputting the text image of the reference characters into the style encoder for processing to determine the style characteristics of the target font may specifically include: and inputting the text image of the reference text into a style encoder to obtain style characteristics which are used as style characteristics of the target font.
In another embodiment, when there are a plurality of reference characters of the target font, inputting the text images of the reference characters into the style encoder for processing to determine the style features of the target font may specifically include: inputting the text images of the plurality of reference characters into the style encoder for processing, and taking the average of the resulting plurality of style features as the style features of the target font.
In yet another embodiment, a Style Iterative Inference (SII) strategy can be used to obtain the style features of the target font. Specifically, in the inference stage, iterative optimization can be performed on the style features starting from their initial values to obtain optimized style features, and prediction is then performed with the optimized style features. This optimizes the style representation of the target font, improves details such as strokes and edges of the generated characters, and alleviates problems such as stroke errors in characters with many strokes and uneven edges. The optimized style features are taken as the style features of the target font.
Based on this, in one embodiment, inputting the text image of the reference characters into the style encoder for processing to determine the style features of the target font may specifically include: inputting the text image of the reference characters into the style encoder for processing to obtain an initial value of the style features to be input into the mixer; and iteratively adjusting the style features input into the mixer by using corresponding training samples until a preset requirement (hereinafter referred to as the first preset requirement) is met, so as to obtain the determined style features of the target font.
When the number of the reference characters is one, inputting the character images of the reference characters into a style encoder for processing to obtain style characteristics serving as initial values of the style characteristics; when the number of the reference characters is plural, an average value of a plurality of style characteristics obtained by processing the character images of the plural reference characters by the style encoder may be used as an initial value of the style characteristics.
The first preset requirement refers to a requirement that a difference between the generated text image and the sample text image needs to be satisfied, and the difference between the generated text image and the sample text image may be represented by a loss value of a loss function (hereinafter referred to as a first loss function), and the first preset requirement may be a requirement for the loss value of the first loss function, for example, the loss value of the first loss function is less than or equal to a first threshold value.
For example, training examples for iteratively adjusting style characteristics of an input mixer may include sample text images of sample text of a base font and sample text images of sample text of a target font. The sample character image of the sample character of the base font can be input into a content encoder for processing to obtain content characteristics, the obtained content characteristics can also be input into a mixer to obtain a generated character image of the sample character of the target font, and the style characteristics input into the mixer can be iteratively adjusted based on the difference between the generated character image of the sample character of the target font and the sample character image of the sample character of the target font. The sample character image of the sample character of the target font can be designed in advance by a designer.
In practical application, the difference between the generated text image of the sample text of the target font and the sample text image of the sample text of the target font may be represented by a loss value of the loss function, based on which, in one embodiment, the style characteristics of the input mixer are iteratively adjusted based on the difference between the generated text image of the sample text of the target font and the sample text image of the sample text of the target font until the first preset requirement is satisfied, which may specifically include: determining a loss value of a first loss function based on the generated text image of the sample text of the target font and the sample text image of the sample text of the target font; and iteratively adjusting the style characteristics of the input mixer until the loss value of the first loss function meets a first preset requirement.
The first loss function may be constructed based on one or more losses, and this embodiment does not limit which losses are used. For example, when the font generation model is implemented based on Deformable Generative Networks for Unsupervised Font Generation (DG-Font), the losses used to construct the first loss function may include one or more of the four losses used in DG-Font: 1) an image reconstruction L1 (mean absolute error) loss for preserving domain-invariant features; 2) a content consistency loss for ensuring consistency between the generated image and the input content image; 3) an adversarial loss for generating realistic images; and 4) a deformation offset normalization to avoid excessive offsets in the Feature Deformation Skip Connection (FDSC) module.
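The iterative adjustment of the style features can be sketched as follows: the style vector itself is treated as the optimization variable while the network weights stay frozen. The optimizer choice, step count, and the plain L1 reconstruction loss are assumptions standing in for the first loss function.

```python
import torch
import torch.nn.functional as F

def refine_style_feature(style_init, content_encoder, mixer,
                         base_images, target_images, steps=100, lr=0.01):
    # base_images: sample characters in a basic font; target_images: the same
    # sample characters hand-designed in the target font (the supervision).
    style = style_init.clone().requires_grad_(True)
    opt = torch.optim.Adam([style], lr=lr)   # only the style vector is updated
    for _ in range(steps):
        generated = mixer(content_encoder(base_images), style)
        loss = F.l1_loss(generated, target_images)  # stand-in loss
        opt.zero_grad()
        loss.backward()
        opt.step()
    return style.detach()   # optimized style features of the target font
```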
In the embodiments of the present application, optionally, a font generation model that generates fonts based on text images of the target characters in multiple basic fonts can be constructed and trained directly. Alternatively, a model that generates fonts based on the text image of the target characters in a certain known font can be constructed and trained first, and the font generation model based on multiple basic fonts can then be obtained from the trained model, which helps simplify the training process.
Based on this, in one embodiment, the font generation model is trained in the following manner. Step A: construct a basic model, wherein training parameters are set in the basic model, and the basic model is used for font generation based on a text image of the target characters in a certain known font. Step B: iteratively adjust the training parameters of the basic model by using corresponding training samples until a preset requirement (hereinafter referred to as the second preset requirement) is met, so as to obtain the trained basic model. Step C: adjust the basic model to a model for font generation based on text images of the target characters in the multiple basic fonts, so as to obtain a font generation model to be trained. Step D: iteratively adjust the training parameters of the font generation model by using corresponding training samples until the second preset requirement is met, so as to obtain the trained font generation model.
The second preset requirement refers to a requirement that a difference between the generated text image and the sample text image needs to be met, and the difference between the generated text image and the sample text image may be represented by a loss value of a loss function (hereinafter referred to as a second loss function), and the second preset requirement may be a requirement for the loss value of the second loss function, for example, the loss value of the second loss function is less than or equal to a second threshold.
Through step A and step B, a model (recorded as the basic model) that performs font generation based on the text image of the target characters in a certain known font can be trained. By separating content and style, the basic model can learn to transfer characters to the target domain; the target font can be understood as a font in the target domain, and correspondingly the source font can be understood as a font in the source domain.
For example, the basic model may be structured as shown in fig. 5. Referring to fig. 5, a text image of a Chinese character in the target font may be input to the style encoder 51 of the basic model for processing to obtain a style vector S1; a text image of the same Chinese character in a certain existing font (e.g., regular script) may be input to the content encoder 52 of the basic model for processing to obtain a content feature map C1; and the style vector S1 and the content feature map C1 may be input to the mixer 53 of the basic model for processing, thereby obtaining the generated text image of that Chinese character in the target font. It should be noted that FDSC-1 and FDSC-2 in fig. 5 represent two feature deformation skip connection modules; for their details, reference may be made to the related art, which is not repeated here.
The training samples for iteratively adjusting the training parameters of the base model may include sample text images of sample text of a certain known font and sample text images of sample text of a target font, where the sample text images of sample text of a certain known font may be used to obtain content features, the sample text images of sample text of a target font may be used to obtain style features, the obtained content features and style features may be used to obtain a generated text image of sample text of a target font, and the training parameters of the base model may be iteratively adjusted based on differences between the generated text images of sample text of the target font and the sample text images of sample text of the target font.
In one embodiment, based on the difference between the generated text image of the sample text of the target font and the sample text image of the sample text of the target font, the training parameters of the basic model are iteratively adjusted until the second preset requirement is met, which specifically may include: determining a loss value of the second loss function based on the generated text image of the sample text of the target font and the sample text image of the sample text of the target font; and iteratively adjusting training parameters of the basic model until the loss value of the second loss function meets a second preset requirement.
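Under the description above, the two training phases share the same control flow: adjust the training parameters iteratively until the loss meets the preset requirement. A generic sketch of that loop is shown below; the loss interface and the threshold-based stopping rule are assumptions for illustration, not the exact procedure of the application.

```python
def train_until(model, sample_loader, loss_fn, optimizer, threshold):
    # Iteratively adjust the training parameters until the average loss
    # satisfies the preset requirement (here: falls to or below a threshold).
    while True:
        total = 0.0
        for batch in sample_loader:
            loss = loss_fn(model, batch)   # e.g., the second loss function
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
            total += loss.item()
        if total / len(sample_loader) <= threshold:
            return model
```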
In one embodiment, the losses used to construct the second loss function may include a Projected Character Loss (PCL). The PCL refers to the similarity between a first pixel-value distribution, obtained by accumulating the pixel values of the sample text image of the sample characters in the target font along a target projection direction, and a second pixel-value distribution, obtained by accumulating the pixel values of the generated text image of the sample characters in the target font along the same target projection direction. Since a distribution is sensitive to relative relationships, the projected distribution loss pays more attention to the global shape of a character, so the second loss function constructed with the PCL strengthens supervision of the overall character shape. For example, the Wasserstein distance or the KL divergence (Kullback-Leibler divergence) may be used to calculate the similarity between the pixel-value distributions.
There may be one or more target projection directions. Taking six target projection directions as an example, as shown in fig. 6, for each target projection direction, the similarity between the first pixel-value distribution and the second pixel-value distribution along that direction is calculated to obtain the projection distribution loss in that direction (e.g., PCL_1, PCL_2, PCL_3, and PCL_4 in fig. 6), and the losses obtained in the individual directions are combined. It should be noted that the target projection directions in fig. 6 are only examples.
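A sketch of the projected character loss for two axis-aligned projection directions is given below; the 1D Wasserstein distance is computed as the L1 distance between cumulative distributions. The handling of additional (rotated) directions and the exact normalization are assumptions made for the example.

```python
import torch

def projected_character_loss(generated, target, eps=1e-8):
    # generated, target: (B, 1, H, W) glyph images.
    loss = 0.0
    for dim in (-1, -2):   # accumulate along width, then along height
        p = generated.sum(dim=dim)
        q = target.sum(dim=dim)
        # Normalize the accumulated pixel values into distributions.
        p = p / (p.sum(dim=-1, keepdim=True) + eps)
        q = q / (q.sum(dim=-1, keepdim=True) + eps)
        # 1D Wasserstein-1 distance = L1 distance between the two CDFs.
        loss = loss + (p.cumsum(-1) - q.cumsum(-1)).abs().sum(dim=-1).mean()
    return loss
```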
Illustratively, when the basic model is implemented based on DG-Font, the losses used to construct the second loss function may also include the four losses described above. The losses used to construct the first loss function may be the same as or different from those used to construct the second loss function.
Optionally, basic fonts with representative skeleton styles may be found using the trained basic model. In one embodiment, the multiple basic fonts may be determined as follows: inputting text images of the same character in various known fonts into the trained basic model for processing to obtain a plurality of content features of that character; clustering the plurality of content features to obtain a plurality of clusters; and determining the multiple basic fonts based on the plurality of clusters. Each cluster has a corresponding cluster center; for example, the known font corresponding to the content feature closest to the cluster center in each cluster may be determined as a basic font. Since clustering groups similar items together, and dissimilarity may include dissimilarity of skeleton styles, fonts with representative skeleton styles can be selected based on the clustering result.
For example, as shown in fig. 7, text images of the Chinese character "永" (permanent) in multiple known fonts (for example, Song, regular script, and other known fonts) may be input to the content encoder 52 of the trained basic model for processing, so as to obtain the content feature map of the character in each known font. These content feature maps may then be clustered into a plurality of clusters, and based on the clusters, multiple basic fonts, such as Song (song) and clerical script (li), can be obtained.
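A clustering-based selection of basic fonts can be sketched as follows; k-means and the nearest-to-center rule are assumptions matching the description above, not a prescribed algorithm.

```python
import numpy as np
from sklearn.cluster import KMeans

def select_base_fonts(content_feats, font_names, num_base_fonts):
    # content_feats: (N, D) array, the flattened content feature of the same
    # character rendered in N known fonts (from the trained basic model).
    km = KMeans(n_clusters=num_base_fonts, n_init=10).fit(content_feats)
    base_fonts = []
    for center in km.cluster_centers_:
        # Per cluster, pick the known font whose feature is nearest the center.
        idx = int(np.linalg.norm(content_feats - center, axis=1).argmin())
        base_fonts.append(font_names[idx])
    return base_fonts
```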
Optionally, the weights corresponding to the basic fonts may be determined using the trained basic model. In one embodiment, the weights corresponding to the multiple basic fonts may be determined as follows: inputting text images of the same character in the multiple basic fonts and a text image of the same character in the target font into the trained basic model for processing to obtain the content features of that character in each basic font and in the target font; determining the similarity between the content features of the character in each basic font and the content features of the character in the target font; and determining the weights corresponding to the multiple basic fonts based on these similarities.
For example, as shown in fig. 8, the text images of the same Chinese character in the multiple basic fonts and the text image of that character in the target font may be input to the content encoder 52 of the trained basic model for processing, so as to obtain the content feature map of the character in each basic font and in the target font; weight calculation is then performed based on the obtained content feature maps to obtain the content fusion weights W.
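The weight calculation can be sketched as follows, making the weights positively correlated with similarity as described above; the cosine similarity and the softmax normalization are assumptions for illustration.

```python
import torch
import torch.nn.functional as F

def content_fusion_weights(base_feats, target_feat):
    # base_feats: (K, D) content features of the same character in the K basic
    # fonts; target_feat: (D,) content feature of that character in the target
    # font (both from the trained basic model).
    sims = F.cosine_similarity(base_feats, target_feat.unsqueeze(0), dim=1)
    return F.softmax(sims, dim=0)  # weights sum to 1, higher for more similar
```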
Through step C and step D, the font generation model can be obtained based on the trained basic model. The resulting font generation model aims to extract content features adaptively by combining the content features of the basic fonts, and this adaptivity is mainly embodied in adapting to the skeleton style.
The structure of the font generation model may follow the basic model, and the content features (inputs of the mixer) may be replaced by weighted sums of the basic fonts, so that the basic model may be adjusted to a model for font generation based on the text images of the target text of the plurality of basic fonts, thereby obtaining a font generation model to be trained. Since the content features generated by the text image of a single known font are replaced with a weighted sum of the base fonts, the font generation model obtained by adapting the base model can be trained using the fused content features.
The training samples for iteratively adjusting the training parameters of the font generation model may include sample text images of sample characters in the multiple basic fonts and sample text images of the sample characters in the target font. The sample text images of the sample characters in the multiple basic fonts may be used to obtain the content fusion feature, the sample text images of the sample characters in the target font may be used to obtain the style features, and the obtained content fusion feature and style features may be used to obtain a generated text image of the sample characters in the target font; the training parameters of the font generation model may then be iteratively adjusted based on the difference between the generated text image of the sample characters in the target font and the sample text images of the sample characters in the target font.
In one embodiment, based on a difference between the generated text image of the sample text of the target font and the sample text image of the sample text of the target font, iteratively adjusting the training parameters of the font generation model until the second preset requirement is met, specifically may include: determining a loss value of the second loss function based on the generated text image of the sample text of the target font and the sample text image of the sample text of the target font; and iteratively adjusting training parameters of the font generation model until the loss value of the second loss function meets a second preset requirement. It should be noted that, the description of the second loss function may refer to the previous description, and will not be repeated here.
For example, the structure of the font generation model may be as shown in fig. 9. Referring to fig. 9, the text image of the Chinese character "Chi" in the target font may be input to the style encoder 91 of the font generation model for processing to obtain a style vector S2, and the text images of the Chinese character "tong" in the multiple basic fonts may be input to the content encoder 92 of the font generation model for processing to obtain a content feature map matrix C_b. The content fusion feature C2 can be obtained by fusing C_b based on the content fusion weights W, and the style vector S2 and the content fusion feature C2 are input to the mixer 93 of the font generation model for processing, so that the text image of the Chinese character "tong" in the target font can be obtained.
Fig. 10 shows a schematic diagram of style feature optimization based on the font generation model shown in fig. 9. Referring to fig. 10, sample text images of a plurality of sample characters in the target font may be input to the style encoder 91 of the font generation model for processing to obtain an initial value S3 of the style features to be input to the mixer 93 of the font generation model. That is, the style features input to the mixer 93 are initialized to S3; then, with the sample text images of the sample characters as supervision, the loss is calculated with the style vector as the variable, the gradient is back-propagated to tune the style vector, and finally prediction is performed with the optimized style vector.
According to the font generation method, a text image of reference characters of a target font and text images of target characters in multiple preset basic fonts are acquired, wherein different basic fonts have different skeleton styles; style features of the target font are determined based on the text image of the reference characters, a plurality of content features of the target characters are obtained based on the text images of the target characters, and the style features are mixed with the plurality of content features of the target characters to obtain the target characters corresponding to the target font. Font generation is thus based on text images of the target characters in multiple basic fonts; compared with generation based on only one known font, the number of fonts referenced during generation is increased, which makes the generation result more stable, so the consistency between the skeleton style of the generated characters and the skeleton style of the reference characters can be improved.
Fig. 11 is a flowchart of a font generating method according to another embodiment of the present application, and the embodiment may be applied to the server 12 in fig. 1. As shown in fig. 11, the method of the present embodiment may include:
step 111, obtaining a generation request sent by a terminal, wherein the generation request is used for requesting to generate a target character corresponding to a target font;
step 112, obtaining a target character corresponding to the target font under the condition that the generation request is obtained; the target text corresponding to the target font is obtained by adopting the following modes: determining style characteristics of the target fonts based on the text images of the reference characters of the target fonts, and obtaining a plurality of content characteristics of the target characters based on the text images of the target characters of the preset various basic fonts, wherein the skeleton styles of the different basic fonts are different; mixing the style characteristics and a plurality of content characteristics of the target characters to obtain target characters corresponding to the target fonts;
and 113, transmitting the target text corresponding to the target font to the terminal.
The generation request may, for example, carry an identifier of the target font and one or more target characters in a known font, which together indicate the characters requested to be generated in the target font; in other embodiments, the request to generate the target text corresponding to the target font may also be expressed in other manners.
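A hypothetical request payload is sketched below; the embodiment does not fix any wire format, and every field name and value here is an assumption for illustration only.

```python
# Hypothetical shape of a generation request (all names are illustrative).
generation_request = {
    "target_font_id": "handwriting_001",  # identifier of the target font
    "known_font": "SimSun",               # known font carrying the characters
    "target_chars": ["桐", "驰"],          # target characters to generate (examples)
}
```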
In one embodiment, obtaining the corresponding target text of the target font may specifically include: determining style characteristics of the target fonts based on the text images of the reference characters of the target fonts, and obtaining a plurality of content characteristics of the target characters based on the text images of the target characters of the preset various basic fonts, wherein the skeleton styles of the different basic fonts are different; and mixing the style characteristics and a plurality of content characteristics of the target characters to obtain the target characters corresponding to the target fonts.
In another embodiment, obtaining the target text corresponding to the target font may specifically include: receiving target characters corresponding to target fonts sent by other equipment, wherein the other equipment can determine style characteristics of the target fonts based on character images of reference characters of the target fonts, obtain a plurality of content characteristics of the target characters based on character images of target characters of preset various base fonts, and perform mixed processing on the style characteristics and the plurality of content characteristics of the target characters to obtain the target characters corresponding to the target fonts.
It should be noted that, regarding the specific manner of obtaining the target text corresponding to the target font, reference may be made to the description of the foregoing embodiment, which is not repeated herein.
After the target text corresponding to the target font is obtained, it may be sent to the terminal, so that the terminal can output, e.g., display, print, etc., the target text corresponding to the target font.
According to the font generation method provided by the embodiment of the application, the generation request sent by the terminal is acquired, where the generation request is used for requesting to generate the target text corresponding to the target font; the target text corresponding to the target font is obtained in the case that the generation request is acquired; and the target text corresponding to the target font is sent to the terminal. The target text corresponding to the target font is obtained in the following manner: determining the style characteristic of the target font based on the text image of the reference text of the target font, obtaining a plurality of content characteristics of the target text based on the text images of the target text of the preset multiple base fonts, and mixing the style characteristic and the plurality of content characteristics of the target text to obtain the target text corresponding to the target font. In this way, the text image returned to the terminal is generated based on the target text of multiple base fonts, thereby improving the consistency of the skeleton style of the target text obtained by the terminal with the skeleton style of the reference text.
Fig. 12 is a flowchart of a font generating method according to another embodiment of the present application, and the embodiment may be applied to the terminal 11 in fig. 1. As shown in fig. 12, the method of the present embodiment may include:
step 121, a generation request is sent to a server in response to the operation of a user, wherein the generation request is used for requesting to generate target characters corresponding to a target font;
step 122, obtaining a target text corresponding to the target font sent by the server based on the generation request; the target text corresponding to the target font is obtained by adopting the following modes: determining style characteristics of the target fonts based on the text images of the reference characters of the target fonts, and obtaining a plurality of content characteristics of the target characters based on the text images of the target characters of the preset various basic fonts, wherein the skeleton styles of the different basic fonts are different; mixing the style characteristics and a plurality of content characteristics of the target characters to obtain target characters corresponding to the target fonts;
and step 123, outputting the target text corresponding to the target font.
The operation of the user may be, for example, any type of operation such as a touch operation, a voice operation, or the like, which can indicate that the user needs to generate a target text corresponding to the target font.
The specific mode of outputting the target text corresponding to the target font can be flexibly realized according to the requirement. Illustratively, step 123 may specifically include: and displaying the target characters corresponding to the target fonts and/or printing the target characters corresponding to the target fonts. Or, for example, step 123 may specifically include: and sending the target text corresponding to the target font to other equipment so as to be displayed and/or printed by the other equipment.
Note that the embodiment shown in fig. 12 is an embodiment on the terminal side corresponding to the embodiment shown in fig. 11, and as for the specific implementation on the server side, reference may be made to the description in the embodiment shown in fig. 11.
According to the font generation method provided by the embodiment of the application, the generation request is sent to the server in response to the operation of the user, the target text corresponding to the target font sent by the server based on the generation request is acquired, and the target text corresponding to the target font is output. The target text corresponding to the target font is obtained in the following manner: determining the style characteristic of the target font based on the text image of the reference text of the target font, obtaining a plurality of content characteristics of the target text based on the text images of the target text of the preset multiple basic fonts, and mixing the style characteristic and the plurality of content characteristics of the target text to obtain the target text corresponding to the target font. In this way, the text image output by the terminal is generated based on the target text of multiple basic fonts, improving the consistency of the skeleton style of the output target text with the skeleton style of the reference text.
Fig. 13 is a flow chart of a training method of a font generation model according to an embodiment of the present application, and the embodiment may be applied to the terminal 11 or the server 12 in fig. 1. As shown in fig. 13, the method of the present embodiment may include:
step 131, constructing a basic model, wherein training parameters are set in the basic model, and the basic model is used for generating fonts based on the text images of the target text with a certain known font;
step 132, iteratively adjusting training parameters of the basic model by using the corresponding training samples until a second preset requirement is met, thereby obtaining the trained basic model;
step 133, adjusting the basic model into a model for generating fonts based on character images of target characters of preset multiple basic fonts, so as to obtain a font generation model to be trained, wherein the skeleton styles of different basic fonts are different;
and step 134, iteratively adjusting training parameters of the font generation model by utilizing the corresponding training samples until a second preset requirement is met, thereby obtaining the trained font generation model.
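For illustration, the four steps above may be sketched as follows; build_base_model, widen_content_input and train_until_requirement_met are hypothetical helpers standing in for steps 131 to 134, and the choice of five base fonts is only an example.

```python
def train_font_generation_model(single_font_data, multi_font_data):
    # Steps 131-132: construct a base model that generates fonts from the
    # text image of a target text in one known font, then train it until
    # the second preset requirement is met.
    base = build_base_model(num_base_fonts=1)             # hypothetical helper
    base = train_until_requirement_met(base, single_font_data)
    # Step 133: adjust the trained base model so that its content branch
    # accepts text images of the target text in several base fonts.
    model = widen_content_input(base, num_base_fonts=5)   # hypothetical helper
    # Step 134: iteratively adjust the font generation model's parameters
    # until the second preset requirement is met again.
    return train_until_requirement_met(model, multi_font_data)
```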
It should be noted that, regarding the specific manner of training to obtain the font generating model, reference may be made to the related description in the embodiment shown in fig. 2, which is not repeated herein.
According to the training method for the font generation model provided by the embodiment of the application, a basic model for font generation based on the text image of a target text of a certain known font is constructed; the training parameters of the basic model are iteratively adjusted by using the corresponding training samples until the second preset requirement is met, so that the trained basic model is obtained; the basic model is then adjusted into a model for font generation based on the text images of the target text of the preset multiple basic fonts, so that the font generation model to be trained is obtained; and the training parameters of the font generation model are iteratively adjusted by using the corresponding training samples until the second preset requirement is met, so that the trained font generation model is obtained. Since a model that performs font generation based on the text image of a single known font is constructed and trained first, and the font generation model that performs font generation based on the text images of multiple basic fonts is obtained from the trained model, the training process of the model is simplified.
Fig. 14 is a schematic structural diagram of a font generating device according to an embodiment of the present application; referring to fig. 14, this embodiment provides a font generating device, which may perform the font generating method provided in the embodiment shown in fig. 2, and specifically, the device may include:
The obtaining module 141 is configured to obtain a text image of a reference text of a target font and a text image of a target text of a preset plurality of basic fonts, where skeleton styles of different basic fonts are different;
a feature module 142, configured to determine style characteristics of the target font based on the text image of the reference text, and obtain a plurality of content characteristics of the target text based on the text image of the target text;
and the mixing module 143 is configured to perform a mixing process on the style characteristic and the plurality of content characteristics of the target text, so as to obtain the target text corresponding to the target font.
Optionally, the mixing module 143 is specifically configured to: fusing a plurality of content features of the target text to obtain content fusion features; and mixing the style characteristics and the content fusion characteristics to obtain the target characters corresponding to the target fonts.
Optionally, the mixing module 143 is configured to fuse a plurality of content features of the target text to obtain a content fusion feature, including: and carrying out weighted summation on a plurality of content features of the target text based on the weight corresponding to the base font, and obtaining a content fusion feature.
Optionally, the weight corresponding to the base font is positively correlated with the similarity between the base font and the target font.
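One way to realize this positive correlation, sketched under the assumption that a per-font similarity score is available, is a softmax over the similarities; the embodiment does not fix the concrete mapping from similarity to weight.

```python
import torch

def content_fusion(content_feats, similarities):
    """Weighted summation of the per-base-font content features.

    content_feats: tensor of shape (n_fonts, B, C, H, W)
    similarities:  tensor of shape (n_fonts,), similarity of each base
                   font to the target font (higher = more similar).
    """
    w = torch.softmax(similarities, dim=0)  # higher similarity -> larger weight
    return (w.view(-1, 1, 1, 1, 1) * content_feats).sum(dim=0)
```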
Optionally, the feature module 142 is configured to determine style characteristics of the target font based on the text image of the reference text, including: inputting the text image of the reference text into a style encoder in a font generation model for processing so as to determine the style characteristics of the target font;
the feature module 142 is configured to obtain a plurality of content features of the target text based on the text image of the target text, including: respectively inputting the text images of the target text of the multiple basic fonts into a content encoder in the font generation model for processing to obtain multiple content characteristics of the target font;
the mixing module 143 is configured to mix the style feature and the content fusion feature to obtain a target text corresponding to the target font, and includes: and inputting the style characteristics and the content fusion characteristics into a mixer in the font generation model for processing to obtain target characters corresponding to the target fonts.
Optionally, the feature module 142 is configured to input a text image of the reference text into a style encoder of the font generating model for processing to determine style characteristics of the target font, and includes: inputting the text image of the reference text into the style encoder for processing to obtain an initial value for inputting the style characteristics of the mixer; and iteratively adjusting the style characteristics input into the mixer by utilizing the corresponding training samples until a first preset requirement is met, so as to determine the style characteristics of the target fonts.
Optionally, the font generating model is obtained by training in the following manner:
constructing a basic model, wherein training parameters are arranged in the basic model, and the basic model is used for generating fonts based on a text image of a target text with a certain known font;
iteratively adjusting training parameters of the basic model by utilizing corresponding training samples until a second preset requirement is met, so as to obtain the trained basic model;
adjusting the basic model into a model for generating fonts based on the character images of the target characters of the multiple basic fonts, so as to obtain a font generation model to be trained;
and iteratively adjusting training parameters of the font generating model by utilizing corresponding training samples until the second preset requirement is met, thereby obtaining the trained font generating model.
Optionally, the second preset requirement is related to a projection distribution loss, where the projection distribution loss refers to a similarity between a first pixel value distribution obtained by accumulating, along a target projection direction, pixel values of a sample text image of a sample text of the target font and a second pixel value distribution obtained by accumulating, along the target projection direction, pixel values of a generated text image of the sample text of the target font.
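A sketch of such a projection distribution loss follows; the horizontal projection direction and the KL divergence as the similarity measure are assumptions, since the text only requires accumulating pixel values along a target direction and comparing the two resulting distributions.

```python
import torch
import torch.nn.functional as F

def projection_distribution_loss(generated, sample, dim=-1):
    """generated, sample: (B, 1, H, W) images of the same sample text."""
    p = generated.sum(dim=dim).flatten(1)  # accumulate pixel values -> 1st distribution
    q = sample.sum(dim=dim).flatten(1)     # accumulate pixel values -> 2nd distribution
    p = p / p.sum(dim=1, keepdim=True).clamp_min(1e-8)  # normalize to distributions
    q = q / q.sum(dim=1, keepdim=True).clamp_min(1e-8)
    # KL divergence as one possible measure of (dis)similarity.
    return F.kl_div(p.clamp_min(1e-8).log(), q, reduction="batchmean")
```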
Optionally, the plurality of base fonts is determined by: inputting the character images of the same characters with various known fonts into the basic model obtained by training for processing, and obtaining a plurality of content characteristics of the same characters; clustering the content features to obtain a plurality of class clusters, and determining the plurality of base fonts based on the class clusters.
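This base-font selection step might look like the following, assuming the content features have been flattened into vectors; taking the font nearest each cluster center is one possible way of determining the plurality of base fonts based on the class clusters.

```python
import numpy as np
from sklearn.cluster import KMeans

def select_base_fonts(content_feats, font_names, n_clusters=5):
    """content_feats: (n_fonts, d) array, one content feature per known font
    for the same character; font_names: list of the n_fonts font names."""
    km = KMeans(n_clusters=n_clusters, random_state=0).fit(content_feats)
    base_fonts = []
    for k in range(n_clusters):
        members = np.where(km.labels_ == k)[0]  # fonts in this class cluster
        dists = np.linalg.norm(
            content_feats[members] - km.cluster_centers_[k], axis=1)
        base_fonts.append(font_names[members[dists.argmin()]])
    return base_fonts
```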
The apparatus shown in fig. 14 may perform the method provided by the embodiment shown in fig. 2, and reference is made to the relevant description of the embodiment shown in fig. 2 for parts of this embodiment not described in detail. The implementation process and the technical effect of this technical solution are described in the embodiment shown in fig. 2, and are not described herein.
In one possible implementation, the structure of the apparatus shown in fig. 14 may be implemented as an electronic device. As shown in fig. 15, the electronic device may include: a processor 151 and a memory 152. Wherein the memory 152 stores a program supporting the controller to perform the method provided by the embodiment shown in fig. 2 described above, the processor 151 is configured to execute the program stored in the memory 152.
The program comprises one or more computer instructions, wherein the one or more computer instructions, when executed by processor 151, are capable of performing the steps of:
Acquiring a character image of a reference character of a target font and character images of target characters of preset multiple basic fonts, wherein the skeleton styles of different basic fonts are different;
determining style characteristics of the target font based on the text image of the reference text, and obtaining a plurality of content characteristics of the target text based on the text image of the target text;
and carrying out mixed processing on the style characteristics and the plurality of content characteristics of the target characters to obtain the target characters corresponding to the target fonts.
Optionally, the processor 151 is further configured to perform all or part of the steps in the embodiment shown in fig. 2.
The electronic device may also include a communication interface 153 in its structure for communicating with other devices or communication networks.
Fig. 16 is a schematic structural diagram of a font generating device according to another embodiment of the present application; referring to fig. 16, this embodiment provides a font generating device, which may perform the font generating method provided in the embodiment shown in fig. 11, and specifically, the device may include:
an obtaining module 161, configured to obtain a generation request sent by a terminal, where the generation request is used to request to generate a target text corresponding to a target font;
An obtaining module 162, configured to obtain, when the generation request is obtained, a target text corresponding to the target font; the target text corresponding to the target font is obtained by adopting the following mode: determining style characteristics of the target font based on the text image of the reference text of the target font, and obtaining a plurality of content characteristics of the target text based on the text images of the target text of the preset various basic fonts, wherein the skeleton styles of different basic fonts are different; mixing the style characteristics and a plurality of content characteristics of the target characters to obtain target characters corresponding to the target fonts;
and a sending module 163, configured to send the target text corresponding to the target font to the terminal.
The apparatus shown in fig. 16 may perform the method provided by the embodiment shown in fig. 11, and reference is made to the relevant description of the embodiment shown in fig. 11 for parts of this embodiment not described in detail. The implementation process and the technical effect of this technical solution are described in the embodiment shown in fig. 11, and are not described herein.
In one possible implementation, the apparatus shown in fig. 16 may be implemented as an electronic device, and may specifically be a server. As shown in fig. 17, the electronic device may include: a processor 171 and a memory 172. Wherein the memory 172 stores a program supporting the controller to perform the method provided by the embodiment shown in fig. 11 described above, the processor 171 is configured to execute the program stored in the memory 172.
The program comprises one or more computer instructions that when executed by the processor 171 are capable of performing the steps of:
acquiring a generation request sent by a terminal, wherein the generation request is used for requesting to generate target characters corresponding to target fonts;
under the condition that the generation request is acquired, acquiring target characters corresponding to the target fonts; the target text corresponding to the target font is obtained by adopting the following mode: determining style characteristics of the target font based on the text image of the reference text of the target font, and obtaining a plurality of content characteristics of the target text based on the text images of the target text of the preset various basic fonts, wherein the skeleton styles of different basic fonts are different; mixing the style characteristics and a plurality of content characteristics of the target characters to obtain target characters corresponding to the target fonts;
and sending the target text corresponding to the target font to the terminal.
Optionally, the processor 171 is further configured to perform all or part of the steps in the embodiment shown in fig. 11.
The electronic device may also include a communication interface 173 in its structure for communicating with other devices or communication networks.
Fig. 18 is a schematic structural diagram of a font generating device according to another embodiment of the present application; referring to fig. 18, this embodiment provides a font generating device, which may perform the font generating method provided in the embodiment shown in fig. 12, and specifically, the device may include:
a sending module 181, configured to send a generation request to a server in response to an operation of a user, where the generation request is used to request generation of a target text corresponding to a target font;
an obtaining module 182, configured to obtain a target text corresponding to the target font sent by the server based on the generation request; the target text corresponding to the target font is obtained by adopting the following mode: determining style characteristics of the target font based on the text image of the reference text of the target font, and obtaining a plurality of content characteristics of the target text based on the text images of the target text of the preset various basic fonts, wherein the skeleton styles of different basic fonts are different; mixing the style characteristics and a plurality of content characteristics of the target characters to obtain target characters corresponding to the target fonts;
and the output module 183 is configured to output a target text corresponding to the target font.
The apparatus shown in fig. 18 may perform the method provided by the embodiment shown in fig. 12, and reference is made to the relevant description of the embodiment shown in fig. 12 for parts of this embodiment not described in detail. The implementation process and the technical effect of this technical solution are described in the embodiment shown in fig. 12, and are not described herein.
In one possible implementation, the apparatus shown in fig. 18 may be implemented as an electronic device, and in particular may be a terminal. As shown in fig. 19, the electronic device may include: a processor 191 and a memory 192. Wherein the memory 192 stores a program supporting the controller to perform the method provided by the embodiment shown in fig. 12 described above, the processor 191 is configured to execute the program stored in the memory 192.
The program comprises one or more computer instructions, wherein the one or more computer instructions, when executed by the processor 191, are capable of performing the steps of:
sending a generation request to a server in response to the operation of a user, wherein the generation request is used for requesting to generate target characters corresponding to a target font;
acquiring target characters corresponding to the target fonts sent by the server based on the generation request; the target text corresponding to the target font is obtained by adopting the following mode: determining style characteristics of the target font based on the text image of the reference text of the target font, and obtaining a plurality of content characteristics of the target text based on the text images of the target text of the preset various basic fonts, wherein the skeleton styles of different basic fonts are different; mixing the style characteristics and a plurality of content characteristics of the target characters to obtain target characters corresponding to the target fonts;
And outputting the target text corresponding to the target font.
Optionally, the processor 191 is further configured to perform all or part of the steps in the foregoing embodiment shown in fig. 12.
The electronic device may also include a communication interface 193 in its architecture for communicating with other devices or communication networks.
FIG. 20 is a schematic structural diagram of a training device for a font generation model according to an embodiment of the present application; referring to fig. 20, this embodiment provides a training apparatus for a font generating model, which may perform the training method for a font generating model provided in the embodiment shown in fig. 13, and specifically the apparatus may include:
the building module 201 is configured to build a basic model, where training parameters are set in the basic model, and the basic model is configured to perform font generation based on a text image of a target text with a certain known font;
the first training module 202 is configured to iteratively adjust training parameters of the base model by using corresponding training samples until a second preset requirement is met, thereby obtaining a trained base model;
the adjustment module 203 is configured to adjust the basic model to a model for performing font generation based on a text image of a target text of a preset plurality of basic fonts, so as to obtain a font generation model to be trained, where skeleton styles of different basic fonts are different;
And the second training module 204 is configured to iteratively adjust training parameters of the font generating model by using corresponding training samples until the second preset requirement is met, thereby obtaining the trained font generating model.
The apparatus shown in fig. 20 may perform the method provided by the embodiment shown in fig. 13, and reference is made to the relevant description of the embodiment shown in fig. 13 for parts of this embodiment not described in detail. The implementation process and the technical effect of this technical solution are described in the embodiment shown in fig. 13, and are not described herein.
In one possible implementation, the structure of the apparatus shown in fig. 20 may be implemented as an electronic device. As shown in fig. 21, the electronic device may include: a processor 211 and a memory 212. Wherein the memory 212 stores a program supporting the controller to perform the method provided by the embodiment shown in fig. 13 described above, the processor 211 is configured to execute the program stored in the memory 212.
The program comprises one or more computer instructions, wherein the one or more computer instructions, when executed by the processor 211, are capable of performing the steps of:
constructing a basic model, wherein training parameters are arranged in the basic model, and the basic model is used for generating fonts based on a text image of a target text with a certain known font;
Iteratively adjusting training parameters of the basic model by utilizing corresponding training samples until a second preset requirement is met, so as to obtain the trained basic model;
the basic model is adjusted to be a model for generating fonts based on character images of target characters of preset multiple basic fonts, and a font generation model to be trained is obtained, wherein the skeleton styles of different basic fonts are different;
and iteratively adjusting training parameters of the font generating model by utilizing corresponding training samples until the second preset requirement is met, thereby obtaining the trained font generating model.
Optionally, the processor 211 is further configured to perform all or part of the steps in the embodiment shown in fig. 13.
The electronic device may also include a communication interface 213 in its structure for communicating with other devices or communication networks.
The present embodiments also provide a computer readable storage medium having a computer program stored thereon, which when executed, implements a method as described in the embodiment shown in fig. 2.
The present embodiments also provide a computer readable storage medium having a computer program stored thereon, which when executed, implements a method as described in the embodiment shown in fig. 11.
The present embodiments also provide a computer readable storage medium having a computer program stored thereon, which when executed, implements the method described in the embodiment shown in fig. 12.
The present embodiments also provide a computer readable storage medium having a computer program stored thereon, which when executed, implements a method as described in the embodiment shown in fig. 13.
The apparatus embodiments described above are merely illustrative, wherein the units described as separate components may or may not be physically separate, and the components shown as units may or may not be physical units; they may be located in one place or distributed over a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment. Those of ordinary skill in the art can understand and implement the solution without creative effort.
From the above description of the embodiments, it will be apparent to those skilled in the art that the embodiments may be implemented by software plus a necessary general-purpose hardware platform, or by a combination of hardware and software. Based on such understanding, the foregoing technical solutions, in essence or in the portions contributing to the prior art, may be embodied in the form of a computer program product, which may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, etc.) having computer-usable program code embodied therein.
The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
In one typical configuration, a computing device includes one or more processors (CPUs), input/output interfaces, network interfaces, and memory.
The memory may include a volatile memory in a computer-readable medium, such as a random access memory (RAM), and/or a nonvolatile memory, such as a read-only memory (ROM) or a flash memory (flash RAM). Memory is an example of a computer-readable medium.
Computer-readable media include permanent and non-permanent, removable and non-removable media, and may implement information storage by any method or technology. The information may be computer-readable instructions, data structures, modules of a program, or other data. Examples of storage media for a computer include, but are not limited to, phase change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technology, compact disc read-only memory (CD-ROM), digital versatile discs (DVD) or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can store information accessible by a computing device. As defined herein, computer-readable media do not include transitory computer-readable media (transmission media), such as modulated data signals and carrier waves.
Finally, it should be noted that the above embodiments are only intended to illustrate the technical solutions of the present application, not to limit them. Although the present application has been described in detail with reference to the foregoing embodiments, it should be understood by those of ordinary skill in the art that the technical solutions described in the foregoing embodiments can still be modified, or some or all of their technical features can be replaced by equivalents, and such modifications and substitutions do not depart the essence of the corresponding technical solutions from the scope of the technical solutions of the embodiments of the present application.

Claims (18)

1. A font generation method, comprising:
acquiring a character image of a reference character of a target font and character images of target characters of preset multiple basic fonts, wherein the skeleton styles of different basic fonts are different;
determining style characteristics of the target font based on the text image of the reference text, and obtaining a plurality of content characteristics of the target text based on the text image of the target text;
and carrying out mixed processing on the style characteristics and the plurality of content characteristics of the target characters to obtain the target characters corresponding to the target fonts.
2. The method according to claim 1, wherein the mixing the style characteristic and the plurality of content characteristics of the target text to obtain the target text corresponding to the target font includes:
Fusing a plurality of content features of the target text to obtain content fusion features;
and mixing the style characteristics and the content fusion characteristics to obtain the target characters corresponding to the target fonts.
3. The method of claim 2, wherein the fusing the plurality of content features of the target text to obtain content fusion features comprises: and carrying out weighted summation on a plurality of content features of the target text based on the weight corresponding to the base font, and obtaining a content fusion feature.
4. A method according to claim 3, wherein the weight corresponding to the base font is positively correlated with the degree of similarity between the base font and the target font.
5. The method of claim 2, wherein the determining style characteristics of the target font based on the text image of the reference text comprises: inputting the text image of the reference text into a style encoder in a font generation model for processing so as to determine the style characteristics of the target font;
the obtaining a plurality of content features of the target text based on the text image of the target text comprises the following steps: respectively inputting the text images of the target text of the multiple basic fonts into a content encoder in the font generation model for processing to obtain multiple content characteristics of the target font;
The step of mixing the style characteristics and the content fusion characteristics to obtain the target text corresponding to the target font comprises the following steps: and inputting the style characteristics and the content fusion characteristics into a mixer in the font generation model for processing to obtain target characters corresponding to the target fonts.
6. The method of claim 5, wherein the inputting the text image of the reference text into the style encoder of the font generation model for processing to determine the style characteristics of the target font comprises:
inputting the text image of the reference text into the style encoder for processing to obtain an initial value for inputting the style characteristics of the mixer;
and iteratively adjusting the style characteristics input into the mixer by utilizing the corresponding training samples until a first preset requirement is met, so as to determine the style characteristics of the target fonts.
7. The method of claim 5, wherein the font generation model is trained by:
constructing a basic model, wherein training parameters are arranged in the basic model, and the basic model is used for generating fonts based on a text image of a target text with a certain known font;
Iteratively adjusting training parameters of the basic model by utilizing corresponding training samples until a second preset requirement is met, so as to obtain the trained basic model;
adjusting the basic model into a model for generating fonts based on the character images of the target characters of the multiple basic fonts, so as to obtain a font generation model to be trained;
and iteratively adjusting training parameters of the font generating model by utilizing corresponding training samples until the second preset requirement is met, thereby obtaining the trained font generating model.
8. The method of claim 7, wherein the second preset requirement relates to a projection distribution loss, the projection distribution loss being a similarity between a first distribution of pixel values obtained by accumulating pixel values of a sample text image of a sample text of the target font along a target projection direction and a second distribution of pixel values obtained by accumulating pixel values of a generated text image of the sample text of the target font along the target projection direction.
9. The method of claim 7, wherein the plurality of base fonts are determined by:
Inputting the character images of the same characters with various known fonts into the basic model obtained by training for processing, and obtaining a plurality of content characteristics of the same characters;
clustering the content features to obtain a plurality of class clusters, and determining the plurality of base fonts based on the class clusters.
10. A font generation method, comprising:
acquiring a generation request sent by a terminal, wherein the generation request is used for requesting to generate target characters corresponding to target fonts;
under the condition that the generation request is acquired, acquiring target characters corresponding to the target fonts; the target text corresponding to the target font is obtained by adopting the following mode: determining style characteristics of the target font based on the text image of the reference text of the target font, and obtaining a plurality of content characteristics of the target text based on the text images of the target text of the preset various basic fonts, wherein the skeleton styles of different basic fonts are different; mixing the style characteristics and a plurality of content characteristics of the target characters to obtain target characters corresponding to the target fonts;
and sending the target text corresponding to the target font to the terminal.
11. A font generation method, comprising:
sending a generation request to a server in response to the operation of a user, wherein the generation request is used for requesting to generate target characters corresponding to a target font;
acquiring target characters corresponding to the target fonts sent by the server based on the generation request; the target text corresponding to the target font is obtained by adopting the following mode: determining style characteristics of the target font based on the text image of the reference text of the target font, and obtaining a plurality of content characteristics of the target text based on the text images of the target text of the preset various basic fonts, wherein the skeleton styles of different basic fonts are different; mixing the style characteristics and a plurality of content characteristics of the target characters to obtain target characters corresponding to the target fonts;
and outputting the target text corresponding to the target font.
12. A method for training a font generation model, comprising:
constructing a basic model, wherein training parameters are arranged in the basic model, and the basic model is used for generating fonts based on a text image of a target text with a certain known font;
Iteratively adjusting training parameters of the basic model by utilizing corresponding training samples until a second preset requirement is met, so as to obtain the trained basic model;
the basic model is adjusted to be a model for generating fonts based on character images of target characters of preset multiple basic fonts, and a font generation model to be trained is obtained, wherein the skeleton styles of different basic fonts are different;
and iteratively adjusting training parameters of the font generating model by utilizing corresponding training samples until the second preset requirement is met, thereby obtaining the trained font generating model.
13. A font generating device, comprising:
the acquisition module is used for acquiring the character images of the reference characters of the target fonts and the character images of the target characters of the preset multiple basic fonts, wherein the skeleton styles of the different basic fonts are different;
the characteristic module is used for determining style characteristics of the target font based on the character image of the reference character and obtaining a plurality of content characteristics of the target character based on the character image of the target character;
and the mixing module is used for carrying out mixing processing on the style characteristics and the content characteristics of the target characters so as to obtain the target characters corresponding to the target fonts.
14. A font generating device, comprising:
the acquisition module is used for acquiring a generation request sent by the terminal, wherein the generation request is used for requesting to generate target characters corresponding to the target fonts;
the obtaining module is used for obtaining the target characters corresponding to the target fonts under the condition that the generation request is obtained; the target text corresponding to the target font is obtained by adopting the following mode: determining style characteristics of the target font based on the text image of the reference text of the target font, and obtaining a plurality of content characteristics of the target text based on the text images of the target text of the preset various basic fonts, wherein the skeleton styles of different basic fonts are different; mixing the style characteristics and a plurality of content characteristics of the target characters to obtain target characters corresponding to the target fonts;
and the sending module is used for sending the target text corresponding to the target font to the terminal.
15. A font generating device, comprising:
the sending module is used for responding to the operation of the user and sending a generation request to the server, wherein the generation request is used for requesting to generate target characters corresponding to the target fonts;
The acquisition module is used for acquiring target characters corresponding to the target fonts sent by the server based on the generation request; the target text corresponding to the target font is obtained by adopting the following mode: determining style characteristics of the target font based on the text image of the reference text of the target font, and obtaining a plurality of content characteristics of the target text based on the text images of the target text of the preset various basic fonts, wherein the skeleton styles of different basic fonts are different; mixing the style characteristics and a plurality of content characteristics of the target characters to obtain target characters corresponding to the target fonts;
and the output module is used for outputting the target characters corresponding to the target fonts.
16. A training device for a font generation model, comprising:
the building module is used for building a basic model, training parameters are set in the basic model, and the basic model is used for generating fonts based on the text images of the target text of a certain known font;
the first training module is used for iteratively adjusting training parameters of the basic model by utilizing corresponding training samples until a second preset requirement is met, so that the trained basic model is obtained;
The adjusting module is used for adjusting the basic model into a model for generating fonts based on character images of target characters of preset multiple basic fonts, so as to obtain a font generating model to be trained, wherein the skeleton styles of different basic fonts are different;
and the second training module is used for iteratively adjusting training parameters of the font generation model by utilizing corresponding training samples until the second preset requirement is met, so that the trained font generation model is obtained.
17. An electronic device, comprising: a memory, a processor; wherein the memory stores one or more computer instructions that, when executed by the processor, implement the method of any of claims 1 to 12.
18. A computer readable storage medium, characterized in that a computer program is stored thereon, which, when executed, implements the method according to any of claims 1 to 12.
CN202211665987.XA 2022-12-23 2022-12-23 Font generation method, training method, device and equipment of font generation model Pending CN116152368A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211665987.XA CN116152368A (en) 2022-12-23 2022-12-23 Font generation method, training method, device and equipment of font generation model

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211665987.XA CN116152368A (en) 2022-12-23 2022-12-23 Font generation method, training method, device and equipment of font generation model

Publications (1)

Publication Number Publication Date
CN116152368A true CN116152368A (en) 2023-05-23

Family

ID=86372774

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211665987.XA Pending CN116152368A (en) 2022-12-23 2022-12-23 Font generation method, training method, device and equipment of font generation model

Country Status (1)

Country Link
CN (1) CN116152368A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117236284A (en) * 2023-11-13 2023-12-15 江西师范大学 Font generation method and device based on style information and content information adaptation



Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination