CN116306496A - Character generation method, training method and device of character generation model - Google Patents



Publication number
CN116306496A
Authority
CN
China
Prior art keywords
noise
style
image
word image
target
Prior art date
Legal status
Granted
Application number
CN202310271377.XA
Other languages
Chinese (zh)
Other versions
CN116306496B (en)
Inventor
郑程耀
周航
范敏虎
Current Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Original Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Beijing Baidu Netcom Science and Technology Co Ltd
Priority to CN202310271377.XA
Publication of CN116306496A
Application granted
Publication of CN116306496B
Active
Anticipated expiration


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 40/00 Handling natural language data
    • G06F 40/10 Text processing
    • G06F 40/103 Formatting, i.e. changing of presentation of documents
    • G06F 40/109 Font handling; Temporal or kinetic typography
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 11/00 2D [Two Dimensional] image generation
    • G06T 11/60 Editing figures and text; Combining figures or text

Abstract

The disclosure provides a character generation method, and a training method and apparatus for a character generation model; it relates to the field of artificial intelligence, in particular to computer vision and deep learning, and can be applied to scenarios such as AIGC. The specific implementation scheme is as follows: acquiring a target text image, a first reference style word image, and first noise, where the first noise is used to add noise to the target text image; obtaining target noise according to the target text image, the first reference style word image, and the first noise; and determining a target style word image corresponding to the target text image according to the target noise.

Description

Character generation method, training method and device of character generation model
Technical Field
The disclosure relates to the technical field of artificial intelligence, in particular to the fields of computer vision and deep learning, and can be applied to scenarios such as AIGC (AI-Generated Content). The disclosure relates in particular to a character generation method and apparatus, a training method and apparatus for a character generation model, an electronic device, a storage medium, and a computer program product.
Background
In some scenarios, characters in a new style may be generated through image style migration techniques to present artwork in diversified styles and enhance visual effects. However, the related art falls short in glyph accuracy and font-style stability when generating characters in a new style.
Disclosure of Invention
The present disclosure provides a character generation method and apparatus, a training method of a character generation model, an apparatus, an electronic device, a storage medium, and a computer program product.
According to an aspect of the present disclosure, there is provided a character generation method including: acquiring a target text image, a first reference style word image, and first noise, wherein the first noise is used to add noise to the target text image; obtaining target noise according to the target text image, the first reference style word image, and the first noise; and determining a target style word image corresponding to the target text image according to the target noise.
According to another aspect of the present disclosure, there is provided a training method of a character generation model, including: acquiring a sample style word image, a sample text image, and a second reference style word image, wherein the style type of the sample style word image is the same as that of the second reference style word image; determining a style noise word image according to second noise and the sample style word image, wherein the second noise is used to add noise to the sample style word image; inputting the second reference style word image, the sample text image, and the style noise word image into the character generation model to obtain sample noise; and adjusting parameters of the character generation model according to the sample noise and the second noise.
According to another aspect of the present disclosure, there is provided a character generation apparatus including: a first acquisition module configured to acquire a target text image, a first reference style word image, and first noise, wherein the first noise is used to add noise to the target text image; a first determination module configured to obtain target noise according to the target text image, the first reference style word image, and the first noise; and a second determination module configured to determine a target style word image corresponding to the target text image according to the target noise.
According to another aspect of the present disclosure, there is provided a training apparatus of a character generation model, including: a second acquisition module configured to acquire a sample style word image, a sample text image, and a second reference style word image, wherein the style type of the sample style word image is the same as that of the second reference style word image; a third determination module configured to determine a style noise word image according to second noise and the sample style word image, wherein the second noise is used to add noise to the sample style word image; a fourth determination module configured to input the second reference style word image, the sample text image, and the style noise word image into the character generation model to obtain sample noise; and an adjustment module configured to adjust parameters of the character generation model according to the sample noise and the second noise.
According to another aspect of the present disclosure, there is provided an electronic device including: at least one processor; and a memory communicatively coupled to the at least one processor; wherein the memory stores instructions executable by the at least one processor to enable the at least one processor to perform a method provided in accordance with the present disclosure.
According to another aspect of the present disclosure, there is provided a non-transitory computer-readable storage medium storing computer instructions for causing a computer to perform a method provided according to the present disclosure.
According to another aspect of the present disclosure, there is provided a computer program product comprising a computer program which, when executed by a processor, implements a method provided according to the present disclosure.
It should be understood that the description in this section is not intended to identify key or critical features of the embodiments of the disclosure, nor is it intended to be used to limit the scope of the disclosure. Other features of the present disclosure will become apparent from the following specification.
Drawings
The drawings are for a better understanding of the present solution and are not to be construed as limiting the present disclosure. Wherein:
FIG. 1 is a schematic diagram of an exemplary system architecture to which character generation methods and apparatus, training methods and apparatus of character generation models may be applied, according to embodiments of the present disclosure;
FIG. 2 is a flow chart of a character generation method according to an embodiment of the present disclosure;
FIG. 3A is an effect diagram of a target style word image generated based on a plurality of first reference style word images according to an embodiment of the present disclosure;
FIG. 3B is a schematic diagram of a character generation method according to an embodiment of the present disclosure;
FIG. 4 is a flow chart of a training method of a character generation model according to an embodiment of the present disclosure;
FIG. 5 is a schematic diagram of a training method of a character generation model according to an embodiment of the present disclosure;
FIG. 6 is a block diagram of a character generating apparatus according to an embodiment of the present disclosure;
FIG. 7 is a block diagram of a training apparatus of a character generation model according to an embodiment of the present disclosure; and
FIG. 8 is a block diagram of an electronic device used to implement the character generation method and the training method of the character generation model according to embodiments of the present disclosure.
Detailed Description
Exemplary embodiments of the present disclosure are described below in conjunction with the accompanying drawings, which include various details of the embodiments of the present disclosure to facilitate understanding, and should be considered as merely exemplary. Accordingly, one of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope and spirit of the present disclosure. Also, descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness.
The terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the disclosure. The terms "comprises," "comprising," and/or the like, as used herein, specify the presence of stated features, steps, operations, and/or components, but do not preclude the presence or addition of one or more other features, steps, operations, or components.
All terms (including technical and scientific terms) used herein have the same meaning as commonly understood by one of ordinary skill in the art unless otherwise defined. It should be noted that the terms used herein should be construed to have meanings consistent with the context of the present specification and should not be construed in an idealized or overly formal manner.
Where an expression like "at least one of A, B, and C" is used, it should generally be interpreted in accordance with the meaning commonly understood by those skilled in the art (e.g., "a system having at least one of A, B, and C" shall include, but not be limited to, systems having A alone, B alone, C alone, A and B together, A and C together, B and C together, and/or A, B, and C together).
FIG. 1 is a schematic diagram of an exemplary system architecture to which character generation methods and apparatus, training methods and apparatus of character generation models may be applied, according to embodiments of the present disclosure. It should be noted that fig. 1 is only an example of a system architecture to which embodiments of the present disclosure may be applied to assist those skilled in the art in understanding the technical content of the present disclosure, but does not mean that embodiments of the present disclosure may not be used in other devices, systems, environments, or scenarios.
As shown in fig. 1, a system architecture 100 according to this embodiment may include terminal devices 101, 102, 103, a network 104, and a server 105. The network 104 is used as a medium to provide communication links between the terminal devices 101, 102, 103 and the server 105. The network 104 may include various connection types, such as wired and/or wireless communication links, and the like.
The user may interact with the server 105 via the network 104 using the terminal devices 101, 102, 103 to receive or send messages or the like. Various client applications can be installed on the terminal devices 101, 102, 103. For example, image processing class applications, web browser applications, search class applications, instant messaging tools, mailbox clients or social platform software, and the like (just examples).
The terminal devices 101, 102, 103 may be a variety of electronic devices having a display screen and supporting web browsing, including but not limited to smartphones, tablets, laptop and desktop computers, and the like.
The server 105 may be an independent physical server, a server cluster or a distributed system formed by a plurality of physical servers, or a cloud server providing basic cloud computing services such as cloud computing, network service, and middleware service.
The server 105 may be a server providing various services, such as a background management server (by way of example only) providing support for websites browsed by users using the terminal devices 101, 102, 103. The background management server may analyze and process the received data such as the user request, and feed back the processing result (e.g., the web page, information, or data obtained or generated according to the user request) to the terminal device.
For example, the server 105 may acquire the target text image, the first reference style word image, and the first noise from the terminal devices 101, 102, 103 through the network 104, and obtain the target noise based on the target text image, the first reference style word image, and the first noise. Then, a target style word image corresponding to the target word image is determined from the target noise. The server 105 may also send the target style word image to the terminal devices 101, 102, 103.
Note that the character generating method provided by the embodiment of the present disclosure may be generally performed by the server 105. Accordingly, the character generating apparatus provided by the embodiments of the present disclosure may be generally provided in the server 105.
Alternatively, the character generating method provided by the embodiments of the present disclosure may also be performed by a server or a server cluster that is different from the server 105 and is capable of communicating with the terminal devices 101, 102, 103 and/or the server 105. Accordingly, the character generating apparatus provided by the embodiments of the present disclosure may also be provided in a server or a server cluster that is different from the server 105 and is capable of communicating with the terminal devices 101, 102, 103 and/or the server 105.
Alternatively, the character generating method provided by the embodiments of the present disclosure may be generally performed by the terminal device 101, 102, or 103. Accordingly, the character generating apparatus provided by the embodiments of the present disclosure may also be provided in the terminal device 101, 102, or 103.
It should be noted that, the training method of the character generation model provided in the embodiment of the present disclosure may be generally performed by the server 105. Accordingly, the training apparatus of the character generation model provided in the embodiments of the present disclosure may be generally disposed in the server 105. The training method of the character generation model provided by the embodiments of the present disclosure may also be performed by a server or a server cluster that is different from the server 105 and is capable of communicating with the terminal devices 101, 102, 103 and/or the server 105. Accordingly, the training apparatus of the character generation model provided by the embodiments of the present disclosure may also be provided in a server or a server cluster that is different from the server 105 and is capable of communicating with the terminal devices 101, 102, 103 and/or the server 105.
It should be understood that the number of terminal devices, networks and servers in fig. 1 is merely illustrative. There may be any number of terminal devices, networks, and servers, as desired for implementation.
It should be noted that the sequence numbers of the respective operations in the following methods are merely representative of the operations for the purpose of description, and should not be construed as representing the order of execution of the respective operations. The method need not be performed in the exact order shown unless explicitly stated.
Fig. 2 is a flowchart of a character generating method according to an embodiment of the present disclosure.
As shown in fig. 2, the character generating method 200 may include operations S210 to S230, for example.
In operation S210, a target text image, a first reference style word image, and first noise are acquired.
In operation S220, a target noise is obtained from the target text image, the first reference style word image, and the first noise.
In operation S230, a target style word image corresponding to the target word image is determined according to the target noise.
According to embodiments of the present disclosure, a target text image may refer to, for example, an image formed of characters that need to be converted to a target font style. The number of target text images may be at least one. It should be noted that the characters described in the present disclosure may include, for example, Chinese characters, Western characters, or numerals; the type of the characters is not limited in the present disclosure.
The font style of the characters in the target text image may refer to, for example, a conventional font of the characters, such as the bold, Song, clerical, or regular script fonts of Chinese characters, or the Times New Roman font, the Calibri font, etc., of Western characters.
The first reference style word image may refer to, for example, an image formed of characters having a target font style. The number of first reference style word images may be at least one. The font style of the characters in the first reference style character image may include a font style existing in the related art or a handwriting font style, etc. The content of the characters in the first reference style word image may be the same as or different from the content of the characters in the target word image, and is not limited herein.
In the embodiment of the disclosure, the font style of the characters in the first reference style character image can be used as a reference to adjust the font style of the characters in the target character image, so that conversion of different font styles is realized, and the number of fonts in a new style is increased.
According to embodiments of the present disclosure, the first noise may be used, for example, to noise the target text image. The first noise may be set according to actual needs, which is not limited by the present disclosure.
In the embodiment of the disclosure, the target noise can be obtained by adding noise to the target text image and combining the font style information of the characters in the first reference style word image. The target noise may characterize the noise carried by the style-migrated target text image (i.e., the target style word image) after the font style of the characters in the first reference style word image has been migrated to the characters in the target text image.
According to the target noise, a target style word image corresponding to the target text image may be determined. For example, the style-migrated target text image may be denoised according to the target noise and the first noise to obtain the target style word image. The character content of the target style word image is the same as that of the target text image, and the font style of the characters in the target style word image is the same as that of the characters in the first reference style word image. Font style migration from the first reference style word image to the target text image is thereby achieved.
It will be appreciated that in migrating the font style of the characters in the first reference style character image to the target character image, the noise applied to the target character image does not substantially affect the style migration of the characters and the glyphs of the characters in the image. Therefore, in the font style migration process, noise is utilized to process the target text image, so that the stability and the font accuracy of the font style are maintained.
According to the embodiment of the disclosure, the font style of the characters in the target text image is adjusted with the font style of the characters in the first reference style word image as a reference, and the target text image is processed in combination with noise. As a result, the glyphs of the characters in the target style word image remain consistent with those in the target text image, while their font style is consistent with that of the first reference style word image, improving both the glyph accuracy and the font-style stability of the generated characters.
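The noise-adding and denoising steps above can be sketched with a standard DDPM-style formulation. This is an assumption: the disclosure does not specify the noise schedule or update rule, and all names below are illustrative.

```python
import numpy as np

def add_noise(x0, eps, alpha_bar_t):
    # Forward diffusion: mix the clean glyph image with Gaussian noise.
    return np.sqrt(alpha_bar_t) * x0 + np.sqrt(1.0 - alpha_bar_t) * eps

def estimate_x0(x_t, eps_hat, alpha_bar_t):
    # Invert the forward step using the model-predicted (target) noise.
    return (x_t - np.sqrt(1.0 - alpha_bar_t) * eps_hat) / np.sqrt(alpha_bar_t)

rng = np.random.default_rng(0)
x0 = rng.random((128, 128))              # target text image, e.g. a bold glyph
first_noise = rng.standard_normal((128, 128))
alpha_bar_t = 0.5                        # cumulative noise-schedule term (assumed)

x_t = add_noise(x0, first_noise, alpha_bar_t)
# With a perfect noise prediction, the original image is recovered exactly.
x0_recovered = estimate_x0(x_t, first_noise, alpha_bar_t)
print(np.allclose(x0_recovered, x0))  # True
```

In practice the predicted target noise replaces `first_noise` in the denoising call, so glyph fidelity depends directly on how accurately the model predicts the noise.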
In some embodiments, a character having a new font style may also be generated from the plurality of first reference style word images.
For example, fusion coefficients corresponding to each of the plurality of first reference style word images may be obtained. The fusion coefficient can be used for representing the fusion degree of the first reference style word image in style fusion. And determining a fusion style feature vector according to the fusion coefficients corresponding to the plurality of first reference style word images and the plurality of first reference style word images. Then, the target noise can be obtained according to the target text image, the fusion style feature vector and the first noise. Then, a target style word image corresponding to the target word image is determined based on the target noise.
First, feature extraction may be performed on a plurality of first reference style word images, respectively, to obtain reference style feature vectors corresponding to the plurality of first reference style word images, respectively. And then, based on the fusion coefficient corresponding to each first reference style word image, fusing the reference style feature vectors corresponding to the plurality of first reference style word images to obtain a fusion style feature vector.
It should be noted that, the fusion coefficient corresponding to each first reference style word image may be set according to actual needs, which is not limited in this disclosure. For example, in one example, the corresponding fusion coefficients for the respective first reference style word images may be the same, or partially the same, or completely different from each other.
Next, the target noise may be obtained from the target text image, the fusion style feature vector, and the first noise in the manner described above. Then, a target style word image corresponding to the target text image is determined according to the target noise and the first noise. Here, the character content of the target style word image is the same as that of the target text image, but the font style of the characters in the target style word image differs from the font styles of the plurality of first reference style word images; a new font style is thus generated. In this way, a new font style can be generated from a plurality of existing font styles, further expanding the available style types and increasing the number of characters in the new style.
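The fusion described above can be sketched as a coefficient-weighted sum of the reference style feature vectors. This is a minimal sketch: the feature dimensionality, the normalization, and the helper names are assumptions, not the patented implementation.

```python
import numpy as np

def fuse_style_features(feature_vecs, coeffs):
    # Weight each reference style feature vector by its fusion coefficient
    # and sum; coefficients are normalised so the degrees of fusion sum to 1.
    coeffs = np.asarray(coeffs, dtype=float)
    coeffs = coeffs / coeffs.sum()
    return np.tensordot(coeffs, np.stack(feature_vecs), axes=1)

# Four reference styles, each reduced to an (assumed) 8-dimensional vector.
feats = [np.full(8, float(i)) for i in range(4)]
fused = fuse_style_features(feats, [0.25, 0.25, 0.25, 0.25])
print(fused)  # every component is (0 + 1 + 2 + 3) / 4 = 1.5
```

Unequal coefficients bias the fused vector toward particular reference styles, which is how the degree of each style's contribution is controlled.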
In addition, after the target style character image with the new font style is acquired, the font style of the characters in the target character image can be adjusted by taking the font style of the characters in the target style character image as a reference, so that the number of the characters in the new style can be increased.
Fig. 3A is an effect diagram of a target style word image generated based on a plurality of first reference style word images according to an embodiment of the present disclosure. The scheme of the present disclosure is exemplified below with reference to fig. 3A.
As shown in fig. 3A, for a plurality of target text images, such as target text images "fill", "gizzard", "disaster", "hub", "condition", "change", "bore", "pass" and the like, font style migration may be performed using a plurality of first reference style word images (e.g., first reference style word images 1-4) and first noise to obtain target style word images corresponding to each of the plurality of target text images. The font style of the characters in the plurality of target text images may be, for example, bold (only an example).
The font style migration process using the plurality of first reference style word images and the first noise is the same for each of the plurality of target text images. For ease of description, the process is described below using the target text image "fill" as an example. In addition, to make the change in font style easy to compare, the embodiment of the present disclosure takes the case where the character content of the plurality of first reference style word images is the same as that of the target text image as an example, but the present disclosure is not limited thereto.
For example, a fusion coefficient corresponding to each of a plurality of first reference style word images (e.g., the character "fill" in first reference style word images 1-4) may be obtained. Illustratively, the fusion coefficients for "fill" in the first reference style word images 1-4 are each 0.25.
Next, a fusion style feature vector is determined based on the plurality of first reference style word images and their corresponding fusion coefficients.
For example, feature extraction is performed on each of the 4 first reference style word images "fill" to obtain the reference style feature vector corresponding to each image. Then, based on the fusion coefficient corresponding to each first reference style word image "fill", the 4 reference style feature vectors are fused to obtain the fusion style feature vector.
Then, the target noise is obtained according to the target text image "fill", the fusion style feature vector, and the first noise. A target style word image (the target style word image "fill" shown in FIG. 3A) corresponding to the target text image is then determined based on the target noise and the first noise.
As shown in FIG. 3A, the character content of the target style word image "fill" obtained in the above manner is the same as that of the target text image "fill", but their font styles differ. The font style of the target style word image "fill" also differs from the font styles of first reference style word images 1-4; that is, performing font style migration with first reference style word images 1-4 and the first noise generates a new font style.
Similarly, new font styles (as shown in FIG. 3A) can also be generated in the manner described above for other target text images of the plurality of target text images, such as target text images "gizzard", "disaster", "hub", "condition", "pass", and the like.
It should be noted that, in addition to the Chinese characters shown in FIG. 3A, the scheme of the present disclosure is applicable to font style conversion of Western characters (including letters and words), numeric characters, and other suitable characters.
In some embodiments, the target noise may also be derived based on the target text image, the first reference style word image, and the first noise using a character generation model.
Fig. 3B is a schematic diagram of a character generating method according to an embodiment of the present disclosure. An example process of generating target noise using the character generation model is described below with reference to fig. 3B.
As depicted in FIG. 3B, character generation model 300 may include, for example, a style coding network 310 and a diffusion model 320.
For example, after the first noise 301, the target text image 302, and the first reference style word image 303 are acquired, a first reference style feature vector may be derived from the first reference style word image using the style encoding network 310. Then, the diffusion model 320 is used to obtain the target noise 304 based on the target text image 302, the first reference style feature vector, and the first noise 301. Thereafter, a target style word image 305 corresponding to the target text image 302 may be determined from the target noise 304.
In the embodiment of the disclosure, the character generation model 300 is a pre-trained model; its training method is described later and is not repeated here.
According to the embodiment of the disclosure, the target noise is obtained through processing by using the trained character generation model, so that the output efficiency and accuracy of the target noise can be improved, and the accuracy and the generation efficiency of the target style word image are improved.
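The data flow through the character generation model can be sketched with stub networks. The real style encoding network 310 and diffusion model 320 are learned; the stubs below are assumptions that only illustrate the inputs and output shapes, not the actual computation.

```python
import numpy as np

rng = np.random.default_rng(0)

def style_encoding_network(ref_image):
    # Stand-in for style encoding network 310: pool the reference style word
    # image into a style feature vector. The real encoder is learned.
    return ref_image.mean(axis=0)

def diffusion_model(content_image, style_feature, noise):
    # Stand-in for diffusion model 320: a real implementation would be a
    # conditional noise predictor (e.g. a U-Net conditioned on the content
    # image and style feature); here we simply echo the noise so the data
    # flow is visible.
    assert content_image.shape == noise.shape
    return noise

target_text_image = rng.random((128, 128))       # image 302
first_ref_style_image = rng.random((128, 128))   # image 303
first_noise = rng.standard_normal((128, 128))    # noise 301

style_feature = style_encoding_network(first_ref_style_image)
target_noise = diffusion_model(target_text_image, style_feature, first_noise)
print(target_noise.shape)  # (128, 128)
```

The predicted target noise (304) is then used to denoise and produce the target style word image (305), as sketched earlier.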
In some embodiments, the target text image and the first reference style word image may be binarized before the target noise is derived from the target text image, the first reference style word image, and the first noise. In this way, interfering pixels can be removed from each image, so that subsequent processing is not disturbed by intermediate gray values.
In some examples, before the target noise is obtained from the target text image, the first reference style word image, and the first noise, both the target text image and the first reference style word image may also be scaled to a preset size, for example, 128×128 size, so that the images may be processed at the same scale.
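A minimal sketch of this preprocessing, assuming simple thresholding for binarization and nearest-neighbour scaling to the preset 128×128 size (the disclosure specifies neither algorithm):

```python
import numpy as np

def preprocess(img, size=128, threshold=0.5):
    # Binarise: every pixel becomes exactly 0 or 1, removing intermediate
    # gray values that could interfere with later processing.
    binary = (img >= threshold).astype(np.float32)
    # Nearest-neighbour scale to a square preset size so that all images
    # are processed at the same scale.
    rows = np.arange(size) * binary.shape[0] // size
    cols = np.arange(size) * binary.shape[1] // size
    return binary[np.ix_(rows, cols)]

img = np.random.default_rng(1).random((200, 160))  # arbitrary input glyph
out = preprocess(img)
print(out.shape)  # (128, 128)
```

A production pipeline would likely use a library resizer (e.g. OpenCV or Pillow) with anti-aliasing disabled, since interpolation would reintroduce gray values after binarization.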
Fig. 4 is a flowchart of a training method of a character generation model according to an embodiment of the present disclosure.
As shown in FIG. 4, the training method 400 of the character generation model may include operations S410-S440, for example.
In operation S410, a sample style word image, a sample text image, and a second reference style word image are acquired.
In operation S420, a style noise word image is determined from the second noise and the sample style word image.
In operation S430, the second reference style word image, the sample text image, and the style noise word image are input into the character generation model, resulting in sample noise.
In operation S440, parameters of the character generation model are adjusted according to the sample noise and the second noise.
According to embodiments of the present disclosure, a sample text image may refer to, for example, an image formed of characters that need to be converted to a target font style. The number of sample text images may be at least one.
The font style of the characters in the sample text image may be, for example, a conventional font, such as the bold, Song, clerical script, or regular script fonts for Chinese characters, or the Times New Roman and Calibri fonts for Western characters.
The sample style word image refers to, for example, an image formed of characters having a target font style. The sample style word image and the sample word image have the same character content, and the font style may be the same or different.
The second reference style word image may refer to, for example, an image formed of characters having a target font style. The number of second reference style word images may be at least one. The font style of the characters in the second reference style character image may include a font style existing in the related art or a handwriting font style, etc.
The style of the second reference style word image is the same as the style of the sample style word image, i.e. the font style of the second reference style word image is the same as the font style of the sample style word image. The character content of the second reference style word image may be different from the character content of the sample style word image.
In the embodiment of the disclosure, the font style of the characters in the sample text image can be adjusted by taking the font style of the characters in the second reference style character image as a reference, so as to realize conversion of different font styles.
According to embodiments of the present disclosure, the second noise may be used to noise the sample style-word image to obtain a style-noise word image. The character generation model is trained based on the stylistic noise word image such that the character generation model gains the ability to recover character information from noise. In this way, even if the style noise word image is replaced with noise when the model is subsequently applied, the font style can be shifted based on the noise and the reference style word image, and the target style word image with accurate font style and stable style can be generated. In the embodiment of the present disclosure, the second noise may be set according to actual needs, which is not limited by the present disclosure.
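As a sketch of the noising step above, a standard DDPM-style forward process is assumed below (the disclosure does not fix the noise schedule): the style noise word image is a weighted mix of the sample style word image and the second noise.

```python
import numpy as np

rng = np.random.default_rng(0)

def add_noise(sample_style_image, second_noise, alpha_bar_t=0.7):
    """Noise the sample style word image with the second noise.

    The weighting x_t = sqrt(a_t) * x_0 + sqrt(1 - a_t) * eps follows the
    common DDPM forward step; the schedule value alpha_bar_t is an assumption.
    """
    return (np.sqrt(alpha_bar_t) * sample_style_image
            + np.sqrt(1.0 - alpha_bar_t) * second_noise)

x0 = rng.random((128, 128))            # sample style word image
eps = rng.standard_normal((128, 128))  # second noise (Gaussian)
style_noise_image = add_noise(x0, eps)
print(style_noise_image.shape)
```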
According to the embodiment of the disclosure, the font style of the characters in the sample word image is adjusted by taking the font style of the characters in the second reference style word image as a reference, and the character generation model is trained by combining the sample word image and the style noise word image. In this way, the trained model can perform font style migration based on the noise and the reference style word image, generating a target style word image with an accurate font and a stable style and improving the accuracy of the model output.
According to embodiments of the present disclosure, the character generation model described above may include, for example, a style coding network and a diffusion model. A training method of the character generation model according to an embodiment of the present disclosure is exemplified below with reference to fig. 5.
Fig. 5 is a schematic diagram of a training method of a character generation model according to an embodiment of the present disclosure.
As depicted in FIG. 5, character generation model 500 includes a style coding network 510 and a diffusion model 520.
After the sample style word image (e.g., sample style word image "member") 502, the sample word image (e.g., sample word image "member") 504, and the second reference style word image (e.g., second reference style word image "benzyl") 505 are acquired, the sample style word image 502 may be noised according to the second noise 501 to obtain a style noise word image (e.g., style noise word image "member") 503.
Next, the second reference style word image 505, the sample word image 504, and the style noise word image 503 are input into the character generation model 500 to obtain sample noise 506.
For example, second reference style word image 505 may be encoded using style encoding network 510 to obtain a second reference style feature vector. Thereafter, a diffusion model 520 is used to derive sample noise 506 based on the sample text image 504, the stylistic noise word image 503, and the second reference stylistic feature vector.
Next, parameters of the character generation model 500 are adjusted according to the sample noise 506 and the second noise 501.
For example, the noise loss is determined from the sample noise 506 and the second noise 501. The noise loss may be determined by way of example using, for example, an L1 loss function, but the disclosure is not so limited.
Parameters of the style-coding network 510 and parameters of the diffusion model 520 are then adjusted according to the noise loss, resulting in a trained character generation model. The trained character generation model may be used to generate the target style word image described above.
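The training loop described above (predict the sample noise, compare it with the second noise via an L1 loss, and adjust parameters) can be sketched with a toy one-parameter model. The real style coding network and diffusion model architectures and the optimizer are not specified by the disclosure, so deliberate simplifications are used here.

```python
import numpy as np

rng = np.random.default_rng(0)

# A toy scalar parameter stands in for the weights of the style coding
# network plus diffusion model, whose architectures are not specified.
theta = 0.0

def character_generation_model(style_noise_image, theta):
    # Predict the sample noise from the style noise word image.
    return theta * style_noise_image

second_noise = rng.standard_normal((128, 128))  # the second noise
style_noise_image = second_noise.copy()         # simplification for this sketch

losses = []
for step in range(200):
    sample_noise = character_generation_model(style_noise_image, theta)
    # L1 noise loss between the sample noise and the second noise.
    loss = np.abs(sample_noise - second_noise).mean()
    losses.append(loss)
    # Subgradient of the L1 loss with respect to theta.
    grad = (np.sign(sample_noise - second_noise) * style_noise_image).mean()
    theta -= 0.4 * 0.97 ** step * grad  # decaying step; the optimizer is an assumption

print(f"loss: {losses[0]:.3f} -> {losses[-1]:.3f}")
```

The loss shrinks as the parameter approaches the value that makes the predicted sample noise match the second noise, which is all the training objective asks for.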
It should be noted that, in the training process of the character generation model, characters other than Chinese characters may also be used to train the model. For example, various types of characters including, but not limited to, Western characters (letters and words) or numeric characters may be used as sample data, so that the trained character generation model can perform font style conversion for these characters.
According to an embodiment of the present disclosure, before the style noise word image is determined from the second noise and the sample style word image, the sample style word image, the sample word image, and the second reference style word image may be binarized. In this way, interfering pixels can be removed from the respective images, so that subsequent processing is not disturbed by pixels of other gray values.
In some examples, the sample style word image, the sample word image, and the second reference style word image may also each be scaled to a preset size, such as 128 x 128 size, before the style noise word image is determined from the second noise and sample style word image, whereby the images may be processed at the same scale.
According to embodiments of the present disclosure, the first noise and the second noise described in the present disclosure may each include, for example, Gaussian noise or salt-and-pepper noise. The Gaussian noise is noise that follows a Gaussian distribution with a mean of 0 and a variance of 1.
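For illustration, the two noise types can be generated as follows; the 5% salt-and-pepper corruption ratio is an assumption for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Gaussian noise: zero mean, unit variance, as stated above.
gaussian = rng.standard_normal((128, 128))

def salt_and_pepper(image, amount=0.05, rng=rng):
    """Corrupt a grayscale word image with salt-and-pepper noise.

    The 5% corruption ratio is an illustrative assumption.
    """
    noisy = image.copy()
    mask = rng.random(image.shape)
    noisy[mask < amount / 2] = 0        # pepper: random black pixels
    noisy[mask > 1 - amount / 2] = 255  # salt: random white pixels
    return noisy

img = np.full((128, 128), 127, dtype=np.uint8)
noisy = salt_and_pepper(img)
print(noisy.shape)
```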
Fig. 6 is a block diagram of a character generating apparatus according to an embodiment of the present disclosure.
As shown in fig. 6, the character generating apparatus 600 includes: a first acquisition module 610, a first determination module 620, and a second determination module 630.
The first obtaining module 610 is configured to obtain a target text image, a first reference style word image, and a first noise. The first noise is used for carrying out noise adding processing on the target text image.
The first determining module 620 is configured to obtain the target noise according to the target text image, the first reference style word image, and the first noise.
The second determining module 630 is configured to determine a target style word image corresponding to the target word image according to the target noise; wherein the style type of the target style word image is determined based on the style type of the first reference style word image.
According to an embodiment of the present disclosure, the first reference style word image comprises a plurality of first reference style word images; the first determining module 620 includes an acquisition unit, a fusion unit, and a first determining unit. The acquisition unit is used for acquiring fusion coefficients corresponding to the first reference style word images respectively; the fusion coefficient is used for representing the fusion degree of the first reference style word image in style fusion. The fusion unit is used for determining a fusion style feature vector according to the first reference style word images and the fusion coefficients corresponding to the first reference style word images. The first determining unit is used for obtaining target noise according to the target text image, the fusion style feature vector, and the first noise.
According to an embodiment of the present disclosure, the first determining module 620 includes: and the second determining unit is used for obtaining target noise based on the target text image, the first reference style word image and the first noise by using the character generation model.
According to an embodiment of the present disclosure, a character generation model includes a style coding network and a diffusion model.
According to an embodiment of the present disclosure, the second determining unit includes: a first determination subunit and a second determination subunit. The first determining subunit is used for obtaining a first reference style feature vector based on the first reference style word image by using the style encoding network; and the second determining subunit is used for obtaining target noise based on the target text image, the first reference style feature vector and the first noise by using the diffusion model.
According to an embodiment of the present disclosure, the character generating apparatus 600 further includes: and a first processing module. The first processing module is used for performing binarization processing on the target text image and the first reference style word image before obtaining target noise according to the target text image, the first reference style word image and the first noise.
According to an embodiment of the present disclosure, the first noise comprises Gaussian noise or salt-and-pepper noise.
Fig. 7 is a block diagram of a training apparatus of a character generation model according to an embodiment of the present disclosure.
As shown in fig. 7, the training apparatus 700 of the character generation model includes: a second acquisition module 710, a third determination module 720, a fourth determination module 730, and an adjustment module 740.
The second obtaining module 710 is configured to obtain a sample style word image, a sample word image, and a second reference style word image, where a style type of the sample style word image is the same as a style type of the second reference style word image.
The third determining module 720 is configured to determine a style noise word image based on the second noise and the sample style word image. The second noise is used for carrying out noise adding processing on the sample style word image.
The fourth determining module 730 is configured to input the second reference style word image, the sample word image, and the style noise word image into the character generating model to obtain sample noise.
The adjustment module 740 is configured to adjust parameters of the character generation model according to the sample noise and the second noise.
According to an embodiment of the present disclosure, a character generation model includes a style coding network and a diffusion model.
According to an embodiment of the present disclosure, the fourth determining module 730 includes: an encoding unit and a third determining unit. The coding unit is used for coding the second reference style word image by using a style coding network to obtain a second reference style feature vector; and the third determining unit is used for obtaining sample noise based on the sample text image, the style noise word image and the second reference style feature vector by using the diffusion model.
According to an embodiment of the present disclosure, the adjustment module 740 includes: a first adjusting unit and a second adjusting unit. The first adjusting unit is used for determining noise loss according to the sample noise and the second noise; and the second adjusting unit is used for adjusting parameters of the style coding network and parameters of the diffusion model according to the noise loss.
According to an embodiment of the present disclosure, the training apparatus 700 of the character generation model further includes: and a second processing module. The second processing module is used for performing binarization processing on the sample style word image, the sample word image and the second reference style word image before determining the style noise word image according to the second noise and the sample style word image.
According to an embodiment of the present disclosure, the second noise comprises Gaussian noise or salt-and-pepper noise.
It should be noted that, in the embodiment of the apparatus portion, the implementation manner, the solved technical problem, the realized function, and the achieved technical effect of each module/unit/subunit and the like are the same as or similar to the implementation manner, the solved technical problem, the realized function, and the achieved technical effect of each corresponding step in the embodiment of the method portion, and are not described herein again.
In the technical solution of the present disclosure, the collection, storage, use, processing, transmission, provision, disclosure, and application of the related data (including but not limited to personal information of users) all comply with the relevant laws and regulations and do not violate the public interest.

In the technical solution of the present disclosure, the authorization or consent of the owner of the related data is acquired before the data is acquired or collected.
According to embodiments of the present disclosure, the present disclosure also provides an electronic device, a readable storage medium and a computer program product.
According to an embodiment of the present disclosure, an electronic device includes: at least one processor; and a memory communicatively coupled to the at least one processor; wherein the memory stores instructions executable by the at least one processor to enable the at least one processor to perform a method as in an embodiment of the present disclosure.
According to an embodiment of the present disclosure, a non-transitory computer-readable storage medium stores computer instructions for causing a computer to perform a method according to an embodiment of the present disclosure.

According to an embodiment of the present disclosure, a computer program product comprises a computer program which, when executed by a processor, implements a method according to an embodiment of the present disclosure.
Fig. 8 is a block diagram of an electronic device used to implement the character generation method and training method of the character generation model of the embodiments of the present disclosure.
Fig. 8 illustrates a schematic block diagram of an example electronic device 800 that may be used to implement embodiments of the present disclosure. Electronic devices are intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. The electronic device may also represent various forms of mobile devices, such as personal digital assistants, cellular telephones, smartphones, wearable devices, and other similar computing devices. The components shown herein, their connections and relationships, and their functions are meant to be exemplary only, and are not meant to limit implementations of the disclosure described and/or claimed herein.
As shown in fig. 8, the apparatus 800 includes a computing unit 801 that can perform various appropriate actions and processes according to a computer program stored in a Read-Only Memory (ROM) 802 or a computer program loaded from a storage unit 808 into a Random Access Memory (RAM) 803. In the RAM 803, various programs and data required for the operation of the device 800 can also be stored. The computing unit 801, the ROM 802, and the RAM 803 are connected to each other by a bus 804. An input/output (I/O) interface 805 is also connected to the bus 804.
Various components in device 800 are connected to I/O interface 805, including: an input unit 806 such as a keyboard, mouse, etc.; an output unit 807 such as various types of displays, speakers, and the like; a storage unit 808, such as a magnetic disk, optical disk, etc.; and a communication unit 809, such as a network card, modem, wireless communication transceiver, or the like. The communication unit 809 allows the device 800 to exchange information/data with other devices via a computer network such as the internet and/or various telecommunication networks.
The computing unit 801 may be a variety of general and/or special purpose processing components having processing and computing capabilities. Some examples of the computing unit 801 include, but are not limited to, a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), various specialized Artificial Intelligence (AI) computing chips, various computing units running machine learning model algorithms, a Digital Signal Processor (DSP), and any suitable processor, controller, microcontroller, etc. The computing unit 801 performs the respective methods and processes described above, such as the character generation method and the training method of the character generation model. For example, in some embodiments, the character generation method and the training method of the character generation model may be implemented as computer software programs tangibly embodied on a machine-readable medium, such as the storage unit 808. In some embodiments, part or all of the computer program may be loaded and/or installed onto the device 800 via the ROM 802 and/or the communication unit 809. When the computer program is loaded into the RAM 803 and executed by the computing unit 801, one or more steps of the character generation method and the training method of the character generation model described above may be performed. Alternatively, in other embodiments, the computing unit 801 may be configured to perform the character generation method and the training method of the character generation model in any other suitable manner (e.g., by means of firmware).
Various implementations of the systems and techniques described here above may be implemented in digital electronic circuitry, integrated circuit systems, Field Programmable Gate Arrays (FPGAs), Application Specific Integrated Circuits (ASICs), Application Specific Standard Products (ASSPs), Systems on Chip (SOCs), Complex Programmable Logic Devices (CPLDs), computer hardware, firmware, software, and/or combinations thereof. These various embodiments may include: implementation in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor, which may be a special-purpose or general-purpose programmable processor that may receive data and instructions from, and transmit data and instructions to, a storage system, at least one input device, and at least one output device.
Program code for carrying out the methods of the present disclosure may be written in any combination of one or more programming languages. The program code may be provided to a processor or controller of a general-purpose computer, special-purpose computer, or other programmable data processing apparatus such that the program code, when executed by the processor or controller, causes the functions/operations specified in the flowcharts and/or block diagrams to be implemented. The program code may execute entirely on the machine, partly on the machine, as a stand-alone software package, partly on the machine and partly on a remote machine, or entirely on the remote machine or server.
In the context of this disclosure, a machine-readable medium may be a tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. The machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
To provide for interaction with a user, the systems and techniques described here can be implemented on a computer having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and pointing device (e.g., a mouse or trackball) by which a user can provide input to the computer. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user may be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic input, speech input, or tactile input.
The systems and techniques described here can be implemented in a computing system that includes a back-end component (e.g., a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include: Local Area Networks (LANs), Wide Area Networks (WANs), and the internet.
The computer system may include a client and a server. The client and server are typically remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other. The server may be a cloud server, a server of a distributed system, or a server incorporating a blockchain.
It should be appreciated that various forms of the flows shown above may be used to reorder, add, or delete steps. For example, the steps recited in the present disclosure may be performed in parallel, sequentially, or in a different order, provided that the desired results of the disclosed aspects are achieved, and are not limited herein.
The above detailed description should not be taken as limiting the scope of the present disclosure. It will be apparent to those skilled in the art that various modifications, combinations, sub-combinations and alternatives are possible, depending on design requirements and other factors. Any modifications, equivalent substitutions and improvements made within the spirit and principles of the present disclosure are intended to be included within the scope of the present disclosure.

Claims (30)

1. A character generating method comprising:
acquiring a target text image, a first reference style word image and first noise; the first noise is used for carrying out noise adding processing on the target text image;
obtaining target noise according to the target text image, the first reference style word image and the first noise; and
determining a target style word image corresponding to the target word image according to the target noise.
2. The method of claim 1, wherein the first reference style word image comprises a plurality of first reference style word images; the obtaining the target noise according to the target text image, the first reference style word image and the first noise includes:
acquiring fusion coefficients corresponding to the first reference style word images respectively; the fusion coefficient is used for representing the fusion degree of the first reference style word image in style fusion;
determining fusion style feature vectors according to the first reference style word images and fusion coefficients corresponding to the first reference style word images; and
obtaining the target noise according to the target text image, the fusion style feature vector and the first noise.
3. The method of claim 1 or 2, wherein the deriving target noise from the target text image, the first reference style word image, and the first noise comprises:
obtaining the target noise based on the target text image, the first reference style word image and the first noise by using a character generation model.
4. A method according to claim 3, wherein the character generation model comprises a style coding network and a diffusion model.
5. The method of claim 4, wherein the deriving the target noise based on the target text image, the first reference style word image, and the first noise using a character generation model comprises:
obtaining a first reference style feature vector based on the first reference style word image by using the style coding network; and
obtaining the target noise based on the target text image, the first reference style feature vector and the first noise by using the diffusion model.
6. The method of any one of claims 1 to 5, further comprising:
binarizing the target text image and the first reference style word image before target noise is obtained according to the target text image, the first reference style word image and the first noise.
7. The method of any of claims 1-6, wherein the first noise comprises Gaussian noise or salt-and-pepper noise.
8. The method of claim 7, wherein the Gaussian noise is noise that follows a Gaussian distribution with a mean of 0 and a variance of 1.
9. A training method of a character generation model, comprising:
acquiring a sample style word image, a sample word image and a second reference style word image; wherein the style of the sample style word image is the same as the style of the second reference style word image;
determining a style noise word image according to the second noise and the sample style word image; the second noise is used for carrying out noise adding processing on the sample style word image;
inputting the second reference style word image, the sample word image and the style noise word image into a character generation model to obtain sample noise; and
adjusting parameters of the character generation model according to the sample noise and the second noise.
10. The method of claim 9, wherein the character generation model comprises a style coding network and a diffusion model.
11. The method of claim 10, wherein the inputting the second reference style word image, the sample word image, and the style noise word image into a character generation model, deriving sample noise comprises:
using the style coding network to code the second reference style word image to obtain a second reference style feature vector; and
obtaining the sample noise based on the sample text image, the style noise word image and the second reference style feature vector by using the diffusion model.
12. The method of claim 10 or 11, wherein the adjusting parameters of the character generation model according to the sample noise and the second noise comprises:
determining a noise loss from the sample noise and the second noise; and
adjusting parameters of the style coding network and parameters of the diffusion model according to the noise loss.
13. The method of any of claims 9 to 12, further comprising:
binarizing the sample style word image, the sample text image and the second reference style word image before determining the style noise word image according to the second noise and the sample style word image.
14. The method of any of claims 9 to 13, wherein the second noise comprises Gaussian noise or salt-and-pepper noise.
15. A character generating apparatus comprising:
the first acquisition module is used for acquiring a target text image, a first reference style word image and first noise; the first noise is used for carrying out noise adding processing on the target text image;
The first determining module is used for obtaining target noise according to the target text image, the first reference style word image and the first noise; and
the second determining module is used for determining a target style word image corresponding to the target word image according to the target noise.
16. The apparatus of claim 15, wherein the first reference style word image comprises a plurality of first reference style word images; the first determining module includes:
the acquisition unit is used for acquiring fusion coefficients corresponding to the first reference style word images respectively; the fusion coefficient is used for representing the fusion degree of the first reference style word image in style fusion;
the fusion unit is used for determining fusion style feature vectors according to the first reference style word images and fusion coefficients corresponding to the first reference style word images; and
the first determining unit is used for obtaining the target noise according to the target text image, the fusion style feature vector and the first noise.
17. The apparatus of claim 15 or 16, wherein the first determining module comprises:
the second determining unit is used for obtaining the target noise based on the target text image, the first reference style word image and the first noise by utilizing a character generation model.
18. The apparatus of claim 17, wherein the character generation model comprises a style coding network and a diffusion model.
19. The apparatus of claim 18, wherein the second determining unit comprises:
a first determining subunit, configured to obtain a first reference style feature vector based on the first reference style word image using the style encoding network; and
the second determination subunit is used for obtaining the target noise based on the target text image, the first reference style feature vector and the first noise by using the diffusion model.
20. The apparatus of any of claims 15 to 19, further comprising:
the first processing module is used for carrying out binarization processing on the target text image and the first reference style word image before obtaining target noise according to the target text image, the first reference style word image and the first noise.
21. The apparatus of any of claims 15-20, wherein the first noise comprises Gaussian noise or salt-and-pepper noise.
22. A training apparatus for a character generation model, comprising:
the second acquisition module is used for acquiring a sample style word image, a sample word image and a second reference style word image; wherein the style of the sample style word image is the same as the style of the second reference style word image;
the third determining module is used for determining a style noise word image according to the second noise and the sample style word image; the second noise is used for carrying out noise adding processing on the sample style word image;
a fourth determining module, configured to input the second reference style word image, the sample word image, and the style noise word image into a character generation model, to obtain sample noise; and
and the adjusting module is used for adjusting parameters of the character generation model according to the sample noise and the second noise.
23. The apparatus of claim 22, wherein the character generation model comprises a style coding network and a diffusion model.
24. The apparatus of claim 23, wherein the fourth determination module comprises:
the coding unit is used for coding the second reference style word image by using the style coding network to obtain a second reference style feature vector; and
and a third determining unit, configured to obtain the sample noise based on the sample word image, the style noise word image, and the second reference style feature vector using the diffusion model.
25. The apparatus of claim 23 or 24, wherein the adjustment module comprises:
a first adjustment unit configured to determine a noise loss based on the sample noise and the second noise; and
and the second adjusting unit is used for adjusting the parameters of the style coding network and the parameters of the diffusion model according to the noise loss.
26. The apparatus of any of claims 22 to 25, further comprising:
and the second processing module is used for carrying out binarization processing on the sample style word image, the sample word image and the second reference style word image before determining the style noise word image according to the second noise and the sample style word image.
27. The apparatus of any of claims 22-26, wherein the second noise comprises Gaussian noise or salt-and-pepper noise.
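The training flow in claims 22–25 (noise the sample style word image with the second noise, predict the sample noise, penalize the gap) can be sketched as below. The closed-form DDPM noising step and the mean-squared-error loss are assumptions for illustration; the claims only say that noise is added and a noise loss is computed:

```python
import numpy as np

def make_style_noise_word_image(sample_style_image, second_noise, alpha_bar_t):
    # Third determining module (claim 22): produce the style noise word image
    # by noising the sample style word image. The sqrt-weighted DDPM form is
    # a standard choice, not quoted from the patent:
    #   x_t = sqrt(abar_t) * x_0 + sqrt(1 - abar_t) * eps
    return (np.sqrt(alpha_bar_t) * sample_style_image
            + np.sqrt(1.0 - alpha_bar_t) * second_noise)

def noise_loss(sample_noise, second_noise):
    # First adjusting unit (claim 25): a loss comparing the model's predicted
    # sample noise with the true second noise (MSE assumed here). Its gradient
    # then drives the updates of both the style encoding network and the
    # diffusion model (claim 25's second adjusting unit).
    diff = np.asarray(sample_noise) - np.asarray(second_noise)
    return float(np.mean(diff ** 2))
```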
28. An electronic device, comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of any one of claims 1 to 14.
29. A non-transitory computer readable storage medium storing computer instructions for causing the computer to perform the method of any one of claims 1 to 14.
30. A computer program product comprising a computer program which, when executed by a processor, implements the method according to any one of claims 1 to 14.
CN202310271377.XA 2023-03-17 2023-03-17 Character generation method, training method and device of character generation model Active CN116306496B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310271377.XA CN116306496B (en) 2023-03-17 2023-03-17 Character generation method, training method and device of character generation model

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310271377.XA CN116306496B (en) 2023-03-17 2023-03-17 Character generation method, training method and device of character generation model

Publications (2)

Publication Number Publication Date
CN116306496A true CN116306496A (en) 2023-06-23
CN116306496B CN116306496B (en) 2024-02-02

Family

ID=86830181

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310271377.XA Active CN116306496B (en) 2023-03-17 2023-03-17 Character generation method, training method and device of character generation model

Country Status (1)

Country Link
CN (1) CN116306496B (en)

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20180300850A1 (en) * 2017-04-14 2018-10-18 Facebook, Inc. Artifact reduction for image style transfer
CN109697442A (en) * 2017-10-20 2019-04-30 Alibaba Group Holding Ltd Training method and device for a character recognition model
CN112823379A (en) * 2018-10-10 2021-05-18 Guangdong Oppo Mobile Telecommunications Corp., Ltd. Method and device for training machine learning model and device for video style transfer
CN113516136A (en) * 2021-07-09 2021-10-19 Industrial and Commercial Bank of China Ltd Handwritten image generation method, model training method, device and equipment
CN114820871A (en) * 2022-06-29 2022-07-29 Beijing Baidu Netcom Science and Technology Co Ltd Font generation method, model training method, device, equipment and medium
CN115049537A (en) * 2022-05-31 2022-09-13 Beijing Zitiao Network Technology Co., Ltd. Image processing method, image processing device, electronic equipment and storage medium
EP4109324A2 (en) * 2021-09-30 2022-12-28 Beijing Baidu Netcom Science Technology Co., Ltd. Method and apparatus for identifying noise samples, electronic device, and storage medium

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
LI, JIN; GAO, JING; CHEN, JUNJIE; WANG, YONGJUN: "Mongolian Font Style Transfer Model Based on Conditional Generative Adversarial Networks", Journal of Chinese Information Processing, no. 04 *
ZHENG, CHENGYAO: "Research on Fast Style Analogy for Arbitrary Images", China Master's Theses Full-text Database, Information Science and Technology, no. 1 *

Also Published As

Publication number Publication date
CN116306496B (en) 2024-02-02

Similar Documents

Publication Publication Date Title
CN113140017B (en) Method for training countermeasure network model, method for establishing word stock, device and equipment
US20220189083A1 (en) Training method for character generation model, character generation method, apparatus, and medium
KR20220034080A (en) Training method for circulary generating network model, method and apparatus for establishing word library, electronic device, recording medium and computer program
CN112818656B (en) Data difference comparison method, device, equipment, medium and computer program product
CN114429633A (en) Text recognition method, model training method, device, electronic equipment and medium
CN114612600A (en) Virtual image generation method and device, electronic equipment and storage medium
CN112785493A (en) Model training method, style migration method, device, equipment and storage medium
CN110633717A (en) Training method and device for target detection model
CN113205189B (en) Method for training prediction model, prediction method and device
CN113902899A (en) Training method, target detection method, device, electronic device and storage medium
CN106021279B (en) Information display method and device
CN116306496B (en) Character generation method, training method and device of character generation model
US20230076471A1 (en) Training method, text translation method, electronic device, and storage medium
CN115082298A (en) Image generation method, image generation device, electronic device, and storage medium
CN115101069A (en) Voice control method, device, equipment, storage medium and program product
CN114707638A (en) Model training method, model training device, object recognition method, object recognition device, object recognition medium and product
CN113360672A (en) Methods, apparatus, devices, media and products for generating a knowledge graph
CN113361249B (en) Document weight judging method, device, electronic equipment and storage medium
CN116385829B (en) Gesture description information generation method, model training method and device
CN113360460B (en) Favorites sharing method and device
CN113377380B (en) User interface component deployment method and device
CN115984947A (en) Image generation method, training method, device, electronic device and storage medium
CN114912046A (en) Method and device for generating page
CN113326416A (en) Method for retrieving data, method and device for sending retrieved data to client
CN117351116A (en) Image generation method, device, electronic equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant