CN114820398B

CN114820398B - Image font replacing method, system, equipment and medium based on diffusion model

Info

Publication number: CN114820398B
Application number: CN202210765322.XA
Authority: CN
Inventors: 郑智展
Original assignee: SHANGHAI YICHUANG INFORMATION TECHNOLOGY CO LTD; Beijing Hanyi Innovation Technology Co ltd
Current assignee: SHANGHAI YICHUANG INFORMATION TECHNOLOGY CO LTD; Beijing Hanyi Innovation Technology Co ltd
Priority date: 2022-07-01
Filing date: 2022-07-01
Publication date: 2022-11-04
Anticipated expiration: 2042-07-01
Also published as: CN114820398A

Abstract

The present disclosure provides a method, system, device, and medium for replacing image fonts based on a diffusion model. The method comprises the steps of: establishing a diffusion process and a conditional noise predictor model for a picture containing a font; inputting the picture containing the font into the conditional noise predictor according to the diffusion process, and training the conditional noise predictor according to the diffusion process to obtain a diffusion model; obtaining the masks of the original font and the target font on the font picture, and calculating the mask complement mask_c of the original font and the target font; and, according to the mask complement mask_c, performing denoising generation and replacement iterations on the original image using the diffusion model to obtain a replacement text image. The method effectively reduces the complexity of the font replacement task on images and, thanks to the diffusion model's capacity to model and fit image information, repairs the content of the background more effectively than traditional machine learning methods.

Description

Image font replacing method, system, equipment and medium based on diffusion model

Technical Field

The present disclosure relates to the field of font replacement, and in particular, to a method, a system, a device, and a medium for replacing image fonts based on a diffusion model.

Background

Whether at work or in daily life, people often need to replace the font of the text in a picture with another font. In general, however, the text and the background are merged into a single image, so modifying the font in the picture is difficult and the result is often unsatisfactory. Techniques that use intelligent algorithms to perform font replacement automatically have therefore begun to emerge.

However, the quality of the image generated by conventional font replacement techniques depends entirely on how well the background is restored. Conventional methods complete and repair the entire background under the original text, which makes the task more complex and gives only mediocre repair results. In fact, font replacement does not require perfect repair of all the image content under the original font mask; it only requires restoring the background that remains outside the mask of the new font after replacement.

Disclosure of Invention

The invention provides a method, a system, equipment and a medium for replacing a picture font based on a diffusion model. It addresses the problems that existing methods for replacing fonts on a picture depend on reconstructing and repairing the background, that this repair is difficult, and that the quality of the repaired picture is poor. The font replacement requirement is converted into the problem of repairing the image outside the new font mask, so that the quality of the image after font replacement is greatly improved.

According to an aspect of the embodiments of the present disclosure, there is provided a method for replacing a font of an image based on a diffusion model, including the following steps:

S102, establishing a diffusion process and a conditional noise predictor model for a picture containing a font;

S104, inputting the picture containing the font into the conditional noise predictor according to the diffusion process, and training the conditional noise predictor according to the diffusion process to obtain a diffusion model;

S106, obtaining masks of the original font and the target font on the font picture, and calculating the mask complement mask_c of the original font and the target font;

S108, according to the mask complement mask_c, performing denoising generation and replacement iterations on the original image using the diffusion model to obtain a replacement text image.

Optionally, the specific steps of training the conditional noise predictor according to the diffusion process are as follows:

S1042, performing diffusion and noise addition on the original image according to the diffusion coefficient to obtain a diffused and noise-added image;

S1044, inputting the diffused and noise-added image and the diffusion time step as input variables into the noise predictor model, outputting a prediction result, calculating the loss according to the prediction result, and performing gradient descent according to the loss function to update the conditional noise predictor model;

S1046, executing steps S1042 and S1044 in a loop until the conditional noise predictor model converges, and taking the converged conditional noise predictor model as the diffusion model.

Optionally, step S106 is specifically:

S1062, acquiring the original text mask in the original image; replacing the original text with the new font to obtain a replacement text image, and acquiring the replacement text mask in the replacement text image;

S1064, calculating the mask complement mask_c from the original text mask and the replacement text mask, where the mask complement mask_c is the region inside the original text mask but outside the replacement text mask;

S1066, obtaining the information in the mask complement mask_c by reverse denoising with the diffusion model, and using this information to repair the replacement text image.

Optionally, step S108 specifically includes:

S1082, setting the initial input to Gaussian noise, recorded as the current noise data x_{t+1};

S1084, using the trained diffusion model to perform reverse denoising generation on the noise data x_{t+1} according to the diffusion process, obtaining the denoised data x_t of the previous step;

S1086, according to the pixel information of the unknown region and the known region indicated by the mask complement mask_c, processing and updating the denoised data x_t of the previous step separately for each region, where the update uses the diffusion coefficient and random Gaussian noise, and x₀ is the data obtained by overlaying the target font directly on the original picture;

S1088, executing steps S1084 and S1086 in a loop T times to finally obtain the replacement text image, where T is the total number of diffusion time steps.

According to another aspect of the embodiments of the present disclosure, there is provided a system for replacing a font of a picture based on a diffusion model, including:

the noise predictor establishing module is used for establishing a diffusion process and a conditional noise predictor model for pictures containing fonts;

the noise predictor training module is used for inputting the pictures containing the fonts into the conditional noise predictor according to the diffusion process and training the conditional noise predictor according to the diffusion process so as to obtain a noise predictor model for pictures containing fonts;

a mask complement acquisition module, configured to acquire masks of the original font and the target font on the font picture, and calculate the mask complement mask_c of the original font and the target font;

an image replacement module for performing denoising generation and replacement iterations on the original image according to the mask complement mask_c to obtain a replacement text image.

Optionally, the noise predictor training module further comprises:

the model building module is used for building a conditional noise predictor model;

the diffusion and noise adding module is used for performing diffusion and noise adding on the original image according to the diffusion coefficient to obtain a diffused and noise added image;

the updating module is used for inputting the images subjected to diffusion and noise addition and the diffusion time steps as input variables into the noise predictor model, outputting a prediction result, calculating loss according to the prediction result and performing gradient descent according to a loss function to update the conditional noise predictor model;

and circularly executing the diffusion noise adding step in the diffusion noise adding module and the updating step in the updating module until the conditional noise predictor model converges, and taking the conditional noise predictor model as a diffusion model.

Optionally, the mask complement acquisition module is specifically configured to: acquire the original text mask in the original image; replace the original text with the new font to obtain a replacement text image, and acquire the replacement text mask in the replacement text image; calculate the mask complement mask_c from the original text mask and the replacement text mask, where the mask complement mask_c is the region inside the original text mask but outside the replacement text mask; and obtain the information in the mask complement mask_c by reverse denoising with the diffusion model, and use this information to repair the replacement text image.

Optionally, the image replacement module further comprises:

an initial input that is set to Gaussian noise and recorded as the current noise data x_{t+1};

a noise reduction module for performing reverse denoising generation on the noise data x_{t+1} using the trained diffusion model according to the diffusion process, to obtain the denoised data x_t of the previous step;

a replacement module for processing and updating the denoised data x_t of the previous step separately for the unknown region and the known region indicated by the mask complement mask_c, where the update uses the diffusion coefficient;

the denoising step in the noise reduction module and the replacement step in the replacement module are executed in a loop T times, where T is the total number of diffusion time steps, to finally obtain the replacement text image.

According to another aspect of the embodiments of the present disclosure, an electronic device is provided, which includes a memory, a processor, and a computer program stored on the memory and executable on the processor, and the processor implements the above-mentioned method for replacing a font of a picture based on a diffusion model when executing the computer program.

According to another aspect of the embodiments of the present disclosure, there is provided a computer-readable storage medium, on which a computer program is stored, the program, when being executed by a processor, implementing the steps of the above-mentioned diffusion model-based picture font replacement method.

According to the method, the information of the new replacement font is used to transform the conventional task of repairing the entire background under the text into the task of repairing only the complement of the new and old fonts. This effectively reduces the complexity of the font replacement task on images and, thanks to the diffusion model's capacity to model and fit image information, repairs the content of the background more effectively than traditional machine learning methods.

Drawings

Fig. 1 shows a flowchart of the method for replacing a font of a picture based on a diffusion model in embodiment 1;

Fig. 2 shows a flowchart of forward noising and reverse denoising generation for the diffusion model;

Fig. 3 shows a flowchart of the iterative background repair loop after font replacement, using the reverse process of the diffusion model;

Fig. 4 shows a flowchart of calculating the mask complement mask_c;

Fig. 5 schematically shows a block diagram of the picture font replacement system based on a diffusion model.

Detailed Description

The technical solutions in the embodiments of the present disclosure will be clearly and completely described below with reference to the drawings in the embodiments of the present disclosure, and it is obvious that the described embodiments are only a part of the embodiments of the present disclosure, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments disclosed herein without making any creative effort, shall fall within the protection scope of the present disclosure.

Example 1

According to an aspect of the embodiments of the present disclosure, there is provided a method for replacing a font of an image based on a diffusion model, as shown in fig. 1, including the following steps:

In this embodiment, a conditional noise predictor model is built and its parameters are initialized. The basic structure of the conditional noise predictor model is a U-Net whose input and output have the same dimensions. Its input variables are the diffused and noise-added image x_t and the diffusion time step t, and its output is the model's prediction of the diffusion noise of the picture containing the font.

in this embodiment, the specific steps of training the conditional noise predictor according to the diffusion process are as follows:

As shown in fig. 2, which is a schematic diagram of the diffusion model for font-containing images used in the embodiment of the present disclosure, the forward noising process runs from left to right: Gaussian noise is added step by step to the content of the original font-containing image, and after T noising steps the image is completely converted into a noise image x_T that follows the Gaussian prior.

In some embodiments, each diffusion noise step is calculated as follows:

The diffusion coefficient of each step of the diffusion model is set as follows: from the first iteration step to the last step T, the diffusion coefficient decreases according to a cosine schedule, where T is the total number of diffusion steps, usually set to 1000, and s is a small offset, set to 0.008.
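The cosine schedule described above can be sketched in a few lines. This is a minimal illustration, assuming the widely used form in which the cumulative diffusion coefficient is f(t)/f(0) with f(t) = cos²(((t/T + s)/(1 + s))·π/2), which is consistent with the stated T = 1000 and s = 0.008. The function name `cosine_alpha_bar` and the variable name `alpha_bar` are hypothetical, not names from the disclosure.

```python
import math


def cosine_alpha_bar(T=1000, s=0.008):
    """Cumulative diffusion coefficients for t = 0..T under an assumed cosine schedule."""

    def f(t):
        # Squared cosine of the (offset, rescaled) fraction of the schedule completed.
        return math.cos((t / T + s) / (1 + s) * math.pi / 2) ** 2

    f0 = f(0)
    # Normalize so that the coefficient at t = 0 is exactly 1 (no noise).
    return [f(t) / f0 for t in range(T + 1)]


alpha_bar = cosine_alpha_bar()
print(alpha_bar[0], alpha_bar[500], alpha_bar[1000])
```

Under this assumed form the coefficient starts at 1 and decreases monotonically toward 0 at step T, so early steps add little noise and the final step leaves essentially pure noise.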

The original image information x₀ is diffused and noise-added using the diffusion coefficient corresponding to the current diffusion step t together with random Gaussian noise.
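The noising step can be illustrated as follows. This sketch assumes the standard closed-form forward diffusion, x_t = sqrt(a)·x₀ + sqrt(1 − a)·ε, where a is the diffusion coefficient at step t and ε is standard Gaussian noise. The helper name `q_sample` and the toy image are hypothetical.

```python
import numpy as np


def q_sample(x0, alpha_bar_t, rng):
    """Noise x0 to the level given by alpha_bar_t; returns (x_t, eps)."""
    eps = rng.standard_normal(x0.shape)  # the Gaussian noise actually used
    x_t = np.sqrt(alpha_bar_t) * x0 + np.sqrt(1.0 - alpha_bar_t) * eps
    return x_t, eps


rng = np.random.default_rng(0)
x0 = np.full((4, 4, 3), 0.5)             # toy stand-in for a normalized font image

x_clean, _ = q_sample(x0, 1.0, rng)      # coefficient 1: image is unchanged
x_noisy, eps = q_sample(x0, 0.0, rng)    # coefficient 0: image is pure noise
```

Returning `eps` alongside `x_t` is a convenience for training, where the noise that was added serves as the regression target.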

In this embodiment, a training batch x₀ is drawn from the font-containing image training set. For each image in the batch, a time step is sampled at random from the T diffusion time steps, and the original image x₀ is noise-added to that time step using the diffusion noising method described above, giving x_t;

The noised image batch x_t and the corresponding time steps t are fed into the conditional noise predictor model, which outputs a prediction result.

The loss is then calculated, and gradient descent is performed according to the loss function to update the conditional noise predictor model. The loss function L is defined in terms of the prediction result and the Gaussian noise actually used when the batch of original images was diffused and noise-added;
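The loss can be written out as follows. This is a sketch assuming the standard noise-prediction objective of denoising diffusion models, a mean squared error between the noise actually added and the predicted noise. The symbols ε (the Gaussian noise actually used), ε_θ (the conditional noise predictor), ᾱ_t (the diffusion coefficient) and the unweighted squared norm are assumed notation, not notation confirmed by the text.

```latex
L = \mathbb{E}_{x_0,\, t,\, \epsilon}
    \left[ \left\lVert \epsilon - \epsilon_\theta(x_t, t) \right\rVert^2 \right],
\qquad
x_t = \sqrt{\bar{\alpha}_t}\, x_0 + \sqrt{1 - \bar{\alpha}_t}\, \epsilon,
\qquad
\epsilon \sim \mathcal{N}(0, I)
```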

S106, obtaining masks of the original font and the target font on the font picture, and calculating the mask complement mask_c of the original font and the target font;

In this embodiment, fig. 3 is a schematic diagram of the font replacement method used in the embodiment of the present disclosure. Compared with generating an image directly from the prior Gaussian distribution, the font replacement task already has most of the original content of the real image and only needs to predict the information in the region that becomes uncertain after font replacement. Therefore, as shown in fig. 4, the method proceeds according to the following steps:

In this embodiment, the original text mask mask₁ is obtained using the detection and recognition model; in fig. 3, the blue part is the extent of the original text box. The replacement text is determined by the recommendation system and the user, which gives the replacement text mask mask₂; in fig. 3, the green part is the extent of the new text box;

S1064, calculating the mask complement mask_c from the original text mask and the replacement text mask, where the mask complement mask_c is the region inside the original text mask but outside the replacement text mask;

In this embodiment, mask_c is the part that lies inside mask₁ but outside mask₂. The information in the mask_c region is the background information that the model needs to repair and fill in. It is calculated as follows:

mask_c = mask₁ and (1 - mask₂),

The replacement text is inserted directly into the input image to obtain the image in which the text has been replaced but which still needs to be repaired.
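The mask complement calculation and the direct overlay can be illustrated with small binary arrays. This is a minimal sketch: the function names `mask_complement` and `overlay_text` and the toy masks are hypothetical, and masks are assumed to be arrays of 0 and 1 with the same height and width as the image.

```python
import numpy as np


def mask_complement(mask1, mask2):
    """mask_c = mask1 and (1 - mask2): inside the original text, outside the new text."""
    m1 = np.asarray(mask1, dtype=bool)
    m2 = np.asarray(mask2, dtype=bool)
    return (m1 & ~m2).astype(np.uint8)


def overlay_text(image, text_pixels, mask2):
    """Paste the replacement text pixels onto a copy of the image where mask2 is set."""
    m2 = np.asarray(mask2, dtype=bool)
    out = image.copy()
    out[m2] = text_pixels[m2]
    return out


mask1 = np.array([[1, 1, 1, 0],
                  [1, 1, 1, 0]])   # original text covers columns 0..2
mask2 = np.array([[0, 1, 1, 1],
                  [0, 1, 1, 1]])   # replacement text covers columns 1..3
mask_c = mask_complement(mask1, mask2)

image = np.zeros((2, 4), dtype=int)
glyphs = np.full((2, 4), 9)
pasted = overlay_text(image, glyphs, mask2)
print(mask_c)
```

In this toy case only column 0 remains in mask_c: it was covered by the old text and is not covered by the new text, so it is the only region whose background has to be generated.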

S1066, obtaining the information in the mask complement mask_c by reverse denoising with the diffusion model, and using this information to repair the replacement text image.

In this embodiment, the reverse denoising process of the conditional noise predictor, trained in step S104 on a large data set of high-definition pictures containing fonts, may be used to generate and repair high-definition pictures containing fonts. The overall flow of the denoising generation process is as follows:

randomly initializing a noise image, that is, the initial input image is pure Gaussian noise with the same dimensions as the actual image;

executing T denoising iterations, with the time step t taking the values {T, T-1, ..., 2, 1} in turn, and predicting the image output of the next step using the rate of change of the diffusion coefficient at the current step, the diffusion coefficient, and random Gaussian noise z;

the variance parameter of the diffusion step is calculated from the diffusion coefficient;

thus, after T iterations, a realistic high-definition image containing fonts is finally obtained;
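One denoising iteration of this loop can be written out as below. The sketch assumes the standard ancestral sampling step of denoising diffusion models, where ᾱ_t denotes the diffusion coefficient, α_t its step-to-step ratio, σ_t² the variance parameter, z the random Gaussian noise, and ε_θ the trained conditional noise predictor. All symbols other than z are assumed notation.

```latex
\begin{align*}
x_{t-1} &= \frac{1}{\sqrt{\alpha_t}}
  \left( x_t - \frac{1 - \alpha_t}{\sqrt{1 - \bar{\alpha}_t}}\,
  \epsilon_\theta(x_t, t) \right) + \sigma_t z,
  \qquad z \sim \mathcal{N}(0, I) \\
\alpha_t &= \frac{\bar{\alpha}_t}{\bar{\alpha}_{t-1}},
  \qquad
  \sigma_t^2 = \frac{1 - \bar{\alpha}_{t-1}}{1 - \bar{\alpha}_t}\,(1 - \alpha_t)
\end{align*}
```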

S108, according to the mask complement mask_c, performing denoising generation and replacement iterations on the original image using the diffusion model to obtain a replacement text image.

In this embodiment, step S108 specifically includes:

S1086, according to the pixel information of the unknown region and the known region indicated by the mask complement mask_c, processing and updating the denoised data x_t of the previous step separately for each region, where the update uses the diffusion coefficient and random Gaussian noise, and x₀ is the data obtained by overlaying the target font directly on the original picture;
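The denoise-and-replace iteration of step S108 can be sketched as follows. This is an illustration under stated assumptions, not the disclosed formula: it assumes that pixels inside mask_c keep the generated value, while all other pixels are reset to a forward-noised copy of the overlaid image x₀ at the matching noise level, and that each reverse step is the standard denoising diffusion update. The names `cosine_schedule`, `replace_font`, `dummy_model` and the clipping parameter `max_beta` are hypothetical.

```python
import math
import numpy as np


def cosine_schedule(T, s=0.008, max_beta=0.999):
    """Return per-step ratios alphas[0..T-1] and cumulative coefficients a_bar[0..T]."""

    def f(t):
        return math.cos((t / T + s) / (1 + s) * math.pi / 2) ** 2

    # Clipping beta is a common numerical safeguard; the text does not state it.
    betas = np.array([min(1.0 - f(t) / f(t - 1), max_beta) for t in range(1, T + 1)])
    alphas = 1.0 - betas
    a_bar = np.concatenate([[1.0], np.cumprod(alphas)])  # a_bar[0] = 1 means no noise
    return alphas, a_bar


def replace_font(x0_overlay, mask_c, eps_model, T=50, seed=0):
    """Generate pixels only where mask_c == 1; keep x0_overlay everywhere else."""
    rng = np.random.default_rng(seed)
    alphas, a_bar = cosine_schedule(T)
    unknown = np.asarray(mask_c, dtype=bool)
    if x0_overlay.ndim == 3:                      # broadcast an (H, W) mask over channels
        unknown = unknown[..., None]
    x = rng.standard_normal(x0_overlay.shape)     # S1082: x_{t+1} starts as Gaussian noise
    for t in range(T - 1, -1, -1):                # S1088: T iterations in total
        alpha, ab_cur, ab_next = alphas[t], a_bar[t + 1], a_bar[t]
        # S1084: reverse step x_{t+1} -> x_t (assumed standard update).
        eps_hat = eps_model(x, t + 1)
        mean = (x - (1.0 - alpha) / np.sqrt(1.0 - ab_cur) * eps_hat) / np.sqrt(alpha)
        var = (1.0 - ab_next) / (1.0 - ab_cur) * (1.0 - alpha)
        x = mean + np.sqrt(var) * rng.standard_normal(x.shape)
        # S1086: known region comes from the overlaid image noised to level t.
        noise = rng.standard_normal(x.shape)
        known = np.sqrt(ab_next) * x0_overlay + np.sqrt(1.0 - ab_next) * noise
        x = np.where(unknown, x, known)
    return x


def dummy_model(x, t):
    """Placeholder for the trained conditional noise predictor; always predicts zero."""
    return np.zeros_like(x)


x0_overlay = np.full((8, 8), 0.5)                 # toy stand-in for the overlaid image
mask_c = np.zeros((8, 8), dtype=np.uint8)
mask_c[2:4, 2:6] = 1                              # toy region to regenerate
result = replace_font(x0_overlay, mask_c, dummy_model, T=20)
```

Because the noise level of the known region reaches zero at the last iteration, the pixels outside mask_c end up identical to the overlaid image, and only the mask_c region carries generated content. With the zero-valued placeholder predictor that generated content is meaningless; a trained model would be needed for a plausible background.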

Example 2

According to another aspect of the embodiments of the present disclosure, there is provided a system 100 for replacing a font of a picture based on a diffusion model, as shown in fig. 5, including:

the noise predictor establishing module 1 is used for establishing a diffusion process and a conditional noise predictor model for pictures containing fonts;

the noise predictor training module 2 is used for inputting the pictures containing the fonts into the conditional noise predictor according to the diffusion process and training the conditional noise predictor according to the diffusion process so as to obtain a noise predictor model for pictures containing fonts;

a mask complement obtaining module 3, configured to obtain masks of the original font and the target font on the font picture, and calculate the mask complement mask_c of the original font and the target font;

an image replacement module 4 for performing denoising generation and replacement iterations on the original image according to the mask complement mask_c to obtain a replacement text image.

In this embodiment, the noise predictor training module 2 further includes:

the updating module is used for inputting the diffused and noised image and the diffusion time step as input variables into the noise predictor model, outputting a prediction result, calculating loss according to the prediction result and performing gradient descent according to a loss function to update the conditional noise predictor model;

In this embodiment, the mask complement acquisition module is specifically configured to: acquire the original text mask in the original image; replace the original text with the new font to obtain a replacement text image, and acquire the replacement text mask in the replacement text image; calculate the mask complement mask_c from the original text mask and the replacement text mask, where the mask complement mask_c is the region inside the original text mask but outside the replacement text mask; and obtain the information in the mask complement mask_c by reverse denoising with the diffusion model, and use this information to repair the replacement text image.

In this embodiment, the image replacement module 4 further includes:

wherein the coefficient in the update formula is the diffusion coefficient,

and the denoising step in the noise reduction module and the replacement step in the replacement module are executed in a loop T times, where T is the total number of diffusion time steps, to finally obtain the replacement text image.

Example 3

The embodiment provides an electronic device, which includes a memory, a processor, and a computer program stored in the memory and executable on the processor, and the processor implements the method for replacing a font of a picture based on a diffusion model in embodiment 1 when executing the computer program.

Embodiment 3 of the present disclosure is merely an example, and should not bring any limitation to the function and the scope of use of the embodiments of the present disclosure.

The electronic device may be embodied in the form of a general purpose computing device, which may be, for example, a server device. Components of the electronic device may include, but are not limited to: at least one processor, at least one memory, and a bus connecting different system components (including the memory and the processor).

The buses include a data bus, an address bus, and a control bus.

The memory may include volatile memory, such as Random Access Memory (RAM) and/or cache memory, and may further include read-only memory (ROM).

The memory may also include program means having a set of (at least one) program modules including, but not limited to: an operating system, one or more application programs, other program modules, and program data, each of which, or some combination thereof, may comprise an implementation of a network environment.

The processor executes various functional applications and data processing by executing computer programs stored in the memory.

The electronic device may also communicate with one or more external devices (e.g., keyboard, pointing device, etc.). Such communication may be through an input/output (I/O) interface. Also, the electronic device may communicate with one or more networks (e.g., a Local Area Network (LAN), a Wide Area Network (WAN), and/or a public network, such as the internet) via a network adapter. The network adapter communicates with other modules of the electronic device over the bus. It should be appreciated that although not shown in the figures, other hardware and/or software modules may be used in conjunction with the electronic device, including but not limited to: microcode, device drivers, redundant processors, external disk drive arrays, RAID (disk array) systems, tape drives, and data backup storage systems, etc.

It should be noted that although several units/modules or sub-units/modules of the electronic device are mentioned in the above detailed description, such a division is merely exemplary and not mandatory. Indeed, according to embodiments of the application, the features and functionality of two or more of the units/modules described above may be embodied in one unit/module. Conversely, the features and functions of one unit/module described above may be further divided so as to be embodied by a plurality of units/modules.

Example 4

The present embodiment provides a computer-readable storage medium storing a computer program which, when executed by a processor, implements the steps of the diffusion model-based picture font replacement method of embodiment 1.

More specific examples of the readable storage medium may include, but are not limited to: a portable disk, hard disk, random access memory, read only memory, erasable programmable read only memory, optical storage device, magnetic storage device, or any suitable combination of the foregoing.

In a possible implementation manner, the present disclosure may also be implemented in the form of a program product, which includes program code for causing a terminal device to execute the steps of implementing the diffusion model-based picture font replacement method described in embodiment 1 when the program product is run on the terminal device.

The program code for carrying out the disclosure may be written in any combination of one or more programming languages. The program code may execute entirely on the user device, partly on the user device as a stand-alone software package, partly on the user device and partly on a remote device, or entirely on the remote device.

Example 5

The embodiment provides an application of a method for replacing a picture font based on a diffusion model, which comprises the following specific steps:

according to the input picture containing the font, the type and the text content of the font on the original image are obtained through the detection and identification model;

recommending a plurality of fonts of similar style from the font library according to the type of the font, and displaying a preview picture of the new font after typesetting on an interface;

if the typesetting effect generated by the recommendation system is not satisfactory, the user can operate the customized interface to modify the related fonts and the typesetting attributes and adjust the final font typesetting effect presentation;

calculating the mask complement mask_c and, according to the finally determined typesetting of the new font, inserting the new font directly onto the original image to obtain the picture to be repaired;

sending the image into the diffusion model to repair, after font replacement, the unknown background region indicated by the mask complement mask_c on the image, obtaining the final font replacement image.

Although embodiments of the present disclosure have been shown and described, it will be appreciated by those skilled in the art that changes, modifications, substitutions and alterations may be made in these embodiments without departing from the principles and spirit of the disclosure, the scope of which is defined in the appended claims and their equivalents.

Claims

1. A method for replacing image fonts based on a diffusion model is characterized by comprising the following steps:

S108, according to the mask complement mask_c, performing denoising generation and replacement iterations on the original image using a diffusion model to obtain a replacement text image;

step S108 specifically includes:

S1086, according to the pixel information of the unknown region and the known region indicated by the mask complement mask_c, processing and updating the denoised data x_t of the previous step separately for each region, where the update uses the diffusion coefficient;

S1088, executing steps S1084 and S1086 in a loop T times to finally obtain the replacement text image, where T is the total number of diffusion time steps.

2. The diffusion model-based picture font replacement method as claimed in claim 1, wherein the specific steps of training the conditional noise predictor according to the diffusion process are as follows:

s1044, inputting the images subjected to diffusion and noise addition and the diffusion time steps as input variables into the conditional noise predictor model, outputting a prediction result, calculating loss according to the prediction result, and performing gradient descent according to a loss function to update the conditional noise predictor model;

3. The method for replacing image fonts based on diffusion models as claimed in claim 1, wherein the step S106 is specifically as follows:

4. A system for replacing a font of a picture based on a diffusion model, comprising:

the noise predictor training module is used for inputting the pictures containing the fonts into the conditional noise predictor according to the diffusion process and training the conditional noise predictor according to the diffusion process so as to obtain a noise predictor model for pictures containing fonts;

a mask complement acquisition module for acquiring masks of the original font and the target font on the font picture and calculating the mask complement mask_c of the original font and the target font;

an image replacement module for performing denoising generation and replacement iterations on the original image using a diffusion model according to the mask complement mask_c, to obtain a replacement text image;

the image replacement module further comprises:

wherein the coefficient in the update formula is the diffusion coefficient.

5. The diffusion model-based picture font replacement system of claim 4, wherein the noise predictor training module further comprises:

the updating module is used for inputting the image subjected to diffusion and noise addition and the diffusion time step as input variables into the conditional noise predictor model, outputting a prediction result, calculating loss according to the prediction result and performing gradient descent according to a loss function to update the conditional noise predictor model;

6. The diffusion model-based picture font replacement system of claim 4, wherein the mask complement acquisition module is specifically configured to: acquire the original text mask in the original image; replace the original text with the new font to obtain a replacement text image, and acquire the replacement text mask in the replacement text image; calculate the mask complement mask_c from the original text mask and the replacement text mask, where the mask complement mask_c is the region inside the original text mask but outside the replacement text mask; and obtain the information in the mask complement mask_c by reverse denoising with the diffusion model, and use this information to repair the replacement text image.

7. An electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor implements the method for replacing a font of a picture based on a diffusion model according to any one of claims 1 to 3 when executing the computer program.

8. A computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, carries out the steps of the diffusion model based picture font replacement method according to any one of claims 1 to 3.