CN114820398B - Image font replacing method, system, equipment and medium based on diffusion model
- Publication number
- CN114820398B (application CN202210765322.XA)
- Authority
- CN
- China
- Prior art keywords
- mask
- diffusion
- noise
- font
- image
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
- G: Physics; G06: Computing, calculating or counting; G06T: Image data processing or generation, in general
- G06T5/77: Retouching; Inpainting; Scratch removal (under G06T5/00: Image enhancement or restoration)
- G06T5/70: Denoising; Smoothing (under G06T5/00: Image enhancement or restoration)
- G06T2207/20081: Training; Learning (under G06T2207/00: Indexing scheme for image analysis or image enhancement; G06T2207/20: Special algorithmic details)
Abstract
The present disclosure provides a method, system, device and medium for replacing image fonts based on a diffusion model. The method comprises the following steps: establishing a diffusion process and a conditional noise predictor model for pictures containing fonts; inputting the font-containing pictures into the conditional noise predictor according to the diffusion process and training the conditional noise predictor to obtain a diffusion model; obtaining the masks of the original font and the target font on the font picture, and calculating the mask complement mask_c of the original font and the target font; and, according to mask_c, performing iterative denoising generation and replacement on the original image with the diffusion model to obtain the replacement text image. The method effectively reduces the complexity of font replacement on images and, benefiting from the diffusion model's ability to model and fit image information, repairs the content of the background more effectively than conventional machine learning methods.
Description
Technical Field
The present disclosure relates to the field of font replacement, and in particular, to a method, a system, a device, and a medium for replacing image fonts based on a diffusion model.
Background
In both work and daily life, there is often a need to replace the font of the text in a picture with another font. However, text and background are usually blended together, so modifying the font in a picture is difficult and the results are often unsatisfactory. Techniques that automate font replacement with intelligent algorithms have therefore begun to emerge.
However, the quality of the image produced by conventional font replacement techniques depends entirely on how well the background is partially restored. Conventional methods inpaint the background under the entire original text, which makes the task complex and the repair quality mediocre. In fact, font replacement does not require perfectly restoring all image content under the original font mask; it only requires restoring the background that remains outside the new font's mask after the new font has been placed.
Disclosure of Invention
The invention provides a method, system, device and medium for replacing picture fonts based on a diffusion model. It addresses the problems that existing methods for replacing fonts on pictures depend on reconstructing and repairing the background, that such repair is difficult, and that the repaired pictures are of poor quality. By converting the font replacement requirement into the problem of repairing only the image region outside the new font mask, it greatly improves the quality of the image after font replacement.
According to an aspect of the embodiments of the present disclosure, there is provided a method for replacing a font of an image based on a diffusion model, including the following steps:
S102, establishing a diffusion process and a conditional noise predictor model for pictures containing fonts;
S104, inputting the font-containing pictures into the conditional noise predictor according to the diffusion process, and training the conditional noise predictor according to the diffusion process to obtain a diffusion model;
S106, obtaining the masks of the original font and the target font on the font picture, and calculating the mask complement mask_c of the original font and the target font;
S108, according to mask_c, performing iterative denoising generation and replacement on the original image using the diffusion model to obtain the replacement text image.
Optionally, the conditional noise predictor is trained according to the diffusion process as follows:
S1042, performing diffusion and noise addition on the original image according to the diffusion coefficient to obtain a diffused, noised image;
S1044, inputting the diffused, noised image and the diffusion time step into the noise predictor model as input variables, outputting a prediction result, calculating the loss from the prediction result, and performing gradient descent on the loss function to update the conditional noise predictor model;
S1046, repeating steps S1042 and S1044 until the conditional noise predictor model converges, and taking the converged model as the diffusion model.
Optionally, step S106 specifically comprises:
S1062, acquiring the original text mask in the original image; replacing the original text with the new font to obtain a replacement text image, and acquiring the replacement text mask in the replacement text image;
S1064, calculating the mask complement mask_c from the original text mask and the replacement text mask, where mask_c is the area that lies inside the original text mask but outside the replacement text mask;
S1066, obtaining the information of the mask_c region through reverse denoising with the diffusion model, and using this information to repair the replacement text image.
Optionally, step S108 specifically includes:
S1082, setting the initial input to Gaussian noise and recording it as the current noise data x_{t+1};
S1084, applying the trained diffusion model to the noise data x_{t+1} to perform reverse denoising generation according to the diffusion process, obtaining the denoised data x_t of the previous step;
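S1086, according to the unknown region and the known-region pixel information represented by mask_c, processing and updating the previous-step denoised data x_t separately for each region:
$$x_t^{\mathrm{known}} = \sqrt{\bar{\alpha}_t}\,x_0 + \sqrt{1-\bar{\alpha}_t}\,\epsilon, \qquad x_t \leftarrow \mathrm{mask}_c \odot x_t + (1-\mathrm{mask}_c) \odot x_t^{\mathrm{known}}$$
where $\bar{\alpha}_t$ is the diffusion coefficient, $\epsilon \sim \mathcal{N}(0, I)$ is random Gaussian noise, and x_0 is the data obtained by overlaying the target font directly on the original picture;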
S1088, repeating steps S1084 and S1086 for T iterations to finally obtain the replacement text image, where T is the total number of diffusion time steps.
According to another aspect of the embodiments of the present disclosure, there is provided a system for replacing a font of a picture based on a diffusion model, including:
the noise predictor establishing module is used for establishing a diffusion process and a conditional noise predictor model for font-containing pictures;
the noise predictor training module is used for inputting the font-containing pictures into the conditional noise predictor according to the diffusion process and training the conditional noise predictor according to the diffusion process, so as to obtain a noise predictor model for font-containing pictures;
the mask complement acquisition module is configured to acquire the masks of the original font and the target font on the font picture and to calculate the mask complement mask_c of the original font and the target font;
the image replacement module is used for performing iterative denoising generation and replacement on the original image according to mask_c to obtain the replacement text image.
Optionally, the noise predictor training module further comprises:
the model building module is used for building a conditional noise predictor model;
the diffusion and noise adding module is used for performing diffusion and noise addition on the original image according to the diffusion coefficient to obtain a diffused, noised image;
the updating module is used for inputting the diffused, noised image and the diffusion time step into the noise predictor model as input variables, outputting a prediction result, calculating the loss from the prediction result, and performing gradient descent on the loss function to update the conditional noise predictor model;
the diffusion and noise adding step of the diffusion and noise adding module and the updating step of the updating module are executed repeatedly until the conditional noise predictor model converges, and the converged model is taken as the diffusion model.
Optionally, the mask complement acquisition module is specifically configured to: acquire the original text mask in the original image; replace the original text with the new font to obtain a replacement text image, and acquire the replacement text mask in the replacement text image; calculate the mask complement mask_c from the original text mask and the replacement text mask, where mask_c is the area that lies inside the original text mask but outside the replacement text mask; and obtain the information of the mask_c region through reverse denoising with the diffusion model, using this information to repair the replacement text image.
Optionally, the image replacement module further comprises:
the initial input is set to Gaussian noise and recorded as the current noise data x_{t+1};
a noise reduction module, configured to apply the trained diffusion model to the noise data x_{t+1} and perform reverse denoising generation according to the diffusion process, obtaining the denoised data x_t of the previous step;
a replacement module, configured to process and update the previous-step denoised data x_t separately for the unknown region and the known-region pixel information represented by mask_c:
$$x_t^{\mathrm{known}} = \sqrt{\bar{\alpha}_t}\,x_0 + \sqrt{1-\bar{\alpha}_t}\,\epsilon, \qquad x_t \leftarrow \mathrm{mask}_c \odot x_t + (1-\mathrm{mask}_c) \odot x_t^{\mathrm{known}}$$
where $\bar{\alpha}_t$ is the diffusion coefficient, $\epsilon \sim \mathcal{N}(0, I)$ is random Gaussian noise, and x_0 is the data obtained by overlaying the target font directly on the original picture;
the denoising step of the noise reduction module and the replacement step of the replacement module are executed repeatedly T times, where T is the total number of diffusion time steps, finally yielding the replacement text image.
According to another aspect of the embodiments of the present disclosure, an electronic device is provided, which includes a memory, a processor, and a computer program stored on the memory and executable on the processor, and the processor implements the above-mentioned method for replacing a font of a picture based on a diffusion model when executing the computer program.
According to another aspect of the embodiments of the present disclosure, there is provided a computer-readable storage medium, on which a computer program is stored, the program, when being executed by a processor, implementing the steps of the above-mentioned diffusion model-based picture font replacement method.
By using the information of the new replacement font, the method transforms the conventional task of repairing the background under the entire original text into repairing only the complement region between the new and old fonts. This effectively reduces the complexity of font replacement on images and, benefiting from the diffusion model's ability to model and fit image information, repairs the content of the background more effectively than conventional machine learning methods.
Drawings
Fig. 1 shows a flowchart of a method for replacing a font of a picture based on a diffusion model in embodiment 1;
FIG. 2 shows a flowchart of forward noising and reverse denoising generation for a diffusion model;
FIG. 3 is a flowchart illustrating the iterative loop that uses the reverse process of the diffusion model to repair the font background after replacement;
FIG. 4 is a flowchart illustrating the calculation of the mask complement mask_c;
FIG. 5 schematically shows a block diagram of a picture font replacement system based on a diffusion model.
Detailed Description
The technical solutions in the embodiments of the present disclosure will be clearly and completely described below with reference to the drawings in the embodiments of the present disclosure, and it is obvious that the described embodiments are only a part of the embodiments of the present disclosure, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments disclosed herein without making any creative effort, shall fall within the protection scope of the present disclosure.
Example 1
According to an aspect of the embodiments of the present disclosure, there is provided a method for replacing a font of an image based on a diffusion model, as shown in fig. 1, including the following steps:
S102, establishing a diffusion process and a conditional noise predictor model for pictures containing fonts;
In this embodiment, a conditional noise predictor model is built and its parameters are initialized. The basic structure of the conditional noise predictor model is a U-Net whose input and output have the same size; its input variables are the diffused, noised image x_t and the diffusion time step t, and its output is the model's prediction $\epsilon_\theta(x_t, t)$ of the diffusion noise of the font-containing picture.
S104, inputting the picture containing the font into a conditional noise predictor according to a diffusion process, and training the conditional noise predictor according to the diffusion process to obtain a diffusion model;
in this embodiment, the specific steps of training the conditional noise predictor according to the diffusion process are as follows:
S1042, performing diffusion and noise addition on the original image according to the diffusion coefficient to obtain a diffused, noised image;
FIG. 2 is a schematic diagram of the diffusion model for font-containing images used in the embodiment of the present disclosure. The forward noising process runs from left to right: Gaussian noise is gradually added to the content of the original font-containing image, and after T noising steps the image is completely converted into a noise image x_T that follows the Gaussian prior.
In some embodiments, each diffusion noise step is calculated as follows:
The diffusion coefficient of each step of the diffusion model is set as follows: from the first iteration step to the last step T, it decreases according to a cosine schedule:
$$\bar{\alpha}_t = \frac{f(t)}{f(0)}, \qquad f(t) = \cos^2\!\left(\frac{t/T + s}{1 + s}\cdot\frac{\pi}{2}\right)$$
where $\bar{\alpha}_t$ is the diffusion coefficient, T is the total number of diffusion steps, usually set to 1000, and s is a small offset, set to 0.008.
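The schedule above can be sketched in Python as follows. This is a minimal sketch that assumes the standard cosine-schedule form $\bar{\alpha}_t = f(t)/f(0)$ with $f(t) = \cos^2\left(\frac{t/T+s}{1+s}\cdot\frac{\pi}{2}\right)$. The function names are illustrative, and clipping $\beta_t = 1 - \alpha_t$ at 0.999 is an assumed safeguard commonly used with cosine schedules that the text does not state.

```python
import math


def cosine_alpha_bar(T: int = 1000, s: float = 0.008) -> list:
    """Diffusion coefficients alpha_bar_t for t = 0..T (cosine schedule).

    alpha_bar_t = f(t) / f(0), f(t) = cos^2(((t / T + s) / (1 + s)) * pi / 2),
    so alpha_bar_0 = 1 and alpha_bar_T is numerically 0.
    """
    def f(t: int) -> float:
        return math.cos(((t / T + s) / (1 + s)) * math.pi / 2) ** 2

    f0 = f(0)
    return [f(t) / f0 for t in range(T + 1)]


def per_step_alphas(alpha_bar: list, max_beta: float = 0.999) -> list:
    """Per-step ratios alpha_t = alpha_bar_t / alpha_bar_{t-1}; index 0 is a placeholder.

    beta_t = 1 - alpha_t is clipped at max_beta (assumed safeguard, not from the text).
    """
    alphas = [1.0]
    for t in range(1, len(alpha_bar)):
        beta = min(1.0 - alpha_bar[t] / alpha_bar[t - 1], max_beta)
        alphas.append(1.0 - beta)
    return alphas


alpha_bar = cosine_alpha_bar()       # T = 1000, s = 0.008 as in the text
alphas = per_step_alphas(alpha_bar)
print(len(alpha_bar), alpha_bar[0])  # 1001 1.0
```

Because $\bar{\alpha}_T$ is numerically zero, the unclipped ratio at t = T would also be essentially zero; the clip keeps the per-step coefficient usable when it is later divided by in the reverse process.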
The original image information x_0 is diffused and noised with the diffusion coefficient corresponding to the current diffusion step t:
$$x_t = \sqrt{\bar{\alpha}_t}\,x_0 + \sqrt{1-\bar{\alpha}_t}\,\epsilon, \qquad \epsilon \sim \mathcal{N}(0, I)$$
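The closed-form noising step can be sketched as below, assuming images are flattened to lists of floats; `q_sample` is a hypothetical helper name. It returns the sampled noise as well, because that noise is the regression target in step S1044.

```python
import math
import random


def q_sample(x0, alpha_bar_t, eps=None, rng=None):
    """Closed-form forward diffusion of one image to step t.

    x_t = sqrt(alpha_bar_t) * x0 + sqrt(1 - alpha_bar_t) * eps,  eps ~ N(0, I).
    The image is a flat list of pixel values; the noise is returned too
    because it is the training target for the noise predictor.
    """
    if eps is None:
        rng = rng or random.Random()
        eps = [rng.gauss(0.0, 1.0) for _ in x0]
    a = math.sqrt(alpha_bar_t)
    b = math.sqrt(1.0 - alpha_bar_t)
    x_t = [a * p + b * e for p, e in zip(x0, eps)]
    return x_t, eps


x_t, eps = q_sample([1.0, -1.0, 0.5], alpha_bar_t=0.25, eps=[0.0, 0.0, 0.0])
print(x_t)  # [0.5, -0.5, 0.25]
```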
S1044, inputting the images subjected to diffusion and noise addition and the diffusion time steps as input variables into the noise predictor model, outputting a prediction result, calculating loss according to the prediction result, and performing gradient descent according to a loss function to update the conditional noise predictor model;
In this embodiment, a training batch x_0 is sampled from the training set of font-containing images. For each image in the batch, a diffusion time step t is drawn at random from the T diffusion time steps, and each original image x_0 is noised to x_t at its sampled time step using the diffusion noising formula of step S1042;
The noisy batch x_t and the corresponding time steps t are fed into the conditional noise predictor model, which outputs the prediction $\epsilon_\theta(x_t, t)$; the loss is then calculated, and the conditional noise predictor model is updated by gradient descent on the loss function L, which is:
$$L = \mathbb{E}_{x_0, t, \epsilon}\left[\left\lVert \epsilon - \epsilon_\theta(x_t, t) \right\rVert^2\right]$$
where $\epsilon$ is the Gaussian noise actually used when the batch of original images was diffused and noised;
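A sketch of the loss computation for one training batch, under stated assumptions: images are flat lists of floats, `predict_noise(x_t, t)` is a hypothetical stand-in for the U-Net, and the gradient-descent update itself is left to an automatic-differentiation framework (not shown). Time steps are drawn uniformly from 1..T, matching the random per-image sampling described above.

```python
import math
import random


def cosine_alpha_bar(T, s=0.008):
    """Cosine-schedule diffusion coefficients alpha_bar_t for t = 0..T."""
    def f(t):
        return math.cos(((t / T + s) / (1 + s)) * math.pi / 2) ** 2

    return [f(t) / f(0) for t in range(T + 1)]


def batch_loss(batch_x0, predict_noise, alpha_bar, rng):
    """L = mean ||eps - eps_theta(x_t, t)||^2 over one batch (flat-list images)."""
    T = len(alpha_bar) - 1
    sq_err, n = 0.0, 0
    for x0 in batch_x0:
        t = rng.randint(1, T)                          # random time step per image
        eps = [rng.gauss(0.0, 1.0) for _ in x0]        # noise actually used
        a, b = math.sqrt(alpha_bar[t]), math.sqrt(1.0 - alpha_bar[t])
        x_t = [a * p + b * e for p, e in zip(x0, eps)]
        eps_hat = predict_noise(x_t, t)                # model prediction
        sq_err += sum((e - h) ** 2 for e, h in zip(eps, eps_hat))
        n += len(x0)
    return sq_err / n


def zero_predictor(x_t, t):
    """Untrained baseline: always predicts zero noise."""
    return [0.0] * len(x_t)


rng = random.Random(0)
alpha_bar = cosine_alpha_bar(1000)
batch = [[rng.uniform(-1.0, 1.0) for _ in range(16)] for _ in range(8)]
print(batch_loss(batch, zero_predictor, alpha_bar, rng))
```

With a predictor that always outputs zeros, the loss is close to 1 (the variance of the Gaussian noise); training the predictor is what drives it down.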
S1046, repeating steps S1042 and S1044 until the conditional noise predictor model converges, and taking the converged model as the diffusion model.
S106, obtaining the masks of the original font and the target font on the font picture, and calculating the mask complement mask_c of the original font and the target font;
In this embodiment, FIG. 3 is a schematic diagram of the font replacement method used in the embodiment of the present disclosure. Compared with generating an image directly from the Gaussian prior distribution, a font replacement task already has most of the original real-image content and only needs to predict the information of the uncertain region left after font replacement. Therefore, as shown in FIG. 4, the method proceeds as follows:
S1062, acquiring the original text mask in the original image; replacing the original text with the new font to obtain a replacement text image, and acquiring the replacement text mask in the replacement text image;
In this embodiment, the original text mask mask_1 is obtained with a detection and recognition model; specifically, the blue part in FIG. 3 is the range of the original text box. The replacement text mask mask_2 is determined by a recommendation system together with the user; specifically, the green part in FIG. 3 is the range of the new text box;
S1064, calculating the mask complement mask_c from the original text mask and the replacement text mask, where mask_c is the area that lies inside the original text mask but outside the replacement text mask;
In this embodiment, mask_c is the part that lies inside mask_1 but outside mask_2; the information in this region is the background that the model needs to repair and fill in. It is calculated as follows:
$$\mathrm{mask}_c = \mathrm{mask}_1 \land (1 - \mathrm{mask}_2)$$
The replacement text is then inserted directly into the input image, giving an image in which the text has been replaced but which still needs to be repaired.
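A small sketch of the mask arithmetic and the direct overlay, assuming binary masks and grayscale images stored as nested lists; `mask_complement` and `overlay_new_text` are illustrative names, not taken from the text.

```python
def mask_complement(mask1, mask2):
    """mask_c = mask1 AND (1 - mask2): old-text pixels not covered by the new text."""
    return [[int(bool(a) and not b) for a, b in zip(r1, r2)]
            for r1, r2 in zip(mask1, mask2)]


def overlay_new_text(image, glyphs, mask2):
    """Paste rendered new-font pixels onto the image wherever mask2 is set (gives x_0)."""
    return [[g if m else p for p, g, m in zip(ri, rg, rm)]
            for ri, rg, rm in zip(image, glyphs, mask2)]


mask1 = [[0, 1, 1, 0], [0, 1, 1, 0]]   # original text (mask_1)
mask2 = [[0, 0, 1, 1], [0, 0, 1, 1]]   # replacement text (mask_2)
image = [[10, 20, 30, 40], [50, 60, 70, 80]]
glyphs = [[0, 0, 255, 255], [0, 0, 255, 255]]
print(mask_complement(mask1, mask2))           # [[0, 1, 0, 0], [0, 1, 0, 0]]
print(overlay_new_text(image, glyphs, mask2))  # [[10, 20, 255, 255], [50, 60, 255, 255]]
```

In this toy example, column 1 still carries old-text pixels (20 and 60) that the new text does not cover; that column is exactly mask_c, the region the diffusion model must repaint.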
S1066, obtaining the information of the mask_c region through reverse denoising with the diffusion model, and using this information to repair the replacement text image.
In this embodiment, the reverse denoising process of the conditional noise predictor trained in step S104 on a large dataset of high-definition font-containing pictures can be used to generate and repair high-definition font-containing pictures. The overall flow of the denoising generation process is as follows:
randomly initializing a noise image, that is, the initial input image is pure Gaussian noise with the same dimensions as the actual image;
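performing T denoising iterations, with the time step t taking the values {T, T-1, ..., 2, 1} in turn, and predicting the image of the next step as:
$$x_{t-1} = \frac{1}{\sqrt{\alpha_t}}\left(x_t - \frac{1-\alpha_t}{\sqrt{1-\bar{\alpha}_t}}\,\epsilon_\theta(x_t, t)\right) + \sigma_t z$$
where $\alpha_t = \bar{\alpha}_t / \bar{\alpha}_{t-1}$ is the ratio of the diffusion coefficient at the current step to that at the previous step, $z \sim \mathcal{N}(0, I)$ is random Gaussian noise, and $\bar{\alpha}_t$ is the diffusion coefficient;
the variance parameter of each diffusion step is calculated from the diffusion coefficient as:
$$\sigma_t^2 = 1 - \alpha_t = 1 - \frac{\bar{\alpha}_t}{\bar{\alpha}_{t-1}}$$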
therefore, after T iterations, a realistic high-definition font-containing image is finally obtained;
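One reverse step can be sketched as below. It assumes the standard DDPM update written in the docstring, with the variance choice $\sigma_t^2 = 1 - \alpha_t$, and, as in common DDPM practice, adds no noise at t = 1; `reverse_step` and the toy coefficients are hypothetical. The usage check feeds the exact noise back in, so the step should recover x_0.

```python
import math
import random


def reverse_step(x_t, t, eps_hat, alpha_bar, rng=None):
    """One reverse denoising step x_t -> x_{t-1} (flat-list image).

    alpha_t = alpha_bar_t / alpha_bar_{t-1}
    mean    = (x_t - (1 - alpha_t) / sqrt(1 - alpha_bar_t) * eps_hat) / sqrt(alpha_t)
    x_{t-1} = mean + sigma_t * z,  sigma_t^2 = 1 - alpha_t (assumed),  z ~ N(0, I)
    """
    alpha_t = alpha_bar[t] / alpha_bar[t - 1]
    coef = (1.0 - alpha_t) / math.sqrt(1.0 - alpha_bar[t])
    mean = [(x - coef * e) / math.sqrt(alpha_t) for x, e in zip(x_t, eps_hat)]
    if t == 1:
        return mean  # common DDPM convention: no noise on the final step
    rng = rng or random.Random()
    sigma = math.sqrt(1.0 - alpha_t)
    return [m + sigma * rng.gauss(0.0, 1.0) for m in mean]


# Toy check: with the exact noise as eps_hat, the t = 1 step recovers x_0.
alpha_bar = [1.0, 0.9, 0.7]  # hypothetical coefficients for t = 0, 1, 2
x0, eps = [0.3, -0.2], [1.0, -0.5]
a1, b1 = math.sqrt(alpha_bar[1]), math.sqrt(1.0 - alpha_bar[1])
x1 = [a1 * p + b1 * e for p, e in zip(x0, eps)]
print(reverse_step(x1, 1, eps, alpha_bar))  # approximately [0.3, -0.2]
```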
S108, according to mask_c, performing iterative denoising generation and replacement on the original image using the diffusion model to obtain the replacement text image.
In this embodiment, step S108 specifically includes:
S1082, setting the initial input to Gaussian noise and recording it as the current noise data x_{t+1};
S1084, applying the trained diffusion model to the noise data x_{t+1} to perform reverse denoising generation according to the diffusion process, obtaining the denoised data x_t of the previous step;
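S1086, according to the unknown region and the known-region pixel information represented by mask_c, processing and updating the previous-step denoised data x_t separately for each region:
$$x_t^{\mathrm{known}} = \sqrt{\bar{\alpha}_t}\,x_0 + \sqrt{1-\bar{\alpha}_t}\,\epsilon, \qquad x_t \leftarrow \mathrm{mask}_c \odot x_t + (1-\mathrm{mask}_c) \odot x_t^{\mathrm{known}}$$
where $\bar{\alpha}_t$ is the diffusion coefficient, $\epsilon \sim \mathcal{N}(0, I)$ is random Gaussian noise, and x_0 is the data obtained by overlaying the target font directly on the original picture;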
S1088, repeating steps S1084 and S1086 for T iterations to finally obtain the replacement text image, where T is the total number of diffusion time steps.
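Putting steps S1082 to S1088 together, a minimal sketch of the mask-guided loop looks like this. Assumptions: flat-list images, a hypothetical `predict_noise` callable standing in for the trained model, the cosine schedule with $\beta_t$ clipped at 0.999, and the variance $\sigma_t^2 = 1 - \alpha_t$. The loop index is shifted so that each iteration maps x_t to x_{t-1}, which is equivalent to the x_{t+1} to x_t notation in the text.

```python
import math
import random


def cosine_schedule(T, s=0.008, max_beta=0.999):
    """Per-step alpha_t and cumulative alpha_bar_t (index 0 means no noise)."""
    def f(t):
        return math.cos(((t / T + s) / (1 + s)) * math.pi / 2) ** 2

    alphas, alpha_bar = [1.0], [1.0]
    for t in range(1, T + 1):
        beta = min(1.0 - f(t) / f(t - 1), max_beta)  # clipping is an assumed safeguard
        alphas.append(1.0 - beta)
        alpha_bar.append(alpha_bar[-1] * alphas[-1])
    return alphas, alpha_bar


def replace_font(x0, mask_c, predict_noise, T=1000, seed=0):
    """Mask-guided reverse diffusion following S1082-S1088 (flat-list images).

    x0:     original picture with the target font pasted on directly.
    mask_c: 1 = unknown pixel (regenerated), 0 = known pixel (taken from x0).
    """
    rng = random.Random(seed)
    alphas, alpha_bar = cosine_schedule(T)
    x = [rng.gauss(0.0, 1.0) for _ in x0]            # S1082: pure Gaussian noise
    for t in range(T, 0, -1):                         # S1088: T iterations
        # S1084: one reverse denoising step with the trained model
        eps_hat = predict_noise(x, t)
        a_t = alphas[t]
        coef = (1.0 - a_t) / math.sqrt(1.0 - alpha_bar[t])
        x = [(xi - coef * e) / math.sqrt(a_t) for xi, e in zip(x, eps_hat)]
        if t > 1:
            sigma = math.sqrt(1.0 - a_t)
            x = [xi + sigma * rng.gauss(0.0, 1.0) for xi in x]
        # S1086: keep generated pixels inside mask_c; elsewhere use x0
        # diffused to the same noise level (t - 1)
        ab = alpha_bar[t - 1]
        x = [xi if m else math.sqrt(ab) * k + math.sqrt(1.0 - ab) * rng.gauss(0.0, 1.0)
             for xi, k, m in zip(x, x0, mask_c)]
    return x


def dummy_predictor(x, t):
    """Stand-in for a trained U-Net: predicts zero noise."""
    return [0.0] * len(x)


x0 = [0.5, -0.5, 0.2, 0.9]
mask_c = [0, 1, 1, 0]
out = replace_font(x0, mask_c, dummy_predictor, T=50)
print(out[0], out[3])  # 0.5 0.9
```

Because the known pixels are re-noised to the level of each step and reach noise level 0 on the last step, they come back exactly equal to x_0, while only the pixels inside mask_c are produced by the model. Re-imposing the known region at every step resembles the approach of RePaint-style diffusion inpainting.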
Example 2
According to another aspect of the embodiments of the present disclosure, there is provided a system 100 for replacing a font of a picture based on a diffusion model, as shown in fig. 5, including:
the noise predictor establishing module 1 is used for establishing a diffusion process of a picture containing a font and a conditional noise predictor model;
the noise predictor training module 2 is used for inputting the font-containing pictures into the conditional noise predictor according to the diffusion process and training the conditional noise predictor according to the diffusion process, so as to obtain a noise predictor model for font-containing pictures;
the mask complement acquisition module 3 is configured to acquire the masks of the original font and the target font on the font picture and to calculate the mask complement mask_c of the original font and the target font;
the image replacement module 4 is used for performing iterative denoising generation and replacement on the original image according to mask_c to obtain the replacement text image.
In this embodiment, the noise predictor training module 2 further includes:
the diffusion and noise adding module is used for performing diffusion and noise addition on the original image according to the diffusion coefficient to obtain a diffused, noised image;
the updating module is used for inputting the diffused, noised image and the diffusion time step into the noise predictor model as input variables, outputting a prediction result, calculating the loss from the prediction result, and performing gradient descent on the loss function to update the conditional noise predictor model;
the diffusion and noise adding step of the diffusion and noise adding module and the updating step of the updating module are executed repeatedly until the conditional noise predictor model converges, and the converged model is taken as the diffusion model.
In this embodiment, the mask complement acquisition module is specifically configured to: acquire the original text mask in the original image; replace the original text with the new font to obtain a replacement text image, and acquire the replacement text mask in the replacement text image; calculate the mask complement mask_c from the original text mask and the replacement text mask, where mask_c is the area that lies inside the original text mask but outside the replacement text mask; and obtain the information of the mask_c region through reverse denoising with the diffusion model, using this information to repair the replacement text image.
In this embodiment, the image replacement module 4 further includes:
the initial input is set to Gaussian noise and recorded as the current noise data x_{t+1};
a noise reduction module, configured to apply the trained diffusion model to the noise data x_{t+1} and perform reverse denoising generation according to the diffusion process, obtaining the denoised data x_t of the previous step;
a replacement module, configured to process and update the previous-step denoised data x_t separately for the unknown region and the known-region pixel information represented by mask_c:
$$x_t^{\mathrm{known}} = \sqrt{\bar{\alpha}_t}\,x_0 + \sqrt{1-\bar{\alpha}_t}\,\epsilon, \qquad x_t \leftarrow \mathrm{mask}_c \odot x_t + (1-\mathrm{mask}_c) \odot x_t^{\mathrm{known}}$$
where $\bar{\alpha}_t$ is the diffusion coefficient, $\epsilon \sim \mathcal{N}(0, I)$ is random Gaussian noise, and x_0 is the data obtained by overlaying the target font directly on the original picture;
the denoising step of the noise reduction module and the replacement step of the replacement module are executed repeatedly T times, where T is the total number of diffusion time steps, finally yielding the replacement text image.
Example 3
The embodiment provides an electronic device, which includes a memory, a processor, and a computer program stored in the memory and executable on the processor, and the processor implements the method for replacing a font of a picture based on a diffusion model in embodiment 1 when executing the computer program.
The electronic device may be embodied in the form of a general purpose computing device, which may be, for example, a server device. Components of the electronic device may include, but are not limited to: at least one processor, at least one memory, and a bus connecting different system components (including the memory and the processor).
The buses include a data bus, an address bus, and a control bus.
The memory may include volatile memory, such as Random Access Memory (RAM) and/or cache memory, and may further include read-only memory (ROM).
The memory may also include a program/utility having a set of (at least one) program modules, including but not limited to: an operating system, one or more application programs, other program modules, and program data; each of these, or some combination of them, may include an implementation of a network environment.
The processor executes various functional applications and data processing by executing computer programs stored in the memory.
The electronic device may also communicate with one or more external devices (e.g., keyboard, pointing device, etc.). Such communication may be through an input/output (I/O) interface. Also, the electronic device may communicate with one or more networks (e.g., a Local Area Network (LAN), a Wide Area Network (WAN), and/or a public network, such as the internet) via a network adapter. The network adapter communicates with other modules of the electronic device over the bus. It should be appreciated that although not shown in the figures, other hardware and/or software modules may be used in conjunction with the electronic device, including but not limited to: microcode, device drivers, redundant processors, external disk drive arrays, RAID (disk array) systems, tape drives, and data backup storage systems, etc.
It should be noted that although several units/modules or sub-units/modules of the electronic device are mentioned in the above detailed description, this division is merely exemplary and not mandatory. Indeed, according to embodiments of the application, the features and functions of two or more of the units/modules described above may be embodied in one unit/module. Conversely, the features and functions of one unit/module described above may be further divided among a plurality of units/modules.
Example 4
The present embodiment provides a computer-readable storage medium storing a computer program which, when executed by a processor, implements the steps of the diffusion model-based picture font replacement method of embodiment 1.
More specific examples of the readable storage medium include, but are not limited to: a portable disk, a hard disk, random access memory, read-only memory, erasable programmable read-only memory, an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
In a possible implementation manner, the present disclosure may also be implemented in the form of a program product, which includes program code for causing a terminal device to execute the steps of implementing the diffusion model-based picture font replacement method described in embodiment 1 when the program product is run on the terminal device.
Program code for carrying out the disclosure may be written in any combination of one or more programming languages, and may execute entirely on the user device, partly on the user device as a stand-alone software package, partly on the user device and partly on a remote device, or entirely on the remote device.
Example 5
This embodiment provides an application of the diffusion model-based picture font replacement method, comprising the following steps:
for an input picture containing text, obtaining the font type and the text content on the original image through a detection and recognition model;
recommending, according to the font type, several fonts of similar style from the font library, and displaying a preview of the typeset new font on an interface;
if the typesetting result produced by the recommendation system is unsatisfactory, the user can modify the relevant fonts and typesetting attributes through a customization interface to adjust the final typeset presentation;
calculating the mask complement $\mathrm{mask}_c$, and, according to the finalized typesetting of the new font, pasting the new text directly onto the original image to obtain the picture to be repaired;
feeding the picture into the diffusion model, which, after the font replacement, repairs the unknown background region represented by $\mathrm{mask}_c$ to obtain the final font-replaced image.
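The pasting step that produces the picture to be repaired (the data $x_0$ later handed to the diffusion model) can be sketched with NumPy as follows. This is an illustrative assumption rather than the patent's implementation: the helper name `overlay_text`, the same-shaped image arrays, and the boolean `text_mask` marking the rendered glyph pixels of the new font are all hypothetical.

```python
import numpy as np


def overlay_text(original, text_layer, text_mask):
    """Paste rendered replacement text onto the original image (hypothetical helper).

    original, text_layer: arrays of identical shape, e.g. H x W x C.
    text_mask: H x W boolean mask, True where the new glyph pixels are.
    """
    original = np.asarray(original)
    text_layer = np.asarray(text_layer)
    if original.shape != text_layer.shape:
        raise ValueError("original and text_layer must have the same shape")
    mask = np.asarray(text_mask, dtype=bool)
    if mask.ndim == original.ndim - 1:
        mask = mask[..., None]  # broadcast the mask over color channels
    # New glyph pixels come from text_layer; everything else keeps the original pixels.
    return np.where(mask, text_layer, original)
```

Old glyph pixels not covered by the new text are left untouched here; they are exactly the region that $\mathrm{mask}_c$ later marks for regeneration.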
Although embodiments of the present disclosure have been shown and described, it will be appreciated by those skilled in the art that changes, modifications, substitutions and alterations may be made in these embodiments without departing from the principles and spirit of the disclosure, the scope of which is defined in the appended claims and their equivalents.
Claims (8)
1. A diffusion model-based picture font replacement method, characterized by comprising the following steps:
S102, establishing a diffusion process and a conditional noise predictor model for pictures containing text;
S104, inputting the pictures containing text into the conditional noise predictor according to the diffusion process, and training the conditional noise predictor according to the diffusion process to obtain a diffusion model;
S106, obtaining masks of the original font and the target font on the font picture, and calculating the mask complement $\mathrm{mask}_c$ of the original font and the target font;
S108, according to the mask complement $\mathrm{mask}_c$, iteratively performing denoising generation and replacement on the original image with the diffusion model to obtain a replacement text image;
Step S108 specifically includes:
S1082, setting the initial input to Gaussian noise, denoted as the current noisy data $x_{t+1}$;
S1084, applying the trained diffusion model to the noisy data $x_{t+1}$ to perform reverse denoising generation according to the diffusion process, obtaining the denoised data $x_t$ of the previous step;
S1086, processing and updating the denoised data $x_t$ separately in the unknown region represented by the mask complement $\mathrm{mask}_c$ and in the known region, using the known-region pixel information:
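$$x_t^{\mathrm{known}} = \sqrt{\bar{\alpha}_t}\,x_0 + \sqrt{1-\bar{\alpha}_t}\,\epsilon, \qquad x_t \leftarrow \mathrm{mask}_c \odot x_t + \left(1-\mathrm{mask}_c\right) \odot x_t^{\mathrm{known}}$$
wherein $\bar{\alpha}_t$ is the diffusion coefficient, $\epsilon$ is random Gaussian noise, and $x_0$ is the data obtained by overlaying the target font directly on the original picture;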
S1088, cyclically executing steps S1084 and S1086 $T$ times to finally obtain the replacement text image, where $T$ is the total number of diffusion time steps.
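Steps S1082 to S1088 amount to masked reverse diffusion: each reverse step regenerates the whole image, after which the known region is overwritten with a correspondingly noised copy of $x_0$. The NumPy sketch below assumes a standard DDPM ancestral step with a linear beta schedule and $\sigma_t^2 = \beta_t$, plus a hypothetical noise-predictor callable `eps_model(x, t)`; none of these details are fixed by the claim, and `masked_reverse_diffusion` is an illustrative name. Because the loop index counts down from $T-1$ to 0, the sample produced at index `t` sits at noise level `alpha_bars[t - 1]`, which plays the role of the diffusion coefficient in step S1086.

```python
import numpy as np


def masked_reverse_diffusion(eps_model, x0_known, mask_c, T=1000, seed=None):
    """Regenerate the mask_c region while pinning the rest of the image to x0_known.

    eps_model(x, t) -> predicted noise with the same shape as x (hypothetical interface).
    x0_known: image with the target font pasted on (x_0 in the claim).
    mask_c:   boolean array broadcastable to x0_known; True marks the unknown region.
    """
    rng = np.random.default_rng(seed)
    betas = np.linspace(1e-4, 0.02, T)  # assumed linear noise schedule
    alphas = 1.0 - betas
    alpha_bars = np.cumprod(alphas)
    m = np.asarray(mask_c, dtype=float)

    x = rng.standard_normal(x0_known.shape)  # S1082: start from pure Gaussian noise
    for t in range(T - 1, -1, -1):
        # S1084: one DDPM reverse step using the predicted noise
        eps = eps_model(x, t)
        mean = (x - betas[t] / np.sqrt(1.0 - alpha_bars[t]) * eps) / np.sqrt(alphas[t])
        if t > 0:
            x_unknown = mean + np.sqrt(betas[t]) * rng.standard_normal(x.shape)
            # S1086: noise x_0 to the same level as the new sample
            ab = alpha_bars[t - 1]
            x_known = np.sqrt(ab) * x0_known + np.sqrt(1.0 - ab) * rng.standard_normal(x.shape)
        else:
            x_unknown, x_known = mean, x0_known  # final step: no noise is added
        # Keep the generated pixels only inside mask_c; elsewhere use the known data.
        x = m * x_unknown + (1.0 - m) * x_known
    return x  # S1088: after T iterations, the replacement text image
```

With a dummy predictor such as `lambda x, t: np.zeros_like(x)`, the known region of the output equals `x0_known` exactly, because the last iteration copies it in without noise; only the $\mathrm{mask}_c$ region depends on the predictor.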
2. The diffusion model-based picture font replacement method according to claim 1, wherein training the conditional noise predictor according to the diffusion process specifically comprises:
S1042, diffusing and adding noise to the original image according to the diffusion coefficient to obtain a noised image;
S1044, inputting the noised image and the diffusion time step into the conditional noise predictor model, outputting a prediction result, computing a loss from the prediction result, and updating the conditional noise predictor model by gradient descent on the loss function;
S1046, cyclically executing steps S1042 and S1044 until the conditional noise predictor model converges, and taking the converged model as the diffusion model.
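Steps S1042 to S1046 follow the usual denoising-diffusion training recipe: noise a clean image to a random time step, then regress the added noise from the noised image and the time step. The sketch below is a toy, hypothetical stand-in: instead of a neural network it uses a per-time-step linear predictor `a[t] * x_t + b[t]`, an assumed linear beta schedule, an assumed mean-squared-error loss, and a fixed iteration budget in place of a convergence test. The names `q_sample` and `train_toy_predictor` are illustrative only.

```python
import numpy as np


def q_sample(x0, t, alpha_bars, noise):
    """S1042: diffuse and noise a clean image to time step t."""
    ab = alpha_bars[t]
    return np.sqrt(ab) * x0 + np.sqrt(1.0 - ab) * noise


def train_toy_predictor(images, T=100, steps=2000, lr=0.05, seed=0):
    """Toy conditional noise predictor eps_hat = a[t] * x_t + b[t] (hypothetical stand-in)."""
    rng = np.random.default_rng(seed)
    betas = np.linspace(1e-4, 0.02, T)  # assumed linear schedule
    alpha_bars = np.cumprod(1.0 - betas)
    a, b = np.zeros(T), np.zeros(T)  # one (a, b) pair per time step = conditioning on t
    losses = []
    for _ in range(steps):  # S1046: fixed budget instead of a convergence check
        x0 = images[rng.integers(len(images))]
        t = int(rng.integers(T))
        noise = rng.standard_normal(x0.shape)
        xt = q_sample(x0, t, alpha_bars, noise)
        # S1044: predict the noise from (x_t, t), MSE loss, one gradient-descent step
        err = a[t] * xt + b[t] - noise
        losses.append(float(np.mean(err ** 2)))
        a[t] -= lr * 2.0 * np.mean(err * xt)  # d(loss)/d(a[t])
        b[t] -= lr * 2.0 * np.mean(err)       # d(loss)/d(b[t])
    return a, b, alpha_bars, losses
```

In a real system the predictor would be a neural network trained with an automatic-differentiation framework; the hand-written gradients here only keep the example free of dependencies beyond NumPy.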
3. The diffusion model-based picture font replacement method according to claim 1, wherein step S106 specifically comprises:
S1062, acquiring the original text mask in the original image; replacing the original text with the new font to obtain a replacement text image, and acquiring the replacement text mask in the replacement text image;
S1064, calculating the mask complement $\mathrm{mask}_c$ from the original text mask and the replacement text mask, where $\mathrm{mask}_c$ is the region covered by the original text mask but outside the replacement text mask;
S1066, obtaining the information of the $\mathrm{mask}_c$ region by reverse denoising with the diffusion model, so as to repair the replacement text image.
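Step S1064 is a plain set difference on binary masks. A minimal NumPy sketch, assuming both masks are same-shaped boolean (or 0/1) arrays; `mask_complement` is a hypothetical name:

```python
import numpy as np


def mask_complement(orig_mask, new_mask):
    """mask_c: pixels inside the original text mask but outside the replacement text mask."""
    orig = np.asarray(orig_mask, dtype=bool)
    new = np.asarray(new_mask, dtype=bool)
    if orig.shape != new.shape:
        raise ValueError("masks must have the same shape")
    return orig & ~new  # in the old text AND not in the new text
```

For example, `mask_complement([1, 1, 1, 0], [0, 1, 0, 1])` yields `[True, False, True, False]`: pixels where the new glyphs overlap the old ones stay in the known region, and only the exposed old-text pixels are handed to the diffusion model for repair.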
4. A diffusion model-based picture font replacement system, comprising:
a noise predictor establishing module, configured to establish a diffusion process and a conditional noise predictor model for pictures containing text;
a noise predictor training module, configured to input the pictures containing text into the conditional noise predictor according to the diffusion process and to train the conditional noise predictor according to the diffusion process, so as to obtain a noise predictor model for pictures containing text;
a mask complement acquisition module, configured to acquire masks of the original font and the target font on the font picture and to calculate the mask complement $\mathrm{mask}_c$ of the original font and the target font;
an image replacement module, configured to iteratively perform denoising generation and replacement on the original image with the diffusion model according to the mask complement $\mathrm{mask}_c$, to obtain a replacement text image;
wherein the image replacement module further comprises:
the initial input is set to Gaussian noise and denoted as the current noisy data $x_{t+1}$;
a noise reduction module, configured to apply the trained diffusion model to the noisy data $x_{t+1}$ to perform reverse denoising generation according to the diffusion process, obtaining the denoised data $x_t$ of the previous step;
a replacement module, configured to process and update the denoised data $x_t$ separately in the unknown region represented by the mask complement $\mathrm{mask}_c$ and in the known region, using the known-region pixel information:
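$$x_t^{\mathrm{known}} = \sqrt{\bar{\alpha}_t}\,x_0 + \sqrt{1-\bar{\alpha}_t}\,\epsilon, \qquad x_t \leftarrow \mathrm{mask}_c \odot x_t + \left(1-\mathrm{mask}_c\right) \odot x_t^{\mathrm{known}}$$
wherein $\bar{\alpha}_t$ is the diffusion coefficient, $\epsilon$ is random Gaussian noise, and $x_0$ is the data obtained by overlaying the target font directly on the original picture;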
wherein the noise reduction step of the noise reduction module and the replacement step of the replacement module are executed cyclically $T$ times, where $T$ is the total number of diffusion time steps, finally obtaining the replacement text image.
5. The diffusion model-based picture font replacement system according to claim 4, wherein the noise predictor training module further comprises:
a diffusion and noise-adding module, configured to diffuse and add noise to the original image according to the diffusion coefficient to obtain a noised image;
an updating module, configured to input the noised image and the diffusion time step into the conditional noise predictor model, output a prediction result, compute a loss from the prediction result, and update the conditional noise predictor model by gradient descent on the loss function;
wherein the diffusion and noise-adding step of the diffusion and noise-adding module and the updating step of the updating module are executed cyclically until the conditional noise predictor model converges, and the converged model is taken as the diffusion model.
6. The diffusion model-based picture font replacement system according to claim 4, wherein the mask complement acquisition module is specifically configured to: acquire the original text mask in the original image; replace the original text with the new font to obtain a replacement text image, and acquire the replacement text mask in the replacement text image; calculate the mask complement $\mathrm{mask}_c$ from the original text mask and the replacement text mask, where $\mathrm{mask}_c$ is the region covered by the original text mask but outside the replacement text mask; and obtain the information of the $\mathrm{mask}_c$ region by reverse denoising with the diffusion model, so as to repair the replacement text image.
7. An electronic device, comprising a memory, a processor, and a computer program stored on the memory and executable on the processor, wherein the processor, when executing the computer program, implements the diffusion model-based picture font replacement method according to any one of claims 1 to 3.
8. A computer-readable storage medium storing a computer program which, when executed by a processor, implements the steps of the diffusion model-based picture font replacement method according to any one of claims 1 to 3.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210765322.XA CN114820398B (en) | 2022-07-01 | 2022-07-01 | Image font replacing method, system, equipment and medium based on diffusion model |
Publications (2)
Publication Number | Publication Date |
---|---|
CN114820398A CN114820398A (en) | 2022-07-29 |
CN114820398B true CN114820398B (en) | 2022-11-04 |
Family ID: 82522994
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202210765322.XA Active CN114820398B (en) | 2022-07-01 | 2022-07-01 | Image font replacing method, system, equipment and medium based on diffusion model |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN114820398B (en) |
Families Citing this family (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN115631103B (en) * | 2022-10-17 | 2023-09-05 | 北京百度网讯科技有限公司 | Training method and device for image generation model, and image generation method and device |
CN116433501B (en) * | 2023-02-08 | 2024-01-09 | 阿里巴巴(中国)有限公司 | Image processing method and device |
CN116205819B (en) * | 2023-03-23 | 2024-02-09 | 北京百度网讯科技有限公司 | Character image generation method, training method and device of deep learning model |
CN116363261B (en) * | 2023-03-31 | 2024-07-16 | 北京百度网讯科技有限公司 | Training method of image editing model, image editing method and device |
CN117036696A (en) * | 2023-07-21 | 2023-11-10 | 清华大学深圳国际研究生院 | Image segmentation method, device, equipment and storage medium |
CN116645260B (en) * | 2023-07-27 | 2024-02-02 | 中国海洋大学 | Digital watermark attack method based on conditional diffusion model |
CN116704588B (en) * | 2023-08-03 | 2023-09-29 | 腾讯科技(深圳)有限公司 | Face image replacing method, device, equipment and storage medium |
CN117830079B (en) * | 2023-12-27 | 2024-07-26 | 北京智象未来科技有限公司 | Real picture prediction method, device, equipment and storage medium |
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113516136A (en) * | 2021-07-09 | 2021-10-19 | 中国工商银行股份有限公司 | Handwritten image generation method, model training method, device and equipment |
WO2021208612A1 (en) * | 2020-04-13 | 2021-10-21 | 华为技术有限公司 | Data processing method and device |
Family Cites Families (10)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN105005975B (en) * | 2015-07-08 | 2018-06-01 | 南京信息工程大学 | The image de-noising method of anisotropy parameter based on PCNN and image entropy |
US10613726B2 (en) * | 2017-12-22 | 2020-04-07 | Adobe Inc. | Removing and replacing objects in images according to a directed user conversation |
CN111626284B (en) * | 2020-05-26 | 2023-10-03 | 广东小天才科技有限公司 | Method and device for removing handwriting fonts, electronic equipment and storage medium |
EP4147172A1 (en) * | 2020-06-29 | 2023-03-15 | Google LLC | Machine learning for high quality image processing |
CN113052775B (en) * | 2021-03-31 | 2023-05-23 | 华南理工大学 | Image shadow removing method and device |
CN113177882B (en) * | 2021-04-29 | 2022-08-05 | 浙江大学 | Single-frame image super-resolution processing method based on diffusion model |
CN113420546A (en) * | 2021-06-24 | 2021-09-21 | 平安国际智慧城市科技股份有限公司 | Text error correction method and device, electronic equipment and readable storage medium |
CN113920013B (en) * | 2021-10-14 | 2023-06-16 | 中国科学院深圳先进技术研究院 | Super-resolution-based small image multi-target detection method |
CN114022384A (en) * | 2021-11-05 | 2022-02-08 | 安徽大学 | Adaptive edge preserving denoising method based on anisotropic diffusion model |
CN113836274A (en) * | 2021-11-25 | 2021-12-24 | 平安科技(深圳)有限公司 | Abstract extraction method, device, equipment and medium based on semantic analysis |
Also Published As
Publication number | Publication date |
---|---|
CN114820398A (en) | 2022-07-29 |
Legal Events
Code | Title |
---|---|
PB01 | Publication |
SE01 | Entry into force of request for substantive examination |
GR01 | Patent grant |