CN114820398B - Image font replacing method, system, equipment and medium based on diffusion model - Google Patents

Image font replacing method, system, equipment and medium based on diffusion model

Info

Publication number
CN114820398B
CN114820398B
Authority
CN
China
Prior art keywords
mask
diffusion
noise
font
image
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202210765322.XA
Other languages
Chinese (zh)
Other versions
CN114820398A (en)
Inventor
郑智展
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
SHANGHAI YICHUANG INFORMATION TECHNOLOGY CO LTD
Beijing Hanyi Innovation Technology Co ltd
Original Assignee
SHANGHAI YICHUANG INFORMATION TECHNOLOGY CO LTD
Beijing Hanyi Innovation Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by SHANGHAI YICHUANG INFORMATION TECHNOLOGY CO LTD, Beijing Hanyi Innovation Technology Co ltd filed Critical SHANGHAI YICHUANG INFORMATION TECHNOLOGY CO LTD
Priority to CN202210765322.XA priority Critical patent/CN114820398B/en
Publication of CN114820398A publication Critical patent/CN114820398A/en
Application granted Critical
Publication of CN114820398B publication Critical patent/CN114820398B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T5/00 - Image enhancement or restoration
    • G06T5/77 - Retouching; Inpainting; Scratch removal
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T5/00 - Image enhancement or restoration
    • G06T5/70 - Denoising; Smoothing
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 - Indexing scheme for image analysis or image enhancement
    • G06T2207/20 - Special algorithmic details
    • G06T2207/20081 - Training; Learning

Landscapes

  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Controls And Circuits For Display Device (AREA)

Abstract

The present disclosure provides an image font replacement method, system, device and medium based on a diffusion model. The method comprises the following steps: establishing a diffusion process and a conditional noise predictor model for pictures containing fonts; inputting the pictures containing fonts into the conditional noise predictor according to the diffusion process and training the conditional noise predictor according to the diffusion process to obtain a diffusion model; obtaining the masks of the original font and the target font on the font picture and computing the mask complement mask_c of the original font and the target font; and, according to the mask complement mask_c, performing denoising generation and replacement iteration on the original image with the diffusion model to obtain the replaced-text image. The method effectively reduces the complexity of the font replacement task on images and, benefiting from the diffusion model's capacity to construct and fit image information, repairs the content of the background region more effectively than traditional machine-learning methods.

Description

Image font replacing method, system, equipment and medium based on diffusion model
Technical Field
The present disclosure relates to the field of font replacement, and in particular, to a method, a system, a device, and a medium for replacing image fonts based on a diffusion model.
Background
Both at work and in daily life, there is often a need to replace the text font in a picture with another font. In general, however, the text and the background are fused together, so modifying the font in a picture is difficult and the result is often unsatisfactory. Techniques that use intelligent algorithms to perform font replacement automatically have therefore begun to emerge.
However, the quality of the image generated by conventional font replacement techniques depends entirely on how well the background region is restored. Conventional methods inpaint the background over the whole original-font region, which is a more complex task and yields mediocre results. In fact, font replacement does not require perfect restoration of all image content outside the original font mask; it only requires restoring the background outside the mask of the new font after the new font has been placed.
Disclosure of Invention
The invention provides a method, system, device and medium for replacing picture fonts based on a diffusion model, which solve the problems of existing on-picture font replacement methods, namely that they depend on reconstructing and repairing the background, that this repair is difficult, and that the quality of the picture generated after repair is poor. By converting the font replacement requirement into the problem of repairing the image only outside the new font mask, the quality of the image after font replacement is greatly improved.
According to an aspect of the embodiments of the present disclosure, there is provided a method for replacing a font of an image based on a diffusion model, including the following steps:
s102, establishing a diffusion process and a conditional noise predictor model of a picture containing a font;
s104, inputting the picture containing the font into a conditional noise predictor according to a diffusion process, and training the conditional noise predictor according to the diffusion process to obtain a diffusion model;
s106, obtaining masks of the original font and the target font on the font picture, and calculating to obtain a mask complement mask of the original font and the target font c
S108, according to the mask complement mask c And denoising the original image by using a diffusion model and performing replacement iteration to obtain a replacement character image.
Optionally, the specific steps of training the conditional noise predictor according to the diffusion process are as follows:
s1042, performing diffusion and noise addition on the original image according to the diffusion coefficient to obtain a diffused and noise-added image;
s1044, inputting the images subjected to diffusion and noise addition and the diffusion time steps as input variables into the noise predictor model, outputting a prediction result, calculating loss according to the prediction result, and performing gradient descent according to a loss function to update the conditional noise predictor model;
and S1046, circularly executing the steps S1042 and S1044 until the conditional noise predictor model converges, and taking the conditional noise predictor model as a diffusion model.
Optionally, step S106 is specifically:
S1062, obtaining the original-text mask in the original image; replacing the original text with the new font to obtain the replaced-text image, and obtaining the replaced-text mask in the replaced-text image;
S1064, computing the mask complement mask_c from the original-text mask and the replaced-text mask, the mask complement mask_c being the area inside the original-text mask but outside the replaced-text mask;
S1066, obtaining the information of the mask complement mask_c region by reverse denoising with the diffusion model, this information being used to repair the replaced-text image.
Optionally, step S108 specifically includes:
S1082, setting the initial input to Gaussian noise, recorded as the current noise data x_{t+1};
S1084, using the trained diffusion model to perform reverse denoising generation on the noise data x_{t+1} according to the diffusion process, obtaining the denoised data x_t of the previous step;
S1086, according to the pixel information of the unknown region represented by the mask complement mask_c and of the known region, processing and updating the previous-step denoised data x_t separately:

x_t = mask_c \odot x_t^{\text{unknown}} + (1 - mask_c) \odot \left( \sqrt{\bar{\alpha}_t}\, x_0 + \sqrt{1 - \bar{\alpha}_t}\, \epsilon \right)

where x_t^{\text{unknown}} is the denoised data obtained in step S1084, \bar{\alpha}_t is the diffusion coefficient, \epsilon is random Gaussian noise, and x_0 is the data obtained by overlaying the target font directly on the original picture;
S1088, executing steps S1084 and S1086 in a loop for T iterations, T being the total number of diffusion time steps, to finally obtain the replaced-text image.
According to another aspect of the embodiments of the present disclosure, there is provided a system for replacing picture fonts based on a diffusion model, comprising:
the noise predictor establishing module, used for establishing the diffusion process of pictures containing fonts and the conditional noise predictor model;
the noise predictor training module, used for inputting the pictures containing fonts into the conditional noise predictor according to the diffusion process and training the conditional noise predictor according to the diffusion process, so as to obtain a noise predictor model for pictures containing fonts;
a mask complement acquisition module, configured to acquire the masks of the original font and the target font on the font picture and compute the mask complement mask_c of the original font and the target font;
an image replacement module, configured to perform denoising generation and replacement on the original image according to the mask complement mask_c, to obtain the replaced-text image.
Optionally, the noise predictor training module further comprises:
the model building module is used for building a conditional noise predictor model;
the diffusion and noise adding module is used for performing diffusion and noise adding on the original image according to the diffusion coefficient to obtain a diffused and noise added image;
the updating module is used for inputting the images subjected to diffusion and noise addition and the diffusion time steps as input variables into the noise predictor model, outputting a prediction result, calculating loss according to the prediction result and performing gradient descent according to a loss function to update the conditional noise predictor model;
and circularly executing the diffusion noise adding step in the diffusion noise adding module and the updating step in the updating module until the conditional noise predictor model converges, and taking the conditional noise predictor model as a diffusion model.
Optionally, the mask complement acquisition module is specifically configured to: acquire the original-text mask in the original image; replace the original text with the new font to obtain the replaced-text image, and acquire the replaced-text mask in the replaced-text image; compute the mask complement mask_c from the original-text mask and the replaced-text mask, the mask complement mask_c being the area inside the original-text mask but outside the replaced-text mask; and obtain the information of the mask complement mask_c region by reverse denoising with the diffusion model, this information being used to repair the replaced-text image.
Optionally, the image replacement module further comprises:
the initial input is set to Gaussian noise and recorded as the current noise data x_{t+1};
a denoising module, used for performing reverse denoising generation on the noise data x_{t+1} with the trained diffusion model according to the diffusion process, obtaining the denoised data x_t of the previous step;
a replacement module, used for processing and updating the previous-step denoised data x_t separately according to the pixel information of the unknown region represented by the mask complement mask_c and of the known region:

x_t = mask_c \odot x_t^{\text{unknown}} + (1 - mask_c) \odot \left( \sqrt{\bar{\alpha}_t}\, x_0 + \sqrt{1 - \bar{\alpha}_t}\, \epsilon \right)

where x_t^{\text{unknown}} is the denoised data produced by the denoising module, \bar{\alpha}_t is the diffusion coefficient, \epsilon is random Gaussian noise, and x_0 is the data obtained by overlaying the target font directly on the original picture;
and the denoising step in the denoising module and the replacement step in the replacement module are executed in a loop T times, T being the total number of diffusion time steps, to finally obtain the replaced-text image.
According to another aspect of the embodiments of the present disclosure, an electronic device is provided, which includes a memory, a processor, and a computer program stored on the memory and executable on the processor, and the processor implements the above-mentioned method for replacing a font of a picture based on a diffusion model when executing the computer program.
According to another aspect of the embodiments of the present disclosure, there is provided a computer-readable storage medium, on which a computer program is stored, the program, when being executed by a processor, implementing the steps of the above-mentioned diffusion model-based picture font replacement method.
According to the method, the conventional approach of repairing the background over the whole original-font region is transformed, using the information of the replacing new font, into repairing only the complement region of the old and new fonts. This effectively reduces the complexity of the font replacement task on images, and, benefiting from the diffusion model's capacity to construct and fit image information, the method repairs the content of the background region more effectively than traditional machine-learning methods.
Drawings
Fig. 1 shows a flowchart of a method for replacing a font of a picture based on a diffusion model in embodiment 1;
FIG. 2 shows a flow chart of the forward noising and reverse denoising generation of the diffusion model;
FIG. 3 shows a flow chart of the iterative loop that repairs the background after font replacement using the reverse process of the diffusion model;
FIG. 4 shows a flow chart of the calculation of the mask complement mask_c;
fig. 5 schematically shows a picture font replacement system block diagram based on a diffusion model.
Detailed Description
The technical solutions in the embodiments of the present disclosure will be clearly and completely described below with reference to the drawings in the embodiments of the present disclosure, and it is obvious that the described embodiments are only a part of the embodiments of the present disclosure, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments disclosed herein without making any creative effort, shall fall within the protection scope of the present disclosure.
Example 1
According to an aspect of the embodiments of the present disclosure, there is provided a method for replacing a font of an image based on a diffusion model, as shown in fig. 1, including the following steps:
s102, establishing a diffusion process and a conditional noise predictor model of a picture containing a font;
in this embodiment, a conditional noise predictor model is built and model parameters are initialized, the basic structure of the conditional noise predictor model is an Unet network structure with equal input and output, and the input variable of the conditional noise predictor model is an image x after diffusion and noise addition t And diffusion time step t, output as modelAnd predicting the diffusion noise of the picture containing the font.
S104, inputting the picture containing the font into a conditional noise predictor according to a diffusion process, and training the conditional noise predictor according to the diffusion process to obtain a diffusion model;
in this embodiment, the specific steps of training the conditional noise predictor according to the diffusion process are as follows:
s1042, performing diffusion and noise addition on the original image according to the diffusion coefficient to obtain a diffused and noise-added image;
as shown in fig. 2, which is a schematic diagram of a diffusion model of an image with fonts used in the embodiment of the present disclosure, a forward noise adding process is performed from left to right, gaussian noise is sequentially and gradually added to the content of the original image with fonts, and after T-step noise adding, the image is completely converted into a noise image x conforming to gaussian prior T
In some embodiments, each diffusion noising step is computed as follows:
The diffusion coefficient of each step of the diffusion model is set so that, from the first iteration step to the last step T, it decreases according to a cosine schedule:

\bar{\alpha}_t = \frac{f(t)}{f(0)}, \qquad f(t) = \cos^2\!\left( \frac{t/T + s}{1 + s} \cdot \frac{\pi}{2} \right)

where \bar{\alpha}_t is the diffusion coefficient, T is the total number of diffusion steps, usually set to 1000, and s is a small offset of 0.008.
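A minimal sketch of this schedule, assuming the standard cosine form given above (function and variable names are illustrative):

```python
import numpy as np

def cosine_alpha_bar(T: int = 1000, s: float = 0.008) -> np.ndarray:
    """Cumulative diffusion coefficients alpha_bar_1..alpha_bar_T under the cosine schedule (decreasing in t)."""
    steps = np.arange(T + 1)
    f = np.cos(((steps / T) + s) / (1 + s) * np.pi / 2) ** 2
    alpha_bar = f / f[0]
    return alpha_bar[1:]   # alpha_bar[t-1] holds the coefficient for step t = 1..T
```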
The diffusion coefficient corresponding to the current diffusion step t is then used to diffuse and noise the original image information x_0:

x_t = \sqrt{\bar{\alpha}_t}\, x_0 + \sqrt{1 - \bar{\alpha}_t}\, \epsilon

where \bar{\alpha}_t is the diffusion coefficient and \epsilon is random Gaussian noise.
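A sketch of a single noising step under the same assumptions, reusing the schedule above (helper names are illustrative):

```python
import numpy as np

def diffuse(x0: np.ndarray, t: int, alpha_bar: np.ndarray, rng=np.random):
    """One forward noising step: x_t = sqrt(alpha_bar_t) * x_0 + sqrt(1 - alpha_bar_t) * eps, with eps ~ N(0, I)."""
    eps = rng.standard_normal(x0.shape)
    a = alpha_bar[t - 1]                               # steps are 1-indexed in the text, arrays are 0-indexed
    x_t = np.sqrt(a) * x0 + np.sqrt(1.0 - a) * eps
    return x_t, eps                                    # the true noise eps is kept as the training target
```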
S1044, inputting the images subjected to diffusion and noise addition and the diffusion time steps as input variables into the noise predictor model, outputting a prediction result, calculating loss according to the prediction result, and performing gradient descent according to a loss function to update the conditional noise predictor model;
in this embodiment, a training batch sample x is extracted from the image training set data containing fonts 0 Randomly extracting each image in the batch from T diffusion time steps, and numbering the original images x according to the extracted time steps 0 Number x is obtained by adding noise according to the diffusion noise adding method in 1 t
Noisy image batch x t Sending the corresponding time step number t into a conditional noise predictor model, and outputting a prediction result
Figure 843816DEST_PATH_IMAGE008
And calculating loss, and performing gradient descent according to a loss function to update the conditional noise predictor model. The loss function L is as follows:
Figure 938810DEST_PATH_IMAGE009
wherein the content of the first and second substances,
Figure 794771DEST_PATH_IMAGE010
the amount of Gaussian noise actually used when the batch of original images are subjected to diffusion and noise addition;
and S1046, circularly executing the steps S1042 and S1044 until the conditional noise predictor model converges, and taking the conditional noise predictor model as a diffusion model.
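One iteration of this training procedure can be sketched as follows (a simplified illustration assuming the predictor interface and cosine schedule sketched earlier; data loading and optimizer construction are omitted, and alpha_bar is assumed to be the schedule as a torch tensor, e.g. torch.from_numpy(cosine_alpha_bar()).float()):

```python
import torch
import torch.nn.functional as F

def train_step(model, optimizer, x0: torch.Tensor, alpha_bar: torch.Tensor) -> float:
    """One pass of S1042-S1044: noise a batch at random steps, predict the noise, take a gradient step."""
    B, T = x0.shape[0], alpha_bar.shape[0]
    t = torch.randint(1, T + 1, (B,), device=x0.device)       # a random diffusion step per image
    a = alpha_bar[t - 1].view(B, 1, 1, 1)
    eps = torch.randn_like(x0)
    x_t = a.sqrt() * x0 + (1 - a).sqrt() * eps                 # forward diffusion and noising (S1042)
    loss = F.mse_loss(model(x_t, t), eps)                      # L = ||eps - eps_theta(x_t, t)||^2
    optimizer.zero_grad()
    loss.backward()
    loss_value = loss.item()
    optimizer.step()
    return loss_value
```

Step S1046 then corresponds to repeating this over the training set until the conditional noise predictor converges.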
S106, obtaining the masks of the original font and the target font on the font picture, and computing the mask complement mask_c of the original font and the target font;
In this embodiment, as shown in FIG. 3, which is a schematic diagram of the font replacement method used in this embodiment of the present disclosure, the font replacement task, compared with generating an image directly from the Gaussian prior, already has most of the original information of the real image and only needs to predict the information of the uncertain region after font replacement. Therefore, as shown in FIG. 4, the method proceeds according to the following steps:
s1062, acquiring an original character mask in the original image; replacing the original character with the new font to obtain a replaced character image, and acquiring a replaced character mask code in the replaced character image;
in this embodiment, the original text mask is obtained by using the detection and recognition model 1, Specifically, in fig. 3, the blue part is the range of the original word box; the replaced character mask is determined by a recommendation system and a user to obtain a replaced character mask 2 Specifically, in fig. 3, the green part is the new frame range;
s1064, calculating a mask complement set mask according to the original word mask and the replacement word mask c The mask complement mask c Is the area on the original literal mask but outside the alternate literal mask;
in this embodiment, in mask 1 At the mask, but at the upper part 2 The other part, mask c The information of the region is background information of the model needing repairing and filling, and the specific calculation mode is as follows:
mask c = mask 1 and (1 - mask 2 ),
the characters after being replaced are directly inserted into the input image to obtain the image which is well replaced but is to be repaired
Figure 367835DEST_PATH_IMAGE011
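A minimal sketch of this mask arithmetic and of the direct paste (function names and array shapes are assumptions for illustration; the masks are binary arrays broadcastable against the image):

```python
import numpy as np

def mask_complement(mask1: np.ndarray, mask2: np.ndarray) -> np.ndarray:
    """mask_c = mask_1 and (1 - mask_2): inside the original-text box but outside the new-text box."""
    return mask1 * (1 - mask2)

def paste_new_font(original: np.ndarray, rendered: np.ndarray, mask2: np.ndarray) -> np.ndarray:
    """Insert the re-typeset text directly into the picture; the mask_c region still needs repair.
    original, rendered: (H, W, C) float arrays; mask1, mask2: binary (H, W, 1) arrays."""
    return rendered * mask2 + original * (1 - mask2)
```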
S1066, obtaining the information of the mask complement mask_c region by reverse denoising with the diffusion model, this information being used to repair the replaced-text image.
In this embodiment, the reverse denoising process of the conditional noise predictor obtained in step S104 by training on a large dataset of high-definition pictures containing fonts can be used to generate and repair high-definition pictures containing fonts. The overall flow of the denoising generation process is as follows:
A noise image is randomly initialized, i.e. the initial input is pure Gaussian noise with the actual dimensions of the image;
T denoising iterations are executed, with the time step t taking the values {T, T-1, ..., 2, 1} in order, and the image predicted for the next step is:

x_{t-1} = \frac{1}{\sqrt{\alpha_t}} \left( x_t - \frac{1 - \alpha_t}{\sqrt{1 - \bar{\alpha}_t}}\, \epsilon_\theta(x_t, t) \right) + \sigma_t z

where \alpha_t = \bar{\alpha}_t / \bar{\alpha}_{t-1} is the per-step ratio of the diffusion coefficient at the current step, z is random Gaussian noise, \bar{\alpha}_t is the diffusion coefficient, and \sigma_t is the variance parameter of the diffusion step, computed from the diffusion coefficients as:

\sigma_t^2 = \frac{1 - \bar{\alpha}_{t-1}}{1 - \bar{\alpha}_t}\, (1 - \alpha_t)

Thus, after T iterations, a realistic high-definition image containing fonts is finally obtained;
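As a minimal sketch of this plain reverse denoising loop (assuming the standard sampler form of the update above; the function and variable names are illustrative, and alpha_bar is the schedule from the earlier sketch converted to a torch tensor):

```python
import torch

@torch.no_grad()
def reverse_sample(model, shape, alpha_bar: torch.Tensor, device: str = "cpu") -> torch.Tensor:
    """Plain T-step reverse denoising: start from pure Gaussian noise and iterate x_T -> ... -> x_0."""
    T = alpha_bar.shape[0]
    x = torch.randn(shape, device=device)
    for t in range(T, 0, -1):
        a_bar = alpha_bar[t - 1]
        a_bar_prev = alpha_bar[t - 2] if t > 1 else torch.tensor(1.0, device=device)
        alpha = a_bar / a_bar_prev                                  # per-step coefficient
        eps_hat = model(x, torch.full((shape[0],), t, device=device))
        mean = (x - (1 - alpha) / (1 - a_bar).sqrt() * eps_hat) / alpha.sqrt()
        if t > 1:
            sigma = ((1 - a_bar_prev) / (1 - a_bar) * (1 - alpha)).sqrt()
            x = mean + sigma * torch.randn_like(x)
        else:
            x = mean
    return x
```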
s108, according to the mask complement mask c And denoising, generating and replacing iteration are carried out on the original image by using a diffusion model to obtain a replacing character image.
In this embodiment, step S108 specifically includes:
S1082, setting the initial input to Gaussian noise, recorded as the current noise data x_{t+1};
S1084, using the trained diffusion model to perform reverse denoising generation on the noise data x_{t+1} according to the diffusion process, obtaining the denoised data x_t of the previous step;
S1086, according to the pixel information of the unknown region represented by the mask complement mask_c and of the known region, processing and updating the previous-step denoised data x_t separately:

x_t = mask_c \odot x_t^{\text{unknown}} + (1 - mask_c) \odot \left( \sqrt{\bar{\alpha}_t}\, x_0 + \sqrt{1 - \bar{\alpha}_t}\, \epsilon \right)

where x_t^{\text{unknown}} is the denoised data obtained in step S1084, \bar{\alpha}_t is the diffusion coefficient, \epsilon is random Gaussian noise, and x_0 is the data obtained by overlaying the target font directly on the original picture;
S1088, executing steps S1084 and S1086 in a loop for T iterations, T being the total number of diffusion time steps, to finally obtain the replaced-text image.
Example 2
According to another aspect of the embodiments of the present disclosure, there is provided a system 100 for replacing picture fonts based on a diffusion model, as shown in fig. 5, comprising:
the noise predictor establishing module 1, used for establishing the diffusion process of pictures containing fonts and the conditional noise predictor model;
the noise predictor training module 2, used for inputting the pictures containing fonts into the conditional noise predictor according to the diffusion process and training the conditional noise predictor according to the diffusion process, so as to obtain a noise predictor model for pictures containing fonts;
a mask complement acquisition module 3, configured to acquire the masks of the original font and the target font on the font picture and compute the mask complement mask_c of the original font and the target font;
an image replacement module 4, configured to perform denoising generation and replacement iteration on the original image with the diffusion model according to the mask complement mask_c, to obtain the replaced-text image.
In this embodiment, the noise predictor training module 2 further includes:
the diffusion and noise adding module is used for performing diffusion and noise adding on the original image according to the diffusion coefficient to obtain a diffused and noise added image;
the updating module is used for inputting the diffused and noised image and the diffusion time step as input variables into the noise predictor model, outputting a prediction result, calculating loss according to the prediction result and performing gradient descent according to a loss function to update the conditional noise predictor model;
and circularly executing the diffusion noise adding step in the diffusion noise adding module and the updating step in the updating module until the conditional noise predictor model converges, and taking the conditional noise predictor model as a diffusion model.
In this embodiment, the mask complement acquisition module is specifically configured to: acquire the original-text mask in the original image; replace the original text with the new font to obtain the replaced-text image, and acquire the replaced-text mask in the replaced-text image; compute the mask complement mask_c from the original-text mask and the replaced-text mask, the mask complement mask_c being the area inside the original-text mask but outside the replaced-text mask; and obtain the information of the mask complement mask_c region by reverse denoising with the diffusion model, this information being used to repair the replaced-text image.
In this embodiment, the image replacement module 4 further includes:
the initial input is set to Gaussian noise and recorded as the current noise data x_{t+1};
a denoising module, configured to perform reverse denoising generation on the noise data x_{t+1} with the trained diffusion model according to the diffusion process, obtaining the denoised data x_t of the previous step;
a replacement module, configured to process and update the previous-step denoised data x_t separately according to the pixel information of the unknown region represented by the mask complement mask_c and of the known region:

x_t = mask_c \odot x_t^{\text{unknown}} + (1 - mask_c) \odot \left( \sqrt{\bar{\alpha}_t}\, x_0 + \sqrt{1 - \bar{\alpha}_t}\, \epsilon \right)

where x_t^{\text{unknown}} is the denoised data produced by the denoising module, \bar{\alpha}_t is the diffusion coefficient, \epsilon is random Gaussian noise, and x_0 is the data obtained by overlaying the target font directly on the original picture;
and the denoising step in the denoising module and the replacement step in the replacement module are executed in a loop T times, T being the total number of diffusion time steps, to finally obtain the replaced-text image.
Example 3
The embodiment provides an electronic device, which includes a memory, a processor, and a computer program stored in the memory and executable on the processor, and the processor implements the method for replacing a font of a picture based on a diffusion model in embodiment 1 when executing the computer program.
Embodiment 3 of the present disclosure is merely an example, and should not bring any limitation to the function and the scope of use of the embodiments of the present disclosure.
The electronic device may be embodied in the form of a general purpose computing device, which may be, for example, a server device. Components of the electronic device may include, but are not limited to: at least one processor, at least one memory, and a bus connecting different system components (including the memory and the processor).
The buses include a data bus, an address bus, and a control bus.
The memory may include volatile memory, such as Random Access Memory (RAM) and/or cache memory, and may further include read-only memory (ROM).
The memory may also include program means having a set of (at least one) program modules including, but not limited to: an operating system, one or more application programs, other program modules, and program data, each of which, or some combination thereof, may comprise an implementation of a network environment.
The processor executes various functional applications and data processing by executing computer programs stored in the memory.
The electronic device may also communicate with one or more external devices (e.g., keyboard, pointing device, etc.). Such communication may be through an input/output (I/O) interface. Also, the electronic device may communicate with one or more networks (e.g., a Local Area Network (LAN), a Wide Area Network (WAN), and/or a public network, such as the internet) via a network adapter. The network adapter communicates with other modules of the electronic device over the bus. It should be appreciated that although not shown in the figures, other hardware and/or software modules may be used in conjunction with the electronic device, including but not limited to: microcode, device drivers, redundant processors, external disk drive arrays, RAID (disk array) systems, tape drives, and data backup storage systems, etc.
It should be noted that although in the above detailed description several units/modules or sub-units/modules of the electronic device are mentioned, such a division is merely exemplary and not mandatory. Indeed, the features and functionality of two or more of the units/modules described above may be embodied in one unit/module, according to embodiments of the application. Conversely, the features and functions of one unit/module described above may be further divided into embodiments by a plurality of units/modules.
EXAMPLE 4
The present embodiment provides a computer-readable storage medium storing a computer program which, when executed by a processor, implements the steps of the diffusion model-based picture font replacement method of embodiment 1.
More specific examples of the readable storage medium may include, but are not limited to: a portable disk, a hard disk, random access memory, read-only memory, erasable programmable read-only memory, an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
In a possible implementation manner, the present disclosure may also be implemented in the form of a program product, which includes program code for causing a terminal device to execute the steps of implementing the diffusion model-based picture font replacement method described in embodiment 1 when the program product is run on the terminal device.
Where program code for carrying out the disclosure is written in any combination of one or more programming languages, the program code may execute entirely on the user device, partly on the user device, as a stand-alone software package, partly on the user device and partly on a remote device or entirely on the remote device.
Example 5
The embodiment provides an application of a method for replacing a picture font based on a diffusion model, which comprises the following specific steps:
according to the input picture containing the font, the type and the text content of the font on the original image are obtained through the detection and identification model;
recommending several fonts of similar style from the font library according to the type of the font, and displaying a preview of the picture typeset with the new font on the interface;
if the typesetting effect generated by the recommendation system is not satisfactory, the user can operate the customized interface to modify the related fonts and the typesetting attributes and adjust the final font typesetting effect presentation;
computing the mask complement mask_c and, according to the finally determined typesetting of the new font, inserting the new text directly into the original image to obtain the picture x_0 that has been replaced but still needs to be repaired;
feeding this picture into the diffusion model; after the repair for font replacement, the unknown background region represented by the mask complement mask_c on the image is filled in, and the final font-replaced image is obtained.
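As an end-to-end sketch of this application flow (the detection, recommendation and rendering helpers below are hypothetical placeholders; the mask arithmetic and the masked diffusion call reuse the sketches from Example 1):

```python
import numpy as np
import torch

def replace_font_in_picture(image, model, alpha_bar, detect_text, recommend_font, render_text):
    """Hypothetical orchestration: detect -> recommend -> paste -> diffuse-and-repair.
    image: (H, W, C) float array in [0, 1]; the helper-returned masks are (H, W, 1) binary arrays."""
    mask1, font_type, content = detect_text(image)                  # original text box, font type, text content
    new_font = recommend_font(font_type)                            # similar-style font chosen / confirmed by the user
    rendered, mask2 = render_text(content, new_font, image.shape)   # re-typeset text and its box mask
    mask_c = mask1 * (1 - mask2)                                    # background region to repair
    pasted = rendered * mask2 + image * (1 - mask2)                 # picture with the new font pasted in
    # Masked reverse diffusion (see the replace_font sketch in Example 1) fills in mask_c.
    to_t = lambda a: torch.from_numpy(np.broadcast_to(a, image.shape).astype(np.float32)).permute(2, 0, 1)[None]
    repaired = replace_font(model, to_t(pasted), to_t(mask_c), alpha_bar)
    return repaired[0].permute(1, 2, 0).numpy()
```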
Although embodiments of the present disclosure have been shown and described, it will be appreciated by those skilled in the art that changes, modifications, substitutions and alterations may be made in these embodiments without departing from the principles and spirit of the disclosure, the scope of which is defined in the appended claims and their equivalents.

Claims (8)

1. A method for replacing image fonts based on a diffusion model is characterized by comprising the following steps:
s102, establishing a diffusion process and a conditional noise predictor model of a picture containing a font;
s104, inputting the picture containing the font into a conditional noise predictor according to a diffusion process, and training the conditional noise predictor according to the diffusion process to obtain a diffusion model;
s106, obtaining masks of the original font and the target font on the font picture, and calculating to obtain mask complement masks of the original font and the target font c
S108, according to the mask complement mask c Denoising, generating and replacing iteration are carried out on the original image by using a diffusion model to obtain a replacing character image;
step S108 specifically includes:
s1082, setting the initial input as Gaussian noise, and recording as the current noise data x t+1
S1084, noise data x using the trained diffusion model t+1 Performing reverse noise reduction generation according to the diffusion process to obtain the previous noise reduction data x after noise reduction processing t
S1086, complementing set mask according to mask c Represented unknown region and known region pixel information, for the previous step noise reduction data x t And respectively processing and updating:
Figure DEST_PATH_IMAGE001
wherein, the first and the second end of the pipe are connected with each other,
Figure 275891DEST_PATH_IMAGE002
in order to be the diffusion coefficient,
Figure DEST_PATH_IMAGE003
is a random Gaussian distribution of noise, x 0 Data for overlaying a target font directly on an original picture;
s1088, executing T steps S1084 and S1086 in a circulating manner, and finally obtaining the replaced character image, wherein T is the total diffusion time step.
2. The diffusion model-based picture font replacement method as claimed in claim 1, wherein the specific steps of training the conditional noise predictor according to the diffusion process are as follows:
s1042, performing diffusion and noise addition on the original image according to the diffusion coefficient to obtain a diffused and noise-added image;
s1044, inputting the images subjected to diffusion and noise addition and the diffusion time steps as input variables into the conditional noise predictor model, outputting a prediction result, calculating loss according to the prediction result, and performing gradient descent according to a loss function to update the conditional noise predictor model;
and S1046, circularly executing the steps S1042 and S1044 until the conditional noise predictor model converges, and taking the conditional noise predictor model as a diffusion model.
3. The method for replacing image fonts based on diffusion models as claimed in claim 1, wherein the step S106 is specifically as follows:
s1062, acquiring an original character mask in the original image; replacing the original character with the new font to obtain a replaced character image, and acquiring a replaced character mask code in the replaced character image;
s1064, calculating a mask complement set mask according to the original word mask and the replacement word mask c The mask complement mask c Is the area on the original literal mask but outside the alternate literal mask;
s1066, obtaining the mask complement set mask by using a diffusion model reverse noise reduction mode c The information of (2) is used to repair the replacement text image.
4. A system for replacing a font of a picture based on a diffusion model, comprising:
the noise predictor establishing module, used for establishing the diffusion process of pictures containing fonts and the conditional noise predictor model;
the noise predictor training module, used for inputting the pictures containing fonts into the conditional noise predictor according to the diffusion process and training the conditional noise predictor according to the diffusion process, so as to obtain a noise predictor model for pictures containing fonts;
a mask complement acquisition module, used for acquiring the masks of the original font and the target font on the font picture and computing the mask complement mask_c of the original font and the target font;
an image replacement module, used for performing denoising generation and replacement iteration on the original image with the diffusion model according to the mask complement mask_c, to obtain the replaced-text image;
the image replacement module further comprising:
the initial input being set to Gaussian noise and recorded as the current noise data x_{t+1};
a denoising module, used for performing reverse denoising generation on the noise data x_{t+1} with the trained diffusion model according to the diffusion process, obtaining the denoised data x_t of the previous step;
a replacement module, used for processing and updating the previous-step denoised data x_t separately according to the pixel information of the unknown region represented by the mask complement mask_c and of the known region:

x_t = mask_c \odot x_t^{\text{unknown}} + (1 - mask_c) \odot \left( \sqrt{\bar{\alpha}_t}\, x_0 + \sqrt{1 - \bar{\alpha}_t}\, \epsilon \right)

where x_t^{\text{unknown}} is the denoised data produced by the denoising module, \bar{\alpha}_t is the diffusion coefficient, \epsilon is random Gaussian noise, and x_0 is the data obtained by overlaying the target font directly on the original picture;
wherein the denoising step in the denoising module and the replacement step in the replacement module are executed in a loop T times, T being the total number of diffusion time steps, to finally obtain the replaced-text image.
5. The diffusion model-based picture font replacement system of claim 4, wherein the noise predictor training module further comprises:
the diffusion and noise adding module is used for performing diffusion and noise adding on the original image according to the diffusion coefficient to obtain a diffused and noise added image;
the updating module is used for inputting the image subjected to diffusion and noise addition and the diffusion time step as input variables into the conditional noise predictor model, outputting a prediction result, calculating loss according to the prediction result and performing gradient descent according to a loss function to update the conditional noise predictor model;
and circularly executing the diffusion noise adding step in the diffusion noise adding module and the updating step in the updating module until the conditional noise predictor model converges, and taking the conditional noise predictor model as a diffusion model.
6. The diffusion model-based picture font replacement system of claim 4, wherein the mask complement acquisition module is specifically configured to: acquire the original-text mask in the original image; replace the original text with the new font to obtain the replaced-text image, and acquire the replaced-text mask in the replaced-text image; compute the mask complement mask_c from the original-text mask and the replaced-text mask, the mask complement mask_c being the area inside the original-text mask but outside the replaced-text mask; and obtain the information of the mask complement mask_c region by reverse denoising with the diffusion model, this information being used to repair the replaced-text image.
7. An electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor implements the method for replacing a font of a picture based on a diffusion model according to any one of claims 1 to 3 when executing the computer program.
8. A computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, carries out the steps of the diffusion model based picture font replacement method according to any one of claims 1 to 3.
CN202210765322.XA 2022-07-01 2022-07-01 Image font replacing method, system, equipment and medium based on diffusion model Active CN114820398B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210765322.XA CN114820398B (en) 2022-07-01 2022-07-01 Image font replacing method, system, equipment and medium based on diffusion model

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210765322.XA CN114820398B (en) 2022-07-01 2022-07-01 Image font replacing method, system, equipment and medium based on diffusion model

Publications (2)

Publication Number Publication Date
CN114820398A CN114820398A (en) 2022-07-29
CN114820398B true CN114820398B (en) 2022-11-04

Family

ID=82522994

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210765322.XA Active CN114820398B (en) 2022-07-01 2022-07-01 Image font replacing method, system, equipment and medium based on diffusion model

Country Status (1)

Country Link
CN (1) CN114820398B (en)

Families Citing this family (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115631103B (en) * 2022-10-17 2023-09-05 北京百度网讯科技有限公司 Training method and device for image generation model, and image generation method and device
CN116433501B (en) * 2023-02-08 2024-01-09 阿里巴巴(中国)有限公司 Image processing method and device
CN116205819B (en) * 2023-03-23 2024-02-09 北京百度网讯科技有限公司 Character image generation method, training method and device of deep learning model
CN116363261A (en) * 2023-03-31 2023-06-30 北京百度网讯科技有限公司 Training method of image editing model, image editing method and device
CN116645260B (en) * 2023-07-27 2024-02-02 中国海洋大学 Digital watermark attack method based on conditional diffusion model
CN116704588B (en) * 2023-08-03 2023-09-29 腾讯科技(深圳)有限公司 Face image replacing method, device, equipment and storage medium

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113516136A (en) * 2021-07-09 2021-10-19 中国工商银行股份有限公司 Handwritten image generation method, model training method, device and equipment
WO2021208612A1 (en) * 2020-04-13 2021-10-21 华为技术有限公司 Data processing method and device

Family Cites Families (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105005975B (en) * 2015-07-08 2018-06-01 南京信息工程大学 The image de-noising method of anisotropy parameter based on PCNN and image entropy
US10613726B2 (en) * 2017-12-22 2020-04-07 Adobe Inc. Removing and replacing objects in images according to a directed user conversation
CN111626284B (en) * 2020-05-26 2023-10-03 广东小天才科技有限公司 Method and device for removing handwriting fonts, electronic equipment and storage medium
WO2022005448A1 (en) * 2020-06-29 2022-01-06 Google Llc Machine learning for high quality image processing
CN113052775B (en) * 2021-03-31 2023-05-23 华南理工大学 Image shadow removing method and device
CN113177882B (en) * 2021-04-29 2022-08-05 浙江大学 Single-frame image super-resolution processing method based on diffusion model
CN113420546A (en) * 2021-06-24 2021-09-21 平安国际智慧城市科技股份有限公司 Text error correction method and device, electronic equipment and readable storage medium
CN113920013B (en) * 2021-10-14 2023-06-16 中国科学院深圳先进技术研究院 Super-resolution-based small image multi-target detection method
CN114022384A (en) * 2021-11-05 2022-02-08 安徽大学 Adaptive edge preserving denoising method based on anisotropic diffusion model
CN113836274A (en) * 2021-11-25 2021-12-24 平安科技(深圳)有限公司 Abstract extraction method, device, equipment and medium based on semantic analysis

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2021208612A1 (en) * 2020-04-13 2021-10-21 华为技术有限公司 Data processing method and device
CN113516136A (en) * 2021-07-09 2021-10-19 中国工商银行股份有限公司 Handwritten image generation method, model training method, device and equipment

Also Published As

Publication number Publication date
CN114820398A (en) 2022-07-29

Similar Documents

Publication Publication Date Title
CN114820398B (en) Image font replacing method, system, equipment and medium based on diffusion model
US11328523B2 (en) Image composites using a generative neural network
US10613726B2 (en) Removing and replacing objects in images according to a directed user conversation
CN110188760B (en) Image processing model training method, image processing method and electronic equipment
CN111079532A (en) Video content description method based on text self-encoder
EP3779891A1 (en) Method and device for training neural network model, and method and device for generating time-lapse photography video
CN110941964A (en) Bilingual corpus screening method and device and storage medium
US20220392025A1 (en) Restoring degraded digital images through a deep learning framework
CN112037109A (en) Improved image watermarking method and system based on saliency target detection
CN117058007A (en) Object class repair in digital images using class-specific repair neural networks
CN113487512A (en) Digital image restoration method and device based on edge information guidance
CN117315758A (en) Facial expression detection method and device, electronic equipment and storage medium
CN112614149A (en) Semantic synthesis method based on instance segmentation
CN116403142A (en) Video processing method, device, electronic equipment and medium
CN112069877B (en) Face information identification method based on edge information and attention mechanism
CN113554549A (en) Text image generation method and device, computer equipment and storage medium
CN112241994B (en) Model training method, rendering method, device, electronic equipment and storage medium
CN117201874B (en) Face image replacement method and device, electronic equipment and storage medium
Samii et al. Iterative learning: Leveraging the computer as an on-demand expert artist
CN115291992B (en) Auxiliary labeling method for graphic user interface picture, electronic equipment and storage medium
US20240127510A1 (en) Stylized glyphs using generative ai
CN117651972A (en) Image processing method, device, terminal equipment, electronic equipment and storage medium
Fahim et al. Semi-supervised atmospheric component learning in low-light image problem
Ko et al. Edge-Aware Interactive Contrast Enhancement
Sakhi Segmentation Guided Image Inpainting

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant