CN116309135A - Diffusion model processing method and device and picture processing method and device - Google Patents


Info

Publication number
CN116309135A
CN116309135A (application number CN202310177857.XA)
Authority
CN
China
Prior art keywords
time step
picture
noise
target
diffusion model
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310177857.XA
Other languages
Chinese (zh)
Inventor
阳展韬
沈宇军
张晗
冯睿蠡
黄梁华
刘宇
张轶飞
赵德丽
周靖人
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Alibaba Damo Institute Hangzhou Technology Co Ltd
Original Assignee
Alibaba Damo Institute Hangzhou Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Application filed by Alibaba Damo Institute Hangzhou Technology Co Ltd filed Critical Alibaba Damo Institute Hangzhou Technology Co Ltd
Priority to CN202310177857.XA
Publication of CN116309135A
Legal status: Pending

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T5/00Image enhancement or restoration
    • G06T5/70Denoising; Smoothing
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T5/00Image enhancement or restoration
    • G06T5/50Image enhancement or restoration using two or more images, e.g. averaging or subtraction
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/10Image acquisition modality
    • G06T2207/10004Still image; Photographic image
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20081Training; Learning

Landscapes

  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Image Processing (AREA)

Abstract

The embodiments of this specification provide a diffusion model processing method and apparatus, and a picture processing method and apparatus. The diffusion model processing method includes: determining a time step set of a diffusion model and the time step intervals corresponding to the time step set; determining a first time step from the time step set and determining the target time step corresponding to the first time step according to the time step intervals; inputting the noise-added picture corresponding to the first time step, together with the target time step, into the diffusion model to obtain the prediction noise corresponding to the noise-added picture; and processing the diffusion model according to the target noise and the prediction noise corresponding to the noise-added picture. By dividing the time step set into time step intervals, the diffusion model shares one time-step condition within each interval during subsequent training, i.e., the first time steps share the target time step of their corresponding interval. This reduces the number of time-step conditions, greatly lightens the training burden, and improves model training performance.

Description

Diffusion model processing method and device and picture processing method and device
Technical Field
Embodiments of the present disclosure relate to the field of computer technologies, and in particular, to a diffusion model processing method and apparatus, a picture processing method and apparatus, a computing device, and a computer readable storage medium.
Background
A diffusion model is a generative model: it constructs a Markov chain that gradually corrupts the picture distribution into Gaussian noise, and then a network learns the reverse distribution to gradually denoise and generate a picture. Diffusion models achieve striking results on a variety of tasks, including but not limited to multi-modal generation tasks such as text-to-image generation.
However, a conventional diffusion model typically learns all single step transition probabilities through one network, and the learning of different probabilities is controlled by time-step conditions. This approach can create a heavy training burden, resulting in insufficient model capacity and poor training results.
Disclosure of Invention
In view of this, the present embodiment provides a diffusion model processing method. One or more embodiments of the present disclosure relate to a diffusion model processing apparatus, a picture processing method, a picture processing apparatus, a computing device, a computer-readable storage medium, and a computer program, which solve the technical drawbacks of the prior art.
According to a first aspect of embodiments of the present disclosure, there is provided a diffusion model processing method, including:
Determining a time step set of a diffusion model and a time step interval corresponding to the time step set;
determining a first time step from the time step set, and determining a target time step corresponding to the first time step according to the time step interval, wherein the first time step is any time step in the time step set;
inputting the noise adding picture corresponding to the first time step and the target time step into a diffusion model to obtain the prediction noise corresponding to the noise adding picture;
and processing the diffusion model according to the target noise corresponding to the noise-added picture and the prediction noise.
According to a second aspect of embodiments of the present specification, there is provided a diffusion model processing apparatus comprising:
the interval dividing module is configured to determine a time step set of the diffusion model and a time step interval corresponding to the time step set;
the target time step determining module is configured to determine a first time step from the time step set, and determine a target time step corresponding to the first time step according to the time step interval, wherein the first time step is any time step in the time step set;
The first model prediction module is configured to input a noise-added picture corresponding to the first time step and the target time step into a diffusion model to obtain prediction noise corresponding to the noise-added picture;
and the model processing module is configured to process the diffusion model according to the target noise corresponding to the noise-added picture and the prediction noise.
According to a third aspect of embodiments of the present specification, there is provided a picture processing method, including:
determining a target noise adding picture, and inputting the target noise adding picture into a diffusion model to obtain prediction noise corresponding to the target noise adding picture;
determining a denoised target picture according to the target denoised picture and the prediction noise corresponding to the target denoised picture,
wherein the diffusion model is obtained by the diffusion model processing method.
According to a fourth aspect of embodiments of the present specification, there is provided a picture processing apparatus including:
the second model prediction module is configured to determine a target noise-added picture, input the target noise-added picture into a diffusion model and obtain prediction noise corresponding to the target noise-added picture;
a target picture determining module configured to determine a denoised target picture according to the target denoised picture and a prediction noise corresponding to the target denoised picture,
Wherein the diffusion model is obtained by the diffusion model processing method.
According to a fifth aspect of embodiments of the present specification, there is provided a computing device comprising:
a memory and a processor;
the memory is configured to store computer executable instructions that, when executed by the processor, implement the steps of the diffusion model processing method or the picture processing method described above.
According to a sixth aspect of embodiments of the present specification, there is provided a computer-readable storage medium storing computer-executable instructions which, when executed by a processor, implement the steps of the diffusion model processing method or the picture processing method described above.
According to a seventh aspect of the embodiments of the present specification, there is provided a computer program, wherein the computer program, when executed in a computer, causes the computer to execute the steps of the diffusion model processing method or the picture processing method described above.
One embodiment of the present disclosure implements a method for processing a diffusion model, including determining a time step set of the diffusion model and a time step interval corresponding to the time step set; determining a first time step from the time step set, and determining a target time step corresponding to the first time step according to the time step interval, wherein the first time step is any time step in the time step set; inputting the noise adding picture corresponding to the first time step and the target time step into a diffusion model to obtain the prediction noise corresponding to the noise adding picture; and processing the diffusion model according to the target noise corresponding to the noise-added picture and the prediction noise.
Specifically, according to the diffusion model processing method, the time step set is divided into the time step intervals, and when the diffusion model is trained subsequently, the diffusion model shares the time step conditions in one time step interval, namely, the first time steps share the target time steps in the corresponding time step intervals, so that the time step conditions are reduced, the training burden is greatly lightened, and the model training performance is improved.
Drawings
FIG. 1 is a schematic structural diagram of a denoising diffusion probability model according to an embodiment of the present disclosure;
FIG. 2 is a schematic diagram of a specific implementation scenario of a diffusion model processing method according to an embodiment of the present disclosure;
FIG. 3 is a flow chart of a diffusion model processing method according to one embodiment of the present disclosure;
FIG. 4 is a flowchart of a method for processing a picture according to an embodiment of the present disclosure;
FIG. 5 is a schematic view of a diffusion model processing apparatus according to one embodiment of the present disclosure;
fig. 6 is a schematic structural diagram of a picture processing apparatus according to an embodiment of the present disclosure;
FIG. 7 is a block diagram of a computing device provided in one embodiment of the present description.
Detailed Description
In the following description, numerous specific details are set forth in order to provide a thorough understanding of this specification. However, this specification can be implemented in many forms other than those described herein, and those skilled in the art can make similar generalizations without departing from its spirit; this specification is therefore not limited to the specific implementations disclosed below.
The terminology used in the one or more embodiments of the specification is for the purpose of describing particular embodiments only and is not intended to be limiting of the one or more embodiments of the specification. As used in this specification, one or more embodiments and the appended claims, the singular forms "a," "an," and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It should also be understood that the term "and/or" as used in one or more embodiments of the present specification refers to and encompasses any or all possible combinations of one or more of the associated listed items.
It should be understood that, although the terms first, second, etc. may be used in one or more embodiments of this specification to describe various information, such information should not be limited by these terms. These terms are only used to distinguish one type of information from another. For example, a first may also be referred to as a second, and similarly, a second may also be referred to as a first, without departing from the scope of one or more embodiments of this specification. Depending on the context, the word "if" as used herein may be interpreted as "when", "upon", or "in response to a determination".
First, terms related to one or more embodiments of the present specification will be explained.
Diffusion model: a method for generating images for generating a plurality of kinds of high quality pictures.
Dividing the interval: the time steps in the Markov chain of the diffusion model are divided into a plurality of intervals.
Denoising diffusion probability model: DDPM (Denoising Diffusion Probabilistic Model).
In this specification, a diffusion model processing method is provided. One or more embodiments of the present specification relate to a diffusion model processing apparatus, a picture processing method, a picture processing apparatus, a computing device, a computer-readable storage medium, and a computer program, which are described in detail in the following embodiments one by one.
Referring to fig. 1, fig. 1 shows a schematic structural diagram of a denoising diffusion probability model according to an embodiment of the present specification.
In practical application, the basic idea of the denoising diffusion probability model is to construct a Markov chain, and uniformly model transition probabilities of all time steps through a network, wherein time step conditions are used as condition inputs of the network.
In fig. 1, x can be understood as a picture and t as a time step, so x_t is the picture at the t-th time step. If the denoising diffusion probability model is applied to a picture denoising scene, x_t can be understood as the noise-added picture at the t-th time step, and x_{t-1} as the picture obtained after denoising the noise-added picture of the t-th time step; similarly, the final x_0 can be understood as the fully denoised picture.
Referring to fig. 2, fig. 2 is a schematic diagram illustrating a specific implementation scenario of a diffusion model processing method according to an embodiment of the present disclosure.
Fig. 2 includes a cloud-side device 202 and an end-side device 204, where the cloud-side device 202 may be understood as a cloud server, and of course, in another implementation, the cloud-side device 202 may be replaced by a physical server; the end side device 204 includes, but is not limited to, a desktop computer, a notebook computer, etc.; for ease of understanding, in the embodiments of the present disclosure, the cloud-side device 202 is a cloud server, and the end-side device 204 is a notebook computer.
The application of the diffusion model processing method provided by the embodiment of the specification to the picture denoising scene is described in detail.
In practice, diffusion model training is performed at the cloud-side device 202, where the diffusion model may be understood as a diffusion model (TSDM) that reduces time-step conditions.
As shown in fig. 2, for a specific structure of the diffusion model of fig. 2, reference may be made to the structure of the denoising diffusion probability model of fig. 1.
In the denoising diffusion probability model of fig. 1, a network ε_θ(x_t, t) is needed to model T transition probabilities, where t is the time-step condition used to indicate which transition probability the network is currently modeling.
In practical applications, the denoising diffusion probability model needs a sufficiently large number of diffusion steps (i.e., values of t) to completely destroy (i.e., add noise to) the picture signal, so the number of transition probabilities a single network must model is usually very large, which imposes a heavy training burden on the network (i.e., the denoising diffusion probability model). The training burden could be reduced by reducing the number of diffusion steps, but this would also cause x_T to retain too much signal, so that the final sampling quality would be significantly degraded.
The diffusion model provided in fig. 2 of the embodiments of the present disclosure reduces the number of transition probabilities the network needs to model by dividing the diffusion steps into sections, grouping multiple diffusion steps into one section. In this way, only the number of time-step conditions (i.e., values of t) is reduced while the number of diffusion steps stays unchanged, thereby reducing the burden on the network when training the diffusion model.
When the end-side device 204 needs to use the diffusion model, the diffusion model obtained after training of the cloud-side device 202 may be called for functional use; in addition, if the computing resources and computing power of the end-side device 204 are sufficient, the diffusion model trained in the cloud-side device 202 may be deployed on the end-side device 204. The deployment implementation is specifically implemented according to practical application, and is not limited in any way herein.
According to the diffusion model processing method provided by the embodiment of the specification, the time step set is divided into the time step intervals, and when the diffusion model is trained subsequently, the diffusion model shares the time step conditions in one time step interval, namely, the first time steps share the target time steps in the corresponding time step intervals, so that the time step conditions are reduced, the training burden is greatly lightened, and the model training performance is improved.
Referring to fig. 3, fig. 3 shows a flowchart of a diffusion model processing method according to an embodiment of the present disclosure, which specifically includes the following steps.
Step 302: and determining a time step set of the diffusion model and a time step interval corresponding to the time step set.
Wherein the diffusion model can be understood as a diffusion model (TSDM) of the above embodiment that reduces time step conditions; the time step set includes a plurality of time steps, and a single time step may be understood as T, T-1, etc. in fig. 1, and may also be understood as the number of diffusion steps described in the above embodiment, for example, in the case that the number of diffusion steps is 10, then it may be understood that the time step set includes 10 time steps.
To reduce the number of time-step conditions in subsequent diffusion model training, this may be achieved by partitioning the time-step intervals in a set of time steps. The specific implementation mode is as follows:
the determining the time step set of the diffusion model and the time step interval corresponding to the time step set comprises the following steps:
determining a time step set of a diffusion model, and dividing time step intervals in the time step set according to preset dividing conditions to obtain time step intervals corresponding to the time step set.
The preset dividing condition may be set according to practical applications, and the embodiment of the present disclosure does not limit this, for example, the preset dividing condition may be understood as dividing every 50 time steps into one time step interval.
For example, the time step set of the diffusion model includes 1000 time steps, and the preset dividing condition is to divide every 50 time steps into a time step interval.
Then, the time step set of the diffusion model is determined (1000 time steps), and, according to the preset dividing condition (divide every 50 time steps into one time step interval), interval division yields the time step intervals corresponding to the time step set: 20 time step intervals. When the time step intervals are divided in the time step set, the time steps within each resulting interval remain ordered in time sequence; for example, the first time step interval contains time steps 0-50 arranged in time series.
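As a rough illustration of this division step, the following sketch partitions a 1000-step time step set into half-open intervals of 50 steps. The helper name and interval representation are illustrative assumptions, not code from the patent.

```python
def divide_time_step_intervals(num_steps, interval_size):
    """Partition the time step set [0, num_steps) into consecutive intervals."""
    return [
        (start, min(start + interval_size, num_steps))
        for start in range(0, num_steps, interval_size)
    ]

# 1000 time steps, preset dividing condition: one interval per 50 time steps
intervals = divide_time_step_intervals(1000, 50)
print(len(intervals))   # 20
print(intervals[0])     # (0, 50)
print(intervals[1])     # (50, 100)
```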
Step 304: determining a first time step from the time step set, and determining a target time step corresponding to the first time step according to the time step interval, wherein the first time step is any time step in the time step set.
In particular, a first time step may be understood as any time step of a set of time steps, such as a first time step, a second time step, a third time step, or an nth time step, etc.
After the first time step is determined from the time step set, a target time step corresponding to the first time step can be determined according to a time step interval corresponding to the time step set.
In practical application, because the time steps in the time step set are divided according to the preset dividing condition, each time step in the time step set has a corresponding time step interval. Then any one of the set of time steps may determine a target time step corresponding thereto from the time step interval corresponding thereto.
In specific implementation, the time step corresponding to the interval endpoint of the time step interval corresponding to each time step can be used as the target time step corresponding to each time step, so that the target time step corresponding to the interval endpoint of the time step interval corresponding to each time step can be used as the time step condition in subsequent diffusion model training, thereby reducing the time step condition in diffusion model training and improving the training efficiency and training effect of the diffusion model. The specific implementation mode is as follows:
The determining the target time step corresponding to the first time step according to the time step interval includes:
and determining an interval endpoint of the time step interval, and determining a target time step corresponding to the first time step according to the interval endpoint.
Along the above example, if the first time step is the 20 th time step, the time step interval corresponding to the first time step is 0-50, and then the target time step corresponding to the first time step is determined according to the interval endpoint of the time step interval: 0 time steps or 50 time steps.
Because each time step interval has two interval endpoints, determining the target time step corresponding to the first time step according to an interval endpoint covers two cases: if the interval endpoint used is the left endpoint, the target time step corresponding to the first time step is the left endpoint of its time step interval; similarly, if the interval endpoint used is the right endpoint, the target time step is the right endpoint of its time step interval.
First, taking an interval end point of a time step interval as an interval left end point as an example, a target time step corresponding to a first time step determined according to the interval end point is described in detail, and specific implementation manners are as follows:
determining an interval endpoint of the time step interval, determining a target time step corresponding to the first time step according to the interval endpoint, including:
and determining a section left end point of the time step section, and determining the section left end point as a target time step corresponding to the first time step, wherein a section right end point of the time step section is a left end point included in a next time step section.
Taking the first time step as the 53 rd time step, the time step interval corresponding to the first time step is 50-100 as an example for explanation.
If the left end point of the interval 50-100 is 50, the 50 th time step in the time step interval can be used as the target time step corresponding to the first time step. I.e. as long as the first time step belongs to the time step interval 50-100, the target time steps corresponding to the first time step are all the left end points of the interval of the time step interval: time step 50.
In the above example, if the time step set includes 1000 time steps divided into 20 time step intervals, the time step interval containing time steps 50-100 may be represented as [50, 100), that is, the right endpoint of a time step interval is the left endpoint included in the next time step interval; for example, the interval containing time steps 100-150 may be represented as [100, 150).
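The left-endpoint mapping described above can be sketched as follows. The function name and the fixed interval size of 50 are illustrative assumptions, not the patent's code.

```python
def target_time_step_left(t, interval_size=50):
    # left end point of the half-open interval [k*s, (k+1)*s) containing t
    return (t // interval_size) * interval_size

print(target_time_step_left(20))   # 0  (20 lies in [0, 50))
print(target_time_step_left(53))   # 50 (53 lies in [50, 100))
print(target_time_step_left(99))   # 50
```

Every first time step inside the same interval maps to the same target time step, which is what lets the interval share a single time-step condition.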
According to the diffusion model processing method provided by the embodiment of the specification, the time step corresponding to the left end point of the time step interval corresponding to each time step can be used as the target time step corresponding to each time step, so that the target time step corresponding to the left end point of the time step interval corresponding to each time step can be used as the time step condition in subsequent diffusion model training, the time step condition in diffusion model training is reduced, and the training efficiency and the training effect of the diffusion model are improved.
Next, taking an interval end point of the time step interval as an interval right end point as an example, a target time step corresponding to the first time step determined according to the interval end point is described in detail, and the specific implementation manner is as follows:
the determining the interval endpoint of the time step interval, and determining the target time step corresponding to the first time step according to the interval endpoint, includes:
and determining a right end point of the interval of the time step interval, and determining the right end point of the interval as a target time step corresponding to the first time step, wherein a left end point of the interval of the time step interval is a right end point included in the interval of the last time step.
In the above example, the 53 th time step is still taken as the first time step, and the time step interval corresponding to the first time step is 50-100.
If the right end point of the interval 50-100 is 100, the 100 th time step in the time step interval can be used as the target time step corresponding to the first time step. I.e. as long as the first time step belongs to the time step interval 50-100, the target time steps corresponding to the first time step are all the right end points of the interval of the time step interval: time step 100.
Still following the above example, if the time step set includes 1000 time steps divided into 20 time step intervals, the time step interval containing time steps 50-100 may be represented as (50, 100], i.e., the left endpoint of a time step interval is the right endpoint included in the previous time step interval; for example, the interval containing time steps 100-150 may be represented as (100, 150].
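Correspondingly, the right-endpoint case can be sketched as follows (again a hypothetical helper, treating intervals as (k·s, (k+1)·s]); the final line shows how 1000 original time-step conditions collapse to 20 shared ones:

```python
import math

def target_time_step_right(t, interval_size=50):
    # right end point of the interval (k*s, (k+1)*s] containing t
    return math.ceil(t / interval_size) * interval_size

print(target_time_step_right(53))    # 100 (53 lies in (50, 100])
print(target_time_step_right(100))   # 100
# with 1000 time steps and intervals of 50, only 20 distinct conditions remain
print(len({target_time_step_right(t) for t in range(1, 1001)}))  # 20
```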
According to the diffusion model processing method provided by the embodiment of the specification, the time step corresponding to the right endpoint of the time step interval containing each time step can be used as the target time step for that time step, so that the target time steps corresponding to these right endpoints can be used as the time-step conditions in subsequent diffusion model training, thereby reducing the time-step conditions in diffusion model training and improving the training efficiency and training effect of the diffusion model.
Of course, implementations in which the target time step corresponding to the first time step is a middle time step of the time step interval are not excluded. In practical applications, one observation is that when the network is given different time steps t whose values are close, its predictions for the same input are also very close (a consequence of network continuity); reducing the number of distinct time steps t therefore reduces the network load. Since the time steps within one time step interval are close to each other, any time step within an interval can be selected as the target time step for the first time steps it contains.
Step 306: and inputting the noise adding picture corresponding to the first time step and the target time step into a diffusion model to obtain the prediction noise corresponding to the noise adding picture.
The noise-added picture corresponding to the first time step can be understood as a picture to which noise has been added at the first time step in the forward process of the diffusion model.
Specifically, the diffusion model is divided into two stages: a forward process and a reverse process. The forward process constructs a Markov chain that gradually adds noise to the picture signal until it becomes a noise signal, i.e., a noise-added picture. Specifically, a discrete Markov chain {x_0, x_1, ..., x_N} is first constructed, and the transition probability of the forward process can be expressed as formula 1:

q(x_i | x_{i-1}) = N(x_i; √(1 − β_i) · x_{i-1}, β_i · I)    (formula 1)

where α_i = 1 − β_i, ᾱ_i = ∏_{j≤i} α_j, and β_0, β_1, ..., β_N is a pre-designed noise sequence.

As can be seen from formula 1, the distribution of the forward process at the final step N is as shown in formula 2:

q(x_N | x_0) = N(x_N; √(ᾱ_N) · x_0, (1 − ᾱ_N) · I)    (formula 2)
the distribution is very close to the standard normal distribution, and the inverse generation process can be directly sampled from a gaussian distribution.
Then, according to the above formula 1 and formula 2, a noise adding picture corresponding to each time step in the forward direction process of the diffusion model can be obtained; the specific implementation mode is as follows:
inputting the noise-added picture corresponding to the first time step and the target time step into a diffusion model, and before obtaining the prediction noise corresponding to the noise-added picture, further comprising:
determining an initial picture and target noise corresponding to the first time step;
and determining a noise adding picture corresponding to the first time step and target noise corresponding to the noise adding picture according to the initial picture and the target noise.
The initial picture may be understood as an uncorrupted picture of any size and in any format, or a picture that has been subjected to noise and is output in a time step previous to the first time step.
Specifically, in the case where the initial picture is an original, uncorrupted picture, the original picture and the target noise corresponding to the first time step are determined, and the target noise is added to the original picture to generate the noise-added picture corresponding to the first time step and the target noise corresponding to that noise-added picture. In the case where the initial picture is not an original picture, the initial picture can be understood as the noise-added picture output at the time step preceding the first time step; the initial picture and the target noise corresponding to the first time step are then determined, and the target noise is added to the initial picture, so that the noise-added picture corresponding to the first time step and the target noise corresponding to the noise-added picture can be generated. Of course, when noise is added to the initial picture in the forward process of the diffusion model, the addition depends not only on the target noise corresponding to the first time step but also on the noise intensity corresponding to the first time step; the embodiments of the present specification place no limitation on the specific picture noise-adding process.
In the implementation, after the noise-added picture corresponding to the first time step and the target time step corresponding to the first time step are input into the diffusion model, the prediction noise corresponding to the noise-added picture can be obtained.
Step 308: and processing the diffusion model according to the target noise corresponding to the noise-added picture and the prediction noise.
Specifically, the processing the diffusion model according to the target noise corresponding to the noise-added picture and the prediction noise includes:
and calculating a noise loss function according to the target noise corresponding to the noise-added picture and the predicted noise, adjusting network parameters of the diffusion model according to the noise loss function, and obtaining the diffusion model under the condition that a preset training ending condition is met.
In practical application, after the target noise corresponding to the noise-added picture and the prediction noise output by the diffusion model are determined, a noise loss function can be calculated according to the difference between the target noise and the prediction noise, and the network parameters of the diffusion model can then be adjusted according to the noise loss function to train the diffusion model; the trained diffusion model is obtained when a preset training ending condition is met. The preset training ending condition may be understood as the number of training iterations reaching a preset threshold (such as 10,000 or 20,000 iterations), or the model performance (e.g., accuracy, precision, etc.) of the diffusion model meeting a preset performance threshold, and so on.
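The training step just described can be sketched as follows. Here `model` is a placeholder for the diffusion network, the β schedule is an assumed linear one, and the interval width of 50 matches the worked example in this specification; none of these specifics are mandated by the embodiments:

```python
import numpy as np

# Hypothetical single training step: sample t, take a picture x_0, add
# noise eps, and compare the network's prediction noise against eps.
rng = np.random.default_rng(0)
betas = np.linspace(1e-4, 0.02, 1000)
alpha_bars = np.cumprod(1.0 - betas)

def f_T(t, width=50):
    return (t // width) * width            # shared target time step

def model(xt, t_cond):                     # placeholder network
    return np.zeros_like(xt)

def training_step(x0, rng):
    t = int(rng.integers(0, 1000))         # t ~ Uniform{0, ..., T-1}
    eps = rng.standard_normal(x0.shape)    # eps ~ N(0, I)
    xt = np.sqrt(alpha_bars[t]) * x0 + np.sqrt(1 - alpha_bars[t]) * eps
    pred = model(xt, f_T(t))               # prediction noise
    loss = np.mean((pred - eps) ** 2)      # noise loss function
    return loss

loss = training_step(rng.standard_normal((8, 8)), rng)
```

In a real training loop the loss would be backpropagated through the network to adjust its parameters, which the placeholder `model` does not capture.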
In specific implementation, the calculation process of the noise loss function can be referred to as the following formula 3:
L(θ) = E_{t, x_0, ε} [ ‖ε − ε_θ(x_t, f_T(t))‖² ],  where x_t = √(ᾱ_t) · x_0 + √(1 − ᾱ_t) · ε    (Equation 3)

Equation 3 represents the noise loss function used in training. During training, t is sampled first; t obeys a uniform distribution over 0 to T, which is equivalent to randomly selecting one integer from 0 to T as t. Next, x_0 is sampled, which is equivalent to taking a real picture from the data set, and finally the noise ε is sampled from the standard normal distribution. After sampling, the noise ε is added to the real picture x_0 to obtain the noise-added picture x_t = √(ᾱ_t) · x_0 + √(1 − ᾱ_t) · ε. The intensity of the noise is determined by t, with the specific parameter ᾱ_t designed in advance. The noise-added picture x_t and f_T(t) are input into the diffusion model, and the loss between the output (i.e., the prediction noise) and the true noise ε is calculated; the aim of training is to enable the neural network to predict, from a noisy picture x_t and the noise intensity (determined by t), the noise ε that was added to the picture.

Here T = {t_0, t_1, ..., t_n} is a time step interval sequence whose elements are the left endpoints of the time step intervals. For example, if 1000 time steps are divided into 20 intervals, T = {0, 50, 100, ..., 1000}. f_T(t) is a function whose input is the current time step t and whose output is the left endpoint of the time step interval to which t belongs. For example, the time step interval corresponding to t = 53 is 50–100, so f_T(53) = 50.
Combining the above, the single-step transition probability of the inverse process of the diffusion model, namely the inverse denoising process, can refer to the following Equation 4:

p_θ(x_{t−1} | x_t) = N(x_{t−1}; μ_θ(x_t, t), σ_t² · I),  where μ_θ(x_t, t) = (1/√α_t) · (x_t − (β_t/√(1 − ᾱ_t)) · ε_θ(x_t, f_T(t)))    (Equation 4)

Equation 4 represents the single-step transition probability of the inverse process of the diffusion model: given x_t, x_{t−1} obeys a Gaussian distribution with mean μ_θ(x_t, t) and variance σ_t² · I, where the variance is preset. In the mean μ_θ(x_t, t), β_t is a series of fixed parameters indicating the noise-adding intensity at each time step t of the forward process, and α_t is a parameter calculated from β_t, namely α_t = 1 − β_t with ᾱ_t = ∏_{s=1}^{t} α_s. ε_θ is the neural network; its inputs are the noisy picture x_t and the left endpoint f_T(t) corresponding to t, and its output is the noise added to x_t.
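One reverse step under Equation 4 can be sketched as follows. The network `eps_model`, the linear β schedule, and the preset variance choice σ_t² = β_t are all illustrative assumptions, not details fixed by the embodiments:

```python
import numpy as np

# One inverse denoising step: compute the mean of p(x_{t-1} | x_t) from the
# predicted noise, then add Gaussian noise with the preset variance.
rng = np.random.default_rng(0)
betas = np.linspace(1e-4, 0.02, 1000)
alphas = 1.0 - betas
alpha_bars = np.cumprod(alphas)

def f_T(t, width=50):
    return (t // width) * width

def eps_model(xt, t_cond):                 # placeholder for the trained network
    return np.zeros_like(xt)

def reverse_step(xt, t, rng):
    eps = eps_model(xt, f_T(t))
    mean = (xt - betas[t] / np.sqrt(1 - alpha_bars[t]) * eps) / np.sqrt(alphas[t])
    if t == 0:
        return mean                        # no noise added at the final step
    return mean + np.sqrt(betas[t]) * rng.standard_normal(xt.shape)

x = rng.standard_normal((8, 8))            # start from pure Gaussian noise
x_prev = reverse_step(x, t=999, rng=rng)
```

Iterating `reverse_step` from t = T − 1 down to t = 0 would realize the full inverse generation process.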
According to the diffusion model processing method provided by the embodiment of the specification, the time step set is divided into the time step intervals, and when the diffusion model is trained subsequently, the diffusion model shares the time step conditions in one time step interval, namely, the first time steps share the target time steps in the corresponding time step intervals, so that the time step conditions are reduced, the training burden is greatly lightened, and the model training performance is improved.
In addition, after the diffusion model is obtained through training, the diffusion model can be practically applied. The specific implementation mode is as follows:
And after the diffusion model is processed according to the target noise corresponding to the noise-added picture and the prediction noise, the method further comprises the following steps:
determining a target noise adding picture, and inputting the target noise adding picture into a diffusion model to obtain prediction noise corresponding to the target noise adding picture;
and determining the denoised target picture according to the target denoised picture and the prediction noise corresponding to the target denoised picture.
The target noise-added picture can be understood as noise-added pictures with any size and any format.
In another implementation manner, the target noise-added picture can be understood as a video frame, that is, the diffusion model can denoise the noise-added video frame to obtain a clear and accurate video frame. The specific implementation mode is as follows:
the determining the target noise-added picture comprises the following steps:
and determining a noisy video frame set, and determining any video frame in the video frame set as a target noisy picture.
Wherein the noisy video frame set comprises a plurality of video frames of any type to which various kinds of noise are added.
Specifically, after the noisy video frame set is determined, any video frame in the video frame set can be used as a target noisy picture to perform subsequent denoising processing.
After determining the target noise-added picture, the target noise-added picture can be directly input into a diffusion model obtained through training by the diffusion model processing method, and the diffusion model can output the prediction noise corresponding to the target noise-added picture; and then removing noise in the target noise-added picture according to the predicted noise corresponding to the target noise-added picture, so that the denoised target picture can be accurately obtained.
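The relationship between a target noise-added picture, the prediction noise, and the recovered picture can be sketched as an idealized check: assuming the model's prediction is exact, removing the predicted noise recovers the clean picture. The schedule and shapes are illustrative:

```python
import numpy as np

# Idealized denoising check: x0_hat = (x_t - sqrt(1-abar_t)*eps_hat) / sqrt(abar_t)
rng = np.random.default_rng(0)
betas = np.linspace(1e-4, 0.02, 1000)
alpha_bars = np.cumprod(1.0 - betas)

t = 100
x0 = rng.standard_normal((8, 8))           # stand-in for a clean picture
eps = rng.standard_normal(x0.shape)
xt = np.sqrt(alpha_bars[t]) * x0 + np.sqrt(1 - alpha_bars[t]) * eps

eps_hat = eps                              # pretend the model predicts perfectly
x0_hat = (xt - np.sqrt(1 - alpha_bars[t]) * eps_hat) / np.sqrt(alpha_bars[t])
print(np.allclose(x0_hat, x0))             # True when the prediction is exact
```

In practice `eps_hat` would come from the trained diffusion model, and the recovery is approximate rather than exact.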
In another implementation embodiment, the diffusion model processing method provided in the present specification may also be applied to the field of text-conditioned AI (Artificial Intelligence) image generation. On the basis that the diffusion model is a picture generation model, AI image generation is performed with the text as a condition. In specific implementation, the text condition may be encoded by a pre-trained encoder, the encoding is then combined into the diffusion model through a self-attention mechanism, and on the basis of the diffusion model's picture generation, an AI image constrained by the text condition is generated in combination with the encoding.
For example, the text condition is: generate an AI image of a teddy bear skateboarding in Times Square. The text is input into an AI image generation model (i.e., a text encoder plus a diffusion model); the text encoder of the AI image generation model encodes the text, the encoding is combined with the diffusion model through the self-attention mechanism to generate the AI image, and finally the AI image generation model outputs an AI image of a teddy bear skateboarding in Times Square.
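The step of combining a text encoding with the diffusion model can be sketched, very loosely, as image features attending over text-token embeddings; every name, shape, and the residual-injection design below is hypothetical and stands in for the attention mechanism the embodiments mention:

```python
import numpy as np

# Toy cross-attention: image features (queries) attend to encoded text
# tokens (keys/values) and absorb the condition via a residual update.
rng = np.random.default_rng(0)

def cross_attention(img_feats, txt_feats):
    """img_feats: (n, d) queries; txt_feats: (m, d) keys/values."""
    d = img_feats.shape[-1]
    scores = img_feats @ txt_feats.T / np.sqrt(d)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)   # softmax over text tokens
    return img_feats + weights @ txt_feats           # residual injection

img_feats = rng.standard_normal((64, 32))  # e.g. an 8x8 grid of 32-dim features
txt_feats = rng.standard_normal((7, 32))   # encoded text-condition tokens
out = cross_attention(img_feats, txt_feats)
```

A real model would apply learned query/key/value projections and repeat this at multiple network layers; the sketch only shows the data flow.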
In practical application, the method can generate not only AI images but also ordinary two-dimensional images, and the like; the condition setting is performed according to the practical application, and the present specification places no limitation on this.
Referring to fig. 4, fig. 4 shows a flowchart of a picture processing method according to an embodiment of the present disclosure, which specifically includes the following steps.
Step 402: and determining a target noise adding picture, and inputting the target noise adding picture into a diffusion model to obtain the prediction noise corresponding to the target noise adding picture.
Step 404: determining a denoised target picture according to the target denoised picture and the prediction noise corresponding to the target denoised picture,
wherein the diffusion model is obtained by the diffusion model processing method.
Specifically, for specific implementation steps of the image processing method, reference may be made to the detailed description of the diffusion model processing method in the above embodiment, which is not repeated in this embodiment of the present disclosure.
According to the picture processing method provided by the embodiments of the present specification, picture denoising can be performed quickly and accurately by means of the high-performance diffusion model trained with reduced time step conditions, obtaining the denoised target picture and greatly improving picture denoising performance.
Corresponding to the above method embodiments, the present disclosure further provides an embodiment of a diffusion model processing apparatus, and fig. 5 shows a schematic structural diagram of a diffusion model processing apparatus provided in one embodiment of the present disclosure. As shown in fig. 5, the apparatus includes:
the interval dividing module 502 is configured to determine a time step set of the diffusion model and a time step interval corresponding to the time step set;
a target time step determining module 504 configured to determine a first time step from the set of time steps, and determine a target time step corresponding to the first time step according to the time step interval, where the first time step is any time step in the set of time steps;
a first model prediction module 506, configured to input a noise-added picture corresponding to the first time step and the target time step into a diffusion model, so as to obtain prediction noise corresponding to the noise-added picture;
and the model processing module 508 is configured to process the diffusion model according to the target noise corresponding to the noise-added picture and the prediction noise.
Optionally, the apparatus further comprises:
a noise adding module configured to:
determining an initial picture and target noise corresponding to the first time step;
And determining a noise adding picture corresponding to the first time step and target noise corresponding to the noise adding picture according to the initial picture and the target noise.
Optionally, the interval dividing module 502 is further configured to:
determining a time step set of a diffusion model, and dividing time step intervals in the time step set according to preset dividing conditions to obtain time step intervals corresponding to the time step set.
Optionally, the target time step determining module 504 is further configured to:
and determining an interval endpoint of the time step interval, and determining a target time step corresponding to the first time step according to the interval endpoint.
Optionally, the target time step determining module 504 is further configured to:
and determining a section left end point of the time step section, and determining the section left end point as a target time step corresponding to the first time step, wherein a section right end point of the time step section is a left end point included in a next time step section.
Optionally, the target time step determining module 504 is further configured to:
and determining a right end point of the interval of the time step interval, and determining the right end point of the interval as a target time step corresponding to the first time step, wherein a left end point of the interval of the time step interval is a right end point included in the interval of the last time step.
Optionally, the model processing module 508 is further configured to:
and calculating a noise loss function according to the target noise corresponding to the noise-added picture and the predicted noise, adjusting network parameters of the diffusion model according to the noise loss function, and obtaining the diffusion model under the condition that a preset training ending condition is met.
Optionally, the apparatus further comprises:
a denoising module configured to:
determining a target noise adding picture, and inputting the target noise adding picture into a diffusion model to obtain prediction noise corresponding to the target noise adding picture;
and determining the denoised target picture according to the target denoised picture and the prediction noise corresponding to the target denoised picture.
Optionally, the denoising module is further configured to:
and determining a noisy video frame set, and determining any video frame in the video frame set as a target noisy picture.
According to the diffusion model processing device provided by the embodiment of the specification, the time step set is divided into the time step intervals, and when the diffusion model is trained subsequently, the diffusion model shares the time step conditions in one time step interval, namely, the first time steps share the target time steps in the corresponding time step intervals, so that the time step conditions are reduced, the training burden is greatly lightened, and the model training performance is improved.
The above is a schematic scheme of a diffusion model processing apparatus of the present embodiment. It should be noted that, the technical solution of the diffusion model processing apparatus and the technical solution of the diffusion model processing method belong to the same concept, and details of the technical solution of the diffusion model processing apparatus, which are not described in detail, can be referred to the description of the technical solution of the diffusion model processing method.
Corresponding to the above method embodiments, the present disclosure further provides an embodiment of a picture processing apparatus, and fig. 6 shows a schematic structural diagram of a picture processing apparatus provided in one embodiment of the present disclosure. As shown in fig. 6, the apparatus includes:
a second model prediction module 602 configured to determine a target noise-added picture, and input the target noise-added picture into a diffusion model to obtain prediction noise corresponding to the target noise-added picture;
a target picture determining module 604 configured to determine a denoised target picture according to the target denoised picture and the prediction noise corresponding to the target denoised picture,
wherein the diffusion model is obtained by the diffusion model processing method.
According to the picture processing device provided by the embodiments of the present specification, picture denoising can be performed quickly and accurately by means of the high-performance diffusion model trained with reduced time step conditions, obtaining the denoised target picture and greatly improving picture denoising performance.
The above is a schematic solution of a picture processing apparatus of the present embodiment. It should be noted that, the technical solution of the image processing apparatus and the technical solution of the image processing method belong to the same concept, and details of the technical solution of the image processing apparatus, which are not described in detail, can be referred to the description of the technical solution of the image processing method.
Fig. 7 illustrates a block diagram of a computing device 700 provided in accordance with one embodiment of the present description. The components of computing device 700 include, but are not limited to, memory 710 and processor 720. Processor 720 is coupled to memory 710 via bus 730, and database 750 is used to store data.
Computing device 700 also includes access device 740, access device 740 enabling computing device 700 to communicate via one or more networks 760. Examples of such networks include public switched telephone networks (PSTN, public Switched Telephone Network), local area networks (LAN, local Area Network), wide area networks (WAN, wide Area Network), personal area networks (PAN, personal Area Network), or combinations of communication networks such as the internet. The access device 740 may include one or more of any type of network interface, wired or wireless, such as a network interface card (NIC, network interface controller), such as an IEEE802.11 wireless local area network (WLAN, wireless Local Area Network) wireless interface, a worldwide interoperability for microwave access (Wi-MAX, worldwide Interoperability for Microwave Access) interface, an ethernet interface, a universal serial bus (USB, universal Serial Bus) interface, a cellular network interface, a bluetooth interface, a near field communication (NFC, near Field Communication) interface, and so forth.
In one embodiment of the present description, the above-described components of computing device 700, as well as other components not shown in FIG. 7, may also be connected to each other, such as by a bus. It should be understood that the block diagram of the computing device illustrated in FIG. 7 is for exemplary purposes only and is not intended to limit the scope of the present description. Those skilled in the art may add or replace other components as desired.
Computing device 700 may be any type of stationary or mobile computing device, including a mobile computer or mobile computing device (e.g., tablet, personal digital assistant, laptop, notebook, netbook, etc.), mobile phone (e.g., smart phone), wearable computing device (e.g., smart watch, smart glasses, etc.), or other type of mobile device, or a stationary computing device such as a desktop computer or personal computer (PC, personal Computer). Computing device 700 may also be a mobile or stationary server.
The processor 720 is configured to execute computer-executable instructions that, when executed by the processor, implement the steps of the diffusion model processing method or the picture processing method described above. The foregoing is a schematic illustration of a computing device of this embodiment. It should be noted that, the technical solution of the computing device and the technical solution of the diffusion model processing method or the picture processing method belong to the same concept, and details of the technical solution of the computing device, which are not described in detail, can be referred to the description of the technical solution of the diffusion model processing method or the picture processing method.
An embodiment of the present disclosure also provides a computer-readable storage medium storing computer-executable instructions that, when executed by a processor, implement the steps of the diffusion model processing method or the picture processing method described above.
The above is an exemplary version of a computer-readable storage medium of the present embodiment. It should be noted that, the technical solution of the storage medium and the technical solution of the diffusion model processing method or the picture processing method belong to the same concept, and details of the technical solution of the storage medium which are not described in detail can be referred to the description of the technical solution of the diffusion model processing method or the picture processing method.
An embodiment of the present disclosure further provides a computer program, where the computer program, when executed in a computer, causes the computer to perform the steps of the diffusion model processing method or the picture processing method described above.
The above is an exemplary version of a computer program of the present embodiment. It should be noted that, the technical solution of the computer program and the technical solution of the diffusion model processing method or the picture processing method belong to the same concept, and details of the technical solution of the computer program, which are not described in detail, can be referred to the description of the technical solution of the diffusion model processing method or the picture processing method.
The foregoing describes specific embodiments of the present disclosure. Other embodiments are within the scope of the following claims. In some cases, the actions or steps recited in the claims can be performed in a different order than in the embodiments and still achieve desirable results. In addition, the processes depicted in the accompanying figures do not necessarily require the particular order shown, or sequential order, to achieve desirable results. In some embodiments, multitasking and parallel processing are also possible or may be advantageous.
The computer instructions include computer program code that may be in source code form, object code form, executable file or some intermediate form, etc. The computer readable medium may include: any entity or device capable of carrying the computer program code, a recording medium, a U disk, a removable hard disk, a magnetic disk, an optical disk, a computer Memory, a Read-Only Memory (ROM), a random access Memory (RAM, random Access Memory), an electrical carrier signal, a telecommunications signal, a software distribution medium, and so forth. It should be noted that the computer readable medium contains content that can be appropriately scaled according to the requirements of jurisdictions in which such content is subject to legislation and patent practice, such as in certain jurisdictions in which such content is subject to legislation and patent practice, the computer readable medium does not include electrical carrier signals and telecommunication signals.
It should be noted that, for simplicity of description, the foregoing method embodiments are all expressed as a series of combinations of actions, but it should be understood by those skilled in the art that the embodiments are not limited by the order of actions described, as some steps may be performed in other order or simultaneously according to the embodiments of the present disclosure. Further, those skilled in the art will appreciate that the embodiments described in the specification are all preferred embodiments, and that the acts and modules referred to are not necessarily all required for the embodiments described in the specification.
In the foregoing embodiments, the descriptions of the embodiments are emphasized, and for parts of one embodiment that are not described in detail, reference may be made to the related descriptions of other embodiments.
The preferred embodiments of the present specification disclosed above are merely used to help clarify the present specification. Alternative embodiments are not intended to be exhaustive or to limit the invention to the precise form disclosed. Obviously, many modifications and variations are possible in light of the teaching of the embodiments. The embodiments were chosen and described in order to best explain the principles of the embodiments and the practical application, to thereby enable others skilled in the art to best understand and utilize the invention. This specification is to be limited only by the claims and the full scope and equivalents thereof.

Claims (14)

1. A diffusion model processing method comprising:
determining a time step set of a diffusion model and a time step interval corresponding to the time step set;
determining a first time step from the time step set, and determining a target time step corresponding to the first time step according to the time step interval, wherein the first time step is any time step in the time step set;
inputting the noise adding picture corresponding to the first time step and the target time step into a diffusion model to obtain the prediction noise corresponding to the noise adding picture;
and processing the diffusion model according to the target noise corresponding to the noise-added picture and the prediction noise.
2. The diffusion model processing method according to claim 1, wherein before inputting the noise-added picture corresponding to the first time step and the target time step into a diffusion model to obtain the prediction noise corresponding to the noise-added picture, the method further comprises:
determining an initial picture and target noise corresponding to the first time step;
and determining a noise adding picture corresponding to the first time step and target noise corresponding to the noise adding picture according to the initial picture and the target noise.
3. The diffusion model processing method according to claim 1, wherein the determining the time step set of the diffusion model and the time step interval corresponding to the time step set includes:
determining a time step set of a diffusion model, and dividing time step intervals in the time step set according to preset dividing conditions to obtain time step intervals corresponding to the time step set.
4. The diffusion model processing method according to claim 1, wherein the determining the target time step corresponding to the first time step according to the time step interval includes:
and determining an interval endpoint of the time step interval, and determining a target time step corresponding to the first time step according to the interval endpoint.
5. The diffusion model processing method according to claim 4, wherein determining the interval end point of the time step interval, and determining the target time step corresponding to the first time step according to the interval end point, comprises:
and determining a section left end point of the time step section, and determining the section left end point as a target time step corresponding to the first time step, wherein a section right end point of the time step section is a left end point included in a next time step section.
6. The diffusion model processing method according to claim 4, wherein determining the interval end point of the time step interval, and determining the target time step corresponding to the first time step according to the interval end point, comprises:
and determining a right end point of the interval of the time step interval, and determining the right end point of the interval as a target time step corresponding to the first time step, wherein a left end point of the interval of the time step interval is a right end point included in the interval of the last time step.
7. The diffusion model processing method according to claim 1, wherein the processing the diffusion model according to the target noise corresponding to the noisy picture and the prediction noise includes:
and calculating a noise loss function according to the target noise corresponding to the noise-added picture and the predicted noise, adjusting network parameters of the diffusion model according to the noise loss function, and obtaining the diffusion model under the condition that a preset training ending condition is met.
8. The method for processing a diffusion model according to any one of claims 1 to 7, wherein after the processing the diffusion model according to the target noise corresponding to the noisy picture and the prediction noise, the method further comprises:
Determining a target noise adding picture, and inputting the target noise adding picture into a diffusion model to obtain prediction noise corresponding to the target noise adding picture;
and determining the denoised target picture according to the target denoised picture and the prediction noise corresponding to the target denoised picture.
9. The diffusion model processing method according to claim 8, the determining a target noisy picture, comprising:
and determining a noisy video frame set, and determining any video frame in the video frame set as a target noisy picture.
10. A diffusion model processing apparatus comprising:
the interval dividing module is configured to determine a time step set of the diffusion model and a time step interval corresponding to the time step set;
the target time step determining module is configured to determine a first time step from the time step set, and determine a target time step corresponding to the first time step according to the time step interval, wherein the first time step is any time step in the time step set;
the first model prediction module is configured to input a noise-added picture corresponding to the first time step and the target time step into a diffusion model to obtain prediction noise corresponding to the noise-added picture;
And the model processing module is configured to process the diffusion model according to the target noise corresponding to the noise-added picture and the prediction noise.
11. A picture processing method, comprising:
determining a target noise adding picture, and inputting the target noise adding picture into a diffusion model to obtain prediction noise corresponding to the target noise adding picture;
determining a denoised target picture according to the target denoised picture and the prediction noise corresponding to the target denoised picture, wherein the diffusion model is obtained by the diffusion model processing method according to any one of the claims 1-9.
12. A picture processing apparatus comprising:
the second model prediction module is configured to determine a target noise-added picture, input the target noise-added picture into a diffusion model and obtain prediction noise corresponding to the target noise-added picture;
a target picture determining module configured to determine a denoised target picture according to the target denoised picture and a prediction noise corresponding to the target denoised picture,
wherein the diffusion model is obtained by the diffusion model processing method according to any one of the preceding claims 1 to 9.
13. A computing device, comprising:
a memory and a processor;
wherein the memory is configured to store computer-executable instructions which, when executed by the processor, implement the steps of the diffusion model processing method of any one of claims 1 to 9 or of the picture processing method of claim 11.
14. A computer-readable storage medium storing computer-executable instructions which, when executed by a processor, implement the steps of the diffusion model processing method of any one of claims 1 to 9 or the picture processing method of claim 11.
CN202310177857.XA 2023-02-16 2023-02-16 Diffusion model processing method and device and picture processing method and device Pending CN116309135A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310177857.XA CN116309135A (en) 2023-02-16 2023-02-16 Diffusion model processing method and device and picture processing method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310177857.XA CN116309135A (en) 2023-02-16 2023-02-16 Diffusion model processing method and device and picture processing method and device

Publications (1)

Publication Number Publication Date
CN116309135A true CN116309135A (en) 2023-06-23

Family

ID=86829887

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310177857.XA Pending CN116309135A (en) 2023-02-16 2023-02-16 Diffusion model processing method and device and picture processing method and device

Country Status (1)

Country Link
CN (1) CN116309135A (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116543240A (en) * 2023-07-06 2023-08-04 华中科技大学 Defending method for machine learning against attacks
CN116543240B (en) * 2023-07-06 2023-09-19 华中科技大学 Defending method for machine learning against attacks
CN116912352A (en) * 2023-09-12 2023-10-20 苏州浪潮智能科技有限公司 Picture generation method and device, electronic equipment and storage medium
CN116912352B (en) * 2023-09-12 2024-01-26 苏州浪潮智能科技有限公司 Picture generation method and device, electronic equipment and storage medium

Similar Documents

Publication Publication Date Title
Van Den Oord et al. Pixel recurrent neural networks
CN116309135A (en) Diffusion model processing method and device and picture processing method and device
US7783459B2 (en) Analog system for computing sparse codes
CN112529150A (en) Model structure, model training method, image enhancement method and device
JP6789894B2 (en) Network coefficient compressor, network coefficient compression method and program
CN110321962B (en) Data processing method and device
CN111738020B (en) Translation model training method and device
KR102299958B1 (en) Systems and methods for image compression at multiple, different bitrates
US20220164666A1 (en) Efficient mixed-precision search for quantizers in artificial neural networks
CN115424088A (en) Image processing model training method and device
US20220067888A1 (en) Image processing method and apparatus, storage medium, and electronic device
CN116208807A (en) Video frame processing method and device, and video frame denoising method and device
US20230252294A1 (en) Data processing method, apparatus, and device, and computer-readable storage medium
CN115641485A (en) Generative model training method and device
Liu et al. Computation-performance optimization of convolutional neural networks with redundant kernel removal
CN114071141A (en) Image processing method and equipment
CN115984944A (en) Expression information identification method, device, equipment, readable storage medium and product
CN108986047B (en) Image noise reduction method
CN117726542A (en) Controllable noise removing method and system based on diffusion model
Marusic et al. Adaptive prediction for lossless image compression
CN110120009B (en) Background blurring implementation method based on salient object detection and depth estimation algorithm
CN111882028A (en) Convolution operation device for convolution neural network
CN115546236B (en) Image segmentation method and device based on wavelet transformation
CN116597263A (en) Training method and related device for image synthesis model
CN116109537A (en) Distorted image reconstruction method and related device based on deep learning

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination