CN116309135A - Diffusion model processing method and device and picture processing method and device - Google Patents


Info

Publication number
CN116309135A
CN116309135A (application number CN202310177857.XA)
Authority
CN
China
Prior art keywords
time step
picture
noise
target
diffusion model
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310177857.XA
Other languages
Chinese (zh)
Inventor
阳展韬
沈宇军
张晗
冯睿蠡
黄梁华
刘宇
张轶飞
赵德丽
周靖人
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Alibaba Damo Institute Hangzhou Technology Co Ltd
Original Assignee
Alibaba Damo Institute Hangzhou Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Application filed by Alibaba Damo Institute Hangzhou Technology Co Ltd filed Critical Alibaba Damo Institute Hangzhou Technology Co Ltd
Priority to CN202310177857.XA
Publication of CN116309135A
Legal status: Pending

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T5/00Image enhancement or restoration
    • G06T5/70Denoising; Smoothing
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T5/00Image enhancement or restoration
    • G06T5/50Image enhancement or restoration using two or more images, e.g. averaging or subtraction
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/10Image acquisition modality
    • G06T2207/10004Still image; Photographic image
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20081Training; Learning

Landscapes

  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Image Processing (AREA)

Abstract

The embodiments of this specification provide a diffusion model processing method and apparatus, and a picture processing method and apparatus. The diffusion model processing method includes: determining a time step set of a diffusion model and the time step intervals corresponding to the time step set; determining a first time step from the time step set and determining the target time step corresponding to the first time step according to the time step intervals; inputting the noise-added picture corresponding to the first time step, together with the target time step, into the diffusion model to obtain the prediction noise corresponding to the noise-added picture; and processing the diffusion model according to the target noise and the prediction noise corresponding to the noise-added picture. By dividing the time step set into time step intervals, the diffusion model shares one time-step condition within each interval during subsequent training, i.e., the first time steps share the target time step of their corresponding interval. This reduces the number of time-step conditions, greatly lightens the training burden, and improves model training performance.

Description

Diffusion model processing method and device and picture processing method and device
Technical Field
Embodiments of the present disclosure relate to the field of computer technologies, and in particular, to a diffusion model processing method and apparatus, a picture processing method and apparatus, a computing device, and a computer readable storage medium.
Background
A diffusion model is a generative model: it constructs a Markov chain that gradually corrupts the picture distribution into Gaussian noise, and then a network learns the reverse distribution to gradually denoise and generate a picture. Diffusion models achieve striking results on a variety of tasks, including but not limited to multi-modal generation tasks such as text-to-image generation.
However, a conventional diffusion model typically learns all single step transition probabilities through one network, and the learning of different probabilities is controlled by time-step conditions. This approach can create a heavy training burden, resulting in insufficient model capacity and poor training results.
Disclosure of Invention
In view of this, the present embodiment provides a diffusion model processing method. One or more embodiments of the present disclosure relate to a diffusion model processing apparatus, a picture processing method, a picture processing apparatus, a computing device, a computer-readable storage medium, and a computer program, which solve the technical drawbacks of the prior art.
According to a first aspect of embodiments of the present disclosure, there is provided a diffusion model processing method, including:
Determining a time step set of a diffusion model and a time step interval corresponding to the time step set;
determining a first time step from the time step set, and determining a target time step corresponding to the first time step according to the time step interval, wherein the first time step is any time step in the time step set;
inputting the noise adding picture corresponding to the first time step and the target time step into a diffusion model to obtain the prediction noise corresponding to the noise adding picture;
and processing the diffusion model according to the target noise corresponding to the noise-added picture and the prediction noise.
According to a second aspect of embodiments of the present specification, there is provided a diffusion model processing apparatus comprising:
the interval dividing module is configured to determine a time step set of the diffusion model and a time step interval corresponding to the time step set;
the target time step determining module is configured to determine a first time step from the time step set, and determine a target time step corresponding to the first time step according to the time step interval, wherein the first time step is any time step in the time step set;
The first model prediction module is configured to input a noise-added picture corresponding to the first time step and the target time step into a diffusion model to obtain prediction noise corresponding to the noise-added picture;
and the model processing module is configured to process the diffusion model according to the target noise corresponding to the noise-added picture and the prediction noise.
According to a third aspect of embodiments of the present specification, there is provided a picture processing method, including:
determining a target noise adding picture, and inputting the target noise adding picture into a diffusion model to obtain prediction noise corresponding to the target noise adding picture;
determining a denoised target picture according to the target denoised picture and the prediction noise corresponding to the target denoised picture,
wherein the diffusion model is obtained by the diffusion model processing method.
According to a fourth aspect of embodiments of the present specification, there is provided a picture processing apparatus including:
the second model prediction module is configured to determine a target noise-added picture, input the target noise-added picture into a diffusion model and obtain prediction noise corresponding to the target noise-added picture;
a target picture determining module configured to determine a denoised target picture according to the target denoised picture and a prediction noise corresponding to the target denoised picture,
Wherein the diffusion model is obtained by the diffusion model processing method.
According to a fifth aspect of embodiments of the present specification, there is provided a computing device comprising:
a memory and a processor;
the memory is configured to store computer executable instructions that, when executed by the processor, implement the steps of the diffusion model processing method or the picture processing method described above.
According to a sixth aspect of embodiments of the present specification, there is provided a computer-readable storage medium storing computer-executable instructions which, when executed by a processor, implement the steps of the diffusion model processing method or the picture processing method described above.
According to a seventh aspect of the embodiments of the present specification, there is provided a computer program, wherein the computer program, when executed in a computer, causes the computer to execute the steps of the diffusion model processing method or the picture processing method described above.
One embodiment of the present disclosure implements a method for processing a diffusion model, including determining a time step set of the diffusion model and a time step interval corresponding to the time step set; determining a first time step from the time step set, and determining a target time step corresponding to the first time step according to the time step interval, wherein the first time step is any time step in the time step set; inputting the noise adding picture corresponding to the first time step and the target time step into a diffusion model to obtain the prediction noise corresponding to the noise adding picture; and processing the diffusion model according to the target noise corresponding to the noise-added picture and the prediction noise.
Specifically, according to the diffusion model processing method, the time step set is divided into the time step intervals, and when the diffusion model is trained subsequently, the diffusion model shares the time step conditions in one time step interval, namely, the first time steps share the target time steps in the corresponding time step intervals, so that the time step conditions are reduced, the training burden is greatly lightened, and the model training performance is improved.
Drawings
FIG. 1 is a schematic structural diagram of a denoising diffusion probability model according to an embodiment of the present disclosure;
FIG. 2 is a schematic diagram of a specific implementation scenario of a diffusion model processing method according to an embodiment of the present disclosure;
FIG. 3 is a flow chart of a diffusion model processing method according to one embodiment of the present disclosure;
FIG. 4 is a flowchart of a method for processing a picture according to an embodiment of the present disclosure;
FIG. 5 is a schematic view of a diffusion model processing apparatus according to one embodiment of the present disclosure;
fig. 6 is a schematic structural diagram of a picture processing apparatus according to an embodiment of the present disclosure;
FIG. 7 is a block diagram of a computing device provided in one embodiment of the present description.
Detailed Description
In the following description, numerous specific details are set forth in order to provide a thorough understanding of this specification. However, this specification can be implemented in many forms other than those described herein, and those skilled in the art can make similar generalizations without departing from its spirit; this specification is therefore not limited to the specific implementations disclosed below.
The terminology used in the one or more embodiments of the specification is for the purpose of describing particular embodiments only and is not intended to be limiting of the one or more embodiments of the specification. As used in this specification, one or more embodiments and the appended claims, the singular forms "a," "an," and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It should also be understood that the term "and/or" as used in one or more embodiments of the present specification refers to and encompasses any or all possible combinations of one or more of the associated listed items.
It should be understood that, although the terms first, second, etc. may be used in one or more embodiments of this specification to describe various information, such information should not be limited by these terms. These terms are only used to distinguish one type of information from another. For example, a first may also be referred to as a second, and similarly, a second may also be referred to as a first, without departing from the scope of one or more embodiments of this specification. Depending on the context, the word "if" as used herein may be interpreted as "when", "upon", or "in response to a determination".
First, terms related to one or more embodiments of the present specification will be explained.
Diffusion model: a method for generating images for generating a plurality of kinds of high quality pictures.
Dividing the interval: the time steps in the Markov chain of the diffusion model are divided into a plurality of intervals.
Denoising diffusion probability model: DDPM (Denoising Diffusion Probabilistic Model).
In this specification, a diffusion model processing method is provided. One or more embodiments of the present specification relate to a diffusion model processing apparatus, a picture processing method, a picture processing apparatus, a computing device, a computer-readable storage medium, and a computer program, which are described in detail in the following embodiments one by one.
Referring to fig. 1, fig. 1 shows a schematic structural diagram of a denoising diffusion probability model according to an embodiment of the present specification.
In practical application, the basic idea of the denoising diffusion probability model is to construct a Markov chain, and uniformly model transition probabilities of all time steps through a network, wherein time step conditions are used as condition inputs of the network.
In fig. 1, x can be understood as a picture and t as a time step, so x_t is the picture at the t-th time step. If the denoising diffusion probability model is applied to a picture denoising scene, x_t can be understood as the noise-added picture at the t-th time step, and x_{t-1} as the picture obtained after denoising the noise-added picture of the t-th time step; similarly, the final x_0 can be understood as the fully denoised picture.
Referring to fig. 2, fig. 2 is a schematic diagram illustrating a specific implementation scenario of a diffusion model processing method according to an embodiment of the present disclosure.
Fig. 2 includes a cloud-side device 202 and an end-side device 204, where the cloud-side device 202 may be understood as a cloud server, and of course, in another implementation, the cloud-side device 202 may be replaced by a physical server; the end side device 204 includes, but is not limited to, a desktop computer, a notebook computer, etc.; for ease of understanding, in the embodiments of the present disclosure, the cloud-side device 202 is a cloud server, and the end-side device 204 is a notebook computer.
The application of the diffusion model processing method provided by the embodiment of the specification to the picture denoising scene is described in detail.
In practice, diffusion model training is performed at the cloud-side device 202, where the diffusion model may be understood as a diffusion model (TSDM) that reduces time-step conditions.
As shown in fig. 2, for a specific structure of the diffusion model of fig. 2, reference may be made to the structure of the denoising diffusion probability model of fig. 1.
In the denoising diffusion probability model of fig. 1, a network ε_θ(x_t, t) is needed to model T transition probabilities, where t is the time-step condition used to indicate which transition probability the network is currently modeling.
In practical applications, the denoising diffusion probability model needs a sufficiently large number of diffusion steps (i.e., values of t) to completely destroy (i.e., add noise to) the picture signal, so the number of transition probabilities a single network must model is usually very large, which imposes a heavy training burden on the network (i.e., the denoising diffusion probability model). The training burden could be reduced by reducing the number of diffusion steps, but this would also cause x_T to retain too much signal, so that the final sampling quality would be significantly degraded.
The diffusion model provided in fig. 2 of the embodiments of the present disclosure reduces the number of transition probabilities the network needs to model by dividing the diffusion steps into sections, grouping multiple diffusion steps into one section. In this way, only the number of time-step conditions (i.e., values of t) is reduced while the number of diffusion steps stays unchanged, thereby reducing the burden on the network when training the diffusion model.
When the end-side device 204 needs to use the diffusion model, the diffusion model obtained after training of the cloud-side device 202 may be called for functional use; in addition, if the computing resources and computing power of the end-side device 204 are sufficient, the diffusion model trained in the cloud-side device 202 may be deployed on the end-side device 204. The deployment implementation is specifically implemented according to practical application, and is not limited in any way herein.
According to the diffusion model processing method provided by the embodiment of the specification, the time step set is divided into the time step intervals, and when the diffusion model is trained subsequently, the diffusion model shares the time step conditions in one time step interval, namely, the first time steps share the target time steps in the corresponding time step intervals, so that the time step conditions are reduced, the training burden is greatly lightened, and the model training performance is improved.
Referring to fig. 3, fig. 3 shows a flowchart of a diffusion model processing method according to an embodiment of the present disclosure, which specifically includes the following steps.
Step 302: and determining a time step set of the diffusion model and a time step interval corresponding to the time step set.
Wherein the diffusion model can be understood as a diffusion model (TSDM) of the above embodiment that reduces time step conditions; the time step set includes a plurality of time steps, and a single time step may be understood as T, T-1, etc. in fig. 1, and may also be understood as the number of diffusion steps described in the above embodiment, for example, in the case that the number of diffusion steps is 10, then it may be understood that the time step set includes 10 time steps.
To reduce the number of time-step conditions in subsequent diffusion model training, this may be achieved by partitioning the time-step intervals in a set of time steps. The specific implementation mode is as follows:
the determining the time step set of the diffusion model and the time step interval corresponding to the time step set comprises the following steps:
determining a time step set of a diffusion model, and dividing time step intervals in the time step set according to preset dividing conditions to obtain time step intervals corresponding to the time step set.
The preset dividing condition may be set according to practical applications, and the embodiment of the present disclosure does not limit this, for example, the preset dividing condition may be understood as dividing every 50 time steps into one time step interval.
For example, the time step set of the diffusion model includes 1000 time steps, and the preset dividing condition is to divide every 50 time steps into a time step interval.
Then, the time step set of the diffusion model is determined (1000 time steps), and, according to the preset dividing condition (divide every 50 time steps into one time step interval), interval division yields the time step intervals corresponding to the time step set: 20 time step intervals. When the time step intervals are divided in the time step set, the time steps within each resulting interval remain ordered in time sequence; for example, the first time step interval contains time steps 0-50 arranged in time series.
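As a rough illustration of this division step, the following sketch partitions a 1000-step time step set into half-open intervals of 50 steps. The helper name and interval representation are illustrative assumptions, not code from the patent.

```python
def divide_time_step_intervals(num_steps, interval_size):
    """Partition the time step set [0, num_steps) into consecutive intervals."""
    return [
        (start, min(start + interval_size, num_steps))
        for start in range(0, num_steps, interval_size)
    ]

# 1000 time steps, preset dividing condition: one interval per 50 time steps
intervals = divide_time_step_intervals(1000, 50)
print(len(intervals))   # 20
print(intervals[0])     # (0, 50)
print(intervals[1])     # (50, 100)
```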
Step 304: determining a first time step from the time step set, and determining a target time step corresponding to the first time step according to the time step interval, wherein the first time step is any time step in the time step set.
In particular, a first time step may be understood as any time step of a set of time steps, such as a first time step, a second time step, a third time step, or an nth time step, etc.
After the first time step is determined from the time step set, a target time step corresponding to the first time step can be determined according to a time step interval corresponding to the time step set.
In practical application, because the time steps in the time step set are divided according to the preset dividing condition, each time step in the time step set has a corresponding time step interval. Then any one of the set of time steps may determine a target time step corresponding thereto from the time step interval corresponding thereto.
In specific implementation, the time step corresponding to the interval endpoint of the time step interval corresponding to each time step can be used as the target time step corresponding to each time step, so that the target time step corresponding to the interval endpoint of the time step interval corresponding to each time step can be used as the time step condition in subsequent diffusion model training, thereby reducing the time step condition in diffusion model training and improving the training efficiency and training effect of the diffusion model. The specific implementation mode is as follows:
The determining the target time step corresponding to the first time step according to the time step interval includes:
and determining an interval endpoint of the time step interval, and determining a target time step corresponding to the first time step according to the interval endpoint.
Along the above example, if the first time step is the 20 th time step, the time step interval corresponding to the first time step is 0-50, and then the target time step corresponding to the first time step is determined according to the interval endpoint of the time step interval: 0 time steps or 50 time steps.
Because each time step interval has two interval endpoints, determining the target time step corresponding to the first time step according to an interval endpoint covers two cases: if the interval endpoint used is the left endpoint, the target time step corresponding to the first time step is the left endpoint of its time step interval; similarly, if the interval endpoint used is the right endpoint, the target time step is the right endpoint of its time step interval.
First, taking an interval end point of a time step interval as an interval left end point as an example, a target time step corresponding to a first time step determined according to the interval end point is described in detail, and specific implementation manners are as follows:
determining an interval endpoint of the time step interval, determining a target time step corresponding to the first time step according to the interval endpoint, including:
and determining a section left end point of the time step section, and determining the section left end point as a target time step corresponding to the first time step, wherein a section right end point of the time step section is a left end point included in a next time step section.
Taking the first time step as the 53 rd time step, the time step interval corresponding to the first time step is 50-100 as an example for explanation.
If the left end point of the interval 50-100 is 50, the 50 th time step in the time step interval can be used as the target time step corresponding to the first time step. I.e. as long as the first time step belongs to the time step interval 50-100, the target time steps corresponding to the first time step are all the left end points of the interval of the time step interval: time step 50.
In the above example, if the time step set includes 1000 time steps divided into 20 time step intervals, the time step interval containing time steps 50-100 may be represented as [50, 100), that is, the right endpoint of a time step interval is the left endpoint included in the next time step interval; for example, the interval containing time steps 100-150 may be represented as [100, 150).
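The left-endpoint mapping described above can be sketched as follows. The function name and the fixed interval size of 50 are illustrative assumptions, not the patent's code.

```python
def target_time_step_left(t, interval_size=50):
    # left end point of the half-open interval [k*s, (k+1)*s) containing t
    return (t // interval_size) * interval_size

print(target_time_step_left(20))   # 0  (20 lies in [0, 50))
print(target_time_step_left(53))   # 50 (53 lies in [50, 100))
print(target_time_step_left(99))   # 50
```

Every first time step inside the same interval maps to the same target time step, which is what lets the interval share a single time-step condition.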
According to the diffusion model processing method provided by the embodiment of the specification, the time step corresponding to the left end point of the time step interval corresponding to each time step can be used as the target time step corresponding to each time step, so that the target time step corresponding to the left end point of the time step interval corresponding to each time step can be used as the time step condition in subsequent diffusion model training, the time step condition in diffusion model training is reduced, and the training efficiency and the training effect of the diffusion model are improved.
Next, taking an interval end point of the time step interval as an interval right end point as an example, a target time step corresponding to the first time step determined according to the interval end point is described in detail, and the specific implementation manner is as follows:
the determining the interval endpoint of the time step interval, and determining the target time step corresponding to the first time step according to the interval endpoint, includes:
and determining a right end point of the interval of the time step interval, and determining the right end point of the interval as a target time step corresponding to the first time step, wherein a left end point of the interval of the time step interval is a right end point included in the interval of the last time step.
In the above example, the 53 th time step is still taken as the first time step, and the time step interval corresponding to the first time step is 50-100.
If the right end point of the interval 50-100 is 100, the 100 th time step in the time step interval can be used as the target time step corresponding to the first time step. I.e. as long as the first time step belongs to the time step interval 50-100, the target time steps corresponding to the first time step are all the right end points of the interval of the time step interval: time step 100.
Still following the above example, if the time step set includes 1000 time steps divided into 20 time step intervals, the time step interval containing time steps 50-100 may be represented as (50, 100], i.e., the left endpoint of a time step interval is the right endpoint included in the previous time step interval; for example, the interval containing time steps 100-150 may be represented as (100, 150].
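Correspondingly, the right-endpoint case can be sketched as follows (again a hypothetical helper, treating intervals as (k·s, (k+1)·s]); the final line shows how 1000 original time-step conditions collapse to 20 shared ones:

```python
import math

def target_time_step_right(t, interval_size=50):
    # right end point of the interval (k*s, (k+1)*s] containing t
    return math.ceil(t / interval_size) * interval_size

print(target_time_step_right(53))    # 100 (53 lies in (50, 100])
print(target_time_step_right(100))   # 100
# with 1000 time steps and intervals of 50, only 20 distinct conditions remain
print(len({target_time_step_right(t) for t in range(1, 1001)}))  # 20
```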
According to the diffusion model processing method provided by the embodiment of the specification, the time step corresponding to the right endpoint of the time step interval containing each time step can be used as the target time step for that time step, so that the target time steps corresponding to these right endpoints can be used as the time-step conditions in subsequent diffusion model training, thereby reducing the time-step conditions in diffusion model training and improving the training efficiency and training effect of the diffusion model.
Of course, implementations in which the target time step corresponding to the first time step is a middle time step of the time step interval are not excluded. In practical applications, one observation is that when the network is given different time steps t whose values are close, its predictions for the same input are also very close (a consequence of network continuity); reducing the number of distinct time steps t therefore reduces the network load. Since the time steps within one time step interval are close to each other, any time step within an interval can be selected as the target time step for the first time steps it contains.
Step 306: and inputting the noise adding picture corresponding to the first time step and the target time step into a diffusion model to obtain the prediction noise corresponding to the noise adding picture.
The noise-added picture corresponding to the first time step can be understood as a picture to which noise has been added at the first time step in the forward process of the diffusion model.
Specifically, the diffusion model is divided into two stages: a forward process and a reverse process. The forward process constructs a Markov chain that gradually adds noise to the picture signal until it becomes a noise signal, i.e., a noise-added picture. Specifically, a discrete Markov chain {x_0, x_1, ..., x_N} is first constructed, and the transition probability of the forward process can be expressed as formula 1:

q(x_i | x_{i-1}) = N(x_i; √(1 − β_i) · x_{i-1}, β_i · I)    (formula 1)

where α_i = 1 − β_i, ᾱ_i = ∏_{j≤i} α_j, and β_0, β_1, ..., β_N is a pre-designed noise sequence.

As can be seen from formula 1, the distribution of the forward process at the final step N is as shown in formula 2:

q(x_N | x_0) = N(x_N; √(ᾱ_N) · x_0, (1 − ᾱ_N) · I)    (formula 2)
the distribution is very close to the standard normal distribution, and the inverse generation process can be directly sampled from a gaussian distribution.
Then, according to the above formula 1 and formula 2, a noise adding picture corresponding to each time step in the forward direction process of the diffusion model can be obtained; the specific implementation mode is as follows:
inputting the noise-added picture corresponding to the first time step and the target time step into a diffusion model, and before obtaining the prediction noise corresponding to the noise-added picture, further comprising:
determining an initial picture and target noise corresponding to the first time step;
and determining a noise adding picture corresponding to the first time step and target noise corresponding to the noise adding picture according to the initial picture and the target noise.
The initial picture may be understood as an uncorrupted picture of any size and in any format, or a picture that has been subjected to noise and is output in a time step previous to the first time step.
Specifically, in the case where the initial picture is an original, uncorrupted picture, the original picture and the target noise corresponding to the first time step are determined, and the target noise is added to the original picture to generate the noise-added picture corresponding to the first time step and the target noise corresponding to that noise-added picture. In the case where the initial picture is not an original picture, the initial picture can be understood as the noise-added picture output at the time step preceding the first time step; the initial picture and the target noise corresponding to the first time step are then determined, and the target noise is added to the initial picture, so that the noise-added picture corresponding to the first time step and the target noise corresponding to the noise-added picture can be generated. Of course, when noise is added to the initial picture in the forward process of the diffusion model, the addition depends not only on the target noise corresponding to the first time step but also on the noise intensity corresponding to the first time step; the embodiments of the present specification place no limitation on the specific picture noise-adding process.
In the implementation, after the noise-added picture corresponding to the first time step and the target time step corresponding to the first time step are input into the diffusion model, the prediction noise corresponding to the noise-added picture can be obtained.
Step 308: and processing the diffusion model according to the target noise corresponding to the noise-added picture and the prediction noise.
Specifically, the processing the diffusion model according to the target noise corresponding to the noise-added picture and the prediction noise includes:
and calculating a noise loss function according to the target noise corresponding to the noise-added picture and the predicted noise, adjusting network parameters of the diffusion model according to the noise loss function, and obtaining the diffusion model under the condition that a preset training ending condition is met.
In practical application, after the target noise corresponding to the noise-added picture and the prediction noise output by the diffusion model are determined, a noise loss function can be calculated according to the difference between the target noise and the prediction noise, and the network parameters of the diffusion model can then be adjusted according to the noise loss function to train the diffusion model; the trained diffusion model is obtained when a preset training ending condition is met. The preset training ending condition may be understood as the number of training iterations reaching a preset threshold (such as 10,000 or 20,000 iterations), or the model performance (e.g., accuracy, precision, etc.) of the diffusion model meeting a preset performance threshold, and so on.
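The training step just described can be sketched as follows. Here `model` is a placeholder for the diffusion network, the β schedule is an assumed linear one, and the interval width of 50 matches the worked example in this specification; none of these specifics are mandated by the embodiments:

```python
import numpy as np

# Hypothetical single training step: sample t, take a picture x_0, add
# noise eps, and compare the network's prediction noise against eps.
rng = np.random.default_rng(0)
betas = np.linspace(1e-4, 0.02, 1000)
alpha_bars = np.cumprod(1.0 - betas)

def f_T(t, width=50):
    return (t // width) * width            # shared target time step

def model(xt, t_cond):                     # placeholder network
    return np.zeros_like(xt)

def training_step(x0, rng):
    t = int(rng.integers(0, 1000))         # t ~ Uniform{0, ..., T-1}
    eps = rng.standard_normal(x0.shape)    # eps ~ N(0, I)
    xt = np.sqrt(alpha_bars[t]) * x0 + np.sqrt(1 - alpha_bars[t]) * eps
    pred = model(xt, f_T(t))               # prediction noise
    loss = np.mean((pred - eps) ** 2)      # noise loss function
    return loss

loss = training_step(rng.standard_normal((8, 8)), rng)
```

In a real training loop the loss would be backpropagated through the network to adjust its parameters, which the placeholder `model` does not capture.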
In specific implementation, the calculation process of the noise loss function can be referred to as the following formula 3:
L(θ) = E_{t, x_0, ε} [ ‖ε − ε_θ(x_t, f_T(t))‖² ],  where x_t = √(ᾱ_t) · x_0 + √(1 − ᾱ_t) · ε    (Equation 3)

Equation 3 represents the noise loss function used in training. During training, t is sampled first; t obeys a uniform distribution over 0 to T, which is equivalent to randomly selecting one integer from 0 to T as t. Next, x_0 is sampled, which is equivalent to taking a real picture from the data set, and finally the noise ε is sampled from the standard normal distribution. After sampling, the noise ε is added to the real picture x_0 to obtain the noise-added picture x_t = √(ᾱ_t) · x_0 + √(1 − ᾱ_t) · ε. The intensity of the noise is determined by t, with the specific parameter ᾱ_t designed in advance. The noise-added picture x_t and f_T(t) are input into the diffusion model, and the loss between the output (i.e., the prediction noise) and the true noise ε is calculated; the aim of training is to enable the neural network to predict, from a noisy picture x_t and the noise intensity (determined by t), the noise ε that was added to the picture.

Here T = {t_0, t_1, ..., t_n} is a time step interval sequence whose elements are the left endpoints of the time step intervals. For example, if 1000 time steps are divided into 20 intervals, T = {0, 50, 100, ..., 1000}. f_T(t) is a function whose input is the current time step t and whose output is the left endpoint of the time step interval to which t belongs. For example, the time step interval corresponding to t = 53 is 50–100, so f_T(53) = 50.
Combining the above, the single-step transition probability of the inverse process of the diffusion model, namely the inverse denoising process, can refer to the following Equation 4:

p_θ(x_{t−1} | x_t) = N(x_{t−1}; μ_θ(x_t, t), σ_t² · I),  where μ_θ(x_t, t) = (1/√α_t) · (x_t − (β_t/√(1 − ᾱ_t)) · ε_θ(x_t, f_T(t)))    (Equation 4)

Equation 4 represents the single-step transition probability of the inverse process of the diffusion model: given x_t, x_{t−1} obeys a Gaussian distribution with mean μ_θ(x_t, t) and variance σ_t² · I, where the variance is preset. In the mean μ_θ(x_t, t), β_t is a series of fixed parameters indicating the noise-adding intensity at each time step t of the forward process, and α_t is a parameter calculated from β_t, namely α_t = 1 − β_t with ᾱ_t = ∏_{s=1}^{t} α_s. ε_θ is the neural network; its inputs are the noisy picture x_t and the left endpoint f_T(t) corresponding to t, and its output is the noise added to x_t.
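One reverse step under Equation 4 can be sketched as follows. The network `eps_model`, the linear β schedule, and the preset variance choice σ_t² = β_t are all illustrative assumptions, not details fixed by the embodiments:

```python
import numpy as np

# One inverse denoising step: compute the mean of p(x_{t-1} | x_t) from the
# predicted noise, then add Gaussian noise with the preset variance.
rng = np.random.default_rng(0)
betas = np.linspace(1e-4, 0.02, 1000)
alphas = 1.0 - betas
alpha_bars = np.cumprod(alphas)

def f_T(t, width=50):
    return (t // width) * width

def eps_model(xt, t_cond):                 # placeholder for the trained network
    return np.zeros_like(xt)

def reverse_step(xt, t, rng):
    eps = eps_model(xt, f_T(t))
    mean = (xt - betas[t] / np.sqrt(1 - alpha_bars[t]) * eps) / np.sqrt(alphas[t])
    if t == 0:
        return mean                        # no noise added at the final step
    return mean + np.sqrt(betas[t]) * rng.standard_normal(xt.shape)

x = rng.standard_normal((8, 8))            # start from pure Gaussian noise
x_prev = reverse_step(x, t=999, rng=rng)
```

Iterating `reverse_step` from t = T − 1 down to t = 0 would realize the full inverse generation process.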
According to the diffusion model processing method provided by the embodiment of the specification, the time step set is divided into the time step intervals, and when the diffusion model is trained subsequently, the diffusion model shares the time step conditions in one time step interval, namely, the first time steps share the target time steps in the corresponding time step intervals, so that the time step conditions are reduced, the training burden is greatly lightened, and the model training performance is improved.
In addition, after the diffusion model is obtained through training, the diffusion model can be practically applied. The specific implementation mode is as follows:
And after the diffusion model is processed according to the target noise corresponding to the noise-added picture and the prediction noise, the method further comprises the following steps:
determining a target noise adding picture, and inputting the target noise adding picture into a diffusion model to obtain prediction noise corresponding to the target noise adding picture;
and determining the denoised target picture according to the target denoised picture and the prediction noise corresponding to the target denoised picture.
The target noise-added picture can be understood as noise-added pictures with any size and any format.
In another implementation manner, the target noise-added picture can be understood as a video frame, that is, the diffusion model can denoise the noise-added video frame to obtain a clear and accurate video frame. The specific implementation mode is as follows:
the determining the target noise-added picture comprises the following steps:
and determining a noisy video frame set, and determining any video frame in the video frame set as a target noisy picture.
Wherein the noisy video frame set comprises a plurality of video frames of any type to which various kinds of noise are added.
Specifically, after the noisy video frame set is determined, any video frame in the video frame set can be used as a target noisy picture to perform subsequent denoising processing.
After determining the target noise-added picture, the target noise-added picture can be directly input into a diffusion model obtained through training by the diffusion model processing method, and the diffusion model can output the prediction noise corresponding to the target noise-added picture; and then removing noise in the target noise-added picture according to the predicted noise corresponding to the target noise-added picture, so that the denoised target picture can be accurately obtained.
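The relationship between a target noise-added picture, the prediction noise, and the recovered picture can be sketched as an idealized check: assuming the model's prediction is exact, removing the predicted noise recovers the clean picture. The schedule and shapes are illustrative:

```python
import numpy as np

# Idealized denoising check: x0_hat = (x_t - sqrt(1-abar_t)*eps_hat) / sqrt(abar_t)
rng = np.random.default_rng(0)
betas = np.linspace(1e-4, 0.02, 1000)
alpha_bars = np.cumprod(1.0 - betas)

t = 100
x0 = rng.standard_normal((8, 8))           # stand-in for a clean picture
eps = rng.standard_normal(x0.shape)
xt = np.sqrt(alpha_bars[t]) * x0 + np.sqrt(1 - alpha_bars[t]) * eps

eps_hat = eps                              # pretend the model predicts perfectly
x0_hat = (xt - np.sqrt(1 - alpha_bars[t]) * eps_hat) / np.sqrt(alpha_bars[t])
print(np.allclose(x0_hat, x0))             # True when the prediction is exact
```

In practice `eps_hat` would come from the trained diffusion model, and the recovery is approximate rather than exact.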
In another implementation embodiment, the diffusion model processing method provided in the present specification may also be applied to the field of text-conditioned AI (Artificial Intelligence) image generation. On the basis that the diffusion model is a picture generation model, AI image generation is performed with the text as a condition. In specific implementation, the text condition may be encoded by a pre-trained encoder, the encoding is then combined into the diffusion model through a self-attention mechanism, and on the basis of the diffusion model's picture generation, an AI image constrained by the text condition is generated in combination with the encoding.
For example, the text condition is: generate an AI image of a teddy bear skateboarding in Times Square. The text is input into an AI image generation model (i.e., a text encoder plus a diffusion model); the text encoder of the AI image generation model encodes the text, the encoding is combined with the diffusion model through the self-attention mechanism to generate the AI image, and finally the AI image generation model outputs an AI image of a teddy bear skateboarding in Times Square.
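The step of combining a text encoding with the diffusion model can be sketched, very loosely, as image features attending over text-token embeddings; every name, shape, and the residual-injection design below is hypothetical and stands in for the attention mechanism the embodiments mention:

```python
import numpy as np

# Toy cross-attention: image features (queries) attend to encoded text
# tokens (keys/values) and absorb the condition via a residual update.
rng = np.random.default_rng(0)

def cross_attention(img_feats, txt_feats):
    """img_feats: (n, d) queries; txt_feats: (m, d) keys/values."""
    d = img_feats.shape[-1]
    scores = img_feats @ txt_feats.T / np.sqrt(d)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)   # softmax over text tokens
    return img_feats + weights @ txt_feats           # residual injection

img_feats = rng.standard_normal((64, 32))  # e.g. an 8x8 grid of 32-dim features
txt_feats = rng.standard_normal((7, 32))   # encoded text-condition tokens
out = cross_attention(img_feats, txt_feats)
```

A real model would apply learned query/key/value projections and repeat this at multiple network layers; the sketch only shows the data flow.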
In practical application, the method can generate not only AI images but also ordinary two-dimensional images, and the like; the condition setting is performed according to the practical application, and the present specification places no limitation on this.
Referring to fig. 4, fig. 4 shows a flowchart of a picture processing method according to an embodiment of the present disclosure, which specifically includes the following steps.
Step 402: and determining a target noise adding picture, and inputting the target noise adding picture into a diffusion model to obtain the prediction noise corresponding to the target noise adding picture.
Step 404: determining a denoised target picture according to the target denoised picture and the prediction noise corresponding to the target denoised picture,
wherein the diffusion model is obtained by the diffusion model processing method.
Specifically, for specific implementation steps of the image processing method, reference may be made to the detailed description of the diffusion model processing method in the above embodiment, which is not repeated in this embodiment of the present disclosure.
According to the picture processing method provided by the embodiments of the present specification, picture denoising can be performed quickly and accurately by means of the high-performance diffusion model trained with reduced time step conditions, obtaining the denoised target picture and greatly improving picture denoising performance.
Corresponding to the above method embodiments, the present disclosure further provides an embodiment of a diffusion model processing apparatus, and fig. 5 shows a schematic structural diagram of a diffusion model processing apparatus provided in one embodiment of the present disclosure. As shown in fig. 5, the apparatus includes:
the interval dividing module 502 is configured to determine a time step set of the diffusion model and a time step interval corresponding to the time step set;
a target time step determining module 504 configured to determine a first time step from the set of time steps, and determine a target time step corresponding to the first time step according to the time step interval, where the first time step is any time step in the set of time steps;
a first model prediction module 506, configured to input a noise-added picture corresponding to the first time step and the target time step into a diffusion model, so as to obtain prediction noise corresponding to the noise-added picture;
and the model processing module 508 is configured to process the diffusion model according to the target noise corresponding to the noise-added picture and the prediction noise.
Optionally, the apparatus further comprises:
a noise adding module configured to:
determining an initial picture and target noise corresponding to the first time step;
And determining a noise adding picture corresponding to the first time step and target noise corresponding to the noise adding picture according to the initial picture and the target noise.
Optionally, the interval dividing module 502 is further configured to:
determining a time step set of a diffusion model, and dividing time step intervals in the time step set according to preset dividing conditions to obtain time step intervals corresponding to the time step set.
Optionally, the target time step determining module 504 is further configured to:
and determining an interval endpoint of the time step interval, and determining a target time step corresponding to the first time step according to the interval endpoint.
Optionally, the target time step determining module 504 is further configured to:
and determining a section left end point of the time step section, and determining the section left end point as a target time step corresponding to the first time step, wherein a section right end point of the time step section is a left end point included in a next time step section.
Optionally, the target time step determining module 504 is further configured to:
and determining a right end point of the interval of the time step interval, and determining the right end point of the interval as a target time step corresponding to the first time step, wherein a left end point of the interval of the time step interval is a right end point included in the interval of the last time step.
Optionally, the model processing module 508 is further configured to:
and calculating a noise loss function according to the target noise corresponding to the noise-added picture and the predicted noise, adjusting network parameters of the diffusion model according to the noise loss function, and obtaining the diffusion model under the condition that a preset training ending condition is met.
Optionally, the apparatus further comprises:
a denoising module configured to:
determining a target noise adding picture, and inputting the target noise adding picture into a diffusion model to obtain prediction noise corresponding to the target noise adding picture;
and determining the denoised target picture according to the target denoised picture and the prediction noise corresponding to the target denoised picture.
Optionally, the denoising module is further configured to:
and determining a noisy video frame set, and determining any video frame in the video frame set as a target noisy picture.
According to the diffusion model processing device provided by the embodiment of the specification, the time step set is divided into the time step intervals, and when the diffusion model is trained subsequently, the diffusion model shares the time step conditions in one time step interval, namely, the first time steps share the target time steps in the corresponding time step intervals, so that the time step conditions are reduced, the training burden is greatly lightened, and the model training performance is improved.
The above is a schematic scheme of a diffusion model processing apparatus of the present embodiment. It should be noted that, the technical solution of the diffusion model processing apparatus and the technical solution of the diffusion model processing method belong to the same concept, and details of the technical solution of the diffusion model processing apparatus, which are not described in detail, can be referred to the description of the technical solution of the diffusion model processing method.
Corresponding to the above method embodiments, the present disclosure further provides an embodiment of a picture processing apparatus, and fig. 6 shows a schematic structural diagram of a picture processing apparatus provided in one embodiment of the present disclosure. As shown in fig. 6, the apparatus includes:
a second model prediction module 602 configured to determine a target noise-added picture, and input the target noise-added picture into a diffusion model to obtain prediction noise corresponding to the target noise-added picture;
a target picture determining module 604 configured to determine a denoised target picture according to the target denoised picture and the prediction noise corresponding to the target denoised picture,
wherein the diffusion model is obtained by the diffusion model processing method.
According to the picture processing device provided by the embodiments of the present specification, picture denoising can be performed quickly and accurately by means of the high-performance diffusion model trained with reduced time step conditions, obtaining the denoised target picture and greatly improving picture denoising performance.
The above is a schematic solution of a picture processing apparatus of the present embodiment. It should be noted that, the technical solution of the image processing apparatus and the technical solution of the image processing method belong to the same concept, and details of the technical solution of the image processing apparatus, which are not described in detail, can be referred to the description of the technical solution of the image processing method.
Fig. 7 illustrates a block diagram of a computing device 700 provided in accordance with one embodiment of the present description. The components of computing device 700 include, but are not limited to, memory 710 and processor 720. Processor 720 is coupled to memory 710 via bus 730, and database 750 is used to store data.
Computing device 700 also includes access device 740, access device 740 enabling computing device 700 to communicate via one or more networks 760. Examples of such networks include public switched telephone networks (PSTN, public Switched Telephone Network), local area networks (LAN, local Area Network), wide area networks (WAN, wide Area Network), personal area networks (PAN, personal Area Network), or combinations of communication networks such as the internet. The access device 740 may include one or more of any type of network interface, wired or wireless, such as a network interface card (NIC, network interface controller), such as an IEEE802.11 wireless local area network (WLAN, wireless Local Area Network) wireless interface, a worldwide interoperability for microwave access (Wi-MAX, worldwide Interoperability for Microwave Access) interface, an ethernet interface, a universal serial bus (USB, universal Serial Bus) interface, a cellular network interface, a bluetooth interface, a near field communication (NFC, near Field Communication) interface, and so forth.
In one embodiment of the present description, the above-described components of computing device 700, as well as other components not shown in FIG. 7, may also be connected to each other, such as by a bus. It should be understood that the block diagram of the computing device illustrated in FIG. 7 is for exemplary purposes only and is not intended to limit the scope of the present description. Those skilled in the art may add or replace other components as desired.
Computing device 700 may be any type of stationary or mobile computing device, including a mobile computer or mobile computing device (e.g., tablet, personal digital assistant, laptop, notebook, netbook, etc.), mobile phone (e.g., smart phone), wearable computing device (e.g., smart watch, smart glasses, etc.), or other type of mobile device, or a stationary computing device such as a desktop computer or personal computer (PC, personal Computer). Computing device 700 may also be a mobile or stationary server.
The processor 720 is configured to execute computer-executable instructions that, when executed by the processor, implement the steps of the diffusion model processing method or the picture processing method described above. The foregoing is a schematic illustration of a computing device of this embodiment. It should be noted that, the technical solution of the computing device and the technical solution of the diffusion model processing method or the picture processing method belong to the same concept, and details of the technical solution of the computing device, which are not described in detail, can be referred to the description of the technical solution of the diffusion model processing method or the picture processing method.
An embodiment of the present disclosure also provides a computer-readable storage medium storing computer-executable instructions that, when executed by a processor, implement the steps of the diffusion model processing method or the picture processing method described above.
The above is an exemplary version of a computer-readable storage medium of the present embodiment. It should be noted that, the technical solution of the storage medium and the technical solution of the diffusion model processing method or the picture processing method belong to the same concept, and details of the technical solution of the storage medium which are not described in detail can be referred to the description of the technical solution of the diffusion model processing method or the picture processing method.
An embodiment of the present disclosure further provides a computer program, where the computer program, when executed in a computer, causes the computer to perform the steps of the diffusion model processing method or the picture processing method described above.
The above is an exemplary version of a computer program of the present embodiment. It should be noted that, the technical solution of the computer program and the technical solution of the diffusion model processing method or the picture processing method belong to the same concept, and details of the technical solution of the computer program, which are not described in detail, can be referred to the description of the technical solution of the diffusion model processing method or the picture processing method.
The foregoing describes specific embodiments of the present disclosure. Other embodiments are within the scope of the following claims. In some cases, the actions or steps recited in the claims can be performed in a different order than in the embodiments and still achieve desirable results. In addition, the processes depicted in the accompanying figures do not necessarily require the particular order shown, or sequential order, to achieve desirable results. In some embodiments, multitasking and parallel processing are also possible or may be advantageous.
The computer instructions include computer program code that may be in source code form, object code form, executable file or some intermediate form, etc. The computer readable medium may include: any entity or device capable of carrying the computer program code, a recording medium, a U disk, a removable hard disk, a magnetic disk, an optical disk, a computer Memory, a Read-Only Memory (ROM), a random access Memory (RAM, random Access Memory), an electrical carrier signal, a telecommunications signal, a software distribution medium, and so forth. It should be noted that the computer readable medium contains content that can be appropriately scaled according to the requirements of jurisdictions in which such content is subject to legislation and patent practice, such as in certain jurisdictions in which such content is subject to legislation and patent practice, the computer readable medium does not include electrical carrier signals and telecommunication signals.
It should be noted that, for simplicity of description, the foregoing method embodiments are all expressed as a series of combinations of actions, but it should be understood by those skilled in the art that the embodiments are not limited by the order of actions described, as some steps may be performed in other order or simultaneously according to the embodiments of the present disclosure. Further, those skilled in the art will appreciate that the embodiments described in the specification are all preferred embodiments, and that the acts and modules referred to are not necessarily all required for the embodiments described in the specification.
In the foregoing embodiments, the descriptions of the embodiments are emphasized, and for parts of one embodiment that are not described in detail, reference may be made to the related descriptions of other embodiments.
The preferred embodiments of the present specification disclosed above are merely used to help clarify the present specification. Alternative embodiments are not intended to be exhaustive or to limit the invention to the precise form disclosed. Obviously, many modifications and variations are possible in light of the teaching of the embodiments. The embodiments were chosen and described in order to best explain the principles of the embodiments and the practical application, to thereby enable others skilled in the art to best understand and utilize the invention. This specification is to be limited only by the claims and the full scope and equivalents thereof.

Claims (14)

1. A diffusion model processing method comprising:
determining a time step set of a diffusion model and a time step interval corresponding to the time step set;
determining a first time step from the time step set, and determining a target time step corresponding to the first time step according to the time step interval, wherein the first time step is any time step in the time step set;
inputting the noise adding picture corresponding to the first time step and the target time step into a diffusion model to obtain the prediction noise corresponding to the noise adding picture;
and processing the diffusion model according to the target noise corresponding to the noise-added picture and the prediction noise.
2. The diffusion model processing method according to claim 1, wherein before inputting the noise-added picture corresponding to the first time step and the target time step into a diffusion model to obtain the prediction noise corresponding to the noise-added picture, the method further comprises:
determining an initial picture and target noise corresponding to the first time step;
and determining a noise adding picture corresponding to the first time step and target noise corresponding to the noise adding picture according to the initial picture and the target noise.
3. The diffusion model processing method according to claim 1, wherein the determining the time step set of the diffusion model and the time step interval corresponding to the time step set includes:
determining a time step set of a diffusion model, and dividing time step intervals in the time step set according to preset dividing conditions to obtain time step intervals corresponding to the time step set.
4. The diffusion model processing method according to claim 1, wherein the determining the target time step corresponding to the first time step according to the time step interval includes:
and determining an interval endpoint of the time step interval, and determining a target time step corresponding to the first time step according to the interval endpoint.
5. The diffusion model processing method according to claim 4, wherein determining the interval end point of the time step interval, and determining the target time step corresponding to the first time step according to the interval end point, comprises:
and determining a section left end point of the time step section, and determining the section left end point as a target time step corresponding to the first time step, wherein a section right end point of the time step section is a left end point included in a next time step section.
6. The diffusion model processing method according to claim 4, wherein determining the interval end point of the time step interval, and determining the target time step corresponding to the first time step according to the interval end point, comprises:
and determining a right end point of the interval of the time step interval, and determining the right end point of the interval as a target time step corresponding to the first time step, wherein a left end point of the interval of the time step interval is a right end point included in the interval of the last time step.
7. The diffusion model processing method according to claim 1, wherein the processing the diffusion model according to the target noise corresponding to the noisy picture and the prediction noise includes:
and calculating a noise loss function according to the target noise corresponding to the noise-added picture and the predicted noise, adjusting network parameters of the diffusion model according to the noise loss function, and obtaining the diffusion model under the condition that a preset training ending condition is met.
8. The method for processing a diffusion model according to any one of claims 1 to 7, wherein after the processing the diffusion model according to the target noise corresponding to the noisy picture and the prediction noise, the method further comprises:
Determining a target noise adding picture, and inputting the target noise adding picture into a diffusion model to obtain prediction noise corresponding to the target noise adding picture;
and determining the denoised target picture according to the target denoised picture and the prediction noise corresponding to the target denoised picture.
9. The diffusion model processing method according to claim 8, the determining a target noisy picture, comprising:
and determining a noisy video frame set, and determining any video frame in the video frame set as a target noisy picture.
10. A diffusion model processing apparatus comprising:
the interval dividing module is configured to determine a time step set of the diffusion model and a time step interval corresponding to the time step set;
the target time step determining module is configured to determine a first time step from the time step set, and determine a target time step corresponding to the first time step according to the time step interval, wherein the first time step is any time step in the time step set;
the first model prediction module is configured to input a noise-added picture corresponding to the first time step and the target time step into a diffusion model to obtain prediction noise corresponding to the noise-added picture;
And the model processing module is configured to process the diffusion model according to the target noise corresponding to the noise-added picture and the prediction noise.
11. A picture processing method, comprising:
determining a target noise adding picture, and inputting the target noise adding picture into a diffusion model to obtain prediction noise corresponding to the target noise adding picture;
determining a denoised target picture according to the target denoised picture and the prediction noise corresponding to the target denoised picture, wherein the diffusion model is obtained by the diffusion model processing method according to any one of the claims 1-9.
12. A picture processing apparatus comprising:
the second model prediction module is configured to determine a target noise-added picture, input the target noise-added picture into a diffusion model and obtain prediction noise corresponding to the target noise-added picture;
a target picture determining module configured to determine a denoised target picture according to the target denoised picture and a prediction noise corresponding to the target denoised picture,
wherein the diffusion model is obtained by the diffusion model processing method according to any one of the preceding claims 1 to 9.
13. A computing device, comprising:
a memory and a processor;
wherein the memory is configured to store computer-executable instructions which, when executed by the processor, implement the steps of the diffusion model processing method of any one of claims 1 to 9 or of the picture processing method of claim 11.
14. A computer-readable storage medium storing computer-executable instructions which, when executed by a processor, implement the steps of the diffusion model processing method of any one of claims 1 to 9 or the picture processing method of claim 11.
CN202310177857.XA 2023-02-16 2023-02-16 Diffusion model processing method and device and picture processing method and device Pending CN116309135A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310177857.XA CN116309135A (en) 2023-02-16 2023-02-16 Diffusion model processing method and device and picture processing method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310177857.XA CN116309135A (en) 2023-02-16 2023-02-16 Diffusion model processing method and device and picture processing method and device

Publications (1)

Publication Number Publication Date
CN116309135A true CN116309135A (en) 2023-06-23

Family

ID=86829887

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310177857.XA Pending CN116309135A (en) 2023-02-16 2023-02-16 Diffusion model processing method and device and picture processing method and device

Country Status (1)

Country Link
CN (1) CN116309135A (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116543240A (en) * 2023-07-06 2023-08-04 华中科技大学 Defending method for machine learning against attacks
CN116543240B (en) * 2023-07-06 2023-09-19 华中科技大学 Defending method for machine learning against attacks
CN116912352A (en) * 2023-09-12 2023-10-20 苏州浪潮智能科技有限公司 Picture generation method and device, electronic equipment and storage medium
CN116912352B (en) * 2023-09-12 2024-01-26 苏州浪潮智能科技有限公司 Picture generation method and device, electronic equipment and storage medium

Similar Documents

Publication Publication Date Title
Van Den Oord et al. Pixel recurrent neural networks
CN116309135A (en) Diffusion model processing method and device and picture processing method and device
US7783459B2 (en) Analog system for computing sparse codes
CN112529150A (en) Model structure, model training method, image enhancement method and device
JP6789894B2 (en) Network coefficient compressor, network coefficient compression method and program
CN110321962B (en) Data processing method and device
CN111738020B (en) Translation model training method and device
KR102299958B1 (en) Systems and methods for image compression at multiple, different bitrates
US20220164666A1 (en) Efficient mixed-precision search for quantizers in artificial neural networks
CN115424088A (en) Image processing model training method and device
US20220067888A1 (en) Image processing method and apparatus, storage medium, and electronic device
CN116208807A (en) Video frame processing method and device, and video frame denoising method and device
US20230252294A1 (en) Data processing method, apparatus, and device, and computer-readable storage medium
CN115641485A (en) Generative model training method and device
Liu et al. Computation-performance optimization of convolutional neural networks with redundant kernel removal
CN114071141A (en) Image processing method and equipment
CN115984944A (en) Expression information identification method, device, equipment, readable storage medium and product
CN108986047B (en) Image noise reduction method
CN117726542A (en) Controllable noise removing method and system based on diffusion model
Marusic et al. Adaptive prediction for lossless image compression
CN110120009B (en) Background blurring implementation method based on salient object detection and depth estimation algorithm
CN111882028A (en) Convolution operation device for convolution neural network
CN115546236B (en) Image segmentation method and device based on wavelet transformation
CN116597263A (en) Training method and related device for image synthesis model
CN116109537A (en) Distorted image reconstruction method and related device based on deep learning

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination