CN115908187A - Image characteristic analysis and generation method based on rapid denoising diffusion probability model - Google Patents

Image characteristic analysis and generation method based on rapid denoising diffusion probability model Download PDF

Info

Publication number
CN115908187A
Authority
CN
China
Prior art keywords
denoising, probability model, diffusion, condition generator, Gaussian
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202211560705.XA
Other languages
Chinese (zh)
Inventor
吕金虎
阚哿
王田
刘克新
高庆
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beihang University
Original Assignee
Beihang University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beihang University filed Critical Beihang University
Priority to CN202211560705.XA priority Critical patent/CN115908187A/en
Publication of CN115908187A publication Critical patent/CN115908187A/en
Pending legal-status Critical Current

Classifications

    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02T: CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T 10/00: Road transport of goods or passengers
    • Y02T 10/10: Internal combustion engine [ICE] based vehicles
    • Y02T 10/40: Engine management systems

Landscapes

  • Image Analysis (AREA)

Abstract

The invention discloses an image feature analysis and generation method based on a fast denoising diffusion probability model, comprising the following steps: preprocessing an image generation data set to obtain a training data set; constructing an original denoising diffusion probability model comprising a score predictor, and introducing a condition generator and a corresponding discriminator to obtain a fast denoising diffusion probability model; training the score predictor by a score matching method, freezing the trained score predictor, and training the condition generator and the discriminator by an alternating generative adversarial training method; and updating the trained score predictor and condition generator into the fast denoising diffusion probability model, and inputting a Gaussian white noise picture into the fast denoising diffusion probability model to obtain a realistic, high-quality output image. The method preserves high-quality generation capability while markedly improving the efficiency of sample generation.

Description

Image characteristic analysis and generation method based on rapid denoising diffusion probability model
Technical Field
The invention belongs to the field of image feature analysis and generation in computer vision, and relates to an image feature analysis and generation method based on a fast denoising diffusion probability model.
Background
With the advent of modern deep learning, dramatic progress has been made in the field of computer vision. With the birth of convolutional neural networks and the invention of the residual network structure, deep learning far outperforms traditional methods on tasks such as classification, detection, and segmentation of pictures at various scales. In the field of image generation, deep generative models such as generative adversarial networks and variational autoencoders can generate high-resolution pictures. However, the generation quality is still limited by their overly simple implicit generative models. Recently, the denoising diffusion probability model has been able to produce higher-quality samples, but it must construct a Markov generation process of nearly 1000 steps, which incurs high time and computation costs.
Disclosure of Invention
The invention aims to solve the problems that, among existing deep generative models, the generation quality of implicit generative models is constrained and the sampling efficiency of the denoising diffusion probability model is low. The invention provides an image feature analysis and generation method based on a fast denoising diffusion probability model, which introduces a condition generator and a corresponding discriminator model on the basis of the original denoising diffusion probability model and trains the supplementary condition generator in a generative adversarial manner, so that the combined model achieves high-quality and high-efficiency image generation.
The technical scheme adopted by the invention for solving the technical problem is as follows:
An image feature analysis and generation method based on a fast denoising diffusion probability model comprises the following steps:
S1: preprocessing the image generation data set to obtain a training data set;
S2: constructing an original denoising diffusion probability model comprising a score predictor, and introducing a condition generator and a corresponding discriminator to obtain a fast denoising diffusion probability model;
S3: training the score predictor by a score matching method, freezing the trained score predictor, and training the condition generator and the discriminator by an alternating generative adversarial training method;
S4: updating the trained score predictor and the trained condition generator into the fast denoising diffusion probability model, and inputting a Gaussian white noise picture into the fast denoising diffusion probability model to obtain a realistic, high-quality output image.
Further, in step S2, the original denoising diffusion probability model comprises a forward Gaussian diffusion kernel and a reverse Gaussian denoising kernel. The forward Gaussian diffusion kernel

$$q(x_t \mid x_{t-1}) = \mathcal{N}\!\big(x_t;\ \sqrt{1-\beta_t}\,x_{t-1},\ \beta_t I\big)$$

implements the noise-adding forward diffusion process x_0, ..., x_t, ..., x_T of pictures, where $\mathcal{N}$ denotes a Gaussian distribution, β_t (t = 1, ..., T) is a preset noise scale, the starting point x_0 is a noise-free picture, the intermediate state x_t is the noisy picture at time t, and the process end point x_T is a Gaussian white noise picture. The reverse Gaussian denoising kernel p_θ(x_{t-1} | x_t) implements the reverse denoising generation process x_T, ..., x_t, ..., x_0; specifically,

$$p_\theta(x_{t-1}\mid x_t) = \mathcal{N}\!\big(x_{t-1};\ \mu_\theta(x_t,t),\ \sigma_t^2 I\big),\qquad
\mu_\theta(x_t,t) = \frac{\sqrt{\bar\alpha_{t-1}}\,\beta_t}{1-\bar\alpha_t}\,f_\theta(x_t,t) + \frac{\sqrt{\alpha_t}\,(1-\bar\alpha_{t-1})}{1-\bar\alpha_t}\,x_t,$$

where σ_t is the standard deviation of the Gaussian denoising kernel; I is the identity matrix; α_t = 1 - β_t and $\bar\alpha_t = \prod_{s=1}^{t}\alpha_s$;

$$f_\theta(x_t,t) = \frac{x_t - \sqrt{1-\bar\alpha_t}\,\epsilon_\theta(x_t,t)}{\sqrt{\bar\alpha_t}}$$

denotes the denoising prediction; ε_θ(x_t, t) is the score predictor; and μ_θ(x_t, t) is the mean.
Further, in step S2, the condition generator

$$x_0'' = G_\omega\!\big(f_\theta(x_t,t),\ t,\ u\big)$$

is introduced into the reverse denoising generation process, yielding the reverse Gaussian denoising kernel with the condition generator:

$$p_{\theta,\omega}(x_{t-1}\mid x_t) = \mathcal{N}\!\big(x_{t-1};\ \mu_{\theta,\omega}(x_t,t),\ \sigma_t^2 I\big),\qquad
\mu_{\theta,\omega}(x_t,t) = \frac{\sqrt{\bar\alpha_{t-1}}\,\beta_t}{1-\bar\alpha_t}\,G_\omega\!\big(f_\theta(x_t,t),t,u\big) + \frac{\sqrt{\alpha_t}\,(1-\bar\alpha_{t-1})}{1-\bar\alpha_t}\,x_t,$$

where μ_{θ,ω}(x_t, t) is the mean after the condition generator is introduced and u denotes sampled Gaussian noise.
Further, the score predictor ε_θ is implemented on a U-net neural network structure, a symmetric encoder-decoder structure in which skip connections share feature information between corresponding stages of the encoder and decoder; the encoder part of the U-net reduces the dimension of the input noisy picture x_t through cascaded ResBlocks and extracts multi-scale visual features, and the cascaded ResBlocks of the decoder part output the noise component of the noisy picture x_t from the multi-scale visual features; the time t is encoded into a 256-dimensional vector using the time-embedding scheme of the Transformer neural network structure and fed into every ResBlock of the U-net, so that the U-net can depend on the time t explicitly.
The condition generator G_ω is implemented on the U-net neural network structure adopted by the score predictor, a symmetric encoder-decoder structure, with an additional three-layer fully connected network that maps the noise u into a 256-dimensional vector fed into every ResBlock of the U-net, so that the U-net can also depend on the noise u explicitly; the encoder side extracts the multi-scale visual features of the denoising prediction, and the decoder side outputs an optimized denoising prediction from the multi-scale features.
The discriminator D_ψ is implemented on a U-net neural network structure with ResBlocks that depend on the time t; its input is the noisy picture x_t at the current time t together with the picture x_{t-1} at the next time t-1, and the output part obtains a real-valued discrimination score through convolutional neural layers.
Further, in step S3, training the score predictor by the score matching method comprises the following sub-steps:
(1) randomly select a batch of noise-free pictures from the training data set, denoted x_0 ~ q(x_0); uniformly sample a batch of integers from [1, T] as times t ~ U(t); and draw a batch of Gaussian noise with the same dimension as the pictures from the standard Gaussian distribution, denoted ε ~ N(0, I);
(2) substitute the variables t, x_0, ε obtained above into the unweighted score-matching squared-error function, and compute its mean as the objective function L(θ) to be optimized;
(3) optimize the neural network parameters θ once by back-propagating the objective function L(θ);
(4) repeat steps (1)-(3) until the objective function converges or the maximum number of optimization steps is reached.
Further, the score matching objective function is

$$L(\theta) = \mathbb{E}_{t,\,x_0,\,\epsilon}\Big[\big\|\epsilon - \epsilon_\theta\big(\sqrt{\bar\alpha_t}\,x_0 + \sqrt{1-\bar\alpha_t}\,\epsilon,\ t\big)\big\|^2\Big],$$

where $\mathbb{E}$ denotes averaging the function over the variables.
Further, in step S3, the condition generator and the discriminator are trained by the alternating training method of generative adversarial networks, comprising the following sub-steps:
(1) freeze the trained score predictor ε_θ as the basis of the subsequent process;
(2) randomly select a batch of noise-free pictures from the training data set, denoted x_0 ~ q(x_0); uniformly sample a batch of integers from [1, T] as times t ~ U(t); according to x_0, sample a batch x_{t-1} from the Gaussian conditional distribution q(x_{t-1} | x_0); according to x_{t-1}, sample a batch x_t from the forward Gaussian diffusion kernel q(x_t | x_{t-1}); and according to x_t, sample a predicted x̂_{t-1} from the reverse Gaussian denoising kernel p_{θ,ω}(x_{t-1} | x_t) with the condition generator;
(3) substitute the variables t, x_0, x_{t-1}, x_t, x̂_{t-1} obtained above into the generative adversarial objective function with respect to the discriminator parameters ψ, and compute its mean as the optimization objective of the discriminator;
(4) optimize the discriminator parameters ψ once by back-propagating the objective function with respect to ψ;
(5) substitute the variables t, x_0, x_{t-1}, x_t, x̂_{t-1} into the generative adversarial objective function with respect to the condition generator parameters ω, and compute its mean as the optimization objective of the condition generator;
(6) optimize the condition generator parameters ω once by back-propagating the objective function with respect to ω;
(7) repeat steps (2)-(6) until the generative adversarial objective function converges or the maximum number of optimization steps is reached.
Further, the generative adversarial objective functions with respect to the discriminator parameters ψ and the condition generator parameters ω are

$$L(\psi) = \mathbb{E}_{t,\,x_{t-1},\,x_t \sim q}\big[-\log D_\psi(x_{t-1}, x_t, t)\big] + \mathbb{E}_{t,\,x_t \sim q,\ \hat{x}_{t-1} \sim p_{\theta,\omega}}\big[-\log\big(1 - D_\psi(\hat{x}_{t-1}, x_t, t)\big)\big],$$
$$L(\omega) = \mathbb{E}_{t,\,x_t \sim q,\ \hat{x}_{t-1} \sim p_{\theta,\omega}}\big[-\log D_\psi(\hat{x}_{t-1}, x_t, t)\big].$$
Compared with the prior art, the invention has the following beneficial effects:
(1) The method is based on the denoising diffusion probability model and has stronger image generation capability than common generative models such as generative adversarial networks and variational autoencoders;
(2) Aiming at the long sampling time and low efficiency of the denoising diffusion probability model in image generation, a condition generator is introduced as an improvement, so that the proposed fast denoising diffusion probability model retains the high-quality generation capability of the denoising diffusion probability model while greatly improving sampling efficiency;
(3) The fast denoising diffusion probability model has the capability of analyzing the features of noisy images during image generation, and the generated features can be controlled by adjusting the features of the noise.
Drawings
FIG. 1 is a schematic diagram of the generative adversarial training of the fast denoising diffusion probability model;
FIG. 2 is a CIFAR10 picture randomly generated by a fast denoising diffusion probability model.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is further described in detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.
The invention provides an image feature analysis and generation method based on a fast denoising diffusion probability model, which comprises the following steps:
Step 1: data preprocessing;
A deep learning framework is used to perform preprocessing operations such as resizing, normalization, and horizontal flipping on the image generation data set, yielding a training data set suitable for training a neural network model. The image generation data set is selected from a collected data set or a publicly available image data set, such as CIFAR10 or CelebA; the data set is scaled to 32 x 32, 64 x 64, or 128 x 128 resolution using a deep learning toolkit, the data are then normalized to [-1, 1], and finally random horizontal flipping is applied to the data set samples as image preprocessing.
Specifically, this embodiment takes the public data set CIFAR10 as an example: the pictures in CIFAR10 are scaled to 32 x 32 resolution, each pixel is normalized to [-1, 1], and random horizontal flipping is applied, yielding a training data set suitable for training a neural network model; a sketch of this preprocessing is given below.
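A minimal sketch of this preprocessing step, assuming PyTorch and torchvision as the deep learning framework and toolkit (the embodiment does not name a specific one):

```python
import torchvision
import torchvision.transforms as T

# resize, random horizontal flip, and per-channel normalization to [-1, 1]
transform = T.Compose([
    T.Resize(32),                    # CIFAR10 is already 32 x 32; kept for generality
    T.RandomHorizontalFlip(p=0.5),   # random horizontal flip augmentation
    T.ToTensor(),                    # PIL image -> float tensor in [0, 1]
    T.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5)),  # [0, 1] -> [-1, 1]
])

train_set = torchvision.datasets.CIFAR10(
    root="./data", train=True, download=True, transform=transform)
```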
Step 2: theoretical construction of the fast denoising diffusion probability model;
in theory, the original denoising diffusion probability model constructs a trainable random process to learn how to denoise the noise step by step into data samples. Specifically, the original model first constructs a forward diffusion process x with the length of T and continuously adding noise to the data picture 0 ...x t ...x T Wherein the starting point x of the diffusion process 0 For noiseless pictures in the data set, intermediate state x t For the random variable of the noisy picture after adding noise, the end point x of the diffusion process T Close to gaussian white noise. The forward Gaussian diffusion kernel constituting the forward process is
$$q(x_t \mid x_{t-1}) = \mathcal{N}\!\big(x_t;\ \sqrt{1-\beta_t}\,x_{t-1},\ \beta_t I\big),$$

where $\mathcal{N}$ denotes a Gaussian distribution and β_t, t = 1, ..., T, is a preset noise scale. The forward Gaussian diffusion kernel describes the property of the forward process of continually adding noise. From the Gaussian nature of the forward process, the conditional posterior distribution is

$$q(x_t \mid x_0) = \mathcal{N}\!\big(x_t;\ \sqrt{\bar\alpha_t}\,x_0,\ (1-\bar\alpha_t) I\big), \qquad \alpha_t = 1-\beta_t,\quad \bar\alpha_t = \prod_{s=1}^{t}\alpha_s.$$

This means that a noisy sample at time t can be obtained by direct sampling,

$$x_t = \sqrt{\bar\alpha_t}\,x_0 + \sqrt{1-\bar\alpha_t}\,\epsilon,$$

where the Gaussian noise ε ~ N(0, I) is sampled directly. Then, the original denoising diffusion probability model constructs a denoising generation process x_T, ..., x_t, ..., x_0 along reverse time. The denoising process is composed of a trainable reverse denoising kernel x_{t-1} ~ p_θ(x_{t-1} | x_t), which has the Gaussian form

$$p_\theta(x_{t-1}\mid x_t) = \mathcal{N}\!\big(x_{t-1};\ \mu_\theta(x_t,t),\ \sigma_t^2 I\big),\qquad
\mu_\theta(x_t,t) = \frac{\sqrt{\bar\alpha_{t-1}}\,\beta_t}{1-\bar\alpha_t}\,f_\theta(x_t,t) + \frac{\sqrt{\alpha_t}\,(1-\bar\alpha_{t-1})}{1-\bar\alpha_t}\,x_t.$$

In this form, the mean μ_θ(x_t, t) is a weighted combination of the current state x_t and the denoising prediction f_θ(x_t, t); the denoising prediction can be expressed through the score predictor ε_θ(x_t, t) as

$$f_\theta(x_t,t) = \frac{x_t - \sqrt{1-\bar\alpha_t}\,\epsilon_\theta(x_t,t)}{\sqrt{\bar\alpha_t}},$$

and σ_t, the standard deviation of the Gaussian denoising kernel, is set so that σ_t² = β_t. In the denoising kernel, the score predictor ε_θ(x_t, t) analyzes the noisy picture x_t to predict the noise ε_t contained in the noisy variable x_t, which in turn enables f_θ(x_t, t) to predict a denoised sample x_0.
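As an illustration of the closed-form forward sampling described above, the following is a minimal sketch assuming PyTorch and a linear β_t schedule (the concrete schedule values are assumptions; the description only states that β_t is a preset noise scale):

```python
import torch

T_STEPS = 1000                                  # length of the forward process
betas = torch.linspace(1e-4, 0.02, T_STEPS)     # assumed linear noise scales beta_t
alphas = 1.0 - betas
alpha_bars = torch.cumprod(alphas, dim=0)       # bar(alpha)_t = prod_s alpha_s

def sample_xt(x0: torch.Tensor, t: torch.Tensor) -> torch.Tensor:
    """Directly sample x_t = sqrt(bar(a)_t) x_0 + sqrt(1 - bar(a)_t) eps."""
    eps = torch.randn_like(x0)                  # eps ~ N(0, I)
    a_bar = alpha_bars[t].view(-1, 1, 1, 1)     # broadcast over image dims
    return a_bar.sqrt() * x0 + (1.0 - a_bar).sqrt() * eps
```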
The original denoising diffusion probability model usually needs a denoising generation process of about 1000 steps to obtain high-quality samples; if the process is shortened to 4 steps, the generated samples are blurry and of low quality. We find the reason is that, in a 4-step denoising process, the score predictor ε_θ(x_t, t) cannot accurately predict the noise ε_t contained in the noisy variable x_t, so the denoising prediction f_θ(x_t, t) is of poor quality, and a high-quality denoised sample x_0 cannot be obtained with only a 4-step denoising process. Therefore, the invention feeds the denoising prediction f_θ(x_t, t) into a condition generator

$$x_0'' = G_\omega\!\big(f_\theta(x_t,t),\ t,\ u\big)$$

(where u is sampled Gaussian noise of lower dimension than the variable x_t, usually chosen as 100 dimensions), which builds on the image features contained in the denoising prediction f_θ(x_t, t) to produce a higher-quality, more realistic denoising prediction, and hence a more accurately predicted mean. This corresponds to the following reverse Gaussian denoising kernel after the generator is introduced:

$$p_{\theta,\omega}(x_{t-1}\mid x_t) = \mathcal{N}\!\big(x_{t-1};\ \mu_{\theta,\omega}(x_t,t),\ \sigma_t^2 I\big),\qquad
\mu_{\theta,\omega}(x_t,t) = \frac{\sqrt{\bar\alpha_{t-1}}\,\beta_t}{1-\bar\alpha_t}\,G_\omega\!\big(f_\theta(x_t,t),t,u\big) + \frac{\sqrt{\alpha_t}\,(1-\bar\alpha_{t-1})}{1-\bar\alpha_t}\,x_t,$$

where μ_{θ,ω}(x_t, t) is the mean after the condition generator is introduced, which in theory is a more accurate prediction than the original mean μ_θ(x_t, t).
Step 3: implementing the fast denoising diffusion probability model with a deep learning framework;
The original denoising diffusion probability model uses the score predictor ε_θ to realize a trainable generative denoising process. The fast denoising diffusion probability model proposed by the invention newly introduces a condition generator G_ω to construct a trainable fast generative denoising process, and introduces a corresponding discriminator D_ψ to train the condition generator G_ω. In implementation, the invention uses neural networks to represent the score predictor ε_θ, the condition generator G_ω, and the discriminator D_ψ, and constructs the corresponding neural network modules according to the resolution or scale of the data set.
Specifically, the score predictor ε_θ is implemented on a U-net neural network structure, a symmetric encoder-decoder structure in which skip connections share feature information between corresponding stages of the encoder and decoder. The encoder part of the U-net reduces the dimension of the input noisy picture x_t through cascaded residual blocks (ResBlocks) and extracts multi-scale visual features, and the cascaded ResBlocks of the decoder part output the noise component of the noisy picture x_t from these multi-scale visual features. Unlike an ordinary U-net, the input of the score predictor also contains the time t, so the time t is encoded into a 256-dimensional vector using the time-embedding scheme of the Transformer neural network structure and fed into every ResBlock of the U-net, so that the U-net can depend on the time t explicitly.
The condition generator G_ω adopts a U-net structure similar to that of the score predictor, but additionally takes random noise u as input. We therefore map the noise u into a 256-dimensional vector with an additional three-layer fully connected network and feed it into every ResBlock of the U-net, so that the U-net can also depend on the noise u explicitly. The condition generator G_ω takes the denoising prediction f_θ obtained from the score predictor as input; the encoder side extracts the multi-scale visual features of the denoising prediction, and the decoder side outputs a higher-quality, more accurate denoising prediction from these multi-scale features.
The discriminator D_ψ is implemented similarly to the encoder part of the score predictor's U-net, with ResBlocks that depend on the time t, but its input is the noisy picture x_t at the current time t together with the picture x_{t-1} at the next time t-1, i.e., the function D_ψ(x_{t-1}, x_t, t). We concatenate x_{t-1} and x_t into a neural network input with 6 channels, and the output part obtains a real-valued discrimination score through convolutional neural layers.
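For concreteness, a minimal sketch of the Transformer-style sinusoidal time embedding described above, assuming the standard formulation with the 256 dimensions stated in the description (the exact frequency constants are an assumption):

```python
import math
import torch

def time_embedding(t: torch.Tensor, dim: int = 256) -> torch.Tensor:
    """Encode integer timesteps t of shape [B] into [B, dim] vectors."""
    half = dim // 2
    freqs = torch.exp(
        -math.log(10000.0) * torch.arange(half, dtype=torch.float32) / half)
    args = t.float()[:, None] * freqs[None, :]   # [B, half]
    # concatenated sine/cosine features, fed into every ResBlock
    return torch.cat([torch.sin(args), torch.cos(args)], dim=-1)
```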
Step 4: training the fast denoising diffusion probability model;
The training of the fast denoising diffusion probability model is divided into two stages. The first stage trains the score predictor network ε_θ of the original denoising diffusion probability model, using the following score matching training process:
1) randomly select a batch of noise-free pictures from the training data set, denoted x_0 ~ q(x_0); uniformly sample a batch of integers from [1, T] as times t ~ U(t); and draw a batch of Gaussian noise with the same dimension as the pictures from the standard Gaussian distribution, denoted ε ~ N(0, I);
2) substitute the variables t, x_0, ε obtained above into the unweighted score-matching squared-error function, and compute its mean as the objective function L(θ) to be optimized;
3) optimize the neural network parameters θ once by back-propagating the objective function L(θ);
4) repeat steps 1)-3) until the objective function converges or the maximum number of optimization steps is reached.
The unweighted score matching objective function is specifically

$$L(\theta) = \mathbb{E}_{t,\,x_0,\,\epsilon}\Big[\big\|\epsilon - \epsilon_\theta\big(\sqrt{\bar\alpha_t}\,x_0 + \sqrt{1-\bar\alpha_t}\,\epsilon,\ t\big)\big\|^2\Big],$$

where the expectation $\mathbb{E}$ denotes averaging the function over the variables sampled in step 1).
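A minimal sketch of one first-stage optimization step, assuming PyTorch, the schedule variables `T_STEPS` and `alpha_bars` from the forward-sampling sketch above, and a hypothetical `score_predictor` U-net taking (x_t, t):

```python
import torch

def score_matching_step(score_predictor, optimizer, x0: torch.Tensor) -> float:
    t = torch.randint(0, T_STEPS, (x0.shape[0],))       # t ~ U{1,...,T} (0-indexed)
    eps = torch.randn_like(x0)                          # eps ~ N(0, I)
    a_bar = alpha_bars[t].view(-1, 1, 1, 1)
    x_t = a_bar.sqrt() * x0 + (1 - a_bar).sqrt() * eps  # direct forward sample
    loss = ((eps - score_predictor(x_t, t)) ** 2).mean()  # unweighted L(theta)
    optimizer.zero_grad()
    loss.backward()                                     # back-propagate L(theta)
    optimizer.step()                                    # one optimization of theta
    return loss.item()
```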
The second stage, with the trained score predictor ε_θ frozen, trains the condition generator network G_ω and the discriminator network D_ψ newly introduced by the fast denoising diffusion probability model, using the following alternating generative adversarial training process:
1) freeze the score predictor ε_θ trained in the first stage as the basis of the subsequent process;
2) randomly select a batch of noise-free pictures from the training data set, denoted x_0 ~ q(x_0); uniformly sample a batch of integers from [1, T] as times t ~ U(t); according to x_0, sample a batch x_{t-1} from the Gaussian conditional distribution q(x_{t-1} | x_0); according to x_{t-1}, sample a batch x_t from the forward Gaussian diffusion kernel q(x_t | x_{t-1}); and according to x_t, sample a predicted x̂_{t-1} from the reverse Gaussian denoising kernel p_{θ,ω}(x_{t-1} | x_t) with the condition generator;
3) substitute the variables t, x_0, x_{t-1}, x_t, x̂_{t-1} obtained above into the generative adversarial objective function with respect to the discriminator parameters ψ, and compute its mean as the optimization objective of the discriminator;
4) optimize the discriminator parameters ψ once by back-propagating the objective function with respect to ψ;
5) substitute the variables t, x_0, x_{t-1}, x_t, x̂_{t-1} into the generative adversarial objective function with respect to the condition generator parameters ω, and compute its mean as the optimization objective of the condition generator;
6) optimize the condition generator parameters ω once by back-propagating the objective function with respect to ω;
7) repeat steps 2)-6) until the generative adversarial objective function converges or the maximum number of optimization steps is reached.
The generative adversarial objective functions with respect to the discriminator parameters ψ and the condition generator parameters ω are

$$L(\psi) = \mathbb{E}_{t,\,x_{t-1},\,x_t \sim q}\big[-\log D_\psi(x_{t-1}, x_t, t)\big] + \mathbb{E}_{t,\,x_t \sim q,\ \hat{x}_{t-1} \sim p_{\theta,\omega}}\big[-\log\big(1 - D_\psi(\hat{x}_{t-1}, x_t, t)\big)\big],$$
$$L(\omega) = \mathbb{E}_{t,\,x_t \sim q,\ \hat{x}_{t-1} \sim p_{\theta,\omega}}\big[-\log D_\psi(\hat{x}_{t-1}, x_t, t)\big].$$

The probability distributions involved are Gaussian or uniform, so direct sampling is possible (see step 2). Intuitively, as shown in FIG. 1, the current state x_t is fed into the frozen score predictor ε_θ to compute a coarse denoising prediction x'_0; the coarse prediction x'_0 is then fed into the condition generator G_ω to obtain a refined denoising prediction x''_0, which is substituted into the new reverse Gaussian denoising kernel p_{θ,ω}(x_{t-1} | x_t) to obtain the predicted x̂_{t-1}. We introduce the discriminator D_ψ to guide the training of the condition generator G_ω: the discriminator is trained to distinguish the real x_{t-1} from the predicted x̂_{t-1}, and the condition generator is then trained to give better denoising predictions so that the predicted x̂_{t-1} approaches the real x_{t-1}.
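A minimal sketch of one second-stage alternation, assuming PyTorch, a hypothetical helper `p_sample(x_t, t)` that draws the predicted x̂_{t-1} from p_{θ,ω}(x_{t-1} | x_t) through the frozen score predictor and the condition generator, and the non-saturating binary cross-entropy form of the adversarial objective (the loss form here follows the equations above and is otherwise an assumption):

```python
import torch
import torch.nn.functional as F

def adversarial_step(discriminator, d_opt, g_opt,
                     x_prev_real, x_t, t, p_sample):
    x_prev_fake = p_sample(x_t, t)   # predicted x_{t-1}, depends on G_omega

    # discriminator update: distinguish real x_{t-1} from predicted x_{t-1}
    d_real = discriminator(x_prev_real, x_t, t)
    d_fake = discriminator(x_prev_fake.detach(), x_t, t)
    d_loss = (F.binary_cross_entropy_with_logits(d_real, torch.ones_like(d_real))
              + F.binary_cross_entropy_with_logits(d_fake, torch.zeros_like(d_fake)))
    d_opt.zero_grad(); d_loss.backward(); d_opt.step()

    # generator update: make the predicted x_{t-1} score as real
    d_fake = discriminator(x_prev_fake, x_t, t)
    g_loss = F.binary_cross_entropy_with_logits(d_fake, torch.ones_like(d_fake))
    g_opt.zero_grad(); g_loss.backward(); g_opt.step()
    return d_loss.item(), g_loss.item()
```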
Step 5: generating new samples with the fast denoising diffusion probability model;
The score predictor ε_θ trained in the first stage and the condition generator G_ω trained in the second stage are substituted into the new reverse Gaussian denoising kernel p_{θ,ω}(x_{t-1} | x_t), forming the fast denoising generation process

$$p_{\theta,\omega}(x_{0:T}) = p(x_T)\prod_{t=1}^{T} p_{\theta,\omega}(x_{t-1}\mid x_t),$$

where p(x_T) denotes the standard Gaussian distribution $\mathcal{N}(0, I)$ and p_{θ,ω}(x_{0:T}) denotes the joint distribution of the process.
At each denoising step t of the new denoising generation process, the score predictor ε_θ analyzes the features of the noisy picture x_t and gives a preliminary denoising prediction x'_0, whose quality is relatively poor. The condition generator G_ω then converts the poor denoising prediction x'_0 into a high-quality denoising prediction x''_0, so that the denoising kernel p_{θ,ω}(x_{t-1} | x_t) has a more accurate mean. Through this generative denoising process, random Gaussian noise can be converted into realistic, high-quality image samples, with the number of denoising steps reduced to T = 4.
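A minimal sketch of the resulting T = 4 sampling loop, assuming PyTorch, trained `score_predictor` and `cond_generator` networks, a linear 4-step schedule (an assumption), and σ_t² = β_t as stated above:

```python
import torch

@torch.no_grad()
def fast_sample(score_predictor, cond_generator, shape=(16, 3, 32, 32), T=4):
    betas = torch.linspace(1e-4, 0.02, T)       # assumed 4-step noise schedule
    alphas = 1.0 - betas
    a_bars = torch.cumprod(alphas, dim=0)
    x = torch.randn(shape)                      # x_T: Gaussian white noise picture
    for t in reversed(range(T)):
        tt = torch.full((shape[0],), t)
        a_bar = a_bars[t]
        a_bar_prev = a_bars[t - 1] if t > 0 else torch.tensor(1.0)
        # coarse prediction x'_0 from the frozen score predictor
        f = (x - (1 - a_bar).sqrt() * score_predictor(x, tt)) / a_bar.sqrt()
        u = torch.randn(shape[0], 100)          # 100-dimensional noise u for G_omega
        x0_hat = cond_generator(f, tt, u)       # refined prediction x''_0
        # posterior-style mean mu_{theta,omega}(x_t, t) with the refined prediction
        mean = ((a_bar_prev.sqrt() * betas[t] / (1 - a_bar)) * x0_hat
                + (alphas[t].sqrt() * (1 - a_bar_prev) / (1 - a_bar)) * x)
        x = mean + betas[t].sqrt() * torch.randn_like(x) if t > 0 else mean
    return x                                    # generated pictures x_0
```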
Comparative example
We use IS (Inception Score) and FID (Fréchet Inception Distance) as generation quality metrics on the CIFAR10 dataset. The fast denoising diffusion probability model proposed by the invention achieves an IS of 8.51 and an FID of 11.41 on CIFAR10, which is superior to recently proposed baseline models. The comparison on CIFAR10 is shown in Table 1. Randomly generated pictures are shown in FIG. 2.
Table 1. Comparison of generation performance on the CIFAR10 task
[Table 1 image: IS and FID scores of the proposed fast denoising diffusion probability model and of the baseline models listed below]
The compared generative models are:
NCSN (see "Generative modeling by estimating gradients of the data distribution," Advances in Neural Information Processing Systems, 2019.)
DiffuseVAE (see "DiffuseVAE: Efficient, controllable and high-fidelity generation from low-dimensional latents," arXiv, abs/2201.00308, 2022.)
AutoGAN (see "AutoGAN: Neural architecture search for generative adversarial networks," Proceedings of the IEEE International Conference on Computer Vision, pp. 3223-3233, 2019.)
SNGAN (see "Spectral normalization for generative adversarial networks," International Conference on Learning Representations, 2018.)
Glow (see "Glow: Generative flow with invertible 1x1 convolutions," Advances in Neural Information Processing Systems, 2018.)
PixelCNN (see "Pixel recurrent neural networks," International Conference on Machine Learning, 2016.)
NVAE (see "NVAE: A deep hierarchical variational autoencoder," Advances in Neural Information Processing Systems, 2020.)
IGEBM (see "Implicit generation and modeling with energy-based models," Advances in Neural Information Processing Systems, 2019.)
VAEBM (see "VAEBM: A symbiosis between variational autoencoders and energy-based models," International Conference on Learning Representations, 2021.)

Claims (8)

1. An image feature analysis and generation method based on a fast denoising diffusion probability model, characterized by comprising the following steps:
S1: preprocessing an image generation data set to obtain a training data set;
S2: constructing an original denoising diffusion probability model comprising a score predictor, and introducing a condition generator and a corresponding discriminator to obtain a fast denoising diffusion probability model;
S3: training the score predictor by a score matching method, freezing the trained score predictor, and training the condition generator and the discriminator by an alternating generative adversarial training method;
S4: updating the trained score predictor and the trained condition generator into the fast denoising diffusion probability model, and inputting a Gaussian white noise picture into the fast denoising diffusion probability model to obtain a realistic, high-quality output image.
2. The image feature analysis and generation method based on the fast denoising diffusion probability model according to claim 1, characterized in that, in step S2, the original denoising diffusion probability model comprises a forward Gaussian diffusion kernel and a reverse Gaussian denoising kernel, wherein the forward Gaussian diffusion kernel

$$q(x_t \mid x_{t-1}) = \mathcal{N}\!\big(x_t;\ \sqrt{1-\beta_t}\,x_{t-1},\ \beta_t I\big)$$

implements the noise-adding forward diffusion process x_0, ..., x_t, ..., x_T of pictures, where $\mathcal{N}$ denotes a Gaussian distribution, β_t (t = 1, ..., T) is a preset noise scale, the starting point x_0 is a noise-free picture, the intermediate state x_t is the noisy picture at time t, and the process end point x_T is a Gaussian white noise picture; the reverse Gaussian denoising kernel p_θ(x_{t-1} | x_t) implements the reverse denoising generation process x_T, ..., x_t, ..., x_0; specifically,

$$p_\theta(x_{t-1}\mid x_t) = \mathcal{N}\!\big(x_{t-1};\ \mu_\theta(x_t,t),\ \sigma_t^2 I\big),\qquad
\mu_\theta(x_t,t) = \frac{\sqrt{\bar\alpha_{t-1}}\,\beta_t}{1-\bar\alpha_t}\,f_\theta(x_t,t) + \frac{\sqrt{\alpha_t}\,(1-\bar\alpha_{t-1})}{1-\bar\alpha_t}\,x_t,$$

where σ_t is the standard deviation of the Gaussian denoising kernel; I is the identity matrix; α_t = 1 - β_t and $\bar\alpha_t = \prod_{s=1}^{t}\alpha_s$;

$$f_\theta(x_t,t) = \frac{x_t - \sqrt{1-\bar\alpha_t}\,\epsilon_\theta(x_t,t)}{\sqrt{\bar\alpha_t}}$$

denotes the denoising prediction; ε_θ(x_t, t) is the score predictor; and μ_θ(x_t, t) is the mean.
3. The image feature analysis and generation method based on the fast denoising diffusion probability model according to claim 2, characterized in that, in step S2, the condition generator

$$x_0'' = G_\omega\!\big(f_\theta(x_t,t),\ t,\ u\big)$$

is introduced into the reverse denoising generation process, yielding the reverse Gaussian denoising kernel with the condition generator:

$$p_{\theta,\omega}(x_{t-1}\mid x_t) = \mathcal{N}\!\big(x_{t-1};\ \mu_{\theta,\omega}(x_t,t),\ \sigma_t^2 I\big),\qquad
\mu_{\theta,\omega}(x_t,t) = \frac{\sqrt{\bar\alpha_{t-1}}\,\beta_t}{1-\bar\alpha_t}\,G_\omega\!\big(f_\theta(x_t,t),t,u\big) + \frac{\sqrt{\alpha_t}\,(1-\bar\alpha_{t-1})}{1-\bar\alpha_t}\,x_t,$$

where μ_{θ,ω}(x_t, t) is the mean after the condition generator is introduced and u denotes sampled Gaussian noise.
4. The image feature analysis and generation method based on the fast denoising diffusion probability model according to claim 3, characterized in that the score predictor ε_θ is implemented on a U-net neural network structure, a symmetric encoder-decoder structure in which skip connections share feature information between corresponding stages of the encoder and decoder; the encoder part of the U-net reduces the dimension of the input noisy picture x_t through cascaded ResBlocks and extracts multi-scale visual features, and the cascaded ResBlocks of the decoder part output the noise component of the noisy picture x_t from the multi-scale visual features; the time t is encoded into a 256-dimensional vector using the time-embedding scheme of the Transformer neural network structure and fed into every ResBlock of the U-net, so that the U-net can depend on the time t explicitly;
the condition generator G_ω is implemented on the U-net neural network structure adopted by the score predictor, a symmetric encoder-decoder structure, with an additional three-layer fully connected network that maps the noise u into a 256-dimensional vector fed into every ResBlock of the U-net, so that the U-net can also depend on the noise u explicitly; the encoder side extracts the multi-scale visual features of the denoising prediction, and the decoder side outputs an optimized denoising prediction from the multi-scale features;
the discriminator D_ψ is implemented on a U-net neural network structure with ResBlocks that depend on the time t; its input is the noisy picture x_t at the current time t together with the picture x_{t-1} at the next time t-1, and the output part obtains a real-valued discrimination score through convolutional neural layers.
5. The image feature analysis and generation method based on the fast denoising diffusion probability model according to claim 4, characterized in that, in step S3, training the score predictor by the score matching method comprises the following sub-steps:
(1) randomly select a batch of noise-free pictures from the training data set, denoted x_0 ~ q(x_0); uniformly sample a batch of integers from [1, T] as times t ~ U(t); and draw a batch of Gaussian noise with the same dimension as the pictures from the standard Gaussian distribution, denoted ε ~ N(0, I);
(2) substitute the variables t, x_0, ε obtained above into the unweighted score-matching squared-error function, and compute its mean as the objective function L(θ) to be optimized;
(3) optimize the neural network parameters θ once by back-propagating the objective function L(θ);
(4) repeat steps (1)-(3) until the objective function converges or the maximum number of optimization steps is reached.
6. The method as claimed in claim 5, characterized in that the score matching objective function is

$$L(\theta) = \mathbb{E}_{t,\,x_0,\,\epsilon}\Big[\big\|\epsilon - \epsilon_\theta\big(\sqrt{\bar\alpha_t}\,x_0 + \sqrt{1-\bar\alpha_t}\,\epsilon,\ t\big)\big\|^2\Big],$$

where $\mathbb{E}$ denotes averaging the function over the variables.
7. The image feature analysis and generation method based on the fast denoising diffusion probability model according to claim 6, characterized in that, in step S3, the condition generator and the discriminator are trained by the alternating training method of generative adversarial networks, comprising the following sub-steps:
(1) freeze the trained score predictor ε_θ as the basis of the subsequent process;
(2) randomly select a batch of noise-free pictures from the training data set, denoted x_0 ~ q(x_0); uniformly sample a batch of integers from [1, T] as times t ~ U(t); according to x_0, sample a batch x_{t-1} from the Gaussian conditional distribution q(x_{t-1} | x_0); according to x_{t-1}, sample a batch x_t from the forward Gaussian diffusion kernel q(x_t | x_{t-1}); and according to x_t, sample a predicted x̂_{t-1} from the reverse Gaussian denoising kernel p_{θ,ω}(x_{t-1} | x_t) with the condition generator;
(3) substitute the variables t, x_0, x_{t-1}, x_t, x̂_{t-1} obtained above into the generative adversarial objective function with respect to the discriminator parameters ψ, and compute its mean as the optimization objective of the discriminator;
(4) optimize the discriminator parameters ψ once by back-propagating the objective function with respect to ψ;
(5) substitute the variables t, x_0, x_{t-1}, x_t, x̂_{t-1} into the generative adversarial objective function with respect to the condition generator parameters ω, and compute its mean as the optimization objective of the condition generator;
(6) optimize the condition generator parameters ω once by back-propagating the objective function with respect to ω;
(7) repeat steps (2)-(6) until the generative adversarial objective function converges or the maximum number of optimization steps is reached.
8. The method as claimed in claim 7, characterized in that the generative adversarial objective functions with respect to the discriminator parameters ψ and the condition generator parameters ω are

$$L(\psi) = \mathbb{E}_{t,\,x_{t-1},\,x_t \sim q}\big[-\log D_\psi(x_{t-1}, x_t, t)\big] + \mathbb{E}_{t,\,x_t \sim q,\ \hat{x}_{t-1} \sim p_{\theta,\omega}}\big[-\log\big(1 - D_\psi(\hat{x}_{t-1}, x_t, t)\big)\big],$$
$$L(\omega) = \mathbb{E}_{t,\,x_t \sim q,\ \hat{x}_{t-1} \sim p_{\theta,\omega}}\big[-\log D_\psi(\hat{x}_{t-1}, x_t, t)\big].$$
CN202211560705.XA 2022-12-07 2022-12-07 Image characteristic analysis and generation method based on rapid denoising diffusion probability model Pending CN115908187A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211560705.XA CN115908187A (en) 2022-12-07 2022-12-07 Image characteristic analysis and generation method based on rapid denoising diffusion probability model

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211560705.XA CN115908187A (en) 2022-12-07 2022-12-07 Image characteristic analysis and generation method based on rapid denoising diffusion probability model

Publications (1)

Publication Number Publication Date
CN115908187A true CN115908187A (en) 2023-04-04

Family

ID=86494628

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211560705.XA Pending CN115908187A (en) 2022-12-07 2022-12-07 Image characteristic analysis and generation method based on rapid denoising diffusion probability model

Country Status (1)

Country Link
CN (1) CN115908187A (en)

Cited By (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116310660A (en) * 2023-05-24 2023-06-23 深圳须弥云图空间科技有限公司 Enhanced sample generation method and device
CN116630634A (en) * 2023-05-29 2023-08-22 北京医准智能科技有限公司 Image processing method, device, equipment and storage medium
CN116630634B (en) * 2023-05-29 2024-01-30 浙江医准智能科技有限公司 Image processing method, device, equipment and storage medium
CN116824499A (en) * 2023-06-28 2023-09-29 北京建筑大学 Insect pest detection method, system, equipment and storage medium based on SWT model
CN116664450A (en) * 2023-07-26 2023-08-29 国网浙江省电力有限公司信息通信分公司 Diffusion model-based image enhancement method, device, equipment and storage medium
CN116645260A (en) * 2023-07-27 2023-08-25 中国海洋大学 Digital watermark attack method based on conditional diffusion model
CN116645260B (en) * 2023-07-27 2024-02-02 中国海洋大学 Digital watermark attack method based on conditional diffusion model
CN116664605A (en) * 2023-08-01 2023-08-29 昆明理工大学 Medical image tumor segmentation method based on diffusion model and multi-mode fusion
CN116664605B (en) * 2023-08-01 2023-10-10 昆明理工大学 Medical image tumor segmentation method based on diffusion model and multi-mode fusion
CN116758098A (en) * 2023-08-07 2023-09-15 哈尔滨工业大学(深圳)(哈尔滨工业大学深圳科技创新研究院) Hypothalamic nucleus segmentation method and model construction method of magnetic resonance image
CN117788344A (en) * 2024-02-26 2024-03-29 北京飞渡科技股份有限公司 Building texture image restoration method based on diffusion model
CN117788344B (en) * 2024-02-26 2024-05-07 北京飞渡科技股份有限公司 Building texture image restoration method based on diffusion model

Similar Documents

Publication Publication Date Title
CN115908187A (en) Image characteristic analysis and generation method based on rapid denoising diffusion probability model
CN107273800B (en) Attention mechanism-based motion recognition method for convolutional recurrent neural network
CN110111366B (en) End-to-end optical flow estimation method based on multistage loss
CN109035172B (en) Non-local mean ultrasonic image denoising method based on deep learning
CN113191969A (en) Unsupervised image rain removing method based on attention confrontation generation network
CN110570443B (en) Image linear target extraction method based on structural constraint condition generation model
CN114038055A (en) Image generation method based on contrast learning and generation countermeasure network
CN113240683B (en) Attention mechanism-based lightweight semantic segmentation model construction method
CN113870335A (en) Monocular depth estimation method based on multi-scale feature fusion
CN115393396B (en) Unmanned aerial vehicle target tracking method based on mask pre-training
Niu et al. Effective image restoration for semantic segmentation
CN112950480A (en) Super-resolution reconstruction method integrating multiple receptive fields and dense residual attention
CN112766360A (en) Time sequence classification method and system based on time sequence bidimensionalization and width learning
CN114882368A (en) Non-equilibrium hyperspectral image classification method
CN115775316A (en) Image semantic segmentation method based on multi-scale attention mechanism
CN114821050A (en) Named image segmentation method based on transformer
CN115047423A (en) Comparison learning unsupervised pre-training-fine tuning type radar target identification method
CN111612803B (en) Vehicle image semantic segmentation method based on image definition
Zhang et al. A parallel and serial denoising network
CN111986210B (en) Medical image small focus segmentation method
CN116704585A (en) Face recognition method based on quality perception
CN111598115B (en) SAR image fusion method based on cross cortical neural network model
CN115797827A (en) ViT human body behavior identification method based on double-current network architecture
CN112884773B (en) Target segmentation model based on target attention consistency under background transformation
CN114359786A (en) Lip language identification method based on improved space-time convolutional network

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination