CN115908187A - Image characteristic analysis and generation method based on rapid denoising diffusion probability model - Google Patents

Image characteristic analysis and generation method based on rapid denoising diffusion probability model Download PDF

Info

Publication number
CN115908187A
Authority
CN
China
Prior art keywords
denoising, probability model, diffusion, condition generator, Gaussian
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202211560705.XA
Other languages
Chinese (zh)
Inventor
吕金虎
阚哿
王田
刘克新
高庆
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beihang University
Original Assignee
Beihang University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beihang University filed Critical Beihang University
Priority to CN202211560705.XA priority Critical patent/CN115908187A/en
Publication of CN115908187A publication Critical patent/CN115908187A/en
Pending legal-status Critical Current

Classifications

    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02T: CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T 10/00: Road transport of goods or passengers
    • Y02T 10/10: Internal combustion engine [ICE] based vehicles
    • Y02T 10/40: Engine management systems

Landscapes

  • Image Analysis (AREA)

Abstract

The invention discloses an image feature analysis and generation method based on a fast denoising diffusion probability model, comprising the following steps: preprocessing an image generation data set to obtain a training data set; constructing an original denoising diffusion probability model comprising a score predictor, and introducing a condition generator and a corresponding discriminator to obtain a fast denoising diffusion probability model; training the score predictor by a score matching method, freezing the trained score predictor, and training the condition generator and the discriminator by an alternating generative adversarial training method; and updating the trained score predictor and condition generator into the fast denoising diffusion probability model, and inputting a Gaussian white noise picture into the fast denoising diffusion probability model to obtain a realistic, high-quality output image. The method preserves high-quality generation capability while markedly improving the efficiency of sample generation.

Description

Image characteristic analysis and generation method based on rapid denoising diffusion probability model
Technical Field
The invention belongs to the field of image feature analysis and generation in computer vision, and relates to an image feature analysis and generation method based on a fast denoising diffusion probability model.
Background
With the advent of modern deep learning, dramatic progress has been made in the field of computer vision. With the birth of convolutional neural networks and the invention of the residual network structure, deep learning far outperforms traditional methods on tasks such as classification, detection, and segmentation of pictures at various scales. In the field of image generation, deep generative models such as generative adversarial networks and variational autoencoders can generate high-resolution pictures. However, the generation quality is still limited by their overly simple implicit generative models. Recently, the denoising diffusion probability model has been able to produce higher-quality samples, but it must construct a Markov generation process of nearly 1000 steps, which incurs high time and computation costs.
Disclosure of Invention
The invention aims to solve the problems that, among existing deep generative models, the generation quality of implicit generative models is constrained and the sampling efficiency of the denoising diffusion probability model is low. The invention provides an image feature analysis and generation method based on a fast denoising diffusion probability model, which introduces a condition generator and a corresponding discriminator model on the basis of the original denoising diffusion probability model and trains the supplementary condition generator in a generative adversarial manner, so that the combined model achieves high-quality and high-efficiency image generation.
The technical scheme adopted by the invention for solving the technical problem is as follows:
An image feature analysis and generation method based on a fast denoising diffusion probability model comprises the following steps:
S1: preprocessing the image generation data set to obtain a training data set;
S2: constructing an original denoising diffusion probability model comprising a score predictor, and introducing a condition generator and a corresponding discriminator to obtain a fast denoising diffusion probability model;
S3: training the score predictor by a score matching method, freezing the trained score predictor, and training the condition generator and the discriminator by an alternating generative adversarial training method;
S4: updating the trained score predictor and the trained condition generator into the fast denoising diffusion probability model, and inputting a Gaussian white noise picture into the fast denoising diffusion probability model to obtain a realistic, high-quality output image.
Further, in step S2, the original denoising diffusion probability model comprises a forward Gaussian diffusion kernel and a reverse Gaussian denoising kernel. The forward Gaussian diffusion kernel

$$q(x_t \mid x_{t-1}) = \mathcal{N}\!\big(x_t;\ \sqrt{1-\beta_t}\,x_{t-1},\ \beta_t I\big)$$

implements the noise-adding forward diffusion process x_0, ..., x_t, ..., x_T of pictures, where $\mathcal{N}$ denotes a Gaussian distribution, β_t (t = 1, ..., T) is a preset noise scale, the starting point x_0 is a noise-free picture, the intermediate state x_t is the noisy picture at time t, and the process end point x_T is a Gaussian white noise picture. The reverse Gaussian denoising kernel p_θ(x_{t-1} | x_t) implements the reverse denoising generation process x_T, ..., x_t, ..., x_0; specifically,

$$p_\theta(x_{t-1}\mid x_t) = \mathcal{N}\!\big(x_{t-1};\ \mu_\theta(x_t,t),\ \sigma_t^2 I\big),\qquad
\mu_\theta(x_t,t) = \frac{\sqrt{\bar\alpha_{t-1}}\,\beta_t}{1-\bar\alpha_t}\,f_\theta(x_t,t) + \frac{\sqrt{\alpha_t}\,(1-\bar\alpha_{t-1})}{1-\bar\alpha_t}\,x_t,$$

where σ_t is the standard deviation of the Gaussian denoising kernel; I is the identity matrix; α_t = 1 - β_t and $\bar\alpha_t = \prod_{s=1}^{t}\alpha_s$;

$$f_\theta(x_t,t) = \frac{x_t - \sqrt{1-\bar\alpha_t}\,\epsilon_\theta(x_t,t)}{\sqrt{\bar\alpha_t}}$$

denotes the denoising prediction; ε_θ(x_t, t) is the score predictor; and μ_θ(x_t, t) is the mean.
Further, in step S2, the condition generator

$$x_0'' = G_\omega\!\big(f_\theta(x_t,t),\ t,\ u\big)$$

is introduced into the reverse denoising generation process, yielding the reverse Gaussian denoising kernel with the condition generator:

$$p_{\theta,\omega}(x_{t-1}\mid x_t) = \mathcal{N}\!\big(x_{t-1};\ \mu_{\theta,\omega}(x_t,t),\ \sigma_t^2 I\big),\qquad
\mu_{\theta,\omega}(x_t,t) = \frac{\sqrt{\bar\alpha_{t-1}}\,\beta_t}{1-\bar\alpha_t}\,G_\omega\!\big(f_\theta(x_t,t),t,u\big) + \frac{\sqrt{\alpha_t}\,(1-\bar\alpha_{t-1})}{1-\bar\alpha_t}\,x_t,$$

where μ_{θ,ω}(x_t, t) is the mean after the condition generator is introduced and u denotes sampled Gaussian noise.
Further, the score predictor ε_θ is implemented on a U-net neural network structure, a symmetric encoder-decoder structure in which skip connections share feature information between corresponding stages of the encoder and decoder; the encoder part of the U-net reduces the dimension of the input noisy picture x_t through cascaded ResBlocks and extracts multi-scale visual features, and the cascaded ResBlocks of the decoder part output the noise component of the noisy picture x_t from the multi-scale visual features; the time t is encoded into a 256-dimensional vector using the time-embedding scheme of the Transformer neural network structure and fed into every ResBlock of the U-net, so that the U-net can depend on the time t explicitly.
The condition generator G_ω is implemented on the U-net neural network structure adopted by the score predictor, a symmetric encoder-decoder structure, with an additional three-layer fully connected network that maps the noise u into a 256-dimensional vector fed into every ResBlock of the U-net, so that the U-net can also depend on the noise u explicitly; the encoder side extracts the multi-scale visual features of the denoising prediction, and the decoder side outputs an optimized denoising prediction from the multi-scale features.
The discriminator D_ψ is implemented on a U-net neural network structure with ResBlocks that depend on the time t; its input is the noisy picture x_t at the current time t together with the picture x_{t-1} at the next time t-1, and the output part obtains a real-valued discrimination score through convolutional neural layers.
Further, in step S3, training the score predictor by the score matching method comprises the following sub-steps:
(1) randomly select a batch of noise-free pictures from the training data set, denoted x_0 ~ q(x_0); uniformly sample a batch of integers from [1, T] as times t ~ U(t); and draw a batch of Gaussian noise with the same dimension as the pictures from the standard Gaussian distribution, denoted ε ~ N(0, I);
(2) substitute the variables t, x_0, ε obtained above into the unweighted score-matching squared-error function, and compute its mean as the objective function L(θ) to be optimized;
(3) optimize the neural network parameters θ once by back-propagating the objective function L(θ);
(4) repeat steps (1)-(3) until the objective function converges or the maximum number of optimization steps is reached.
Further, the score matching objective function is

$$L(\theta) = \mathbb{E}_{t,\,x_0,\,\epsilon}\Big[\big\|\epsilon - \epsilon_\theta\big(\sqrt{\bar\alpha_t}\,x_0 + \sqrt{1-\bar\alpha_t}\,\epsilon,\ t\big)\big\|^2\Big],$$

where $\mathbb{E}$ denotes averaging the function over the variables.
Further, in step S3, the condition generator and the discriminator are trained by the alternating training method of generative adversarial networks, comprising the following sub-steps:
(1) freeze the trained score predictor ε_θ as the basis of the subsequent process;
(2) randomly select a batch of noise-free pictures from the training data set, denoted x_0 ~ q(x_0); uniformly sample a batch of integers from [1, T] as times t ~ U(t); according to x_0, sample a batch x_{t-1} from the Gaussian conditional distribution q(x_{t-1} | x_0); according to x_{t-1}, sample a batch x_t from the forward Gaussian diffusion kernel q(x_t | x_{t-1}); and according to x_t, sample a predicted x̂_{t-1} from the reverse Gaussian denoising kernel p_{θ,ω}(x_{t-1} | x_t) with the condition generator;
(3) substitute the variables t, x_0, x_{t-1}, x_t, x̂_{t-1} obtained above into the generative adversarial objective function with respect to the discriminator parameters ψ, and compute its mean as the optimization objective of the discriminator;
(4) optimize the discriminator parameters ψ once by back-propagating the objective function with respect to ψ;
(5) substitute the variables t, x_0, x_{t-1}, x_t, x̂_{t-1} into the generative adversarial objective function with respect to the condition generator parameters ω, and compute its mean as the optimization objective of the condition generator;
(6) optimize the condition generator parameters ω once by back-propagating the objective function with respect to ω;
(7) repeat steps (2)-(6) until the generative adversarial objective function converges or the maximum number of optimization steps is reached.
Further, the generative adversarial objective functions with respect to the discriminator parameters ψ and the condition generator parameters ω are

$$L(\psi) = \mathbb{E}_{t,\,x_{t-1},\,x_t \sim q}\big[-\log D_\psi(x_{t-1}, x_t, t)\big] + \mathbb{E}_{t,\,x_t \sim q,\ \hat{x}_{t-1} \sim p_{\theta,\omega}}\big[-\log\big(1 - D_\psi(\hat{x}_{t-1}, x_t, t)\big)\big],$$
$$L(\omega) = \mathbb{E}_{t,\,x_t \sim q,\ \hat{x}_{t-1} \sim p_{\theta,\omega}}\big[-\log D_\psi(\hat{x}_{t-1}, x_t, t)\big].$$
Compared with the prior art, the invention has the following beneficial effects:
(1) The method is based on the denoising diffusion probability model and has stronger image generation capability than common generative models such as generative adversarial networks and variational autoencoders;
(2) Aiming at the long sampling time and low efficiency of the denoising diffusion probability model in image generation, a condition generator is introduced as an improvement, so that the proposed fast denoising diffusion probability model retains the high-quality generation capability of the denoising diffusion probability model while greatly improving sampling efficiency;
(3) The fast denoising diffusion probability model has the capability of analyzing the features of noisy images during image generation, and the generated features can be controlled by adjusting the features of the noise.
Drawings
FIG. 1 is a schematic diagram of the generative adversarial training of the fast denoising diffusion probability model;
FIG. 2 is a CIFAR10 picture randomly generated by a fast denoising diffusion probability model.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is further described in detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.
The invention provides an image feature analysis and generation method based on a fast denoising diffusion probability model, which comprises the following steps:
Step 1: data preprocessing;
A deep learning framework is used to perform preprocessing operations such as resizing, normalization, and horizontal flipping on the image generation data set, yielding a training data set suitable for training a neural network model. The image generation data set is selected from a collected data set or a publicly available image data set, such as CIFAR10 or CelebA; the data set is scaled to 32 x 32, 64 x 64, or 128 x 128 resolution using a deep learning toolkit, the data are then normalized to [-1, 1], and finally random horizontal flipping is applied to the data set samples as image preprocessing.
Specifically, this embodiment takes the public data set CIFAR10 as an example: the pictures in CIFAR10 are scaled to 32 x 32 resolution, each pixel is normalized to [-1, 1], and random horizontal flipping is applied, yielding a training data set suitable for training a neural network model; a sketch of this preprocessing is given below.
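A minimal sketch of this preprocessing step, assuming PyTorch and torchvision as the deep learning framework and toolkit (the embodiment does not name a specific one):

```python
import torchvision
import torchvision.transforms as T

# resize, random horizontal flip, and per-channel normalization to [-1, 1]
transform = T.Compose([
    T.Resize(32),                    # CIFAR10 is already 32 x 32; kept for generality
    T.RandomHorizontalFlip(p=0.5),   # random horizontal flip augmentation
    T.ToTensor(),                    # PIL image -> float tensor in [0, 1]
    T.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5)),  # [0, 1] -> [-1, 1]
])

train_set = torchvision.datasets.CIFAR10(
    root="./data", train=True, download=True, transform=transform)
```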
Step 2: theoretical construction of the fast denoising diffusion probability model;
in theory, the original denoising diffusion probability model constructs a trainable random process to learn how to denoise the noise step by step into data samples. Specifically, the original model first constructs a forward diffusion process x with the length of T and continuously adding noise to the data picture 0 ...x t ...x T Wherein the starting point x of the diffusion process 0 For noiseless pictures in the data set, intermediate state x t For the random variable of the noisy picture after adding noise, the end point x of the diffusion process T Close to gaussian white noise. The forward Gaussian diffusion kernel constituting the forward process is
$$q(x_t \mid x_{t-1}) = \mathcal{N}\!\big(x_t;\ \sqrt{1-\beta_t}\,x_{t-1},\ \beta_t I\big),$$

where $\mathcal{N}$ denotes a Gaussian distribution and β_t, t = 1, ..., T, is a preset noise scale. The forward Gaussian diffusion kernel describes the property of the forward process of continually adding noise. From the Gaussian nature of the forward process, the conditional posterior distribution is

$$q(x_t \mid x_0) = \mathcal{N}\!\big(x_t;\ \sqrt{\bar\alpha_t}\,x_0,\ (1-\bar\alpha_t) I\big), \qquad \alpha_t = 1-\beta_t,\quad \bar\alpha_t = \prod_{s=1}^{t}\alpha_s.$$

This means that a noisy sample at time t can be obtained by direct sampling,

$$x_t = \sqrt{\bar\alpha_t}\,x_0 + \sqrt{1-\bar\alpha_t}\,\epsilon,$$

where the Gaussian noise ε ~ N(0, I) is sampled directly. Then, the original denoising diffusion probability model constructs a denoising generation process x_T, ..., x_t, ..., x_0 along reverse time. The denoising process is composed of a trainable reverse denoising kernel x_{t-1} ~ p_θ(x_{t-1} | x_t), which has the Gaussian form

$$p_\theta(x_{t-1}\mid x_t) = \mathcal{N}\!\big(x_{t-1};\ \mu_\theta(x_t,t),\ \sigma_t^2 I\big),\qquad
\mu_\theta(x_t,t) = \frac{\sqrt{\bar\alpha_{t-1}}\,\beta_t}{1-\bar\alpha_t}\,f_\theta(x_t,t) + \frac{\sqrt{\alpha_t}\,(1-\bar\alpha_{t-1})}{1-\bar\alpha_t}\,x_t.$$

In this form, the mean μ_θ(x_t, t) is a weighted combination of the current state x_t and the denoising prediction f_θ(x_t, t); the denoising prediction can be expressed through the score predictor ε_θ(x_t, t) as

$$f_\theta(x_t,t) = \frac{x_t - \sqrt{1-\bar\alpha_t}\,\epsilon_\theta(x_t,t)}{\sqrt{\bar\alpha_t}},$$

and σ_t, the standard deviation of the Gaussian denoising kernel, is set so that σ_t² = β_t. In the denoising kernel, the score predictor ε_θ(x_t, t) analyzes the noisy picture x_t to predict the noise ε_t contained in the noisy variable x_t, which in turn enables f_θ(x_t, t) to predict a denoised sample x_0.
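As an illustration of the closed-form forward sampling described above, the following is a minimal sketch assuming PyTorch and a linear β_t schedule (the concrete schedule values are assumptions; the description only states that β_t is a preset noise scale):

```python
import torch

T_STEPS = 1000                                  # length of the forward process
betas = torch.linspace(1e-4, 0.02, T_STEPS)     # assumed linear noise scales beta_t
alphas = 1.0 - betas
alpha_bars = torch.cumprod(alphas, dim=0)       # bar(alpha)_t = prod_s alpha_s

def sample_xt(x0: torch.Tensor, t: torch.Tensor) -> torch.Tensor:
    """Directly sample x_t = sqrt(bar(a)_t) x_0 + sqrt(1 - bar(a)_t) eps."""
    eps = torch.randn_like(x0)                  # eps ~ N(0, I)
    a_bar = alpha_bars[t].view(-1, 1, 1, 1)     # broadcast over image dims
    return a_bar.sqrt() * x0 + (1.0 - a_bar).sqrt() * eps
```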
The original denoising diffusion probability model usually needs a denoising generation process of about 1000 steps to obtain high-quality samples; if the process is shortened to 4 steps, the generated samples are blurry and of low quality. We find the reason is that, in a 4-step denoising process, the score predictor ε_θ(x_t, t) cannot accurately predict the noise ε_t contained in the noisy variable x_t, so the denoising prediction f_θ(x_t, t) is of poor quality, and a high-quality denoised sample x_0 cannot be obtained with only a 4-step denoising process. Therefore, the invention feeds the denoising prediction f_θ(x_t, t) into a condition generator

$$x_0'' = G_\omega\!\big(f_\theta(x_t,t),\ t,\ u\big)$$

(where u is sampled Gaussian noise of lower dimension than the variable x_t, usually chosen as 100 dimensions), which builds on the image features contained in the denoising prediction f_θ(x_t, t) to produce a higher-quality, more realistic denoising prediction, and hence a more accurately predicted mean. This corresponds to the following reverse Gaussian denoising kernel after the generator is introduced:

$$p_{\theta,\omega}(x_{t-1}\mid x_t) = \mathcal{N}\!\big(x_{t-1};\ \mu_{\theta,\omega}(x_t,t),\ \sigma_t^2 I\big),\qquad
\mu_{\theta,\omega}(x_t,t) = \frac{\sqrt{\bar\alpha_{t-1}}\,\beta_t}{1-\bar\alpha_t}\,G_\omega\!\big(f_\theta(x_t,t),t,u\big) + \frac{\sqrt{\alpha_t}\,(1-\bar\alpha_{t-1})}{1-\bar\alpha_t}\,x_t,$$

where μ_{θ,ω}(x_t, t) is the mean after the condition generator is introduced, which in theory is a more accurate prediction than the original mean μ_θ(x_t, t).
Step 3: implementing the fast denoising diffusion probability model with a deep learning framework;
The original denoising diffusion probability model uses the score predictor ε_θ to realize a trainable generative denoising process. The fast denoising diffusion probability model proposed by the invention newly introduces a condition generator G_ω to construct a trainable fast generative denoising process, and introduces a corresponding discriminator D_ψ to train the condition generator G_ω. In implementation, the invention uses neural networks to represent the score predictor ε_θ, the condition generator G_ω, and the discriminator D_ψ, and constructs the corresponding neural network modules according to the resolution or scale of the data set.
Specifically, the score predictor ε_θ is implemented on a U-net neural network structure, a symmetric encoder-decoder structure in which skip connections share feature information between corresponding stages of the encoder and decoder. The encoder part of the U-net reduces the dimension of the input noisy picture x_t through cascaded residual blocks (ResBlocks) and extracts multi-scale visual features, and the cascaded ResBlocks of the decoder part output the noise component of the noisy picture x_t from these multi-scale visual features. Unlike an ordinary U-net, the input of the score predictor also contains the time t, so the time t is encoded into a 256-dimensional vector using the time-embedding scheme of the Transformer neural network structure and fed into every ResBlock of the U-net, so that the U-net can depend on the time t explicitly.
The condition generator G_ω adopts a U-net structure similar to that of the score predictor, but additionally takes random noise u as input. We therefore map the noise u into a 256-dimensional vector with an additional three-layer fully connected network and feed it into every ResBlock of the U-net, so that the U-net can also depend on the noise u explicitly. The condition generator G_ω takes the denoising prediction f_θ obtained from the score predictor as input; the encoder side extracts the multi-scale visual features of the denoising prediction, and the decoder side outputs a higher-quality, more accurate denoising prediction from these multi-scale features.
The discriminator D_ψ is implemented similarly to the encoder part of the score predictor's U-net, with ResBlocks that depend on the time t, but its input is the noisy picture x_t at the current time t together with the picture x_{t-1} at the next time t-1, i.e., the function D_ψ(x_{t-1}, x_t, t). We concatenate x_{t-1} and x_t into a neural network input with 6 channels, and the output part obtains a real-valued discrimination score through convolutional neural layers.
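For concreteness, a minimal sketch of the Transformer-style sinusoidal time embedding described above, assuming the standard formulation with the 256 dimensions stated in the description (the exact frequency constants are an assumption):

```python
import math
import torch

def time_embedding(t: torch.Tensor, dim: int = 256) -> torch.Tensor:
    """Encode integer timesteps t of shape [B] into [B, dim] vectors."""
    half = dim // 2
    freqs = torch.exp(
        -math.log(10000.0) * torch.arange(half, dtype=torch.float32) / half)
    args = t.float()[:, None] * freqs[None, :]   # [B, half]
    # concatenated sine/cosine features, fed into every ResBlock
    return torch.cat([torch.sin(args), torch.cos(args)], dim=-1)
```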
Step 4: training the fast denoising diffusion probability model;
The training of the fast denoising diffusion probability model is divided into two stages. The first stage trains the score predictor network ε_θ of the original denoising diffusion probability model, using the following score matching training process:
1) randomly select a batch of noise-free pictures from the training data set, denoted x_0 ~ q(x_0); uniformly sample a batch of integers from [1, T] as times t ~ U(t); and draw a batch of Gaussian noise with the same dimension as the pictures from the standard Gaussian distribution, denoted ε ~ N(0, I);
2) substitute the variables t, x_0, ε obtained above into the unweighted score-matching squared-error function, and compute its mean as the objective function L(θ) to be optimized;
3) optimize the neural network parameters θ once by back-propagating the objective function L(θ);
4) repeat steps 1)-3) until the objective function converges or the maximum number of optimization steps is reached.
The unweighted score matching objective function is specifically

$$L(\theta) = \mathbb{E}_{t,\,x_0,\,\epsilon}\Big[\big\|\epsilon - \epsilon_\theta\big(\sqrt{\bar\alpha_t}\,x_0 + \sqrt{1-\bar\alpha_t}\,\epsilon,\ t\big)\big\|^2\Big],$$

where the expectation $\mathbb{E}$ denotes averaging the function over the variables sampled in step 1).
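A minimal sketch of one first-stage optimization step, assuming PyTorch, the schedule variables `T_STEPS` and `alpha_bars` from the forward-sampling sketch above, and a hypothetical `score_predictor` U-net taking (x_t, t):

```python
import torch

def score_matching_step(score_predictor, optimizer, x0: torch.Tensor) -> float:
    t = torch.randint(0, T_STEPS, (x0.shape[0],))       # t ~ U{1,...,T} (0-indexed)
    eps = torch.randn_like(x0)                          # eps ~ N(0, I)
    a_bar = alpha_bars[t].view(-1, 1, 1, 1)
    x_t = a_bar.sqrt() * x0 + (1 - a_bar).sqrt() * eps  # direct forward sample
    loss = ((eps - score_predictor(x_t, t)) ** 2).mean()  # unweighted L(theta)
    optimizer.zero_grad()
    loss.backward()                                     # back-propagate L(theta)
    optimizer.step()                                    # one optimization of theta
    return loss.item()
```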
The second stage, with the trained score predictor ε_θ frozen, trains the condition generator network G_ω and the discriminator network D_ψ newly introduced by the fast denoising diffusion probability model, using the following alternating generative adversarial training process:
1) freeze the score predictor ε_θ trained in the first stage as the basis of the subsequent process;
2) randomly select a batch of noise-free pictures from the training data set, denoted x_0 ~ q(x_0); uniformly sample a batch of integers from [1, T] as times t ~ U(t); according to x_0, sample a batch x_{t-1} from the Gaussian conditional distribution q(x_{t-1} | x_0); according to x_{t-1}, sample a batch x_t from the forward Gaussian diffusion kernel q(x_t | x_{t-1}); and according to x_t, sample a predicted x̂_{t-1} from the reverse Gaussian denoising kernel p_{θ,ω}(x_{t-1} | x_t) with the condition generator;
3) substitute the variables t, x_0, x_{t-1}, x_t, x̂_{t-1} obtained above into the generative adversarial objective function with respect to the discriminator parameters ψ, and compute its mean as the optimization objective of the discriminator;
4) optimize the discriminator parameters ψ once by back-propagating the objective function with respect to ψ;
5) substitute the variables t, x_0, x_{t-1}, x_t, x̂_{t-1} into the generative adversarial objective function with respect to the condition generator parameters ω, and compute its mean as the optimization objective of the condition generator;
6) optimize the condition generator parameters ω once by back-propagating the objective function with respect to ω;
7) repeat steps 2)-6) until the generative adversarial objective function converges or the maximum number of optimization steps is reached.
The generative adversarial objective functions with respect to the discriminator parameters ψ and the condition generator parameters ω are

$$L(\psi) = \mathbb{E}_{t,\,x_{t-1},\,x_t \sim q}\big[-\log D_\psi(x_{t-1}, x_t, t)\big] + \mathbb{E}_{t,\,x_t \sim q,\ \hat{x}_{t-1} \sim p_{\theta,\omega}}\big[-\log\big(1 - D_\psi(\hat{x}_{t-1}, x_t, t)\big)\big],$$
$$L(\omega) = \mathbb{E}_{t,\,x_t \sim q,\ \hat{x}_{t-1} \sim p_{\theta,\omega}}\big[-\log D_\psi(\hat{x}_{t-1}, x_t, t)\big].$$

The probability distributions involved are Gaussian or uniform, so direct sampling is possible (see step 2). Intuitively, as shown in FIG. 1, the current state x_t is fed into the frozen score predictor ε_θ to compute a coarse denoising prediction x'_0; the coarse prediction x'_0 is then fed into the condition generator G_ω to obtain a refined denoising prediction x''_0, which is substituted into the new reverse Gaussian denoising kernel p_{θ,ω}(x_{t-1} | x_t) to obtain the predicted x̂_{t-1}. We introduce the discriminator D_ψ to guide the training of the condition generator G_ω: the discriminator is trained to distinguish the real x_{t-1} from the predicted x̂_{t-1}, and the condition generator is then trained to give better denoising predictions so that the predicted x̂_{t-1} approaches the real x_{t-1}.
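A minimal sketch of one second-stage alternation, assuming PyTorch, a hypothetical helper `p_sample(x_t, t)` that draws the predicted x̂_{t-1} from p_{θ,ω}(x_{t-1} | x_t) through the frozen score predictor and the condition generator, and the non-saturating binary cross-entropy form of the adversarial objective (the loss form here follows the equations above and is otherwise an assumption):

```python
import torch
import torch.nn.functional as F

def adversarial_step(discriminator, d_opt, g_opt,
                     x_prev_real, x_t, t, p_sample):
    x_prev_fake = p_sample(x_t, t)   # predicted x_{t-1}, depends on G_omega

    # discriminator update: distinguish real x_{t-1} from predicted x_{t-1}
    d_real = discriminator(x_prev_real, x_t, t)
    d_fake = discriminator(x_prev_fake.detach(), x_t, t)
    d_loss = (F.binary_cross_entropy_with_logits(d_real, torch.ones_like(d_real))
              + F.binary_cross_entropy_with_logits(d_fake, torch.zeros_like(d_fake)))
    d_opt.zero_grad(); d_loss.backward(); d_opt.step()

    # generator update: make the predicted x_{t-1} score as real
    d_fake = discriminator(x_prev_fake, x_t, t)
    g_loss = F.binary_cross_entropy_with_logits(d_fake, torch.ones_like(d_fake))
    g_opt.zero_grad(); g_loss.backward(); g_opt.step()
    return d_loss.item(), g_loss.item()
```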
Step 5: generating new samples with the fast denoising diffusion probability model;
The score predictor ε_θ trained in the first stage and the condition generator G_ω trained in the second stage are substituted into the new reverse Gaussian denoising kernel p_{θ,ω}(x_{t-1} | x_t), forming the fast denoising generation process

$$p_{\theta,\omega}(x_{0:T}) = p(x_T)\prod_{t=1}^{T} p_{\theta,\omega}(x_{t-1}\mid x_t),$$

where p(x_T) denotes the standard Gaussian distribution $\mathcal{N}(0, I)$ and p_{θ,ω}(x_{0:T}) denotes the joint distribution of the process.
At each denoising step t of the new denoising generation process, the score predictor ε_θ analyzes the features of the noisy picture x_t and gives a preliminary denoising prediction x'_0, whose quality is relatively poor. The condition generator G_ω then converts the poor denoising prediction x'_0 into a high-quality denoising prediction x''_0, so that the denoising kernel p_{θ,ω}(x_{t-1} | x_t) has a more accurate mean. Through this generative denoising process, random Gaussian noise can be converted into realistic, high-quality image samples, with the number of denoising steps reduced to T = 4.
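A minimal sketch of the resulting T = 4 sampling loop, assuming PyTorch, trained `score_predictor` and `cond_generator` networks, a linear 4-step schedule (an assumption), and σ_t² = β_t as stated above:

```python
import torch

@torch.no_grad()
def fast_sample(score_predictor, cond_generator, shape=(16, 3, 32, 32), T=4):
    betas = torch.linspace(1e-4, 0.02, T)       # assumed 4-step noise schedule
    alphas = 1.0 - betas
    a_bars = torch.cumprod(alphas, dim=0)
    x = torch.randn(shape)                      # x_T: Gaussian white noise picture
    for t in reversed(range(T)):
        tt = torch.full((shape[0],), t)
        a_bar = a_bars[t]
        a_bar_prev = a_bars[t - 1] if t > 0 else torch.tensor(1.0)
        # coarse prediction x'_0 from the frozen score predictor
        f = (x - (1 - a_bar).sqrt() * score_predictor(x, tt)) / a_bar.sqrt()
        u = torch.randn(shape[0], 100)          # 100-dimensional noise u for G_omega
        x0_hat = cond_generator(f, tt, u)       # refined prediction x''_0
        # posterior-style mean mu_{theta,omega}(x_t, t) with the refined prediction
        mean = ((a_bar_prev.sqrt() * betas[t] / (1 - a_bar)) * x0_hat
                + (alphas[t].sqrt() * (1 - a_bar_prev) / (1 - a_bar)) * x)
        x = mean + betas[t].sqrt() * torch.randn_like(x) if t > 0 else mean
    return x                                    # generated pictures x_0
```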
Comparative example
We use IS (Inception Score) and FID (Fréchet Inception Distance) as generation quality metrics on the CIFAR10 dataset. The fast denoising diffusion probability model proposed by the invention achieves an IS of 8.51 and an FID of 11.41 on CIFAR10, which is superior to recently proposed baseline models. The comparison on CIFAR10 is shown in Table 1. Randomly generated pictures are shown in FIG. 2.
Table 1. Comparison of generation performance on the CIFAR10 task
[Table 1 image: IS and FID scores of the proposed fast denoising diffusion probability model and of the baseline models listed below]
The compared generative models are:
NCSN (see "Generative modeling by estimating gradients of the data distribution," Advances in Neural Information Processing Systems, 2019.)
DiffuseVAE (see "DiffuseVAE: Efficient, controllable and high-fidelity generation from low-dimensional latents," arXiv, abs/2201.00308, 2022.)
AutoGAN (see "AutoGAN: Neural architecture search for generative adversarial networks," Proceedings of the IEEE International Conference on Computer Vision, pp. 3223-3233, 2019.)
SNGAN (see "Spectral normalization for generative adversarial networks," International Conference on Learning Representations, 2018.)
Glow (see "Glow: Generative flow with invertible 1x1 convolutions," Advances in Neural Information Processing Systems, 2018.)
PixelCNN (see "Pixel recurrent neural networks," International Conference on Machine Learning, 2016.)
NVAE (see "NVAE: A deep hierarchical variational autoencoder," Advances in Neural Information Processing Systems, 2020.)
IGEBM (see "Implicit generation and modeling with energy-based models," Advances in Neural Information Processing Systems, 2019.)
VAEBM (see "VAEBM: A symbiosis between variational autoencoders and energy-based models," International Conference on Learning Representations, 2021.)

Claims (8)

1. An image feature analysis and generation method based on a fast denoising diffusion probability model, characterized by comprising the following steps:
S1: preprocessing an image generation data set to obtain a training data set;
S2: constructing an original denoising diffusion probability model comprising a score predictor, and introducing a condition generator and a corresponding discriminator to obtain a fast denoising diffusion probability model;
S3: training the score predictor by a score matching method, freezing the trained score predictor, and training the condition generator and the discriminator by an alternating generative adversarial training method;
S4: updating the trained score predictor and the trained condition generator into the fast denoising diffusion probability model, and inputting a Gaussian white noise picture into the fast denoising diffusion probability model to obtain a realistic, high-quality output image.
2. The image feature analysis and generation method based on the fast denoising diffusion probability model according to claim 1, characterized in that, in step S2, the original denoising diffusion probability model comprises a forward Gaussian diffusion kernel and a reverse Gaussian denoising kernel, wherein the forward Gaussian diffusion kernel

$$q(x_t \mid x_{t-1}) = \mathcal{N}\!\big(x_t;\ \sqrt{1-\beta_t}\,x_{t-1},\ \beta_t I\big)$$

implements the noise-adding forward diffusion process x_0, ..., x_t, ..., x_T of pictures, where $\mathcal{N}$ denotes a Gaussian distribution, β_t (t = 1, ..., T) is a preset noise scale, the starting point x_0 is a noise-free picture, the intermediate state x_t is the noisy picture at time t, and the process end point x_T is a Gaussian white noise picture; the reverse Gaussian denoising kernel p_θ(x_{t-1} | x_t) implements the reverse denoising generation process x_T, ..., x_t, ..., x_0; specifically,

$$p_\theta(x_{t-1}\mid x_t) = \mathcal{N}\!\big(x_{t-1};\ \mu_\theta(x_t,t),\ \sigma_t^2 I\big),\qquad
\mu_\theta(x_t,t) = \frac{\sqrt{\bar\alpha_{t-1}}\,\beta_t}{1-\bar\alpha_t}\,f_\theta(x_t,t) + \frac{\sqrt{\alpha_t}\,(1-\bar\alpha_{t-1})}{1-\bar\alpha_t}\,x_t,$$

where σ_t is the standard deviation of the Gaussian denoising kernel; I is the identity matrix; α_t = 1 - β_t and $\bar\alpha_t = \prod_{s=1}^{t}\alpha_s$;

$$f_\theta(x_t,t) = \frac{x_t - \sqrt{1-\bar\alpha_t}\,\epsilon_\theta(x_t,t)}{\sqrt{\bar\alpha_t}}$$

denotes the denoising prediction; ε_θ(x_t, t) is the score predictor; and μ_θ(x_t, t) is the mean.
3. The image feature analysis and generation method based on the fast denoising diffusion probability model according to claim 2, characterized in that, in step S2, the condition generator

$$x_0'' = G_\omega\!\big(f_\theta(x_t,t),\ t,\ u\big)$$

is introduced into the reverse denoising generation process, yielding the reverse Gaussian denoising kernel with the condition generator:

$$p_{\theta,\omega}(x_{t-1}\mid x_t) = \mathcal{N}\!\big(x_{t-1};\ \mu_{\theta,\omega}(x_t,t),\ \sigma_t^2 I\big),\qquad
\mu_{\theta,\omega}(x_t,t) = \frac{\sqrt{\bar\alpha_{t-1}}\,\beta_t}{1-\bar\alpha_t}\,G_\omega\!\big(f_\theta(x_t,t),t,u\big) + \frac{\sqrt{\alpha_t}\,(1-\bar\alpha_{t-1})}{1-\bar\alpha_t}\,x_t,$$

where μ_{θ,ω}(x_t, t) is the mean after the condition generator is introduced and u denotes sampled Gaussian noise.
4. The image feature analysis and generation method based on the fast denoising diffusion probability model according to claim 3, characterized in that the score predictor ε_θ is implemented on a U-net neural network structure, a symmetric encoder-decoder structure in which skip connections share feature information between corresponding stages of the encoder and decoder; the encoder part of the U-net reduces the dimension of the input noisy picture x_t through cascaded ResBlocks and extracts multi-scale visual features, and the cascaded ResBlocks of the decoder part output the noise component of the noisy picture x_t from the multi-scale visual features; the time t is encoded into a 256-dimensional vector using the time-embedding scheme of the Transformer neural network structure and fed into every ResBlock of the U-net, so that the U-net can depend on the time t explicitly;
the condition generator G_ω is implemented on the U-net neural network structure adopted by the score predictor, a symmetric encoder-decoder structure, with an additional three-layer fully connected network that maps the noise u into a 256-dimensional vector fed into every ResBlock of the U-net, so that the U-net can also depend on the noise u explicitly; the encoder side extracts the multi-scale visual features of the denoising prediction, and the decoder side outputs an optimized denoising prediction from the multi-scale features;
the discriminator D_ψ is implemented on a U-net neural network structure with ResBlocks that depend on the time t; its input is the noisy picture x_t at the current time t together with the picture x_{t-1} at the next time t-1, and the output part obtains a real-valued discrimination score through convolutional neural layers.
5. The image feature analysis and generation method based on the fast denoising diffusion probability model according to claim 4, characterized in that, in step S3, training the score predictor by the score matching method comprises the following sub-steps:
(1) randomly select a batch of noise-free pictures from the training data set, denoted x_0 ~ q(x_0); uniformly sample a batch of integers from [1, T] as times t ~ U(t); and draw a batch of Gaussian noise with the same dimension as the pictures from the standard Gaussian distribution, denoted ε ~ N(0, I);
(2) substitute the variables t, x_0, ε obtained above into the unweighted score-matching squared-error function, and compute its mean as the objective function L(θ) to be optimized;
(3) optimize the neural network parameters θ once by back-propagating the objective function L(θ);
(4) repeat steps (1)-(3) until the objective function converges or the maximum number of optimization steps is reached.
6. The method as claimed in claim 5, characterized in that the score matching objective function is

$$L(\theta) = \mathbb{E}_{t,\,x_0,\,\epsilon}\Big[\big\|\epsilon - \epsilon_\theta\big(\sqrt{\bar\alpha_t}\,x_0 + \sqrt{1-\bar\alpha_t}\,\epsilon,\ t\big)\big\|^2\Big],$$

where $\mathbb{E}$ denotes averaging the function over the variables.
7. The image feature analysis and generation method based on the fast denoising diffusion probability model according to claim 6, characterized in that, in step S3, the condition generator and the discriminator are trained by the alternating training method of generative adversarial networks, comprising the following sub-steps:
(1) freeze the trained score predictor ε_θ as the basis of the subsequent process;
(2) randomly select a batch of noise-free pictures from the training data set, denoted x_0 ~ q(x_0); uniformly sample a batch of integers from [1, T] as times t ~ U(t); according to x_0, sample a batch x_{t-1} from the Gaussian conditional distribution q(x_{t-1} | x_0); according to x_{t-1}, sample a batch x_t from the forward Gaussian diffusion kernel q(x_t | x_{t-1}); and according to x_t, sample a predicted x̂_{t-1} from the reverse Gaussian denoising kernel p_{θ,ω}(x_{t-1} | x_t) with the condition generator;
(3) substitute the variables t, x_0, x_{t-1}, x_t, x̂_{t-1} obtained above into the generative adversarial objective function with respect to the discriminator parameters ψ, and compute its mean as the optimization objective of the discriminator;
(4) optimize the discriminator parameters ψ once by back-propagating the objective function with respect to ψ;
(5) substitute the variables t, x_0, x_{t-1}, x_t, x̂_{t-1} into the generative adversarial objective function with respect to the condition generator parameters ω, and compute its mean as the optimization objective of the condition generator;
(6) optimize the condition generator parameters ω once by back-propagating the objective function with respect to ω;
(7) repeat steps (2)-(6) until the generative adversarial objective function converges or the maximum number of optimization steps is reached.
8. The method as claimed in claim 7, characterized in that the generative adversarial objective functions with respect to the discriminator parameters ψ and the condition generator parameters ω are

$$L(\psi) = \mathbb{E}_{t,\,x_{t-1},\,x_t \sim q}\big[-\log D_\psi(x_{t-1}, x_t, t)\big] + \mathbb{E}_{t,\,x_t \sim q,\ \hat{x}_{t-1} \sim p_{\theta,\omega}}\big[-\log\big(1 - D_\psi(\hat{x}_{t-1}, x_t, t)\big)\big],$$
$$L(\omega) = \mathbb{E}_{t,\,x_t \sim q,\ \hat{x}_{t-1} \sim p_{\theta,\omega}}\big[-\log D_\psi(\hat{x}_{t-1}, x_t, t)\big].$$
CN202211560705.XA 2022-12-07 2022-12-07 Image characteristic analysis and generation method based on rapid denoising diffusion probability model Pending CN115908187A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211560705.XA CN115908187A (en) 2022-12-07 2022-12-07 Image characteristic analysis and generation method based on rapid denoising diffusion probability model

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211560705.XA CN115908187A (en) 2022-12-07 2022-12-07 Image characteristic analysis and generation method based on rapid denoising diffusion probability model

Publications (1)

Publication Number Publication Date
CN115908187A true CN115908187A (en) 2023-04-04

Family

ID=86494628

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211560705.XA Pending CN115908187A (en) 2022-12-07 2022-12-07 Image characteristic analysis and generation method based on rapid denoising diffusion probability model

Country Status (1)

Country Link
CN (1) CN115908187A (en)

Cited By (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116310660A (en) * 2023-05-24 2023-06-23 深圳须弥云图空间科技有限公司 Enhanced sample generation method and device
CN116630634A (en) * 2023-05-29 2023-08-22 北京医准智能科技有限公司 Image processing method, device, equipment and storage medium
CN116630634B (en) * 2023-05-29 2024-01-30 浙江医准智能科技有限公司 Image processing method, device, equipment and storage medium
CN116824499A (en) * 2023-06-28 2023-09-29 北京建筑大学 Insect pest detection method, system, equipment and storage medium based on SWT model
CN116664450A (en) * 2023-07-26 2023-08-29 国网浙江省电力有限公司信息通信分公司 Diffusion model-based image enhancement method, device, equipment and storage medium
CN116645260A (en) * 2023-07-27 2023-08-25 中国海洋大学 Digital watermark attack method based on conditional diffusion model
CN116645260B (en) * 2023-07-27 2024-02-02 中国海洋大学 Digital watermark attack method based on conditional diffusion model
CN116664605A (en) * 2023-08-01 2023-08-29 昆明理工大学 Medical image tumor segmentation method based on diffusion model and multi-mode fusion
CN116664605B (en) * 2023-08-01 2023-10-10 昆明理工大学 Medical image tumor segmentation method based on diffusion model and multi-mode fusion
CN116758098A (en) * 2023-08-07 2023-09-15 哈尔滨工业大学(深圳)(哈尔滨工业大学深圳科技创新研究院) Hypothalamic nucleus segmentation method and model construction method of magnetic resonance image
CN117788344A (en) * 2024-02-26 2024-03-29 北京飞渡科技股份有限公司 Building texture image restoration method based on diffusion model
CN117788344B (en) * 2024-02-26 2024-05-07 北京飞渡科技股份有限公司 Building texture image restoration method based on diffusion model

Similar Documents

Publication Publication Date Title
CN115908187A (en) Image characteristic analysis and generation method based on rapid denoising diffusion probability model
CN107273800B (en) Attention mechanism-based motion recognition method for convolutional recurrent neural network
CN110111366B (en) End-to-end optical flow estimation method based on multistage loss
CN109035172B (en) Non-local mean ultrasonic image denoising method based on deep learning
CN113191969A (en) Unsupervised image rain removing method based on attention confrontation generation network
CN110570443B (en) Image linear target extraction method based on structural constraint condition generation model
CN114038055A (en) Image generation method based on contrast learning and generation countermeasure network
CN113240683B (en) Attention mechanism-based lightweight semantic segmentation model construction method
CN113870335A (en) Monocular depth estimation method based on multi-scale feature fusion
CN115393396B (en) Unmanned aerial vehicle target tracking method based on mask pre-training
Niu et al. Effective image restoration for semantic segmentation
CN112950480A (en) Super-resolution reconstruction method integrating multiple receptive fields and dense residual attention
CN112766360A (en) Time sequence classification method and system based on time sequence bidimensionalization and width learning
CN114882368A (en) Non-equilibrium hyperspectral image classification method
CN115775316A (en) Image semantic segmentation method based on multi-scale attention mechanism
CN114821050A (en) Named image segmentation method based on transformer
CN115047423A (en) Comparison learning unsupervised pre-training-fine tuning type radar target identification method
CN111612803B (en) Vehicle image semantic segmentation method based on image definition
Zhang et al. A parallel and serial denoising network
CN111986210B (en) Medical image small focus segmentation method
CN116704585A (en) Face recognition method based on quality perception
CN111598115B (en) SAR image fusion method based on cross cortical neural network model
CN115797827A (en) ViT human body behavior identification method based on double-current network architecture
CN112884773B (en) Target segmentation model based on target attention consistency under background transformation
CN114359786A (en) Lip language identification method based on improved space-time convolutional network

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination