CN115908187A - Image characteristic analysis and generation method based on rapid denoising diffusion probability model - Google Patents
- Publication number: CN115908187A
- Application number: CN202211560705.XA
- Authority: CN (China)
- Prior art keywords: denoising; probability model; diffusion; condition generator; Gaussian
- Legal status: Pending (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
- Classifications: Y02T 10/40 (engine management systems)
- Landscapes: Image Analysis (AREA)
Abstract
The invention discloses an image feature analysis and generation method based on a fast denoising diffusion probability model, comprising the following steps: preprocessing an image generation data set to obtain a training data set; constructing an original denoising diffusion probability model that contains a score predictor, and introducing a condition generator and a corresponding discriminator to obtain a fast denoising diffusion probability model; training the score predictor by a score matching method, freezing the trained score predictor, and training the condition generator and the discriminator with an alternating generative adversarial training method; and loading the trained score predictor and condition generator into the fast denoising diffusion probability model, then inputting a Gaussian white-noise picture into the model to obtain a realistic, high-quality output image. The method preserves high-quality generation capability while markedly improving the efficiency of sample generation.
Description
Technical Field
The invention belongs to the field of image feature analysis and generation in computer vision, and relates to an image feature analysis and generation method based on a fast denoising diffusion probability model.
Background
With the advent of modern deep learning, dramatic progress has been made in the field of computer vision. Since the birth of the convolutional neural network and the invention of the residual network structure, deep learning has far outperformed traditional methods on classification, detection, and segmentation of pictures at various scales. In the field of image generation, deep generative models such as generative adversarial networks and variational autoencoders can generate high-resolution pictures, but their generation quality is still limited by the overly simple implicit generative model. Recently, the denoising diffusion probability model has been able to produce higher-quality samples, but it requires constructing a Markov generation process of nearly 1000 steps, which incurs high time and computation costs.
Disclosure of Invention
The invention aims to solve two problems of existing deep generative models: the constrained generation quality of implicit generative models and the low sampling efficiency of the denoising diffusion probability model. The invention provides an image feature analysis and generation method based on a fast denoising diffusion probability model, which introduces a condition generator and a corresponding discriminator model on top of the original denoising diffusion probability model and trains the supplementary condition generator in a generative adversarial manner, so that the combined model achieves high-quality and highly efficient image generation.
The technical scheme adopted by the invention to solve the above technical problem is as follows:
an image feature analysis and generation method based on a fast denoising diffusion probability model comprises the following steps:
S1: preprocessing the image generation data set to obtain a training data set;
S2: constructing an original denoising diffusion probability model that contains a score predictor, and introducing a condition generator and a corresponding discriminator to obtain a fast denoising diffusion probability model;
S3: training the score predictor by a score matching method, freezing the trained score predictor, and training the condition generator and the discriminator with an alternating generative adversarial training method;
S4: loading the trained score predictor and condition generator into the fast denoising diffusion probability model, and inputting a Gaussian white-noise picture into the model to obtain a realistic, high-quality output image.
Further, in step S2, the original denoising diffusion probability model comprises a forward Gaussian diffusion kernel and a reverse Gaussian denoising kernel. The forward Gaussian diffusion kernel

q(x_t | x_{t−1}) = N( x_t; √(1 − β_t) x_{t−1}, β_t I )

implements the forward diffusion process x_0 ... x_t ... x_T that adds noise to the picture, where N(·) denotes a Gaussian distribution, β_t (t = 1, ..., T) is a preset noise scale, the start point x_0 is a noiseless picture, the intermediate state x_t is the noisy picture at time t, and the end point x_T is a Gaussian white-noise picture. The reverse Gaussian denoising kernel p_θ(x_{t−1} | x_t) implements the reverse denoising generation process x_T ... x_t ... x_0, specifically

p_θ(x_{t−1} | x_t) = N( x_{t−1}; μ_θ(x_t, t), σ_t² I ),
μ_θ(x_t, t) = ( √(ᾱ_{t−1}) β_t / (1 − ᾱ_t) ) f_θ(x_t, t) + ( √(α_t) (1 − ᾱ_{t−1}) / (1 − ᾱ_t) ) x_t,
f_θ(x_t, t) = ( x_t − √(1 − ᾱ_t) ε_θ(x_t, t) ) / √(ᾱ_t).

In the formula, α_t = 1 − β_t and ᾱ_t = ∏_{s=1}^{t} α_s; σ_t is the standard deviation of the Gaussian denoising kernel; I is the identity matrix; f_θ(x_t, t) is the denoised prediction; ε_θ(x_t, t) is the score predictor; and μ_θ(x_t, t) is the mean.
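A numerical sketch of the reverse-kernel mean under standard DDPM conventions can make the kernel concrete. The linear β_t schedule below is an illustrative assumption, not a value taken from the patent:

```python
import numpy as np

T = 1000
betas = np.linspace(1e-4, 0.02, T)   # assumed linear schedule for beta_t
alphas = 1.0 - betas                 # alpha_t = 1 - beta_t
abar = np.cumprod(alphas)            # abar_t = prod_{s<=t} alpha_s

def reverse_mean(x_t, t, eps_pred):
    """mu_theta(x_t, t): linear combination of the denoised prediction
    f_theta and the current state x_t, as in the reverse kernel above."""
    f = (x_t - np.sqrt(1.0 - abar[t]) * eps_pred) / np.sqrt(abar[t])
    abar_prev = abar[t - 1] if t > 0 else 1.0
    return (np.sqrt(abar_prev) * betas[t] / (1.0 - abar[t]) * f
            + np.sqrt(alphas[t]) * (1.0 - abar_prev) / (1.0 - abar[t]) * x_t)

x_t = np.random.default_rng(0).standard_normal((8, 8))
mu = reverse_mean(x_t, 500, np.zeros_like(x_t))  # zero predictor as a stand-in
```

In a real model, `eps_pred` would come from the trained score predictor network rather than a zero placeholder.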
Further, in step S2, the condition generator G_ω(f_θ(x_t, t), u) is introduced into the reverse denoising generation process, giving the reverse Gaussian denoising kernel with the condition generator:

p_{θ,ω}(x_{t−1} | x_t) = N( x_{t−1}; μ_{θ,ω}(x_t, t), σ_t² I ),
μ_{θ,ω}(x_t, t) = ( √(ᾱ_{t−1}) β_t / (1 − ᾱ_t) ) G_ω(f_θ(x_t, t), u) + ( √(α_t) (1 − ᾱ_{t−1}) / (1 − ᾱ_t) ) x_t.

In the formula, μ_{θ,ω}(x_t, t) is the mean after the condition generator is introduced, and u denotes sampled Gaussian noise.
Further, the score predictor ε_θ is implemented on a U-net neural network structure: a symmetric encoder-decoder in which skip connections share feature information between corresponding stages of the encoder and the decoder. The encoder of the U-net downsamples the input noisy picture x_t through cascaded ResBlocks and extracts multi-scale visual features, and the cascaded ResBlocks of the decoder output the noise component of x_t from those multi-scale features. The time t is encoded into a 256-dimensional vector using the time-embedding scheme of the Transformer neural network structure and fed into every ResBlock of the U-net, so that the U-net depends explicitly on t.
The condition generator G_ω is implemented on the U-net neural network structure adopted by the score predictor, a symmetric encoder-decoder, with an additional three-layer fully connected network that maps the noise u into a 256-dimensional vector fed into every ResBlock of the U-net, so that the U-net also depends explicitly on u. The encoder side extracts multi-scale visual features of the denoised prediction, and the decoder side outputs an optimized denoised prediction from those multi-scale features.
The discriminator D_ψ is implemented on a U-net neural network structure with ResBlocks that depend on the time t; its input is the noisy picture x_t at the current time t together with the picture x_{t−1} at the next time t−1, and its output part produces a real-valued discrimination score through convolutional layers.
Further, in step S3, training the score predictor by the score matching method comprises the following substeps:
(1) Randomly select a batch of noiseless pictures from the training data set, denoted x_0 ~ q(x_0); uniformly sample a batch of integers from [1, T] as times t ~ U(t); and draw a batch of Gaussian noise from the standard Gaussian distribution of the same dimension as the pictures, denoted ε ~ N(0, I).
(2) Substitute the variables t, x_0, ε obtained above into the unweighted score-matching squared-error function and compute its mean as the optimization objective L(θ).
(3) Optimize the neural network parameters θ once by back-propagating the objective L(θ).
(4) Repeat substeps (1)-(3) until the objective converges or the maximum number of optimization steps is reached.
Further, the score matching objective function is

L(θ) = E_{t, x_0, ε} || ε − ε_θ( √(ᾱ_t) x_0 + √(1 − ᾱ_t) ε, t ) ||².

In the formula, E_{t, x_0, ε} denotes averaging the function over the sampled variables.
Further, in step S3, the condition generator and the discriminator are trained with the alternating training method of generative adversarial networks, comprising the following substeps:
(1) Freeze the trained score predictor ε_θ as the basis of the subsequent process.
(2) Randomly select a batch of noiseless pictures from the training data set, denoted x_0 ~ q(x_0); uniformly sample a batch of integers from [1, T] as times t ~ U(t); according to x_0, sample a batch x_{t−1} from the Gaussian conditional distribution q(x_{t−1} | x_0); according to x_{t−1}, sample a batch x_t from the forward Gaussian diffusion kernel q(x_t | x_{t−1}); and according to x_t, sample a predicted x̂_{t−1} from the reverse Gaussian denoising kernel p_{θ,ω}(x_{t−1} | x_t) with the condition generator.
(3) Substitute the variables t, x_0, x_{t−1}, x_t, x̂_{t−1} obtained above into the generative adversarial objective as a function of the discriminator parameters ψ, and compute its mean as the discriminator's optimization objective.
(4) Optimize the discriminator parameters ψ once by back-propagating that objective.
(5) Substitute the same variables into the generative adversarial objective as a function of the condition generator parameters ω, and compute its mean as the condition generator's optimization objective.
(6) Optimize the condition generator parameters ω once by back-propagating that objective.
(7) Repeat substeps (2)-(6) until the generative adversarial objective converges or the maximum number of optimization steps is reached.
Further, the generative adversarial objective over the discriminator parameters ψ and the condition generator parameters ω is

min_ω max_ψ E[ log D_ψ(x_{t−1}, x_t, t) ] + E[ log( 1 − D_ψ(x̂_{t−1}, x_t, t) ) ],

where the first expectation is over the real pairs (x_{t−1}, x_t) and the second is over the predicted x̂_{t−1}.
Compared with the prior art, the invention has the following beneficial effects:
(1) The method is based on the denoising diffusion probability model and thus has stronger image generation capability than common generative models such as generative adversarial networks and variational autoencoders;
(2) Aiming at the long sampling time and low efficiency of the denoising diffusion probability model in image generation, a condition generator is introduced as an improvement, so that the proposed fast denoising diffusion probability model retains the high-quality generation capability of the denoising diffusion probability model while greatly improving sampling efficiency;
(3) The fast denoising diffusion probability model analyzes the features of noisy images during the generation process, so the generated features can be controlled by adjusting the features of the noise.
Drawings
FIG. 1 is a schematic diagram of the generative adversarial training of the fast denoising diffusion probability model;
FIG. 2 shows CIFAR10 pictures randomly generated by the fast denoising diffusion probability model.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is further described in detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.
The invention provides an image feature analysis and generation method based on a fast denoising diffusion probability model, comprising the following steps:
Step 1: data preprocessing;
Using a deep learning framework, apply preprocessing operations such as resizing, normalization, and horizontal flipping to the image generation data set to obtain a training data set suitable for training a neural network model. The image generation data set is a collected data set or a publicly available image data set, such as CIFAR10 or CelebA. The data set is scaled to 32x32, 64x64 or 128x128 resolution with a deep learning toolkit, the data are then normalized to [-1, 1], and finally random horizontal flipping is applied to the data set samples as image preprocessing.
Specifically, this embodiment takes the public data set CIFAR10 as an example: the pictures in CIFAR10 are scaled to 32x32 resolution, each pixel is normalized to [-1, 1], and random horizontal flipping is applied, giving a training data set suitable for training a neural network model.
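As a rough illustration of this preprocessing step, the normalization and random horizontal flip can be sketched with NumPy as a stand-in for the deep learning toolkit. The function name, batch layout (NHWC), and flip probability of 0.5 are illustrative assumptions, not the patent's implementation:

```python
import numpy as np

def preprocess(batch_uint8, rng):
    """Normalize uint8 pixels from [0, 255] to [-1, 1] and apply a random
    horizontal flip to each image (assumed NHWC layout)."""
    x = batch_uint8.astype(np.float32) / 127.5 - 1.0  # -> [-1, 1]
    flip = rng.random(len(x)) < 0.5                   # one coin flip per image
    x[flip] = x[flip][:, :, ::-1, :]                  # reverse the width axis
    return x

rng = np.random.default_rng(0)
batch = rng.integers(0, 256, size=(8, 32, 32, 3), dtype=np.uint8)
train_batch = preprocess(batch, rng)  # values now lie in [-1, 1]
```

In practice a framework utility (e.g. a resize-plus-flip transform pipeline) would replace this hand-rolled sketch.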
Step 2: theoretical construction of a rapid denoising diffusion probability model;
In theory, the original denoising diffusion probability model constructs a trainable stochastic process that learns how to denoise noise step by step into data samples. Specifically, the original model first constructs a forward diffusion process x_0 ... x_t ... x_T of length T that continuously adds noise to the data picture, where the start point x_0 of the diffusion process is a noiseless picture from the data set, the intermediate state x_t is the random variable of the noisy picture after adding noise, and the end point x_T of the diffusion process is close to Gaussian white noise. The forward Gaussian diffusion kernel constituting the forward process is

q(x_t | x_{t−1}) = N( x_t; √(1 − β_t) x_{t−1}, β_t I ),

where N(·) denotes a Gaussian distribution and β_t, t = 1, ..., T, is a preset noise scale. The forward Gaussian diffusion kernel describes how the forward process keeps adding noise. From the Gaussian nature of the forward process, the conditional posterior distribution is

q(x_t | x_0) = N( x_t; √(ᾱ_t) x_0, (1 − ᾱ_t) I ),  with ᾱ_t = ∏_{s=1}^{t} (1 − β_s),

which means a noisy sample at time t can be obtained by direct sampling, e.g. x_t = √(ᾱ_t) x_0 + √(1 − ᾱ_t) ε, where the Gaussian noise ε ~ N(0, I) can be sampled directly. The original denoising diffusion probability model then constructs a denoising generation process x_T ... x_t ... x_0 along reverse time. The denoising process is composed of a trainable reverse denoising kernel x_{t−1} ~ p_θ(x_{t−1} | x_t), and the denoising kernel p_θ(x_{t−1} | x_t) has the Gaussian form

p_θ(x_{t−1} | x_t) = N( x_{t−1}; μ_θ(x_t, t), σ_t² I ).

In the formula, the mean μ_θ(x_t, t) is a linear combination of the current state x_t and the denoised prediction f_θ(x_t, t), and the denoised prediction can be expressed through the score predictor ε_θ(x_t, t) as

f_θ(x_t, t) = ( x_t − √(1 − ᾱ_t) ε_θ(x_t, t) ) / √(ᾱ_t),

with the variance σ_t² of the Gaussian denoising kernel set to β_t. In the denoising kernel, the score predictor ε_θ(x_t, t) analyzes the noisy picture x_t to predict the noise ε_t contained in the noisy variable x_t, which enables f_θ(x_t, t) to predict the denoised sample x_0.

The original denoising diffusion probability model usually needs a denoising generation process of about 1000 steps to obtain high-quality samples; if the process is shortened to 4 steps, the generated samples become blurry and of low quality. We find the reason is that, in a denoising process of only 4 steps, the score predictor ε_θ(x_t, t) cannot predict the noise ε_t contained in x_t well, so the denoised prediction f_θ(x_t, t) is of poor quality, and a high-quality denoised sample x_0 cannot be obtained with only 4 denoising steps. Therefore, the invention feeds the denoised prediction f_θ(x_t, t) into a condition generator G_ω(f_θ(x_t, t), u), where u is sampled Gaussian noise of lower dimension than the variable x_t, usually chosen as 100 dimensions. On the basis of the image features contained in f_θ(x_t, t), the generator produces a higher-quality and more realistic denoised prediction, and hence a more accurately predicted mean. This corresponds to the following reverse Gaussian denoising kernel after the generator is introduced:

p_{θ,ω}(x_{t−1} | x_t) = N( x_{t−1}; μ_{θ,ω}(x_t, t), σ_t² I ),
μ_{θ,ω}(x_t, t) = ( √(ᾱ_{t−1}) β_t / (1 − ᾱ_t) ) G_ω(f_θ(x_t, t), u) + ( √(1 − β_t) (1 − ᾱ_{t−1}) / (1 − ᾱ_t) ) x_t.

In the formula, μ_{θ,ω}(x_t, t) is the mean after introducing the condition generator, which in theory is more accurate than the original mean μ_θ(x_t, t).
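The direct-sampling identity and the denoised prediction above can be checked numerically. The linear β_t schedule below is an illustrative assumption; the algebra being verified is that, given the true noise, f_θ recovers x_0 exactly:

```python
import numpy as np

T = 1000
betas = np.linspace(1e-4, 0.02, T)   # assumed preset noise scales beta_t
alpha_bar = np.cumprod(1.0 - betas)  # abar_t = prod_{s<=t} (1 - beta_s)

def q_sample(x0, t, eps):
    """Direct sampling: x_t = sqrt(abar_t) * x0 + sqrt(1 - abar_t) * eps."""
    return np.sqrt(alpha_bar[t]) * x0 + np.sqrt(1.0 - alpha_bar[t]) * eps

def f_denoised(xt, t, eps_pred):
    """Denoised prediction f_theta = (x_t - sqrt(1 - abar_t) * eps_pred) / sqrt(abar_t)."""
    return (xt - np.sqrt(1.0 - alpha_bar[t]) * eps_pred) / np.sqrt(alpha_bar[t])

rng = np.random.default_rng(0)
x0 = rng.standard_normal((4, 4))
eps = rng.standard_normal((4, 4))
xt = q_sample(x0, 500, eps)
x0_rec = f_denoised(xt, 500, eps)  # with the true noise, x0 is recovered exactly
```

A trained score predictor only approximates `eps`, which is precisely why the few-step prediction is poor and the condition generator is needed.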
Step 3: implementing the fast denoising diffusion probability model with a deep learning framework;
The original denoising diffusion probability model realizes a trainable denoising generation process with the score predictor ε_θ. The fast denoising diffusion probability model proposed by the invention newly introduces a condition generator G_ω to construct a trainable fast denoising generation process, and introduces a corresponding discriminator D_ψ to train the condition generator G_ω. In implementation, the invention uses neural networks to represent the score predictor ε_θ, the condition generator G_ω, and the discriminator D_ψ, and constructs the corresponding neural network modules according to the resolution or scale of the data set.
Specifically, the score predictor ε_θ is implemented on a U-net neural network structure: a symmetric encoder-decoder in which skip connections share feature information between corresponding stages of the encoder and the decoder. The encoder of the U-net downsamples the input noisy picture x_t through cascaded residual blocks (ResBlocks) and extracts multi-scale visual features, and the cascaded ResBlocks of the decoder output the noise component of x_t from those multi-scale features. Unlike an ordinary U-net, the input of the score predictor also contains the time t, so t is encoded into a 256-dimensional vector using the time-embedding scheme of the Transformer neural network structure and fed into every ResBlock of the U-net network, making the U-net depend explicitly on t.
The condition generator G_ω adopts a U-net structure similar to the score predictor, but additionally takes random noise u as input. We therefore map u into a 256-dimensional vector with an additional three-layer fully connected network and feed it into every ResBlock of the U-net, so that the U-net also depends explicitly on u. The condition generator G_ω takes the denoised prediction f_θ produced by the score predictor as input; the encoder side extracts multi-scale visual features of the denoised prediction, and the decoder side outputs a higher-quality, more accurate denoised prediction from those multi-scale features.
The discriminator D_ψ is implemented like the encoder part of the score predictor's U-net, with ResBlocks that depend on the time t. Its input, however, is the noisy picture x_t at the current time t together with the picture x_{t−1} at the next time t−1, i.e. the function D_ψ(x_{t−1}, x_t, t). We concatenate x_{t−1} and x_t into a neural network input with 6 channels, and the output part produces a real-valued discrimination score through convolutional layers.
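The Transformer-style time embedding mentioned above can be sketched as follows. The frequency base of 10000 is the common Transformer choice and is an assumption here, as the patent does not specify it:

```python
import numpy as np

def time_embedding(t, dim=256):
    """Sinusoidal embedding of timestep t into a dim-dimensional vector,
    as fed into each ResBlock so that the U-net depends explicitly on t."""
    half = dim // 2
    freqs = np.exp(-np.log(10000.0) * np.arange(half) / half)  # geometric frequencies
    angles = t * freqs
    return np.concatenate([np.sin(angles), np.cos(angles)])

emb = time_embedding(17)  # 256-dimensional conditioning vector for t = 17
```

In the networks described above, this vector would typically pass through a small MLP before being added into each ResBlock's activations.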
Step 4: training the fast denoising diffusion probability model;
The training of the fast denoising diffusion probability model is divided into two stages. The first stage trains the score predictor network ε_θ of the original denoising diffusion probability model with the following score-matching training process:
1) Randomly select a batch of noiseless pictures from the training data set, denoted x_0 ~ q(x_0); uniformly sample a batch of integers from [1, T] as times t ~ U(t); and draw a batch of Gaussian noise from the standard Gaussian distribution of the same dimension as the pictures, denoted ε ~ N(0, I).
2) Substitute the variables t, x_0, ε obtained above into the unweighted score-matching squared-error function and compute its mean as the optimization objective L(θ).
3) Optimize the neural network parameters θ once by back-propagating the objective L(θ).
4) Repeat steps 1)-3) until the objective converges or the maximum number of optimization steps is reached.
The unweighted score matching objective function is specifically

L(θ) = E_{t, x_0, ε} || ε − ε_θ( √(ᾱ_t) x_0 + √(1 − ᾱ_t) ε, t ) ||²,

where the expectation E_{t, x_0, ε} denotes averaging the function over the variables sampled in step 1).
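A Monte-Carlo sketch of this objective, with NumPy standing in for the deep learning framework and a placeholder predictor (the function names and the linear schedule are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)
T = 1000
betas = np.linspace(1e-4, 0.02, T)   # assumed linear noise schedule
alpha_bar = np.cumprod(1.0 - betas)

def score_matching_loss(eps_theta, x0_batch):
    """Monte-Carlo estimate of the unweighted objective
    L(theta) = E_{t,x0,eps} || eps - eps_theta(sqrt(abar_t) x0
                                    + sqrt(1 - abar_t) eps, t) ||^2."""
    losses = []
    for x0 in x0_batch:
        t = int(rng.integers(0, T))             # t ~ U(t), 0-indexed here
        eps = rng.standard_normal(x0.shape)     # eps ~ N(0, I)
        xt = np.sqrt(alpha_bar[t]) * x0 + np.sqrt(1.0 - alpha_bar[t]) * eps
        losses.append(np.sum((eps - eps_theta(xt, t)) ** 2))
    return float(np.mean(losses))

x0_batch = rng.standard_normal((8, 32))
# Placeholder predictor that outputs zeros; back-propagation through a real
# network would replace this.
loss = score_matching_loss(lambda xt, t: np.zeros_like(xt), x0_batch)
```

Step 3) would then back-propagate this scalar through the U-net parameters θ, which the NumPy sketch omits.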
In the second stage, with the trained score predictor ε_θ frozen, the condition generator network G_ω and the discriminator network D_ψ introduced by the fast denoising diffusion probability model are trained with the following alternating generative adversarial training process:
1) Freeze the score predictor ε_θ trained in the first stage as the basis of the subsequent process.
2) Randomly select a batch of noiseless pictures from the training data set, denoted x_0 ~ q(x_0); uniformly sample a batch of integers from [1, T] as times t ~ U(t); according to x_0, sample a batch x_{t−1} from the Gaussian conditional distribution q(x_{t−1} | x_0); according to x_{t−1}, sample a batch x_t from the forward Gaussian diffusion kernel q(x_t | x_{t−1}); and according to x_t, sample a predicted x̂_{t−1} from the reverse Gaussian denoising kernel p_{θ,ω}(x_{t−1} | x_t) with the condition generator.
3) Substitute the variables t, x_0, x_{t−1}, x_t, x̂_{t−1} obtained above into the generative adversarial objective as a function of the discriminator parameters ψ, and compute its mean as the discriminator's optimization objective.
4) Optimize the discriminator parameters ψ once by back-propagating that objective.
5) Substitute the same variables into the generative adversarial objective as a function of the condition generator parameters ω, and compute its mean as the condition generator's optimization objective.
6) Optimize the condition generator parameters ω once by back-propagating that objective.
7) Repeat steps 2)-6) until the generative adversarial objective converges or the maximum number of optimization steps is reached.
The generative adversarial objective over the discriminator parameters ψ and the condition generator parameters ω is

min_ω max_ψ E[ log D_ψ(x_{t−1}, x_t, t) ] + E[ log( 1 − D_ψ(x̂_{t−1}, x_t, t) ) ],

where the first expectation is over the real pairs (x_{t−1}, x_t) sampled in step 2) and the second is over the predicted x̂_{t−1}. The probability distributions involved are Gaussian or uniform, so direct sampling is possible; see step 2). Intuitively, as shown in FIG. 1, the current state x_t is fed into the frozen score predictor ε_θ to compute a coarse denoised prediction x̂_0; the coarse prediction x̂_0 is fed into the condition generator G_ω to obtain a refined denoised prediction x̂_0'; and the refined prediction enters the new reverse Gaussian denoising kernel p_{θ,ω}(x_{t−1} | x_t) to yield the predicted x̂_{t−1}. We introduce the discriminator D_ψ to guide the training of the condition generator G_ω: the discriminator is trained to distinguish the true x_{t−1} from the predicted x̂_{t−1}, and the condition generator is then trained to give better denoised predictions so that the predicted x̂_{t−1} approaches the true x_{t−1}.
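The alternating schedule can be sketched as a toy loop. The discriminator scores below are random stand-ins (no networks are built), and the non-saturating loss shaping is an assumption about the adversarial objective; the point is only the discriminator-step / generator-step alternation of steps 2)-6):

```python
import numpy as np

rng = np.random.default_rng(0)

def d_loss(d_real, d_fake):
    """Discriminator objective: maximize log D(real) + log(1 - D(fake)),
    written as a loss to be minimized."""
    return float(-(np.mean(np.log(d_real)) + np.mean(np.log(1.0 - d_fake))))

def g_loss(d_fake):
    """Non-saturating generator objective: maximize log D(fake)."""
    return float(-np.mean(np.log(d_fake)))

for step in range(3):
    # Stand-in scores for D_psi(x_{t-1}, x_t, t) on real and predicted pairs.
    d_real = rng.uniform(0.6, 0.9, size=8)
    d_fake = rng.uniform(0.1, 0.4, size=8)
    Ld = d_loss(d_real, d_fake)   # back-prop into psi would happen here
    Lg = g_loss(d_fake)           # back-prop into omega would happen here
```

In a real implementation each iteration would re-sample (t, x_{t−1}, x_t, x̂_{t−1}) as in step 2) and apply optimizer updates to ψ and ω in turn.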
Step 5: generating new samples with the fast denoising diffusion probability model;
The score predictor ε_θ trained in the first stage and the condition generator G_ω trained in the second stage are substituted into the new reverse Gaussian denoising kernel p_{θ,ω}(x_{t−1} | x_t), forming the fast denoising generation process

p_{θ,ω}(x_{0:T}) = p(x_T) ∏_{t=1}^{T} p_{θ,ω}(x_{t−1} | x_t),

where p(x_T) denotes the standard Gaussian distribution N(x_T; 0, I) and p_{θ,ω}(x_{0:T}) denotes the joint distribution of the process.

At each denoising step t of the new generation process, the score predictor ε_θ analyzes the noisy picture x_t and gives a preliminary denoised prediction x̂_0 from its features; this preliminary prediction is of poor quality. The condition generator G_ω then converts the poor denoised prediction x̂_0 into a high-quality denoised prediction x̂_0', so that the denoising kernel p_{θ,ω}(x_{t−1} | x_t) has a more accurate mean. Through this generation process, random Gaussian noise is converted into realistic, high-quality image samples, and the number of denoising steps is reduced to T = 4.
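The 4-step reverse loop can be sketched end to end with placeholder networks (an identity condition generator and a zero score predictor). The coarse 4-step β schedule is an illustrative assumption; the loop structure mirrors the kernel defined above:

```python
import numpy as np

rng = np.random.default_rng(0)
T = 4
betas = np.linspace(0.1, 0.7, T)  # assumed coarse 4-step noise schedule
alphas = 1.0 - betas
abar = np.cumprod(alphas)

def p_sample_loop(eps_theta, G_omega, shape):
    """Fast denoising generation: x_T ~ N(0, I), then T = 4 reverse steps
    using the score predictor and the condition generator (stand-ins here)."""
    x = rng.standard_normal(shape)  # x_T: Gaussian white-noise picture
    for t in range(T - 1, -1, -1):
        f = (x - np.sqrt(1.0 - abar[t]) * eps_theta(x, t)) / np.sqrt(abar[t])
        u = rng.standard_normal(100)         # 100-dim condition noise u
        f = G_omega(f, u)                    # refined denoised prediction
        abar_prev = abar[t - 1] if t > 0 else 1.0
        coef_f = np.sqrt(abar_prev) * betas[t] / (1.0 - abar[t])
        coef_x = np.sqrt(alphas[t]) * (1.0 - abar_prev) / (1.0 - abar[t])
        mean = coef_f * f + coef_x * x       # mu_{theta,omega}(x_t, t)
        noise = rng.standard_normal(shape) if t > 0 else 0.0
        x = mean + np.sqrt(betas[t]) * noise
    return x

sample = p_sample_loop(lambda x, t: np.zeros_like(x),  # placeholder eps_theta
                       lambda f, u: f,                 # placeholder G_omega
                       (32, 32, 3))
```

With trained networks in place of the two lambdas, `sample` would be a generated 32x32 picture in [-1, 1] space.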
Comparative example
We use IS (Inception Score) and FID (Fréchet Inception Distance) as generation-quality metrics on the CIFAR10 data set. The proposed fast denoising diffusion probability model achieves an IS score of 8.51 and an FID score of 11.41 on CIFAR10, outperforming recently proposed baseline models. The comparison on CIFAR10 is shown in Table 1. Randomly generated pictures are shown in FIG. 2.
Table 1. Comparison of generation performance on the CIFAR10 generation task
The compared generative models are:
NCSN model (see Generative modeling by estimating gradients of the data distribution. Advances in Neural Information Processing Systems, 2019.)
DiffuseVAE model (see DiffuseVAE: Efficient, controllable and high-fidelity generation from low-dimensional latents. arXiv, abs/2201.00308, 2022.)
AutoGAN model (see AutoGAN: Neural architecture search for generative adversarial networks. Proceedings of the IEEE International Conference on Computer Vision, pp. 3223-3233, 2019.)
SNGAN model (see Spectral normalization for generative adversarial networks. International Conference on Learning Representations, 2018.)
Glow model (see Glow: Generative flow with invertible 1x1 convolutions. Advances in Neural Information Processing Systems, 2018.)
PixelCNN model (see Pixel recurrent neural networks. International Conference on Machine Learning, 2016.)
NVAE model (see NVAE: A deep hierarchical variational autoencoder. Advances in Neural Information Processing Systems, 2020.)
IGEBM model (see Implicit generation and modeling with energy-based models. Advances in Neural Information Processing Systems, 2019.)
VAEBM model (see VAEBM: A symbiosis between variational autoencoders and energy-based models. International Conference on Learning Representations, 2021.)
Claims (8)
1. An image feature analysis and generation method based on a fast denoising diffusion probability model, characterized by comprising the following steps:
S1: preprocessing an image generation data set to obtain a training data set;
S2: constructing an original denoising diffusion probability model that contains a score predictor, and introducing a condition generator and a corresponding discriminator to obtain a fast denoising diffusion probability model;
S3: training the score predictor by a score matching method, freezing the trained score predictor, and training the condition generator and the discriminator with an alternating generative adversarial training method;
S4: loading the trained score predictor and condition generator into the fast denoising diffusion probability model, and inputting a Gaussian white-noise picture into the fast denoising diffusion probability model to obtain a realistic, high-quality output image.
2. The image feature analysis and generation method based on the fast denoising diffusion probability model according to claim 1, wherein in step S2 the original denoising diffusion probability model comprises a forward Gaussian diffusion kernel and a reverse Gaussian denoising kernel; the forward Gaussian diffusion kernel

q(x_t | x_{t−1}) = N( x_t; √(1 − β_t) x_{t−1}, β_t I )

implements the forward diffusion process x_0 ... x_t ... x_T that adds noise to the picture, where N(·) denotes a Gaussian distribution, β_t (t = 1, ..., T) is a preset noise scale, the start point x_0 is a noiseless picture, the intermediate state x_t is the noisy picture at time t, and the end point x_T is a Gaussian white-noise picture; the reverse Gaussian denoising kernel p_θ(x_{t−1} | x_t) implements the reverse denoising generation process x_T ... x_t ... x_0, specifically

p_θ(x_{t−1} | x_t) = N( x_{t−1}; μ_θ(x_t, t), σ_t² I ),
μ_θ(x_t, t) = ( √(ᾱ_{t−1}) β_t / (1 − ᾱ_t) ) f_θ(x_t, t) + ( √(α_t) (1 − ᾱ_{t−1}) / (1 − ᾱ_t) ) x_t,
f_θ(x_t, t) = ( x_t − √(1 − ᾱ_t) ε_θ(x_t, t) ) / √(ᾱ_t),

where α_t = 1 − β_t, ᾱ_t = ∏_{s=1}^{t} α_s, σ_t is the standard deviation of the Gaussian denoising kernel, I is the identity matrix, f_θ(x_t, t) is the denoised prediction, ε_θ(x_t, t) is the score predictor, and μ_θ(x_t, t) is the mean.
3. The image feature analysis and generation method based on the fast denoising diffusion probability model according to claim 2, wherein in step S2 the condition generator G_ω(f_θ(x_t, t), u) is introduced into the reverse denoising generation process, giving the reverse Gaussian denoising kernel with the condition generator:

p_{θ,ω}(x_{t−1} | x_t) = N( x_{t−1}; μ_{θ,ω}(x_t, t), σ_t² I ),
μ_{θ,ω}(x_t, t) = ( √(ᾱ_{t−1}) β_t / (1 − ᾱ_t) ) G_ω(f_θ(x_t, t), u) + ( √(α_t) (1 − ᾱ_{t−1}) / (1 − ᾱ_t) ) x_t,

in the formula, μ_{θ,ω}(x_t, t) is the mean after the condition generator is introduced, and u denotes sampled Gaussian noise.
4. The image feature analysis and generation method based on the fast denoising diffusion probability model according to claim 3, wherein the score predictor ε_θ is implemented on a U-net neural network structure: a symmetric encoder-decoder in which skip connections share feature information between corresponding stages of the encoder and the decoder; the encoder of the U-net downsamples the input noisy picture x_t through cascaded ResBlocks and extracts multi-scale visual features, and the cascaded ResBlocks of the decoder output the noise component of x_t from those multi-scale features; the time t is encoded into a 256-dimensional vector using the time-embedding scheme of the Transformer neural network structure and fed into every ResBlock of the U-net network, so that the U-net depends explicitly on t;
the condition generator G_ω is implemented on the U-net neural network structure adopted by the score predictor, a symmetric encoder-decoder, with an additional three-layer fully connected network that maps the noise u into a 256-dimensional vector fed into every ResBlock of the U-net, so that the U-net also depends explicitly on u; the encoder side extracts multi-scale visual features of the denoised prediction, and the decoder side outputs an optimized denoised prediction from those multi-scale features;
the discriminator D_ψ is implemented on a U-net neural network structure with ResBlocks that depend on the time t; its input is the noisy picture x_t at the current time t together with the picture x_{t−1} at the next time t−1, and its output part produces a real-valued discrimination score through convolutional layers.
5. The image feature analysis and generation method based on the fast denoising diffusion probability model according to claim 4, wherein in step S3, training the score predictor by the score matching method comprises the following sub-steps:
(1) Randomly selecting a batch of noise-free pictures from the training data set, denoted x_0 ~ q(x_0); uniformly sampling a batch of integers from [1, T] as times t ~ U(t); and drawing a batch of Gaussian noise from the standard Gaussian distribution of the same dimension as the pictures, denoted ε ~ N(0, I);
(2) Substituting the variables t, x_0, and ε obtained above into the unweighted score-matching squared-error function, and computing its mean as the objective function L(θ) to be optimized;
(3) Optimizing the neural network parameters θ once by back-propagating the objective function L(θ);
(4) Repeating steps (1)-(3) until the objective function converges or the maximum number of optimization iterations is reached.
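The sub-steps above can be sketched as a single Monte-Carlo estimate of the score-matching objective. The linear beta schedule and the closed-form forward sample x_t = √ᾱ_t x_0 + √(1-ᾱ_t) ε are standard DDPM choices assumed here (the claim fixes the objective, not the noise schedule), and a trivial lambda stands in for the U-net predictor ε_θ:

```python
import numpy as np

rng = np.random.default_rng(0)

# Standard linear beta schedule over T diffusion steps (assumed).
T = 1000
betas = np.linspace(1e-4, 0.02, T)
alpha_bar = np.cumprod(1.0 - betas)

def score_matching_loss(eps_theta, x0):
    """One Monte-Carlo estimate of the unweighted objective
    L(theta) = E_{t,x0,eps} || eps - eps_theta(x_t, t) ||^2."""
    t = int(rng.integers(1, T + 1))         # t ~ U{1,...,T}   (sub-step 1)
    eps = rng.standard_normal(x0.shape)     # eps ~ N(0, I)    (sub-step 1)
    a = alpha_bar[t - 1]
    x_t = np.sqrt(a) * x0 + np.sqrt(1.0 - a) * eps   # forward diffusion
    return float(np.mean((eps - eps_theta(x_t, t)) ** 2))  # (sub-step 2)

# Trivial stand-in for the U-net predictor eps_theta.
loss = score_matching_loss(lambda x_t, t: np.zeros_like(x_t),
                           rng.standard_normal((8, 8)))
```

Back-propagating this loss through the real ε_θ and repeating until convergence gives sub-steps (3) and (4).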
7. The image feature analysis and generation method based on the fast denoising diffusion probability model according to claim 6, wherein in step S3, training the condition generator and the discriminator by the alternating training method of generative adversarial networks comprises the following sub-steps:
(1) Freezing the trained score predictor ε_θ as the basis for the subsequent process;
(2) Randomly selecting a batch of noise-free pictures from the training data set, denoted x_0 ~ q(x_0); uniformly sampling a batch of integers from [1, T] as times t ~ U(t); according to x_0, sampling a batch x_{t-1} from the Gaussian conditional distribution q(x_{t-1}|x_0); according to x_{t-1}, sampling a batch x_t from the forward Gaussian diffusion kernel q(x_t|x_{t-1}); and according to x_t, sampling a predicted x̂_{t-1} from the inverse Gaussian denoising kernel p_{θ,ω}(x_{t-1}|x_t) obtained after introducing the condition generator;
(3) Substituting the variables t, x_0, x_{t-1}, x_t, and the predicted x̂_{t-1} obtained above into the generative-adversarial objective as a function of the discriminator parameters ψ, and computing its mean as the optimization objective of the discriminator;
(4) Optimizing the discriminator parameters ψ once by back-propagating the objective function with respect to ψ;
(5) Substituting the variables t, x_0, x_{t-1}, x_t, and the predicted x̂_{t-1} obtained above into the generative-adversarial objective as a function of the condition generator parameters ω, and computing its mean as the optimization objective of the condition generator;
(6) Optimizing the condition generator parameters ω once by back-propagating the objective function with respect to ω;
(7) Repeating steps (2)-(6) until the generative-adversarial objective converges or the maximum number of optimization iterations is reached.
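The alternating schedule of sub-steps (3)-(6) can be illustrated on a toy saddle-point problem. The quadratic objective below merely stands in for the real adversarial loss (it is not the patent's objective); what it shows is the alternation itself: ascend in the discriminator parameters ψ, then descend in the condition-generator parameters ω, and repeat until convergence:

```python
import numpy as np

def alternating_training(grad_psi, grad_omega, psi, omega,
                         lr=0.05, steps=300):
    """Alternation from sub-steps (3)-(6): gradient ascent on the
    adversarial objective in the discriminator parameters psi, then
    gradient descent in the condition-generator parameters omega."""
    for _ in range(steps):
        psi = psi + lr * grad_psi(psi, omega)        # sub-steps (3)-(4)
        omega = omega - lr * grad_omega(psi, omega)  # sub-steps (5)-(6)
    return psi, omega

# Toy objective f(psi, omega) = psi*omega - 0.1*psi**2 + 0.1*omega**2,
# concave in psi and convex in omega, with equilibrium at (0, 0).
g_psi = lambda p, w: w - 0.2 * p     # df/dpsi
g_omega = lambda p, w: p + 0.2 * w   # df/domega
psi, omega = alternating_training(g_psi, g_omega, psi=1.0, omega=1.0)
```

In the patent's setting the gradients would come from back-propagating the generative-adversarial objective through D_ψ and G_ω while the frozen ε_θ supplies the denoising prediction.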
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202211560705.XA CN115908187A (en) | 2022-12-07 | 2022-12-07 | Image characteristic analysis and generation method based on rapid denoising diffusion probability model |
Publications (1)
Publication Number | Publication Date |
---|---|
CN115908187A true CN115908187A (en) | 2023-04-04 |
Family
ID=86494628
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202211560705.XA Pending CN115908187A (en) | 2022-12-07 | 2022-12-07 | Image characteristic analysis and generation method based on rapid denoising diffusion probability model |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN115908187A (en) |
Cited By (12)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN116310660A (en) * | 2023-05-24 | 2023-06-23 | 深圳须弥云图空间科技有限公司 | Enhanced sample generation method and device |
CN116630634A (en) * | 2023-05-29 | 2023-08-22 | 北京医准智能科技有限公司 | Image processing method, device, equipment and storage medium |
CN116630634B (en) * | 2023-05-29 | 2024-01-30 | 浙江医准智能科技有限公司 | Image processing method, device, equipment and storage medium |
CN116824499A (en) * | 2023-06-28 | 2023-09-29 | 北京建筑大学 | Insect pest detection method, system, equipment and storage medium based on SWT model |
CN116664450A (en) * | 2023-07-26 | 2023-08-29 | 国网浙江省电力有限公司信息通信分公司 | Diffusion model-based image enhancement method, device, equipment and storage medium |
CN116645260A (en) * | 2023-07-27 | 2023-08-25 | 中国海洋大学 | Digital watermark attack method based on conditional diffusion model |
CN116645260B (en) * | 2023-07-27 | 2024-02-02 | 中国海洋大学 | Digital watermark attack method based on conditional diffusion model |
CN116664605A (en) * | 2023-08-01 | 2023-08-29 | 昆明理工大学 | Medical image tumor segmentation method based on diffusion model and multi-mode fusion |
CN116664605B (en) * | 2023-08-01 | 2023-10-10 | 昆明理工大学 | Medical image tumor segmentation method based on diffusion model and multi-mode fusion |
CN116758098A (en) * | 2023-08-07 | 2023-09-15 | 哈尔滨工业大学(深圳)(哈尔滨工业大学深圳科技创新研究院) | Hypothalamic nucleus segmentation method and model construction method of magnetic resonance image |
CN117788344A (en) * | 2024-02-26 | 2024-03-29 | 北京飞渡科技股份有限公司 | Building texture image restoration method based on diffusion model |
CN117788344B (en) * | 2024-02-26 | 2024-05-07 | 北京飞渡科技股份有限公司 | Building texture image restoration method based on diffusion model |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN115908187A (en) | Image characteristic analysis and generation method based on rapid denoising diffusion probability model | |
CN107273800B (en) | Attention mechanism-based motion recognition method for convolutional recurrent neural network | |
CN110111366B (en) | End-to-end optical flow estimation method based on multistage loss | |
CN109035172B (en) | Non-local mean ultrasonic image denoising method based on deep learning | |
CN113191969A (en) | Unsupervised image rain removing method based on attention confrontation generation network | |
CN110570443B (en) | Image linear target extraction method based on structural constraint condition generation model | |
CN114038055A (en) | Image generation method based on contrast learning and generation countermeasure network | |
CN113240683B (en) | Attention mechanism-based lightweight semantic segmentation model construction method | |
CN113870335A (en) | Monocular depth estimation method based on multi-scale feature fusion | |
CN115393396B (en) | Unmanned aerial vehicle target tracking method based on mask pre-training | |
Niu et al. | Effective image restoration for semantic segmentation | |
CN112950480A (en) | Super-resolution reconstruction method integrating multiple receptive fields and dense residual attention | |
CN112766360A (en) | Time sequence classification method and system based on time sequence bidimensionalization and width learning | |
CN114882368A (en) | Non-equilibrium hyperspectral image classification method | |
CN115775316A (en) | Image semantic segmentation method based on multi-scale attention mechanism | |
CN114821050A (en) | Named image segmentation method based on transformer | |
CN115047423A (en) | Comparison learning unsupervised pre-training-fine tuning type radar target identification method | |
CN111612803B (en) | Vehicle image semantic segmentation method based on image definition | |
Zhang et al. | A parallel and serial denoising network | |
CN111986210B (en) | Medical image small focus segmentation method | |
CN116704585A (en) | Face recognition method based on quality perception | |
CN111598115B (en) | SAR image fusion method based on cross cortical neural network model | |
CN115797827A (en) | ViT human body behavior identification method based on double-current network architecture | |
CN112884773B (en) | Target segmentation model based on target attention consistency under background transformation | |
CN114359786A (en) | Lip language identification method based on improved space-time convolutional network |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||