CN116957964A

CN116957964A - Small sample image generation method and system based on diffusion model

Info

Publication number: CN116957964A
Application number: CN202310865420.5A
Authority: CN
Inventors: 刘艳霞; 周月; 李宇虹; 赖浩宇
Original assignee: South China University of Technology SCUT
Current assignee: South China University of Technology SCUT
Priority date: 2023-07-13
Filing date: 2023-07-13
Publication date: 2023-10-27

Abstract

The invention discloses a small sample image generation method and system based on a diffusion model. The method comprises: acquiring target images and establishing an auxiliary data set; adding noise to the auxiliary data set multiple times to obtain noise images that satisfy an isotropic Gaussian distribution; performing denoising inference on the noise images, training the model with the original image as the training target so that the noise in the intermediate latent code is removed step by step; adjusting the model parameters according to the pixel mean square error between the original image and the generated image and the frequency-domain information loss, with dynamic weighting of the frequency-domain loss to emphasize the learning of complex frequencies; and taking a small number of samples from a new data set as reference images and extracting their semantic and structural features separately, to assist the denoising model in generating images of the same class as the sample images. By using a diffusion model for small sample image generation, the invention predicts the generated image from noise, which improves the diversity of the generated images, while additional guidance information in the generation process ensures their high quality.

Description

Small sample image generation method and system based on diffusion model

Technical Field

The invention relates to the technical field of image generation, in particular to a small sample image generation method and system based on a diffusion model.

Background

Deep learning is currently the leading approach in computer vision, but it depends on large amounts of high-quality training data. In many specialized fields, collecting image data sets is extremely time-consuming and costly, and the shortage of image data is a significant problem.

To ease the constraint that limited data places on deep learning, a data set can be expanded through data augmentation; among augmentation methods, generating similar images with a generative model yields more realistic and reliable samples. Small sample image generation refers to the task of generating new, realistic and diverse images of a class from only a very few images of that class.

Existing small sample image generation methods are mostly based on generative adversarial networks (GANs) and typically use fusion-based strategies at the image or feature level to generate new images. MatchingGAN matches random vectors with given real images and maps the fused features to a new image. F2GAN further improves on MatchingGAN through a fusing-and-filling paradigm, but its generation quality is still poor. LoFGAN fuses image features locally at the feature level and adds a local reconstruction loss to further improve the quality of generated images, but it struggles to fuse features sensibly for even slightly complex images and easily produces meaningless images. WaveGAN adds low-frequency and high-frequency skip connections to the generator, giving the generator more perceptible information and achieving good generation quality, but it is prone to aliasing artifacts. All of these methods are GAN-based, and when little data is available they suffer from problems such as training parameters that are hard to tune and insufficient diversity of the generated samples.

Disclosure of Invention

The invention aims to provide a small sample image generation method and system based on a diffusion model, so as to generate high-quality and highly diverse images of the same class from a small number of samples.

The invention is realized by at least one of the following technical solutions.

A small sample image generation method based on a diffusion model comprises the following steps:

acquiring a target image and establishing an auxiliary data set;

adding noise to the auxiliary data set for multiple times to obtain a noise image meeting isotropic Gaussian distribution;

performing denoising inference on the noise image, and training the diffusion model with the original image as the training target so that the noise in the intermediate latent code is removed step by step;

adjusting the model parameters according to the pixel mean square error between the original image and the generated image and the frequency-domain information loss, with dynamic weighting of the frequency-domain loss to emphasize the learning of complex frequencies;

taking a small number of samples from a new data set as reference images and extracting their semantic and structural features separately, to assist the denoising model in generating images of the same class as the sample images of the new data set.

Further, acquiring the target image and establishing the auxiliary data set includes:

imaging the target with an image acquisition device, the resulting images serving as the images of the auxiliary data set;

applying rotation, brightness adjustment and contrast adjustment to the images of the auxiliary data set to expand the auxiliary data set.

Further, in adding noise to the auxiliary data set multiple times to obtain a noise image satisfying an isotropic Gaussian distribution, the noise-adding process is expressed as follows:

where $X_0$ is the original image, $X_t$ is the noised image, $t$ is the time step, $\beta_t$ is the variance used in the $t$-th noise addition, and $\bar{\alpha}_t$ is calculated from $\beta_t$: $\alpha_t = 1 - \beta_t$, $\bar{\alpha}_t = \prod_{i=1}^{t} \alpha_i$.
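The noise-adding expression itself is not preserved in this text. A reconstruction consistent with the symbols defined above, assuming the standard forward process of a denoising diffusion probabilistic model (DDPM), would read as follows; the single-step form and the closed form from $X_0$ are both shown, and neither is confirmed by the source:

$$
q(X_t \mid X_{t-1}) = \mathcal{N}\!\left(X_t;\ \sqrt{1-\beta_t}\,X_{t-1},\ \beta_t I\right),
\qquad
q(X_t \mid X_0) = \mathcal{N}\!\left(X_t;\ \sqrt{\bar{\alpha}_t}\,X_0,\ (1-\bar{\alpha}_t)\, I\right)
$$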

Further, performing denoising inference on the noise image and training the model with the original image as the training target to remove the noise in the intermediate latent code step by step includes:

denoising the noise image step by step, the denoising process being expressed as follows:

where $X_t$ is the noised image, $X_{t-1}$ is the intermediate result obtained after denoising $X_t$ once, and $t$ is the time step; the dimensions of the intermediate result remain unchanged throughout the denoising process, and the original image serves as the denoising target for the noise image.
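The denoising expression is likewise missing from this text. Assuming the usual DDPM reverse transition, with the posterior mean written in terms of the model's prediction of the original image $X_\theta(X_t, t)$, one consistent form is given below. The choice of $\sigma_t^2$ as the posterior variance is an assumption; $\sigma_t^2 = \beta_t$ is another common choice.

$$
\begin{aligned}
p_\theta(X_{t-1} \mid X_t) &= \mathcal{N}\!\left(X_{t-1};\ \mu_\theta(X_t, t),\ \sigma_t^2 I\right),\\
\mu_\theta(X_t, t) &= \frac{\sqrt{\bar{\alpha}_{t-1}}\,\beta_t}{1-\bar{\alpha}_t}\, X_\theta(X_t, t) + \frac{\sqrt{\alpha_t}\,(1-\bar{\alpha}_{t-1})}{1-\bar{\alpha}_t}\, X_t,\\
X_{t-1} &= \mu_\theta(X_t, t) + \sigma_t z,\quad z \sim \mathcal{N}(0, I),\quad \sigma_t^2 = \frac{1-\bar{\alpha}_{t-1}}{1-\bar{\alpha}_t}\,\beta_t .
\end{aligned}
$$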

Further, adjusting the model parameters according to the pixel mean square error between the original image and the generated image and the frequency-domain information loss, with dynamic weighting of the frequency-domain loss to emphasize the learning of complex frequencies, includes the following:

the mean square error is expressed as:

where $X_0$ is the original image, $X_\theta(X_t, t)$ is the image predicted by the model, $t$ is the time step, $\beta_t$ is the variance used in the $t$-th noise addition, and the coefficients appearing in the expression can be calculated from $\beta_t$ as follows: $\alpha_t = 1 - \beta_t$, $\bar{\alpha}_t = \prod_{i=1}^{t} \alpha_i$.

Further, the frequency-domain information loss is expressed as:

where the image size is $M \times N$, $X_0(u, v)$ is the spatial-frequency value of the original image in the spectral coordinate system, $X_\theta(u, v)$ is the spatial-frequency value of the predicted image in the spectral coordinate system, and $w(u, v)$ is the spectrum weight matrix of the image. The value of $w(u, v)$ is determined dynamically from the current loss at each frequency during training: simple frequencies receive low weight and complex frequencies receive high weight, so that the focus of training in the frequency domain shifts toward the complex frequencies. The overall objective function $L$ is the sum of $L_{mse}$ and $L_{fre}$.
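Neither loss expression is preserved in this text. A sketch consistent with the definitions above is given below. It assumes an unweighted mean square error on the predicted original image; the source's version may carry a time-dependent coefficient built from $\beta_t$, $\alpha_t$ and $\bar{\alpha}_t$. It also assumes that the frequency loss is a weighted squared spectral distance averaged over the $M \times N$ spectrum:

$$
\begin{aligned}
L_{mse} &= \mathbb{E}_{X_0,\, t}\left[\ \left\| X_0 - X_\theta(X_t, t) \right\|^2\ \right],\\
L_{fre} &= \frac{1}{MN} \sum_{u=0}^{M-1} \sum_{v=0}^{N-1} w(u, v)\, \left| X_0(u, v) - X_\theta(u, v) \right|^2,\\
L &= L_{mse} + L_{fre}.
\end{aligned}
$$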

Further, images are sampled from the conditional distribution $p(X_0 \mid C)$, and the sampling process is expressed as:

$$p_\theta(X_0 \mid C) = \int p_\theta(X_{0:T} \mid C)\, dX_{1:T}$$

where $C$ is the sampling condition.

Further, taking a small number of samples from the new data set as reference images and extracting their semantic and structural features separately, to assist the denoising model in generating images of the same class as the sample images, includes:

after adding noise to the reference image for $t-1$ steps, applying the up-sampling and down-sampling operation $\phi_C$ to obtain the perceptual information $C_{ref}$ of the reference image, and using $C_{ref}$ to guide the denoising of image $X_t$ into image $X_{t-1}$, so that the generated image is related in content to the reference image;

after adding noise to the reference image for $t-1$ steps, applying the high-pass filter $\phi_s$ to obtain the structural information $S_{ref}$ of the reference image, and using $S_{ref}$ to guide the denoising of image $X_t$ into image $X_{t-1}$, so that the generated image is structurally similar to the reference image;

$C_{ref}$ and $S_{ref}$ together serve as the sampling condition $C$.
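The two feature extractors can be illustrated with a minimal NumPy sketch for a single-channel image. The source does not specify either filter, so the concrete choices here are assumptions: `phi_c` is taken to be average-pool down-sampling followed by nearest-neighbour up-sampling (a size-preserving low-pass filter), and `phi_s` is taken to be an ideal high-pass filter in the Fourier domain. The names `factor` and `cutoff` are hypothetical parameters. In the described method the reference image would first be noised to step $t-1$ before these filters are applied.

```python
import numpy as np

def phi_c(x: np.ndarray, factor: int = 4) -> np.ndarray:
    """Low-pass filter: average-pool by `factor`, then nearest-neighbour upsample."""
    h, w = x.shape
    assert h % factor == 0 and w % factor == 0, "image size must be divisible by factor"
    small = x.reshape(h // factor, factor, w // factor, factor).mean(axis=(1, 3))
    return np.repeat(np.repeat(small, factor, axis=0), factor, axis=1)

def phi_s(x: np.ndarray, cutoff: float = 0.1) -> np.ndarray:
    """High-pass filter: zero every frequency whose radius (cycles/pixel) is below `cutoff`."""
    fy = np.fft.fftfreq(x.shape[0])[:, None]
    fx = np.fft.fftfreq(x.shape[1])[None, :]
    mask = np.sqrt(fx ** 2 + fy ** 2) >= cutoff
    return np.real(np.fft.ifft2(np.fft.fft2(x) * mask))
```

Under these assumptions `phi_c` keeps coarse content such as layout and color distribution, while `phi_s` keeps edges and contours, which matches the content and structure roles the text assigns to $C_{ref}$ and $S_{ref}$.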

Further, the new data set refers to a data set composed of image samples that do not appear in the training set; when the diffusion model samples images, samples in the new data set are selected in turn as reference images for the denoising process, and the information contained in the reference images serves as the conditional constraint on denoising.

A system for implementing the above small sample image generation method based on a diffusion model comprises:

a first module, configured to acquire a target image and establish an auxiliary data set;

a second module, configured to add noise to the auxiliary data set multiple times to obtain a noise image satisfying an isotropic Gaussian distribution;

a third module, configured to perform denoising inference on the noise image, training the model with the original image as the training target so that the noise in the intermediate latent code is removed step by step;

a fourth module, configured to adjust the model parameters according to the pixel mean square error between the original image and the generated image and the frequency-domain information loss, with dynamic weighting of the frequency-domain loss to emphasize the learning of complex frequencies;

and a fifth module, configured to take a small number of samples from a new data set as reference images and extract their semantic and structural features separately, to assist the denoising model in generating images of the same class as the sample images.

Compared with the prior art, the invention has the following beneficial effects: a parameterized Markov chain is learned in the model training stage, which avoids the mode collapse that adversarial training may cause, and compared with a generative adversarial network the training is more stable and the parameters are easier to tune; in the generation process, noise is taken as the initial sample and denoised step by step, and the content and structure of the generated image are regulated through the low-frequency and high-frequency information of the reference image respectively, which ensures the high quality and diversity of the generated images.

Drawings

In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings required for the description of the embodiments will be briefly described below, and it is apparent that the drawings in the following description are only some embodiments of the present invention, and other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.

FIG. 1 is a flowchart of a method for generating a small sample image based on a diffusion model according to an embodiment of the present invention;

FIG. 2 is a schematic general structure diagram of a small sample image generating method based on a diffusion model according to an embodiment of the present invention;

FIG. 3 is a schematic diagram of the spatial-domain constraint and the frequency-domain constraint in an embodiment of the present invention.

Detailed Description

The present invention will be described in further detail with reference to the drawings and examples, in order to make the objects, technical solutions and advantages of the present invention more apparent. It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the scope of the invention.

Example 1

As shown in fig. 1 and 2, the embodiment of the invention discloses a small sample image generation method based on a diffusion model, which comprises the following steps:

s1, acquiring a target image and establishing an auxiliary data set.

An image acquisition device (such as a camera) is used to image objects that are few in number (such as defective industrial parts). The resulting images are preliminarily expanded using conventional data augmentation operations that do not change the image class label, such as rotation and brightness adjustment, and are then sorted by class to form the auxiliary data set.
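The label-preserving expansion described above can be sketched in NumPy as follows. This is only an illustration under stated assumptions: images are arrays with values in [0, 1], rotations are restricted to multiples of 90 degrees, and the function names `augment` and `expand_dataset` and all parameter values are hypothetical rather than taken from the source.

```python
import numpy as np

def augment(image: np.ndarray, k_rot: int = 1, brightness: float = 0.1,
            contrast: float = 1.2) -> np.ndarray:
    """Return one augmented copy of an image with values in [0, 1].

    k_rot      : number of 90-degree rotations
    brightness : additive offset
    contrast   : multiplicative factor applied around the image mean
    """
    out = np.rot90(image, k=k_rot, axes=(0, 1))
    mean = out.mean()
    out = (out - mean) * contrast + mean   # contrast adjustment
    out = out + brightness                 # brightness adjustment
    return np.clip(out, 0.0, 1.0)

def expand_dataset(images, labels):
    """Expand (image, label) pairs; each class label is copied unchanged."""
    aug_images, aug_labels = [], []
    for img, lab in zip(images, labels):
        for k in range(4):                  # 0, 90, 180, 270 degrees
            for b in (-0.1, 0.0, 0.1):      # darker, unchanged, brighter
                aug_images.append(augment(img, k_rot=k, brightness=b, contrast=1.0))
                aug_labels.append(lab)
    return aug_images, aug_labels
```

Each input image yields twelve variants here, and grouping the outputs by `aug_labels` gives the per-class organization mentioned in the text.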

S2, adding noise to the auxiliary data set for multiple times to obtain a noise image meeting isotropic Gaussian distribution.

The auxiliary data set is used as the training set, and every image in the training set must go through the forward process of the diffusion model. An image in the auxiliary data set is a data point $X_0$ sampled from the real data distribution, $X_0 \sim q(X)$. In the forward process of the diffusion model, Gaussian noise is added to the sample $T$ times in succession to obtain a series of noisy samples $X_1, X_2, \dots, X_T$, where $X_T$ is the image obtained after $T$ noise additions; the noise step size is controlled by the variance $\beta_t$. The noise-adding process is expressed as follows:

where $X_0$ is the original image, $X_t$ is the noised image, $t$ is the time step, $\beta_t$ is the variance used in the $t$-th noise addition, and $\bar{\alpha}_t$ can be calculated from $\beta_t$ as follows: $\alpha_t = 1 - \beta_t$, $\bar{\alpha}_t = \prod_{i=1}^{t} \alpha_i$.

$X_t$ can be expressed as a linear combination of $X_0$ and the noise $\epsilon$ added at each time step:

$$X_t = \sqrt{\bar{\alpha}_t}\, X_0 + \sqrt{1 - \bar{\alpha}_t}\, \epsilon, \qquad \epsilon \sim \mathcal{N}(0, I)$$

During the noise-adding process, as $t$ increases, $X_t$ gradually loses its discernible features and approaches pure noise. As $T \to \infty$, $X_T$ approximately follows an isotropic Gaussian distribution.
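The forward process can be sketched in NumPy as below, assuming the standard DDPM closed form in which $X_t$ is drawn directly from $X_0$. The linear variance schedule and its endpoints are assumptions, since the source does not fix a schedule, and the names `make_schedule` and `q_sample` are hypothetical.

```python
import numpy as np

def make_schedule(T: int = 1000, beta_start: float = 1e-4, beta_end: float = 2e-2):
    """Linear variance schedule (an assumed choice; the source does not fix one)."""
    betas = np.linspace(beta_start, beta_end, T)
    alphas = 1.0 - betas                 # alpha_t = 1 - beta_t
    alpha_bars = np.cumprod(alphas)      # alpha_bar_t = product of alpha_1 .. alpha_t
    return betas, alphas, alpha_bars

def q_sample(x0: np.ndarray, t: int, alpha_bars: np.ndarray,
             rng: np.random.Generator):
    """Draw X_t from X_0 in one shot; t is 1-indexed (t = 1..T)."""
    eps = rng.standard_normal(x0.shape)  # epsilon ~ N(0, I)
    a_bar = alpha_bars[t - 1]
    x_t = np.sqrt(a_bar) * x0 + np.sqrt(1.0 - a_bar) * eps
    return x_t, eps
```

With this schedule $\bar{\alpha}_T$ is close to zero, so `q_sample(x0, T, ...)` returns a sample that is almost entirely noise, in line with the statement that $X_T$ approaches an isotropic Gaussian.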

S3, performing denoising inference on the noise image, and training the model with the original image as the training target so that the noise in the intermediate latent code is removed step by step.

Denoising inference aims to restore the original image $X_0$ from the noise image $X_T$. Directly estimating the reverse of the diffusion process, $q(X_{t-1} \mid X_t)$, is difficult, so the diffusion model learns a parameterized Gaussian transition $p_\theta(X_{t-1} \mid X_t)$. The denoising process has the same functional form as the noise-adding process and can be expressed as:

With $\mu_\theta$ expressed in terms of $X_t$ and the predicted original image, the denoising process is:

where $X_\theta$ denotes a U-Net whose input and output dimensions are identical, used to predict the generation result of each step in the reverse diffusion process; $z \sim \mathcal{N}(0, I)$ indicates that each generation step is stochastic, which benefits the diversity of the generated images.
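A single reverse step and the full sampling loop can be sketched as follows. Because the denoising expressions are not preserved in this text, the sketch assumes the standard DDPM posterior mean written in terms of the predicted original image, with the posterior variance for the added noise. The argument `x0_predictor` is a hypothetical stand-in for the U-Net $X_\theta(X_t, t)$, and the function names are illustrative.

```python
import numpy as np

def p_sample_step(x_t, t, x0_pred, betas, alphas, alpha_bars, rng):
    """One reverse step X_t -> X_{t-1}; t is 1-indexed.

    x0_pred stands in for the network output X_theta(X_t, t).
    """
    beta_t = betas[t - 1]
    alpha_t = alphas[t - 1]
    a_bar_t = alpha_bars[t - 1]
    a_bar_prev = alpha_bars[t - 2] if t > 1 else 1.0

    # Posterior mean as a combination of the predicted X_0 and the current X_t.
    coef_x0 = np.sqrt(a_bar_prev) * beta_t / (1.0 - a_bar_t)
    coef_xt = np.sqrt(alpha_t) * (1.0 - a_bar_prev) / (1.0 - a_bar_t)
    mean = coef_x0 * x0_pred + coef_xt * x_t

    if t == 1:
        return mean                      # no noise is added at the final step
    var = (1.0 - a_bar_prev) / (1.0 - a_bar_t) * beta_t
    z = rng.standard_normal(x_t.shape)   # z ~ N(0, I)
    return mean + np.sqrt(var) * z

def sample(x0_predictor, shape, betas, rng):
    """Run the reverse chain from X_T ~ N(0, I) down to X_0."""
    alphas = 1.0 - betas
    alpha_bars = np.cumprod(alphas)
    x = rng.standard_normal(shape)
    for t in range(len(betas), 0, -1):
        x = p_sample_step(x, t, x0_predictor(x, t), betas, alphas, alpha_bars, rng)
    return x
```

The array shape is the same at every step, matching the statement that the input and output dimensions of $X_\theta$ are identical, and the random draw `z` is what lets repeated runs produce different images.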

S4, adjusting the model parameters according to the pixel mean square error between the original image and the generated image and the frequency-domain information loss, with dynamic weighting of the frequency-domain loss to emphasize the learning of complex frequencies.

Referring to FIG. 3, during training the parameters of the diffusion model are optimized through spatial-domain and frequency-domain constraints between the original real image and the image generated by the diffusion model. The mean square error $L_{mse}$ between the original image and the predicted image limits the pixel loss of the generated image in the spatial domain, while the frequency-domain information difference $L_{fre}$ between the original image and the predicted image supplements the image space with frequency information, so that the model attends to generation quality in both the spatial domain and the frequency domain.

The mean square error expression is:

The frequency domain information error expression is:

where the image size is $M \times N$, $X_0(u, v)$ is the spatial-frequency value of the original image in the spectral coordinate system, $X_\theta(u, v)$ is the spatial-frequency value of the predicted image in the spectral coordinate system, and $w(u, v)$ is the spectrum weight matrix of the image. The value of $w(u, v)$ is determined dynamically from the current loss at each frequency during training: simple frequencies receive low weight and complex frequencies receive high weight, so that the focus of training in the frequency domain shifts toward the complex frequencies.

The overall objective function $L$ is the sum of $L_{mse}$ and $L_{fre}$.
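The combined objective can be sketched in NumPy for a single-channel image as follows. The text states only that $w(u, v)$ is determined dynamically from the current per-frequency loss, so the specific rule used here (spectral error magnitude raised to a power `gamma`, then scaled to [0, 1]) is an assumption in the style of focal frequency weighting, and `gamma` is a hypothetical parameter. The mean square error is taken without any time-dependent coefficient, which is also an assumption.

```python
import numpy as np

def mse_loss(x0: np.ndarray, x0_pred: np.ndarray) -> float:
    """Pixel-wise mean square error in the spatial domain."""
    return float(np.mean((x0 - x0_pred) ** 2))

def frequency_loss(x0: np.ndarray, x0_pred: np.ndarray, gamma: float = 1.0) -> float:
    """Dynamically weighted frequency-domain loss for one M x N image."""
    f_real = np.fft.fft2(x0, norm="ortho")
    f_pred = np.fft.fft2(x0_pred, norm="ortho")
    sq_err = np.abs(f_real - f_pred) ** 2          # per-frequency squared error
    # Assumed weighting: a larger current error gives a larger weight in [0, 1].
    w = np.sqrt(sq_err) ** gamma
    peak = w.max()
    w = w / peak if peak > 0 else np.zeros_like(w)
    # In an autodiff framework the weights would be treated as constants.
    return float(np.mean(w * sq_err))              # (1 / MN) * sum of w * |diff|^2

def total_loss(x0: np.ndarray, x0_pred: np.ndarray) -> float:
    """Overall objective L = L_mse + L_fre."""
    return mse_loss(x0, x0_pred) + frequency_loss(x0, x0_pred)
```

Frequencies that are already well reconstructed contribute little under this weighting, while poorly reconstructed ones dominate, which is one way to realize the low weight for simple frequencies and high weight for complex frequencies described above.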

This completes the training of the diffusion model. The diffusion model learns a parameterized Markov chain in the training stage, which avoids the mode collapse that adversarial training may cause in adversarial models; compared with a generative adversarial network, its training is more stable and its parameters are easier to tune.

S5, taking a small number of samples from a new data set as reference images and extracting their semantic and structural features separately, to assist the denoising model in generating images of the same class as the sample images.

A new data set refers to a data set composed of image samples that do not appear in the training set; the number of samples in the new data set may be fewer than one hundred. When the diffusion model samples images, samples in the new data set are selected in turn as reference images for the denoising process, and the information contained in the reference images serves as the conditional constraint on denoising.

To generate an image that shares high-level semantics and structural features with a given reference image, the image can be sampled from the conditional distribution $p(X_0 \mid C)$. The sampling process is expressed as:

$$p_\theta(X_0 \mid C) = \int p_\theta(X_{0:T} \mid C)\, dX_{1:T}$$

where $C$ is the sampling condition.
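The joint distribution inside the integral can be unpacked one step further. Assuming the usual Markov factorization of the reverse chain, with a standard Gaussian prior on $X_T$ (an assumption not stated explicitly in the source), the condition $C$ enters through every single-step transition:

$$
p_\theta(X_{0:T} \mid C) = p(X_T) \prod_{t=1}^{T} p_\theta(X_{t-1} \mid X_t, C), \qquad p(X_T) = \mathcal{N}(0, I)
$$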

After noise is added to the reference image for $t-1$ steps, the up-sampling and down-sampling operation $\phi_C$ is applied to obtain the perceptual information $C_{ref}$ of the reference image; $C_{ref}$ guides the denoising of image $X_t$ into image $X_{t-1}$, so that the generated image is related in content to the reference image.

The range of action of $C_{ref}$ can be adjusted through the parameter $range_C$, acting on $range_C \cdot T$ of the denoising steps. The larger the range of action, the stronger the constraint of $C_{ref}$ on the generated image, and the more similar the content, color scheme and other information of the generated image are to the reference image.

The range of action of $S_{ref}$ can be adjusted through the parameter $range_s$, acting on $range_s \cdot T$ of the denoising steps. The larger the range of action, the stronger the constraint of $S_{ref}$ on the generated image, and the more similar the overall structure, target contour and other information of the generated image are to the reference image.

Within the ranges of action of $C_{ref}$ and $S_{ref}$, every transition $p_\theta(X_{t-1} \mid X_t, C)$ in the sampling process is affected by the condition $C$, which consists of $C_{ref}$ and $S_{ref}$. If the number of denoising steps exceeds the range of action of one of $C_{ref}$ and $S_{ref}$, then $C$ equals whichever of $C_{ref}$ or $S_{ref}$ is still within its range. The sampling process is therefore flexibly editable: the quality and diversity of the generated images can be tuned by regulating $C_{ref}$ and $S_{ref}$.
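How the two ranges gate the conditions during sampling can be sketched as below. Two assumptions are made that the source does not confirm: a range value in [0, 1] is taken to cover the first share of denoising steps (the noisiest ones, counted down from $t = T$), and the guidance update replaces the filtered component of the denoised sample with that of the noised reference, in the style of iterative latent variable refinement. The names `active_conditions` and `guide` are hypothetical.

```python
import numpy as np

def active_conditions(t: int, T: int, range_c: float, range_s: float):
    """Return which of ("C_ref", "S_ref") constrain the step X_t -> X_{t-1}.

    Assumption: a range r in [0, 1] covers the first r * T denoising steps,
    i.e. the noisiest ones, counted from t = T downwards.
    """
    steps_done = T - t                    # 0 for the first denoising step
    conds = []
    if steps_done < range_c * T:
        conds.append("C_ref")
    if steps_done < range_s * T:
        conds.append("S_ref")
    return tuple(conds)

def guide(x_prev: np.ndarray, ref_prev: np.ndarray, phi) -> np.ndarray:
    """Swap the phi-filtered component of x_prev for that of the noised reference.

    x_prev   : unconditional proposal for X_{t-1}
    ref_prev : reference image noised to step t-1
    phi      : feature extractor (low-pass for C_ref, high-pass for S_ref)
    """
    return x_prev + phi(ref_prev) - phi(x_prev)
```

With `range_c=0.5` and `range_s=0.2`, for example, both conditions apply early in sampling, only `C_ref` applies in the middle, and the last steps run unconditionally, so widening a range trades diversity for closeness to the reference.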

Example 2

This embodiment provides a system for implementing the above small sample image generation method based on a diffusion model, comprising:

a fifth module, configured to take a small number of samples from the new data set as reference images and extract their semantic and structural features separately, to assist the denoising model in generating images of the same class as the sample images.

Example 3

The embodiment provides a device for realizing the small sample image generation system based on the diffusion model, which comprises:

at least one processor;

at least one memory for storing at least one program;

the at least one program, when executed by the at least one processor, causes the at least one processor to implement the method illustrated in fig. 1.

The small sample image generation device based on the diffusion model can execute the small sample image generation method based on the diffusion model, can execute any combination of the implementation steps of the method embodiment, and has the corresponding functions and beneficial effects.

The present embodiment also discloses a computer program product or a computer program comprising computer instructions stored in a computer readable storage medium. The computer instructions may be read from a computer-readable storage medium by a processor of a computer device, and executed by the processor, to cause the computer device to perform the method shown in fig. 1.

This embodiment also provides a storage medium storing instructions or a program for executing the small sample image generation method based on the diffusion model shown in FIG. 1; when the instructions or program are run, any combination of the implementation steps of the method embodiment can be executed, with the corresponding functions and beneficial effects.

In some alternative embodiments, the functions/operations noted in the block diagrams may occur out of the order noted in the operational flow diagrams. For example, two blocks shown in succession may in fact be executed substantially concurrently or the blocks may sometimes be executed in the reverse order, depending upon the functionality/acts involved. Furthermore, the embodiments presented and described in the flowcharts of the present invention are provided by way of example in order to provide a more thorough understanding of the technology. The disclosed methods are not limited to the operations and logic flows presented herein. Alternative embodiments are contemplated in which the order of various operations is changed, and in which sub-operations described as part of a larger operation are performed independently.

Furthermore, while the invention is described in the context of functional modules, it should be appreciated that, unless otherwise indicated, one or more of the described functions and/or features may be integrated in a single physical device and/or software module or one or more functions and/or features may be implemented in separate physical devices or software modules. It will also be appreciated that a detailed discussion of the actual implementation of each module is not necessary to an understanding of the present invention. Rather, the actual implementation of the various functional modules in the apparatus disclosed herein will be apparent to those skilled in the art from consideration of their attributes, functions and internal relationships. Accordingly, one of ordinary skill in the art can implement the invention as set forth in the claims without undue experimentation. It is also to be understood that the specific concepts disclosed are merely illustrative and are not intended to be limiting upon the scope of the invention, which is to be defined in the appended claims and their full scope of equivalents.

Claims

1. The small sample image generation method based on the diffusion model is characterized by comprising the following steps of:

acquiring a target image and establishing an auxiliary data set;

2. The method of generating a small sample image based on a diffusion model according to claim 1, wherein the acquiring the target image, creating the auxiliary data set, comprises:

3. The method for generating a small sample image based on a diffusion model according to claim 1, wherein the adding noise to the auxiliary data set for a plurality of times obtains a noise image satisfying an isotropic gaussian distribution, and the noise adding process expression is as follows:

where $X_0$ is the original image, $X_t$ is the noised image, $t$ is the time step, $\beta_t$ is the variance used in the $t$-th noise addition, and $\bar{\alpha}_t$ is calculated from $\beta_t$: $\alpha_t = 1 - \beta_t$, $\bar{\alpha}_t = \prod_{i=1}^{t} \alpha_i$.

4. The method for generating a small sample image based on a diffusion model according to claim 1, wherein performing denoising inference on the noise image and training the model with the original image as the training target to remove the noise in the intermediate latent code step by step comprises:

5. The method for generating a small sample image based on a diffusion model according to claim 1, wherein the model parameters are adjusted according to a pixel mean square error between an original image and a generated image and a frequency domain information loss, a complex frequency is learned by dynamic weighting of the frequency domain loss, and a mean square error expression is:

6. The method for generating a small sample image based on a diffusion model according to claim 1, wherein the model parameters are adjusted according to a pixel mean square error between an original image and a generated image and a frequency domain information loss, a complex frequency is emphasized by dynamic weighting of the frequency domain loss, and the expression of the frequency domain information loss is:

where the image size is $M \times N$, $X_0(u, v)$ is the spatial-frequency value of the original image in the spectral coordinate system, $X_\theta(u, v)$ is the spatial-frequency value of the predicted image in the spectral coordinate system, and $w(u, v)$ is the spectrum weight matrix of the image; the value of $w(u, v)$ is determined dynamically from the current loss at each frequency during training, with simple frequencies given low weight and complex frequencies given high weight, so that the focus of training in the frequency domain shifts toward the complex frequencies,

7. The method for generating a small sample image based on a diffusion model according to claim 1, wherein images are sampled from the conditional distribution $p(X_0 \mid C)$, the sampling process being expressed as:

$$p_\theta(X_0 \mid C) = \int p_\theta(X_{0:T} \mid C)\, dX_{1:T}$$

where $C$ is the sampling condition.

8. The method for generating a small sample image based on a diffusion model according to claim 1, wherein taking a small number of samples from the new data set as reference images and extracting their semantic and structural features separately, to assist the denoising model in generating images of the same class as the sample images, comprises:

after adding noise to the reference image for $t-1$ steps, applying the high-pass filter $\phi_s$ to obtain the structural information $S_{ref}$ of the reference image, and using $S_{ref}$ to guide the denoising of image $X_t$ into image $X_{t-1}$, so that the generated image is structurally similar to the reference image;

$C_{ref}$ and $S_{ref}$ together serve as the sampling condition $C$.

9. The method for generating small sample images based on diffusion models according to claim 1, wherein the new data set refers to a data set formed by image samples which do not appear in a training set, samples in the new data set are sequentially selected as reference images in a denoising process in the process of sampling images by the diffusion models, and information contained in the reference images is used as a denoising condition constraint.

10. A system for implementing a diffusion model-based small sample image generation method of claim 1, comprising: