CN116862902A - Method for generating defects based on DreamBooth fine-tuning Stable Diffusion model - Google Patents

Method for generating defects based on DreamBooth fine-tuning Stable Diffusion model

Info

Publication number
CN116862902A
CN116862902A (application CN202310949827.6A)
Authority
CN
China
Prior art keywords
rare
training
defect
model
stable
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310949827.6A
Other languages
Chinese (zh)
Inventor
陈宇
翁浩东
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Xiamen Weitu Software Technology Co ltd
Original Assignee
Xiamen Weitu Software Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Xiamen Weitu Software Technology Co ltd filed Critical Xiamen Weitu Software Technology Co ltd
Priority to CN202310949827.6A priority Critical patent/CN116862902A/en
Publication of CN116862902A publication Critical patent/CN116862902A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 7/00 Image analysis
    • G06T 7/0002 Inspection of images, e.g. flaw detection
    • G06T 7/0004 Industrial image inspection
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 Arrangements for image or video recognition or understanding
    • G06V 10/40 Extraction of image or video features
    • G06V 10/44 Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 Arrangements for image or video recognition or understanding
    • G06V 10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V 10/77 Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V 10/774 Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02P CLIMATE CHANGE MITIGATION TECHNOLOGIES IN THE PRODUCTION OR PROCESSING OF GOODS
    • Y02P 90/00 Enabling technologies with a potential contribution to greenhouse gas [GHG] emissions mitigation
    • Y02P 90/30 Computing systems specially adapted for manufacturing

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Artificial Intelligence (AREA)
  • Health & Medical Sciences (AREA)
  • Quality & Reliability (AREA)
  • Computing Systems (AREA)
  • Databases & Information Systems (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Medical Informatics (AREA)
  • Software Systems (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a method for generating defects based on DreamBooth fine-tuning of a Stable Diffusion model, which comprises the following steps: collecting rare defect training samples for training; configuring the training parameters of DreamBooth and setting a rare tag identifier associated with the rare defect training samples as the tag for training; training a Stable Diffusion model with the defect training samples and the DreamBooth configured with the rare tag identifier, so that the Stable Diffusion model learns, during training, the rare features of the rare defect training samples, the rare features being associated with the rare tag identifier; and inputting the rare tag identifier together with a prompt text of the category to which the target belongs into the trained Stable Diffusion model to obtain defect pictures with the rare features. The invention applies the Stable Diffusion model to industrial defect pictures to artificially generate the rare defects found in industrial production; the model can be trained with only a small number of abnormal samples, yielding a Stable Diffusion model suited to the industrial environment, and the generated rare defect samples enrich the database for AI training.

Description

Method for generating defects based on DreamBooth fine-tuning Stable Diffusion model
Technical Field
The invention relates to the technical field of industrial product defect detection, in particular to a method for generating defects based on a DreamBooth fine-tuning Stable Diffusion model.
Background
In intelligent quality-inspection tasks for industrial defects, AI models are difficult to train because defect data are scarce.
The Stable Diffusion model is a latent-based diffusion model (a large text-to-image model, namely a Latent Diffusion Model from text to image) that realizes image generation from text by introducing a text condition into a UNet. (UNet is in essence a fully convolutional neural network; its name comes from its U-shaped overall architecture. It was originally created to solve semantic segmentation of medical images, but its later development has shown it to be an all-round player in semantic segmentation tasks and an example of an excellent network architecture.) The core of Stable Diffusion derives from latent diffusion: a conventional diffusion model generates in pixel space, whereas latent diffusion is a latent-based generative model that uses an autoencoder (AE) to compress the image into a latent (hidden) space, runs the diffusion model to generate the latents of the image, and finally obtains the generated image through the autoencoder's decoder module.
The advantage of the latent-based diffusion model is its computational efficiency: the latent space of an image is smaller than its pixel space, and this is also the core advantage of Stable Diffusion. Text-to-image models have large parameter counts, so pixel-based methods are computationally limited to generating images of size 64x64, as in OpenAI's DALL-E 2 and Google's Imagen, which then raise the resolution to 256x256 and 1024x1024 with super-resolution models; latent-based Stable Diffusion, operating in latent space, can directly generate images at 256x256, 512x512, or even higher resolution.
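To make this latent round-trip concrete, the following is a minimal sketch using the Hugging Face diffusers library (the model identifier, the 0.18215 scaling factor, and the tensor shapes follow common Stable Diffusion v1 practice and are assumptions beyond the patent text):

```python
import torch
from diffusers import AutoencoderKL

# Load only the VAE component of a Stable Diffusion v1 checkpoint.
vae = AutoencoderKL.from_pretrained(
    "runwayml/stable-diffusion-v1-5", subfolder="vae"
)

image = torch.randn(1, 3, 512, 512)  # stand-in for a normalized RGB image
with torch.no_grad():
    # Encoder: 1x3x512x512 pixels -> 1x4x64x64 latents (48x fewer values).
    latents = vae.encode(image).latent_dist.sample() * 0.18215
    # Decoder: latents -> reconstructed image, as in the generation step.
    recon = vae.decode(latents / 0.18215).sample

print(latents.shape, recon.shape)  # diffusion runs in the small latent space
```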
As shown in FIG. 1, the Stable Diffusion model mainly comprises three components:
autoencoder: an encoder compresses an image into the latent space, and a decoder decodes a latent back into an image; an AutoencoderKL variational autoencoder serves as the structure of the encoder and decoder;
CLIP text encoder: a FrozenCLIPEmbedder text encoder extracts text embeddings from the input text and feeds them into the diffusion model's UNet as a condition through cross-attention (an attention mechanism);
UNet: as shown in FIG. 2, UNet is the body of the diffusion model and realizes image generation under text guidance. Noise of varying strengths is applied to the latent vector of an input image; the noisy latent is fed into the UNet, which outputs an estimate of the noise; the estimate is compared with the true-noise label to compute the loss (the simplified form of the KL-divergence objective), and the UNet parameters are updated by backpropagation. With the text vector context introduced, the UNet conditions on it during training and uses the attention mechanism to better steer image generation in the direction of the text (a training-step sketch follows this list).
Although the Stable Diffusion model shows dramatic effects once trained, in actual use one needs it to generate personalized new results, such as pictures of the user or of a pet at home. Since the Stable Diffusion model never saw these pictures during training, the original model cannot achieve this. The Stable Diffusion model supports rich text-to-image and image-to-image generation scenes, but when a specific real object must appear in the image, such as a particular product or defect in industrial production, even the most advanced text-to-image models struggle to preserve its key visual characteristics; that is, they lack the ability to imitate or reproduce the appearance of a subject from a given reference set, and the expressiveness of the model's output domain is limited. The original Stable Diffusion cannot stably output a specified object: each generation is like rolling dice or drawing cards, with no relation between the subjects of successive generations, so the Stable Diffusion model needs to be fine-tuned.
As shown in FIG. 3, DreamBooth is a personalized image-generation technique from Google and also a model-training algorithm: by adding Prior Preservation Loss (a prior loss) to the original image-generation network and training on just a few pictures, it can resample the latent space and output a specified object. This personalized, customized output has made DreamBooth one of the most popular features in the current AI image-generation field.
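A sketch of the prior-preservation objective follows: the usual denoising loss on the few personalization images is combined with a weighted loss on class images generated by the frozen original model, which is what keeps the latent space from collapsing onto the new subject. The function and variable names and the default weight are illustrative assumptions:

```python
import torch
import torch.nn.functional as F

def dreambooth_loss(
    noise_pred: torch.Tensor,   # UNet prediction on a personalization image
    noise: torch.Tensor,        # true noise added to that image
    prior_pred: torch.Tensor,   # UNet prediction on a generated class image
    prior_noise: torch.Tensor,  # true noise added to the class image
    prior_weight: float = 1.0,  # strength of the Prior Preservation Loss
) -> torch.Tensor:
    instance_loss = F.mse_loss(noise_pred, noise)     # fit the new subject
    prior_loss = F.mse_loss(prior_pred, prior_noise)  # retain the class prior
    return instance_loss + prior_weight * prior_loss
```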
In industrial production, most products are normal, but as production volume grows, anomalous defect pictures inevitably appear. Unlike common defects, these anomalous defects are very rare and often mutually dissimilar, which brings great challenges and difficulty to defect-detection schemes based on deep learning.
Disclosure of Invention
The invention aims to provide a method for generating defects based on DreamBooth fine-tuning of a Stable Diffusion model, which uses DreamBooth to train and fine-tune the Stable Diffusion model so as to generate industrial defect pictures that enrich sample diversity for AI model training.
In order to achieve the technical purpose, the invention adopts the following technical scheme:
a method for generating defects based on a DreamBooth fine-tuning Stable Diffusion model comprises the following steps:
S1, collecting rare defect training samples for training, wherein the rare defect training samples are industrial defect pictures with rare characteristics;
S2, configuring the training parameters of DreamBooth and setting a rare tag identifier associated with the rare defect training samples as the tag for training;
S3, training a Stable Diffusion model with the defect training samples and the DreamBooth configured with the rare tag identifier, so that the Stable Diffusion model learns, during training, the rare features of the rare defect training samples, the rare features being associated with the rare tag identifier;
S4, inputting the rare tag identifier together with a prompt text of the category to which the target belongs into the trained Stable Diffusion model to obtain a defect picture with the rare features.
Further, the configuring of the training parameters of DreamBooth and the setting of the rare tag identifier associated with the rare defect training samples as the tag for training includes: setting a new defect description vocabulary and embedding it into the text encoding space as the rare tag identifier.
Further, the industrial defect pictures with rare characteristics comprise defect pictures of the same industrial product taken from multiple angles.
Further, the collected rare defect training samples for training are stored in the same folder, as sketched below.
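For illustration, a minimal sketch of this data layout follows; the folder path and file pattern are assumptions:

```python
from pathlib import Path
from PIL import Image

# All rare defect training samples live in one folder, covering the same
# industrial product from multiple angles.
instance_dir = Path("data/rare_defects")
images = [Image.open(p).convert("RGB") for p in sorted(instance_dir.glob("*.png"))]
print(f"{len(images)} rare defect training samples collected")
```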
After the scheme is adopted, the invention has the following beneficial effects:
according to the invention, a training method of the DreamBooth is adopted, and the rare tag identifier is set, so that the Stable distribution model can learn the rare characteristics of rare defect training samples in training, thereby enriching the database of AI training, and achieving the purpose of outputting more various industrial defect pictures.
In short, the invention applies the Stable Diffusion model to the pictures of industrial defects, artificially generates rare defects in industrial production, trains the model by using a small number of abnormal samples (rare defect training samples), trains the Stable Diffusion model conforming to the industrial environment, and generates rare defect samples to enrich the AI training database.
Drawings
In order to illustrate the embodiments of the invention or the technical solutions of the prior art more clearly, the drawings used in the description of the embodiments or the prior art are briefly introduced below. It is obvious that the drawings described below show only some embodiments of the invention, and that a person skilled in the art can obtain other variants from these drawings without inventive effort.
FIG. 1 is a schematic diagram of the Stable Diffusion architecture;
FIG. 2 is a diagram of the UNet structure;
FIG. 3 is a schematic diagram of the DreamBooth architecture;
FIG. 4 is a flowchart of the method for generating defects based on DreamBooth fine-tuning of a Stable Diffusion model according to the present invention;
FIG. 5 and FIG. 6 are defect pictures generated by the method for generating defects based on DreamBooth fine-tuning of a Stable Diffusion model;
FIG. 7 and FIG. 8 are real defect pictures acquired in production.
Detailed Description
The technical solutions of the embodiments of the present invention will be clearly described below with reference to the drawings in the embodiments of the present invention, and it is apparent that the described embodiments are only some embodiments of the present invention, not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
As shown in FIG. 4, an embodiment of the invention provides a method for generating defects based on DreamBooth fine-tuning of a Stable Diffusion model, which specifically comprises the following steps:
S1, collecting rare defect training samples for training and storing them in the same folder; the rare defect training samples are industrial defect pictures with rare characteristics and comprise defect pictures of the same industrial product taken from multiple angles; practical tests show that the more angles and pictures are selected, the better the generated defect pictures are;
s2, configuring training parameters of a DreamBooth, setting rare tag identifiers associated with rare defect training samples as tags for training, specifically setting new defect description vocabulary and embedding the new defect description vocabulary into a text coding space as the rare tag identifiers; the method is characterized in that the DreamBooth uses fine-turning of a small amount of entity object images, namely fine-turning is also called fine tuning as a common training method, which means that the learning rate is reduced, and a model is weakly adjusted, so that the function of truly recovering a real object in an image is realized, the original Stable dispersion model can memorize and retain the image entity, the main body characteristics and even the theme style of the entity in the original image in a text are identified, and the method is a new text-to-image 'individuation' (which can adapt to the specific image generation requirement of a user) Diffusion model; rare tag identifiers (Rare-token Identifiers) are used to tell the model that it is hoped to learn some new complementary special information, then the question is what word or words should be used to represent its specificity? If the words such as "Special" or "unique" are not good because the Stable Diffuse model has seen the words, training the Stable Diffuse model to associate the words with your personalized new target requires learning the tag vocabulary such as "Special" and "unique" independent of the existing target, i.e., learning to decouple the old target and re-couple the new target. Rare tags are thus set as identifiers for new targets and such tags are embedded in the text encoding space, thereby avoiding interference with old tags that are originally present.
S3, training a Stable Diffusion model with the defect training samples and the DreamBooth configured with the rare tag identifier, so that the Stable Diffusion model learns, during training, the rare features of the rare defect training samples, the rare features being associated with the rare tag identifier;
S4, inputting the rare tag identifier together with a prompt text of the category to which the target belongs into the trained Stable Diffusion model to obtain a defect picture with the rare features (see the generation sketch below).
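The token-embedding sketch referenced in step S2: a new rare token is registered with the tokenizer, and the text encoder's embedding table is grown so the token receives its own learnable vector in the text encoding space. The token string "sks", the prompt wording, and the model identifiers are illustrative assumptions, shown with the Hugging Face transformers library:

```python
from transformers import CLIPTokenizer, CLIPTextModel

tokenizer = CLIPTokenizer.from_pretrained(
    "runwayml/stable-diffusion-v1-5", subfolder="tokenizer"
)
text_encoder = CLIPTextModel.from_pretrained(
    "runwayml/stable-diffusion-v1-5", subfolder="text_encoder"
)

# Register the rare tag identifier so it cannot collide with existing words.
num_added = tokenizer.add_tokens(["sks"])
# Grow the embedding table: the new token gets a learnable vector that the
# DreamBooth fine-tuning couples to the rare defect features.
text_encoder.resize_token_embeddings(len(tokenizer))

instance_prompt = "a photo of sks weld defect"  # rare tag + class description
```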
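The generation sketch referenced in step S4: the fine-tuned pipeline is prompted with the rare tag identifier plus the category text. The model path, prompt, and sampler settings are illustrative assumptions:

```python
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "path/to/finetuned-dreambooth-model", torch_dtype=torch.float16
).to("cuda")

prompt = "a photo of sks weld defect"  # rare tag identifier + category prompt
image = pipe(prompt, num_inference_steps=50, guidance_scale=7.5).images[0]
image.save("generated_rare_defect.png")  # a defect picture with rare features
```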
As shown in FIG. 5 and FIG. 6, after fine-tuning, this embodiment uses the Stable Diffusion model to generate defect pictures (taking industrial weld defects as the example) that are very close to the real defect pictures shown in FIG. 7 and FIG. 8; the invention thus realizes the artificial generation of richer industrial defect pictures based on the Stable Diffusion model.
In the description of the present specification, reference to the terms "one embodiment," "some embodiments," "examples," "particular examples," or "an alternative embodiment," etc., means that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the invention. In this specification, schematic representations of the above terms do not necessarily refer to the same embodiments or examples. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples.
The above-described embodiments do not limit the scope of the present invention. Any modifications, equivalent substitutions and improvements made within the spirit and principles of the above embodiments should be included in the scope of the present invention.

Claims (4)

1. A method for generating defects based on a DreamBooth fine-tuning Stable Diffusion model is characterized by comprising the following steps:
S1, collecting rare defect training samples for training, wherein the rare defect training samples are industrial defect pictures with rare characteristics;
S2, configuring the training parameters of DreamBooth and setting a rare tag identifier associated with the rare defect training samples as the tag for training;
S3, training a Stable Diffusion model with the defect training samples and the DreamBooth configured with the rare tag identifier, so that the Stable Diffusion model learns, during training, the rare features of the rare defect training samples, the rare features being associated with the rare tag identifier;
S4, inputting the rare tag identifier together with a prompt text of the category to which the target belongs into the trained Stable Diffusion model to obtain a defect picture with the rare features.
2. The method for generating defects based on a DreamBooth fine-tuning Stable Diffusion model as claimed in claim 1, wherein the configuring of the training parameters of DreamBooth and the setting of the rare tag identifier associated with the rare defect training samples as the tag for training comprises: setting a new defect description vocabulary and embedding it into the text encoding space as the rare tag identifier.
3. The method for generating defects based on a DreamBooth fine-tuning Stable Diffusion model as claimed in claim 1, wherein the industrial defect pictures with rare characteristics comprise defect pictures of the same industrial product taken from multiple angles.
4. The method for generating defects based on a DreamBooth fine-tuning Stable Diffusion model as claimed in claim 1, wherein the collected rare defect training samples for training are stored in the same folder.
CN202310949827.6A 2023-07-31 2023-07-31 Method for generating defects based on DreamBooth fine-tuning Stable Diffusion model Pending CN116862902A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310949827.6A CN116862902A (en) 2023-07-31 2023-07-31 Method for generating defects based on DreamBooth fine-tuning Stable Diffusion model

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310949827.6A CN116862902A (en) 2023-07-31 2023-07-31 Method for generating defects based on DreamBooth fine-tuning Stable Diffusion model

Publications (1)

Publication Number Publication Date
CN116862902A true CN116862902A (en) 2023-10-10

Family

ID=88232276

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310949827.6A Pending CN116862902A (en) 2023-07-31 2023-07-31 Method for generating defects based on DreamBooth fine-tuning Stable Diffusion model

Country Status (1)

Country Link
CN (1) CN116862902A (en)

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117058490A (en) * 2023-10-12 2023-11-14 成都数智创新精益科技有限公司 Model training method, defect image generation method and related devices
CN117456039A (en) * 2023-12-25 2024-01-26 深圳墨世科技有限公司 AIGC magic head portrait generation method, device and equipment based on joint training
CN117456039B (en) * 2023-12-25 2024-02-27 深圳墨世科技有限公司 AIGC magic head portrait generation method, device and equipment based on joint training
CN117649351A (en) * 2024-01-30 2024-03-05 武汉大学 Diffusion model-based industrial defect image simulation method and device
CN117649351B (en) * 2024-01-30 2024-04-19 武汉大学 Diffusion model-based industrial defect image simulation method and device

Similar Documents

Publication Publication Date Title
CN116862902A (en) Method for generating defects based on DreamBooth fine-tuning Stable Diffusion model
CN112927712A (en) Video generation method and device and electronic equipment
CN113901894A (en) Video generation method, device, server and storage medium
CN112420014A (en) Virtual face construction method and device, computer equipment and computer readable medium
CN111666831A (en) Decoupling representation learning-based speaking face video generation method
CN108962216A (en) A kind of processing method and processing device, equipment and the storage medium of video of speaking
CN115330912B (en) Training method for generating human face speaking video based on audio and image driving
CN117173504A (en) Training method, training device, training equipment and training storage medium for text-generated graph model
CN115050087B (en) Method and device for decoupling identity and expression of key points of human face
CN113077537A (en) Video generation method, storage medium and equipment
CN115761075A (en) Face image generation method, device, equipment, medium and product
Wang et al. Integrated speech and gesture synthesis
CN116561265A (en) Personalized dialogue generation method, model training method and device
Elgaar et al. Multi-speaker and multi-domain emotional voice conversion using factorized hierarchical variational autoencoder
CN114283783A (en) Speech synthesis method, model training method, device and storage medium
CN115984933A (en) Training method of human face animation model, and voice data processing method and device
CN117522697A (en) Face image generation method, face image generation system and model training method
CN111653270A (en) Voice processing method and device, computer readable storage medium and electronic equipment
CN113409803B (en) Voice signal processing method, device, storage medium and equipment
Filntisis et al. Video-realistic expressive audio-visual speech synthesis for the Greek language
CN113178200A (en) Voice conversion method, device, server and storage medium
CN114155321B (en) Face animation generation method based on self-supervision and mixed density network
CN114360491B (en) Speech synthesis method, device, electronic equipment and computer readable storage medium
CN116074574A (en) Video processing method, device, equipment and storage medium
CN115273856A (en) Voice recognition method and device, electronic equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination