CN116862902A - Method for generating defects based on DreamBooth fine-tuning Stable Diffusion model - Google Patents

Method for generating defects based on DreamBooth fine-tuning Stable Diffusion model

Info

Publication number
CN116862902A
CN116862902A (application CN202310949827.6A)
Authority
CN
China
Prior art keywords
rare
training
defect
model
stable
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310949827.6A
Other languages
Chinese (zh)
Inventor
陈宇
翁浩东
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Xiamen Weitu Software Technology Co ltd
Original Assignee
Xiamen Weitu Software Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Xiamen Weitu Software Technology Co ltd filed Critical Xiamen Weitu Software Technology Co ltd
Priority to CN202310949827.6A priority Critical patent/CN116862902A/en
Publication of CN116862902A publication Critical patent/CN116862902A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 7/00 Image analysis
    • G06T 7/0002 Inspection of images, e.g. flaw detection
    • G06T 7/0004 Industrial image inspection
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 Arrangements for image or video recognition or understanding
    • G06V 10/40 Extraction of image or video features
    • G06V 10/44 Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 Arrangements for image or video recognition or understanding
    • G06V 10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V 10/77 Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V 10/774 Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02P CLIMATE CHANGE MITIGATION TECHNOLOGIES IN THE PRODUCTION OR PROCESSING OF GOODS
    • Y02P 90/00 Enabling technologies with a potential contribution to greenhouse gas [GHG] emissions mitigation
    • Y02P 90/30 Computing systems specially adapted for manufacturing

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Artificial Intelligence (AREA)
  • Health & Medical Sciences (AREA)
  • Quality & Reliability (AREA)
  • Computing Systems (AREA)
  • Databases & Information Systems (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Medical Informatics (AREA)
  • Software Systems (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a method for generating defects based on DreamBooth fine-tuning of a Stable Diffusion model, which comprises the following steps: collecting rare defect training samples for training; configuring the training parameters of DreamBooth and setting a rare tag identifier associated with the rare defect training samples as the tag for training; training a Stable Diffusion model with the defect training samples and the DreamBooth configured with the rare tag identifier, so that the Stable Diffusion model learns, during training, the rare features of the rare defect training samples, the rare features being associated with the rare tag identifier; and inputting the rare tag identifier together with a prompt text of the category to which the target belongs into the trained Stable Diffusion model to obtain defect pictures with the rare features. The invention applies the Stable Diffusion model to industrial defect pictures to artificially generate the rare defects found in industrial production; the model can be trained with only a small number of abnormal samples, yielding a Stable Diffusion model suited to the industrial environment, and the generated rare defect samples enrich the database for AI training.

Description

Method for generating defects based on DreamBooth fine-tuning Stable Diffusion model
Technical Field
The invention relates to the technical field of industrial product defect detection, in particular to a method for generating defects based on a DreamBooth fine-tuning Stable Diffusion model.
Background
In intelligent quality-inspection tasks for industrial defects, AI models are difficult to train because defect data are scarce.
The Stable Diffusion model is a latent-based diffusion model (a large text-to-image model, namely a Latent Diffusion Model from text to image) that realizes image generation from text by introducing a text condition into a UNet. (UNet is in essence a fully convolutional neural network; its name comes from its U-shaped overall architecture. It was originally created to solve semantic segmentation of medical images, but its later development has shown it to be an all-round player in semantic segmentation tasks and an example of an excellent network architecture.) The core of Stable Diffusion derives from latent diffusion: a conventional diffusion model generates in pixel space, whereas latent diffusion is a latent-based generative model that uses an autoencoder (AE) to compress the image into a latent (hidden) space, runs the diffusion model to generate the latents of the image, and finally obtains the generated image through the autoencoder's decoder module.
The advantage of the latent-based diffusion model is its computational efficiency: the latent space of an image is smaller than its pixel space, and this is also the core advantage of Stable Diffusion. Text-to-image models have large parameter counts, so pixel-based methods are computationally limited to generating images of size 64x64, as in OpenAI's DALL-E 2 and Google's Imagen, which then raise the resolution to 256x256 and 1024x1024 with super-resolution models; latent-based Stable Diffusion, operating in latent space, can directly generate images at 256x256, 512x512, or even higher resolution.
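To make this latent round-trip concrete, the following is a minimal sketch using the Hugging Face diffusers library (the model identifier, the 0.18215 scaling factor, and the tensor shapes follow common Stable Diffusion v1 practice and are assumptions beyond the patent text):

```python
import torch
from diffusers import AutoencoderKL

# Load only the VAE component of a Stable Diffusion v1 checkpoint.
vae = AutoencoderKL.from_pretrained(
    "runwayml/stable-diffusion-v1-5", subfolder="vae"
)

image = torch.randn(1, 3, 512, 512)  # stand-in for a normalized RGB image
with torch.no_grad():
    # Encoder: 1x3x512x512 pixels -> 1x4x64x64 latents (48x fewer values).
    latents = vae.encode(image).latent_dist.sample() * 0.18215
    # Decoder: latents -> reconstructed image, as in the generation step.
    recon = vae.decode(latents / 0.18215).sample

print(latents.shape, recon.shape)  # diffusion runs in the small latent space
```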
As shown in FIG. 1, the Stable Diffusion model mainly comprises three components:
autoencoder: an encoder compresses an image into the latent space, and a decoder decodes a latent back into an image; an AutoencoderKL variational autoencoder serves as the structure of the encoder and decoder;
CLIP text encoder: a FrozenCLIPEmbedder text encoder extracts text embeddings from the input text and feeds them into the diffusion model's UNet as a condition through cross-attention (an attention mechanism);
UNet: as shown in FIG. 2, UNet is the body of the diffusion model and realizes image generation under text guidance. Noise of varying strengths is applied to the latent vector of an input image; the noisy latent is fed into the UNet, which outputs an estimate of the noise; the estimate is compared with the true-noise label to compute the loss (the simplified form of the KL-divergence objective), and the UNet parameters are updated by backpropagation. With the text vector context introduced, the UNet conditions on it during training and uses the attention mechanism to better steer image generation in the direction of the text (a training-step sketch follows this list).
Although the Stable Diffusion model shows dramatic effects once trained, in actual use one needs it to generate personalized new results, such as pictures of the user or of a pet at home. Since the Stable Diffusion model never saw these pictures during training, the original model cannot achieve this. The Stable Diffusion model supports rich text-to-image and image-to-image generation scenes, but when a specific real object must appear in the image, such as a particular product or defect in industrial production, even the most advanced text-to-image models struggle to preserve its key visual characteristics; that is, they lack the ability to imitate or reproduce the appearance of a subject from a given reference set, and the expressiveness of the model's output domain is limited. The original Stable Diffusion cannot stably output a specified object: each generation is like rolling dice or drawing cards, with no relation between the subjects of successive generations, so the Stable Diffusion model needs to be fine-tuned.
As shown in FIG. 3, DreamBooth is a personalized image-generation technique from Google and also a model-training algorithm: by adding Prior Preservation Loss (a prior loss) to the original image-generation network and training on just a few pictures, it can resample the latent space and output a specified object. This personalized, customized output has made DreamBooth one of the most popular features in the current AI image-generation field.
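A sketch of the prior-preservation objective follows: the usual denoising loss on the few personalization images is combined with a weighted loss on class images generated by the frozen original model, which is what keeps the latent space from collapsing onto the new subject. The function and variable names and the default weight are illustrative assumptions:

```python
import torch
import torch.nn.functional as F

def dreambooth_loss(
    noise_pred: torch.Tensor,   # UNet prediction on a personalization image
    noise: torch.Tensor,        # true noise added to that image
    prior_pred: torch.Tensor,   # UNet prediction on a generated class image
    prior_noise: torch.Tensor,  # true noise added to the class image
    prior_weight: float = 1.0,  # strength of the Prior Preservation Loss
) -> torch.Tensor:
    instance_loss = F.mse_loss(noise_pred, noise)     # fit the new subject
    prior_loss = F.mse_loss(prior_pred, prior_noise)  # retain the class prior
    return instance_loss + prior_weight * prior_loss
```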
In industrial production, most products are normal, but as production volume grows, anomalous defect pictures inevitably appear. Unlike common defects, these anomalous defects are very rare and often mutually dissimilar, which brings great challenges and difficulty to defect-detection schemes based on deep learning.
Disclosure of Invention
The invention aims to provide a method for generating defects based on DreamBooth fine-tuning of a Stable Diffusion model, which uses DreamBooth to train and fine-tune the Stable Diffusion model so as to generate industrial defect pictures that enrich sample diversity for AI model training.
In order to achieve the technical purpose, the invention adopts the following technical scheme:
a method for generating defects based on a DreamBooth fine-tuning Stable Diffusion model comprises the following steps:
S1, collecting rare defect training samples for training, wherein the rare defect training samples are industrial defect pictures with rare characteristics;
S2, configuring the training parameters of DreamBooth and setting a rare tag identifier associated with the rare defect training samples as the tag for training;
S3, training a Stable Diffusion model with the defect training samples and the DreamBooth configured with the rare tag identifier, so that the Stable Diffusion model learns, during training, the rare features of the rare defect training samples, the rare features being associated with the rare tag identifier;
S4, inputting the rare tag identifier together with a prompt text of the category to which the target belongs into the trained Stable Diffusion model to obtain a defect picture with the rare features.
Further, the configuring of the training parameters of DreamBooth and the setting of the rare tag identifier associated with the rare defect training samples as the tag for training includes: setting a new defect description vocabulary and embedding it into the text encoding space as the rare tag identifier.
Further, the industrial defect pictures with rare characteristics comprise defect pictures of the same industrial product taken from multiple angles.
Further, the collected rare defect training samples for training are stored in the same folder, as sketched below.
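For illustration, a minimal sketch of this data layout follows; the folder path and file pattern are assumptions:

```python
from pathlib import Path
from PIL import Image

# All rare defect training samples live in one folder, covering the same
# industrial product from multiple angles.
instance_dir = Path("data/rare_defects")
images = [Image.open(p).convert("RGB") for p in sorted(instance_dir.glob("*.png"))]
print(f"{len(images)} rare defect training samples collected")
```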
After the scheme is adopted, the invention has the following beneficial effects:
according to the invention, a training method of the DreamBooth is adopted, and the rare tag identifier is set, so that the Stable distribution model can learn the rare characteristics of rare defect training samples in training, thereby enriching the database of AI training, and achieving the purpose of outputting more various industrial defect pictures.
In short, the invention applies the Stable Diffusion model to the pictures of industrial defects, artificially generates rare defects in industrial production, trains the model by using a small number of abnormal samples (rare defect training samples), trains the Stable Diffusion model conforming to the industrial environment, and generates rare defect samples to enrich the AI training database.
Drawings
In order to illustrate the embodiments of the invention or the technical solutions of the prior art more clearly, the drawings used in the description of the embodiments or the prior art are briefly introduced below. It is obvious that the drawings described below show only some embodiments of the invention, and that a person skilled in the art can obtain other variants from these drawings without inventive effort.
FIG. 1 is a schematic diagram of the Stable Diffusion architecture;
FIG. 2 is a diagram of the UNet structure;
FIG. 3 is a schematic diagram of the DreamBooth architecture;
FIG. 4 is a flowchart of the method for generating defects based on DreamBooth fine-tuning of a Stable Diffusion model according to the present invention;
FIG. 5 and FIG. 6 are defect pictures generated by the method for generating defects based on DreamBooth fine-tuning of a Stable Diffusion model;
FIG. 7 and FIG. 8 are real defect pictures acquired in production.
Detailed Description
The technical solutions of the embodiments of the present invention will be clearly described below with reference to the drawings in the embodiments of the present invention, and it is apparent that the described embodiments are only some embodiments of the present invention, not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
As shown in FIG. 4, an embodiment of the invention provides a method for generating defects based on DreamBooth fine-tuning of a Stable Diffusion model, which specifically comprises the following steps:
S1, collecting rare defect training samples for training and storing them in the same folder; the rare defect training samples are industrial defect pictures with rare characteristics and comprise defect pictures of the same industrial product taken from multiple angles; practical tests show that the more angles and pictures are selected, the better the generated defect pictures are;
s2, configuring training parameters of a DreamBooth, setting rare tag identifiers associated with rare defect training samples as tags for training, specifically setting new defect description vocabulary and embedding the new defect description vocabulary into a text coding space as the rare tag identifiers; the method is characterized in that the DreamBooth uses fine-turning of a small amount of entity object images, namely fine-turning is also called fine tuning as a common training method, which means that the learning rate is reduced, and a model is weakly adjusted, so that the function of truly recovering a real object in an image is realized, the original Stable dispersion model can memorize and retain the image entity, the main body characteristics and even the theme style of the entity in the original image in a text are identified, and the method is a new text-to-image 'individuation' (which can adapt to the specific image generation requirement of a user) Diffusion model; rare tag identifiers (Rare-token Identifiers) are used to tell the model that it is hoped to learn some new complementary special information, then the question is what word or words should be used to represent its specificity? If the words such as "Special" or "unique" are not good because the Stable Diffuse model has seen the words, training the Stable Diffuse model to associate the words with your personalized new target requires learning the tag vocabulary such as "Special" and "unique" independent of the existing target, i.e., learning to decouple the old target and re-couple the new target. Rare tags are thus set as identifiers for new targets and such tags are embedded in the text encoding space, thereby avoiding interference with old tags that are originally present.
S3, training a Stable Diffusion model with the defect training samples and the DreamBooth configured with the rare tag identifier, so that the Stable Diffusion model learns, during training, the rare features of the rare defect training samples, the rare features being associated with the rare tag identifier;
S4, inputting the rare tag identifier together with a prompt text of the category to which the target belongs into the trained Stable Diffusion model to obtain a defect picture with the rare features (see the generation sketch below).
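The token-embedding sketch referenced in step S2: a new rare token is registered with the tokenizer, and the text encoder's embedding table is grown so the token receives its own learnable vector in the text encoding space. The token string "sks", the prompt wording, and the model identifiers are illustrative assumptions, shown with the Hugging Face transformers library:

```python
from transformers import CLIPTokenizer, CLIPTextModel

tokenizer = CLIPTokenizer.from_pretrained(
    "runwayml/stable-diffusion-v1-5", subfolder="tokenizer"
)
text_encoder = CLIPTextModel.from_pretrained(
    "runwayml/stable-diffusion-v1-5", subfolder="text_encoder"
)

# Register the rare tag identifier so it cannot collide with existing words.
num_added = tokenizer.add_tokens(["sks"])
# Grow the embedding table: the new token gets a learnable vector that the
# DreamBooth fine-tuning couples to the rare defect features.
text_encoder.resize_token_embeddings(len(tokenizer))

instance_prompt = "a photo of sks weld defect"  # rare tag + class description
```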
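The generation sketch referenced in step S4: the fine-tuned pipeline is prompted with the rare tag identifier plus the category text. The model path, prompt, and sampler settings are illustrative assumptions:

```python
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "path/to/finetuned-dreambooth-model", torch_dtype=torch.float16
).to("cuda")

prompt = "a photo of sks weld defect"  # rare tag identifier + category prompt
image = pipe(prompt, num_inference_steps=50, guidance_scale=7.5).images[0]
image.save("generated_rare_defect.png")  # a defect picture with rare features
```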
As shown in FIG. 5 and FIG. 6, after fine-tuning, this embodiment uses the Stable Diffusion model to generate defect pictures (taking industrial weld defects as the example) that are very close to the real defect pictures shown in FIG. 7 and FIG. 8; the invention thus realizes the artificial generation of richer industrial defect pictures based on the Stable Diffusion model.
In the description of the present specification, reference to the terms "one embodiment," "some embodiments," "examples," "particular examples," or "an alternative embodiment," etc., means that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the invention. In this specification, schematic representations of the above terms do not necessarily refer to the same embodiments or examples. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples.
The above-described embodiments do not limit the scope of the present invention. Any modifications, equivalent substitutions and improvements made within the spirit and principles of the above embodiments should be included in the scope of the present invention.

Claims (4)

1. A method for generating defects based on a DreamBooth fine-tuning Stable Diffusion model is characterized by comprising the following steps:
S1, collecting rare defect training samples for training, wherein the rare defect training samples are industrial defect pictures with rare characteristics;
S2, configuring the training parameters of DreamBooth and setting a rare tag identifier associated with the rare defect training samples as the tag for training;
S3, training a Stable Diffusion model with the defect training samples and the DreamBooth configured with the rare tag identifier, so that the Stable Diffusion model learns, during training, the rare features of the rare defect training samples, the rare features being associated with the rare tag identifier;
S4, inputting the rare tag identifier together with a prompt text of the category to which the target belongs into the trained Stable Diffusion model to obtain a defect picture with the rare features.
2. The method for generating defects based on a DreamBooth fine-tuning Stable Diffusion model as claimed in claim 1, wherein the configuring of the training parameters of DreamBooth and the setting of the rare tag identifier associated with the rare defect training samples as the tag for training comprises: setting a new defect description vocabulary and embedding it into the text encoding space as the rare tag identifier.
3. The method for generating defects based on a DreamBooth fine-tuning Stable Diffusion model as claimed in claim 1, wherein the industrial defect pictures with rare characteristics comprise defect pictures of the same industrial product taken from multiple angles.
4. The method for generating defects based on a DreamBooth fine-tuning Stable Diffusion model as claimed in claim 1, wherein the collected rare defect training samples for training are stored in the same folder.
CN202310949827.6A 2023-07-31 2023-07-31 Method for generating defects based on DreamBooth fine-tuning Stable Diffusion model Pending CN116862902A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310949827.6A CN116862902A (en) 2023-07-31 2023-07-31 Method for generating defects based on DreamBooth fine-tuning Stable Diffusion model

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310949827.6A CN116862902A (en) 2023-07-31 2023-07-31 Method for generating defects based on DreamBooth fine-tuning Stable Diffusion model

Publications (1)

Publication Number Publication Date
CN116862902A true CN116862902A (en) 2023-10-10

Family

ID=88232276

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310949827.6A Pending CN116862902A (en) 2023-07-31 2023-07-31 Method for generating defects based on DreamBooth fine-tuning Stable Diffusion model

Country Status (1)

Country Link
CN (1) CN116862902A (en)

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117058490A (en) * 2023-10-12 2023-11-14 成都数智创新精益科技有限公司 Model training method, defect image generation method and related devices
CN117456039A (en) * 2023-12-25 2024-01-26 深圳墨世科技有限公司 AIGC magic head portrait generation method, device and equipment based on joint training
CN117456039B (en) * 2023-12-25 2024-02-27 深圳墨世科技有限公司 AIGC magic head portrait generation method, device and equipment based on joint training
CN117649351A (en) * 2024-01-30 2024-03-05 武汉大学 Diffusion model-based industrial defect image simulation method and device
CN117649351B (en) * 2024-01-30 2024-04-19 武汉大学 Diffusion model-based industrial defect image simulation method and device

Similar Documents

Publication Publication Date Title
CN116862902A (en) Method for generating defects based on DreamBooth fine-tuning Stable Diffusion model
CN112927712A (en) Video generation method and device and electronic equipment
CN113901894A (en) Video generation method, device, server and storage medium
CN112420014A (en) Virtual face construction method and device, computer equipment and computer readable medium
CN111666831A (en) Decoupling representation learning-based speaking face video generation method
CN108962216A (en) A kind of processing method and processing device, equipment and the storage medium of video of speaking
CN115330912B (en) Training method for generating human face speaking video based on audio and image driving
CN117173504A (en) Training method, training device, training equipment and training storage medium for text-generated graph model
CN115050087B (en) Method and device for decoupling identity and expression of key points of human face
CN113077537A (en) Video generation method, storage medium and equipment
CN115761075A (en) Face image generation method, device, equipment, medium and product
Wang et al. Integrated speech and gesture synthesis
CN116561265A (en) Personalized dialogue generation method, model training method and device
Elgaar et al. Multi-speaker and multi-domain emotional voice conversion using factorized hierarchical variational autoencoder
CN114283783A (en) Speech synthesis method, model training method, device and storage medium
CN115984933A (en) Training method of human face animation model, and voice data processing method and device
CN117522697A (en) Face image generation method, face image generation system and model training method
CN111653270A (en) Voice processing method and device, computer readable storage medium and electronic equipment
CN113409803B (en) Voice signal processing method, device, storage medium and equipment
Filntisis et al. Video-realistic expressive audio-visual speech synthesis for the Greek language
CN113178200A (en) Voice conversion method, device, server and storage medium
CN114155321B (en) Face animation generation method based on self-supervision and mixed density network
CN114360491B (en) Speech synthesis method, device, electronic equipment and computer readable storage medium
CN116074574A (en) Video processing method, device, equipment and storage medium
CN115273856A (en) Voice recognition method and device, electronic equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination