CN116630457A - Training method and device for picture generation model, electronic equipment and storage medium - Google Patents


Info

Publication number
CN116630457A
CN116630457A
Authority
CN
China
Prior art keywords
image feature
noise
feature vector
diffusion
value
Prior art date
Legal status
Pending
Application number
CN202310610571.6A
Other languages
Chinese (zh)
Inventor
舒畅
陈又新
Current Assignee
Ping An Technology Shenzhen Co Ltd
Original Assignee
Ping An Technology Shenzhen Co Ltd
Priority date
Filing date
Publication date
Application filed by Ping An Technology Shenzhen Co Ltd
Priority to CN202310610571.6A
Publication of CN116630457A
Legal status: Pending


Classifications

    • G06T 11/001: Texturing; Colouring; Generation of texture or colour (under G06T 11/00, 2D [Two Dimensional] image generation)
    • G06N 3/094: Adversarial learning (under G06N 3/02, Neural networks; G06N 3/08, Learning methods)
    • G06V 10/774: Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting
    • G06T 2207/20081: Training; Learning
    • G06T 2207/20084: Artificial neural networks [ANN]
    • Y02T 10/40: Engine management systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • Software Systems (AREA)
  • Artificial Intelligence (AREA)
  • Computing Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Data Mining & Analysis (AREA)
  • Computational Linguistics (AREA)
  • Biophysics (AREA)
  • Molecular Biology (AREA)
  • Biomedical Technology (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Databases & Information Systems (AREA)
  • Medical Informatics (AREA)
  • Multimedia (AREA)
  • Image Processing (AREA)

Abstract

The invention relates to the technical field of artificial intelligence, applies deep learning technology, can be applied to the medical field, and helps improve the accuracy of medical image generation. It relates to a training method and device for a picture generation model, electronic equipment and a storage medium. The method comprises the following steps: acquiring training data; extracting image feature vectors of a training set and a positive sample set; performing t layers of diffusion noise adding on the extracted image feature vectors through a diffusion noise adding module, and obtaining a noise distribution value and a to-be-determined image feature vector after each layer of diffusion; calculating the noise loss value of each layer of diffusion through a loss function, calculating a contrast loss value, and summing them to obtain a total loss value; and adjusting parameters of the picture generation model according to the total loss value to obtain the trained standard picture generation model. According to the invention, on the basis of the diffusion model, the feature vector x generated at each diffusion step is added to contrastive learning and the contrast loss is calculated, which benefits both the convergence of the model and the quality of the generated pictures.

Description

Training method and device for picture generation model, electronic equipment and storage medium
Technical Field
The invention relates to the technical field of artificial intelligence and applies deep learning technology; it provides a training method and a training device for a picture generation model, electronic equipment and a storage medium.
Background
Current methods for generating medical image pictures in the medical field generally use a generative adversarial network or a variational autoencoder. A generative adversarial network consists of a generator and a discriminator: the generator produces a picture, the discriminator judges whether the picture is real or generated, and the loss gradient is then used to update the generator's parameters. The disadvantage of this network model is that it is difficult to converge. The variational autoencoder is a more traditional picture generation model; its training process encodes a picture and then decodes it back into the original picture. The process is relatively simple, but the generated pictures lack diversity.
Disclosure of Invention
In view of the foregoing, it is necessary to provide a training method for a picture generation model that is mainly aimed at improving picture generation quality.
In order to achieve the above object, the present invention provides a training method of a picture generation model, which is applied to a picture generation model training device, the device includes a preprocessing module, an image feature encoding module, a diffusion noise adding module, a noise loss value module, a contrast loss value module and a total loss value module, and is characterized in that the method includes:
S1, acquiring training data from a database, wherein the training data comprise a training set, a positive sample set, a first real noise distribution value corresponding to the training set and a second real noise distribution value corresponding to the positive sample set, and the positive sample set is obtained by performing data enhancement on the training set;
S2, extracting image features of the training set and the positive sample set through an image feature encoding module to obtain a first image feature vector of the training set and a second image feature vector of the positive sample set;
S3, performing t-layer diffusion noise adding on the first image feature vector through a diffusion noise adding module, obtaining a first noise distribution value and a first to-be-determined image feature vector at each layer, and similarly obtaining a second noise distribution value and a second to-be-determined image feature vector for the second image feature vector;
S4, calculating a first noise loss value between the first noise distribution value and the first real noise distribution value through a preset loss function, and similarly calculating a second noise loss value between the second noise distribution value and the second real noise distribution value;
S5, calculating a contrast loss value between the first to-be-determined image feature vector and the second to-be-determined image feature vector through a preset contrast loss function;
S6, adding the noise loss values and the contrast loss value to obtain a total loss value, and comparing the total loss value with a preset threshold value; if the total loss value is greater than or equal to the preset threshold value, updating the parameters of the picture generation model until the total loss value is smaller than the preset threshold value; if the total loss value is smaller than the preset threshold value, outputting the standard picture generation model.
Optionally, performing t layers of diffusion noise adding on the first image feature vector through the diffusion noise adding module, obtaining a first noise distribution value and a first to-be-determined image feature vector at each layer, and similarly obtaining a second noise distribution value and a second to-be-determined image feature vector for the second image feature vector, includes:
performing t-layer diffusion on the first image feature vector by using a diffusion noise adding module, wherein t is a positive integer greater than 1; adding noise into each diffusion layer, and calculating t first noise distribution values and t first image feature vectors to be determined after the first image feature vectors are diffused in each diffusion layer; and calculating t second noise distribution values and t second undetermined image feature vectors of the second image feature vectors after diffusion of each diffusion layer according to the calculation mode of the first noise distribution values.
Optionally, performing t-layer diffusion on the first image feature vector by using a diffusion noise adding module, wherein t is a positive integer greater than 1; adding noise into each diffusion layer, and calculating t first noise distribution values and t first image feature vectors to be determined after the first image feature vectors are diffused in each diffusion layer, wherein the method comprises the following steps:
and performing t-layer diffusion on the first image feature vector by using the following formula:
q(x_t | x_{t-1}) = N(x_t; sqrt(1 - β_t) · x_{t-1}, β_t · I)
wherein x is the input first to-be-determined image feature vector, t is the number of diffusion layers, β_t is a manually set value, I is the variance of the added Gaussian noise, q(x_t | x_{t-1}) is the noise distribution of the current picture x, and N(·) denotes the Gaussian distribution.
Optionally, the formula for performing t-layer diffusion on the first image feature vector (diffusing directly from layer 0 to layer t) may be:
α_t = 1 - β_t
ᾱ_t = α_1 · α_2 · … · α_t
q(x_t | x_0) = N(x_t; sqrt(ᾱ_t) · x_0, (1 - ᾱ_t) · I)
wherein x_0 is the first image feature vector, x_t is the first to-be-determined image feature vector after diffusing to layer t, N(·) denotes the Gaussian distribution, β_t is a preset parameter, I is the variance of the added Gaussian noise, α_t represents the proportion of signal retained at layer t, and ᾱ_t denotes the cumulative product of α_1 through α_t.
Optionally, calculating a first noise loss value between the first noise distribution value and the first real noise distribution value through a preset loss function, and similarly calculating a second noise loss value between the second noise distribution value and the second real noise distribution value, includes:
Calculating t loss values between t first noise distribution values and the first real noise distribution values of the first image feature vector after diffusion of each diffusion layer in the diffusion noise adding module through a preset loss function, and accumulating the t loss values to obtain a first noise loss value corresponding to the first image feature vector; and similarly, calculating to obtain a second noise loss value corresponding to the second image feature vector.
Optionally, the t loss values between the t first noise distribution values of the first image feature vector after diffusion at each diffusion layer in the diffusion noise adding module and the first real noise distribution value are calculated through a preset loss function, and the formula is as follows:
L_1 = Σ_t || ε̂_t - ε_t ||
wherein L_1 is the loss value, t is the number of the diffusion layer, ε̂_t is the first noise distribution value when the diffusion layer number is t, and ε_t is the first real noise distribution value when the diffusion layer number is t.
Optionally, the contrast loss value between the first to-be-determined image feature vector and the second to-be-determined image feature vector is calculated through a preset contrast loss function, and the formula is as follows:
L_q = -log [ exp(q · K⁺ / T) / Σ_i exp(q · K_i / T) ]
wherein q is the first to-be-determined image feature vector, K⁺ is the second to-be-determined image feature vector, K_i is the image feature vector of each sample, where all samples comprise all pictures of the training set and the corresponding positive sample pictures in the positive sample set, q · K is the inner product of the two vectors, and T is a temperature constant, which is a hyperparameter.
In addition, in order to achieve the above object, the present invention also provides a training device for a picture generation model, the device comprising:
the preprocessing module is used for acquiring training data from a database, wherein the training data comprises a training set, a positive sample set and a first real noise distribution value corresponding to the training set and a second real noise distribution value corresponding to the positive sample set, and the positive sample set is obtained by carrying out data enhancement on the training set;
the image feature coding module is used for extracting image features of the training set and the positive sample set through the image feature coding module to obtain a first image feature vector of the training set and a second image feature vector of the positive sample set;
the diffusion noise adding module is used for carrying out t layers of diffusion noise adding on the first image feature vector through the diffusion noise adding module, obtaining a first noise distribution value and a first image feature vector to be determined on each layer, and obtaining a second noise distribution value and a second image feature vector to be determined of the second image feature vector in a similar way;
the noise loss value module is used for calculating the first noise distribution value and the first real noise distribution value through a preset loss function to obtain a first noise loss value, and similarly calculating to obtain a second noise loss value of the second noise distribution value and the second real noise distribution value;
The contrast loss value module is used for calculating the first undetermined image feature vector and the second undetermined image feature vector through a preset contrast loss function to obtain a contrast loss value;
the total loss value module is used for adding the noise loss values and the contrast loss value to obtain a total loss value, and comparing the total loss value with a preset threshold value; if the total loss value is greater than or equal to the preset threshold value, updating the parameters of the picture generation model until the total loss value is smaller than the preset threshold value; if the total loss value is smaller than the preset threshold value, outputting the standard picture generation model.
In addition, to achieve the above object, the present invention also provides an electronic device including:
a memory storing at least one computer program; and
a processor that executes the program stored in the memory to implement the above training method of the picture generation model.
In addition, to achieve the above object, the present invention further provides a computer readable storage medium having at least one computer program stored therein, the at least one computer program being executed by a processor in an electronic device to implement the above-described training method of the picture generation model.
According to the embodiment of the invention, data enhancement is performed on the training pictures to generate positive sample pictures, and the positive sample pictures and the training pictures are trained contrastively: each undergoes t layers of diffusion noise adding, the contrast loss value and the noise-adding loss value accumulated over each diffusion layer are calculated, and finally the two are added to obtain the total loss value. When the total loss value is smaller than the preset threshold value, the standard picture generation model is output. Adding noise at each step of the diffusion noise adding module corrects the latent variables learned by the model and updates the state-transition probabilities, which reduces the influence of outliers on the model and improves picture generation quality. Calculating, through the noise loss function, the loss value between the noise distribution and the real noise distribution in the diffusion noise adding module imposes a similarity constraint on the picture after each step of diffusion noise adding, so that the diffusion model converges more easily.
Drawings
Fig. 1 is a flowchart of a training method of a picture generation model according to an embodiment of the present invention;
FIG. 2 is a functional block diagram of a training apparatus for a picture generation model according to an embodiment of the present invention;
fig. 3 is a schematic diagram of an internal structure of an electronic device for implementing a training method of a picture generation model according to an embodiment of the present invention;
The achievement of the objects, functional features and advantages of the present application will be further described with reference to the accompanying drawings, in conjunction with the embodiments.
Detailed Description
The present application will be described in further detail with reference to the drawings and examples, in order to make the objects, technical solutions and advantages of the present application more apparent. It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the scope of the application. All other embodiments, which can be made by those skilled in the art based on the embodiments of the application without making any inventive effort, are intended to be within the scope of the application.
It should be noted that the descriptions "first", "second", etc. in this disclosure are for descriptive purposes only and are not to be construed as indicating or implying relative importance or implying the number of the technical features indicated. Thus, a feature defined as "first" or "second" may explicitly or implicitly include at least one such feature. In addition, the technical solutions of the embodiments may be combined with each other, provided that the combination can be realized by those skilled in the art; when technical solutions are contradictory or cannot be realized, the combination should be considered absent and outside the scope of protection claimed in the present application.
Referring to fig. 1, which is a schematic flow chart of a training method of a picture generation model according to an embodiment of the present invention, in an embodiment of the present invention, the training method of the picture generation model includes steps S1 to S6 as follows:
S1, acquiring training data from a database, wherein the training data comprise a training set, a positive sample set, a first real noise distribution value corresponding to the training set and a second real noise distribution value corresponding to the positive sample set, and the positive sample set is obtained by performing data enhancement on the training set;
in the embodiment of the invention, the training set comprises training pictures, the positive sample set is a data set formed by the positive sample pictures obtained by carrying out data enhancement on the training pictures in the training set, the first real noise distribution represents the correct noise distribution of the training set, and the second real noise distribution represents the correct noise distribution of the positive sample set;
In the embodiment of the invention, image generation models are commonly used in the synthesis and conversion of medical images, and common picture generation methods generally use a generative adversarial network or a variational autoencoder. A generative adversarial network consists of a generator and a discriminator: the generator produces a picture, the discriminator judges whether the picture is real or generated, and the loss gradient is then used to update the generator's parameters; the disadvantage of this network model is that it is difficult to converge. The variational autoencoder is a more traditional picture generation model; its training process encodes a picture and then decodes it back into the original picture. The process is relatively simple, but the generated pictures lack diversity. Therefore, this embodiment generates pictures using a diffusion model based on a contrastive learning framework and adds a contrastive learning loss at each step of the diffusion model, which smooths outliers in the stochastic process during noise adding and improves picture generation quality. Meanwhile, a similarity constraint on the noised picture is added at each diffusion step, so that the diffusion model converges more easily;
In embodiments of the present invention, the shortage of annotated medical images is one of the challenges in the field of medical image computing. Without a sufficient number of training samples, deep-learning-based models are likely to suffer from overfitting. A common solution is to apply data enhancement to the training pictures, e.g. image rotation, cropping or resizing; by introducing more training samples, these methods help alleviate the overfitting problem;
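As a concrete illustration of such data enhancement, the following NumPy sketch derives a positive-sample view from a training picture (the exact transforms, sizes and function name are assumptions, not taken from the patent):

```python
import numpy as np

def make_positive_sample(img, rng):
    """Illustrative data enhancement for one training picture
    (hypothetical transforms: 90-degree rotation, flip, random crop)."""
    out = np.rot90(img, k=rng.integers(0, 4))        # random 90-degree rotation
    if rng.random() < 0.5:
        out = out[:, ::-1]                           # random horizontal flip
    h, w = out.shape[:2]
    top = rng.integers(0, h // 8 + 1)                # random crop to 7/8 of each side
    left = rng.integers(0, w // 8 + 1)
    return out[top:top + 7 * h // 8, left:left + 7 * w // 8]

rng = np.random.default_rng(0)
train_img = rng.random((64, 64))                     # stand-in for a medical image
positive_img = make_positive_sample(train_img, rng)  # one element of the positive sample set
```

Each call produces a different view, so the training set and the derived positive sample set form the contrastive pairs used in S1.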
the picture generation model includes, but is not limited to: the device comprises a preprocessing module, an image feature encoding module, a diffusion noise adding module, a noise loss value module, a contrast loss value module and a total loss value module.
S2, extracting image features of the training set and the positive sample set through an image feature encoding module to obtain a first image feature vector of the training set and a second image feature vector of the positive sample set;
s3, performing t-layer diffusion noise addition on the first image feature vector through a diffusion noise addition module, obtaining a first noise distribution value and a first image feature vector to be determined on each layer, and obtaining a second noise distribution value and a second image feature vector to be determined of the second image feature vector in a similar way;
In the embodiment of the present invention, t layers of diffusion noise adding is performed on the first image feature vector by a diffusion noise adding module, each layer obtains a first noise distribution value and a first image feature vector to be determined, and a second noise distribution value and a second image feature vector to be determined of a second image feature vector are obtained by the same method, including:
performing t-layer diffusion on the first image feature vector by using a diffusion noise adding module, wherein t is a positive integer greater than 1; adding noise into each diffusion layer, and calculating t first noise distribution values and t first image feature vectors to be determined after the first image feature vectors are diffused in each diffusion layer; calculating t second noise distribution values and t second undetermined image feature vectors of the second image feature vectors after diffusion of each diffusion layer according to the calculation mode of the first noise distribution values;
In the embodiment of the invention, in order to reduce the influence of outliers on the model, the first image feature vector and the second image feature vector are fed into the diffusion noise adding module for contrastive learning. At each diffusion step, Gaussian noise with variance I and a manually set value β are added, which corrects the latent variables learned by the model, updates the state-transition probabilities, and improves picture generation quality. The value of β increases in a linear interpolation trend and generally lies within the range [0.0001, 0.02]. For example, let the first image feature vector be x and the second image feature vector be y; x and y are input to the diffusion noise adding module as the initial image feature vectors x_0 and y_0. In the first diffusion step, Gaussian noise with variance I is added with the manually set value β = 0.0001, and the to-be-determined image feature vectors x_1, y_1 and the noise distributions of x_1, y_1 are generated through the formula;
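The linearly increasing β schedule and the first noising step can be sketched as follows (the total layer count T = 1000 and the feature dimension are assumed values; the patent only fixes the β range [0.0001, 0.02]):

```python
import numpy as np

T = 1000                                   # assumed total number of diffusion layers
betas = np.linspace(1e-4, 0.02, T)         # beta grows by linear interpolation in [0.0001, 0.02]

rng = np.random.default_rng(0)
x0 = rng.standard_normal(128)              # initial image feature vector x_0 (toy dimension)
eps = rng.standard_normal(128)             # Gaussian noise with variance I
# first diffusion step with beta = 0.0001, producing the pending vector x_1
x1 = np.sqrt(1.0 - betas[0]) * x0 + np.sqrt(betas[0]) * eps
```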
In one embodiment, the diffusion model diffuses from layer t-1 to layer t by the formula:
q(x_t | x_{t-1}) = N(x_t; sqrt(1 - β_t) · x_{t-1}, β_t · I)
wherein x is the input first to-be-determined image feature vector, t is the number of diffusion layers, β_t is a preset parameter, I is the variance of the added Gaussian noise, q(x_t | x_{t-1}) is the noise distribution of the current picture x, and N(·) denotes the Gaussian distribution;
In one embodiment, the formula by which the diffusion model diffuses from layer 0 to layer t is:
α_t = 1 - β_t
ᾱ_t = α_1 · α_2 · … · α_t
q(x_t | x_0) = N(x_t; sqrt(ᾱ_t) · x_0, (1 - ᾱ_t) · I)
wherein x_0 is the input first image feature vector, x_t is the first to-be-determined image feature vector diffused to layer t, N(·) denotes the Gaussian distribution, β_t is a preset parameter, I is the variance of the added Gaussian noise, α_t represents the proportion of signal retained at layer t, and ᾱ_t denotes the cumulative product of α_1 through α_t;
S4, calculating a first noise loss value between the first noise distribution value and the first real noise distribution value through a preset loss function, and similarly calculating a second noise loss value between the second noise distribution value and the second real noise distribution value;
In the embodiment of the present invention, calculating the first noise distribution value and the first real noise distribution value by a preset loss function to obtain a first noise loss value includes:
the t loss values between the t first noise distribution values of the first image feature vector after diffusion at each diffusion layer in the diffusion noise adding module and the first real noise distribution value are calculated through the preset loss function of the noise loss value module, and the t loss values are accumulated to obtain the first noise loss value corresponding to the first image feature vector; the second noise loss value corresponding to the second image feature vector is calculated similarly;
In the embodiment of the present invention, the formula of the preset loss function is:
L_1 = Σ_t || ε̂_t - ε_t ||
wherein L_1 is the loss value, t is the number of the diffusion layer, ε̂_t is the first noise distribution value when the diffusion layer number is t, and ε_t is the first real noise distribution value when the diffusion layer number is t;
The loss value between the noise distribution after each layer of diffusion noise adding in the diffusion noise adding module and the real noise distribution is calculated through the L_1 loss function, imposing a similarity constraint on the picture after each step of diffusion noise adding, so that the diffusion noise adding module converges more easily.
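The per-layer loss accumulation can be sketched as follows (the mean-absolute form is assumed from the "L1 loss function" wording, and the toy per-layer values are illustrative):

```python
import numpy as np

def noise_loss(pred_noise, true_noise):
    """Accumulate the t per-layer loss values: L1 = sum_t || eps_hat_t - eps_t ||
    (mean-absolute form assumed; one entry per diffusion layer)."""
    return float(sum(np.abs(p - e).mean() for p, e in zip(pred_noise, true_noise)))

t_layers = 4
pred = [np.full(8, 0.5) for _ in range(t_layers)]   # predicted noise per layer (toy values)
true = [np.zeros(8) for _ in range(t_layers)]       # real noise distribution values (toy)
l1 = noise_loss(pred, true)                         # 4 layers, each contributing 0.5
```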
In an embodiment, instead of accumulating the per-layer diffusion noise-adding loss values into the first noise loss value of the first image feature vector and the second noise loss value of the second image feature vector, each layer's loss value may, as soon as it is calculated, be used directly as a parameter for updating the noise adding in the diffusion noise adding module.
S5, calculating the loss values between the t first to-be-determined image feature vectors and the t second to-be-determined image feature vectors through the preset contrast loss function of the contrast loss value module, and accumulating all the loss values to obtain the contrast loss value;
In one embodiment, the contrast loss function is the InfoNCE loss function, and the formula is:
L_q = -log [ exp(q · K⁺ / T) / Σ_i exp(q · K_i / T) ]
wherein q is the first to-be-determined image feature vector, K⁺ is the second to-be-determined image feature vector, K_i is the image feature vector of each sample, where all samples comprise all pictures of the training set and the corresponding positive sample pictures of the positive sample set, q · K is the inner product of two vectors, and T is a temperature constant, which is a hyperparameter;
According to the application, on the basis of diffusion, the to-be-determined image feature vectors generated at each diffusion step are added to contrastive learning, and the diffusion noise at each step is constrained, so that the noise during diffusion is smoother, which benefits both the convergence of the model and the quality of the generated pictures.
S6, adding the noise loss values and the contrast loss value to obtain a total loss value, and comparing the total loss value with a preset threshold value; if the total loss value is greater than or equal to the preset threshold value, updating the parameters of the picture generation model until the total loss value is smaller than the preset threshold value; if the total loss value is smaller than the preset threshold value, outputting the standard picture generation model.
In the embodiment of the present invention, the formula of the total loss is L = L_1 + L_InfoNCE;
In an embodiment, the step of generating a picture by the standard picture generation model includes: feeding the noise distribution and the image feature vector obtained from diffusion into the diffusion noise adding module in reverse for t layers of noise reduction, and calculating the noise distribution and the noise-reduced image feature vector at each layer until the last layer of noise reduction is reached, at which point the standard picture generation model outputs the final picture;
The formula of the noise reduction step is as follows:
p(x_{t-1} | x_t) = N(x_{t-1}; μ_t, Σ_t)
wherein t is the number of the noise reduction layer, N(·) is the distribution of the noise, and μ_t and Σ_t are the mean and variance of this distribution.
According to the invention, data enhancement is performed on the training pictures to generate positive sample pictures, and the positive sample pictures and the training pictures are trained contrastively: each undergoes t layers of diffusion noise adding, the contrast loss value and the noise-adding loss value of each diffusion layer are calculated, and finally the two are added to obtain the total loss value. When the total loss value is smaller than the preset threshold value, the standard picture generation model is output. Adding noise at each step of the diffusion noise adding module corrects the latent variables learned by the model and updates the state-transition probabilities, which reduces the influence of outliers on the model and improves picture generation quality. Calculating, through the noise loss function, the loss value between the noise distribution and the real noise distribution in the diffusion noise adding module imposes a similarity constraint on the picture after each step of diffusion noise adding, so that the diffusion model converges more easily.
As shown in fig. 2, a functional block diagram of the training device of the picture generation model according to the present invention is shown.
The training device 100 for a picture generation model according to the present invention may be installed in an electronic device. Depending on the implemented functions, the training device of the picture generation model may include a data preprocessing module 101, an image encoding module 102, a diffusion noise adding module 103, a noise loss value module 104, a contrast loss value module 105, and a total loss value module 106. These modules, which may also be referred to as units, are a series of computer program segments that are stored in a memory of the electronic device and can be executed by a processor of the electronic device to perform fixed functions.
In the present embodiment, the functions concerning the respective modules/units are as follows:
the preprocessing module 101 is configured to obtain training data from a database, where the training data includes a training set, a positive sample set, and a first real noise distribution value corresponding to the training set and a second real noise distribution value corresponding to the positive sample set, where the positive sample set is obtained by performing data enhancement on the training set;
the image feature encoding module 102 is configured to perform image feature extraction on the training set and the positive sample set through the image feature encoding module, so as to obtain a first image feature vector of the training set and a second image feature vector of the positive sample set;
The diffusion noise adding module 103 is configured to perform t-layer diffusion noise adding on the first image feature vector through the diffusion noise adding module, obtain a first noise distribution value and a first image feature vector to be determined on each layer, and obtain a second noise distribution value and a second image feature vector to be determined of the second image feature vector in a similar manner;
the noise loss value module 104 is configured to calculate the first noise distribution value and the first real noise distribution value through a preset loss function to obtain a first noise loss value, and similarly calculate the second noise loss value of the second noise distribution value and the second real noise distribution value;
the contrast loss value module 105 is configured to calculate the first to-be-determined image feature vector and the second to-be-determined image feature vector by using a preset contrast loss function to obtain a contrast loss value;
the total loss value module 106 is configured to add the noise loss value and the contrast loss value to obtain a total loss value, and compare the total loss value with a preset threshold; if the total loss value is larger than or equal to a preset threshold value, updating parameters of the picture generation model until the total loss value is smaller than the preset threshold value; and if the total loss value is smaller than a preset threshold value, outputting a standard picture generation model.
In detail, each module in the training device 100 for a picture generation model in the embodiment of the present invention adopts the same technical means as the training method of the picture generation model described in fig. 1 and can produce the same technical effects, which are not repeated herein.
Fig. 3 is a schematic structural diagram of an electronic device implementing the training method of the picture generation model according to the present invention.
The electronic device may comprise a processor 10, a memory 11, a communication bus 12 and a communication interface 13, and may further comprise a computer program, such as a training program of a picture generation model, stored in the memory 11 and executable on the processor 10.
The memory 11 includes at least one type of readable storage medium, including flash memory, a mobile hard disk, a multimedia card, a card memory (e.g., SD or DX memory, etc.), a magnetic memory, a magnetic disk, an optical disk, etc. The memory 11 may in some embodiments be an internal storage unit of the electronic device, such as a mobile hard disk of the electronic device. The memory 11 may in other embodiments also be an external storage device of the electronic device, such as a plug-in mobile hard disk, a smart media card (Smart Media Card, SMC), a secure digital (Secure Digital, SD) card, a flash card (Flash Card) or the like, provided on the electronic device. Further, the memory 11 may also include both an internal storage unit and an external storage device of the electronic device. The memory 11 may be used not only for storing application software installed in the electronic device and various types of data, such as the code of the training program of the picture generation model, but also for temporarily storing data that has been output or is to be output.
The processor 10 may be composed of integrated circuits in some embodiments, for example, a single packaged integrated circuit, or may be composed of multiple integrated circuits packaged with the same or different functions, including one or more central processing units (Central Processing Unit, CPU), microprocessors, digital processing chips, graphics processors, combinations of various control chips, and the like. The processor 10 is the control unit of the electronic device; it connects the various components of the entire electronic device using various interfaces and lines, and executes the various functions of the electronic device and processes data by running or executing the programs or modules stored in the memory 11 (e.g., the training program of the picture generation model) and calling the data stored in the memory 11.
The communication bus 12 may be a Peripheral Component Interconnect (PCI) bus, an Extended Industry Standard Architecture (EISA) bus, or the like. The bus may be classified as an address bus, a data bus, a control bus, etc. The communication bus 12 is arranged to enable connection and communication between the memory 11 and the at least one processor 10, etc. For ease of illustration, only one bold line is shown in the figure, but this does not mean that there is only one bus or only one type of bus.
Fig. 3 shows only an electronic device with components, and it will be understood by those skilled in the art that the structure shown in fig. 3 is not limiting of the electronic device and may include fewer or more components than shown, or may combine certain components, or a different arrangement of components.
For example, although not shown, the electronic device may further include a power source (such as a battery) for supplying power to the respective components, and preferably, the power source may be logically connected to the at least one processor 10 through a power management device, so that functions of charge management, discharge management, power consumption management, and the like are implemented through the power management device. The power supply may also include one or more of any of a direct current or alternating current power supply, recharging device, power failure classification circuit, power converter or inverter, power status indicator, etc. The electronic device may further include various sensors, bluetooth modules, wi-Fi modules, etc., which are not described herein.
Optionally, the communication interface 13 may comprise a wired interface and/or a wireless interface (e.g., WI-FI interface, bluetooth interface, etc.), typically used to establish a communication connection between the electronic device and other electronic devices.
Optionally, the communication interface 13 may further comprise a user interface, which may be a Display, an input unit, such as a Keyboard (Keyboard), or a standard wired interface, a wireless interface. Alternatively, in some embodiments, the display may be an LED display, a liquid crystal display, a touch-sensitive liquid crystal display, an OLED (organic light-emitting diode) touch, or the like. The display may also be referred to as a display screen or display unit, as appropriate, for displaying information processed in the electronic device and for displaying a visual user interface.
It should be understood that the embodiments described are for illustrative purposes only, and that the scope of the patent application is not limited to this configuration.
The training program of the picture generation model stored in the memory 11 in the electronic device is a combination of a plurality of computer programs, which when run in the processor 10, can implement:
training data is obtained from a database, wherein the training data comprises a training set, a positive sample set and a first real noise distribution value corresponding to the training set and a second real noise distribution value corresponding to the positive sample set, and the positive sample set is obtained after data enhancement is carried out on the training set;
Extracting image features of the training set and the positive sample set through an image feature encoding module to obtain a first image feature vector of the training set and a second image feature vector of the positive sample set;
performing t-layer diffusion noise adding on the first image feature vector through a diffusion noise adding module, obtaining a first noise distribution value and a first image feature vector to be determined on each layer, and obtaining a second noise distribution value and a second image feature vector to be determined of the second image feature vector in a similar way;
calculating the first noise distribution value and the first real noise distribution value through a preset loss function to obtain a first noise loss value, and similarly calculating to obtain a second noise loss value of the second noise distribution value and the second real noise distribution value;
calculating the first image feature vector to be determined and the second image feature vector to be determined through a preset contrast loss function to obtain a contrast loss value;
adding the noise loss value and the contrast loss value to obtain a total loss value, and comparing the total loss value with a preset threshold value; if the total loss value is larger than or equal to a preset threshold value, updating parameters of the picture generation model until the total loss value is smaller than the preset threshold value; and if the total loss value is smaller than a preset threshold value, outputting a standard picture generation model.
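The contrast-loss step above is an InfoNCE-style calculation; a small sketch is given below, with the feature vectors and the temperature value chosen for illustration only:

```python
import numpy as np

def info_nce(q: np.ndarray, k_pos: np.ndarray, k_all: np.ndarray,
             temperature: float = 0.07) -> float:
    """InfoNCE-style contrast loss:
    -log( exp(q.k+ / T) / sum_i exp(q.k_i / T) )

    q      : first to-be-determined image feature vector
    k_pos  : second to-be-determined (positive-sample) feature vector
    k_all  : feature vectors of all samples, one per row (includes k_pos)
    """
    logits = k_all @ q / temperature
    pos = q @ k_pos / temperature
    # log-sum-exp trick for numerical stability
    m = logits.max()
    log_denom = m + np.log(np.exp(logits - m).sum())
    return float(log_denom - pos)

rng = np.random.default_rng(2)
q = rng.standard_normal(4)
k_pos = q.copy()                            # perfectly matching positive
k_neg = rng.standard_normal((3, 4))
k_all = np.vstack([k_pos[None, :], k_neg])  # all samples, positive first
loss = info_nce(q, k_pos, k_all)
assert loss >= 0.0   # denominator includes the positive term
```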
In particular, the specific implementation method of the processor 10 on the computer program may refer to the description of the relevant steps in the corresponding embodiment of fig. 1, which is not repeated herein.
Further, the electronic device integrated modules/units, if implemented in the form of software functional units and sold or used as stand-alone products, may be stored in a computer readable storage medium. The computer readable medium may be non-volatile or volatile. The computer readable medium may include: any entity or device capable of carrying the computer program code, a recording medium, a U disk, a removable hard disk, a magnetic disk, an optical disk, a computer memory, a Read-only memory (ROM).
Embodiments of the present invention may also provide a computer readable storage medium storing a computer program which, when executed by a processor of an electronic device, may implement:
training data is obtained from a database, wherein the training data comprises a training set, a positive sample set and a first real noise distribution value corresponding to the training set and a second real noise distribution value corresponding to the positive sample set, and the positive sample set is obtained after data enhancement is carried out on the training set;
Extracting image features of the training set and the positive sample set through an image feature encoding module to obtain a first image feature vector of the training set and a second image feature vector of the positive sample set;
performing t-layer diffusion noise adding on the first image feature vector through a diffusion noise adding module, obtaining a first noise distribution value and a first image feature vector to be determined on each layer, and obtaining a second noise distribution value and a second image feature vector to be determined of the second image feature vector in a similar way;
calculating the first noise distribution value and the first real noise distribution value through a preset loss function to obtain a first noise loss value, and similarly calculating to obtain a second noise loss value of the second noise distribution value and the second real noise distribution value;
calculating the first image feature vector to be determined and the second image feature vector to be determined through a preset contrast loss function to obtain a contrast loss value;
adding the noise loss value and the contrast loss value to obtain a total loss value, and comparing the total loss value with a preset threshold value; if the total loss value is larger than or equal to a preset threshold value, updating parameters of the picture generation model until the total loss value is smaller than the preset threshold value; and if the total loss value is smaller than a preset threshold value, outputting a standard picture generation model.
Further, the computer-usable storage medium may mainly include a storage program area and a storage data area, wherein the storage program area may store an operating system, an application program required for at least one function, and the like; the storage data area may store data created from the use of blockchain nodes, and the like.
In the several embodiments provided in the present application, it should be understood that the disclosed apparatus, device and method may be implemented in other manners. For example, the above-described apparatus embodiments are merely illustrative, and for example, the division of the modules is merely a logical function division, and there may be other manners of division when actually implemented.
The modules described as separate components may or may not be physically separate, and components shown as modules may or may not be physical units, may be located in one place, or may be distributed over multiple network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
The embodiment of the application can acquire and process the related data based on the artificial intelligence technology. Among these, artificial intelligence (Artificial Intelligence, AI) is the theory, method, technique and application system that uses a digital computer or a digital computer-controlled machine to simulate, extend and expand human intelligence, sense the environment, acquire knowledge and use knowledge to obtain optimal results.
In addition, each functional module in the embodiments of the present invention may be integrated in one processing unit, or each unit may exist alone physically, or two or more units may be integrated in one unit. The integrated units can be realized in a form of hardware or a form of hardware and a form of software functional modules.
It will be evident to those skilled in the art that the invention is not limited to the details of the foregoing illustrative embodiments, and that the present invention may be embodied in other specific forms without departing from the spirit or essential characteristics thereof.
The present embodiments are, therefore, to be considered in all respects as illustrative and not restrictive, the scope of the invention being indicated by the appended claims rather than by the foregoing description, and all changes which come within the meaning and range of equivalency of the claims are therefore intended to be embraced therein. Any reference signs in the claims shall not be construed as limiting the claim concerned.
The blockchain is a novel application mode of computer technologies such as distributed data storage, point-to-point transmission, consensus mechanisms and encryption algorithms. A blockchain is essentially a decentralized database: a string of data blocks generated in association with one another by cryptographic means, each data block containing a batch of network transaction information used to verify the validity of the information (anti-counterfeiting) and to generate the next block. The blockchain may include a blockchain underlying platform, a platform product services layer, an application services layer, and the like.
Furthermore, it is evident that the word "comprising" does not exclude other elements or steps, and that the singular does not exclude a plurality. A plurality of units or means recited in the system claims may also be implemented by one unit or means through software or hardware. Terms such as first and second are used to denote names and do not indicate any particular order.
Finally, it should be noted that the above-mentioned embodiments are merely for illustrating the technical solution of the present invention and not for limiting the same, and although the present invention has been described in detail with reference to the preferred embodiments, it should be understood by those skilled in the art that modifications and equivalents may be made to the technical solution of the present invention without departing from the spirit and scope of the technical solution of the present invention.

Claims (10)

1. A training method for a picture generation model, the method comprising:
s1, training data are obtained from a database, wherein the training data comprise a training set, a positive sample set and a first real noise distribution value corresponding to the training set and a second real noise distribution value corresponding to the positive sample set, and the positive sample set is obtained by carrying out data enhancement on the training set;
s2, extracting image features of the training set and the positive sample set through an image feature encoding module to obtain a first image feature vector of the training set and a second image feature vector of the positive sample set;
S3, performing t-layer diffusion noise addition on the first image feature vector through a diffusion noise addition module, obtaining a first noise distribution value and a first image feature vector to be determined on each layer, and obtaining a second noise distribution value and a second image feature vector to be determined of the second image feature vector in a similar way;
s4, calculating the first noise distribution value and the first real noise distribution value through a preset loss function to obtain a first noise loss value, and similarly calculating to obtain a second noise loss value of the second noise distribution value and the second real noise distribution value;
s5, calculating the first image feature vector to be determined and the second image feature vector to be determined through a preset contrast loss function to obtain a contrast loss value;
s6, adding the noise loss value and the contrast loss value to obtain a total loss value, and comparing the total loss value with a preset threshold value; if the total loss value is larger than or equal to a preset threshold value, updating parameters of the picture generation model until the total loss value is smaller than the preset threshold value; and if the total loss value is smaller than a preset threshold value, outputting a standard picture generation model.
2. The training method of a picture generation model according to claim 1, wherein t layers of diffusion noise adding is performed on the first image feature vector by a diffusion noise adding module, each layer obtains a first noise distribution value and a first image feature vector to be determined, and a second noise distribution value and a second image feature vector to be determined of a second image feature vector are obtained in a similar manner, including:
Performing t-layer diffusion on the first image feature vector by using a diffusion noise adding module, wherein t is a positive integer greater than 1; adding noise into each diffusion layer, and calculating t first noise distribution values and t first image feature vectors to be determined after the first image feature vectors are diffused in each diffusion layer; and calculating t second noise distribution values and t second undetermined image feature vectors of the second image feature vectors after diffusion of each diffusion layer according to the calculation mode of the first noise distribution values.
3. The training method of a picture generation model according to claim 2, wherein a diffusion noise adding module is used to perform t-layer diffusion on the first image feature vector, and t is a positive integer greater than 1; adding noise into each diffusion layer, and calculating t first noise distribution values and t first image feature vectors to be determined after the first image feature vectors are diffused in each diffusion layer, wherein the method comprises the following steps:
and performing t-layer diffusion on the first image feature vector by using the following formula:
q(x_t | x_{t-1}) = N(x_t; √(1-β)·x_{t-1}, βI)
wherein x is the input first image feature vector to be determined, t is the number of diffusion layers, β is a preset parameter, I is the variance of the added Gaussian noise, q(x_t | x_{t-1}) is the noise distribution of the current picture x, and N represents the Gaussian distribution.
4. A training method of a picture generation model according to claim 3, wherein the formula for performing t-layer diffusion on the first image feature vector is further:
q(x_t | x_0) = N(x_t; √(ᾱ_t)·x_0, (1-ᾱ_t)·I)
α_t = 1-βI
wherein x_0 is the first image feature vector, x_t is the first to-be-determined image feature vector after diffusion to layer t, N represents the Gaussian distribution, β is a preset parameter, I is the variance of the added Gaussian noise, α_t represents the probability distribution of layer t, and ᾱ_t represents the cumulative product of α_t over the t layers.
5. The training method of a picture generation model according to claim 1, wherein calculating the first noise distribution value and the first real noise distribution value by a preset loss function to obtain a first noise loss value, and similarly calculating to obtain a second noise loss value of a second noise distribution value and the second real noise distribution value, includes:
calculating t loss values between t first noise distribution values and the first real noise distribution values of the first image feature vector after diffusion of each diffusion layer in the diffusion noise adding module through a preset loss function, and accumulating the t loss values to obtain a first noise loss value corresponding to the first image feature vector; and similarly, calculating to obtain a second noise loss value corresponding to the second image feature vector.
6. The training method of the picture generation model according to claim 5, wherein t loss values between t first noise distribution values and the first real noise distribution values of the first image feature vector after diffusion of each diffusion layer in the diffusion noise adding module are calculated by a preset loss function, and the formula is as follows:
L1 = ||e_t(x_t, t) - E||^2
wherein L1 is the loss value, t is the number of diffusion layers, e_t(x_t, t) is the first noise distribution value when the number of diffusion layers is t, and E is the first real noise distribution value.
7. The training method of a picture generation model according to claim 1, wherein the first undetermined image feature vector and the second undetermined image feature vector are calculated by a preset contrast loss function to obtain a contrast loss value, and the formula is as follows:
L_InfoNCE = -log( exp(q·K+ / T) / Σ_i exp(q·K_i / T) )
wherein q is the first undetermined image feature vector, K+ is the second undetermined image feature vector, K_i is the image feature vector of each of all samples, all samples including all pictures of the training set and the corresponding positive sample pictures in the positive sample set, q·K is the inner product of the two vectors, and T is a temperature constant, which is a hyperparameter.
8. A picture generation model training apparatus, the apparatus comprising:
The preprocessing module is used for acquiring training data from a database, wherein the training data comprises a training set, a positive sample set and a first real noise distribution value corresponding to the training set and a second real noise distribution value corresponding to the positive sample set, and the positive sample set is obtained by carrying out data enhancement on the training set;
the image feature coding module is used for extracting image features of the training set and the positive sample set through the image feature coding module to obtain a first image feature vector of the training set and a second image feature vector of the positive sample set;
the diffusion noise adding module is used for carrying out t layers of diffusion noise adding on the first image feature vector through the diffusion noise adding module, obtaining a first noise distribution value and a first image feature vector to be determined on each layer, and obtaining a second noise distribution value and a second image feature vector to be determined of the second image feature vector in a similar way;
the noise loss value module is used for calculating the first noise distribution value and the first real noise distribution value through a preset loss function to obtain a first noise loss value, and similarly calculating to obtain a second noise loss value of the second noise distribution value and the second real noise distribution value;
The contrast loss value module is used for calculating the first undetermined image feature vector and the second undetermined image feature vector through a preset contrast loss function to obtain a contrast loss value;
the total loss value module is used for adding the noise loss value and the comparison loss value to obtain a total loss value, and comparing the total loss value with a preset threshold value; if the total loss value is larger than or equal to a preset threshold value, updating parameters of the picture generation model until the total loss value is smaller than the preset threshold value; and if the total loss value is smaller than a preset threshold value, outputting a standard picture generation model.
9. An electronic device, the electronic device comprising:
at least one processor; the method comprises the steps of,
a memory communicatively coupled to the at least one processor; wherein,,
the memory stores a computer program executable by the at least one processor, the instructions being executable by the at least one processor to enable the at least one processor to perform a training method of a picture generation model as claimed in any one of claims 1 to 7.
10. A computer readable storage medium storing a computer program, wherein the computer program when executed by a processor implements the picture generation model training method of any one of claims 1 to 7.
CN202310610571.6A 2023-05-29 2023-05-29 Training method and device for picture generation model, electronic equipment and storage medium Pending CN116630457A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310610571.6A CN116630457A (en) 2023-05-29 2023-05-29 Training method and device for picture generation model, electronic equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310610571.6A CN116630457A (en) 2023-05-29 2023-05-29 Training method and device for picture generation model, electronic equipment and storage medium

Publications (1)

Publication Number Publication Date
CN116630457A 2023-08-22

Family

ID=87596942

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310610571.6A Pending CN116630457A (en) 2023-05-29 2023-05-29 Training method and device for picture generation model, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN116630457A (en)


Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116912352A (en) * 2023-09-12 2023-10-20 苏州浪潮智能科技有限公司 Picture generation method and device, electronic equipment and storage medium
CN116912352B (en) * 2023-09-12 2024-01-26 苏州浪潮智能科技有限公司 Picture generation method and device, electronic equipment and storage medium
CN117593215A (en) * 2024-01-19 2024-02-23 哈尔滨工业大学(深圳)(哈尔滨工业大学深圳科技创新研究院) Large-scale vision pre-training method and system for generating model enhancement
CN117593215B (en) * 2024-01-19 2024-03-29 哈尔滨工业大学(深圳)(哈尔滨工业大学深圳科技创新研究院) Large-scale vision pre-training method and system for generating model enhancement
CN117951605A (en) * 2024-03-26 2024-04-30 苏州元脑智能科技有限公司 Quantization method and device for diffusion model, computer equipment and storage medium
CN117951605B (en) * 2024-03-26 2024-06-07 苏州元脑智能科技有限公司 Quantization method and device for diffusion model, computer equipment and storage medium

Similar Documents

Publication Publication Date Title
CN111814962B (en) Parameter acquisition method and device for identification model, electronic equipment and storage medium
CN112148577B (en) Data anomaly detection method and device, electronic equipment and storage medium
CN116630457A (en) Training method and device for picture generation model, electronic equipment and storage medium
CN112634170B (en) Method, device, computer equipment and storage medium for correcting blurred image
CN111783982B (en) Method, device, equipment and medium for acquiring attack sample
CN113657495B (en) Insurance product recommendation method, apparatus and equipment based on probability prediction model
CN113378970B (en) Sentence similarity detection method and device, electronic equipment and storage medium
CN111523094B (en) Deep learning model watermark embedding method and device, electronic equipment and storage medium
CN111274937B (en) Tumble detection method, tumble detection device, electronic equipment and computer-readable storage medium
CN112990374A (en) Image classification method, device, electronic equipment and medium
CN116383766A (en) Auxiliary diagnosis method, device, equipment and storage medium based on multi-mode data
CN111339072B (en) User behavior-based change value analysis method and device, electronic equipment and medium
CN116705304A (en) Multi-mode task processing method, device, equipment and medium based on image text
CN117392996A (en) Target voice recognition method, device, electronic equipment and storage medium
CN117649463A (en) Diffusion model-based try-on image generation method, device, equipment and medium
CN111460293B (en) Information pushing method and device and computer readable storage medium
CN113435180A (en) Text error correction method and device, electronic equipment and storage medium
CN116701635A (en) Training video text classification method, training video text classification device, training video text classification equipment and storage medium
CN116680580A (en) Information matching method and device based on multi-mode training, electronic equipment and medium
CN116564322A (en) Voice conversion method, device, equipment and storage medium
CN116630712A (en) Information classification method and device based on modal combination, electronic equipment and medium
CN115098644B (en) Image and text matching method and device, electronic equipment and storage medium
CN113705686B (en) Image classification method, device, electronic equipment and readable storage medium
CN113780473B (en) Depth model-based data processing method and device, electronic equipment and storage medium
CN115147660A (en) Image classification method, device and equipment based on incremental learning and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination