CN116664450A - Diffusion model-based image enhancement method, device, equipment and storage medium

Info

Publication number: CN116664450A
Application number: CN202310922672.7A
Authority: CN (China)
Prior art keywords: image, noise, feature map, target, preset
Legal status: Pending
Original language: Chinese (zh)
Inventors: 王红凯, 徐昱, 毛冬, 戴波, 陈祖歌, 黄建平, 李钟煦, 郑怡, 饶涵宇, 李高磊
Assignees (current and original): State Grid Information and Telecommunication Co Ltd; Information and Telecommunication Branch of State Grid Zhejiang Electric Power Co Ltd; PanAn Power Supply Co of State Grid Zhejiang Electric Power Co Ltd
Application filed by the assignees listed above; priority to CN202310922672.7A.

Classifications

    • G06T 5/00 Image enhancement or restoration
    • G06T 5/70 Denoising; Smoothing
    • G06T 9/00 Image coding
    • G06V 10/40 Extraction of image or video features
    • G06V 10/44 Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
    • Y02T 10/10 Internal combustion engine [ICE] based vehicles
    • Y02T 10/40 Engine management systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Multimedia (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Image Processing (AREA)
  • Compression Of Band Width Or Redundancy In Fax (AREA)

Abstract

The invention discloses an image enhancement method, device, equipment and storage medium based on a diffusion model. The method comprises the following steps: acquiring a target image to be enhanced and an image enhancement instruction, and encoding them to obtain a coding feature map and a text code; inputting the coding feature map and the text code into a pre-trained target image enhancement network; gradually adding Gaussian noise to the coding feature map according to a preset noise addition rule and a preset number of steps to obtain a target noise image obeying a Gaussian distribution, and determining the prediction noise in the result image after each step of adding Gaussian noise; based on a cross-attention mechanism, performing image enhancement on the region of the target noise image corresponding to the text code to obtain a noise-added enhanced image; gradually removing the prediction noise of each step from the noise-added enhanced image according to a preset noise removal rule and the preset number of steps to obtain a denoised image; and decoding the denoised image to obtain an enhanced image. The invention effectively improves the enhancement effect on images with many missing features.

Description

Diffusion model-based image enhancement method, device, equipment and storage medium
Technical Field
The present invention relates to the field of image enhancement technologies, and in particular, to an image enhancement method, apparatus, device, and storage medium based on a diffusion model.
Background
The image is one of the most common information carriers in electronic systems, and is widely applied in the fields of medical imaging, unmanned aerial vehicle photography, security monitoring, industrial detection and the like. However, many of the original pictures acquired have limitations in terms of quality, contrast, sharpness, and detail presentation due to environmental conditions, equipment limitations, noise during acquisition, and other factors. The image enhancement technique refers to a technique of processing features in an image to improve the visual effect of the image and to improve the quality of the image.
Conventional image enhancement methods generally employ techniques such as image filtering, histogram equalization, and image sharpening to improve the quality of the image. However, these methods have limited enhancement effects on images in the face of complex scenes and specific applications. For example: in medical images, the traditional image enhancement method cannot effectively extract pathological details or accurately restore the tissue structure of the image; in unmanned aerial vehicle photography, due to the change of illumination conditions and shooting distance, the problems of blurring, noise, low contrast and the like of a shot image may exist, and the enhancement effect of the shot image is limited by adopting a traditional image enhancement method; in security monitoring, a target object cannot be accurately identified and tracked by adopting a traditional image enhancement method.
With the rapid development of deep learning and computer vision, researchers have proposed image enhancement methods based on deep learning to overcome the above problems. In order to improve the image enhancement effect, existing image enhancement algorithms are realized based on neural network models, and specific implementations include, but are not limited to, the following two. First: a convolutional neural network (Convolutional Neural Network, CNN), which during training uses low-quality images (i.e., images that require image enhancement) as input and high-quality images (i.e., images that do not require image enhancement) as training targets, and iteratively trains the network with a loss function. During image enhancement, the target image to be enhanced is input into the trained CNN, which outputs the enhanced image. Second: a generative adversarial network (Generative Adversarial Network, GAN), which uses low-quality images as input and high-quality images as training targets, and is trained iteratively through the adversarial interplay of the generator and the discriminator. During image enhancement, the target image to be enhanced is input into the trained generator, which outputs the enhanced image.
However, existing neural network models for image enhancement have a poor enhancement effect on images with many missing features.
Disclosure of Invention
The invention provides an image enhancement method, device, equipment and storage medium based on a diffusion model, which solve the problem in the prior art of a poor image enhancement effect on images with many missing features.
In order to achieve the above purpose, the invention adopts the following technical scheme:
in a first aspect, the present invention provides a diffusion model-based image enhancement method, the method comprising:
acquiring a target image to be enhanced, and encoding the target image through an encoder to obtain an encoding feature map;
acquiring an image enhancement instruction, and encoding the image enhancement instruction through a text editor to obtain a text code; the image enhancement instruction comprises the characteristics and the positions of the image to be enhanced;
inputting the coding feature map and the text codes into a pre-trained target image enhancement network;
according to a preset noise adding rule and a preset step number, gradually adding Gaussian noise into the coding feature map to obtain a target noise image obeying Gaussian distribution, and determining the prediction noise in a result image after adding Gaussian noise in each step;
Based on a cross attention mechanism, performing image enhancement on a region corresponding to the text code in the target noise image to obtain a noise-added enhanced image;
according to a preset noise removal rule and the preset step number, the prediction noise of each step is gradually removed from the noise-added enhanced image, and a denoised image is obtained;
and decoding the denoised image through a decoder to obtain an enhanced image.
In one possible implementation, the preset noise addition rule is determined based on a diffusion process of a denoising diffusion probability model; gradually adding Gaussian noise into the coding feature map according to a preset noise adding rule and a preset step number to obtain a target noise image obeying Gaussian distribution, wherein the method specifically comprises the following steps of:
according to the diffusion process of the denoising diffusion probability model, gaussian noise is added to the coding feature map in each step of the diffusion process; the parameter value of the added Gaussian noise is determined based on a preset noise time table;
and calculating a result image after adding the Gaussian noise in each step of the diffusion process according to the coding feature map and the noise time table, and outputting the result image corresponding to the preset step number as a target noise image.
In one possible implementation manner, the calculating the result image after adding the gaussian noise at each step of the diffusion process according to the coding feature map and the noise schedule specifically includes:
calculating a result image of the diffusion process after adding the Gaussian noise at each step according to the following formula:

$x_t = \sqrt{\bar{\alpha}_t}\,x_0 + \sqrt{1-\bar{\alpha}_t}\,\epsilon$, with $\epsilon \sim \mathcal{N}(0,\mathbf{I})$,

wherein $x_0$ is the coding feature map before adding Gaussian noise, and $x_t$ is the noise-added result corresponding to noise having been added up to time $t$;

$\alpha_t = 1-\beta_t$, $\bar{\alpha}_t = \prod_{s=1}^{t}\alpha_s$;

$\beta$ is the preset noise schedule, $\beta$ comprises $\{\beta_1,\dots,\beta_T\}$, where $\beta_t$ represents the parameter value of the Gaussian noise added at step $t$ of the diffusion process, and $\beta_t \in (0,1)$.
In one possible implementation, the target noise image comprises a plurality of image channels, and the cross-attention mechanism comprises a channel attention mechanism and a spatial attention mechanism; the image enhancement is performed on the region corresponding to the text code in the target noise image based on the cross attention mechanism to obtain a noise enhanced image, and the method specifically comprises the following steps:
through the channel attention mechanism, carrying out pertinence enhancement on different image channels on the feature map corresponding to each image channel of the region corresponding to the text code in the target noise image to obtain a channel attention feature map;
And carrying out targeted enhancement of different spatial positions on the channel attention feature map through the spatial attention mechanism to obtain a noise-added enhanced image.
In a possible implementation manner, the enhancing, by the channel attention mechanism, pertinence of different image channels on the feature map corresponding to each image channel of the region corresponding to the text code in the target noise image to obtain a channel attention feature map specifically includes:
for the feature map of each image channel of the region corresponding to the text code in the target noise image, performing dimension reduction processing on the feature map according to a maximum pooling and average pooling method to obtain global features of the feature map corresponding to the image channel;
processing the global features through a multi-layer sensor to obtain the weight coefficient of the image channel;
weighting the feature images corresponding to the image channels through the weight coefficients to obtain weighted feature images;
and multiplying the weighted feature map and the image channel of the target noise image to obtain a channel attention feature map.
In one possible implementation manner, through the spatial attention mechanism, the channel attention feature map is subjected to targeted enhancement of different spatial positions, so as to obtain a noise enhanced image, which specifically includes:
Processing the channel attention feature map according to the methods of maximum pooling and average pooling to obtain a processing result;
performing connection operation on the processing result based on the corresponding image channel to obtain a connected feature map;
the connected feature images are subjected to dimension reduction into a single channel by a convolution dimension reduction processing method, so that a space feature image is obtained;
and multiplying the space feature image and the target noise image to obtain a noise-added enhanced image.
In one possible implementation, the preset noise removal rule is determined based on a reverse process of a denoising diffusion probability model; the step of gradually removing the prediction noise of each step from the noise-added enhanced image according to a preset noise removal rule and the preset step number specifically comprises the following steps:
and removing the prediction noise determined in the diffusion process corresponding to the inverse process from the noise-added enhanced image at each step of the inverse process based on the inverse process of the denoising diffusion probability model.
In one possible implementation, before the inputting the coding feature map and the text code into a pre-trained target image enhancement network, the method further includes:
Training an original image enhancement network to obtain an image enhancement network with the error value of the predicted noise and the real noise smaller than a preset loss value as a target image enhancement network.
In one possible implementation manner, the training the original image enhancement network to obtain an image enhancement network with an error value of the predicted noise and the real noise smaller than a preset loss value as the target image enhancement network specifically includes:
acquiring a high-quality image meeting a preset quality requirement, and processing the high-quality image in a downsampling mode to obtain a corresponding low-quality image;
encoding the high-quality image and the low-quality image by an encoder to obtain a high-quality encoding diagram and a low-quality encoding diagram;
gradually adding Gaussian noise into the low-quality coding diagram, and determining the prediction noise in the result image after adding Gaussian noise in each step;
and determining error values of the prediction noise and the noise true value, and changing parameters of the original image enhancement network when the error values are larger than preset loss values until the error values are smaller than the preset loss values, so as to obtain the trained target image enhancement network.
In a second aspect, the present invention provides a diffusion model-based image enhancement apparatus comprising:
the coding module is used for acquiring a target image to be enhanced, and coding the target image through the coder to obtain a coding feature map;
the text coding module is used for acquiring the image enhancement instruction, and coding the image enhancement instruction through the text editor to obtain a text code; the image enhancement instruction comprises the characteristics and the positions of the image to be enhanced;
the input module is used for inputting the coding feature map and the text codes into a pre-trained target image enhancement network;
the noise prediction module is used for gradually adding Gaussian noise into the coding feature map according to a preset noise adding rule and a preset step number to obtain a target noise image obeying Gaussian distribution, and determining the prediction noise in a result image after Gaussian noise is added in each step;
the image enhancement module is used for carrying out image enhancement on the region corresponding to the text code in the target noise image based on a cross attention mechanism to obtain a noise enhanced image;
the denoising module is used for gradually removing the prediction noise of each step from the noise-added enhanced image according to a preset noise removal rule and the preset step number to obtain a denoised image;
And the decoding module is used for decoding the denoised image through a decoder to obtain an enhanced image.
Further, the preset noise adding rule is determined based on a diffusion process of a denoising diffusion probability model; when gaussian noise is gradually added to the coding feature map according to a preset noise adding rule and a preset step number to obtain a target noise image obeying gaussian distribution, the noise prediction module is configured to execute:
according to the diffusion process of the denoising diffusion probability model, gaussian noise is added to the coding feature map in each step of the diffusion process; the parameter value of the added Gaussian noise is determined based on a preset noise time table;
and calculating a result image after adding the Gaussian noise in each step of the diffusion process according to the coding feature map and the noise time table, and outputting the result image corresponding to the preset step number as a target noise image.
Further, in calculating a resultant image of the diffusion process after adding the gaussian noise at each step according to the coding feature map and the noise schedule, the noise prediction module is specifically configured to perform:
Calculating a result image of the diffusion process after adding the Gaussian noise at each step according to the following formula:

$x_t = \sqrt{\bar{\alpha}_t}\,x_0 + \sqrt{1-\bar{\alpha}_t}\,\epsilon$, with $\epsilon \sim \mathcal{N}(0,\mathbf{I})$,

wherein $x_0$ is the coding feature map before adding Gaussian noise, and $x_t$ is the noise-added result corresponding to noise having been added up to time $t$;

$\alpha_t = 1-\beta_t$, $\bar{\alpha}_t = \prod_{s=1}^{t}\alpha_s$;

$\beta$ is the preset noise schedule, $\beta$ comprises $\{\beta_1,\dots,\beta_T\}$, where $\beta_t$ represents the parameter value of the Gaussian noise added at step $t$ of the diffusion process, and $\beta_t \in (0,1)$.
Further, the target noise image comprises a plurality of image channels, and the cross attention mechanism comprises a channel attention mechanism and a spatial attention mechanism; the image enhancement module comprises a first enhancement unit and a second enhancement unit;
the first enhancing unit is configured to enhance pertinence of different image channels on a feature map corresponding to each image channel of a region corresponding to the text code in the target noise image through the channel attention mechanism, so as to obtain a channel attention feature map;
the second enhancing unit is configured to perform targeted enhancement on different spatial positions on the channel attention feature map through the spatial attention mechanism, so as to obtain a noise enhanced image.
Further, the first enhancement unit is specifically configured to perform:
For the feature map of each image channel of the region corresponding to the text code in the target noise image, performing dimension reduction processing on the feature map according to a maximum pooling and average pooling method to obtain global features of the feature map corresponding to the image channel;
processing the global features through a multi-layer sensor to obtain the weight coefficient of the image channel;
weighting the feature images corresponding to the image channels through the weight coefficients to obtain weighted feature images;
and multiplying the weighted feature map and the image channel of the target noise image to obtain a channel attention feature map.
Further, the second enhancement unit is specifically configured to perform:
processing the channel attention feature map according to the methods of maximum pooling and average pooling to obtain a processing result;
performing connection operation on the processing result based on the corresponding image channel to obtain a connected feature map;
the connected feature images are subjected to dimension reduction into a single channel by a convolution dimension reduction processing method, so that a space feature image is obtained;
and multiplying the space feature image and the target noise image to obtain a noise-added enhanced image.
Further, the preset noise removal rule is determined based on the inverse process of the denoising diffusion probability model; the denoising module is specifically configured to perform:
and removing the prediction noise determined in the diffusion process corresponding to the inverse process from the noise-added enhanced image at each step of the inverse process based on the inverse process of the denoising diffusion probability model.
Further, the device further comprises a model training module, which is used for training the original image enhancement network before the coding feature map and the text codes are input into the pre-trained target image enhancement network, so as to obtain the image enhancement network with the error value of the prediction noise and the real noise smaller than the preset loss value as the target image enhancement network.
Further, the model training module is specifically configured to perform:
acquiring a high-quality image meeting a preset quality requirement, and processing the high-quality image in a downsampling mode to obtain a corresponding low-quality image;
encoding the high-quality image and the low-quality image by an encoder to obtain a high-quality encoding diagram and a low-quality encoding diagram;
Gradually adding Gaussian noise into the low-quality coding diagram, and determining the prediction noise in the result image after adding Gaussian noise in each step;
and determining error values of the prediction noise and the noise true value, and changing parameters of the original image enhancement network when the error values are larger than preset loss values until the error values are smaller than the preset loss values, so as to obtain the trained target image enhancement network.
In a third aspect, the present invention provides an electronic device comprising a processor and a memory, the memory storing at least one instruction, at least one program, a set of codes, or a set of instructions, the at least one instruction, the at least one program, the set of codes, or the set of instructions being loaded and executed by the processor to implement the diffusion model-based image enhancement method of any of the above.
In a fourth aspect, the present invention provides a computer readable storage medium having stored therein at least one instruction, at least one program, a set of codes, or a set of instructions, the at least one instruction, the at least one program, the set of codes, or the set of instructions being loaded and executed by a processor to implement a diffusion model-based image enhancement method as set forth in any one of the preceding claims.
According to the diffusion model-based image enhancement method provided by the invention, the acquired target image to be enhanced and the acquired image enhancement instruction are first encoded by an encoder and a text editor, respectively, to obtain a coding feature map and a text code; secondly, the coding feature map and the text code are input into a pre-trained target image enhancement network; then, Gaussian noise is gradually added to the coding feature map to obtain a target noise image obeying a Gaussian distribution, and the prediction noise in the result image after each step of adding Gaussian noise is determined; next, based on a cross-attention mechanism, image enhancement is performed on the region of the target noise image corresponding to the image enhancement instruction to obtain a noise-added enhanced image; then, mirroring the noise-adding process, the prediction noise of each step is gradually removed from the noise-added enhanced image to obtain a denoised image; finally, the denoised image is decoded by a decoder to obtain an enhanced image. The invention is aimed at target images with many missing features, for example images acquired by terminals in the field of power business visual analysis during power generation, transmission and distribution, in which features are missing or discontinuous. Gaussian noise is gradually introduced into the target image to attenuate the useful information in the image, so that the noise-added image tends toward Gaussian noise; the original image is then restored by removing the noise step by step. While eliminating the noise and interference in the target image, the detail signals and features of the whole target image are enhanced; at the same time, combined with the cross-attention mechanism, the image region corresponding to the image enhancement instruction in the target image is enhanced in a targeted manner, supporting the restoration of features such as image texture, saturation and color. This effectively improves the enhancement effect on the target image and provides higher-quality image data for subsequent image analysis.
Drawings
FIG. 1 is a flowchart of steps of a diffusion model-based image enhancement method according to an embodiment of the present invention;
FIG. 2 is a schematic diagram of the implementation of a denoising diffusion probability model of an image enhancement method based on a diffusion model according to an embodiment of the present invention;
fig. 3 is a schematic structural diagram of a prediction noise model of an image enhancement method based on a diffusion model according to an embodiment of the present invention;
FIG. 4 is a schematic diagram of an implementation of image enhancement based on a cross-attention mechanism of an image enhancement method based on a diffusion model according to an embodiment of the present invention;
FIG. 5 is a technical flowchart of an image enhancement method based on a diffusion model according to an embodiment of the present invention;
fig. 6 is a block diagram of an image enhancement device based on a diffusion model according to an embodiment of the present invention.
Detailed Description
The following description of the embodiments of the present invention will be made clearly and completely with reference to the accompanying drawings, in which it is apparent that the embodiments described are only some embodiments of the present invention, but not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
The terms "first" and "second" are used below for descriptive purposes only and are not to be construed as indicating or implying relative importance or implicitly indicating the number of technical features indicated. Thus, a feature defining "a first" or "a second" may explicitly or implicitly include one or more such feature. In the description of the embodiments of the present disclosure, unless otherwise indicated, the meaning of "a plurality" is two or more. In addition, the use of "based on" or "according to" is intended to be open and inclusive in that a process, step, calculation, or other action "based on" or "according to" one or more of the stated conditions or values may in practice be based on additional conditions or beyond the stated values.
In order to solve the problem of poor image enhancement effect caused by more characteristic missing in the prior art, the embodiment of the invention provides an image enhancement method and device based on a diffusion model.
As shown in fig. 1, in a first aspect, an embodiment of the present invention provides an image enhancement method based on a diffusion model, where the method includes:
and 101, acquiring a target image to be enhanced, and encoding the target image by an encoder to obtain an encoding feature map.
The target image to be enhanced can be an image acquired by a terminal with characteristics lost and discontinuous in the processes of power generation, power transmission and power distribution in the field of visual analysis of power business.
An encoder (Encoder) is a module that compiles or converts signals or data into a signal form suitable for communication, transmission, or storage.
In this embodiment, the encoder can compress the input target image into a potential spatial representation, resulting in an encoded signature.
Step 102, obtaining an image enhancement instruction, and encoding the image enhancement instruction through a text editor to obtain a text code.
Wherein the image enhancement instructions include features and locations of the image that need to be enhanced.
Specifically, the features of the image to be enhanced may be face features, specific background features, and the like, and the position of the image to be enhanced may be the upper left corner, the upper right corner, and the like of the image.
In this embodiment, the text editor is a CLIP (Contrastive Language-Image Pre-training) text editor. The CLIP text editor maps the image and the text into the same vector space, yielding the text code.
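As one possible illustration of this step, the following minimal sketch encodes an enhancement instruction with an off-the-shelf CLIP text encoder; the Hugging Face transformers API, the checkpoint name and the example instruction are assumptions of the sketch rather than details of the embodiment.

```python
# Illustrative sketch only: map an image enhancement instruction to a text code
# with a CLIP text encoder. Checkpoint name and API usage are assumptions.
import torch
from transformers import CLIPTokenizer, CLIPTextModel

tokenizer = CLIPTokenizer.from_pretrained("openai/clip-vit-base-patch32")
text_encoder = CLIPTextModel.from_pretrained("openai/clip-vit-base-patch32")

def encode_instruction(instruction: str) -> torch.Tensor:
    """Return a text embedding of shape [1, K, E] for one instruction."""
    tokens = tokenizer(instruction, padding="max_length", truncation=True,
                       return_tensors="pt")
    with torch.no_grad():
        return text_encoder(**tokens).last_hidden_state

# Hypothetical instruction naming a feature and a location to enhance.
text_code = encode_instruction("sharpen the insulator in the upper-left corner")
```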
And step 103, inputting the coding feature map and the text codes into a pre-trained target image enhancement network.
Specifically, the coding feature map obtained in the step 101 and the text code obtained in the step 102 are both input into a pre-trained target image enhancement network, and target images are enhanced in a targeted manner through the target image enhancement network.
Step 104, gradually adding Gaussian noise into the coding feature map according to a preset noise adding rule and a preset step number to obtain a target noise image obeying Gaussian distribution, and determining the prediction noise in the result image after adding Gaussian noise in each step.
Specifically, adding a certain amount of Gaussian noise into the coding feature map to obtain a result image after the Gaussian noise is added for the first time; and then adding a certain amount of Gaussian noise into the result image after the Gaussian noise is added for the first time to obtain a result image after the Gaussian noise is added for the second time, and repeating the step of adding the Gaussian noise for a preset step for a plurality of times to obtain a target noise image approaching the Gaussian noise. The Gaussian noise is gradually added to the coding feature map, so that the original coding feature map is changed into a noise map conforming to standard Gaussian distribution.
In the process of gradually adding the gaussian noise, the prediction noise contained in the resultant image of each step is determined.
And 105, carrying out image enhancement on a region corresponding to the text code in the target noise image based on a cross attention mechanism to obtain a noise enhanced image.
An attention mechanism applies attention computation rules within a deep learning network. The cross-attention mechanism unifies the associated features within and across modalities to perform image-text matching calculations.
In this embodiment, the cross-attention mechanism is applied to determine the matching relationship between the image enhancement instruction and the coding feature map, and the region matching the image enhancement instruction is enhanced in a targeted manner, so as to obtain a noise-added enhanced image.
And 106, gradually removing the prediction noise of each step from the noise-added enhanced image according to a preset noise removal rule and a preset step number to obtain a denoised image.
Specifically, in contrast to the direction of gradually adding the gaussian noise, the prediction noise in the previous step is gradually removed from the noise-added enhanced image, and the noise-removing step is repeated for a preset number of times, so that a noise-removed image with the prediction noise gradually removed is obtained.
And 107, decoding the denoised image through a decoder to obtain an enhanced image.
Wherein the Decoder (Decoder) is capable of restoring data compressed into a potential spatial representation to an image, which is an enhanced image.
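As one possible implementation of the encoder/decoder pair, the following is a minimal convolutional autoencoder sketch that compresses an image into a latent feature map at 1/8 resolution and restores it; the channel widths and activation choices are illustrative assumptions.

```python
# Minimal encoder/decoder sketch: 8x spatial compression and reconstruction.
# Channel widths and activations are assumptions, not the embodiment's exact design.
import torch.nn as nn

class LatentAutoencoder(nn.Module):
    def __init__(self, in_ch: int = 3, latent_ch: int = 4):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(in_ch, 64, 3, stride=2, padding=1), nn.SiLU(),   # H/2
            nn.Conv2d(64, 128, 3, stride=2, padding=1), nn.SiLU(),     # H/4
            nn.Conv2d(128, latent_ch, 3, stride=2, padding=1),         # H/8
        )
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(latent_ch, 128, 4, stride=2, padding=1), nn.SiLU(),
            nn.ConvTranspose2d(128, 64, 4, stride=2, padding=1), nn.SiLU(),
            nn.ConvTranspose2d(64, in_ch, 4, stride=2, padding=1),
        )

    def encode(self, x):  # [B, in_ch, H, W] -> [B, latent_ch, H/8, W/8]
        return self.encoder(x)

    def decode(self, z):  # [B, latent_ch, H/8, W/8] -> [B, in_ch, H, W]
        return self.decoder(z)
```

In the workflow described later, such an encoder would be trained in advance and kept frozen while the image enhancement network itself is trained.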
Specifically, steps 101 and 102 obtain the coding feature map of the target image and the text code of the input image enhancement instruction. The execution order of steps 101 and 102 is not specifically limited; they may be executed asynchronously or synchronously, as long as both are completed before step 103.
According to the diffusion model-based image enhancement method provided by the invention, the acquired target image to be enhanced and the acquired image enhancement instruction are first encoded by an encoder and a text editor, respectively, to obtain a coding feature map and a text code; secondly, the coding feature map and the text code are input into a pre-trained target image enhancement network; then, Gaussian noise is gradually added to the coding feature map to obtain a target noise image obeying a Gaussian distribution, and the prediction noise in the result image after each step of adding Gaussian noise is determined; next, based on a cross-attention mechanism, image enhancement is performed on the region of the target noise image corresponding to the image enhancement instruction to obtain a noise-added enhanced image; then, mirroring the noise-adding process, the prediction noise of each step is gradually removed from the noise-added enhanced image to obtain a denoised image; finally, the denoised image is decoded by a decoder to obtain an enhanced image.
The invention is aimed at target images with many missing features, for example images acquired by terminals in the field of power business visual analysis during power generation, transmission and distribution, in which features are missing or discontinuous. Gaussian noise is gradually introduced into the target image to attenuate the useful information in the image, so that the noise-added image tends toward Gaussian noise; the original image is then restored by removing the noise step by step. While eliminating the noise and interference in the target image, the detail signals and features of the whole target image are enhanced; at the same time, combined with the cross-attention mechanism, the image region corresponding to the image enhancement instruction in the target image is enhanced in a targeted manner, supporting the restoration of features such as image texture, saturation and color. This effectively improves the enhancement effect on the target image and provides higher-quality image data for subsequent image analysis.
Further, the preset noise addition rule is determined based on a diffusion process of the denoising diffusion probability model.
The denoising diffusion probability model (Denoising Diffusion Probabilistic Models, DDPM) is a parameterized Markov chain and is trained by a variational reasoning method. The denoising diffusion probability model is one of depth generation models, and generally comprises two processes, a diffusion process and a reverse process. The diffusion process is also called a forward diffusion process, a forward diffusion process or a noise adding process, and the reverse process is also called a reverse diffusion process or a reverse denoising process.
As shown in FIG. 2, the process from $x_0$ to $x_T$ is the diffusion process of the denoising diffusion probability model, and the process from $x_T$ to $x_0$ is the reverse process of the denoising diffusion probability model.
The diffusion process is a step-by-step noise-adding process: diagonal Gaussian noise is added to the sample image at each step, and by continuously adding Gaussian noise, the data distribution of the original sample image is converted into a simple distribution conforming to the standard Gaussian distribution.
The reverse process is a denoising process: sampling starts from an image conforming to the standard Gaussian distribution, and a small amount of Gaussian noise is removed at each step, so that the denoised image gradually approaches the real data distribution, thereby obtaining a sample image from the real data distribution and realizing the recovery of the sample image.
According to a preset noise adding rule and a preset step number, gradually adding Gaussian noise into the coding feature map to obtain a target noise image obeying Gaussian distribution, wherein the method specifically comprises the following steps of:
and adding Gaussian noise to the coding feature map in each step of the diffusion process according to the diffusion process of the denoising diffusion probability model.
Wherein the parameter value of the added gaussian noise is determined based on a preset noise schedule.
Specifically, the diffusion process of the denoising diffusion probability model is a noise-adding process based on the Markov assumption. After the number of noise-adding steps is determined as the preset number of steps T, and the parameters of the Gaussian noise to be added at each step are determined based on the preset noise schedule, Gaussian noise is gradually added to the coding feature map, which is taken as $x_0$.
And calculating a result image after adding Gaussian noise in each step of the diffusion process according to the coding feature map and the noise time table, and outputting the result image corresponding to the preset step number as a target noise image.
Specifically, the coding feature map is taken as $x_0$ and the number of noise-adding steps is T. The parameters of the Gaussian noise added at each step are determined according to the noise schedule, so the result image after each noise-adding step of the diffusion process can be obtained unambiguously, and the result image corresponding to step T is used as the target noise image.
Further, according to the coding feature map and the noise schedule, calculating a result image after adding Gaussian noise in each step of the diffusion process, specifically:
assuming that the preset number of steps is T and that the initial distribution of the sample data of the coding feature map is $x_0 \sim q(x_0)$, Gaussian noise with a specified mean and variance is added to the coding feature map at each time $t$ of the diffusion process, which is expressed by the following formulas:

$q(x_t \mid x_{t-1}) = \mathcal{N}\!\left(x_t;\ \sqrt{1-\beta_t}\,x_{t-1},\ \beta_t\mathbf{I}\right)$ (1),

$q(x_{1:T} \mid x_0) = \prod_{t=1}^{T} q(x_t \mid x_{t-1})$ (2),

wherein $x_t$ is the result image after adding noise at time $t$; $\beta$ is the preset noise schedule, $\beta$ comprises $\{\beta_1,\dots,\beta_T\}$, where $\beta_t$ represents the parameter value of the Gaussian noise added at step $t$ of the diffusion process, and $\beta_t \in (0,1)$.

It is thus possible to obtain

$x_t = \sqrt{1-\beta_t}\,x_{t-1} + \sqrt{\beta_t}\,\epsilon_{t-1}$, with $\epsilon_{t-1} \sim \mathcal{N}(0,\mathbf{I})$ (3).

Defining the variables $\alpha_t = 1-\beta_t$ and $\bar{\alpha}_t = \prod_{s=1}^{t}\alpha_s$, and based on the Markov assumption, after continuous iteration the result image after adding Gaussian noise at each step of the diffusion process can be calculated according to the following formula:

$x_t = \sqrt{\bar{\alpha}_t}\,x_0 + \sqrt{1-\bar{\alpha}_t}\,\epsilon$, with $\epsilon \sim \mathcal{N}(0,\mathbf{I})$ (4),

wherein $x_0$ is the coding feature map before adding Gaussian noise, and $x_t$ is the noise-added result corresponding to noise having been added up to time $t$; that is,

$q(x_t \mid x_0) = \mathcal{N}\!\left(x_t;\ \sqrt{\bar{\alpha}_t}\,x_0,\ (1-\bar{\alpha}_t)\mathbf{I}\right)$ (5).

From the above it can be seen that, throughout the diffusion process, for a determined coding feature map $x_0$ and noise schedule $\beta$, the noise-added result image $x_t$ of any step can be obtained. When the preset number of steps T is large enough, the final noise-added result image can be regarded as isotropic Gaussian noise, i.e. $x_T \sim \mathcal{N}(0,\mathbf{I})$.
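As an illustration of formula (4), the following sketch computes the noise-added result image of any step in closed form; the total number of steps and the linear schedule endpoints are assumptions of the sketch.

```python
# Closed-form forward diffusion per formula (4):
# x_t = sqrt(a_bar_t) * x_0 + sqrt(1 - a_bar_t) * eps.
import torch

T = 1000                                    # preset number of steps (assumed)
betas = torch.linspace(1e-4, 0.02, T)       # preset noise schedule (assumed linear)
alphas = 1.0 - betas
alpha_bars = torch.cumprod(alphas, dim=0)   # cumulative products: a_bar_t

def q_sample(x0: torch.Tensor, t: torch.Tensor, noise: torch.Tensor) -> torch.Tensor:
    """Noise-added result at step t for a batch; t is a LongTensor of shape [B]."""
    a_bar = alpha_bars[t].view(-1, 1, 1, 1)
    return a_bar.sqrt() * x0 + (1.0 - a_bar).sqrt() * noise
```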
Further, the prediction noise in the result image after the addition of the gaussian noise at each step is determined specifically as follows:
as shown in fig. 3, the prediction noise in the resultant image after the addition of the gaussian noise at each step is determined by a prediction noise model.
The prediction noise model is formed based on a U-Net network with the same input and output dimensions, and the U-Net network comprises a contracted path and an expanded path; the contraction path adopts a multi-layer downsampling structure, and the multi-layer downsampling structure is realized through a first convolution module; the expansion path adopts a multi-layer up-sampling structure, and the multi-layer up-sampling structure is realized through a second convolution module; the number of layers of the multi-layer downsampling structure is the same as the number of layers of the multi-layer upsampling structure.
In the present embodiment, the inputs of the prediction noise model are the noise-added image, a single-channel 128×128 tensor, and the time $t$, which is encoded using an encoding technique; residual structures are merged. The output of the prediction noise model is the prediction noise, whose channel number and size are the same as those of the input of the prediction noise model.
The multi-layer downsampling structure is a 4-layer downsampling structure, and downsampling of the prediction noise model adopts convolution operation with convolution kernel of 3×3, step size of 2 and filling of 1.
And taking the image after noise addition as a characteristic image input for the first time, and reducing the input characteristic image by half at each layer of the multi-layer downsampling structure by utilizing a first convolution module.
And (3) doubling the input characteristic images on each layer of the multi-layer up-sampling structure by utilizing a second convolution module through a nearest interpolation method, splicing the characteristic images with the characteristic images corresponding to the contracted paths, and finally outputting the prediction noise of the image after noise addition.
The first convolution module includes 5 convolution units, and the convolution channel numbers of the five convolution units are respectively set to 32, 64, 128, 256 and 512 from top to bottom. In order to prevent gradient extinction and gradient explosion, a residual structure is used for completing network transmission and expansion and reduction of the number of channels. The prediction noise model converts the number of channels to 1 at the output.
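The structure described above can be sketched as follows; this is a simplified stand-in rather than the exact network of the embodiment, and the time-embedding handling and exact layer arrangement are assumptions.

```python
# Simplified sketch of the U-Net-style noise prediction model: a 4-level
# contracting path (3x3 conv, stride 2, padding 1), a 4-level expanding path
# (nearest-neighbour upsampling plus skip concatenation), and a single-channel
# noise output. Time-embedding handling is an assumption of this sketch.
import torch
import torch.nn as nn
import torch.nn.functional as F

class Down(nn.Module):
    def __init__(self, c_in, c_out):
        super().__init__()
        self.conv = nn.Conv2d(c_in, c_out, 3, stride=2, padding=1)  # halves H and W
    def forward(self, x):
        return F.silu(self.conv(x))

class Up(nn.Module):
    def __init__(self, c_in, c_skip, c_out):
        super().__init__()
        self.conv = nn.Conv2d(c_in + c_skip, c_out, 3, padding=1)
    def forward(self, x, skip):
        x = F.interpolate(x, scale_factor=2, mode="nearest")        # doubles H and W
        return F.silu(self.conv(torch.cat([x, skip], dim=1)))

class TinyNoiseUNet(nn.Module):
    def __init__(self, in_ch=1, base=32, t_dim=128):
        super().__init__()
        self.t_embed = nn.Sequential(nn.Linear(t_dim, base), nn.SiLU())
        self.stem = nn.Conv2d(in_ch, base, 3, padding=1)
        self.d1, self.d2 = Down(base, 64), Down(64, 128)
        self.d3, self.d4 = Down(128, 256), Down(256, 512)
        self.u1, self.u2 = Up(512, 256, 256), Up(256, 128, 128)
        self.u3, self.u4 = Up(128, 64, 64), Up(64, base, base)
        self.head = nn.Conv2d(base, 1, 1)                            # predicted noise

    def forward(self, x, t_emb):
        h0 = F.silu(self.stem(x)) + self.t_embed(t_emb)[:, :, None, None]
        h1 = self.d1(h0); h2 = self.d2(h1); h3 = self.d3(h2); h4 = self.d4(h3)
        u = self.u1(h4, h3); u = self.u2(u, h2)
        u = self.u3(u, h1); u = self.u4(u, h0)
        return self.head(u)
```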
Further, the target noise image includes a plurality of image channels.
Specifically, the image channel is an important concept of an image, and in the RGB color mode, a complete image is composed of three image channels of red, green and blue, and the three image channels cooperate to generate the complete image.
The cross-attention mechanism includes a channel attention mechanism and a spatial attention mechanism.
The essence of the attention mechanism is to locate the information of interest to the user in an image and to suppress the useless information in the image.
Based on a cross attention mechanism, carrying out image enhancement on an area corresponding to text coding in a target noise image to obtain a noise enhanced image, and specifically comprising the following steps:
and carrying out pertinence enhancement of different image channels on the feature map corresponding to each image channel of the region corresponding to the text coding in the target noise image through a channel attention mechanism to obtain a channel attention feature map.
The channel attention mechanism comprises a compression part and an excitation part, wherein the compression part mainly compresses global space information, then performs feature learning in the dimension of an image channel to form the importance of each channel, and the excitation part is used for distributing different weights to each channel.
And (3) carrying out targeted enhancement on different spatial positions on the channel attention feature map through a spatial attention mechanism to obtain a noise-added enhanced image.
The spatial attention mechanism is to find the position of the picture focused by the user and process the position.
Further, through a channel attention mechanism, the pertinence enhancement of different image channels is performed on the feature map corresponding to each image channel of the region corresponding to the text code in the target noise image, so as to obtain a channel attention feature map, which specifically includes:
and carrying out dimension reduction processing on the feature map of each image channel of the region corresponding to the text code in the target noise image according to the methods of maximum pooling and average pooling to obtain the global feature of the feature map corresponding to the image channel.
And processing the global features through the multi-layer perceptron to obtain the weight coefficient of the image channel.
And weighting the feature images corresponding to the image channels through the weight coefficients to obtain weighted feature images.
And multiplying the weighted feature map and an image channel of the target noise image to obtain a channel attention feature map.
In the present embodiment, as shown in fig. 4, the region image of the target noise image $x$ corresponding to the text code is processed by the encoder of the autoencoder to obtain the coding feature $f_c$; the coding feature $f_c$ is processed by the decoder of the autoencoder to obtain the feature map $F$; maximum pooling and average pooling are performed on the feature map $F$ to generate the channel attention map $M_c$; and $M_c$ is multiplied with $F$ to obtain the channel attention feature map $F'$.

The channel attention map $M_c$ is calculated by the following formula:

$M_c(F) = \sigma\!\left(\mathrm{MLP}(\mathrm{AvgPool}(F)) + \mathrm{MLP}(\mathrm{MaxPool}(F))\right) = \sigma\!\left(W_1\!\left(W_0\!\left(F^c_{\mathrm{avg}}\right)\right) + W_1\!\left(W_0\!\left(F^c_{\mathrm{max}}\right)\right)\right)$ (6),

wherein AvgPool is global average pooling; MaxPool is global maximum pooling; MLP is a multi-layer perceptron; $r$ is the reduction rate; $\sigma$ is the sigmoid function; $C$ is the number of channels; $W_0$ and $W_1$ respectively represent the two weight coefficients of the MLP; $F^c_{\mathrm{avg}}$ is the feature vector obtained from $F$ after average pooling, where the superscript $c$ denotes the channel attention module; $F^c_{\mathrm{max}}$ is the vector obtained from $F$ after the maximum pooling operation; and $M_c$ is generated by adding the two features and activating with the sigmoid function.

The channel attention map $M_c$ is multiplied with $F$ to obtain the channel attention feature map $F'$.
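A minimal sketch of this channel attention computation (formula (6)) is given below; the value of the reduction ratio is an assumption.

```python
# Channel attention sketch following formula (6): shared MLP over global
# average- and max-pooled descriptors, sigmoid gate M_c, channel-wise reweighting.
import torch
import torch.nn as nn

class ChannelAttention(nn.Module):
    def __init__(self, channels: int, r: int = 16):   # reduction ratio r is assumed
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(channels, channels // r),
            nn.ReLU(inplace=True),
            nn.Linear(channels // r, channels),
        )

    def forward(self, feat: torch.Tensor) -> torch.Tensor:
        b, c, _, _ = feat.shape
        avg = self.mlp(feat.mean(dim=(2, 3)))           # AvgPool branch
        mx = self.mlp(feat.amax(dim=(2, 3)))            # MaxPool branch
        m_c = torch.sigmoid(avg + mx).view(b, c, 1, 1)  # M_c
        return m_c * feat                               # channel attention feature map F'
```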
Further, through a spatial attention mechanism, pertinence enhancement of different spatial positions is performed on the channel attention feature map, and a noise enhanced image is obtained, which specifically comprises:
and processing the channel attention characteristic diagram according to the methods of maximum pooling and average pooling to obtain a processing result.
And performing connection operation on the processing result based on the corresponding image channel to obtain a connected feature map.
And reducing the dimension of the connected feature map into a single channel by a convolution dimension reduction processing method to obtain a space feature map.
And multiplying the space feature image and the target noise image to obtain a noise-added enhanced image.
In this embodiment, as shown in FIG. 4, the channel attention feature map $F'$ is used as the input feature map of the spatial attention mechanism. Maximum pooling and average pooling are performed on $F'$ along the channel dimension, a convolutional (Conv) connection operation is performed on the processing results of all image channels to obtain a connected feature map, and the connected feature map is then reduced to a single channel by a convolution-based dimension-reduction method and activated to generate the spatial feature map $M_s$. Finally, the spatial feature map $M_s$ and the channel attention feature map $F'$ are multiplied to obtain the noise-added enhanced image.

The spatial feature map $M_s$ is calculated by the following formulas:

$M_s(F') = \sigma\!\left(f\!\left(\left[\mathrm{AvgPool}(F');\ \mathrm{MaxPool}(F')\right]\right)\right)$ (7),

$M_s(F') = \sigma\!\left(f\!\left(\left[F'^{\,s}_{\mathrm{avg}};\ F'^{\,s}_{\mathrm{max}}\right]\right)\right)$ (8),

wherein $f$ denotes the convolution operation that reduces the concatenated map to a single channel; $[\cdot\,;\cdot]$ denotes concatenation along the channel dimension; $F'^{\,s}_{\mathrm{avg}}$ and $F'^{\,s}_{\mathrm{max}}$ denote the average-pooled and maximum-pooled maps of $F'$; and the superscript $s$ denotes the spatial attention module.
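A corresponding sketch of the spatial attention computation (formulas (7) and (8)) follows; the 7×7 kernel size is an assumption, since the embodiment only specifies a convolution that reduces the concatenated map to a single channel.

```python
# Spatial attention sketch following formulas (7)-(8): channel-wise average and
# max maps are concatenated, convolved down to one channel, and used as a gate M_s.
import torch
import torch.nn as nn

class SpatialAttention(nn.Module):
    def __init__(self, kernel_size: int = 7):          # kernel size is assumed
        super().__init__()
        self.conv = nn.Conv2d(2, 1, kernel_size, padding=kernel_size // 2)

    def forward(self, feat: torch.Tensor) -> torch.Tensor:
        avg = feat.mean(dim=1, keepdim=True)            # [B, 1, H, W]
        mx, _ = feat.max(dim=1, keepdim=True)           # [B, 1, H, W]
        m_s = torch.sigmoid(self.conv(torch.cat([avg, mx], dim=1)))  # M_s
        return m_s * feat                               # spatially gated output
```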
Further, the preset noise removal rule is determined based on the inverse process of the denoising diffusion probability model.
The inverse process of the denoising diffusion probability model is a process of reconstructing a target image from noise.
According to a preset noise removal rule and a preset step number, the prediction noise of each step is gradually removed from the noise-added enhanced image, and the method specifically comprises the following steps:
and removing the prediction noise determined in the diffusion process corresponding to the inverse process from the noise-added enhanced image at each step of the inverse process based on the inverse process of the denoising diffusion probability model.
Specifically, the reverse process of the denoising diffusion probability model can also be assumed to be a Markov chain. If the conditional probability distribution $q(x_{t-1} \mid x_t)$ of each step in the reverse process could be determined exactly, then $x_0$ could be obtained by iteratively sampling in the reverse direction, completing the generation task. However, since $q(x_{t-1} \mid x_t)$ depends on the data distribution of all samples, determining it directly is not realistic. Therefore, a parameterized neural network $p_\theta$ is constructed to approximate this distribution; it is assumed that $p_\theta(x_{t-1} \mid x_t)$ is the probability distribution of the reverse process and obeys a Gaussian distribution whose mean $\mu_\theta$ and variance $\Sigma_\theta$ both take $x_t$ and $t$ as input parameters, which is specifically expressed by the following formula:

$p_\theta(x_{t-1} \mid x_t) = \mathcal{N}\!\left(x_{t-1};\ \mu_\theta(x_t, t),\ \Sigma_\theta(x_t, t)\right)$ (9).

In practical application, in order to facilitate subsequent calculation and reduce the training difficulty of the neural network, the variance $\Sigma_\theta(x_t, t)$ is set to a constant that does not participate in the training of the neural network and is related only to the time-dependent constant $\beta_t$.

Only the mean $\mu_\theta(x_t, t)$ is therefore learned by the neural network during the training phase. Although $q(x_{t-1} \mid x_t)$ cannot be calculated directly, the posterior conditional probability $q(x_{t-1} \mid x_t, x_0)$ can be calculated from the intermediate value $x_t$ and the initial value $x_0$.

Specifically, the Bayesian formula is applied:

$q(x_{t-1} \mid x_t, x_0) = q(x_t \mid x_{t-1}, x_0)\,\dfrac{q(x_{t-1} \mid x_0)}{q(x_t \mid x_0)}$ (10).

According to (10) and (4) it is possible to obtain:

$q(x_{t-1} \mid x_t, x_0) = \mathcal{N}\!\left(x_{t-1};\ \tilde{\mu}_t(x_t, x_0),\ \tilde{\beta}_t\mathbf{I}\right)$ (11),

wherein

$\tilde{\mu}_t(x_t, x_0) = \dfrac{\sqrt{\bar{\alpha}_{t-1}}\,\beta_t}{1-\bar{\alpha}_t}\,x_0 + \dfrac{\sqrt{\alpha_t}\,(1-\bar{\alpha}_{t-1})}{1-\bar{\alpha}_t}\,x_t$ (12), $\quad \tilde{\beta}_t = \dfrac{1-\bar{\alpha}_{t-1}}{1-\bar{\alpha}_t}\,\beta_t$ (13).

Further, according to the relation between $x_0$ and $x_t$ given by (4), and combining (9) and (11), the loss function of the target image enhancement network can be determined.

When training the mean $\mu_\theta$ with the neural network, three choices of the quantity to be predicted can be adopted to obtain the training result.

First: directly predict the mean $\mu_\theta(x_t, t)$ of each step of the reverse process.

Second: predict the initial value $x_0$, and substitute $x_0$ into (12) to obtain the mean $\tilde{\mu}_t$.

Third: predict the noise $\epsilon$, and eliminate $x_0$ by means of (4), which gives the following formula:

$\mu_\theta(x_t, t) = \dfrac{1}{\sqrt{\alpha_t}}\left(x_t - \dfrac{\beta_t}{\sqrt{1-\bar{\alpha}_t}}\,\epsilon_\theta(x_t, t)\right)$ (14).

The mean is calculated by (14), wherein $\epsilon_\theta(x_t, t)$ is the predicted value of the noise.

In this embodiment, the third way is used for prediction, and the loss function is:

$L_{\mathrm{simple}} = \mathbb{E}_{t,\,x_0,\,\epsilon}\!\left[\left\|\epsilon - \epsilon_\theta\!\left(\sqrt{\bar{\alpha}_t}\,x_0 + \sqrt{1-\bar{\alpha}_t}\,\epsilon,\ t\right)\right\|^2\right]$ (15).

The final objective of network optimization is to maximize the end result $p_\theta(x_0)$ of the reverse process, thereby generating the most suitable samples; hence the variational lower bound $L_{\mathrm{VLB}}$ is used to optimize its negative log-likelihood function:

$-\log p_\theta(x_0) \le \mathbb{E}_q\!\left[\log\dfrac{q(x_{1:T} \mid x_0)}{p_\theta(x_{0:T})}\right] = L_{\mathrm{VLB}}$ (16).

Equation (15) can be regarded as a simplified form of the variational lower bound loss $L_{\mathrm{VLB}}$, and optimizing (15) yields better sample quality than directly optimizing $L_{\mathrm{VLB}}$.

In the present embodiment, a loss function replacing the MSE loss function is used and substituted into equation (4), resulting in the final loss function (17).
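As an illustration of the reverse process, a single denoising step built from formula (14) could look like the sketch below; the fixed variance choice $\sigma_t^2 = \beta_t$ and the model call signature are assumptions of the sketch.

```python
# One reverse (denoising) step per formula (14); variance fixed to beta_t (assumed).
import torch

@torch.no_grad()
def p_sample(model, x_t, t, betas, alphas, alpha_bars):
    eps = model(x_t, t)                                  # predicted noise eps_theta(x_t, t)
    beta_t = betas[t].view(-1, 1, 1, 1)
    alpha_t = alphas[t].view(-1, 1, 1, 1)
    a_bar_t = alpha_bars[t].view(-1, 1, 1, 1)
    mean = (x_t - beta_t / (1.0 - a_bar_t).sqrt() * eps) / alpha_t.sqrt()
    if (t == 0).all():                                   # no noise added at the final step
        return mean
    return mean + beta_t.sqrt() * torch.randn_like(x_t)
```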
as shown in fig. 4, further, before inputting the coding feature map and the text codes into the pre-trained target image enhancement network, the method further includes:
training an original image enhancement network to obtain an image enhancement network with the error value of the predicted noise and the real noise smaller than a preset loss value as a target image enhancement network.
Specifically, before the image enhancement network is applied in the invention, the original image enhancement network is required to be trained through a training sample, and the image enhancement network with the error value of the predicted noise and the real noise smaller than the preset loss value is used as a trained target image enhancement network.
Further, training the original image enhancement network to obtain an image enhancement network with an error value of the predicted noise and the real noise smaller than a preset loss value as a target image enhancement network, which specifically comprises:
And obtaining a high-quality image meeting the preset quality requirement, and processing the high-quality image in a downsampling mode to obtain a corresponding low-quality image.
Downsampling, also called subsampling, is a multi-rate digital signal processing technique, or the process of reducing the sampling rate of a signal, and is generally used to reduce the data transmission rate or the data size.
And reducing the data size in the high-quality image to obtain a corresponding low-quality image.
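One simple way to realize such a degradation is sketched below; the scale factor and the choice to resize back to the original resolution are assumptions of the sketch.

```python
# Illustrative degradation: downsample the high-quality image, then resize back,
# discarding fine detail. Scale factor and interpolation mode are assumptions.
import torch.nn.functional as F

def make_low_quality(high_q_img, factor: int = 4):
    h, w = high_q_img.shape[-2:]
    small = F.interpolate(high_q_img, scale_factor=1.0 / factor,
                          mode="bilinear", align_corners=False)
    return F.interpolate(small, size=(h, w), mode="bilinear", align_corners=False)
```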
The high quality image and the low quality image are encoded by an encoder to obtain a high quality encoded picture and a low quality encoded picture.
Specifically, each corresponding high-quality code map and low-quality code map form a training image pair, and the high-quality image and the low-quality image are mapped from the pixel space to the hidden layer space by the encoder of the autoencoder.
In this embodiment, suppose the size of the image is [B, C, H, W], where B represents the batch size, C represents the number of channels, H represents the height of the image, and W represents the width of the image. After the image is encoded by the autoencoder, the size of the resulting code map is [B, C, H/8, W/8].
The autoencoder needs to be trained before use, and its parameters are fixed during the subsequent training process. That is, the autoencoder can be trained independently; the training method is not limited here, and a pre-trained model can also be used directly.
Gaussian noise is gradually added to the low quality code map, and prediction noise in the resultant image after each step of adding gaussian noise is determined.
And determining error values of the prediction noise and the noise true value, and changing parameters of the original image enhancement network when the error values are larger than the preset loss values until the error values are smaller than the preset loss values, so as to obtain the trained target image enhancement network.
Specifically, in the training stage, the prediction noise can be obtained by calculation according to the input training sample image and the loss function of the model, and whether the image enhancement network is trained well can be determined according to the error value of the prediction noise and the noise true value and the preset loss value.
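A minimal sketch of one training iteration is shown below; the `encoder`, `clip_text` and `model` callables, the plain MSE between predicted and real noise, and the omission of the optimizer step are all simplifying assumptions of the sketch.

```python
# One illustrative training iteration: encode, noise the low-quality latent,
# predict the noise, and compare with the real noise. Conditioning details,
# the loss choice and the optimizer step are simplified assumptions.
import torch
import torch.nn.functional as F

def training_step(model, encoder, clip_text, low_q_img, instruction, alpha_bars, T=1000):
    with torch.no_grad():                     # encoder and CLIP stay frozen
        z0 = encoder(low_q_img)
        text_code = clip_text(instruction)
    t = torch.randint(0, T, (z0.size(0),), device=z0.device)
    noise = torch.randn_like(z0)
    a_bar = alpha_bars[t].view(-1, 1, 1, 1)
    z_t = a_bar.sqrt() * z0 + (1.0 - a_bar).sqrt() * noise
    pred = model(z_t, t, text_code)           # noise prediction with text conditioning
    loss = F.mse_loss(pred, noise)            # error between predicted and real noise
    loss.backward()                           # parameter update step omitted
    return loss
```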
As shown in fig. 5, the workflow of the diffusion model-based image enhancement method of the present invention is divided into two parts: a training stage and a generation stage.
In the training stage, the input original images are an acquired high-quality image meeting the preset quality requirement and a low-quality image obtained by downsampling the high-quality image.
The original images are encoded by an encoder and mapped from the pixel space to the latent space to obtain coding feature maps, and Gaussian noise is then gradually added to the encoded images based on the denoising diffusion probability model to obtain noise images.
The custom image enhancement options are custom image enhancement instructions, and the trained CLIP text encoder encodes the image enhancement instructions to obtain text codes. In the figure, a custom image enhancement instruction is encoded by CLIP to generate an embedding of size [B, K, E], where K represents the maximum coding length of the text and E represents the embedding dimension.
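As an illustration of the [B, K, E] text embedding, the sketch below uses the publicly available CLIP text model from the Hugging Face transformers library; the checkpoint name, the maximum length K = 77, and the example instruction are assumptions and not values fixed by the patent.

    import torch
    from transformers import CLIPTokenizer, CLIPTextModel

    tokenizer = CLIPTokenizer.from_pretrained("openai/clip-vit-base-patch32")
    text_encoder = CLIPTextModel.from_pretrained("openai/clip-vit-base-patch32")

    prompts = ["sharpen the license plate in the lower left corner"]   # hypothetical instruction
    tokens = tokenizer(prompts, padding="max_length", max_length=77,
                       truncation=True, return_tensors="pt")
    with torch.no_grad():
        emb = text_encoder(**tokens).last_hidden_state
    print(emb.shape)   # torch.Size([1, 77, 512]) -> [B, K, E]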
Gaussian noise is gradually added to the coding feature map in the diffusion process based on the denoising diffusion probability model, and the predicted noise in the result image after each step of adding Gaussian noise is then determined by a U-Net-based noise prediction model. Meanwhile, the U-Net-based noise prediction model receives the coding feature maps of the high-quality code map and the low-quality code map together with the text code of the image enhancement instruction, and learns the matching relationship between the image enhancement instruction and the image based on a cross attention mechanism.
Based on the error value between the predicted noise and the real noise and the preset loss value, when the error value is larger than the preset loss value, the parameters of the U-Net noise prediction model are updated through the back-propagation algorithm; the parameters of the encoder and the CLIP text encoder are not updated during this parameter update.
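A small sketch of restricting updates to the U-Net noise prediction model while the encoder and the CLIP text encoder stay frozen is shown below; the three modules are placeholders, not the networks described in the patent.

    import torch
    import torch.nn as nn

    # Placeholder modules standing in for the real encoder, CLIP text encoder and U-Net.
    encoder = nn.Conv2d(3, 4, 3, stride=2, padding=1)
    clip_text_encoder = nn.Embedding(1000, 512)
    unet = nn.Conv2d(4, 4, 3, padding=1)

    # The encoder and the CLIP text encoder are frozen: back-propagation never updates them.
    for module in (encoder, clip_text_encoder):
        module.eval()
        for p in module.parameters():
            p.requires_grad_(False)

    # Only the U-Net noise prediction model is optimized.
    optimizer = torch.optim.AdamW(unet.parameters(), lr=1e-4)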
In the generation stage, the input low-quality image is encoded by the encoder to obtain a latent image.
Gaussian noise is gradually added to the coding feature map based on the diffusion process of the denoising diffusion probability model to obtain a target noise image obeying a Gaussian distribution.
Through T rounds of iteration of the U-Net-based denoising model and the reverse process of the denoising diffusion probability model, the noise in the noisy image is gradually removed to obtain a denoised image.
The denoised image is restored from the latent space to an enhanced high-quality image by a decoder.
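Putting the generation stage together, a schematic end-to-end sketch might look as follows. The encoder, U-Net and decoder are toy stand-ins (the real U-Net would also take the time step and text code as conditions), and the reverse update is the standard DDPM formulation rather than the patent's exact procedure.

    import torch
    import torch.nn as nn

    T = 1000
    betas = torch.linspace(1e-4, 0.02, T)               # assumed linear noise schedule
    alphas = 1.0 - betas
    alpha_bars = torch.cumprod(alphas, dim=0)

    encoder = nn.Conv2d(3, 4, 3, stride=8, padding=1)   # toy stand-ins, not the patented networks
    unet = nn.Conv2d(4, 4, 3, padding=1)
    decoder = nn.ConvTranspose2d(4, 3, 8, stride=8)

    @torch.no_grad()
    def enhance(low_quality_image: torch.Tensor) -> torch.Tensor:
        z = encoder(low_quality_image)                  # encode into the latent space
        x = torch.randn_like(z)                         # target noise image ~ N(0, I)
        for t in reversed(range(T)):                    # T rounds of reverse denoising
            pred_noise = unet(x)                        # noise predicted at step t
            coef = (1.0 - alphas[t]) / torch.sqrt(1.0 - alpha_bars[t])
            x = (x - coef * pred_noise) / torch.sqrt(alphas[t])
            if t > 0:
                x = x + torch.sqrt(betas[t]) * torch.randn_like(x)
        return decoder(x)                               # restore from the latent space to an image

    enhanced = enhance(torch.randn(1, 3, 256, 256))     # -> torch.Size([1, 3, 256, 256])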
As shown in fig. 6, in a second aspect, the present invention provides an image enhancement apparatus based on a diffusion model, the apparatus comprising:
the encoding module 201 is configured to obtain a target image to be enhanced, and encode the target image by using an encoder to obtain a coding feature map;
the text encoding module 202 is configured to obtain an image enhancement instruction, and encode the image enhancement instruction through a text encoder to obtain a text code; the image enhancement instruction comprises the features and positions of the image to be enhanced;
the input module 203 is configured to input the coding feature map and the text code into a pre-trained target image enhancement network;
the noise prediction module 204 is configured to gradually add Gaussian noise to the coding feature map according to a preset noise adding rule and a preset number of steps to obtain a target noise image obeying a Gaussian distribution, and determine the prediction noise in the result image after each step of adding Gaussian noise;
the image enhancement module 205 is configured to perform image enhancement on an area corresponding to text encoding in the target noise image based on a cross attention mechanism, so as to obtain a noise enhanced image;
the denoising module 206 is configured to gradually remove the prediction noise of each step from the noise-added enhanced image according to a preset noise removal rule and a preset step number, so as to obtain a denoised image;
the decoding module 207 is configured to decode the denoised image by using a decoder, so as to obtain an enhanced image.
Further, the preset noise adding rule is determined based on the diffusion process of the denoising diffusion probability model; when gradually adding Gaussian noise to the coding feature map according to the preset noise adding rule and the preset number of steps to obtain a target noise image obeying a Gaussian distribution, the noise prediction module 204 is configured to perform:
according to the diffusion process of the denoising diffusion probability model, adding Gaussian noise to the coding feature map in each step of the diffusion process, wherein the parameter value of the added Gaussian noise is determined based on a preset noise schedule;
and calculating the result image after adding Gaussian noise in each step of the diffusion process according to the coding feature map and the noise schedule, and outputting the result image corresponding to the preset number of steps as the target noise image.
Further, when calculating the result image after adding Gaussian noise in each step of the diffusion process according to the coding feature map and the noise schedule, the noise prediction module 204 is specifically configured to perform:
calculating the result image after adding Gaussian noise in each step of the diffusion process according to the following formula:

$$x_t = \sqrt{\bar{\alpha}_t}\,x_0 + \sqrt{1-\bar{\alpha}_t}\,\epsilon, \qquad \epsilon \sim \mathcal{N}(0, I),$$

where $x_0$ is the coding feature map before adding Gaussian noise, $x_t$ is the noised result corresponding to time step $t$, $\bar{\alpha}_t = \prod_{s=1}^{t}\alpha_s$ and $\alpha_s = 1-\beta_s$; $\beta = \{\beta_1, \beta_2, \ldots, \beta_T\}$ is the preset noise schedule, which contains $T$ parameter values representing the Gaussian noise added at each step of the diffusion process, with $0 < \beta_1 < \beta_2 < \ldots < \beta_T < 1$.
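A short numerical sketch of this closed-form noising step follows; the linear schedule values and T = 1000 are assumptions for illustration.

    import torch

    T = 1000
    betas = torch.linspace(1e-4, 0.02, T)         # preset noise schedule beta_1 .. beta_T
    alphas = 1.0 - betas
    alpha_bars = torch.cumprod(alphas, dim=0)     # cumulative product, i.e. \bar{alpha}_t

    def add_noise(x0: torch.Tensor, t: int):
        # x0: coding feature map before adding Gaussian noise, shape [B, C, H, W].
        eps = torch.randn_like(x0)                # true Gaussian noise (later the prediction target)
        xt = torch.sqrt(alpha_bars[t]) * x0 + torch.sqrt(1.0 - alpha_bars[t]) * eps
        return xt, eps

    xt, eps = add_noise(torch.randn(1, 4, 32, 32), t=500)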
Further, the target noise image includes a plurality of image channels, and the cross attention mechanism includes a channel attention mechanism and a spatial attention mechanism; the image enhancement module 205 includes a first enhancement unit and a second enhancement unit;
the first enhancement unit is used for carrying out targeted enhancement on different image channels on the feature map corresponding to each image channel of the region corresponding to the text code in the target noise image through a channel attention mechanism to obtain a channel attention feature map;
The second enhancement unit is used for carrying out targeted enhancement on different spatial positions on the channel attention feature map through a spatial attention mechanism to obtain a noise enhanced image.
Further, the first enhancement unit is specifically configured to perform:
for the feature map of each image channel of the region corresponding to the text code in the target noise image, performing dimension reduction processing on the feature map according to the methods of maximum pooling and average pooling to obtain the global feature of the feature map corresponding to the image channel;
processing the global features through a multi-layer perceptron to obtain the weight coefficient of the image channel;
weighting the feature images corresponding to the image channels through the weight coefficients to obtain weighted feature images;
and multiplying the weighted feature map and an image channel of the target noise image to obtain a channel attention feature map.
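The channel attention described by the first enhancement unit can be sketched roughly as below, in the style of CBAM-type channel attention; the reduction ratio and layer sizes are assumptions, not values taken from the patent.

    import torch
    import torch.nn as nn

    class ChannelAttention(nn.Module):
        def __init__(self, channels: int, reduction: int = 16):
            super().__init__()
            # Shared multi-layer perceptron turning pooled global features into channel weights.
            self.mlp = nn.Sequential(
                nn.Linear(channels, channels // reduction), nn.ReLU(inplace=True),
                nn.Linear(channels // reduction, channels),
            )

        def forward(self, x: torch.Tensor) -> torch.Tensor:
            b, c, _, _ = x.shape
            max_feat = torch.amax(x, dim=(2, 3))      # max pooling -> global feature per channel
            avg_feat = torch.mean(x, dim=(2, 3))      # average pooling -> global feature per channel
            weights = torch.sigmoid(self.mlp(max_feat) + self.mlp(avg_feat))   # weight coefficients
            return x * weights.view(b, c, 1, 1)       # weighted (channel attention) feature map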
Further, the second enhancement unit is specifically configured to perform:
processing the channel attention feature map according to the methods of maximum pooling and average pooling to obtain a processing result;
performing connection operation on the processing result based on the corresponding image channel to obtain a connected feature map;
the connected feature images are reduced in dimension into a single channel by a convolution dimension reduction processing method, and a space feature image is obtained;
And multiplying the space feature image and the target noise image to obtain a noise-added enhanced image.
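Correspondingly, the spatial attention of the second enhancement unit can be sketched as follows, again in a CBAM-like form; the 7x7 convolution kernel is an assumption.

    import torch
    import torch.nn as nn

    class SpatialAttention(nn.Module):
        def __init__(self, kernel_size: int = 7):
            super().__init__()
            # Convolution reduces the two pooled maps to a single-channel spatial feature map.
            self.conv = nn.Conv2d(2, 1, kernel_size, padding=kernel_size // 2)

        def forward(self, x: torch.Tensor) -> torch.Tensor:
            max_map, _ = torch.max(x, dim=1, keepdim=True)    # max pooling over channels
            avg_map = torch.mean(x, dim=1, keepdim=True)      # average pooling over channels
            stacked = torch.cat([max_map, avg_map], dim=1)    # connection along the channel axis
            spatial = torch.sigmoid(self.conv(stacked))       # single-channel spatial feature map
            return x * spatial                                # spatially weighted (noise-added enhanced) image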
Further, the preset noise removal rule is determined based on the inverse process of the denoising diffusion probability model; the denoising module 206 is specifically configured to perform:
and removing the prediction noise determined in the diffusion process corresponding to the inverse process from the noise-added enhanced image at each step of the inverse process based on the inverse process of the denoising diffusion probability model.
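For reference, in the conventional denoising diffusion probability model this removal corresponds to the reverse-step update below; the patent itself only states that the noise predicted in the corresponding diffusion step is removed, so the exact formulation is given here as the standard DDPM form, not as the patent's own equation.

$$x_{t-1} = \frac{1}{\sqrt{\alpha_t}}\left(x_t - \frac{1-\alpha_t}{\sqrt{1-\bar{\alpha}_t}}\,\epsilon_\theta(x_t, t)\right) + \sigma_t z, \qquad z \sim \mathcal{N}(0, I),$$

where $\epsilon_\theta(x_t, t)$ is the noise predicted by the U-Net at step $t$ and $\sigma_t$ is determined by the noise schedule (with $z$ omitted at the final step).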
Further, the device further comprises a model training module, which is used for training the original image enhancement network before the coding feature map and the text codes are input into the pre-trained target image enhancement network, so as to obtain the image enhancement network with the error value of the prediction noise and the real noise smaller than the preset loss value as the target image enhancement network.
Further, the model training module is specifically configured to perform:
acquiring a high-quality image meeting the preset quality requirement, and processing the high-quality image in a downsampling mode to obtain a corresponding low-quality image;
encoding the high-quality image and the low-quality image by an encoder to obtain a high-quality code map and a low-quality code map;
gradually adding Gaussian noise into the low-quality code map, and determining the prediction noise in the result image after each step of adding Gaussian noise;
and determining error values of the prediction noise and the noise true value, and changing parameters of the original image enhancement network when the error values are larger than the preset loss values until the error values are smaller than the preset loss values, so as to obtain the trained target image enhancement network.
It will be clear to those skilled in the art that, for convenience and brevity of description, specific working procedures of the above-described systems, apparatuses and units may refer to corresponding procedures in the foregoing method embodiments, which are not repeated herein.
In a third aspect, the present invention provides an electronic device comprising a processor and a memory having stored therein at least one instruction, at least one program, code set or instruction set, the at least one instruction, at least one program, code set or instruction set being loaded and executed by the processor to implement the diffusion model based image enhancement method of any of the above.
In a fourth aspect, the present invention provides a computer readable storage medium having stored therein at least one instruction, at least one program, code set, or instruction set, the at least one instruction, at least one program, code set, or instruction set being loaded and executed by a processor to implement a diffusion model based image enhancement method as in any of the above.
In the above embodiments, the implementation may be realized in whole or in part by software, hardware, firmware, or any combination thereof. When implemented in software, the embodiments may be realized in whole or in part in the form of a computer program product. The computer program product includes one or more computer instructions. When the computer program instructions are loaded and executed on a computer, the processes or functions according to the embodiments of the present invention are produced in whole or in part. The computer may be a general-purpose computer, a special-purpose computer, a computer network, or another programmable apparatus. The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another, for example, from one website, computer, server, or data center to another by wired (e.g., coaxial cable, optical fiber, Digital Subscriber Line (DSL)) or wireless (e.g., infrared, radio, microwave) means. The computer-readable storage medium may be any available medium that can be accessed by a computer, or a data storage device such as a server or data center that integrates one or more available media. The available medium may be a magnetic medium (e.g., floppy disk, hard disk, magnetic tape), an optical medium (e.g., DVD), or a semiconductor medium (e.g., solid state disk (SSD)), etc.
The foregoing is merely illustrative of specific embodiments of the present invention, and the scope of the present invention is not limited thereto, but any changes or substitutions within the technical scope of the present invention should be covered by the scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.

Claims (12)

1. A diffusion model-based image enhancement method, the method comprising:
acquiring a target image to be enhanced, and encoding the target image through an encoder to obtain an encoding feature map;
acquiring an image enhancement instruction, and encoding the image enhancement instruction through a text encoder to obtain a text code; the image enhancement instruction comprises the features and positions of the image to be enhanced;
inputting the coding feature map and the text codes into a pre-trained target image enhancement network;
according to a preset noise adding rule and a preset step number, gradually adding Gaussian noise into the coding feature map to obtain a target noise image obeying Gaussian distribution, and determining the prediction noise in a result image after adding Gaussian noise in each step;
based on a cross attention mechanism, performing image enhancement on a region corresponding to the text code in the target noise image to obtain a noise-added enhanced image;
According to a preset noise removal rule and the preset step number, the prediction noise of each step is gradually removed from the noise-added enhanced image, and a denoised image is obtained;
and decoding the denoised image through a decoder to obtain an enhanced image.
2. The image enhancement method according to claim 1, wherein the preset noise addition rule is determined based on a diffusion process of a denoising diffusion probability model; gradually adding Gaussian noise into the coding feature map according to a preset noise adding rule and a preset step number to obtain a target noise image obeying Gaussian distribution, wherein the method specifically comprises the following steps of:
according to the diffusion process of the denoising diffusion probability model, Gaussian noise is added to the coding feature map in each step of the diffusion process; the parameter value of the added Gaussian noise is determined based on a preset noise schedule;
and calculating a result image after adding the Gaussian noise in each step of the diffusion process according to the coding feature map and the noise schedule, and outputting the result image corresponding to the preset step number as a target noise image.
3. The image enhancement method according to claim 2, wherein the calculating the result image after adding the Gaussian noise in each step of the diffusion process according to the coding feature map and the noise schedule is specifically:
calculating the result image after adding the Gaussian noise in each step of the diffusion process according to the following formula:

$$x_t = \sqrt{\bar{\alpha}_t}\,x_0 + \sqrt{1-\bar{\alpha}_t}\,\epsilon, \qquad \epsilon \sim \mathcal{N}(0, I),$$

wherein $x_0$ is the coding feature map before adding Gaussian noise, $x_t$ is the noised result corresponding to time step $t$, $\bar{\alpha}_t = \prod_{s=1}^{t}\alpha_s$ and $\alpha_s = 1-\beta_s$; $\beta = \{\beta_1, \beta_2, \ldots, \beta_T\}$ is the preset noise schedule, which contains $T$ parameter values representing the Gaussian noise added at each step of the diffusion process, with $0 < \beta_1 < \beta_2 < \ldots < \beta_T < 1$.
4. The image enhancement method according to claim 1, wherein the target noise image comprises a plurality of image channels, and the cross-attention mechanism comprises a channel attention mechanism and a spatial attention mechanism; the image enhancement is performed on the region corresponding to the text code in the target noise image based on the cross attention mechanism to obtain a noise enhanced image, and the method specifically comprises the following steps:
through the channel attention mechanism, carrying out targeted enhancement on different image channels on the feature map corresponding to each image channel of the region corresponding to the text code in the target noise image to obtain a channel attention feature map;
and carrying out targeted enhancement of different spatial positions on the channel attention feature map through the spatial attention mechanism to obtain a noise-added enhanced image.
5. The method for enhancing an image according to claim 4, wherein the step of performing, by the channel attention mechanism, the targeted enhancement of different image channels on the feature map corresponding to each image channel of the region corresponding to the text code in the target noise image to obtain a channel attention feature map specifically includes:
for the feature map of each image channel of the region corresponding to the text code in the target noise image, performing dimension reduction processing on the feature map according to a maximum pooling and average pooling method to obtain global features of the feature map corresponding to the image channel;
processing the global features through a multi-layer perceptron to obtain the weight coefficient of the image channel;
weighting the feature images corresponding to the image channels through the weight coefficients to obtain weighted feature images;
and multiplying the weighted feature map and the image channel of the target noise image to obtain a channel attention feature map.
6. The image enhancement method according to claim 5, wherein the channel attention feature map is subjected to targeted enhancement of different spatial positions through the spatial attention mechanism to obtain a noisy enhanced image, and the method specifically comprises:
Processing the channel attention feature map according to the methods of maximum pooling and average pooling to obtain a processing result;
performing connection operation on the processing result based on the corresponding image channel to obtain a connected feature map;
the connected feature images are subjected to dimension reduction into a single channel by a convolution dimension reduction processing method, so that a space feature image is obtained;
and multiplying the space feature image and the target noise image to obtain a noise-added enhanced image.
7. The image enhancement method according to claim 2, wherein the preset noise removal rule is determined based on a reverse process of a denoising diffusion probability model; the step of gradually removing the prediction noise of each step from the noise-added enhanced image according to a preset noise removal rule and the preset step number specifically comprises the following steps:
and removing the prediction noise determined in the diffusion process corresponding to the inverse process from the noise-added enhanced image at each step of the inverse process based on the inverse process of the denoising diffusion probability model.
8. The image enhancement method according to claim 1, wherein prior to said inputting said encoding feature map and said text encoding into a pre-trained target image enhancement network, said method further comprises:
Training an original image enhancement network to obtain an image enhancement network with the error value of the predicted noise and the real noise smaller than a preset loss value as a target image enhancement network.
9. The image enhancement method according to claim 8, wherein training the original image enhancement network to obtain an image enhancement network with an error value of prediction noise and real noise smaller than a preset loss value as the target image enhancement network specifically comprises:
acquiring a high-quality image meeting a preset quality requirement, and processing the high-quality image in a downsampling mode to obtain a corresponding low-quality image;
encoding the high-quality image and the low-quality image by an encoder to obtain a high-quality code map and a low-quality code map;
gradually adding Gaussian noise into the low-quality code map, and determining the prediction noise in the result image after adding Gaussian noise in each step;
and determining error values of the prediction noise and the noise true value, and changing parameters of the original image enhancement network when the error values are larger than preset loss values until the error values are smaller than the preset loss values, so as to obtain the trained target image enhancement network.
10. An image enhancement device based on a diffusion model, the device comprising:
the coding module is used for acquiring a target image to be enhanced, and coding the target image through the coder to obtain a coding feature map;
the text coding module is used for acquiring the image enhancement instruction, and coding the image enhancement instruction through the text encoder to obtain a text code; the image enhancement instruction comprises the features and positions of the image to be enhanced;
the input module is used for inputting the coding feature map and the text codes into a pre-trained target image enhancement network;
the noise prediction module is used for gradually adding Gaussian noise into the coding feature map according to a preset noise adding rule and a preset step number to obtain a target noise image obeying Gaussian distribution, and determining the prediction noise in a result image after Gaussian noise is added in each step;
the image enhancement module is used for carrying out image enhancement on the region corresponding to the text code in the target noise image based on a cross attention mechanism to obtain a noise enhanced image;
the denoising module is used for gradually removing the prediction noise of each step from the noise-added enhanced image according to a preset noise removal rule and the preset step number to obtain a denoised image;
And the decoding module is used for decoding the denoised image through a decoder to obtain an enhanced image.
11. An electronic device comprising a processor and a memory having stored therein at least one instruction, at least one program, a set of codes, or a set of instructions, the at least one instruction, the at least one program, the set of codes, or the set of instructions being loaded and executed by the processor to implement the diffusion model-based image enhancement method of any one of claims 1-9.
12. A computer readable storage medium having stored therein at least one instruction, at least one program, code set, or instruction set, the at least one instruction, the at least one program, the code set, or instruction set being loaded and executed by a processor to implement the diffusion model-based image enhancement method of any of claims 1-9.