CN117994167B - Diffusion model defogging method integrating parallel multi-convolution attention - Google Patents
Diffusion model defogging method integrating parallel multi-convolution attention
- Publication number
- CN117994167B (application CN202410045689.3A)
- Authority
- CN
- China
- Prior art keywords
- image
- model
- defogging
- convolution
- noise
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Links
- 238000000034 method Methods 0.000 title claims abstract description 57
- 238000009792 diffusion process Methods 0.000 title claims abstract description 45
- 238000012549 training Methods 0.000 claims abstract description 32
- 230000008569 process Effects 0.000 claims abstract description 27
- 238000012360 testing method Methods 0.000 claims description 19
- 230000000694 effects Effects 0.000 claims description 14
- 230000004927 fusion Effects 0.000 claims description 12
- 238000005070 sampling Methods 0.000 claims description 8
- 230000006870 function Effects 0.000 claims description 7
- 238000007476 Maximum Likelihood Methods 0.000 claims description 5
- 238000012545 processing Methods 0.000 claims description 5
- 230000004913 activation Effects 0.000 claims description 4
- 238000002474 experimental method Methods 0.000 claims description 4
- 238000010606 normalization Methods 0.000 claims description 4
- 238000011176 pooling Methods 0.000 claims description 4
- 238000004364 calculation method Methods 0.000 claims description 3
- 239000000284 extract Substances 0.000 claims description 3
- 230000007246 mechanism Effects 0.000 claims description 3
- 238000011156 evaluation Methods 0.000 abstract description 7
- 238000013135 deep learning Methods 0.000 abstract description 5
- 230000000007 visual effect Effects 0.000 description 6
- 230000007547 defect Effects 0.000 description 4
- 238000010586 diagram Methods 0.000 description 4
- 238000013527 convolutional neural network Methods 0.000 description 2
- 238000001514 detection method Methods 0.000 description 2
- 238000013528 artificial neural network Methods 0.000 description 1
- 230000009286 beneficial effect Effects 0.000 description 1
- 230000019771 cognition Effects 0.000 description 1
- 238000010276 construction Methods 0.000 description 1
- 238000013136 deep learning model Methods 0.000 description 1
- 230000006872 improvement Effects 0.000 description 1
- 230000009191 jumping Effects 0.000 description 1
- 238000012986 modification Methods 0.000 description 1
- 230000004048 modification Effects 0.000 description 1
- 238000005457 optimization Methods 0.000 description 1
- 238000002360 preparation method Methods 0.000 description 1
- 238000013441 quality evaluation Methods 0.000 description 1
- 238000011084 recovery Methods 0.000 description 1
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/82—Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/0464—Convolutional networks [CNN, ConvNet]
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02A—TECHNOLOGIES FOR ADAPTATION TO CLIMATE CHANGE
- Y02A90/00—Technologies having an indirect contribution to adaptation to climate change
- Y02A90/10—Information and communication technologies [ICT] supporting adaptation to climate change, e.g. for weather forecasting or climate simulation
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- Evolutionary Computation (AREA)
- General Health & Medical Sciences (AREA)
- Health & Medical Sciences (AREA)
- Artificial Intelligence (AREA)
- Software Systems (AREA)
- General Physics & Mathematics (AREA)
- Computing Systems (AREA)
- Biophysics (AREA)
- Life Sciences & Earth Sciences (AREA)
- Data Mining & Analysis (AREA)
- General Engineering & Computer Science (AREA)
- Computational Linguistics (AREA)
- Mathematical Physics (AREA)
- Biomedical Technology (AREA)
- Molecular Biology (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Databases & Information Systems (AREA)
- Medical Informatics (AREA)
- Multimedia (AREA)
- Image Processing (AREA)
- Image Analysis (AREA)
Abstract
The invention belongs to the field of deep learning, and particularly relates to HazeDiffusion, a diffusion model defogging method integrating parallel multi-convolution attention, comprising the following steps: constructing a data set; constructing the diffusion network model HazeDiffusion; training the constructed HazeDiffusion model on the corresponding training set; acquiring the foggy image to be recovered and performing defogging enhancement on it with the trained HazeDiffusion model; and establishing evaluation indexes for evaluating the HazeDiffusion model. The invention builds on a diffusion model and introduces a parallel multi-convolution attention residual block (PMCA); the PMCA module comprises a parallel-attention part and a parallel multi-convolution part and performs multi-scale connection through residuals. The size of the input image is adjusted by bicubic downsampling, and the defogged image is then upsampled with a Laplacian pyramid, so that the model can process high-resolution images and the efficiency of the diffusion model is indirectly improved.
Description
Technical Field
The invention belongs to the technical field of deep learning, and particularly relates to a diffusion model defogging method integrating parallel multi-convolution attention.
Background
Haze absorbs and scatters light in the air. Under poor weather conditions, the quality of acquired images is severely degraded, with problems such as blurred details, color distortion and low contrast, which lowers the information recognizability of the image and seriously harms the performance of subsequent high-level vision tasks such as object detection, scene recognition and autonomous driving. It is therefore very important to study how to obtain a clear image from a degraded image acquired in a foggy scene. The goal of image defogging is to eliminate the influence of haze in the image, recover a clear image from the blurred one, and restore its details.
Current image defogging research falls mainly into two categories: methods based on features and priors, and learning-based methods. Feature- and prior-based methods perform the corresponding defogging by estimating the parameters of the atmospheric scattering model. Although such methods can restore good image detail, when the adopted assumptions and priors do not hold in particular scenes they cause problems such as oversaturated defogged images, color distortion, and difficulty handling sky regions. CNNs have made great progress on many tasks in recent years, and there is a large body of CNN-based work on defogging algorithms. These methods fall mainly into two types. The first is still based on the atmospheric degradation model and uses a neural network to estimate its parameters; most early methods follow this idea. The second directly maps the input hazy image to a defogged output, i.e. end-to-end (end2end) deep learning.
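For reference, the atmospheric scattering model that the feature- and prior-based methods estimate is conventionally written as follows (a standard formulation from the dehazing literature, stated here for context rather than recited in the patent):

$$I(x) = J(x)\,t(x) + A\bigl(1 - t(x)\bigr), \qquad t(x) = e^{-\beta d(x)},$$

where $I(x)$ is the observed hazy image, $J(x)$ the clear scene radiance, $A$ the global atmospheric light, $t(x)$ the transmission map, $\beta$ the scattering coefficient and $d(x)$ the scene depth.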
Deep generative image methods have achieved certain results in image defogging, but generation based on generative adversarial networks is subject to various limitations: multiple networks must be trained, the model is difficult to converge, and optimization instability, mode collapse and the like easily occur. The denoising diffusion probabilistic model (DDPM) successfully solves the training instability of generative adversarial networks and has gradually come to dominate the field of image generation. However, diffusion models still have shortcomings such as slow sampling, poor maximum-likelihood performance, and weak generalization to data. Many studies have made efforts to address these limitations from a practical point of view.
Disclosure of Invention
Aiming at the technical problems that diffusion models suffer from slow sampling, poor maximum likelihood, weak data generalization and the like, the invention provides a diffusion model defogging method integrating parallel multi-convolution attention, focusing on improving the noise estimation network of the reverse process so that the model can be better applied to image defogging. First, a parallel multi-convolution attention residual block (PMCA) is proposed; the PMCA module mainly comprises a parallel-attention part and a parallel multi-convolution part, with multi-scale connection through residuals. A SKFusion (Selective Kernel Fusion) scheme improving on the selective kernel network is introduced; the size of the input image is adjusted by bicubic downsampling, and the defogged image is then upsampled with a Laplacian pyramid, so that the model can process high-resolution images and the efficiency of the diffusion model is indirectly improved.
In order to solve the technical problems, the invention adopts the following technical scheme:
a diffusion model defogging method integrating parallel multi-convolution attention comprises the following steps:
S1, based on a conditional diffusion model, improving a reverse process noise estimation network to construct an image defogging model HazeDiffusion;
S2, introducing SKFusion fusion modes, and realizing more specific and rich acquisition of information of each scale through dynamic feature fusion and jump connection;
S3, designing a PMCA module by combining pixels, channels and cross attention, and more accurately acquiring characteristics of condition information; through parallel convolution and residual error learning, the model is enabled to pay more attention to the haze area of the image more flexibly, and pay more attention to the local characteristics of the hazy image better;
S4, extracting high-frequency features by using bicubic downsampling to reduce the image size, recovering a high-resolution image by adopting an upsampling method based on a Laplacian pyramid, and improving the processing efficiency of the model.
The data samples for the image defogging model HazeDiffusion in S1 come from the RESIDE dataset, one of the most widely used standard image-defogging benchmarks. RESIDE consists of five subsets: the Indoor Training Set (ITS), the Outdoor Training Set (OTS), the Synthetic Objective Testing Set (SOTS), the Real-world Task-driven Testing Set (RTTS) and the Hybrid Subjective Testing Set (HSTS). ITS and OTS are synthetic datasets, RTTS is a real-world dataset, and HSTS consists of synthetic and real hazy images. The experiments train one model on the ITS dataset containing 100000 image pairs and test it on the 500-pair indoor SOTS dataset; another model is trained on OTS, containing 313950 image pairs, and tested on the 500-pair outdoor SOTS test set.
The main structure of the model in S1 is a conditional defogging diffusion model fused with the hazy image. A diffusion model is a deep generative model: noise is added to the available training data and the process is then reversed to recover the data, with the model gradually learning to remove the noise. In the diffusion (forward) process, Gaussian noise is gradually added to the clear fog-free image until it becomes pure noise. The reverse process inverts the forward one: a random Gaussian noise image is generated and fed, together with the hazy image Haze, into the network model fusing parallel multi-convolution attention, and a clear image is recovered through the reverse defogging process. Adding the hazy image Haze as a condition to the diffusion model yields a defogging conditional diffusion model, which successfully alleviates the poor defogging of real images and the cumbersome separate training of indoor and outdoor datasets, improving the defogging effect.
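In the standard DDPM formulation (stated here for context; the patent text does not recite the closed form), the forward process above admits the closed-form noising

$$q(x_t \mid x_0) = \mathcal{N}\!\bigl(x_t;\ \sqrt{\bar{\alpha}_t}\,x_0,\ (1-\bar{\alpha}_t)\mathbf{I}\bigr), \qquad \bar{\alpha}_t = \prod_{s=1}^{t}(1-\beta_s),$$

so that $x_t = \sqrt{\bar{\alpha}_t}\,x_0 + \sqrt{1-\bar{\alpha}_t}\,\epsilon$ with $\epsilon \sim \mathcal{N}(0,\mathbf{I})$; the reverse network receives $x_t$ together with the hazy condition and learns to predict $\epsilon$.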
The method for constructing the image defogging model HazeDiffusion in S1 is as follows. The HazeDiffusion model comprises two large modules, the Diffusion Process and the Reverse Process. The Diffusion Process module adds noise: it randomly generates a noise image and concatenates it with the image. The Reverse Process module predicts noise: it feeds the Gaussian-noise image and the hazy image into a convolution layer and computes a time embedding for the noise level t. The downsampling stage comprises, in order, a convolution layer, a ResWithAttn layer, a PMCA layer and a downsampling layer; the middle stage of the network consists of ResWithAttn and PMCA layers; the upsampling stage comprises, in order, a ResWithAttn layer, a PMCA layer, an upsampling layer and a ResWithAttn layer. Feature maps from different stages are fused with the SK fusion module. The PMCA module comprises parallel attention and parallel multi-convolution, uses GroupNorm to normalize the data, and uses residual connections to enrich the feature information within the module; the parallel multi-convolution extracts features with depth-separable convolutions of different kernel sizes, including 7×7, 5×5 and 3×3 convolutions.
The image defogging model HazeDiffusion training method comprises the following steps:
in the HazeDiffusion network model built on the constructed training set, an L1 loss supervises the model by computing the mean error between the clear image and the defogged image, and training proceeds through maximum-likelihood estimation of the network output; the loss is defined as

$$L_{1} = \frac{1}{n}\sum_{i=1}^{n}\left|f(x_i) - y_i\right|,$$

where n is the total number of training samples, $f(x_i)$ is the generated noise image and $y_i$ is the estimated noise image; the L1 loss optimizes the model through the absolute value of the difference between $f(x_i)$ and $y_i$.

During training, the diffusion model takes real data and pure noise as input samples, the model outputs an estimate of the added noise, the loss against the real noise is computed at each time step, and the model parameters are updated iteratively.
The SKFusion module in S2 dynamically fuses the feature maps from different stages. SKFusion improves on the selective kernel network and fuses multiple feature branches with channel attention. Let the two feature maps be $x_1$ and $x_2$, where $x_1$ comes from the skip connection and $x_2$ from the output of the network module. First, $x_1$ is passed through a PWConv (PointWise Conv) layer to obtain $\hat{x}_1$; the fusion weights are then obtained with global average pooling, a multi-layer perceptron, a Softmax activation function and a Split operation:

$$\{a_1, a_2\} = \mathrm{Split}\bigl(\mathrm{Softmax}\bigl(F_{mlp}(\mathrm{GAP}(\hat{x}_1 + x_2))\bigr)\bigr),$$

and the fused output is $y = a_1\hat{x}_1 + a_2 x_2$, where GAP denotes global average pooling, $F_{mlp}$ the multi-layer perceptron, Softmax the Softmax activation function, and Split the Split operation.
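A minimal PyTorch sketch of this fusion, assuming a channel dimension `dim` and an MLP reduction ratio (both illustrative; the patent does not specify them):

```python
import torch
import torch.nn as nn

class SKFusion(nn.Module):
    """Fuse a skip-connection feature x1 with a module output x2 (sketch)."""
    def __init__(self, dim, reduction=8):
        super().__init__()
        self.pwconv = nn.Conv2d(dim, dim, kernel_size=1)       # PWConv on the skip branch
        self.mlp = nn.Sequential(                              # GAP followed by F_mlp
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(dim, dim // reduction, 1), nn.ReLU(inplace=True),
            nn.Conv2d(dim // reduction, 2 * dim, 1),
        )

    def forward(self, x1, x2):
        x1_hat = self.pwconv(x1)                               # \hat{x}_1
        attn = self.mlp(x1_hat + x2)                           # (B, 2C, 1, 1)
        b, c = x2.shape[:2]
        attn = torch.softmax(attn.view(b, 2, c, 1, 1), dim=1)  # Softmax over the two branches
        a1, a2 = attn[:, 0], attn[:, 1]                        # Split
        return a1 * x1_hat + a2 * x2
```

Because the Softmax is taken over the two branches, $a_1 + a_2 = 1$ per channel, so the block learns a per-channel trade-off between the skip feature and the module output.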
In step S3, the parallel multi-convolution attention PMCA module is designed for the improved noise estimation network. It comprises parallel attention and parallel multi-convolution, uses GroupNorm to normalize the data so that training is more stable, and uses residual connections to enrich the feature information within the module. Connecting several depth-separable convolution layers of different scales in parallel effectively aggregates spatial information and transformed features; placing multiple attention mechanisms in parallel strengthens the model's focus on global and local features.
In step S4, bicubic downsampling extracts high-frequency features and reduces the image size: the input image is resized to 256×256 pixels, and shrinking the model input improves the computational efficiency of the diffusion model. To obtain a high-quality defogged image, the generated low-resolution image is processed with a Laplacian pyramid to recover the image resolution; in raising the resolution, the Laplacian pyramid preserves most of the image edges, avoiding blurred details and reducing artifacts, with a simple procedure and low computational cost.
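A sketch of this resolution pipeline under common assumptions (the pyramid depth and interpolation settings are illustrative; the patent does not specify them):

```python
import torch
import torch.nn.functional as F

def build_laplacian(img, levels=3):
    """Bicubic-downsample the input while storing high-frequency residual bands."""
    bands = []
    cur = img
    for _ in range(levels):
        down = F.interpolate(cur, scale_factor=0.5, mode="bicubic", align_corners=False)
        up = F.interpolate(down, size=cur.shape[-2:], mode="bicubic", align_corners=False)
        bands.append(cur - up)   # band-pass residual: edges and fine details
        cur = down
    return bands, cur            # residual bands + low-resolution base

def reconstruct(base, bands):
    """Upsample the (defogged) low-resolution base, adding the bands back coarse to fine."""
    cur = base
    for band in reversed(bands):
        cur = F.interpolate(cur, size=band.shape[-2:], mode="bicubic", align_corners=False)
        cur = cur + band
    return cur
```

Here the diffusion model would run on the low-resolution base (e.g. 256×256), and `reconstruct` restores the original resolution while the stored bands preserve the edges, which is what keeps details from blurring.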
Compared with the prior art, the invention has the beneficial effects that:
The invention provides HazeDiffusion, a diffusion model defogging method integrating parallel multi-convolution attention, which combines the advantages of conditional diffusion models and deep learning models, to a certain extent solves problems of existing defogging algorithms such as incomplete defogging, color distortion and blurred details, simplifies the cumbersome process of training indoor and outdoor models separately, and effectively improves image-generation performance on defogging tasks. The invention obtains a PSNR of 27.8163 and an SSIM of 0.9422 on the indoor synthetic hazy dataset, and a PSNR of 29.2764 and an SSIM of 0.9583 on the outdoor synthetic dataset; on the real hazy dataset it achieves very high scores on information entropy (Entropy), fog density estimation (FADE) and visual information fidelity (VIF), obtaining an Entropy of 6.6685, a FADE of 0.5843 and a VIF of 0.9245, and it also performs excellently in subjective visual quality.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below. It will be apparent to those skilled in the art from this disclosure that the drawings described below are merely exemplary and that other embodiments may be derived from the drawings provided without undue effort.
The structures, proportions, sizes and the like shown in this specification are provided only for illustration and description and do not limit the scope of the invention, which is defined by the claims; any structural modification, change of proportion or adjustment of size that does not affect the efficacy or attainable objects of the invention shall fall within its scope.
FIG. 1 is a diagram of the overall structure of HazeDiffusion of the present invention;
FIG. 2 is a block diagram of a PMCA block in HazeDiffusion model of the present invention;
FIG. 3 is a block diagram of the ParaConv part of the PMCA block of the present invention;
FIG. 4 is a block diagram of the ParaAttn part of the PMCA block of the present invention;
FIG. 5 is a graph comparing experimental results of the HazeDiffusion model and other image defogging methods used in the present invention to a true foggy image.
Detailed Description
For the purposes of making the objects, technical solutions and advantages of the embodiments of the present application clearer, the technical solutions in the embodiments are described below clearly and completely. The described embodiments are only some, not all, embodiments of the application, and these descriptions serve only to further illustrate its features and advantages, not to limit its claims; all other embodiments obtained by those skilled in the art based on the embodiments of the application without inventive effort fall within its scope of protection.
The following describes in further detail the embodiments of the present invention with reference to the drawings and examples. The following examples are illustrative of the invention and are not intended to limit the scope of the invention.
The embodiment is implemented under the PyTorch deep learning framework and provides a HazeDiffusion diffusion model defogging method integrating parallel multi-convolution attention, which specifically comprises the following steps:
1. Data preparation
The data samples of this embodiment are from RESIDE datasets.
RESIDE is one of the most widely used standard datasets for image defogging. It consists of five subsets: the Indoor Training Set (ITS), the Outdoor Training Set (OTS), the Synthetic Objective Testing Set (SOTS), the Real-world Task-driven Testing Set (RTTS) and the Hybrid Subjective Testing Set (HSTS). ITS and OTS are synthetic datasets, RTTS is a real-world dataset, and HSTS consists of synthetic and real hazy images. The experiments train one model on the ITS dataset containing 100000 image pairs and test it on the 500-pair indoor SOTS dataset; another model is trained on OTS, containing 313950 image pairs, and tested on the 500-pair outdoor SOTS test set.
During training, the images are randomly cropped to a size of 256×256.
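A paired random crop keeps the hazy/clear images aligned at the same location; a minimal torchvision sketch (the helper name and data handling are illustrative, not from the patent):

```python
from torchvision import transforms as T
from torchvision.transforms import functional as TF

def paired_random_crop(hazy, clear, size=256):
    """Crop a hazy/clear pair at the same position so supervision stays pixel-aligned."""
    i, j, h, w = T.RandomCrop.get_params(hazy, output_size=(size, size))
    return TF.crop(hazy, i, j, h, w), TF.crop(clear, i, j, h, w)
```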
2. Model construction
The main framework of the HazeDiffusion model is a U-Net structure; the specific network structure is shown in FIG. 1. The HazeDiffusion model comprises two large modules, the Diffusion Process and the Reverse Process. The Diffusion Process module adds noise: it randomly generates a noise image and concatenates it with the image. The Reverse Process module predicts noise: it feeds the Gaussian-noise image and the hazy image into a convolution layer and computes a time embedding for the noise level t. The downsampling stage comprises, in order, a convolution layer, a ResWithAttn layer, a PMCA layer and a downsampling layer; the middle stage of the network consists of ResWithAttn and PMCA layers; the upsampling stage comprises, in order, a ResWithAttn layer, a PMCA layer, an upsampling layer and a ResWithAttn layer. Feature maps from different stages are fused with the SK fusion module. As shown in FIG. 2, the PMCA module comprises parallel attention and parallel multi-convolution, normalized with GroupNorm, with residual connections enriching the feature information within the module. As shown in FIG. 3, the parallel multi-convolution extracts features with depth-separable convolutions of different kernel sizes, including 7×7, 5×5 and 3×3 convolutions. As shown in FIG. 4, the parallel attention mechanism connects pixel attention, channel attention and cross attention in parallel and compensates for global features through skip connections.
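The following PyTorch sketch illustrates the PMCA structure as just described: depth-separable 7×7/5×5/3×3 branches in ParaConv, a parallel-attention part, GroupNorm and residual connections. The cross-attention path and all layer hyperparameters are simplifying assumptions, not the patent's exact implementation:

```python
import torch
import torch.nn as nn

class ParaConv(nn.Module):
    """Parallel multi-convolution: depth-separable 7x7, 5x5 and 3x3 branches, summed."""
    def __init__(self, dim):
        super().__init__()
        self.norm = nn.GroupNorm(1, dim)
        self.branches = nn.ModuleList([
            nn.Sequential(
                nn.Conv2d(dim, dim, k, padding=k // 2, groups=dim),  # depthwise
                nn.Conv2d(dim, dim, 1),                              # pointwise
            ) for k in (7, 5, 3)
        ])

    def forward(self, x):
        y = self.norm(x)
        return x + sum(branch(y) for branch in self.branches)       # residual connection

class ParaAttn(nn.Module):
    """Parallel attention: pixel and channel attention side by side, with a skip
    path compensating global features (the cross-attention branch is omitted here)."""
    def __init__(self, dim):
        super().__init__()
        self.pixel = nn.Sequential(nn.Conv2d(dim, 1, 1), nn.Sigmoid())
        self.channel = nn.Sequential(
            nn.AdaptiveAvgPool2d(1), nn.Conv2d(dim, dim, 1), nn.Sigmoid())

    def forward(self, x):
        return x + x * self.pixel(x) + x * self.channel(x)

class PMCA(nn.Module):
    """PMCA residual block: parallel attention followed by parallel multi-convolution."""
    def __init__(self, dim):
        super().__init__()
        self.attn = ParaAttn(dim)
        self.conv = ParaConv(dim)

    def forward(self, x):
        return self.conv(self.attn(x))
```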
3. Model training
In the HazeDiffusion network model built on the constructed training set, an L1 loss supervises the model by computing the mean error between the clear image and the defogged image, and training proceeds through maximum-likelihood estimation of the network output. The loss is defined as

$$L_{1} = \frac{1}{n}\sum_{i=1}^{n}\left|f(x_i) - y_i\right|,$$

where n is the total number of training samples, $f(x_i)$ is the generated noise image and $y_i$ is the estimated noise image; the L1 loss optimizes the model through the absolute value of the difference between $f(x_i)$ and $y_i$.

During training, the diffusion model takes real data and pure noise as input samples, the model outputs an estimate of the added noise, the loss against the real noise is computed at each time step, and the model parameters are updated iteratively.
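A hedged sketch of one training step consistent with this description: sample a time step, noise the clear image in closed form, predict the noise from the hazy-conditioned input, and take the L1 loss. Here `model` and `alpha_bar` are placeholders for the HazeDiffusion network and its cumulative noise schedule (a 1-D tensor on the same device as the images):

```python
import torch
import torch.nn.functional as F

def train_step(model, optimizer, clear, hazy, alpha_bar, T=2000):
    """One optimization step: L1 loss between predicted and true noise."""
    t = torch.randint(0, T, (clear.size(0),), device=clear.device)
    noise = torch.randn_like(clear)
    a = alpha_bar[t].view(-1, 1, 1, 1)
    xt = a.sqrt() * clear + (1.0 - a).sqrt() * noise   # closed-form forward noising
    pred = model(torch.cat([xt, hazy], dim=1), t)      # hazy image as the condition
    loss = F.l1_loss(pred, noise)                      # mean absolute error
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```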
4. Test results
During training, time steps t were sampled uniformly from {0, …, T}, with T = 2000 set in all experiments and β increasing from 1e-6 to 1e-2; the images were randomly cropped to 256×256. Parameters were updated iteratively with the Adam optimizer and the back-propagation algorithm, with the fixed learning rate set to 1e-4.
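These settings translate directly into a noise schedule; the patent does not state how β grows, so a linear schedule is assumed in this sketch:

```python
import torch

T = 2000
betas = torch.linspace(1e-6, 1e-2, T)            # assumed-linear beta schedule
alpha_bar = torch.cumprod(1.0 - betas, dim=0)    # cumulative product used for noising
# optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)  # fixed learning rate
```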
5. Model evaluation
Peak signal-to-noise ratio (PSNR) and structural similarity (SSIM) indexes are computed between the reconstructed result and the real image to evaluate the performance of the model.
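A minimal evaluation sketch using scikit-image's reference implementations of both metrics; the data layout (H×W×3 float arrays in [0, 1]) is an assumption:

```python
import numpy as np
from skimage.metrics import peak_signal_noise_ratio, structural_similarity

def evaluate(dehazed: np.ndarray, clear: np.ndarray):
    """Return (PSNR, SSIM) of a defogged image against its clear ground truth."""
    psnr = peak_signal_noise_ratio(clear, dehazed, data_range=1.0)
    ssim = structural_similarity(clear, dehazed, channel_axis=-1, data_range=1.0)
    return psnr, ssim
```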
Table 1 Test results of different algorithms on the synthetic hazy datasets
The evaluation of synthetic hazy images is carried out on the SOTS dataset: the defogging of 500 pairs of indoor images and 500 pairs of outdoor images is evaluated separately, computing PSNR and SSIM between the defogged and clear images, and comparing against six defogging algorithms, namely DCP, AOD-Net, PCFAN, MADN, DehazeFormer and MixDehazeNet; the best results in Table 1 are shown in bold.
Table 1 shows that, in the same experimental environment, HazeDiffusion achieves excellent PSNR and SSIM results on both the indoor and outdoor synthetic hazy datasets compared with the baselines: on the indoor synthetic dataset, PSNR and SSIM improve by 1.1641 and 0.0114 over MADN. The Vision-Transformer-based DehazeFormer and MixDehazeNet also show good defogging on the indoor and outdoor synthetic datasets. However, although DehazeFormer and MixDehazeNet perform well on synthetic data, synthetic test results cannot serve as a reference index for real-scene applications; applying image defogging in practice requires verifying an algorithm's effect on real images.
Because corresponding clear images for real hazy images are difficult to obtain, several no-reference image quality indexes, such as information entropy (Entropy), fog density estimation (FADE) and visual information fidelity (VIF), are used to analyze and compare the real-image defogging of the different algorithms. Information entropy evaluates the clarity of the defogged image: the higher it is, the more detail is retained. Fog density estimation evaluates the fog density in the image: the lower it is, the less fog remains. Visual information fidelity evaluates the distortion of the defogged image: the higher the VIF, the higher the image quality. The test uses the HSTS dataset, and the average results are shown in Table 2.
Table 2 Test results of different algorithms on the real hazy dataset
The comparison in Table 2 shows that HazeDiffusion outperforms the other algorithms on information entropy, fog density estimation and visual information fidelity; FIG. 5 shows the defogging effect of the different algorithms. The DCP algorithm performs poorly on real-world hazy images, especially causing severe distortion in sky regions; AOD-Net still shows incomplete defogging; images processed by PCFAN produce artifacts at edges and defog local regions incompletely; MADN does not fully recover image contrast and cannot adapt to the locally bright and dark regions of real images; both DehazeFormer and MixDehazeNet show severely incomplete defogging, in sharp contrast with their synthetic-dataset results, revealing deficiencies on real data. HazeDiffusion recovers the rich colors and textures of the real image with clear details, closer to a subjectively perceived real haze-free image. Although DehazeFormer and MixDehazeNet excel on the synthetic datasets, they do not perform well on real images; the HazeDiffusion model is better suited to real-image defogging, with better indexes and a more realistic defogging effect. Comprehensive testing over multiple evaluation indexes shows that the HazeDiffusion algorithm defogs real hazy images well, and compared with the baselines the processed images recover more complete detail, with a clear improvement in image quality.
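Of the no-reference metrics above, information entropy is straightforward to reproduce; a sketch for a single grayscale image follows (FADE and VIF need their published reference implementations and are not sketched here):

```python
import numpy as np

def image_entropy(gray: np.ndarray, bins: int = 256) -> float:
    """Shannon entropy of an 8-bit grayscale image: higher means more retained detail."""
    hist, _ = np.histogram(gray, bins=bins, range=(0, 255))
    p = hist / hist.sum()
    p = p[p > 0]                      # drop empty bins to avoid log(0)
    return float(-(p * np.log2(p)).sum())
```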
The preferred embodiments of the present invention have been described in detail, but the present invention is not limited to the above embodiments, and various changes can be made within the knowledge of those skilled in the art without departing from the spirit of the present invention, and the various changes are included in the scope of the present invention.
Claims (5)
1. A diffusion model defogging method integrating parallel multi-convolution attention, characterized by comprising the following steps:
S1, constructing the image defogging model HazeDiffusion by improving the reverse-process noise estimation network of a conditional diffusion model;
the method for constructing the image defogging model HazeDiffusion in S1 is as follows: the HazeDiffusion model comprises two large modules, the Diffusion Process and the Reverse Process; the Diffusion Process module adds noise, randomly generating a noise image and concatenating it with the image; the Reverse Process module predicts noise, feeding the Gaussian-noise image and the hazy image into a convolution layer and computing a time embedding for the noise level t; the downsampling stage comprises, in order, a convolution layer, a ResWithAttn layer, a PMCA layer and a downsampling layer; the middle stage of the network consists of ResWithAttn and PMCA layers; the upsampling stage comprises, in order, a ResWithAttn layer, a PMCA layer, an upsampling layer and a ResWithAttn layer; feature maps from different stages are fused with the SK fusion module; the PMCA module comprises parallel attention and parallel multi-convolution, uses GroupNorm to normalize the data, and uses residual connections to enrich the feature information within the module; the parallel multi-convolution extracts features with depth-separable convolutions of different kernel sizes, including 7×7, 5×5 and 3×3 convolutions;
S2, introducing the SKFusion fusion scheme, using dynamic feature fusion and skip connections to capture the information of each scale more specifically and richly;
the SKFusion module in S2 dynamically fuses the feature maps from different stages; SKFusion improves on the selective kernel network and fuses multiple feature branches with channel attention; let the two feature maps be $x_1$ and $x_2$, where $x_1$ comes from the skip connection and $x_2$ from the output of the network module; first, $x_1$ is passed through a PWConv (PointWise Conv) layer to obtain $\hat{x}_1$; the fusion weights are then obtained with global average pooling, a multi-layer perceptron, a Softmax activation function and a Split operation: $\{a_1, a_2\} = \mathrm{Split}(\mathrm{Softmax}(F_{mlp}(\mathrm{GAP}(\hat{x}_1 + x_2))))$; the fused output is $y = a_1\hat{x}_1 + a_2 x_2$, where GAP denotes global average pooling, $F_{mlp}$ the multi-layer perceptron, Softmax the Softmax activation function, and Split the Split operation;
S3, designing the PMCA module by combining pixel, channel and cross attention to extract the features of the condition information more accurately; through parallel convolution and residual learning, the model attends more flexibly to the haze regions of the image and better captures the local features of the hazy image;
in step S3, the parallel multi-convolution attention PMCA module is designed for the improved noise estimation network; it comprises parallel attention and parallel multi-convolution, uses GroupNorm to normalize the data so that training is more stable, and uses residual connections to enrich the feature information within the module; connecting several depth-separable convolution layers of different scales in parallel effectively aggregates spatial information and transformed features; placing multiple attention mechanisms in parallel strengthens the model's focus on global and local features;
S4, reducing the image size with bicubic downsampling to extract high-frequency features, and recovering the high-resolution image with a Laplacian-pyramid-based upsampling method, improving the processing efficiency of the model.
2. The diffusion model defogging method integrating parallel multi-convolution attention according to claim 1, wherein: the data samples for the image defogging model HazeDiffusion in S1 come from the RESIDE dataset, one of the most widely used standard image-defogging benchmarks, consisting of five subsets: the Indoor Training Set (ITS), the Outdoor Training Set (OTS), the Synthetic Objective Testing Set (SOTS), the Real-world Task-driven Testing Set (RTTS) and the Hybrid Subjective Testing Set (HSTS); ITS and OTS are synthetic datasets, RTTS is a real-world dataset, and HSTS consists of synthetic and real hazy images; the experiments train one model on the ITS dataset containing 100000 image pairs and test it on the 500-pair indoor SOTS dataset; another model is trained on OTS, containing 313950 image pairs, and tested on the 500-pair outdoor SOTS test set.
3. The diffusion model defogging method integrating parallel multi-convolution attention according to claim 1, wherein: the main structure of the model in S1 is a conditional defogging diffusion model fused with the hazy image; a diffusion model is a deep generative model in which noise is added to the available training data and the process is then reversed to recover the data, the model gradually learning to remove the noise; in the diffusion process, Gaussian noise is gradually added to the clear fog-free image until it becomes pure noise; the reverse process inverts the forward one: a random Gaussian noise image is generated and fed, together with the hazy image Haze, into the network model fusing parallel multi-convolution attention, and a clear image is recovered through the reverse defogging process; adding the hazy image Haze as a condition to the diffusion model yields a defogging conditional diffusion model, which successfully alleviates the poor defogging of real images and the cumbersome separate training of indoor and outdoor datasets, improving the defogging effect.
4. The diffusion model defogging method integrating parallel multi-convolution attention according to claim 1, wherein the image defogging model HazeDiffusion is trained as follows:
in the HazeDiffusion network model built on the constructed training set, an L1 loss supervises the model by computing the mean error between the clear image and the defogged image, and training proceeds through maximum-likelihood estimation of the network output; the loss is defined as
$$L_{1} = \frac{1}{n}\sum_{i=1}^{n}\left|f(x_i) - y_i\right|,$$
where n is the total number of training samples, $f(x_i)$ is the generated noise image and $y_i$ is the estimated noise image; the L1 loss optimizes the model through the absolute value of the difference between $f(x_i)$ and $y_i$;
during training, the diffusion model takes real data and pure noise as input samples, the model outputs an estimate of the added noise, the loss against the real noise is computed at each time step, and the model parameters are updated iteratively.
5. The diffusion model defogging method integrating parallel multi-convolution attention according to claim 1, wherein: in S4, bicubic downsampling extracts high-frequency features and reduces the image size; the input image is resized to 256×256 pixels by bicubic downsampling, and shrinking the model input improves the computational efficiency of the diffusion model; to obtain a high-quality defogged image, the generated low-resolution image is processed with a Laplacian pyramid to recover the image resolution; in raising the resolution, the Laplacian pyramid preserves most of the image edges, avoiding blurred details and reducing artifacts, with a simple procedure and low computational cost.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202410045689.3A CN117994167B (en) | 2024-01-11 | 2024-01-11 | Diffusion model defogging method integrating parallel multi-convolution attention |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202410045689.3A CN117994167B (en) | 2024-01-11 | 2024-01-11 | Diffusion model defogging method integrating parallel multi-convolution attention |
Publications (2)
Publication Number | Publication Date |
---|---|
CN117994167A CN117994167A (en) | 2024-05-07 |
CN117994167B true CN117994167B (en) | 2024-06-28 |
Family
ID=90895602
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202410045689.3A Active CN117994167B (en) | 2024-01-11 | 2024-01-11 | Diffusion model defogging method integrating parallel multi-convolution attention |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN117994167B (en) |
Families Citing this family (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN118172297B (en) * | 2024-05-16 | 2024-07-09 | 南京航空航天大学 | Restoration method of low-light image of strong light absorption component |
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111539887A (en) * | 2020-04-21 | 2020-08-14 | 温州大学 | Neural network image defogging method based on mixed convolution channel attention mechanism and layered learning |
CN114742719A (en) * | 2022-03-14 | 2022-07-12 | 西北大学 | End-to-end image defogging method based on multi-feature fusion |
Family Cites Families (10)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110503654B (en) * | 2019-08-01 | 2022-04-26 | 中国科学院深圳先进技术研究院 | Medical image segmentation method and system based on generation countermeasure network and electronic equipment |
WO2022095253A1 (en) * | 2020-11-04 | 2022-05-12 | 常州工学院 | Method for removing cloud and haze on basis of depth channel sensing |
CN113450273B (en) * | 2021-06-18 | 2022-10-14 | 暨南大学 | Image defogging method and system based on multi-scale multi-stage neural network |
US11663705B2 (en) * | 2021-09-17 | 2023-05-30 | Nanjing University Of Posts And Telecommunications | Image haze removal method and apparatus, and device |
CN113947537A (en) * | 2021-09-17 | 2022-01-18 | 南京邮电大学 | Image defogging method, device and equipment |
CN116468625A (en) * | 2023-03-23 | 2023-07-21 | 河南大学 | Single image defogging method and system based on pyramid efficient channel attention mechanism |
CN116739985A (en) * | 2023-05-10 | 2023-09-12 | 浙江医院 | Pulmonary CT image segmentation method based on transducer and convolutional neural network |
CN116645287B (en) * | 2023-05-22 | 2024-03-29 | 北京科技大学 | Diffusion model-based image deblurring method |
CN116721033A (en) * | 2023-06-21 | 2023-09-08 | 西南石油大学 | Single image defogging method based on random mask convolution and attention mechanism |
CN117151990B (en) * | 2023-06-28 | 2024-03-22 | 西南石油大学 | Image defogging method based on self-attention coding and decoding |
- 2024-01-11 CN CN202410045689.3A patent/CN117994167B/en active Active
Patent Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111539887A (en) * | 2020-04-21 | 2020-08-14 | 温州大学 | Neural network image defogging method based on mixed convolution channel attention mechanism and layered learning |
CN114742719A (en) * | 2022-03-14 | 2022-07-12 | 西北大学 | End-to-end image defogging method based on multi-feature fusion |
Also Published As
Publication number | Publication date |
---|---|
CN117994167A (en) | 2024-05-07 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN111915531B (en) | Neural network image defogging method based on multi-level feature fusion and attention guidance | |
CN111915530B (en) | End-to-end-based haze concentration self-adaptive neural network image defogging method | |
CN110210608B (en) | Low-illumination image enhancement method based on attention mechanism and multi-level feature fusion | |
CN111709895A (en) | Image blind deblurring method and system based on attention mechanism | |
CN111275637A (en) | Non-uniform motion blurred image self-adaptive restoration method based on attention model | |
CN117994167B (en) | Diffusion model defogging method integrating parallel multi-convolution attention | |
CN112465727A (en) | Low-illumination image enhancement method without normal illumination reference based on HSV color space and Retinex theory | |
CN114463218B (en) | Video deblurring method based on event data driving | |
CN116596792B (en) | Inland river foggy scene recovery method, system and equipment for intelligent ship | |
CN114972134A (en) | Low-light image enhancement method for extracting and fusing local and global features | |
CN116228550A (en) | Image self-enhancement defogging algorithm based on generation of countermeasure network | |
CN116468625A (en) | Single image defogging method and system based on pyramid efficient channel attention mechanism | |
CN115861094A (en) | Lightweight GAN underwater image enhancement model fused with attention mechanism | |
CN117952830B (en) | Three-dimensional image super-resolution reconstruction method based on iterative interaction guidance | |
CN117974459A (en) | Low-illumination image enhancement method integrating physical model and priori | |
CN113689346A (en) | Compact deep learning defogging method based on contrast learning | |
CN116128768B (en) | Unsupervised image low-illumination enhancement method with denoising module | |
CN113160056A (en) | Deep learning-based noisy image super-resolution reconstruction method | |
CN116721033A (en) | Single image defogging method based on random mask convolution and attention mechanism | |
CN117196940A (en) | Super-resolution reconstruction method suitable for real scene image based on convolutional neural network | |
CN115760640A (en) | Coal mine low-illumination image enhancement method based on noise-containing Retinex model | |
CN114820395B (en) | Underwater image enhancement method based on multi-field information fusion | |
Zhao et al. | Single image dehazing based on enhanced generative adversarial network | |
Huang et al. | Unsupervised image dehazing based on improved generative adversarial networks | |
Li et al. | Image Defogging Algorithm Based on Dual-Stream Skip Connections |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||