CN117151990B - Image defogging method based on self-attention coding and decoding - Google Patents
- Publication number: CN117151990B
- Application number: CN202310774453.9A
- Authority: CN (China)
- Legal status: Active (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis)
Classifications
- G06T3/4046 — Scaling of whole images or parts thereof, e.g. expanding or contracting, using neural networks
- G06N3/0442 — Recurrent networks characterised by memory or gating, e.g. LSTM or GRU
- G06N3/045 — Combinations of networks
- G06N3/0464 — Convolutional networks [CNN, ConvNet]
- G06N3/082 — Learning methods modifying the architecture, e.g. adding, deleting or silencing nodes or connections
- G06V10/774 — Generating sets of training patterns; bootstrap methods, e.g. bagging or boosting
- G06V10/776 — Validation; performance evaluation
- G06V10/806 — Fusion of extracted features
- G06V10/82 — Image or video recognition or understanding using neural networks
- Y02A90/10 — Information and communication technologies [ICT] supporting adaptation to climate change, e.g. for weather forecasting or climate simulation
Abstract
The invention discloses an image defogging method based on self-attention coding and decoding. The method first downsamples the image, then processes the downsampled feature map by fusing self-convolution with conventional convolution to improve the local feature extraction capacity of the whole convolution module; secondly, a residual dense block is introduced to convolve the feature map again; then, high- and low-level semantic information in features of different scales is fused through gating fusion units; the feature map is then upsampled to restore it to the size of the initial image; finally, the average absolute error loss function and the multi-scale similarity loss function are weighted to form a mixed loss function for model training, improving the subjective quality of the defogging results. The method lets the basic unit concentrate more on extracting high-frequency information, better highlights the important feature-map information in each channel, better extracts the important information in the image, and can recover defogged images that better match human visual perception.
Description
Technical Field
The invention relates to the technical field of computer vision image processing, in particular to an image defogging method based on self-attention coding and decoding.
Background
Image defogging algorithms aim to eliminate the noise interference caused by haze, improve the definition and color saturation of the image, and recover its detail information. Before the advent of deep learning, traditional defogging methods attracted the attention of many researchers and fall mainly into two types. Defogging methods based on image enhancement optimize the visual effect only by enhancing the image, strengthening certain information in the foggy image according to actual requirements without considering the fog formation mechanism or the imaging process of foggy images. Defogging methods based on a physical model take the atmospheric scattering model as the theoretical basis and estimate the unknown parameters of the model through statistical methods so as to restore a fog-free image.
In recent years, deep learning theory has been widely applied in the image defogging field, and some researchers train neural networks with large foggy-image data sets to perform defogging more efficiently. Image defogging methods based on convolutional neural networks can be classified into the following three types according to how image features are processed:
Defogging methods based on traditional convolutional neural networks: these methods rely on conventional convolutional neural networks and generally comprise image preprocessing, feature extraction, and feature recovery. However, traditional CNN defogging methods handle texture and edge information poorly, easily losing or blurring texture information, and lack effective extraction and utilization of global and local key information, so the defogging effect in complex scenes is poor.
Image defogging methods based on an attention mechanism: these methods use an attention mechanism to extract the important features of the image, making the defogging effect more accurate. Specifically, the attention mechanism computes a weight for each pixel in the image, making the defogging algorithm focus on the important areas. Volodymyr et al. first introduced an attention mechanism into an RNN model for image classification: the attention mechanism selects the region of the image to be processed, each current state determines the attention location based on the previous state and the currently input image, and only the pixels within the attended region are processed rather than all pixels of the whole image. The advantage is a reduction in the number of processed pixels and in task difficulty. However, most attention-based defogging methods perform feature extraction at only a single scale and directly apply the recovery operation to the weighted features, lacking the extraction and fusion of global multi-scale features.
Image defogging methods based on multi-scale feature fusion: these methods fuse image features of different scales to improve defogging accuracy. Specifically, such algorithms typically fuse the features extracted by the model at different scales to generate more accurate defogging results. A multi-scale CNN can extract effective features from the foggy image to estimate the transmission map, but because the atmospheric light is estimated together with the transmission map rather than learned, the atmospheric light estimate carries a large error, which affects the quality of the final defogged image.
In summary, compared with traditional convolutional neural network algorithms, attention-based image defogging algorithms can adaptively select the regions of interest and adjust to scene characteristics, focusing attention and improving the defogging effect. A multi-scale fusion mechanism can extract and fuse information at different scales, handling multi-scale information and recovering image details and colors better, thereby improving the defogging effect. However, the above attention-based methods still lack an understanding of the spatial and frequency-domain feature information of the input image itself. It is therefore particularly important to design a suitable attention mechanism for image defogging and to make reasonable use of multi-scale feature fusion.
Disclosure of Invention
Aiming at the problem that attention-based image defogging algorithms lack an understanding of the spatial and frequency-domain feature information of the input image, the invention provides an image defogging method based on self-attention coding and decoding.
The invention provides an image defogging method based on self-attention coding and decoding, which specifically comprises the following steps:
s1: and selecting the public image defogging data set as an image data set to be tested, dividing the data set into a training set and a testing set, and carrying out image preprocessing.
The OTS data set in RESIDE-beta and the Foggy data set in Cityscapes are adopted as experimental data sets, and the images containing traffic scenes in the OTS data set are mixed with the Foggy data set for the training task. OTS comprises 2,061 outdoor clear images, each clear image corresponding to 35 foggy images of different densities; Cityscapes contains 5,002 clear traffic-scene pictures, each clear picture corresponding to 3 foggy images of different densities, for a total of 15,006 traffic fog pictures. M images are selected as the training set and N images as the test set; at the same time, K real fog images containing traffic scenes are selected from RESIDE-beta as a test set for analyzing the defogging effect on real fog images. Of the selected images, 90% are used as the training set (M), 9% as real fog images (K), and 1% as the test set (N).
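For illustration, the 90/9/1 partition described above can be sketched as follows; the helper name and shuffling scheme are not part of the patent, only the split fractions follow the description:

```python
import random

def split_dataset(paths, seed=0):
    """Hypothetical 90/9/1 split into training (M), real-fog (K) and test (N)
    subsets; integer arithmetic keeps the split sizes deterministic."""
    paths = list(paths)
    random.Random(seed).shuffle(paths)
    n = len(paths)
    n_train = n * 90 // 100          # 90% -> training set M
    n_real = n * 9 // 100            # 9%  -> real fog images K
    train = paths[:n_train]
    real_fog = paths[n_train:n_train + n_real]
    test = paths[n_train + n_real:]  # remaining ~1% -> test set N
    return train, real_fog, test
```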
Image preprocessing refers to a series of processing steps applied to the original image before further processing in a computer vision task, so as to improve the effect of subsequent tasks or reduce errors. These steps may include, but are not limited to, the following:
1. Noise reduction: remove noise from the image so that objects in it can be identified and analyzed more reliably.
2. Resizing: resize the image to fit a particular task or model.
3. Cropping: remove unnecessary parts of the image to improve the effect of subsequent tasks.
4. Rotation and flipping: rotate or flip the image to better match the expected input of the model.
5. Normalization and standardization: normalize or standardize the pixel values to better match the input requirements of the model.
6. Contrast and brightness adjustment: increase or decrease the contrast and brightness of the image so that objects in it can be identified and analyzed more easily.
7. Color space conversion: convert the image from one color space to another to better suit the needs of a particular task.
The image preprocessing can help to improve the accuracy and efficiency of the computer vision task, thereby better meeting the demands of users.
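Two of the steps above (resizing and normalization) can be sketched in a minimal form; the nearest-neighbour resize and the single-channel assumption are illustrative simplifications, not the patent's actual pipeline:

```python
import numpy as np

def preprocess(img, size=(256, 256)):
    """Minimal preprocessing sketch for a single-channel uint8 image:
    nearest-neighbour resize (step 2) followed by normalisation to
    [0, 1] (step 5)."""
    h, w = img.shape
    rows = np.arange(size[0]) * h // size[0]   # source row for each target row
    cols = np.arange(size[1]) * w // size[1]   # source column for each target column
    resized = img[rows][:, cols]
    return resized.astype(np.float32) / 255.0
```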
S2: downsampling the preprocessed images in the training set, and processing the downsampled feature map through self-convolution and conventional convolution fusion; the specific operation method is as follows:
A 3×3 max pooling is adopted as the downsampling layer, and the downsampling layer is applied several times in the encoding stage to acquire multi-scale feature information; self-convolution based on an attention mechanism is introduced after the downsampling layer, and the features extracted by conventional convolution are fused with the features extracted by self-convolution.
The self-convolution based on the attention mechanism operates as follows:
(1) Generating a convolution kernel related to the length and width of the initial image X through a kernel generating function phi (X), and expanding the generated single convolution kernel into a convolution kernel group with the channel dimension of C; the kernel generation function phi (X) is:
φ(X) = W1·ξ(W0(X))

where X represents the input image; W0 and W1 represent two linear transformations; and ξ represents a nonlinear activation function;
(2) The generated convolution kernel group is convolved with the corresponding positions of the initial image X, the self-convolution and ordinary convolution results are aggregated within the K×K neighborhood, and the feature map of the initial image is finally output.
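A toy numpy sketch of the two steps above: each pixel's kernel is generated from its own feature vector via φ = W1·ξ(W0·x) and shared across the C channels. The fusion with the ordinary-convolution branch is omitted here, and all shapes and names are illustrative rather than the patent's implementation:

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def self_convolution(X, W0, W1, K=3):
    """Toy position-dependent convolution: for every pixel a K*K kernel is
    generated from that pixel's feature vector, then applied to the K*K
    neighbourhood, shared over all C channels of the input X (C, H, W)."""
    C, H, W = X.shape
    pad = K // 2
    Xp = np.pad(X, ((0, 0), (pad, pad), (pad, pad)))
    out = np.zeros_like(X, dtype=np.float64)
    for i in range(H):
        for j in range(W):
            feat = relu(W0 @ X[:, i, j])          # W0: (r, C) channel reduction
            kernel = (W1 @ feat).reshape(K, K)    # W1: (K*K, r) kernel head
            patch = Xp[:, i:i + K, j:j + K]       # (C, K, K) neighbourhood
            out[:, i, j] = (patch * kernel).sum(axis=(1, 2))
    return out
```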
S3: and (3) introducing a residual error dense block, performing convolution operation on the feature map again, and increasing the extraction of the feature information to finish the coding of the fog map.
The specific operation of the residual dense block is as follows: based on the original residual structure, the input is skip-connected to each residual block, and the output of each convolution layer within a residual block is skip-connected to the output of that residual block.

The original residual structure includes: two convolution blocks with kernel size 3 and one convolution block with kernel size 5, an identity mapping module, 3 ReLU activation function modules, and an aggregation module. The calculation formula is as follows:

F = Y − X

where F is the residual to be learned; X and Y are the input and output feature maps, respectively.
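The skip-connection pattern can be sketched as follows (toy numpy version; the `stages` callables stand in for the 3×3/5×5 convolution blocks, so only the wiring is shown):

```python
import numpy as np

def residual_dense_block(x, stages):
    """Toy residual-dense wiring: every stage receives the sum of the input
    and all earlier stage outputs (dense skips), and the block finally adds
    the identity, so the stages only need to learn the residual F = Y - X."""
    feats = [x]
    for stage in stages:
        feats.append(stage(sum(feats)))   # dense: all previous features feed in
    return x + sum(feats[1:])             # identity skip around the whole block
```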
S4: and through a plurality of gating fusion units, different scale features are fused, so that high-level and low-level semantic information fusion in the multi-scale features is realized.
A single gated fusion unit consists of one convolution layer and one aggregation layer. The convolution kernel sizes are assigned as 7×7, 5×5, 3×3, and 3×3, respectively, according to the feature map sizes.
The specific operation of the step is as follows:
First, the upper-layer network feature F_i and the lower-layer feature F_{i+1} are extracted and input into the gating fusion unit; its output weights Q_i and Q_{i+1} are matched to the specific numbers of feature channels of the upper and lower layers, one weight corresponding to each feature channel;
Finally, the feature maps of the different layers are linearly combined with the weights output by the gating fusion units, and the combined feature maps are further sent to the corresponding decoders to obtain the target fog-image residual; the mathematical expression is:

Q_1, Q_2, Q_3, …, Q_{i+1} = g_i(F_1, F_2, F_3, …, F_{i+1})

where g_i represents the gating fusion module of the i-th layer; F_i represents the i-th-layer feature map; Q_i represents the weight of the combined output of the i-th and (i−1)-th layers; F_oi represents the feature map of the i-th layer combined with the (i−1)-th layer.
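A toy numpy sketch of one gating fusion unit: per-pixel weights are produced from the concatenated upper/lower-layer features (the matrix `W` stands in for a 1×1 convolution), then the two feature maps are linearly combined. Names, shapes and the softmax normalisation are illustrative assumptions:

```python
import numpy as np

def softmax(z, axis=0):
    e = np.exp(z - z.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def gated_fusion(f_upper, f_lower, W):
    """f_upper, f_lower: (C, H, W) feature maps; W: (2, 2C) acts as a 1x1
    convolution producing one gating weight map per input branch."""
    stacked = np.concatenate([f_upper, f_lower], axis=0)   # (2C, H, W)
    logits = np.einsum('kc,chw->khw', W, stacked)          # (2, H, W)
    q = softmax(logits, axis=0)                            # weights sum to 1 per pixel
    return q[0] * f_upper + q[1] * f_lower                 # linear combination
```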
S5: upsampling by taking bilinear interpolation as an upsampling layer, gradually restoring the size of the feature map to the size of the initial image by adopting a plurality of upsampling layers in a decoding stage, and finishing decoding of the fog map;
s6: during model training, weighting an average absolute error loss function and a multi-scale similarity loss function to form a mixed loss function, so as to obtain model parameters; wherein, the formula of the mixing loss function is as follows:
L_mix = α·L_MS-SSIM + (1 − α)·G·L_1

where L_1 represents the average absolute error loss function; L_MS-SSIM represents the multi-scale similarity loss function; α represents the weight within the mixed loss function; and G is a weight coefficient obtained by applying Gaussian convolution to the error;
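A minimal sketch of the mixed loss, with two simplifying assumptions: the multi-scale SSIM term is replaced by a single-window global SSIM, and G is taken as a scalar stand-in for the Gaussian-convolution weight, so only the weighting scheme of L_mix is illustrated:

```python
import numpy as np

def l1_loss(pred, target):
    return np.abs(pred - target).mean()

def ssim_global(pred, target, c1=0.01 ** 2, c2=0.03 ** 2):
    """Single-window SSIM: a simplified stand-in for the multi-scale version."""
    mu_p, mu_t = pred.mean(), target.mean()
    cov = ((pred - mu_p) * (target - mu_t)).mean()
    num = (2 * mu_p * mu_t + c1) * (2 * cov + c2)
    den = (mu_p ** 2 + mu_t ** 2 + c1) * (pred.var() + target.var() + c2)
    return num / den

def mixed_loss(pred, target, alpha=0.84, g=1.0):
    """alpha and g are illustrative values; SSIM becomes a loss as 1 - SSIM."""
    return (alpha * (1.0 - ssim_global(pred, target))
            + (1.0 - alpha) * g * l1_loss(pred, target))
```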
s7: the preprocessed image test set is input into the trained self-attention coding-and-decoding defogging neural network model for testing, so as to obtain the defogged images.
Compared with the prior art, the invention has the following advantages:
(1) The self-attention-based coding and decoding method provided by the invention fully considers, from the network structure, the problem of feature loss in deep neural networks: the layer-by-layer connected multi-scale gating fusion units transmit the image feature information lost at each layer to the corresponding layer, so that the network does not lose too much of the feature information relevant to human vision, such as image textures and colors.

(2) The improved self-convolution module attends to more detailed features of the image during feature extraction, making the restored image clearer; the residual dense block optimizes computation, making the model converge quickly and avoiding the reduction in image recovery quality caused by increased network depth; finally, a composite loss function of L_1 and L_MS-SSIM is adopted in training, enhancing the perceptual quality of the restored image. The defogged image better matches human visual perception subjectively, and objective indices also show that its quality is better, so the method can provide sufficient data-preprocessing support for other image processing fields.
Additional advantages, objects, and features of the invention will be set forth in part in the description which follows and in part will become apparent to those having ordinary skill in the art upon examination of the following or may be learned from practice of the invention.
Drawings
Fig. 1 is a network configuration diagram of the image defogging method based on self-attention encoding and decoding of the present invention.
Fig. 2 is a block diagram of a modified self-convolution module.
Fig. 3 is a block diagram of a residual dense module.
Fig. 4 is a graph of the effect of each defogging method on the composite fog map in the OTS subset data set. In the figure, (a) a synthetic hazy image; (b) DCP process; (c) a Dehaze method; (d) AOD process; (e) GMAN process; (f) an ERAN method; (g) MFFID method; (h) the process of the invention; (i) a haze-free image.
FIG. 5 is a graph showing the effect of each defogging method on a composite fog map in the Foggy subset of data. (a) a synthetic hazy image; (b) DCP process; (c) a Dehaze method; (d) AOD process; (e) GMAN process; (f) an ERAN method; (g) MFFID method; (h) the process of the invention; (i) a haze-free image.
Fig. 6 is an effect diagram of each defogging method on a real fog map. (a) a true hazy image; (b) DCP process; (c) a Dehaze method; (d) AOD process; (e) GMAN process; (f) an ERAN method; (g) MFFID method; (h) the process of the invention.
Detailed Description
The preferred embodiments of the present invention will be described below with reference to the accompanying drawings, it being understood that the preferred embodiments described herein are for illustration and explanation of the present invention only, and are not intended to limit the present invention.
As shown in figs. 1-3, the image defogging method based on self-attention coding and decoding provided by the invention first adds several improved self-convolution modules to the encoder-decoder; based on an attention mechanism, these modules can adaptively assign weights to different positions of the image, prioritizing spatial-domain information. Secondly, ordinary convolution blocks are replaced with residual dense blocks so as to reduce or eliminate vanishing gradients, enhance information flow and feature reuse, and optimize computation. Because the skip connections of the U-shaped structure pass the shallow features of the encoder directly to the deep features of the corresponding decoder, high- and low-level semantic information is easily ignored; the network therefore adopts several gating units of different scales to fuse the upper and lower feature layers, and these modules can aggregate semantic information over a wider space, further reducing feature loss. Finally, the fused features are connected layer by layer to the corresponding decoders, and the clear image is restored after decoder processing.
Training process and inference:
In the experiments, an Adam optimizer is adopted to optimize training; the initial learning rate is 1×10⁻⁴; the batch training size is 16. Training and validation images are resized to 256×256 for OTS and 512×1024 for Foggy; the size of the test images is not limited.
Experimental evaluation:
the performance of the image defogging method (Our) of the present invention and the advanced defogging method (DCP method, dehaze method, AOD method, GMANmethod, ERAN method, MFFID method) in RESIDE-beta and in Cityscapes respectively were tested and compared. Fig. 4 and 5 are graphs of the effect of each defogging method on the composite fog map in the OTS and Foggy sub-data sets, respectively. Table 1 shows the objective evaluation index comparison of different comparison algorithms on the synthetic fog pattern.
TABLE 1 objective evaluation results of different defogging methods on synthetic haze patterns
It can be seen that the SSIM value of our method reaches 0.9207, which is 0.299 higher than the traditional DCP method and on average 0.0551 higher than the other neural network methods; the MS-SSIM value reaches 0.9762, which is 0.156 higher than DCP and on average 0.0312 higher than the other neural network methods; and the PSNR reaches 26.531 dB, which is 12.11 dB higher than DCP and on average 3.171 dB higher than the other neural network methods. The experimental results show that our method outperforms the others on objective evaluation indices for synthetic fog images: the difference between the defogged and clear images is smaller, fog can be removed effectively, the structural information of the images before and after defogging is more similar, and detail information such as image edges and colors is handled better than by the comparison methods.
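PSNR, one of the indices quoted above, follows the standard definition; a minimal version for images scaled to [0, 1] looks like this:

```python
import numpy as np

def psnr(pred, target, max_val=1.0):
    """Peak signal-to-noise ratio in dB; higher means closer to the reference."""
    mse = np.mean((pred - target) ** 2)
    return 10.0 * np.log10(max_val ** 2 / mse)
```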
Fig. 6 shows the effect of each defogging method on the real fog maps, and Table 2 compares the objective evaluation indices of the different defogging algorithms on the real fog maps. It can be seen that, because DCP over-enhances boundary information and the no-reference indices are extremely sensitive to pixel-edge variation in the image, DCP's values on these two indices are abnormally high and not comparable. The GMAN method likewise over-sharpens edges in the subjective evaluation, and its evaluation indices are higher than those of the ERAN and MFFID methods and the method of the present invention, which have better visual effects. Apart from DCP and GMAN, the two networks with poor subjective visual performance, the method of the present invention scores higher than the other comparison methods on every index. The experimental results show that our method can recover fine contrast details and texture changes in the image; the edges of the recovered image are clearer, and the overall visual effect is better than that of the comparison methods.
TABLE 2 objective evaluation results of different defogging methods on real fog patterns
The present invention is not limited to the above embodiments; any modifications, equivalents, and improvements made without departing from the spirit and scope of the invention are intended to be covered by it.
Claims (6)
1. A self-attention encoding and decoding-based image defogging method, comprising the steps of:
s1: selecting a public image defogging data set as an image data set to be tested, dividing the data set into a training set and a testing set, and carrying out image preprocessing;
s2: downsampling the preprocessed images in the training set, and processing the downsampled feature map through self-convolution and conventional convolution fusion; the specific operation is as follows:
adopting 3×3 max pooling as the downsampling layer, and applying the downsampling layer several times in the encoding stage to acquire multi-scale feature information; introducing attention-mechanism-based self-convolution after the downsampling layer, and fusing the features extracted by conventional convolution with the features extracted by self-convolution;
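For illustration, the 3×3 max-pooling downsampling layer described above can be sketched as follows; the stride of 2 and edge padding are assumptions for this sketch, since the claim only fixes the 3×3 window.

```python
import numpy as np

def max_pool_3x3(x, stride=2):
    """3x3 max pooling over a single-channel map x (H, W).
    Stride and -inf padding are illustrative assumptions."""
    H, W = x.shape
    pad = 1
    xp = np.pad(x, pad, mode="constant", constant_values=-np.inf)
    out_h = (H + 2 * pad - 3) // stride + 1
    out_w = (W + 2 * pad - 3) // stride + 1
    out = np.empty((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            # maximum over each 3x3 window of the padded input
            out[i, j] = xp[i * stride:i * stride + 3,
                           j * stride:j * stride + 3].max()
    return out
```

Applying this layer repeatedly halves the spatial resolution at each encoding stage, yielding the multi-scale feature maps that the self-convolution branch then processes.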
s3: introducing residual dense blocks, performing further convolution operations on the feature map to enhance the extraction of feature information, and completing encoding of the fog map;
s4: fusing features of different scales through a plurality of gating fusion units, realizing the fusion of high- and low-level semantic information in the multi-scale features;
s5: upsampling by taking bilinear interpolation as an upsampling layer, gradually restoring the size of the feature map to the size of the initial image by adopting a plurality of upsampling layers in a decoding stage, and finishing decoding of the fog map;
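The bilinear-interpolation upsampling layer of step S5 can be sketched as below; this is a minimal NumPy version assuming the align-corners sampling convention, not the patent's exact implementation.

```python
import numpy as np

def bilinear_upsample(x, scale=2):
    """Bilinear upsampling of a single-channel map x (H, W) by an integer
    scale factor, using the align-corners convention (an assumption)."""
    H, W = x.shape
    H2, W2 = H * scale, W * scale
    # target sample coordinates in source-pixel space
    ys = np.linspace(0, H - 1, H2)
    xs = np.linspace(0, W - 1, W2)
    y0 = np.floor(ys).astype(int); y1 = np.minimum(y0 + 1, H - 1)
    x0 = np.floor(xs).astype(int); x1 = np.minimum(x0 + 1, W - 1)
    wy = (ys - y0)[:, None]; wx = (xs - x0)[None, :]
    # interpolate horizontally on the two bracketing rows, then vertically
    top = (1 - wx) * x[np.ix_(y0, x0)] + wx * x[np.ix_(y0, x1)]
    bot = (1 - wx) * x[np.ix_(y1, x0)] + wx * x[np.ix_(y1, x1)]
    return (1 - wy) * top + wy * bot
```

Stacking several such layers in the decoder progressively restores the feature map to the initial image size, mirroring the repeated max-pooling of the encoder.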
s6: during model training, weighting a mean absolute error loss function and a multi-scale similarity loss function to form a mixed loss function, so as to obtain the model parameters; the formula of the mixed loss function is as follows:
L_mix = α·L_MS-SSIM + (1 − α)·G·L_1
wherein L_1 represents the mean absolute error loss function, L_MS-SSIM represents the multi-scale similarity loss function, α represents the weight in the mixed loss function, and G is a weight coefficient obtained by applying Gaussian convolution to the error;
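The mixed loss can be sketched as follows. This is a simplified NumPy stand-in: the MS-SSIM term is approximated by single-scale, global-statistics SSIM, and G is taken as a Gaussian weighting of the per-pixel absolute error; the value α = 0.84 is an illustrative choice, not taken from the patent.

```python
import numpy as np

def gaussian_kernel(size=11, sigma=1.5):
    ax = np.arange(size) - size // 2
    g = np.exp(-(ax ** 2) / (2 * sigma ** 2))
    k = np.outer(g, g)
    return k / k.sum()  # normalised 2-D Gaussian

def ssim(x, y, c1=0.01 ** 2, c2=0.03 ** 2):
    # global-statistics SSIM (simplified: no sliding window, single scale)
    mx, my = x.mean(), y.mean()
    vx, vy = x.var(), y.var()
    cov = ((x - mx) * (y - my)).mean()
    return ((2 * mx * my + c1) * (2 * cov + c2)) / \
           ((mx ** 2 + my ** 2 + c1) * (vx + vy + c2))

def mixed_loss(pred, target, alpha=0.84):
    """L_mix = alpha * L_MS-SSIM + (1 - alpha) * G * L_1 (sketch)."""
    l_ssim = 1.0 - ssim(pred, target)          # stand-in for L_MS-SSIM
    g = gaussian_kernel(pred.shape[0])          # assumes square inputs here
    l1 = (g * np.abs(pred - target)).sum()      # Gaussian-weighted L1 error
    return alpha * l_ssim + (1 - alpha) * l1
```

The SSIM term preserves structure while the Gaussian-weighted L1 term keeps pixel-wise fidelity, which is the rationale for mixing the two losses.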
s7: inputting the preprocessed image test set into the trained self-attention encoding-and-decoding neural network model for testing, thereby obtaining the defogged images.
2. The image defogging method based on self-attention encoding and decoding according to claim 1, wherein in step S2 the attention-mechanism-based self-convolution operates as follows:
(1) generating a convolution kernel related to the height and width of the initial image X through a kernel-generating function φ(X), and expanding the generated single convolution kernel into a convolution kernel group whose channel dimension is C; the kernel-generating function φ(X) is:
φ(X) = W_1 ξ(W_0(X))
wherein X represents the input image, W_0 and W_1 represent two linear transformations, and ξ represents a nonlinear activation function;
(2) performing a convolution operation between the generated convolution kernel group and the corresponding positions of the initial image X, aggregating the self-convolution and ordinary convolution results within a K×K neighborhood, and finally outputting the feature map of the initial image.
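The position-specific kernel generation of this claim resembles involution-style self-convolution, and a minimal single-channel sketch is given below. The shapes of W_0 and W_1 and the kernel-per-pixel simplification are illustrative assumptions; the fusion with ordinary convolution is omitted here.

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

def self_convolution(X, W0, W1, K=3):
    """Self-convolution on a single-channel map X (H, W): a K*K kernel is
    generated at each position by phi(X) = W1 xi(W0(X)) and aggregated with
    the K*K neighbourhood. W0 (r,) and W1 (K*K, r) are illustrative shapes."""
    H, W = X.shape
    pad = K // 2
    Xp = np.pad(X, pad, mode="edge")
    out = np.zeros_like(X)
    for i in range(H):
        for j in range(W):
            # kernel-generating function: two linear maps with a ReLU between
            kernel = (W1 @ relu(W0 * X[i, j])).reshape(K, K)
            patch = Xp[i:i + K, j:j + K]
            # aggregate the generated kernel with the local neighbourhood
            out[i, j] = (kernel * patch).sum()
    return out
```

Unlike an ordinary convolution, the kernel here varies with spatial position (it is derived from the input itself), which is what gives the operation its attention-like, content-adaptive character.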
3. The image defogging method based on self-attention encoding and decoding according to claim 1, wherein in step S3 the specific operation of the residual dense block is: on the basis of the original residual structure, the input is skip-connected to each residual block, and the output of each convolution layer within a residual block is skip-connected to the output of that residual block.
4. The image defogging method based on self-attention encoding and decoding according to claim 3, wherein the original residual structure comprises: two convolution blocks with kernel size 3 and one convolution block with kernel size 5, an identity mapping module, three ReLU activation function modules and an aggregation module; the calculation formula is as follows:
F = Y − X
where F is the residual to be learned, and X and Y are the input and output feature maps, respectively.
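The residual structure F = Y − X can be sketched as below; the conv layers are stand-in functions rather than real 3×3/5×5 convolutions, so that the identity-plus-residual arithmetic stays checkable.

```python
import numpy as np

def residual_block(x, conv_layers):
    """Sketch of the residual structure: stacked conv blocks (two 3x3 and
    one 5x5 in the claim) learn the residual F, and the identity mapping
    adds the input back. `conv_layers` are stand-in layer functions."""
    h = x
    for conv in conv_layers:
        h = np.maximum(conv(h), 0.0)  # conv block followed by ReLU
    return h + x                      # aggregation: Y = F + X, i.e. F = Y - X
```

Because the block only has to learn the residual F rather than the full mapping Y, gradients flow through the identity path and deeper stacks train more easily, which is why the patent builds its dense blocks from this structure.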
5. The image defogging method based on self-attention encoding and decoding according to claim 1, wherein in step S4 each single gating fusion unit consists of one convolution layer and one aggregation layer; the specific operations of this step are as follows:
(1) the upper-layer feature F_i and the lower-layer feature F_{i+1} of the network are extracted and input into the gating fusion unit, which outputs the weights Q_i and Q_{i+1}; the weights are matched to the specific numbers of feature channels of the upper and lower layers, one weight corresponding to each feature channel;
(2) the feature maps of the different layers are linearly combined with the weights output by the gating fusion unit, and the combined feature map is fed into the corresponding decoder to obtain the residual of the target fog map; the mathematical expression is:
Q_1, Q_2, Q_3, …, Q_{i+1} = g_i(F_1, F_2, F_3, …, F_{i+1})
wherein g_i denotes the gating fusion module of the i-th layer, F_i denotes the i-th-layer feature map, Q_i denotes the weight output for the combination of the i-th and (i−1)-th layers, and F_oi denotes the feature map obtained by combining the i-th and (i−1)-th layers.
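A two-input gating fusion unit can be sketched as follows; the 1×1-convolution gate is replaced by a per-pixel linear map plus softmax, and the scalar gate parameters are illustrative assumptions.

```python
import numpy as np

def softmax(z, axis=0):
    e = np.exp(z - z.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def gated_fusion(F_hi, F_lo, w_hi=1.0, w_lo=1.0):
    """Single gating fusion unit (sketch): a linear map standing in for the
    convolution layer produces per-pixel weights Q for the two feature maps,
    which the aggregation layer then linearly combines."""
    logits = np.stack([w_hi * F_hi, w_lo * F_lo])  # convolution-layer stand-in
    Q = softmax(logits, axis=0)                    # gate weights sum to 1
    return Q[0] * F_hi + Q[1] * F_lo               # linear combination
```

Because the gate weights are computed from the features themselves, the unit can emphasise high-level semantics in some regions and low-level detail in others, which is the point of fusing multi-scale features this way.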
6. The image defogging method based on self-attention encoding and decoding according to claim 1, wherein in step S1 the OTS data set in RESIDE-β and the Foggy data set in Cityscapes are adopted as the experimental data sets; images containing traffic scenes from the OTS data set are mixed with the Foggy data set for training, a portion of the images being selected as the training set and a portion as the test set, while a number of real foggy images containing traffic scenes are also selected from RESIDE-β as a test set for analyzing the defogging effect on real foggy images.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310774453.9A CN117151990B (en) | 2023-06-28 | 2023-06-28 | Image defogging method based on self-attention coding and decoding |
Publications (2)
Publication Number | Publication Date |
---|---|
CN117151990A CN117151990A (en) | 2023-12-01 |
CN117151990B true CN117151990B (en) | 2024-03-22 |
Family
ID=88906888
Families Citing this family (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN117726550B (en) * | 2024-02-18 | 2024-04-30 | 成都信息工程大学 | Multi-scale gating attention remote sensing image defogging method and system |
Citations (16)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110738622A (en) * | 2019-10-17 | 2020-01-31 | 温州大学 | Lightweight neural network single image defogging method based on multi-scale convolution |
CN110992270A (en) * | 2019-12-19 | 2020-04-10 | 西南石油大学 | Multi-scale residual attention network image super-resolution reconstruction method based on attention |
CN111445418A (en) * | 2020-03-31 | 2020-07-24 | 联想(北京)有限公司 | Image defogging method and device and computer equipment |
CN111861925A (en) * | 2020-07-24 | 2020-10-30 | 南京信息工程大学滨江学院 | Image rain removing method based on attention mechanism and gate control circulation unit |
CN113628152A (en) * | 2021-09-15 | 2021-11-09 | 南京天巡遥感技术研究院有限公司 | Dim light image enhancement method based on multi-scale feature selective fusion |
CN113962878A (en) * | 2021-07-29 | 2022-01-21 | 北京工商大学 | Defogging model method for low-visibility image |
CN114638960A (en) * | 2022-03-22 | 2022-06-17 | 平安科技(深圳)有限公司 | Model training method, image description generation method and device, equipment and medium |
CN114821765A (en) * | 2022-02-17 | 2022-07-29 | 上海师范大学 | Human behavior recognition method based on fusion attention mechanism |
CN114862713A (en) * | 2022-04-29 | 2022-08-05 | 西安理工大学 | Two-stage image rain removing method based on attention smooth expansion convolution |
CN114881871A (en) * | 2022-04-12 | 2022-08-09 | 华南农业大学 | Attention-fused single image rain removing method |
CN115546046A (en) * | 2022-08-30 | 2022-12-30 | 华南农业大学 | Single image defogging method fusing frequency and content characteristics |
CN115700882A (en) * | 2022-10-21 | 2023-02-07 | 东南大学 | Voice enhancement method based on convolution self-attention coding structure |
CN115705493A (en) * | 2021-08-11 | 2023-02-17 | 暨南大学 | Image defogging modeling method based on multi-feature attention neural network |
CN115880170A (en) * | 2022-12-05 | 2023-03-31 | 华南理工大学 | Single-image rain removing method and system based on image prior and gated attention learning |
CN116013297A (en) * | 2022-12-17 | 2023-04-25 | 西安交通大学 | Audio-visual voice noise reduction method based on multi-mode gating lifting model |
CN116071549A (en) * | 2023-02-16 | 2023-05-05 | 河南稳健科技有限公司 | Multi-mode attention thinning and dividing method for retina capillary vessel |
Family Cites Families (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN116152082A (en) * | 2021-11-15 | 2023-05-23 | 三星电子株式会社 | Method and apparatus for image deblurring |
Non-Patent Citations (8)

Title |
---|
Gated Fusion Network for Single Image Dehazing; Wenqi Ren et al.; 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition; 2018-12-16; 3253-3261 * |
Involution: Inverting the Inherence of Convolution for Visual Recognition; Duo Li et al.; 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR); 2021-11-02; 12316-12325 * |
A dehazing algorithm based on cycle generative adversarial networks; Li Xiaowen et al.; Journal of Southwest China Normal University (Natural Science Edition); 2020-09-20 (No. 09); 132-138 * |
A two-stage image dehazing network based on deep learning; Wu Jiawei et al.; Computer Applications and Software; 2020-04-12 (No. 04); 197-202 * |
An event detection method fusing dependency and semantic information with a gating mechanism; Chen Jiali et al.; Journal of Chinese Information Processing; 2020-08-15 (No. 08); 55-64 * |
An image inpainting method based on multi-loss constraints and attention blocks; Cao Zhen et al.; Journal of Shaanxi University of Science & Technology; 2020-06-16 (No. 03); 164-171 * |
An image dehazing network based on residual dense blocks and attention mechanism; Li Shuoshi et al.; Journal of Hunan University (Natural Sciences); 2021-06-30; Vol. 48, No. 6; 112-118 * |
A multi-stage low-light image enhancement network with attention mechanism; Chen Guihui et al.; Journal of Computer Applications; 2023-02-10; Vol. 43, No. 2; 552-559 * |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||