CN117151990B - Image defogging method based on self-attention coding and decoding - Google Patents

Image defogging method based on self-attention coding and decoding

Info

Publication number
CN117151990B
Authority
CN
China
Prior art keywords
convolution
image
self
layer
images
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202310774453.9A
Other languages
Chinese (zh)
Other versions
CN117151990A (en
Inventor
谌贵辉
汪少天
卢凯
魏钰力
郑莘于
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Southwest Petroleum University
Original Assignee
Southwest Petroleum University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Southwest Petroleum University filed Critical Southwest Petroleum University
Priority to CN202310774453.9A priority Critical patent/CN117151990B/en
Publication of CN117151990A publication Critical patent/CN117151990A/en
Application granted granted Critical
Publication of CN117151990B publication Critical patent/CN117151990B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T3/00Geometric image transformations in the plane of the image
    • G06T3/40Scaling of whole images or parts thereof, e.g. expanding or contracting
    • G06T3/4046Scaling of whole images or parts thereof, e.g. expanding or contracting using neural networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/044Recurrent networks, e.g. Hopfield networks
    • G06N3/0442Recurrent networks, e.g. Hopfield networks characterised by memory or gating, e.g. long short-term memory [LSTM] or gated recurrent units [GRU]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/0464Convolutional networks [CNN, ConvNet]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • G06N3/082Learning methods modifying the architecture, e.g. adding, deleting or silencing nodes or connections
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/774Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/776Validation; Performance evaluation
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/80Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level
    • G06V10/806Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level of extracted features
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02ATECHNOLOGIES FOR ADAPTATION TO CLIMATE CHANGE
    • Y02A90/00Technologies having an indirect contribution to adaptation to climate change
    • Y02A90/10Information and communication technologies [ICT] supporting adaptation to climate change, e.g. for weather forecasting or climate simulation

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • Software Systems (AREA)
  • Computing Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Biomedical Technology (AREA)
  • Medical Informatics (AREA)
  • Multimedia (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Databases & Information Systems (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Molecular Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses an image defogging method based on self-attention encoding and decoding. The method first downsamples the image and then processes the downsampled feature map by fusing self-convolution with conventional convolution, improving the local feature-extraction capability of the whole convolution module; secondly, a residual dense block is introduced to convolve the feature map again; then, gating fusion units fuse high- and low-level semantic information across features of different scales; the feature map is then upsampled to restore it to the size of the initial image; finally, the mean absolute error loss function and the multi-scale similarity loss function are weighted into a mixed loss function for model training, improving the subjective quality of the defogging results. The method lets the basic unit concentrate more on extracting high-frequency information, better highlights the important feature-map information in each channel, better extracts the important information in the image, and recovers defogged images that better match human visual perception.

Description

Image defogging method based on self-attention coding and decoding
Technical Field
The invention relates to the technical field of computer vision image processing, in particular to an image defogging method based on self-attention coding and decoding.
Background
The goal of an image defogging algorithm is to eliminate the noise and interference that haze introduces into an image, improve the image's clarity and color saturation, and recover its detail information. Before deep learning methods emerged, traditional defogging methods attracted wide attention from researchers and fall mainly into two categories. One is defogging based on image enhancement, which improves the visual effect purely by enhancing the image: certain information in the foggy image is strengthened according to the actual requirements, without considering the formation mechanism of fog or the imaging process of foggy images. The other is defogging based on a physical model, which takes the atmospheric scattering model as its theoretical basis and estimates the unknown parameters of the model with statistical methods in order to restore the fog-free image.
In recent years, deep learning theory has been widely applied in the image defogging field; some researchers train neural networks on large foggy-image data sets, so that defogging can be carried out more efficiently. Image defogging methods based on convolutional neural networks can be divided into the following three categories according to how image features are processed:
Defogging methods based on traditional convolutional neural networks: these methods perform image defogging mainly with conventional convolutional neural networks and generally comprise the following steps: image preprocessing, feature extraction, and feature recovery. However, conventional convolutional defogging networks handle texture and edge information poorly, texture information is easily lost or blurred, and global and local key information is not extracted and exploited effectively, so the defogging effect in complex scenes is poor.
Image defogging methods based on attention mechanisms: these methods mainly use an attention mechanism to extract the important features of the image, making the defogging result more accurate. Specifically, the attention mechanism computes a weight for each pixel in the image, so that the defogging algorithm focuses on the important regions. Volodymyr et al. first introduced an attention mechanism into an RNN model for image classification: the attention mechanism selects the region of the image to be processed, each current state determines the location of attention from the previous state and the currently input image, and only the pixels within the attended region are processed rather than all pixels of the whole image. The advantage is that the number of pixels processed and the difficulty of the task are reduced. However, most attention-based defogging methods perform feature extraction only at a single scale and apply the recovery operation directly to the weighted features, lacking the extraction and fusion of global multi-scale features.
Image defogging methods based on multi-scale feature fusion: these methods fuse image features of different scales to improve defogging accuracy. Specifically, such algorithms usually fuse the features the model extracts at different scales to generate more accurate defogging results. A multi-scale CNN can extract effective features from the foggy image to estimate the transmission map, but because the atmospheric light is not estimated jointly with the transmission map in a learned manner, the atmospheric-light estimate carries a large error, which degrades the quality of the final defogged image.
In summary, compared with traditional convolutional neural network algorithms, attention-based image defogging algorithms can adaptively select regions of interest and adjust to scene characteristics, focusing on the key areas and improving the defogging effect. A multi-scale fusion mechanism can extract and fuse information at different scales, handling multi-scale information better, recovering image details and colors better, and improving the defogging effect. However, the attention-based methods above still lack an understanding of the spatial-feature and frequency-domain-feature information of the input image itself. It is therefore particularly important to design a suitable attention mechanism for image defogging and to make reasonable use of multi-scale feature fusion.
Disclosure of Invention
Aiming at the problem that image defogging algorithms based on an attention mechanism lack understanding of the spatial-feature and frequency-domain-feature information of the input image, the invention provides an image defogging method based on self-attention encoding and decoding.
The invention provides an image defogging method based on self-attention coding and decoding, which specifically comprises the following steps:
s1: and selecting the public image defogging data set as an image data set to be tested, dividing the data set into a training set and a testing set, and carrying out image preprocessing.
The OTS data set from RESIDE-β and the Foggy data set from Cityscapes are adopted as the experimental data sets, and the traffic-scene images in OTS are mixed with the Foggy data set for the training task. OTS contains 2061 clear outdoor images, each clear image corresponding to 35 foggy images of different densities; Cityscapes contains 5002 clear traffic-scene images, each clear image corresponding to 3 foggy images of different densities, giving 15006 traffic fog images in total. M images are selected as the training set and N images as the test set, and K real foggy images containing traffic scenes are additionally selected from RESIDE-β as a test set for analyzing the defogging effect on real foggy images; 90% of the images are used as the training set (M), 9% as the real foggy images (K), and 1% as the test set (N).
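As a concrete illustration of this split, the sketch below partitions one image list into the three subsets using the stated 90%/9%/1% proportions. It is a simplification: the patent draws the K real foggy images separately from RESIDE-β rather than from the mixed pool, and the directory layout, file extension, and random seed here are assumptions for illustration only.

```python
import random
from pathlib import Path

def split_dataset(image_dir: str, seed: int = 42):
    """Split a mixed OTS + Foggy Cityscapes image list into train / real-fog / test subsets."""
    images = sorted(Path(image_dir).glob("*.png"))  # assumed file extension
    random.Random(seed).shuffle(images)
    n = len(images)
    n_train = int(0.90 * n)   # M: training set
    n_real = int(0.09 * n)    # K: real foggy images for qualitative analysis
    train = images[:n_train]
    real_fog = images[n_train:n_train + n_real]
    test = images[n_train + n_real:]  # N: remaining ~1% as the test set
    return train, real_fog, test
```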
The image preprocessing refers to performing a series of processing steps on the original image before it is processed further in a computer vision task, in order to improve the effect of subsequent tasks or reduce errors. These processing steps may include, but are not limited to, the following:
1. Noise reduction: noise in the image is removed so that objects in the image can be identified and analyzed better.
2. Resizing: the image is resized to fit a particular task or model.
3. Cropping: unnecessary parts of the image are removed to improve the effect of subsequent tasks.
4. Rotation and flipping: the image is rotated or flipped to better match the expected input of the model.
5. Normalization and standardization: the image pixel values are normalized or standardized to better match the input requirements of the model.
6. Contrast and brightness adjustment: the contrast and brightness of the image are increased or decreased so that objects in the image can be identified and analyzed better.
7. Color space conversion: the image is converted from one color space to another to better suit the needs of a particular task.
Image preprocessing helps to improve the accuracy and efficiency of computer vision tasks and thus better meets users' needs; a minimal example of composing several of these steps is sketched below.
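A minimal example of how several of the steps above (resizing, flipping, normalization) might be composed; torchvision is assumed as the implementation library, and the target size, flip probability, and normalization statistics are illustrative choices rather than values fixed by the patent.

```python
from torchvision import transforms

# compose resizing (step 2), flipping (step 4) and normalization (step 5)
preprocess = transforms.Compose([
    transforms.Resize((256, 256)),              # resize to the assumed network input size
    transforms.RandomHorizontalFlip(p=0.5),     # simple flip augmentation
    transforms.ToTensor(),                      # convert the PIL image to a [0, 1] tensor
    transforms.Normalize(mean=[0.5, 0.5, 0.5],  # illustrative normalization statistics
                         std=[0.5, 0.5, 0.5]),
])
```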
S2: downsampling the preprocessed images in the training set, and processing the downsampled feature map through self-convolution and conventional convolution fusion; the specific operation method is as follows:
A 3×3 max-pooling layer is adopted as the downsampling layer, and the downsampling layer is applied several times in the encoding stage to acquire multi-scale feature information; self-convolution based on an attention mechanism is introduced after the downsampling layer, and the features extracted by conventional convolution are fused with the features extracted by self-convolution.
The self-convolution based on the attention mechanism operates as follows:
(1) A convolution kernel related to the height and width of the initial image X is generated by the kernel generation function φ(X), and the generated single kernel is expanded into a kernel group with channel dimension C; the kernel generation function φ(X) is:
φ(X) = W_1·ξ(W_0(X))
where X denotes the input image; W_0 and W_1 denote two linear transformations; and ξ denotes a nonlinear activation function;
(2) A convolution operation is performed between the generated kernel group and the corresponding position of the initial image X, the self-convolution and ordinary-convolution results are aggregated within the K-neighborhood, and the feature map of the initial image is finally output.
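To make steps (1) and (2) concrete, the PyTorch sketch below generates a position-dependent kernel with two 1×1 linear maps (W_0, W_1) separated by a nonlinearity ξ, applies that kernel over the K×K neighborhood of every pixel, and fuses the result with a conventional convolution branch. The reduction ratio, group count, the choice of ReLU for ξ, and fusion by simple addition are assumptions; the patent only specifies the kernel-generation form and the aggregation over the K-neighborhood.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SelfConvFusion(nn.Module):
    """Attention-based self-convolution fused with a conventional convolution (sketch)."""

    def __init__(self, channels: int, k: int = 3, reduction: int = 4, groups: int = 1):
        super().__init__()
        self.k, self.groups = k, groups
        # kernel generation phi(X) = W1 * xi(W0(X)), one KxK kernel per spatial position
        self.w0 = nn.Conv2d(channels, channels // reduction, 1)
        self.xi = nn.ReLU(inplace=True)
        self.w1 = nn.Conv2d(channels // reduction, groups * k * k, 1)
        # conventional convolution branch that is fused with the self-convolution result
        self.conv = nn.Conv2d(channels, channels, k, padding=k // 2)

    def forward(self, x):
        b, c, h, w = x.shape
        # (1) generate the position-dependent kernel group and share it across channel groups
        kernel = self.w1(self.xi(self.w0(x)))                      # B x (G*K*K) x H x W
        kernel = kernel.view(b, self.groups, 1, self.k * self.k, h, w)
        # (2) multiply-accumulate the kernel with the KxK neighborhood of every pixel
        patches = F.unfold(x, self.k, padding=self.k // 2)          # B x (C*K*K) x (H*W)
        patches = patches.view(b, self.groups, c // self.groups, self.k * self.k, h, w)
        inv_out = (kernel * patches).sum(dim=3).view(b, c, h, w)
        # fuse the self-convolution output with the conventional convolution output
        return inv_out + self.conv(x)
```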
S3: A residual dense block is introduced, the feature map is convolved again, and the extraction of feature information is increased, completing the encoding of the fog map.
The specific operation of the residual dense block is as follows: on the basis of the original residual structure, the input is skip-connected to each residual block, and the output of each convolution layer inside a residual block is skip-connected to the output of that residual block.
The original residual structure includes two convolution blocks with kernel size 3 and one convolution block with kernel size 5, an identity-mapping module, three ReLU activation-function modules, and an aggregation module; the calculation formula is:
F = Y − X
where F is the residual to be learned, and X and Y are the input and output feature maps, respectively.
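A hedged sketch of the residual structure described above: two 3×3 convolutions and one 5×5 convolution, each followed by ReLU, with the identity mapping and the dense skips from each convolution output aggregated by summation. The channel count and the use of summation as the aggregation operation are assumptions rather than values stated in the patent.

```python
import torch.nn as nn

class ResidualDenseBlock(nn.Module):
    """Residual block with dense skips from every convolution output (sketch)."""

    def __init__(self, channels: int):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, 3, padding=1)  # 3x3 convolution block
        self.conv2 = nn.Conv2d(channels, channels, 3, padding=1)  # 3x3 convolution block
        self.conv3 = nn.Conv2d(channels, channels, 5, padding=2)  # 5x5 convolution block
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        y1 = self.relu(self.conv1(x))
        y2 = self.relu(self.conv2(y1))
        y3 = self.relu(self.conv3(y2))
        # aggregation: identity mapping plus dense skips from each convolution output,
        # so the block only has to learn the residual F = Y - X
        return x + y1 + y2 + y3
```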
S4: Features of different scales are fused through a plurality of gating fusion units, realizing the fusion of high- and low-level semantic information within the multi-scale features.
A single gating fusion unit consists of one convolution layer and one aggregation layer. The convolution kernel sizes are assigned as 7×7, 5×5, 3×3 and 3×3, respectively, according to the feature-map sizes.
The specific operation of this step is as follows:
First, the network's upper-layer feature F_i and lower-layer feature F_i+1 are extracted and input into the gating fusion unit; its output weights Q_i and Q_i+1 are matched to the numbers of feature channels of the upper and lower layers, one weight per feature channel;
Finally, the feature maps of the different layers are linearly combined with the weights output by the gating fusion unit, and the combined feature map is further sent to the corresponding decoder to obtain the target fog-map residual; the mathematical expression is as follows:
Q_1, Q_2, Q_3, ..., Q_i+1 = g_i(F_1, F_2, F_3, ..., F_i+1)
where g_i denotes the gating fusion module of the i-th layer; F_i denotes the i-th layer feature map; Q_i denotes the weight output for the combination of the i-th and (i−1)-th layers; and F_oi denotes the feature map obtained by combining the i-th layer with the (i−1)-th layer.
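The sketch below shows one possible form of a single gating fusion unit: a single convolution layer predicts one weight map for each of the two inputs (Q_i and Q_i+1), and the aggregation step linearly combines the feature maps with those weights. The bilinear resizing of the lower-level feature, the softmax normalization of the two weight maps, and the default kernel size are assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class GatingFusionUnit(nn.Module):
    """One gating fusion unit: one convolution layer plus one aggregation step (sketch)."""

    def __init__(self, channels: int, kernel_size: int = 3):
        super().__init__()
        # single convolution layer producing the two gating weight maps Q_i and Q_i+1
        self.gate = nn.Conv2d(2 * channels, 2, kernel_size, padding=kernel_size // 2)

    def forward(self, f_upper, f_lower):
        # match spatial sizes before fusing features coming from two different scales
        f_lower = F.interpolate(f_lower, size=f_upper.shape[-2:],
                                mode="bilinear", align_corners=False)
        q = torch.softmax(self.gate(torch.cat([f_upper, f_lower], dim=1)), dim=1)
        # aggregation layer: weighted linear combination of the two feature maps
        return q[:, 0:1] * f_upper + q[:, 1:2] * f_lower
```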
S5: upsampling by taking bilinear interpolation as an upsampling layer, gradually restoring the size of the feature map to the size of the initial image by adopting a plurality of upsampling layers in a decoding stage, and finishing decoding of the fog map;
s6: during model training, weighting an average absolute error loss function and a multi-scale similarity loss function to form a mixed loss function, so as to obtain model parameters; wherein, the formula of the mixing loss function is as follows:
L_mix = α·L_MS-SSIM + (1 − α)·G·L_1
where L_1 denotes the mean absolute error loss function; L_MS-SSIM denotes the multi-scale similarity loss function; α denotes the weight within the mixed loss function; and G is a weight coefficient obtained by applying a Gaussian convolution to the error;
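A sketch of this mixed loss under stated assumptions: the multi-scale similarity term is taken from the third-party pytorch_msssim package, α = 0.84 is a common choice rather than a value fixed by the patent, and the Gaussian window used to compute the per-pixel weight G from the error is assumed to be 11×11 with σ = 1.5.

```python
import torch
import torch.nn.functional as F
from pytorch_msssim import ms_ssim  # third-party package: pip install pytorch-msssim

def mixed_loss(pred, target, alpha: float = 0.84, win: int = 11, sigma: float = 1.5):
    """L_mix = alpha * L_MS-SSIM + (1 - alpha) * G * L_1, following the formula above."""
    # per-pixel absolute error (the L_1 term before weighting)
    err = torch.abs(pred - target)
    # build a 2-D Gaussian kernel and smooth the error to obtain the weight map G
    coords = torch.arange(win, dtype=torch.float32, device=pred.device) - win // 2
    g1d = torch.exp(-coords ** 2 / (2 * sigma ** 2))
    g1d = g1d / g1d.sum()
    g2d = torch.outer(g1d, g1d).view(1, 1, win, win).repeat(pred.shape[1], 1, 1, 1)
    G = F.conv2d(err, g2d, padding=win // 2, groups=pred.shape[1])
    weighted_l1 = (G * err).mean()
    # MS-SSIM is a similarity, so the corresponding loss term is 1 - MS-SSIM
    msssim_loss = 1.0 - ms_ssim(pred, target, data_range=1.0)
    return alpha * msssim_loss + (1.0 - alpha) * weighted_l1
```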
s7: and inputting the preprocessed image test set into a trained neural network model for defogging based on the coded and decoded images for testing, so as to obtain defogged images.
Compared with the prior art, the invention has the following advantages:
(1) The self-attention-based encoding and decoding method proposed by the invention fully considers, from the network structure, the problem of feature loss in deep neural networks; the layer-by-layer multi-scale gating fusion units pass the image feature information lost at each layer to the corresponding layer, so that the network does not lose too much of the feature information related to human vision, such as image texture and color.
(2) The improved self-convolution module attends to more detailed features of the image during feature extraction, so the restored image is clearer; the residual dense block optimizes computation, so the model converges quickly and the drop in image-recovery quality caused by increasing the network depth is avoided; finally, the composite loss function of L_1 and L_MS-SSIM used in training enhances the perceptual quality of the restored image. The defogged image better matches human visual perception subjectively, and the objective indices also show better image quality, so the method can provide adequate data-preprocessing support for other image processing fields.
Additional advantages, objects, and features of the invention will be set forth in part in the description which follows and in part will become apparent to those having ordinary skill in the art upon examination of the following or may be learned from practice of the invention.
Drawings
Fig. 1 is a network configuration diagram of the image defogging method based on self-attention encoding and decoding of the present invention.
Fig. 2 is a block diagram of a modified self-convolution module.
Fig. 3 is a block diagram of a residual dense module.
Fig. 4 shows the effect of each defogging method on synthetic fog images from the OTS subset. In the figure: (a) synthetic hazy image; (b) DCP method; (c) Dehaze method; (d) AOD method; (e) GMAN method; (f) ERAN method; (g) MFFID method; (h) method of the invention; (i) haze-free image.
Fig. 5 shows the effect of each defogging method on synthetic fog images from the Foggy subset. (a) synthetic hazy image; (b) DCP method; (c) Dehaze method; (d) AOD method; (e) GMAN method; (f) ERAN method; (g) MFFID method; (h) method of the invention; (i) haze-free image.
Fig. 6 shows the effect of each defogging method on real fog images. (a) real hazy image; (b) DCP method; (c) Dehaze method; (d) AOD method; (e) GMAN method; (f) ERAN method; (g) MFFID method; (h) method of the invention.
Detailed Description
The preferred embodiments of the present invention will be described below with reference to the accompanying drawings, it being understood that the preferred embodiments described herein are for illustration and explanation of the present invention only, and are not intended to limit the present invention.
As shown in figs. 1-3, the image defogging method based on self-attention encoding and decoding provided by the invention first adds several improved self-convolution modules to the encoder-decoder; based on an attention mechanism, these modules adaptively assign weights to different positions of the image, giving priority to spatial-domain information. Second, ordinary convolution blocks are replaced with residual dense blocks in order to reduce or eliminate gradient vanishing, enhance information flow and feature reuse, and optimize computation. Because the skip connections of the U-shaped structure connect the shallow features of the encoder directly to the deep features of the corresponding decoder, high- and low-level semantic information is easily overlooked; the network therefore uses several gating units of different scales to fuse the upper and lower feature layers, and this module can aggregate semantic information over a wider space, further reducing feature loss. Finally, the fused features are connected layer by layer to the corresponding decoders, and the clear image is restored after processing by the decoders.
Training process and inference:
In the experiments, the Adam optimizer is adopted for training; the initial learning rate is 1×10⁻⁴; the batch size is 16. Training and validation images are resized to 256×256 for OTS and 512×1024 for Foggy, and the size of the test images is not restricted.
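A minimal sketch of this training configuration. The optimizer, learning rate, and batch size follow the values stated above, while the epoch count, worker count, and the reuse of the mixed_loss sketch from step S6 are assumptions; `model` stands for the assembled encoder-decoder and `dataset` for a dataset yielding (hazy, clear) image pairs.

```python
import torch
from torch.utils.data import DataLoader

def train(model: torch.nn.Module, dataset, epochs: int = 100, device: str = "cuda"):
    """Train the defogging network with Adam, learning rate 1e-4 and batch size 16."""
    model = model.to(device)
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
    loader = DataLoader(dataset, batch_size=16, shuffle=True, num_workers=4)
    for _ in range(epochs):
        for hazy, clear in loader:
            hazy, clear = hazy.to(device), clear.to(device)
            loss = mixed_loss(model(hazy), clear)  # mixed loss sketched in step S6
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
    return model
```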
Experimental evaluation:
the performance of the image defogging method (Our) of the present invention and the advanced defogging method (DCP method, dehaze method, AOD method, GMANmethod, ERAN method, MFFID method) in RESIDE-beta and in Cityscapes respectively were tested and compared. Fig. 4 and 5 are graphs of the effect of each defogging method on the composite fog map in the OTS and Foggy sub-data sets, respectively. Table 1 shows the objective evaluation index comparison of different comparison algorithms on the synthetic fog pattern.
Table 1. Objective evaluation results of different defogging methods on synthetic fog images
It can be seen that the SSIM value of the proposed method reaches 0.9207, which is 0.299 higher than the traditional DCP method and 0.0551 higher on average than the other neural-network methods; the MS-SSIM index reaches 0.9762, 0.156 higher than DCP and 0.0312 higher on average than the other neural-network methods; and the PSNR reaches 26.531 dB, 12.11 dB higher than DCP and 3.171 dB higher on average than the other neural-network methods. The experimental results show that the method outperforms the other methods on the objective indices for synthetic fog images: the difference between the defogged and clear images is smaller, fog is removed effectively, the structural information of the images before and after defogging is more similar, and detail information such as edges and colors is handled better than by the comparison methods.
Fig. 6 shows the effect of each defogging method on real fog images. Table 2 compares the objective evaluation indices of the different defogging algorithms on real fog images. It can be seen that, because DCP over-enhances boundary information and the no-reference indices are extremely sensitive to changes at pixel edges, its two index values are abnormally high and not comparable. In addition, the GMAN method, which shows edge-outlining artifacts in the subjective evaluation, also scores higher than the ERAN, MFFID and proposed methods, which have better visual quality. Apart from DCP and GMAN, whose subjective visual performance is poor, every index of the proposed method is higher than those of the other comparison methods. The experimental results show that the method recovers fine contrast details and texture changes in the image, the edges of the recovered images are sharper, and the overall visual effect is better than that of the comparison methods.
Table 2. Objective evaluation results of different defogging methods on real fog images
The present invention is not limited to the above embodiments; any modifications, equivalent substitutions and improvements made within the spirit and principles of the present invention shall fall within the scope of protection of the present invention.

Claims (6)

1. An image defogging method based on self-attention encoding and decoding, comprising the steps of:
s1: selecting a public image defogging data set as an image data set to be tested, dividing the data set into a training set and a testing set, and carrying out image preprocessing;
s2: downsampling the preprocessed images in the training set, and processing the downsampled feature map through self-convolution and conventional convolution fusion; the specific operation is as follows:
adopting 3×3 maximum pooling as the downsampling layer, and acquiring multi-scale feature information by applying the downsampling layer a plurality of times in the encoding stage; introducing self-convolution based on an attention mechanism after the downsampling layer, and fusing the features extracted by conventional convolution with the features extracted by self-convolution;
s3: the residual error dense block is introduced, convolution operation is carried out on the feature map again, extraction of feature information is increased, and coding of the fog map is completed;
s4: the high-low semantic information fusion in the multi-scale features is realized by fusing the different-scale features through a plurality of gating fusion units;
s5: upsampling by taking bilinear interpolation as an upsampling layer, gradually restoring the size of the feature map to the size of the initial image by adopting a plurality of upsampling layers in a decoding stage, and finishing decoding of the fog map;
s6: during model training, weighting an average absolute error loss function and a multi-scale similarity loss function to form a mixed loss function, so as to obtain model parameters; the formula of the mixing loss function is as follows:
L_mix = α·L_MS-SSIM + (1 − α)·G·L_1
where L_1 denotes the mean absolute error loss function, L_MS-SSIM denotes the multi-scale similarity loss function, α denotes the weight within the mixed loss function, and G is a weight coefficient obtained by applying a Gaussian convolution to the error;
s7: and inputting the preprocessed image test set into a trained neural network model for defogging based on the coded and decoded images for testing, so as to obtain defogged images.
2. The image defogging method based on self-attention encoding and decoding as claimed in claim 1, wherein in step S2 the self-convolution based on the attention mechanism operates as follows:
(1) A convolution kernel related to the height and width of the initial image X is generated by the kernel generation function φ(X), and the generated single kernel is expanded into a kernel group with channel dimension C; the kernel generation function φ(X) is:
φ(X) = W_1·ξ(W_0(X))
where X denotes the input image, W_0 and W_1 denote two linear transformations, and ξ denotes a nonlinear activation function;
(2) A convolution operation is performed between the generated kernel group and the corresponding position of the initial image X, the self-convolution and ordinary-convolution results are aggregated within the K-neighborhood, and the feature map of the initial image is finally output.
3. The image defogging method based on self-attention encoding and decoding as claimed in claim 1, wherein in step S3 the specific operation of the residual dense block is: on the basis of the original residual structure, the input is skip-connected to each residual block, and the output of each convolution layer inside a residual block is skip-connected to the output of that residual block.
4. The image defogging method based on self-attention encoding and decoding as claimed in claim 3, wherein said original residual structure comprises: two convolution blocks with kernel size 3 and one convolution block with kernel size 5, an identity-mapping module, three ReLU activation-function modules, and an aggregation module; the calculation formula is:
F = Y − X
where F is the residual to be learned, and X and Y are the input and output feature maps, respectively.
5. The image defogging method based on self-attention encoding and decoding as claimed in claim 1, wherein in step S4 a single gating fusion unit consists of one convolution layer and one aggregation layer; the specific operation of this step is as follows:
(1) The upper-layer feature F_i and lower-layer feature F_i+1 of the network are extracted and input into the gating fusion unit, whose output weights Q_i and Q_i+1 are matched to the numbers of feature channels of the upper and lower layers, one weight per feature channel;
(2) The feature maps of the different layers are linearly combined with the weights output by the gating fusion unit, and the combined feature map is further sent to the corresponding decoder to obtain the target fog-map residual; the mathematical expression is:
Q_1, Q_2, Q_3, ..., Q_i+1 = g_i(F_1, F_2, F_3, ..., F_i+1)
where g_i denotes the gating fusion module of the i-th layer, F_i denotes the i-th layer feature map, Q_i denotes the weight output for the combination of the i-th and (i−1)-th layers, and F_oi denotes the feature map obtained by combining the i-th layer with the (i−1)-th layer.
6. The image defogging method based on self-attention encoding and decoding according to claim 1, wherein in step S1 the OTS data set in RESIDE-β and the Foggy data set in Cityscapes are adopted as the experimental data sets, the traffic-scene images in the OTS data set are mixed with the Foggy data set for training, a part of the images is selected as the training set, a part of the images is selected as the test set, and a part of the real fog images containing traffic scenes is simultaneously selected from RESIDE-β as a test set for analyzing the defogging effect on real fog images.
CN202310774453.9A 2023-06-28 2023-06-28 Image defogging method based on self-attention coding and decoding Active CN117151990B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310774453.9A CN117151990B (en) 2023-06-28 2023-06-28 Image defogging method based on self-attention coding and decoding

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310774453.9A CN117151990B (en) 2023-06-28 2023-06-28 Image defogging method based on self-attention coding and decoding

Publications (2)

Publication Number Publication Date
CN117151990A CN117151990A (en) 2023-12-01
CN117151990B true CN117151990B (en) 2024-03-22

Family

ID=88906888

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310774453.9A Active CN117151990B (en) 2023-06-28 2023-06-28 Image defogging method based on self-attention coding and decoding

Country Status (1)

Country Link
CN (1) CN117151990B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117726550B (en) * 2024-02-18 2024-04-30 成都信息工程大学 Multi-scale gating attention remote sensing image defogging method and system


Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116152082A (en) * 2021-11-15 2023-05-23 三星电子株式会社 Method and apparatus for image deblurring

Patent Citations (16)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110738622A (en) * 2019-10-17 2020-01-31 温州大学 Lightweight neural network single image defogging method based on multi-scale convolution
CN110992270A (en) * 2019-12-19 2020-04-10 西南石油大学 Multi-scale residual attention network image super-resolution reconstruction method based on attention
CN111445418A (en) * 2020-03-31 2020-07-24 联想(北京)有限公司 Image defogging method and device and computer equipment
CN111861925A (en) * 2020-07-24 2020-10-30 南京信息工程大学滨江学院 Image rain removing method based on attention mechanism and gate control circulation unit
CN113962878A (en) * 2021-07-29 2022-01-21 北京工商大学 Defogging model method for low-visibility image
CN115705493A (en) * 2021-08-11 2023-02-17 暨南大学 Image defogging modeling method based on multi-feature attention neural network
CN113628152A (en) * 2021-09-15 2021-11-09 南京天巡遥感技术研究院有限公司 Dim light image enhancement method based on multi-scale feature selective fusion
CN114821765A (en) * 2022-02-17 2022-07-29 上海师范大学 Human behavior recognition method based on fusion attention mechanism
CN114638960A (en) * 2022-03-22 2022-06-17 平安科技(深圳)有限公司 Model training method, image description generation method and device, equipment and medium
CN114881871A (en) * 2022-04-12 2022-08-09 华南农业大学 Attention-fused single image rain removing method
CN114862713A (en) * 2022-04-29 2022-08-05 西安理工大学 Two-stage image rain removing method based on attention smooth expansion convolution
CN115546046A (en) * 2022-08-30 2022-12-30 华南农业大学 Single image defogging method fusing frequency and content characteristics
CN115700882A (en) * 2022-10-21 2023-02-07 东南大学 Voice enhancement method based on convolution self-attention coding structure
CN115880170A (en) * 2022-12-05 2023-03-31 华南理工大学 Single-image rain removing method and system based on image prior and gated attention learning
CN116013297A (en) * 2022-12-17 2023-04-25 西安交通大学 Audio-visual voice noise reduction method based on multi-mode gating lifting model
CN116071549A (en) * 2023-02-16 2023-05-05 河南稳健科技有限公司 Multi-mode attention thinning and dividing method for retina capillary vessel

Non-Patent Citations (8)

* Cited by examiner, † Cited by third party
Title
Gated Fusion Network for Single Image Dehazing; Wenqi Ren et al.; 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition; 2018-12-16; 3253-3261 *
Involution: Inverting the Inherence of Convolution for Visual Recognition; Duo Li et al.; 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR); 2021-11-02; 12316-12325 *
A defogging algorithm based on a cyclic generative adversarial network; 李潇雯 et al.; Journal of Southwest China Normal University (Natural Science Edition); 2020-09-20 (No. 09); 132-138 *
A two-stage image defogging network based on deep learning; 吴嘉炜 et al.; Computer Applications and Software; 2020-04-12 (No. 04); 197-202 *
An event detection method fusing dependency and semantic information with a gating mechanism; 陈佳丽 et al.; Journal of Chinese Information Processing; 2020-08-15 (No. 08); 55-64 *
An image inpainting method based on multi-loss constraints and attention blocks; 曹真 et al.; Journal of Shaanxi University of Science & Technology; 2020-06-16 (No. 03); 164-171 *
An image defogging network based on residual dense blocks and an attention mechanism; 李硕士 et al.; Journal of Hunan University (Natural Sciences); 2021-06-30; Vol. 48 (No. 6); 112-118 *
A multi-stage low-illumination image enhancement network with attention mechanism; 谌贵辉 et al.; Journal of Computer Applications; 2023-02-10; Vol. 43 (No. 2); 552-559 *

Also Published As

Publication number Publication date
CN117151990A (en) 2023-12-01

Similar Documents

Publication Publication Date Title
Lim et al. DSLR: Deep stacked Laplacian restorer for low-light image enhancement
CN111047516B (en) Image processing method, image processing device, computer equipment and storage medium
CN112001960B (en) Monocular image depth estimation method based on multi-scale residual error pyramid attention network model
Hu et al. Underwater image restoration based on convolutional neural network
CN112541864A (en) Image restoration method based on multi-scale generation type confrontation network model
CN111292265A (en) Image restoration method based on generating type antagonistic neural network
Hsu et al. Single image dehazing using wavelet-based haze-lines and denoising
CN110363068B (en) High-resolution pedestrian image generation method based on multiscale circulation generation type countermeasure network
CN111861945B (en) Text-guided image restoration method and system
CN113222875B (en) Image harmonious synthesis method based on color constancy
CN112381716B (en) Image enhancement method based on generation type countermeasure network
Lu et al. Rethinking prior-guided face super-resolution: A new paradigm with facial component prior
CN117151990B (en) Image defogging method based on self-attention coding and decoding
CN116645569A (en) Infrared image colorization method and system based on generation countermeasure network
Zheng et al. T-net: Deep stacked scale-iteration network for image dehazing
CN116485934A (en) Infrared image colorization method based on CNN and ViT
CN115829876A (en) Real degraded image blind restoration method based on cross attention mechanism
CN115063318A (en) Adaptive frequency-resolved low-illumination image enhancement method and related equipment
CN115641391A (en) Infrared image colorizing method based on dense residual error and double-flow attention
Liu et al. Facial image inpainting using multi-level generative network
Zheng et al. Overwater image dehazing via cycle-consistent generative adversarial network
CN113763268A (en) Blind restoration method and system for face image
CN116137043A (en) Infrared image colorization method based on convolution and transfomer
CN110020986A (en) The single-frame image super-resolution reconstruction method remapped based on Euclidean subspace group two
CN116258627A (en) Super-resolution recovery system and method for extremely-degraded face image

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant