CN111915531B - Neural network image defogging method based on multi-level feature fusion and attention guidance - Google Patents
- Publication number
- CN111915531B CN111915531B CN202010781155.9A CN202010781155A CN111915531B CN 111915531 B CN111915531 B CN 111915531B CN 202010781155 A CN202010781155 A CN 202010781155A CN 111915531 B CN111915531 B CN 111915531B
- Authority
- CN
- China
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T5/00—Image enhancement or restoration
- G06T5/73—Deblurring; Sharpening
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/048—Activation functions
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02A—TECHNOLOGIES FOR ADAPTATION TO CLIMATE CHANGE
- Y02A90/00—Technologies having an indirect contribution to adaptation to climate change
- Y02A90/10—Information and communication technologies [ICT] supporting adaptation to climate change, e.g. for weather forecasting or climate simulation
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02T—CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
- Y02T10/00—Road transport of goods or passengers
- Y02T10/10—Internal combustion engine [ICE] based vehicles
- Y02T10/40—Engine management systems
Abstract
The invention discloses a neural network image defogging method with multi-level feature fusion and attention guidance, which comprises the following steps: constructing an image defogging model; acquiring foggy image data and extracting feature maps representing different stages through a feature extraction module; fusing the feature maps obtained at different stages by point-wise element multiplication in the multi-level feature fusion module of the defogging model, using the complementarity of low-level and high-level features to guide the network toward better recovery of a clear image; reconstructing the features produced by the multi-level feature fusion module into a clear fog-free image through a residual mixed attention module; and calculating the mean square error and perceptual loss between the restored image and the corresponding clear image and updating the image defogging model, with the two loss functions, a mean square error loss and a perceptual loss, jointly optimizing the defogging model. With this technical scheme, defogging enhancement is applied to actually captured foggy images, high-quality images are recovered, and the method has good practicability.
Description
Technical Field
The invention relates to the technical field of image processing, in particular to a neural network image defogging method with multi-level feature fusion and attention guidance.
Background
Low visibility in severe weather (heavy fog, heavy rain) is a major problem for most computer vision techniques applied in real scenes. Most automatic monitoring, autonomous driving, and outdoor target recognition systems assume that the incoming video and images have clear visibility. However, this ideal condition is often not satisfied, so enhancing low-quality images and video is an unavoidable task. Among such tasks, image defogging is a representative image quality enhancement problem. The process by which a clear image becomes foggy can be described by the atmospheric scattering model proposed by McCartney et al.:
I = tJ + A(1 − t),
t(x) = e^(−βd(x)),
wherein I is the foggy image, t is the medium transmission, J is the clear image, A is the atmospheric light, d is the scene depth, and β is the atmospheric scattering coefficient. In this model I is the known quantity, and the goal of the image defogging task is to estimate A and t and then generate a sharp image. Image defogging is an ill-posed problem. Over the past 20 years, researchers have developed many image defogging algorithms to process images taken in complex foggy scenes. Early algorithms mainly focused on estimating the depth information of images from multiple images and atmospheric cues to achieve defogging. For example, Narasimhan et al. propose a physics-based method to locate depth discontinuities and compute scene structure from two images of the same scene captured under different weather conditions. In addition, a series of algorithms enhance the visual quality of defogging by means of prior information; a typical one is the dark channel prior (DCP) defogging method proposed in 2009 by He et al., based on the observation and statistics that in most non-sky local regions of a foggy image, some pixels always have at least one color channel with a very low value. As another example, Zhu Qingsong et al. propose the color attenuation prior (CAP). With such a prior, t is estimated and a clear image is recovered through the atmospheric scattering model. These priors improve defogging performance to some extent. However, each prior relies on estimating a particular characteristic of the image and is often not suitable for real scenes.
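The forward model above can be sketched numerically as follows (a minimal illustrative example, not part of the patent: the function name `apply_haze` is invented here, and the atmospheric light A is taken as a scalar for simplicity):

```python
import numpy as np

def apply_haze(J, d, A=0.8, beta=1.0):
    """Synthesize a foggy image I from a clear image J using the
    atmospheric scattering model: I = t*J + A*(1 - t), t = exp(-beta*d).
    J: clear image, H x W x 3, values in [0, 1]
    d: per-pixel scene depth, H x W
    A: global atmospheric light (assumed scalar here)
    beta: atmospheric scattering coefficient
    """
    t = np.exp(-beta * d)[..., None]  # medium transmission, H x W x 1
    return t * J + A * (1.0 - t)
```

At zero depth the transmission is 1 and the image is unchanged; as depth grows, t falls toward 0 and every pixel tends to the atmospheric light A, matching the whitish appearance of distant objects in fog.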
Disclosure of Invention
Aiming at the defects of the prior art, the invention aims to provide a multi-level feature fusion and attention-guided neural network image defogging method that performs defogging enhancement on actually captured foggy images and recovers high-quality images.
In order to achieve the above purpose, the present invention provides the following technical solutions: a neural network image defogging method with multi-level feature fusion and attention guidance comprises the following steps:
s1, constructing an image defogging model; the image defogging model comprises a feature extraction module, a multi-level feature fusion module and a residual mixed convolution attention module;
s2, acquiring foggy image data, first converting the foggy image into 16 feature maps through a convolution layer; the feature maps are then processed through the four stages of the feature extraction module to obtain features of different levels;
s3, the multi-level feature fusion module fuses the feature maps obtained at different stages by point-wise element multiplication, using the complementarity of low-level and high-level features to guide the network toward better recovery of a clear image;
s4, the features produced by the multi-level feature fusion module pass through the residual mixed convolution attention module to obtain a weight map of the same size as the input; the weight map, produced by an attention layer designed on the attention mechanism, guides the network to discard redundant information and focus on feature information effective for restoring a clear image; meanwhile, the depthwise separable convolution adopted in the residual mixed convolution attention module improves its training and running efficiency; after this module, the features are finally reconstructed into a clear haze-free image;
s5, calculating the mean square error and perceptual loss between the restored image and the corresponding clear image, and updating the image defogging model; the mean square error measures the deviation between the restored image and the corresponding clear image, while the perceptual loss helps the model perceive the image at a higher level, making the restored image more realistic; the two loss functions, mean square error and perceptual loss, cooperate to jointly optimize the defogging model.
Preferably, step S5 specifically includes:
calculating a mean square error and a perceptual loss for the restored image and the corresponding clear image, wherein the first loss function is the mean square error loss, with the formula:
L_mse = (1/(3WH)) Σ_{c=1..3} Σ_{i=1..W} Σ_{j=1..H} (I_re(i, j, c) − I_gt(i, j, c))²,
wherein W and H are the width and height of the image, I_re and I_gt are the restored image and the corresponding clear image, i and j index pixel positions in the image, and c is the RGB channel of the image, ranging from 1 to 3;
the second is a perceptual loss function, which uses a VGG16 network pre-trained on the ImageNet dataset (VGG16 has 13 convolutional layers, divided into 5 stages); features are extracted from the last convolutional layer of each of the first three stages of the network and their differences computed, with the formula:
L_per = Σ_{k=1..3} (1/(C_k W_k H_k)) ‖φ_k(I_re) − φ_k(I_gt)‖²,
wherein {φ_k(·), k = 1, 2, 3} denote the feature extractors corresponding to the VGG16 layers Conv1-2, Conv2-2, and Conv3-3, and C_k, W_k, and H_k are the channel, width, and height dimensions of the output of φ_k(·);
The total defogging model loss function is:
L = L_mse + α·L_per,
where α is a parameter that balances the two loss functions.
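A minimal PyTorch sketch of this combined objective (illustrative only: the function name is invented, the φ_k extractors are passed in as callables rather than hard-wiring pretrained VGG16 slices, and the default α shown is an assumption, not the patent's setting):

```python
import torch
import torch.nn.functional as F

def dehazing_loss(I_re, I_gt, feat_extractors, alpha=0.04):
    """Total loss L = L_mse + alpha * L_per, with alpha balancing the terms.
    I_re, I_gt: restored and ground-truth clear images, N x 3 x H x W.
    feat_extractors: callables standing in for phi_k, i.e. the outputs of
    VGG16's Conv1-2, Conv2-2 and Conv3-3 layers in the patent's description.
    """
    l_mse = F.mse_loss(I_re, I_gt)                 # pixel-wise deviation
    l_per = sum(F.mse_loss(phi(I_re), phi(I_gt))   # higher-level perception
                for phi in feat_extractors)
    return l_mse + alpha * l_per
```

In practice the φ_k would be frozen slices of a torchvision VGG16 pretrained on ImageNet, evaluated with gradients disabled for the extractor weights.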
Preferably, step S2 specifically includes:
feature extraction starts with a 3×3 convolution layer that converts the given input foggy image into 16 feature maps;
then the feature maps are processed through the following four stages to obtain features of different levels; each stage comprises four layers: the first layer is a 3×3 convolution with stride 2, which halves the resolution of the feature map and doubles its width (channel count); the second and third layers form a combination of a 3×3 convolution, a ReLU activation function, and another 3×3 convolution; the fourth layer is a 1×1 convolution that reduces the width of the features produced by the third layer to 64 as the output of each stage.
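The four-layer stage described above can be sketched in PyTorch roughly as follows (an illustrative reading of the text: the class name and the exact way consecutive stages are chained are assumptions not spelled out in the patent):

```python
import torch
import torch.nn as nn

class Stage(nn.Module):
    """One feature-extraction stage: a stride-2 3x3 convolution that halves
    resolution and doubles width, a conv-ReLU-conv pair, and a 1x1
    projection to 64 channels as the stage output."""
    def __init__(self, in_ch):
        super().__init__()
        self.down = nn.Conv2d(in_ch, in_ch * 2, 3, stride=2, padding=1)
        self.body = nn.Sequential(
            nn.Conv2d(in_ch * 2, in_ch * 2, 3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(in_ch * 2, in_ch * 2, 3, padding=1),
        )
        self.proj = nn.Conv2d(in_ch * 2, 64, 1)  # width -> 64 per stage

    def forward(self, x):
        x = self.down(x)
        return self.proj(self.body(x))
```

For a 16-channel input of size 32×32, the stage output is a 64-channel map of size 16×16, consistent with halving the resolution and projecting the width to 64.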
Preferably, in step S3, the multi-level feature fusion module has three feature fusion blocks arranged from top to bottom. The first fusion block fuses the high-level features (the feature map output by the fourth convolution-activation function-convolution combination) with lower-level features (the feature map output by the third convolution-activation function-convolution combination); the fused features are regarded as the new high-level features, which the second fusion block then fuses with the mid-level features in the feature map output by the second convolution-activation function-convolution combination. Finally, the features obtained by the second fusion block are taken as high-level features and fused by the third fusion block with the low-level features in the feature map output by the first convolution-activation function-convolution combination.
For each fusion block, given high-level and low-level features, element-wise multiplication is used to fuse them. The fused features pass through a convolution layer, batch normalization, and a ReLU activation function, and are then processed by the next fusion block.
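A minimal sketch of one fusion block under the description above (assumptions: both inputs have already been brought to the same spatial size and the 64-channel width of the stage outputs; the class name is illustrative):

```python
import torch
import torch.nn as nn

class FeatureFusion(nn.Module):
    """One feature-fusion block: element-wise multiplication of high- and
    low-level features, followed by conv -> batch norm -> ReLU."""
    def __init__(self, ch=64):
        super().__init__()
        self.refine = nn.Sequential(
            nn.Conv2d(ch, ch, 3, padding=1),
            nn.BatchNorm2d(ch),
            nn.ReLU(inplace=True),
        )

    def forward(self, high, low):
        # point-wise element multiplication fuses the two levels
        return self.refine(high * low)
```

Three such blocks chained top-down, each taking the previous fused result as its high-level input, would realize the module as described.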
Preferably, step S4 specifically includes: the residual hybrid convolution attention module has three consecutive grouped convolution layers followed by an attention layer. The given features are processed by these layers and added to the residual input to obtain the output features. Grouped convolution partitions the input features into groups along the channel dimension (the number of groups is a hyperparameter) and applies a convolution to each group separately. Because of this grouping, the floating-point operation count (FLOPs) of the residual hybrid convolution attention module is greatly reduced, improving the training and defogging efficiency of the network. The group numbers of the three grouped convolution layers are 4, 8, and 16 respectively, i.e., the input feature map is divided into 4, 8, and 16 groups by channel for processing. This configuration was determined experimentally.
After the three grouped convolutions, an attention layer is added; it makes the output features reflect the feature information of the clear image that is important within the input foggy image, so that the network focuses on the clear, fog-free image information to be adopted. The attention mechanism is realized in two steps: the first step applies a depthwise convolution, then a ReLU activation function, then a pointwise convolution, and then a Sigmoid activation function to obtain feature weights; the second step applies the obtained weight map, which has the same size as the input, to the input features by element-wise multiplication to output the final features. The weight map obtained from the attention layer guides the network to discard redundant information (haze feature information) and focus on the feature information of the clear fog-free image, while the depthwise separable convolution (combining the depthwise and pointwise convolutions) improves the training and running efficiency of the residual hybrid convolution attention module.
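The grouped-convolution-plus-attention design above might be sketched as follows (an illustrative reading: the kernel sizes of the grouped and depthwise convolutions are assumptions, the residual is taken from the module input, and the class name is invented; the channel count is assumed divisible by 16):

```python
import torch
import torch.nn as nn

class ResidualMixedConvAttention(nn.Module):
    """Three grouped convolutions (4, 8 and 16 groups), a depthwise-
    separable attention layer producing a Sigmoid weight map, and a
    residual connection from the module input."""
    def __init__(self, ch=64):
        super().__init__()
        self.grouped = nn.Sequential(
            nn.Conv2d(ch, ch, 3, padding=1, groups=4),
            nn.Conv2d(ch, ch, 3, padding=1, groups=8),
            nn.Conv2d(ch, ch, 3, padding=1, groups=16),
        )
        # attention: depthwise conv -> ReLU -> pointwise conv -> Sigmoid
        self.attention = nn.Sequential(
            nn.Conv2d(ch, ch, 3, padding=1, groups=ch),  # depthwise
            nn.ReLU(inplace=True),
            nn.Conv2d(ch, ch, 1),                        # pointwise
            nn.Sigmoid(),
        )

    def forward(self, x):
        f = self.grouped(x)
        w = self.attention(f)   # weight map, same size as the input
        return x + f * w        # element-wise weighting plus residual
```

The Sigmoid keeps every weight in (0, 1), so the attention layer can only suppress or pass features, which matches its described role of discarding redundant haze information.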
Compared with the prior art, the invention has the following beneficial effects:
1. the invention provides a multi-level feature fusion module which can adaptively adopt features of different levels and recover clear images by exploiting the complementarity between those features;
2. the invention develops a residual mixed convolution attention module with an attention layer; the mixed convolution operations improve the efficiency of network operation, and the attention block concentrates the model on more important information;
3. the invention also provides a method of using the mean square error loss and the perceptual loss function to cooperatively guide the defogging model toward better defogging performance; the mean square error measures the deviation between the restored image and the corresponding clear image, while the perceptual loss helps the model perceive the image at a higher level, restoring a more realistic clear image.
The invention is further described below with reference to the drawings and specific examples.
Drawings
FIG. 1 is a defogging flow chart according to an embodiment of the present invention;
FIG. 2 is an application scenario diagram of an embodiment of the present invention;
FIG. 3 is an application scenario diagram of a core component residual hybrid convolution module in the model of FIG. 2;
FIG. 4 is an application scenario diagram of the attention layer of the core component of the model of FIG. 3;
FIG. 5 is an effect diagram of the restored image in the image defogging model of FIG. 2 compared with other methods.
Detailed Description
Referring to fig. 1 to 5, the neural network image defogging method with multi-level feature fusion and attention guidance disclosed by the invention comprises the following steps:
s1, constructing an image defogging model; the image defogging model comprises a feature extraction module, a multi-level feature fusion module and a residual mixed convolution attention module;
the specific process is that an image defogging model is constructed as shown in fig. 2. The image defogging model comprises a feature extraction module (shown in figure 2), a multi-level feature fusion module (shown in figure 2) and a residual mixed convolution attention module (shown in figure 2);
s2, acquiring foggy image data, first converting the foggy image into 16 feature maps through a convolution layer; the feature maps are then processed through the four stages of the feature extraction module to obtain features of different levels;
s3, the multi-level feature fusion module fuses the feature maps obtained at different stages by point-wise element multiplication, using the complementarity of low-level and high-level features to guide the network toward better recovery of a clear image;
s4, the features produced by the multi-level feature fusion module pass through the residual mixed convolution attention module to obtain a weight map of the same size as the input; the weight map, produced by an attention layer designed on the attention mechanism, guides the network to discard redundant information and focus on feature information effective for restoring a clear image; meanwhile, the depthwise separable convolution adopted in the residual mixed convolution attention module improves its training and running efficiency; after this module, the features are finally reconstructed into a clear haze-free image;
s5, calculating the mean square error and perceptual loss between the restored image and the corresponding clear image, and updating the image defogging model; the mean square error measures the deviation between the restored image and the corresponding clear image, while the perceptual loss helps the model perceive the image at a higher level, making the restored image more realistic; the two loss functions, mean square error and perceptual loss, cooperate to jointly optimize the defogging model.
Preferably, step S5 specifically includes:
calculating a mean square error and a perceptual loss for the restored image and the corresponding clear image, wherein the first loss function is the mean square error loss, with the formula:
L_mse = (1/(3WH)) Σ_{c=1..3} Σ_{i=1..W} Σ_{j=1..H} (I_re(i, j, c) − I_gt(i, j, c))²,
wherein W and H are the width and height of the image, I_re and I_gt are the restored image and the corresponding clear image, i and j index pixel positions in the image, and c is the RGB channel of the image, ranging from 1 to 3;
the second is a perceptual loss function, which uses a VGG16 pre-trained on the ImageNet dataset (VGG16 has 13 convolutional layers, divided into 5 stages) to extract features from the last convolutional layer of each of the first three stages of the network and compute their differences, with the formula:
L_per = Σ_{k=1..3} (1/(C_k W_k H_k)) ‖φ_k(I_re) − φ_k(I_gt)‖²,
wherein {φ_k(·), k = 1, 2, 3} denote the feature extractors corresponding to the VGG16 layers Conv1-2, Conv2-2, and Conv3-3, and C_k, W_k, and H_k are the channel, width, and height dimensions of the output of φ_k(·);
The total defogging model loss function is:
L = L_mse + α·L_per,
where α is a parameter that balances the two loss functions.
Preferably, step S2 specifically includes: the method comprises the specific processes that a hazy picture is obtained, and the characteristic extractor is different from the characteristic extractor of other methods in that the characteristic extractor does not need training in advance and is lightweight;
feature extraction starts with a 3 x 3 convolution layer that converts a given input foggy image into 16 feature maps;
then, the feature maps are processed through the following four stages to obtain features of different layers; each stage comprises four layers, the first layer being a 3 x 3 convolution with a step size of 2, which is used to reduce the resolution of the feature map to 1/2 and double the width; the second layer and the third layer respectively comprise 3×3 convolutions, a ReLU activation function and 3×3 convolutions; the fourth layer is a 1 x 1 convolution, which reduces the width of the features produced by the third layer to 64 as an output for each stage.
Preferably, in step S3, the multi-level feature fusion module has three feature fusion blocks arranged from top to bottom; the first fusion block fuses the high-level features (the feature map output by the fourth convolution-activation function-convolution combination) with lower-level features (the feature map output by the third convolution-activation function-convolution combination); the fused features are regarded as the new high-level features, which the second fusion block then fuses with the mid-level features in the feature map output by the second convolution-activation function-convolution combination. Finally, the features obtained by the second fusion block are taken as high-level features and fused by the third fusion block with the low-level features in the feature map output by the first convolution-activation function-convolution combination.
For each fusion block, given high-level and low-level features, element-wise multiplication is used to fuse them. The fused features pass through a convolution layer, batch normalization, and a ReLU activation function, and are then processed by the next fusion block.
Preferably, step S4 specifically includes: the residual hybrid convolution attention module has three consecutive grouped convolution layers followed by an attention layer. The given features are processed by these layers and added to the residual input to obtain the output features. Grouped convolution partitions the input features into groups along the channel dimension (the number of groups is a hyperparameter) and applies a convolution to each group separately. Because of this grouping, the floating-point operation count (FLOPs) of the residual hybrid convolution attention module is greatly reduced, improving the training and defogging efficiency of the network. The group numbers of the three grouped convolution layers are 4, 8, and 16 respectively, i.e., the input feature map is divided into 4, 8, and 16 groups by channel for processing. This configuration was determined experimentally.
After the three grouped convolutions, an attention layer is added; it makes the output features reflect the feature information of the clear image that is important within the input foggy image, so that the network focuses on the clear, fog-free image information to be adopted. The attention mechanism is realized in two steps: the first step applies a depthwise convolution, then a ReLU activation function, then a pointwise convolution, and then a Sigmoid activation function to obtain feature weights; the second step applies the obtained weight map, which has the same size as the input, to the input features by element-wise multiplication to output the final features. The weight map obtained from the attention layer guides the network to discard redundant information (haze feature information) and focus on the feature information of the clear fog-free image, while the depthwise separable convolution (combining the depthwise and pointwise convolutions) improves the training and running efficiency of the residual hybrid convolution attention module.
In actual application, a foggy image is first input into the feature extraction module, and the convolution layer-activation function-convolution layer combinations at the module's four stages effectively extract features of four different levels from the image;
secondly, the four extracted features are input into the multi-level feature fusion module, which multiplies features of different levels element by element, using the complementarity of low-level and high-level features to help the network better recover a clear image;
then, the features produced by the multi-level feature fusion module are processed by the residual mixed convolution attention module to obtain a weight map of the same size as the input. The weight map derived from the attention layer directs the network to discard redundant features and focus attention on the more important ones. The depthwise and pointwise convolution operations employed improve the efficiency of this module. After this module the features are finally reconstructed into a clear fog-free image;
finally, the mean square error and perceptual loss between the restored image and the corresponding clear image are calculated and the image defogging model is updated; the mean square error measures the deviation between the restored image and the corresponding clear image, while the perceptual loss helps the model perceive the image at a higher level and restore a more realistic clear image. The two loss functions cooperate to jointly optimize the defogging model.
The invention has the following beneficial effects:
1. compared with the prior art, the invention provides a multi-level feature fusion module which can adaptively adopt features of different levels and effectively recover clear images from foggy images by exploiting the complementarity between the features;
2. compared with the prior art, the invention develops a residual mixed convolution attention module with an attention layer. The mixed convolution operation improves the efficiency of network operation, and the attention block concentrates the model on more important information;
3. the invention also provides a method for cooperatively guiding the defogging model to achieve defogging performance by using the mean square error loss and the perception loss function. The mean square error measures the deviation between the restored image and the corresponding sharp image, while the perceived loss helps the model to perceive the image from a higher dimension, restoring a more realistic sharp image.
The foregoing embodiments serve only to further illustrate the present invention and are not to be construed as limiting its scope; insubstantial modifications and adaptations made by those skilled in the art in light of the foregoing teachings remain within the scope of the invention.
Claims (3)
1. A neural network image defogging method with multi-level feature fusion and attention guidance is characterized in that: the method comprises the following steps:
s1, constructing an image defogging model; the image defogging model comprises a feature extraction module, a multi-level feature fusion module and a residual mixed convolution attention module;
s2, acquiring foggy image data, first converting the foggy image into 16 feature maps through a convolution layer; the feature maps are then processed through the four stages of the feature extraction module to obtain features of different levels;
s3, the multi-level feature fusion module fuses the feature maps obtained at different stages by point-wise element multiplication, using the complementarity of low-level and high-level features to guide the network toward better recovery of a clear image;
s4, the features produced by the multi-level feature fusion module pass through the residual mixed convolution attention module to obtain a weight map of the same size as the input; the weight map, produced by an attention layer designed on the attention mechanism, guides the network to discard redundant information and focus on feature information effective for restoring a clear image; meanwhile, the depthwise separable convolution adopted in the residual mixed convolution attention module improves its training and running efficiency; after this module, the features are finally reconstructed into a clear haze-free image;
s5, calculating the mean square error and perceptual loss between the restored image and the corresponding clear image, and updating the image defogging model; the mean square error measures the deviation between the restored image and the corresponding clear image, while the perceptual loss helps the model perceive the image at a higher level, making the restored image more realistic; the two loss functions, mean square error and perceptual loss, cooperate to jointly optimize the defogging model;
Step S3: the multi-level feature fusion module is provided with three feature fusion modules arranged from top to bottom.
The first feature fusion module fuses the high-level features with the low-level features, and the fused result is regarded as the new high-level features; the second feature fusion module then fuses this result with the mid-level features from the output feature map of the third convolution-activation-convolution combination; finally, the features obtained by the second feature fusion module are regarded as high-level features and are fused, through the third feature fusion module, with the low-level features from the output feature map of the first convolution-activation-convolution combination.
For each feature fusion module, given the high-level and low-level features, element-by-element multiplication realizes the fusion between the features; the fused features are then passed through a convolution layer, batch normalization and a ReLU activation function before being handed to the next fusion module.
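The per-module fusion step can be sketched in NumPy; the shapes, random weights, and the 1×1-convolution-as-matrix-multiply simplification below are illustrative assumptions, not the patented implementation:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical shapes: 64 channels, 8x8 spatial resolution.
C, H, W = 64, 8, 8
high = rng.standard_normal((C, H, W))   # high-level features
low = rng.standard_normal((C, H, W))    # low-level features (same size assumed)

# Step 1: fuse by element-by-element multiplication.
fused = high * low

# Step 2: a 1x1 convolution (modelled as a channel-mixing matrix multiply),
# followed by batch normalization and ReLU, as the claim describes.
weight = rng.standard_normal((C, C)) * 0.1
conv = np.einsum('oc,chw->ohw', weight, fused)
mean = conv.mean(axis=(1, 2), keepdims=True)
std = conv.std(axis=(1, 2), keepdims=True) + 1e-5
bn = (conv - mean) / std
out = np.maximum(bn, 0.0)               # ReLU

print(out.shape)         # (64, 8, 8): fusion preserves the feature-map size
print(bool((out >= 0).all()))  # True: ReLU output is non-negative
```

The fused result `out` would then serve as the "high-level" input to the next fusion module in the top-to-bottom cascade.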
Step S4 specifically comprises:
The residual mixed convolution attention module consists of three consecutive grouped convolution layers followed by an attention layer; the given features are processed by these layers and then added to the residual (the module input) to obtain the output features. Grouped convolution divides the input features into groups along the channel dimension and applies a convolution operation to each group separately; because of this grouping, the FLOPs of the residual mixed convolution attention module are greatly reduced, which improves the training and defogging efficiency of the network. The group numbers of the three grouped convolution layers are 4, 8 and 16 respectively, i.e. the input feature maps are divided into 4, 8 and 16 groups along the channel dimension for processing.
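The FLOP reduction from grouping can be checked with simple multiply-accumulate arithmetic; the feature-map sizes below are hypothetical:

```python
# MACs of a KxK convolution mapping c_in -> c_out channels on an HxW map,
# with g groups: each output channel only sees c_in/g input channels.
def conv_flops(c_in, c_out, k, h, w, groups=1):
    return (c_in // groups) * k * k * c_out * h * w

C, H, W, K = 64, 32, 32, 3
base = conv_flops(C, C, K, H, W, groups=1)
for g in (4, 8, 16):  # group numbers of the three layers in the claim
    # grouping by g reduces the cost by exactly a factor of g
    print(g, base // conv_flops(C, C, K, H, W, groups=g))
```

This prints reduction factors of 4, 8 and 16 for the three layers, matching the claim that grouping greatly reduces the module's FLOPs.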
After the three grouped convolutions, an attention layer is applied. The attention layer makes the output features reflect the important feature information of the clear image contained in the input foggy image, so that the network focuses on the clear, fog-free image information. The attention mechanism is realized in two steps. In the first step, a depthwise convolution, a ReLU activation function, a pointwise convolution and a Sigmoid activation function are applied in sequence to obtain the feature weights. In the second step, the original input features are multiplied by the obtained weights to form a weight map of the same size as the input, and this weight map is applied to the input features by element-wise multiplication to output the final features. The weight map obtained from the attention layer guides the network to discard redundant information and to focus on the feature information of the clear, fog-free image, while the depthwise separable convolution operation improves the training and inference efficiency of the residual mixed convolution attention module.
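A minimal NumPy sketch of the two-step attention computation (depthwise conv, ReLU, pointwise conv, Sigmoid, then element-wise weighting); the kernel values and tensor sizes are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(1)
C, H, W = 8, 6, 6
x = rng.standard_normal((C, H, W))  # module input features

def depthwise_conv3x3(x, kernels):
    # One 3x3 kernel per channel, zero padding, stride 1.
    C, H, W = x.shape
    xp = np.pad(x, ((0, 0), (1, 1), (1, 1)))
    out = np.zeros_like(x)
    for c in range(C):
        for i in range(H):
            for j in range(W):
                out[c, i, j] = np.sum(xp[c, i:i+3, j:j+3] * kernels[c])
    return out

def pointwise_conv(x, weight):
    # 1x1 convolution: mix channels at every spatial position.
    return np.einsum('oc,chw->ohw', weight, x)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Step 1: depthwise conv -> ReLU -> pointwise conv -> Sigmoid gives the weights.
dw = rng.standard_normal((C, 3, 3)) * 0.1
pw = rng.standard_normal((C, C)) * 0.1
weights = sigmoid(pointwise_conv(np.maximum(depthwise_conv3x3(x, dw), 0), pw))

# Step 2: apply the weight map to the input by element-wise multiplication.
out = x * weights

print(weights.shape == x.shape)  # True: weight map has the same size as input
```

Because the Sigmoid bounds every weight to (0, 1), the multiplication can only attenuate features, which is how redundant information is suppressed.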
2. The neural network image defogging method based on multi-level feature fusion and attention guidance according to claim 1, wherein step S5 specifically comprises:
calculating the mean square error and the perceptual loss between the restored image and the corresponding clear image, wherein the first loss function is the mean square error loss function, with the formula:

L_mse = (1 / (3 · W · H)) · Σ_{c=1}^{3} Σ_{i=1}^{W} Σ_{j=1}^{H} ( I_re(i, j, c) − I_gt(i, j, c) )²

wherein W and H represent the width and height of the image, I_re and I_gt are the restored image and the corresponding clear image, i and j index the pixel position in the image, and c indexes the RGB channel of the image, ranging from 1 to 3;
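The mean square error term can be sketched directly in NumPy; the image sizes and values below are arbitrary test data:

```python
import numpy as np

rng = np.random.default_rng(2)
W_, H_ = 4, 4
I_re = rng.random((W_, H_, 3))  # restored image, RGB values in [0, 1]
I_gt = rng.random((W_, H_, 3))  # corresponding clear (ground-truth) image

# Mean square error averaged over all pixels and the three RGB channels.
L_mse = np.mean((I_re - I_gt) ** 2)

print(np.mean((I_gt - I_gt) ** 2) == 0.0)  # True: zero for identical images
print(bool(L_mse > 0))                     # True: positive for differing images
```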
the second is the perceptual loss function, which uses the last convolutional layer of each stage of a VGG16 network pre-trained on the ImageNet dataset to extract features and calculate their differences, with the formula:

L_per = Σ_{k=1}^{3} (1 / (C_k · W_k · H_k)) · ‖ φ_k(I_re) − φ_k(I_gt) ‖²

wherein {φ_k(·), k = 1, 2, 3} represent the feature extractors corresponding to the selected VGG16 convolutional layers, and C_k, W_k and H_k are the channel number, width and height of the feature map produced by φ_k(·);
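A sketch of the perceptual-loss computation follows; since loading a pretrained VGG16 is out of scope here, random linear maps stand in for the feature extractors φ_k (a purely illustrative assumption):

```python
import numpy as np

rng = np.random.default_rng(3)

# Hypothetical output shapes (C_k, W_k, H_k) for the three extractors.
shapes = [(64, 8, 8), (128, 4, 4), (256, 2, 2)]

def fake_phi(img, shape, seed):
    # Deterministic random projection standing in for a VGG16 stage.
    r = np.random.default_rng(seed)
    proj = r.standard_normal((int(np.prod(shape)), img.size)) * 0.01
    return (proj @ img.ravel()).reshape(shape)

def perceptual_loss(I_re, I_gt):
    loss = 0.0
    for k, shape in enumerate(shapes):
        f_re = fake_phi(I_re, shape, k)
        f_gt = fake_phi(I_gt, shape, k)
        C_k, W_k, H_k = shape
        # Normalized squared difference of the k-th feature maps.
        loss += np.sum((f_re - f_gt) ** 2) / (C_k * W_k * H_k)
    return loss

I_gt = rng.random((16, 16, 3))
I_re = I_gt + 0.1 * rng.standard_normal((16, 16, 3))
print(perceptual_loss(I_gt, I_gt) == 0.0)  # True: identical images, zero loss
print(bool(perceptual_loss(I_re, I_gt) > 0))
```

In the actual method the φ_k would be the pretrained VGG16 stage outputs, so the loss compares images at the feature level rather than pixel by pixel.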
The total defogging model loss function is:
L = L_mse + α · L_per,
where α is a parameter that balances the two loss functions.
3. The neural network image defogging method based on multi-level feature fusion and attention guidance according to claim 2, wherein step S2 specifically comprises:
feature extraction starts with a 3 x 3 convolution layer that converts a given input foggy image into 16 feature maps;
then, the feature maps are processed through the following four stages to obtain features at different levels. Each stage comprises four layers: the first layer is a 3×3 convolution with a stride of 2, which reduces the resolution of the feature maps to 1/2 and doubles their width (channel number); the second and third layers each comprise a 3×3 convolution, a ReLU activation function and another 3×3 convolution; the fourth layer is a 1×1 convolution, which reduces the width of the features produced by the third layer to 64 as the output of each stage.
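A small shape trace, assuming a hypothetical 256×256 input, illustrates how the four stages transform the feature maps:

```python
# Trace the feature-map shapes through the four stages described in step S2:
# the stride-2 conv halves the resolution and doubles the channel width,
# then the 1x1 conv sets the stage output width to 64.
def stage_shapes(c, h, w):
    shapes = []
    for _ in range(4):
        c, h, w = c * 2, h // 2, w // 2  # layer 1: 3x3 conv, stride 2
        # layers 2-3: conv-ReLU-conv combinations keep the shape unchanged
        c = 64                            # layer 4: 1x1 conv, output width 64
        shapes.append((c, h, w))
    return shapes

# A hypothetical 256x256 foggy input, first converted into 16 feature maps.
print(stage_shapes(16, 256, 256))
# [(64, 128, 128), (64, 64, 64), (64, 32, 32), (64, 16, 16)]
```

Each stage thus yields 64-channel features at 1/2, 1/4, 1/8 and 1/16 of the input resolution, which are the different-level features the fusion module later combines.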
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010781155.9A CN111915531B (en) | 2020-08-06 | 2020-08-06 | Neural network image defogging method based on multi-level feature fusion and attention guidance |
Publications (2)
Publication Number | Publication Date |
---|---|
CN111915531A CN111915531A (en) | 2020-11-10 |
CN111915531B true CN111915531B (en) | 2023-09-29 |
Family
ID=73288183
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110097519A (en) * | 2019-04-28 | 2019-08-06 | 暨南大学 | Double supervision image defogging methods, system, medium and equipment based on deep learning |
AU2020100274A4 (en) * | 2020-02-25 | 2020-03-26 | Huang, Shuying DR | A Multi-Scale Feature Fusion Network based on GANs for Haze Removal |
Non-Patent Citations (1)
Title |
---|
A defogging method based on conditional generative adversarial networks; Jia Xuzhong; Wen Zhiqiang; Information & Computer (Theory Edition) (No. 09); full text *
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||