CN116542865A - Multi-scale real-time defogging method and device based on structural re-parameterization - Google Patents

Multi-scale real-time defogging method and device based on structural re-parameterization

Info

Publication number
CN116542865A
CN116542865A (application CN202310223074.0A)
Authority
CN
China
Prior art keywords
defogging
network
image
parameterization
structural
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310223074.0A
Other languages
Chinese (zh)
Inventor
左方
刘家萌
高铭远
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Henan University
Original Assignee
Henan University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Henan University filed Critical Henan University
Priority to CN202310223074.0A priority Critical patent/CN116542865A/en
Publication of CN116542865A publication Critical patent/CN116542865A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T5/00 Image enhancement or restoration
    • G06T5/73 Deblurring; Sharpening
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77 Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/774 Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77 Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/776 Validation; Performance evaluation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/20 Special algorithmic details
    • G06T2207/20081 Training; Learning
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/20 Special algorithmic details
    • G06T2207/20084 Artificial neural networks [ANN]
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02A TECHNOLOGIES FOR ADAPTATION TO CLIMATE CHANGE
    • Y02A90/00 Technologies having an indirect contribution to adaptation to climate change
    • Y02A90/10 Information and communication technologies [ICT] supporting adaptation to climate change, e.g. for weather forecasting or climate simulation

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • General Physics & Mathematics (AREA)
  • Physics & Mathematics (AREA)
  • Databases & Information Systems (AREA)
  • Computing Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • Medical Informatics (AREA)
  • Software Systems (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Artificial Intelligence (AREA)
  • Health & Medical Sciences (AREA)
  • Multimedia (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a multi-scale real-time defogging method and device based on structural re-parameterization, wherein the method comprises the following steps: constructing a structural re-parameterization module; constructing a multi-scale image defogging network based on structural re-parameterization; defogging the haze image by adopting a K-estimation image reconstruction module; defining a composite loss function of the multi-scale image defogging network based on structural re-parameterization; initializing the multi-scale image defogging network; preparing a data set; training the multi-scale image defogging network with the prepared data set; and defogging haze images with the trained multi-scale image defogging network while measuring the quality and efficiency of the defogged images. According to the invention, the K-estimation image reconstruction module is added to the multi-scale network to defog the haze image, so that the physical characteristics of haze weather contained in the image can be better learned and a higher-quality defogged picture can be recovered.

Description

Multi-scale real-time defogging method and device based on structural re-parameterization
Technical Field
The invention relates to the technical field of single-image defogging, in particular to a multi-scale real-time defogging method and device based on structural re-parameterization.
Background
Modern industrial development has made the haze phenomenon frequent; besides affecting daily life, it greatly impacts scientific research in the field of computer vision. Haze is a common atmospheric phenomenon produced by small floating particles such as dust and smoke, which strongly absorb and scatter light, reducing visibility, lowering the contrast of photographed images, blurring image quality, distorting pixels and severely affecting visible-light optical systems. Under the influence of haze, practical applications requiring high-quality clear pictures, such as remote sensing, navigation, automatic driving and video monitoring, are easily compromised, and high-level computer vision tasks such as detection and recognition become difficult to complete. Image defogging has therefore become an increasingly important low-level vision task with significant research value, and it remains a challenging subject. Traditional defogging algorithms fall into two main categories: those based on image enhancement and those based on image restoration. Image-enhancement-based defogging algorithms, such as wavelet transformation and homomorphic filtering, start from removing image noise as much as possible and improving image contrast in order to recover a clear defogged image. Image-restoration-based defogging algorithms, such as the dark channel defogging algorithm and Bayesian defogging algorithms, perform defogging on the basis of an atmospheric degradation model. The defogging effect of methods based on the atmospheric degradation model is generally better than that of image-enhancement-based algorithms.
In recent years, convolutional neural networks (Convolutional Neural Network, CNN) have developed rapidly in computer vision image processing, and many traditional computer vision algorithms have been replaced by Deep Learning (DL), so modern defogging techniques using CNNs keep emerging. These deep learning defogging techniques can be divided into two categories. The first still feeds the hazy image into an Atmospheric Scattering Model (ASM), using a neural network to estimate the global atmospheric light value and transmittance of the atmospheric degradation model, from which the clear defogged image is calculated. The second uses an end-to-end deep learning approach: the haze image is input into a convolutional neural network, which directly predicts and outputs the defogged image.
Conventional image defogging algorithms rely heavily on prior knowledge, such as the dark channel prior, colour attenuation prior and maximum contrast prior, to restore a sharp image. However, not all real scene images satisfy a predefined prior, so the performance of conventional image defogging algorithms is greatly limited. Recently, deep learning has shown its effectiveness in image defogging, and various convolutional-neural-network-based methods have been proposed to estimate the atmospheric degradation model. The degradation model can be expressed as I(x) = J(x)t(x) + A(1 - t(x)), where A is the global atmospheric light value and t(x) is the transmission matrix. Most current deep learning methods use a multi-branch network to estimate the transmission matrix t(x) and the global atmospheric light value A separately and then calculate the defogged image through the atmospheric degradation model. A multi-branch network can combine the low-level and high-level features of a convolutional neural network, preserving the detail information of the image while capturing more semantic information. However, a multi-branch network brings an excessive number of parameters and a large computation cost, which increases the time complexity of the defogging algorithm and prevents its application in scenarios with high real-time requirements, such as automatic navigation and real-time monitoring. Conversely, in pursuit of defogging speed, algorithms such as AOD-Net and Light-DehazeNet adopt a lightweight single-branch network for training and inference; however, the lightweight single-branch network has lower capacity, so the defogging effect is poor.
Disclosure of Invention
Aiming at the problem that existing image defogging methods do not strike a good balance between defogging speed and defogging quality, the invention provides a multi-scale real-time defogging method and device based on structural re-parameterization. Image defogging quality is improved through a structural re-parameterization module and a multi-branch network structure; during inference the structural re-parameterization module is converted into an ordinary convolution module, which reduces the number of parameters of the network model and thereby improves the inference speed of the defogging model while improving image defogging quality.
In order to achieve the above purpose, the present invention adopts the following technical scheme:
the invention provides a multi-scale real-time defogging method based on structural re-parameterization, which comprises the following steps:
step 1: constructing a structural re-parameterization module;
step 2: constructing a multi-scale image defogging network based on structural re-parameterization; the multi-scale image defogging network comprises a structural re-parameterization module and a K-estimation image reconstruction module;
step 3: defogging the haze image by adopting a K-estimation image reconstruction module;
step 4: defining a composite loss function of the multi-scale image defogging network based on structural re-parameterization;
step 5: initializing a multi-scale image defogging network;
step 6: preparing a data set;
step 7: training a multi-scale image defogging network by using the prepared data set;
step 8: defogging haze images with the trained multi-scale image defogging network, and measuring the quality and efficiency of the defogged images.
Further, the structural re-parameterization module has different structures during network training and during inference; during training it has a plurality of different branches, which are converted into a single-branch structure through an identity transformation for inference, and the converted single-branch structure is used to perform equivalent inference.
Further, during training the structural re-parameterization module comprises an identity mapping branch, a 1×1 convolution layer, a 3×3 convolution layer and a 5×5 convolution layer; the identity mapping branch, the 1×1 convolution layer and the 3×3 convolution layer are represented as 5×5 convolution layers through zero padding, and the four branches are merged into a single-branch 5×5 convolution layer through element-wise addition; the converted structural re-parameterization module has only one branch, consisting of a 5×5 convolution layer and a nonlinear ReLU activation layer.
Further, the multi-scale image defogging network comprises three feature extraction modules of different scales and a K-estimation image reconstruction module, wherein each feature extraction module consists of a 3×3 convolution layer and two structural re-parameterization modules.
Further, the K-estimation module combines the global atmospheric light value and the transmission matrix into a single parameter K(x) through a mathematical transformation: K(x) = ((I(x) - A)/t(x) + (A - b)) / (I(x) - 1) and J(x) = K(x)I(x) - K(x) + b, where t(x) denotes the transmission matrix, A the global atmospheric light value, I(x) = J(x)t(x) + A(1 - t(x)) the atmospheric degradation model, and b a constant bias with a default value of 1.
Further, in the step 4, the composite loss function is composed of a mean square error loss function and an edge perception loss function.
Further, in the step 5, the convolution kernel parameters are initialized using a Gaussian distribution.
Further, the step 6 includes:
creating synthetic haze images using the NYU2 depth dataset, and dividing them proportionally into a training set, a validation set and a test set.
Further, the step 7 includes:
setting the initial learning rate, and setting the number of batch images and training rounds by adopting an ADAM optimizer until the network converges.
In another aspect, the present invention provides a multi-scale real-time defogging device based on structural re-parameterization, comprising:
the first network construction unit is used for constructing a structure re-parameterization module;
the second network construction unit is used for constructing a multi-scale image defogging network based on structural re-parameterization; the multi-scale image defogging network comprises a structural re-parameterization module and a K-estimation image reconstruction module;
the third network construction unit is used for defogging the haze image by adopting the K-estimation image reconstruction module;
the loss function construction unit is used for defining a composite loss function of the multi-scale image defogging network based on structural re-parameterization;
the network initialization unit is used for initializing a multi-scale image defogging network;
a data set construction unit for preparing a data set;
a network training unit for training a multi-scale image defogging network using the prepared data set;
and the defogging unit is used for defogging the haze image by using the trained multi-scale image defogging network and detecting the quality and efficiency of the defogged image.
Further, the structural re-parameterization module has different structures during network training and during inference; during training it has a plurality of different branches, which are converted into a single-branch structure through an identity transformation for inference, and the converted single-branch structure is used to perform equivalent inference.
Further, during training the structural re-parameterization module comprises an identity mapping branch, a 1×1 convolution layer, a 3×3 convolution layer and a 5×5 convolution layer; the identity mapping branch, the 1×1 convolution layer and the 3×3 convolution layer are represented as 5×5 convolution layers through zero padding, and the four branches are merged into a single-branch 5×5 convolution layer through element-wise addition; the converted structural re-parameterization module has only one branch, consisting of a 5×5 convolution layer and a nonlinear ReLU activation layer.
Further, the multi-scale image defogging network comprises three feature extraction modules of different scales and a K-estimation image reconstruction module, wherein each feature extraction module consists of a 3×3 convolution layer and two structural re-parameterization modules.
Further, the K-estimation module combines the global atmospheric light value and the transmission matrix into a single parameter K(x) through a mathematical transformation: K(x) = ((I(x) - A)/t(x) + (A - b)) / (I(x) - 1) and J(x) = K(x)I(x) - K(x) + b, where t(x) denotes the transmission matrix, A the global atmospheric light value, I(x) = J(x)t(x) + A(1 - t(x)) the atmospheric degradation model, and b a constant bias with a default value of 1.
Further, in the loss function construction unit, the composite loss function is composed of a mean square error loss function and an edge-aware loss function.
Further, in the network initialization unit, the gaussian distribution initialization is used for initializing parameters of the convolution kernel.
Further, the data set construction unit includes:
creating synthetic haze images using the NYU2 depth dataset, and dividing them proportionally into a training set, a validation set and a test set.
Further, the network training unit includes:
setting the initial learning rate, and setting the number of batch images and training rounds by adopting an ADAM optimizer until the network converges.
Compared with the prior art, the invention has the beneficial effects that:
(1) The invention provides a multi-scale real-time defogging method and device based on structural re-parameterization for the single-image defogging field. The structural re-parameterization module equivalently converts the multi-branch structure used in training into a single-branch structure used in inference: the multi-branch structure is used during training to improve the fitting capacity of the network, while the single-branch structure is used during inference to reduce the computation cost of the network.
(2) The K-estimation image reconstruction module is added to the multi-scale network to defog the haze image. The three features of different scales produced by the three feature extraction stages are fed into two up-sampling convolution layers and, after fusion, into the K-estimation module, so as to capture more critical low-level structural information and higher-level semantic information. The K-estimation module is a transformed structure derived from the atmospheric scattering model; through it, the physical characteristics of haze weather contained in the image can be better learned, so that a higher-quality defogged picture can be recovered.
(3) The present invention is trained and tested on the NYU2 depth dataset. Experimental results show that the defogging quality of the model is superior to that of mainstream deep-learning-based defogging algorithms, while the inference speed of the network reaches real-time levels. In addition, the lightweight network model can be conveniently embedded into computer-vision-based systems such as aerial photography, automatic navigation and real-time monitoring.
Drawings
FIG. 1 is a schematic flow chart of a multi-scale real-time defogging method based on structural re-parameterization according to an embodiment of the invention;
FIG. 2 is a schematic diagram of a structural re-parameterized module according to an embodiment of the present invention;
FIG. 3 is a schematic diagram of a multi-scale image defogging network architecture according to an embodiment of the present invention;
FIG. 4 is a graph showing the comparison of defogging effects according to an embodiment of the present invention;
FIG. 5 is a diagram showing an example of defogging effect according to an embodiment of the present invention;
fig. 6 is a schematic structural diagram of a multi-scale real-time defogging device based on structural re-parameterization according to an embodiment of the present invention.
Detailed Description
The invention is further illustrated by the following description of specific embodiments in conjunction with the accompanying drawings:
as shown in fig. 1, a multi-scale real-time defogging method based on structural re-parameterization includes:
step one, a structural re-parameterization module is constructed. The structure re-parameterization module has a different structure at the time of network training and at the time of reasoning, and has a plurality of different branches at the time of training, including an identity mapping branch, a 1 x 1 convolution layer, a 3 x 3 convolution layer, and a 5 x 5 convolution layer. During reasoning, we convert the multi-branch structure during training into a single-branch structure during reasoning through an identity transformation. Specifically, the identity map branches, the 1×1 convolutional layer, and the 3×3 convolutional layer can be represented as a 5×5 convolutional layer by zero padding, and we convert the identity of the four branches into a single-branch 5×5 convolutional layer by performing element addition on the four branches. The converted structure re-parameterization module structure has only one branch consisting of a 5×5 convolution layer and a nonlinear activation function ReLU layer, and the equivalent reasoning is carried out by using the converted single-branch structure in the reasoning process, and the structure re-parameterization module is shown in fig. 2.
And step two, constructing a multi-scale image defogging network. The network contains three feature extraction stages in a pyramid structure to extract multi-scale features. The first feature extraction stage consists of a 3×3 convolution layer and two structural re-parameterization modules, and increases the number of feature map channels to 32; the next two stages have the same composition but increase the feature map depth to 64 and 128, respectively, while halving the feature resolution. The three features of different scales generated by the three feature extraction stages are fed into the channel attention module to capture more critical low-level structural information and high-level semantic information. The multi-scale image defogging network is shown in fig. 3.
Step three, a K-estimation image reconstruction module is constructed to defog the haze image. The K-estimation module is an equivalent transformation of the atmospheric degradation model I(x) = J(x)t(x) + A(1 - t(x)), which would otherwise require estimating the global atmospheric light value A and the transmission matrix t(x) separately. The K-estimation module instead combines the global atmospheric light value and the transmission matrix into a single parameter K through a mathematical transformation. Specifically, J(x) = K(x)I(x) - K(x) + b, where K(x) = ((I(x) - A)/t(x) + (A - b)) / (I(x) - 1). Through this formula, 1/t(x) and A are both integrated into the new variable K(x); b is a constant bias with a default value of 1. Since K(x) depends on I(x), an input-adaptive deep model can be constructed and trained by minimizing the error between its output J(x) and the haze-free image.
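The transformation can be checked numerically. In this sketch (`k_estimation` is an illustrative name, not from the patent), a hazy image is synthesised from a known clear image via the degradation model, K(x) is formed from A and t(x), and the clear image is recovered exactly via J(x) = K(x)I(x) - K(x) + b with b = 1:

```python
import numpy as np

def k_estimation(I, A, t, b=1.0):
    """Fold the atmospheric light A and transmission t(x) into K(x)."""
    return ((I - A) / t + (A - b)) / (I - 1.0)

rng = np.random.default_rng(1)
J = rng.uniform(0.1, 0.9, (8, 8))      # clear image
t = rng.uniform(0.3, 0.9, (8, 8))      # transmission matrix
A = 0.8                                # global atmospheric light value
I = J * t + A * (1.0 - t)              # atmospheric degradation model

K = k_estimation(I, A, t)
J_rec = K * I - K + 1.0                # J(x) = K(x)I(x) - K(x) + b, b = 1
assert np.allclose(J_rec, J)
```

In the network itself K(x) is predicted directly from the input I(x) rather than computed from A and t(x); the check above only confirms the algebraic equivalence.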
And step four, defining a composite loss function of the multi-scale image defogging network based on structural re-parameterization. The composite loss function is formed by a mean square error (MSE) loss function and an edge-aware loss function. During the training phase, the composite loss function L_total is defined as the weighted combination of these two loss functions:
L_total = λ1·L1 + λ2·L2
where λ1 and λ2 are the weights of the two loss functions, L1 is the mean square error loss function, and L2 is the edge-aware loss function.
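The patent does not spell out the form of the edge-aware term, so the sketch below uses simple finite-difference image gradients as an assumed stand-in; the function names and the default λ values are illustrative:

```python
import numpy as np

def mse_loss(pred, target):
    """L1 in the text: mean square error between images."""
    return np.mean((pred - target) ** 2)

def edge_loss(pred, target):
    """Assumed L2: MSE between horizontal and vertical image gradients."""
    dx = lambda im: im[:, 1:] - im[:, :-1]
    dy = lambda im: im[1:, :] - im[:-1, :]
    return mse_loss(dx(pred), dx(target)) + mse_loss(dy(pred), dy(target))

def total_loss(pred, target, lam1=1.0, lam2=0.1):
    """L_total = lam1 * L1 + lam2 * L2 (lam1, lam2 are assumed values)."""
    return lam1 * mse_loss(pred, target) + lam2 * edge_loss(pred, target)
```

For identical images every term vanishes, so `total_loss` is zero; the gradient term penalises blurred edges even when the pixel-wise MSE is small.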
And fifthly, initializing the multi-scale image defogging network. Specifically, before training, the convolution kernel weights are initialized using a Gaussian distribution.
Step six, preparing a data set. Specifically, this embodiment uses the NYU2 depth dataset to create synthetic haze images: 27256 synthetic haze images of different haze thicknesses, together with the corresponding clear images, generated from 1450 indoor scene images. As one implementation, the training set, validation set and test set are partitioned in a ratio of 8:1:1.
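The 8:1:1 partition can be sketched as follows; the shuffling and rounding policy are assumptions, not specified by the patent:

```python
import numpy as np

def split_811(n, seed=0):
    """Shuffle n sample indices and split them train/val/test = 8:1:1."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(n)
    n_train = int(n * 0.8)
    n_val = int(n * 0.1)
    return idx[:n_train], idx[n_train:n_train + n_val], idx[n_train + n_val:]

train, val, test = split_811(27256)   # dataset size from the embodiment
```

Rounding leaves the leftover samples in the test split, so the three parts are disjoint and cover the whole dataset.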
And step seven, training the multi-scale image defogging network with the prepared data set. Specifically, the initial learning rate is set to 0.0001, training is optimized with the Adam optimizer, the batch size is set to 16 and the number of training epochs to 100, until the network converges.
And step eight, defogging haze images with the trained image defogging network and measuring the quality and efficiency of the defogged images. Specifically, the test set is fed into the trained defogging network model, which reconstructs defogged versions of the haze images. Each defogged picture is compared with the corresponding clear image; structural similarity (SSIM) and peak signal-to-noise ratio (PSNR) are adopted as objective metrics to evaluate the merits of the defogging algorithm, and the defogging time is measured to demonstrate that the defogging efficiency reaches real-time levels.
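PSNR, one of the two metrics, can be computed as in this sketch (SSIM involves local statistics and is omitted here; `max_val=1.0` assumes images scaled to [0, 1]):

```python
import numpy as np

def psnr(pred, target, max_val=1.0):
    """Peak signal-to-noise ratio in dB; higher means closer to the target."""
    mse = np.mean((pred - target) ** 2)
    if mse == 0.0:
        return float("inf")
    return 10.0 * np.log10(max_val ** 2 / mse)
```

For example, a uniform error of 0.1 per pixel gives an MSE of 0.01 and hence a PSNR of 20 dB.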
The specific parameter configuration of the multi-scale network based on the structural re-parameterization module is shown in table 1.
Table 1 network architecture parameter configuration
The method is compared with the traditional physical-model-based DCP defogging algorithm and the deep-learning-based MSCNN and AOD defogging algorithms, and performs better than all three. On the NYU2 depth dataset, the model's PSNR after defogging exceeds that of the DCP algorithm by 5.0% and its SSIM by 10%; compared with MSCNN, the PSNR of the defogged image is 3.5% higher and the SSIM 7% higher. Compared with AOD-Net, the model's PSNR is 1%-1.7% higher and its SSIM 0.5%-1% higher. Beyond this quantitative comparison, the defogging capability is also assessed qualitatively: the visual results on the NYU2 dataset are compared with the clear pictures in fig. 4, and, to measure generalization, a set of outdoor scenes is also defogged, with visual results shown in fig. 5.
The invention is further compared with the traditional physical-model-based DCP defogging algorithm and related lightweight deep-learning algorithms, such as the AOD defogging algorithm and the DCPDN algorithm. The comparison test was carried out on a workstation with an Nvidia Titan XP graphics card, with the experimental results shown in Table 2:
table 2 run times for four different models
As can be seen from Table 2, the defogging time of the traditional DCP algorithm is 1.62 s; among deep-learning-based algorithms, the high-performance AOD-Net takes 4.5 ms and DCPDN takes 41.7 ms, while the proposed method takes 7.6 ms. The method is thus of the same order of magnitude as AOD-Net and meets the real-time defogging requirement, while its defogging quality is better than that of the AOD-Net defogging algorithm.
On the basis of the above embodiment, as shown in fig. 6, the present invention further provides a multi-scale real-time defogging device based on structural re-parameterization, including:
the first network construction unit is used for constructing a structure re-parameterization module;
the second network construction unit is used for constructing a multi-scale image defogging network based on structural re-parameterization; the multi-scale image defogging network comprises a structural re-parameterization module and a K-estimation image reconstruction module;
the third network construction unit is used for defogging the haze image by adopting the K-estimation image reconstruction module;
the loss function construction unit is used for defining a composite loss function of the multi-scale image defogging network based on structural re-parameterization;
the network initialization unit is used for initializing a multi-scale image defogging network;
a data set construction unit for preparing a data set;
a network training unit for training a multi-scale image defogging network using the prepared data set;
and the defogging unit is used for defogging the haze image by using the trained multi-scale image defogging network and detecting the quality and efficiency of the defogged image.
Further, the structural re-parameterization module has different structures during network training and during inference; during training it has a plurality of different branches, which are converted into a single-branch structure through an identity transformation for inference, and the converted single-branch structure is used to perform equivalent inference.
Further, during training the structural re-parameterization module comprises an identity mapping branch, a 1×1 convolution layer, a 3×3 convolution layer and a 5×5 convolution layer; the identity mapping branch, the 1×1 convolution layer and the 3×3 convolution layer are represented as 5×5 convolution layers through zero padding, and the four branches are merged into a single-branch 5×5 convolution layer through element-wise addition; the converted structural re-parameterization module has only one branch, consisting of a 5×5 convolution layer and a nonlinear ReLU activation layer.
Further, the multi-scale image defogging network comprises three feature extraction modules of different scales and a K-estimation image reconstruction module, wherein each feature extraction module consists of a 3×3 convolution layer and two structural re-parameterization modules.
Further, the K-estimation module combines the global atmospheric light value and the transmission matrix into a single parameter K by mathematical transformation: K(x) = ((I(x) − A)/t(x) + (A − b)) / (I(x) − 1), where t(x) denotes the transmission matrix, A denotes the global atmospheric light value, I(x) = J(x)t(x) + A(1 − t(x)) is the atmospheric degradation model, b is a constant bias with a default value of 1, and the defogged image is recovered as J(x) = K(x)I(x) − K(x) + b.
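The algebra behind the K-estimation module can be verified numerically: substituting K(x) into J(x) = K(x)I(x) − K(x) + b recovers the clear image exactly from a hazy image synthesized with the atmospheric degradation model. A small NumPy sketch (variable names are illustrative; in the actual network K is predicted rather than computed from known t and A):

```python
import numpy as np

def k_from_physics(I, t, A, b=1.0):
    """K(x) = ((I(x) - A) / t(x) + (A - b)) / (I(x) - 1)."""
    return ((I - A) / t + (A - b)) / (I - 1.0)

def recover(I, K, b=1.0):
    """J(x) = K(x) * I(x) - K(x) + b."""
    return K * I - K + b

rng = np.random.default_rng(1)
J = rng.uniform(0.0, 1.0, (4, 4))   # clear scene radiance
t = rng.uniform(0.2, 0.9, (4, 4))   # transmission matrix
A = 0.8                             # global atmospheric light value
I = J * t + A * (1.0 - t)           # atmospheric degradation model

K = k_from_physics(I, t, A)
print(np.allclose(recover(I, K), J))  # True
```

Folding t(x) and A into one estimated quantity is what lets the network restore J(x) in a single step instead of estimating transmission and atmospheric light separately.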
Further, in the loss function construction unit, the composite loss function is composed of a mean square error loss function and an edge perception loss function.
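The patent does not give the exact form of the edge perception term or its weight; a common choice, assumed here purely for illustration, penalizes differences between Sobel gradient maps of the restored and ground-truth images and adds that to the mean squared error:

```python
import numpy as np

SOBEL_X = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype=float)
SOBEL_Y = SOBEL_X.T

def conv_same(img, k):
    """'Same' cross-correlation with zero padding (3x3 kernels)."""
    kk = k.shape[0]
    pad = np.pad(img, kk // 2)
    out = np.empty_like(img, dtype=float)
    for i in range(img.shape[0]):
        for j in range(img.shape[1]):
            out[i, j] = np.sum(pad[i:i + kk, j:j + kk] * k)
    return out

def edge_loss(pred, target):
    """Mean squared difference between Sobel gradient maps (assumed form)."""
    return np.mean((conv_same(pred, SOBEL_X) - conv_same(target, SOBEL_X)) ** 2
                   + (conv_same(pred, SOBEL_Y) - conv_same(target, SOBEL_Y)) ** 2)

def composite_loss(pred, target, w_edge=0.1):
    """MSE plus a weighted edge term; w_edge is an illustrative weight."""
    mse = np.mean((pred - target) ** 2)
    return mse + w_edge * edge_loss(pred, target)

x = np.linspace(0, 1, 36).reshape(6, 6)
print(composite_loss(x, x))             # 0.0 for identical images
print(composite_loss(x + 0.05, x) > 0)  # any perturbation increases the loss
```

The edge term encourages the network to preserve structural boundaries that pure MSE tends to blur.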
Further, in the network initialization unit, Gaussian distribution initialization is used to initialize the parameters of the convolution kernels.
Further, the data set construction unit includes:
synthetic haze images are created using the NYU2 depth dataset, and the dataset is divided into a training set, a validation set and a test set in proportion.
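The exact split proportions are not stated in the text; the sketch below assumes an 8:1:1 ratio for illustration and performs a seeded shuffle-and-split over a list of sample identifiers:

```python
import random

def split_dataset(items, ratios=(0.8, 0.1, 0.1), seed=0):
    """Shuffle and split items into train/val/test by the given proportions.

    The 8:1:1 ratio here is an assumption, not taken from the patent.
    """
    assert abs(sum(ratios) - 1.0) < 1e-9
    items = list(items)
    random.Random(seed).shuffle(items)
    n = len(items)
    n_train = int(n * ratios[0])
    n_val = int(n * ratios[1])
    return items[:n_train], items[n_train:n_train + n_val], items[n_train + n_val:]

train, val, test = split_dataset(range(1000))
print(len(train), len(val), len(test))  # 800 100 100
```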
Further, the network training unit includes:
setting an initial learning rate, adopting an Adam optimizer, and setting the number of batch images and training epochs, until the network converges.
In summary, the invention provides a multi-scale real-time defogging method and device based on structural re-parameterization for the field of single-image defogging. The structural re-parameterization module equivalently converts the multi-branch structure used in training into a single-branch structure used in inference: the multi-branch structure is used during training to improve the fitting capacity of the network, and the single-branch structure is used during inference to reduce the computational cost of the network.
A K-estimation image reconstruction module is added to the multi-scale network to defog the haze image. The three features of different scales generated in the three feature extraction stages are fed into two up-sampling convolution layers and, after fusion, are input into the K-estimation module so as to capture more key underlying structural information and higher-level semantic information. The K-estimation module is a transformed structure based on the atmospheric scattering model; through it, the physical characteristics of haze weather contained in the image can be better learned, so as to recover defogged pictures of higher quality.
The present invention is trained and tested on the NYU2 depth dataset. Experimental results show that the defogging quality of the model is superior to that of mainstream deep-learning-based defogging algorithms, while the inference speed of the network reaches real-time levels. In addition, the lightweight network model can be conveniently embedded into computer-vision-based systems such as aerial photography, automatic navigation and real-time monitoring.
The foregoing is merely illustrative of the preferred embodiments of this invention, and it will be appreciated by those skilled in the art that changes and modifications may be made without departing from the principles of this invention, and it is intended to cover such modifications and changes as fall within the true scope of the invention.

Claims (10)

1. A multi-scale real-time defogging method based on structural re-parameterization, which is characterized by comprising the following steps:
step 1: constructing a structural re-parameterization module;
step 2: constructing a multi-scale image defogging network based on structural re-parameterization; the multi-scale image defogging network comprises a structural re-parameterization module and a K-estimation image reconstruction module;
step 3: defogging the haze image by adopting the K-estimation image reconstruction module;
step 4: defining a composite loss function of a multi-scale image defogging network based on structural re-parameterization;
step 5: initializing a multi-scale image defogging network;
step 6: preparing a data set;
step 7: training a multi-scale image defogging network by using the prepared data set;
step 8: and defogging the haze image by using a trained multi-scale image defogging network, and detecting the quality and efficiency of the defogged image.
2. The multi-scale real-time defogging method based on structural re-parameterization according to claim 1, wherein the structural re-parameterization module has different structures during network training and during inference; during training it has a plurality of different branches, and during inference the multi-branch training structure is converted into a single-branch structure through an identity transformation, the converted single-branch structure being used to perform equivalent inference.
3. The multi-scale real-time defogging method based on structural re-parameterization according to claim 1, wherein during training the structural re-parameterization module comprises an identity mapping branch, a 1×1 convolution layer, a 3×3 convolution layer and a 5×5 convolution layer; the identity mapping branch, the 1×1 convolution layer and the 3×3 convolution layer are each represented as a 5×5 convolution layer through zero padding, and the four branches are converted into a single-branch 5×5 convolution layer through an element-wise addition operation; the converted structural re-parameterization module has only one branch, consisting of a 5×5 convolution layer and a nonlinear ReLU activation function layer.
4. The multi-scale real-time defogging method based on structural re-parameterization according to claim 1, wherein the multi-scale image defogging network comprises three feature extraction modules of different scales and a K-estimation image reconstruction module, wherein each feature extraction module consists of a 3×3 convolution layer and two structural re-parameterization modules.
5. The method of claim 1, wherein the K-estimation module combines the global atmospheric light value and the transmission matrix into a single parameter K by mathematical transformation: K(x) = ((I(x) − A)/t(x) + (A − b)) / (I(x) − 1), where t(x) denotes the transmission matrix, A denotes the global atmospheric light value, I(x) = J(x)t(x) + A(1 − t(x)) is the atmospheric degradation model, b is a constant bias with a default value of 1, and the defogged image is recovered as J(x) = K(x)I(x) − K(x) + b.
6. The multi-scale real-time defogging method based on structural re-parameterization according to claim 1, wherein in the step 4, the composite loss function is composed of a mean square error loss function and an edge perception loss function.
7. The multi-scale real-time defogging method based on structural re-parameterization according to claim 1, wherein in the step 5, the convolution kernel parameters are initialized using Gaussian distribution initialization.
8. A multi-scale real-time defogging method based on structural re-parameterization according to claim 1, wherein said step 6 comprises:
synthetic haze images are created using the NYU2 depth dataset, and the dataset is divided into a training set, a validation set and a test set in proportion.
9. A multi-scale real-time defogging method based on structural re-parameterization according to claim 1, wherein said step 7 comprises:
setting an initial learning rate, adopting an Adam optimizer, and setting the number of batch images and training epochs, until the network converges.
10. A multi-scale real-time defogging device based on structural re-parameterization, comprising:
the first network construction unit is used for constructing a structure re-parameterization module;
the second network construction unit is used for constructing a multi-scale image defogging network based on structural re-parameterization; the multi-scale image defogging network comprises a structural re-parameterization module and a K-estimation image reconstruction module;
the third network construction unit is used for defogging the haze image by adopting the K-estimation image reconstruction module;
the loss function construction unit is used for defining a composite loss function of the multi-scale image defogging network based on structural re-parameterization;
the network initialization unit is used for initializing a multi-scale image defogging network;
a data set construction unit for preparing a data set;
a network training unit for training a multi-scale image defogging network using the prepared data set;
and the defogging unit is used for defogging the haze image by using the trained multi-scale image defogging network and detecting the quality and efficiency of the defogged image.
CN202310223074.0A 2023-03-09 2023-03-09 Multi-scale real-time defogging method and device based on structural re-parameterization Pending CN116542865A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310223074.0A CN116542865A (en) 2023-03-09 2023-03-09 Multi-scale real-time defogging method and device based on structural re-parameterization

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310223074.0A CN116542865A (en) 2023-03-09 2023-03-09 Multi-scale real-time defogging method and device based on structural re-parameterization

Publications (1)

Publication Number Publication Date
CN116542865A true CN116542865A (en) 2023-08-04

Family

ID=87454911

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310223074.0A Pending CN116542865A (en) 2023-03-09 2023-03-09 Multi-scale real-time defogging method and device based on structural re-parameterization

Country Status (1)

Country Link
CN (1) CN116542865A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117786823A (en) * 2024-02-26 2024-03-29 陕西天润科技股份有限公司 Light weight processing method based on building monomer model
CN117786823B (en) * 2024-02-26 2024-05-03 陕西天润科技股份有限公司 Light weight processing method based on building monomer model

Similar Documents

Publication Publication Date Title
Dudhane et al. RYF-Net: Deep fusion network for single image haze removal
Dudhane et al. C^ 2msnet: A novel approach for single image haze removal
CN112184577B (en) Single image defogging method based on multiscale self-attention generation countermeasure network
CN110570363A (en) Image defogging method based on Cycle-GAN with pyramid pooling and multi-scale discriminator
CN111539247B (en) Hyper-spectrum face recognition method and device, electronic equipment and storage medium thereof
CN104217404A (en) Video image sharpness processing method in fog and haze day and device thereof
CN110148088B (en) Image processing method, image rain removing method, device, terminal and medium
CN112581409B (en) Image defogging method based on end-to-end multiple information distillation network
CN112241939B (en) Multi-scale and non-local-based light rain removal method
CN111539246B (en) Cross-spectrum face recognition method and device, electronic equipment and storage medium thereof
CN112381733B (en) Image recovery-oriented multi-scale neural network structure searching method and network application
CN107590779A (en) A kind of image denoising deblurring method based on image block cluster dictionary training
CN112149526B (en) Lane line detection method and system based on long-distance information fusion
CN115311186B (en) Cross-scale attention confrontation fusion method and terminal for infrared and visible light images
CN116542865A (en) Multi-scale real-time defogging method and device based on structural re-parameterization
CN116757986A (en) Infrared and visible light image fusion method and device
CN116524189A (en) High-resolution remote sensing image semantic segmentation method based on coding and decoding indexing edge characterization
CN111598793A (en) Method and system for defogging image of power transmission line and storage medium
CN113628143A (en) Weighted fusion image defogging method and device based on multi-scale convolution
CN114155165A (en) Image defogging method based on semi-supervision
CN117392065A (en) Cloud edge cooperative solar panel ash covering condition autonomous assessment method
CN116385293A (en) Foggy-day self-adaptive target detection method based on convolutional neural network
CN115631108A (en) RGBD-based image defogging method and related equipment
CN115937048A (en) Illumination controllable defogging method based on non-supervision layer embedding and vision conversion model
CN115393901A (en) Cross-modal pedestrian re-identification method and computer readable storage medium

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination