CN116452450A - Polarized image defogging method based on 3D convolution - Google Patents
Polarized image defogging method based on 3D convolution
- Publication number
- CN116452450A CN116452450A CN202310390770.0A CN202310390770A CN116452450A CN 116452450 A CN116452450 A CN 116452450A CN 202310390770 A CN202310390770 A CN 202310390770A CN 116452450 A CN116452450 A CN 116452450A
- Authority
- CN
- China
- Prior art keywords
- convolution
- layer
- mth
- image
- defogging
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T5/00—Image enhancement or restoration
- G06T5/73—Deblurring; Sharpening
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
- G06N3/0455—Auto-encoder networks; Encoder-decoder networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/0464—Convolutional networks [CNN, ConvNet]
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
- G06N3/084—Backpropagation, e.g. using gradient descent
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T5/00—Image enhancement or restoration
- G06T5/50—Image enhancement or restoration using two or more images, e.g. averaging or subtraction
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/20—Special algorithmic details
- G06T2207/20081—Training; Learning
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/20—Special algorithmic details
- G06T2207/20084—Artificial neural networks [ANN]
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/20—Special algorithmic details
- G06T2207/20212—Image combination
- G06T2207/20221—Image fusion; Image merging
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02A—TECHNOLOGIES FOR ADAPTATION TO CLIMATE CHANGE
- Y02A90/00—Technologies having an indirect contribution to adaptation to climate change
- Y02A90/10—Information and communication technologies [ICT] supporting adaptation to climate change, e.g. for weather forecasting or climate simulation
Abstract
The invention discloses a polarized image defogging method based on 3D convolution, comprising the following steps: 1. acquiring a synthesized polarized image dataset using a polarized image generation pipeline; 2. constructing a 3D-convolution-based deep convolutional neural network for polarized images, taking four polarized images with different polarization angles as input, and training the network to obtain a defogging model; 3. defogging the polarized image to be processed with the trained model to obtain a restored fog-free image. The invention realizes defogging of polarized images based on 3D convolution, effectively improving the defogging effect in complex and changeable scenes and thereby providing clearer images for many high-level vision tasks.
Description
Technical Field
The invention belongs to the fields of computer vision and image processing and analysis, and particularly relates to a defogging method using polarized images based on a 3D convolutional network.
Background
Fog and haze are common atmospheric phenomena. Under such weather conditions, outdoor air contains a large number of tiny suspended particles that refract and scatter atmospheric light; the refracted and scattered light mixes with the light reflected from the target scene, greatly reducing scene visibility. As a result, the contrast of images captured by outdoor acquisition devices drops noticeably, and color distortion and severe loss of detail may even occur. Advanced computer vision tasks, such as object detection and image segmentation, require high-quality images as input. In severe weather with fog or haze, however, the quality of the acquired images degrades, greatly affecting these vision tasks. Image defogging is therefore important.
In recent years, image defogging has received increasing attention from researchers, and many well-performing defogging models have been proposed. Existing frameworks fall broadly into two categories: traditional defogging methods based on handcrafted priors, and defogging methods based on deep learning. Traditional methods rely on priors derived from statistics of clear images and use the atmospheric scattering model to restore a fog-free image. A well-known example is the dark channel prior (DCP), based on the assumption that in a haze-free image almost every pixel has at least one color channel whose value is close to 0. Although traditional methods have made some progress, their assumptions and priors hold only for particular scenes and weather conditions, so their generalization is limited: once the environment changes significantly, the model's defogging ability drops noticeably. Deep-learning-based methods train a defogging model on a large amount of training data and evaluate the trained model on test data. They can be divided into two types: one indirectly recovers the fog-free image by learning the parameters of the atmospheric model with a network, and the other takes the foggy image as input and directly outputs the fog-free image end to end.
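For background, the dark channel prior mentioned above can be sketched in a few lines of NumPy. This is illustrative only, not part of the claimed method; `patch` and `omega` are the conventional DCP parameters, not values from this patent:

```python
import numpy as np

def dark_channel(img, patch=15):
    """Per-pixel minimum over colour channels and a local patch.

    img: H x W x 3 array with values in [0, 1].
    """
    dc = img.min(axis=2)                      # channel-wise minimum
    pad = patch // 2
    padded = np.pad(dc, pad, mode='edge')
    out = np.empty_like(dc)
    for i in range(dc.shape[0]):
        for j in range(dc.shape[1]):
            out[i, j] = padded[i:i + patch, j:j + patch].min()
    return out

def estimate_transmission(hazy, A, omega=0.95, patch=15):
    # DCP transmission estimate: t = 1 - omega * dark_channel(I / A)
    return 1.0 - omega * dark_channel(hazy / A, patch)
```

A real DCP implementation would further refine the transmission map with soft matting or a guided filter before inverting the scattering model.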
However, these learning-based approaches still have drawbacks: 1. most deep-learning methods take a single RGB image as input and train and test with the atmospheric scattering model, yet two key parameters of that model must be estimated simultaneously, leading to ill-posedness and poor generalization; 2. to address the ill-posed problem and improve generalization, more and more multi-image defogging methods have appeared, among which methods using polarized images at different polarization angles can make full use of scene information to good effect; however, most of these methods assume that the transmitted light is not significantly polarized, or require a specific cue such as a sky region or a similar object, which degrades the defogging effect on real-world foggy images.
Disclosure of Invention
To overcome the defects of the prior art, the invention provides a polarized image defogging method based on 3D convolution, so as to improve the quality of images captured in foggy environments and the defogging effect in complex and changeable scenes, thereby providing clearer images that meet the requirements of high-level vision tasks.
In order to solve the technical problems, the invention adopts the following technical scheme:
the invention discloses a polarized image defogging method based on 3D convolution, which is characterized by comprising the following steps:
step 1, acquiring a synthesized polarized image dataset;
step 1.1, acquiring a haze-free image J (z) with a scene depth map d (z) and a semantic segmentation map S;
step 1.2, randomly assigning, within given ranges, the atmospheric scattering factor β, the global atmospheric light A_∞ and the degree of polarization DoP_A of the global atmospheric light, thereby generating a foggy image I(z) at pixel point z using equation (1):
I(z) = T(z) + A(z) = J(z)·t(z) + A_∞·(1 − t(z))   (1)
in equation (1), z denotes the spatial coordinates of the pixel; T(z) and A(z) denote the transmitted light and the atmospheric light at pixel point z; J(z) denotes the haze-free image; t(z) denotes the transmission map at pixel point z, with t(z) = e^(−β·d(z)), where d(z) denotes the scene depth map at pixel point z;
step 1.3, calculating the degree of polarization of the transmitted light T as DoP_T = g(S), where S denotes the semantic segmentation map and g denotes a random mapping function;
step 1.4, calculating the degree of polarization DoP of the foggy image I using equation (2):
I·DoP = T·DoP_T + A·DoP_A   (2)
in equation (2), DoP_A denotes the degree of polarization of the atmospheric light A;
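Equations (1) and (2) together give a simple synthesis routine. The following NumPy sketch is illustrative only (function and argument names are our own); it forms the foggy image by the scattering model and mixes the degrees of polarization weighted by intensity, as equation (2) prescribes:

```python
import numpy as np

def synthesize_foggy(J, d, beta, A_inf, dop_T, dop_A):
    """Synthesize a foggy image and its DoP from a haze-free image.

    J: H x W x 3 haze-free image, d: H x W depth map.
    """
    # equation (1): I = J*t + A_inf*(1 - t), with t = exp(-beta * d)
    t = np.exp(-beta * d)[..., None]
    T = J * t                       # transmitted light
    A = A_inf * (1.0 - t)           # atmospheric light (airlight)
    I = T + A
    # equation (2): I*DoP = T*DoP_T + A*DoP_A, solved for DoP
    dop_I = (T * dop_T + A * dop_A) / np.maximum(I, 1e-8)
    return I, dop_I
```

With β = 0 the transmission is 1 everywhere, so the routine returns the haze-free image and the transmitted-light DoP unchanged, which is a quick sanity check on the implementation.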
step 1.5, calculating the polarized image I_φ at polarizer angle φ using equation (3):
I_φ(z) = (I(z)/2)·(1 + DoP(z)·cos(2(φ − φ_∥)))   (3)
in equation (3), φ_∥ denotes the direction of the polarizer that transmits the component parallel to the plane of incidence;
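The body of equation (3) is not legible in the source text; the sketch below assumes the standard Malus-law form I_φ = (I/2)·(1 + DoP·cos 2(φ − φ_∥)), which is consistent with the surrounding definitions of I, DoP and φ_∥ but is our reconstruction, not a verbatim copy of the patent formula:

```python
import numpy as np

def polarizer_image(I, dop, phi, phi_par):
    """Intensity seen through a linear polarizer at angle phi.

    Assumed Malus-law form: I_phi = (I/2) * (1 + DoP * cos(2*(phi - phi_par))).
    """
    return 0.5 * I * (1.0 + dop * np.cos(2.0 * (phi - phi_par)))
```

A useful consistency property of this form is that two images taken at orthogonal polarizer angles sum back to the total intensity I.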
Step 2, constructing a polarized image defogging model based on 3D convolution based on a U-Net architecture, wherein the method comprises the following steps: a POL-3D encoder, a spatial redundancy reduction module SSR, and a POL decoder;
step 2.1, constructing the POL-3D encoder composed of M 3D convolution layers, where the m-th 3D convolution layer sequentially comprises: a convolution layer, an instance normalization layer and a ReLU activation layer; the kernel size of each convolution layer is a tuple of 3 integers, giving the kernel size along the depth, height and width dimensions;
step 2.2, performing a dimension-raising operation on the polarized images with different polarization angles and then fusing them, thereby obtaining a 4-dimensional feature map whose 4 dimensions are channel number, polarization angle, image height and image width; the 4-dimensional feature map is input into the POL-3D encoder and passes sequentially through the M 3D convolution layers, yielding M feature maps with different channel numbers, polarization angles, heights and widths;
step 2.3, the spatial redundancy reduction module SSR processes the M feature maps to obtain M effective feature maps F_1, F_2, …, F_m, …, F_M;
step 2.4, the POL decoder processes the M effective feature maps F_1, F_2, …, F_m, …, F_M and outputs the final defogging prediction map;
step 3, training a polarized image defogging model based on 3D convolution;
based on the polarized images and the corresponding real fog-free images, the 3D-convolution-based polarized image defogging model is trained with the ADAM optimizer, using the mean absolute error (L1 loss) as the loss function; the loss between the defogging prediction map and the real fog-free image is computed to update the model parameters until the loss function converges, yielding the optimal 3D-convolution-based polarized image defogging model, which is used to defog both synthesized polarized foggy images and real captured polarized foggy images.
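The training objective above (L1 loss minimized with the ADAM optimizer) can be sketched as follows. This toy NumPy version optimizes a bare parameter array rather than network weights, and the class is a minimal single-tensor ADAM, not the patent's training code:

```python
import numpy as np

def l1_loss(pred, target):
    # mean absolute error, the loss used to train the defogging model
    return np.abs(pred - target).mean()

class Adam:
    """Minimal ADAM optimizer for a single parameter array."""
    def __init__(self, lr=1e-3, b1=0.9, b2=0.999, eps=1e-8):
        self.lr, self.b1, self.b2, self.eps = lr, b1, b2, eps
        self.m = self.v = None
        self.t = 0

    def step(self, w, grad):
        if self.m is None:
            self.m = np.zeros_like(w)
            self.v = np.zeros_like(w)
        self.t += 1
        self.m = self.b1 * self.m + (1 - self.b1) * grad
        self.v = self.b2 * self.v + (1 - self.b2) * grad ** 2
        m_hat = self.m / (1 - self.b1 ** self.t)      # bias correction
        v_hat = self.v / (1 - self.b2 ** self.t)
        return w - self.lr * m_hat / (np.sqrt(v_hat) + self.eps)
```

On a toy problem, iterating `w = opt.step(w, np.sign(w - target) / w.size)` (the L1 subgradient) drives the loss toward zero, mirroring how the defogging network's parameters would be updated until convergence.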
The polarized image defogging method based on 3D convolution is also characterized in that the spatial redundancy reduction module SSR in step 2.3 consists of M octave convolution layers and corresponding maximum pooling layers, where each octave convolution layer comprises a preprocessing block, an octave convolution block and a post-processing block, and proceeds as follows:
step 2.3.1, the preprocessing block in the m-th octave convolution layer consists of two branches: one branch consists of a convolution layer followed by an instance normalization layer and decomposes the high-frequency features; the other consists of an average pooling layer, a convolution layer and an instance normalization layer in sequence and decomposes the low-frequency features; the two convolution layers have the same kernel size but different numbers of output channels; m = 1, 2, …, M;
the m-th feature map passes through the two branches of the preprocessing block in the m-th octave convolution layer, which output the m-th high-frequency feature map H_m and the m-th low-frequency feature map L_m, respectively;
Step 2.3.2, the octave convolution block in the mth octave convolution layer consists of four convolution layers, two example normalization layers, an average pooling layer and an up-sampling layer;
the mth high-frequency characteristic diagramThe mth high-frequency to high-frequency characteristic diagram ++is obtained after the treatment of the first convolution layer of the octave convolution blocks in the mth octave convolution layer>At the same time, the mth high-frequency characteristic diagram +.>After being processed by an average pooling layer, the processed data are input into a second convolution layer for processing to obtain an mth high-frequency to low-frequency characteristic diagram +.>
The mth low-frequency characteristic diagramAfter the treatment of the third convolution layer, the mth low-frequency to low-frequency characteristic diagram +.>At the same time, the mth low-frequency characteristic diagram +.>Sequentially inputting into a fourth convolution layer and an up-sampling layer for processing to obtain an mth low-frequency to high-frequency characteristic diagram ++>
Will beAnd->After fusion, input into the first example normalization layer, and output the mth high frequency feature map +.>
Will beAnd->After fusion, the obtained product is input into a second example normalization layer, and an mth low-frequency characteristic diagram is output
Step 2.3.3, the post-processing block in the mth octave convolution layer consists of two convolution layers, an up-sampling layer and an example normalization layer;
the mth high-frequency characteristic diagramAfter passing through a convolution layer of the post-processing block, the mth high-frequency to high-frequency characteristic diagram is obtained>
The mth low-frequency characteristic diagramAfter passing through another convolution layer of the post-processing block, the result is input into an up-sampling layer to obtain the m-th low-frequency to high-frequency characteristic diagram +.>
And->After fusion, the mth feature map ++is obtained after the treatment of an example normalization layer>
Mth feature mapInputting the m-th octave convolution layer to the corresponding maximum pooling for processing to obtain an m-th effective feature map F m The method comprises the steps of carrying out a first treatment on the surface of the Thereby obtaining M effective feature maps F from the feature maps of M different channel numbers, polarization angles, heights and widths after the processing of the space redundancy reduction module SSR 1 ,F 2 ,…,F m ,…,F M 。
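The core idea of the SSR module above, splitting features into a full-resolution high-frequency group and a half-resolution low-frequency group, can be sketched as follows. This NumPy sketch is illustrative only; the convolution, normalization and cross-frequency exchange paths of the real octave block are omitted:

```python
import numpy as np

def avg_pool2(x):
    # halve H and W by 2x2 averaging (x: C x H x W, H and W even)
    C, H, W = x.shape
    return x.reshape(C, H // 2, 2, W // 2, 2).mean(axis=(2, 4))

def upsample2(x):
    # nearest-neighbour upsampling by 2 along H and W
    return x.repeat(2, axis=1).repeat(2, axis=2)

def octave_split(x, alpha=0.5):
    """Split channels into high- and low-frequency groups.

    The low-frequency group is stored at half spatial resolution,
    which is where the parameter/memory saving of octave convolution
    comes from; alpha is the high-frequency channel fraction.
    """
    c_hi = int(x.shape[0] * alpha)
    return x[:c_hi], avg_pool2(x[c_hi:])
```

`upsample2` plays the role of the up-sampling layer that brings low-to-high features back to full resolution before fusion.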
The POL decoder in step 2.4 consists of M deconvolution layers;
when m = 1, 2, …, M−1, each deconvolution layer comprises a 2D convolution layer and an instance normalization layer; when m = M, the deconvolution layer comprises only a 2D convolution layer;
when m = M, the M-th effective feature map F_M is processed with a bilinear interpolation function and input into the 1st deconvolution layer, which outputs the 1st deconvolution feature map F'_1;
when m = M−1, M−2, …, 2, the m-th effective feature map F_m is fused with the (M−m)-th deconvolution feature map F'_{M−m}; the fused feature map is processed with a bilinear interpolation function and input into the (M−m+1)-th deconvolution layer, which outputs the (M−m+1)-th deconvolution feature map F'_{M−m+1};
when m = 1, the M-th deconvolution layer processes the 1st effective feature map to give the M-th deconvolution feature map F'_M, which is the final defogging prediction map.
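The index bookkeeping of the decoder steps above can be sketched with a small helper. This is a hypothetical illustration of the fusion schedule only (the m = 1 case is assumed to follow the same pattern as m = M−1, …, 2):

```python
def decoder_schedule(M):
    """Which inputs feed each of the M deconvolution layers.

    Returns a list of (m, k) pairs: deconv layer i consumes effective
    feature map F_m, fused with decoder output F'_k (k is None for
    the first layer, which takes F_M alone).
    """
    order = [(M, None)]                 # deconv layer 1: F_M only
    for m in range(M - 1, 0, -1):       # deconv layers 2 .. M
        order.append((m, M - m))        # fuse F_m with F'_{M-m}
    return order
```

For M = 5 this yields F_5 alone, then (F_4, F'_1), (F_3, F'_2), (F_2, F'_3), (F_1, F'_4), i.e. the classic U-Net pattern of pairing deep decoder stages with shallow encoder features.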
The electronic device of the invention comprises a memory and a processor, where the memory stores a program that supports the processor in executing any of the above polarized image defogging methods, and the processor is configured to execute the program stored in the memory.
The computer-readable storage medium of the invention stores a computer program which, when run by a processor, performs the steps of any of the above polarized image defogging methods.
Compared with the prior art, the invention has the beneficial effects that:
1. By constructing a 3D-convolution-based polarized image deep neural network that takes polarized images with different polarization angles as input and combines them with a 3D convolution encoder, the invention solves the problem that conventional multi-image defogging networks designed with 2D convolution ignore the correlation between grouped images, thereby improving the quality of the restored fog-free image.
2. The 3D-convolution-based polarized image deep neural network introduces polarization information and uses several polarized images of the same scene at different polarization angles, so richer scene information can be acquired; this solves the poor generalization caused by image features extracted from a single input image being dependent on the training data, thereby improving the generalization ability of the defogging network and allowing it to adapt to complex and changeable environments.
3. The 3D-convolution-based polarized image deep neural network introduces the octave-convolution-based spatial redundancy reduction module SSR, which decomposes the feature maps output by the convolution layers into features of different spatial frequencies; by sharing information between adjacent locations, the spatial resolution of the low-frequency group can be safely reduced, solving the spatial redundancy caused by dense encoder parameters, thereby reducing the number of network parameters and making the network lighter.
4. The 3D-convolution-based polarized image deep neural network aggregates the feature map output by each encoder layer with the feature map produced by the spatial redundancy reduction module to obtain a more refined prediction result, thereby improving the defogging effect.
Drawings
FIG. 1 is a flow chart of defogging using polarized images based on a 3D convolutional defogging network in accordance with the present invention;
FIG. 2 is a schematic diagram of a polarized image defogging depth neural network based on 3D convolution;
FIG. 3 is a graph of defogging results of the method of the present invention and other defogging methods on a synthetic dataset;
fig. 4 is a graph of defogging results on a real world dataset by the present invention and other defogging methods.
Detailed Description
In this embodiment, a polarized image defogging method based on 3D convolution aims to solve the problems that existing networks lack polarized datasets and cannot extract useful information from grouped polarized images (multiple images shot at different polarization angles from the same viewpoint). By constructing a 3D-convolution-based polarized image deep neural network, a defogging model is obtained that can defog effectively without specific cues, which can improve the quality of images captured in foggy environments and meet the picture requirements of high-level vision tasks. Specifically, as shown in fig. 1, the steps are as follows:
step 1, acquiring a synthesized polarized image dataset;
step 1.1, selecting a suitable original dataset for synthesizing the polarized image dataset;
the original data set needs to meet the following two requirements: (1) A haze-free image J (z) with a scene depth map d (z); (2) with a semantic segmentation map S;
step 1.2, randomly assigning, within given ranges, the atmospheric scattering factor β, the global atmospheric light A_∞ and the degree of polarization DoP_A of the global atmospheric light, thereby generating a foggy image I(z) at pixel point z using equation (1):
I(z) = T(z) + A(z) = J(z)·t(z) + A_∞·(1 − t(z))   (1)
in equation (1), z denotes the spatial coordinates of the pixel; T(z) and A(z) denote the transmitted light and the atmospheric light at pixel point z; J(z) denotes the haze-free image; t(z) denotes the transmission map at pixel point z, with t(z) = e^(−β·d(z)), where d(z) denotes the scene depth at pixel point z; in this embodiment, the atmospheric scattering factor β takes values in [0.01, 0.02], the global atmospheric light A_∞ in [0.85, 0.95], and the degree of polarization DoP_A of the global atmospheric light in [0.05, 0.4].
Step 1.3, calculating the degree of polarization DoP of the transmitted light T T =g (S), where S represents a semantic segmentation map and is provided by the original dataset, g represents a random mapping function; in the present embodiment, the polarization degree DoP of transmitted light T The value range of (5) is [0.025,0.2 ]]。
Step 1.4, calculating the polarization degree DoP of the foggy day image I by using the formula (2):
I·DoP=T·DoP T +A·DoP A (2)
in formula (2), doP A The polarization degree of the atmospheric light a; in this embodiment, I, T, A may be decomposed into I // And I ⊥ ,T // And T ⊥ ,A // And A ⊥ Where// and t denote that the component is parallel or perpendicular to the plane of incidence. Thus, the degree of polarization of I, T, A can be defined as
Step 1.5, calculating the polarization angle to be by using the formula (3)Is->
In the formula (3), the amino acid sequence of the compound,representing the direction of a polarizer for transmitting a component parallel to the plane of incidence, an
In this embodiment, because foggy images provided by existing standard datasets cannot satisfy the special requirements of the polarization-based synthetic dataset generation pipeline, while the Foggy Cityscapes-DBF dataset meets all the requirements, the invention uses the haze-free image J and the depth map d provided by that dataset, randomly generates the scattering coefficient β and the global atmospheric light A_∞, generates DoP_T from the semantic segmentation map S, and finally computes the foggy image I.
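The random mapping g from semantic classes to transmitted-light DoP described above might be sketched as follows. `random_dop_map` and its defaults are our own naming; the [0.025, 0.2] range is taken from this embodiment:

```python
import numpy as np

def random_dop_map(seg, lo=0.025, hi=0.2, seed=0):
    """Sketch of g: DoP_T = g(S).

    Assigns each semantic class in the segmentation map seg a random
    transmitted-light degree of polarization drawn from [lo, hi), so
    that all pixels of the same class share one DoP value.
    """
    rng = np.random.default_rng(seed)
    lut = {int(label): rng.uniform(lo, hi) for label in np.unique(seg)}
    return np.vectorize(lambda l: lut[int(l)])(seg)
```

Tying DoP_T to semantic classes gives the synthetic data spatially coherent polarization cues, instead of per-pixel noise.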
Step 2, constructing a polarized image defogging model based on 3D convolution based on a U-Net architecture, wherein the method comprises the following steps: a POL-3D encoder, a spatial redundancy reduction module SSR, and a POL decoder;
step 2.1, constructing a POL-3D encoder consisting of M 3D convolution layers, where the m-th 3D convolution layer sequentially comprises: a convolution layer, an instance normalization layer and a ReLU activation layer; the kernel size of each convolution layer is a tuple of 3 integers, giving the kernel size along the depth, height and width dimensions; in this embodiment, M is 5; when m = 1 or 5, the convolution kernel size is set to (3, 3, 3), and when m = 2, 3, 4, it is set to (2, 3, 3).
step 2.2, performing a dimension-raising operation on the polarized images with different polarization angles and then fusing them, thereby obtaining a 4-dimensional feature map whose 4 dimensions are channel number, polarization angle, image height and image width; the 4-dimensional feature map is input into the POL-3D encoder and passes sequentially through the M 3D convolution layers, yielding M feature maps with different channel numbers, polarization angles, heights and widths; in this embodiment, four polarized images with different polarization angles (0°, 45°, 90°, 135°) are used as input, and the high-dimensional feature map can be expressed as a four-dimensional tensor C × P × H × W, with initial values C = 3, P = 4, H = W = 256. The numbers of output channels of the 3D convolution layers are set to 64, 128, 256 and 512, respectively.
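The input tensor layout described above can be illustrated directly (NumPy sketch; real inputs would be the four captured polarized images rather than zeros):

```python
import numpy as np

# four polarized images of the same scene (0, 45, 90 and 135 degrees),
# each stored channels-first as 3 x 256 x 256, stacked along a new
# polarization axis to form the 4-D encoder input C x P x H x W
angles = [0, 45, 90, 135]
imgs = [np.zeros((3, 256, 256), dtype=np.float32) for _ in angles]
x = np.stack(imgs, axis=1)
print(x.shape)  # (3, 4, 256, 256)
```

A 3D convolution kernel of size (2, 3, 3) or (3, 3, 3) then slides over the polarization, height and width axes jointly, which is how the encoder exploits correlation between the grouped polarized images.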
Step 2.3, constructing a space redundancy reduction module SSR, which consists of M octave convolution layers and corresponding maximum pooling layers, wherein each octave convolution layer comprises a preprocessing block First OctConv, an octave convolution block OctConv and a post-processing block Last OctConv; in this embodiment, the number of input/output channels of the 5 octave convolution layers is (64, 64), (128 ), (256,256), (512 ), and the convolution kernel sizes of the preprocessing block, the octave convolution block, and the post-processing block are (1, 1), (3, 3), (3, 3), respectively.
Step 2.3.1, the preprocessing block in the mth octave convolution layer is composed of two branches, wherein one branch is composed of a convolution layer and an example normalization layer in sequence and is used for decomposing high-frequency characteristics; the other branch sequentially consists of an average pooling layer, a convolution layer and an example normalization layer and is used for decomposing the low-frequency characteristics; the convolution kernels of the two convolution layers are the same in size, and the output channels are different in number; m is M; in this embodiment, the convolution kernels of the two convolution layers are (3, 3), the number of output channels is controlled by a factor α, and the number of output channels of the convolution layer that decomposes the high frequency characteristic is αc out The number of output channels of the convolution layer which decomposes the low frequency characteristic is (1-alpha) c out Alpha is 0.5, the convolution kernel size of the average pooling layer is (1, 2), and the convolution step size is (1, 2).
the m-th feature map passes through the two branches of the preprocessing block in the m-th octave convolution layer, which output the m-th high-frequency feature map and the m-th low-frequency feature map, respectively;
step 2.3.2, the octave convolution block in the mth octave convolution layer consists of four convolution layers, two example normalization layers, an average pooling layer and an up-sampling layer;
the mth high-frequency characteristic diagramThe mth high-frequency to high-frequency characteristic diagram is obtained after the treatment of the first convolution layer in the mth octave convolution layers>At the same time, mth high-frequency characteristic diagram +.>After being processed by an average pooling layer, the processed data is input into a second convolution layer for processing to obtain an mth high-frequency to low-frequency characteristic diagram +.>
The mth low-frequency characteristic diagramRespectively processing the first convolution layer to obtain the m low-frequency to low-frequency characteristic diagram +.>At the same time, mth low-frequency characteristic diagram +.>Sequentially inputting a fourth convolution layer, and processing in an up-sampling layer to obtain an mth low-frequency to high-frequency characteristic diagram ++>In the present embodimentIn the example, the average pooled convolution kernel size is (1, 2) and the convolution step size is (1, 2). The up-sampled amplification factor is (1, 2) and the algorithm uses the nearest algorithm.
The mth high-to-high feature map and the mth low-to-high feature map are fused and input into the first instance normalization layer, which outputs the mth high-frequency feature map;
the mth low-to-low feature map and the mth high-to-low feature map are fused and input into the second instance normalization layer, which outputs the mth low-frequency feature map.
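The four-path information exchange of step 2.3.2 can be sketched in NumPy. The (1, 2) pooling and up-sampling act along the width axis only, and the f_* callables stand in for the four convolution layers (all names here are illustrative, not from the patent):

```python
import numpy as np

def avg_pool_w(x):
    """(1, 2) average pooling: halve the last (width) axis."""
    return x.reshape(*x.shape[:-1], -1, 2).mean(axis=-1)

def upsample_w(x):
    """(1, 2) nearest-neighbour up-sampling: double the last (width) axis."""
    return np.repeat(x, 2, axis=-1)

def octave_exchange(x_high, x_low, f_hh, f_hl, f_ll, f_lh):
    """Cross-frequency exchange of an octave convolution block.

    Returns the fused high- and low-frequency outputs that would then be
    passed through the two instance normalization layers."""
    y_hh = f_hh(x_high)               # first conv: high -> high
    y_hl = f_hl(avg_pool_w(x_high))   # pool, then second conv: high -> low
    y_ll = f_ll(x_low)                # third conv: low -> low
    y_lh = upsample_w(f_lh(x_low))    # fourth conv, then up-sample: low -> high
    return y_hh + y_lh, y_ll + y_hl
```

The low-frequency path runs at half the width of the high-frequency path, which is where the spatial redundancy reduction comes from: half the features are computed on a smaller grid.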
Step 2.3.3, the post-processing block in the mth octave convolution layer consists of two convolution layers, an up-sampling layer, and an instance normalization layer;
the mth high-frequency feature map passes through one convolution layer of the post-processing block to obtain the mth high-to-high feature map;
the mth low-frequency feature map passes through the other convolution layer of the post-processing block and is then input into the up-sampling layer to obtain the mth low-to-high feature map;
the mth high-to-high feature map and the mth low-to-high feature map are fused and processed by the instance normalization layer to obtain the mth feature map.
The mth feature map is input into the max pooling layer corresponding to the mth octave convolution layer for processing to obtain the mth effective feature map F_m; thus, after processing by the spatial redundancy reduction module SSR, M effective feature maps F_1, F_2, …, F_m, …, F_M are obtained from the M feature maps of different channel numbers, polarization angles, heights, and widths;
Step 2.4, constructing a POL decoder consisting of M deconvolution layers; the low-, middle-, and high-level feature maps generated by the POL-3D encoder and processed by the spatial redundancy reduction module are combined with the output of each deconvolution layer using a bilinear interpolation function, so as to output the final defogging prediction map;
when m = 1, 2, …, M−1, each deconvolution layer comprises one 2D convolution layer and one instance normalization layer; when m = M, the deconvolution layer comprises only one 2D convolution layer; in this embodiment, M = 5, the numbers of input/output channels of the 5 deconvolution layers are (512, 512), (1024, 256), (512, 128), (256, 64), and (128, 3), respectively, the convolution kernel size is 3, and the convolution stride is 1.
When m = M, the Mth feature map F_M is processed by the bilinear interpolation function and input into the 1st deconvolution layer, which outputs the 1st deconvolution-layer feature map F'_1;
when m = M−1, M−2, …, 2, the mth feature map F_m is fused with the (M−m)th deconvolution-layer feature map F'_{M−m}; the fused feature map is processed by the bilinear interpolation function and then input into the (M−m+1)th deconvolution layer, which outputs the (M−m+1)th deconvolution-layer feature map F'_{M−m+1};
when m = 1, the Mth deconvolution layer processes the 1st feature map, and the resulting Mth deconvolution-layer feature map F'_M is the final defogging prediction map.
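The skip-connection pairing of step 2.4 can be summarized programmatically; this sketch (names illustrative, not from the patent) returns, for each deconvolution layer, which encoder effective feature map F_m and which earlier decoder map F'_j are fused at its input:

```python
def decoder_schedule(M):
    """Map each deconvolution layer index i (1-based) to the encoder feature
    map F_m and previous decoder feature map F'_j fused at its input
    (None means the layer takes the encoder map alone)."""
    schedule = {1: ("F_%d" % M, None)}        # layer 1: F_M alone
    for m in range(M - 1, 0, -1):             # m = M-1, ..., 1
        i = M - m + 1                         # layer i receives F_m fused with F'_{M-m}
        schedule[i] = ("F_%d" % m, "F'_%d" % (M - m))
    return schedule
```

For the M = 5 embodiment, layer 2 fuses F_4 with F'_1 and layer 5 fuses F_1 with F'_4, whose output is the defogging prediction map.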
Step 3, training a polarized image defogging model based on 3D convolution;
Based on the polarized images and the corresponding real haze-free images, the 3D-convolution-based polarized image defogging model is trained with the ADAM optimizer, using the mean absolute error (L1 loss) as the loss function; the loss between the defogging prediction map and the real haze-free map is computed to update the model parameters until the loss function converges, yielding the optimal 3D-convolution-based polarized image defogging model, which is then used to defog both synthesized polarized hazy images and real captured polarized hazy images. In this embodiment, the network is trained for 300 epochs; the initial learning rate is set to 1×10⁻⁴ and decays by a factor of 0.5 every 50 epochs.
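The loss and learning-rate schedule of this embodiment reduce to a few lines (a sketch; function names are illustrative, not from the patent):

```python
import numpy as np

def l1_loss(pred, target):
    """Mean absolute error between the defogging prediction and the ground truth."""
    return np.abs(pred - target).mean()

def learning_rate(epoch, base_lr=1e-4, decay=0.5, step=50):
    """Step schedule: the learning rate is multiplied by `decay` every `step` epochs."""
    return base_lr * decay ** (epoch // step)
```

Over the 300 training epochs this schedule halves the rate five times, ending at 1×10⁻⁴ × 0.5⁵.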
In this embodiment, 8925 polarized images at 4 different polarization angles (0°, 45°, 90°, 135°) generated by the polarized image generation pipeline, together with their corresponding real haze-free images, are used for training. During training, the input polarized images are randomly cropped to 256×256; the L1 loss is computed between the defogged image output by the 3D-convolution-based defogging model and the ground-truth haze-free image J, and the computed loss, combined with the ADAM optimizer, guides the training of the network, yielding the 3D-convolution-based polarized image defogging model.
In this embodiment, an electronic device comprises a memory and a processor, where the memory stores a program supporting the processor in executing the above method, and the processor is configured to execute the program stored in the memory.
In this embodiment, a computer-readable storage medium stores a computer program that, when executed by a processor, performs the steps of the method described above.
Table 1 shows the performance of the 3D-convolution-based polarized image defogging method of the invention, using PSNR and SSIM as evaluation indices. For a fair comparison, all defogging methods were retrained on the synthetic dataset; during testing, the present invention and the other defogging methods used polarized images and ordinary RGB images, respectively. PSNR, the peak signal-to-noise ratio, is the ratio of the maximum power of a signal to the power of the noise that may affect the fidelity of its representation; a larger value indicates less distortion in image defogging. SSIM, i.e. structural similarity, examines the similarity of images in terms of brightness, contrast, and structure; its value ranges over [0, 1], and the larger the value, the more similar the defogged image is to the real haze-free image. The quantitative analysis in Table 1 shows that the method of the present invention achieves the best results on both criteria.
TABLE 1
Methods | PSNR(↑) | SSIM(↑) |
AOD-Net | 20.58 | 0.80 |
PFFNet | 28.63 | 0.89 |
GridDehazeNet | 29.23 | 0.91 |
4KDehazing-Net | 29.47 | 0.90 |
FFA-Net | 29.93 | 0.92 |
GCANet | 29.98 | 0.91 |
Ours | 30.21 | 0.92 |
FIG. 3 shows the results of the 3D-convolution-based polarized image defogging method of the present invention and other current defogging methods on the synthetic dataset. Here, Ours denotes the 3D-convolution-based polarized image defogging method; AOD-Net, based on a reformulation of the atmospheric scattering model, proposes an end-to-end trainable defogging network that replaces the two key unknowns of the atmospheric scattering model with a single one; PFFNet, inspired by the end-to-end defogging idea, adopts a U-Net-based network and adds a ResNet-based transformation module between the encoder and decoder to improve complex feature learning at different levels; GridDehazeNet, inspired by grid-based image segmentation networks, proposes another end-to-end trainable network independent of the atmospheric model, comprising a preprocessing module, an attention-based multi-scale backbone, and a post-processing module, where the attention mechanism helps the backbone adjust the contributions of multi-scale feature fusion by activating/deactivating parts of the network; 4KDehazing-Net proposes a multi-guided bilateral learning framework for 4K-resolution image defogging that can process 4K (3840×2160) images at 125 frames per second; FFA-Net combines channel attention and pixel attention and emphasizes residual learning and feature fusion; GCANet uses smoothed dilated convolutions and a gating network to aggregate context information with multi-level features.
Fig. 4 shows the results of the 3D-convolution-based polarized image defogging method of the present invention and other current defogging methods on the real dataset. Compared with the results on synthetic data, the method of the present invention shows an even more significant advantage on the real dataset. This is because real-world scattering is a complex, spatially varying physical process; it is difficult to learn such spatial variation from ordinary RGB images, and the lack of physics-based learned features makes these methods prone to artifacts at pixels subject to large spatial variation. In contrast, the method of the present invention alleviates this problem by mining the correlation between the polarized images at the four different polarization angles.
Claims (5)
1. The polarized image defogging method based on 3D convolution is characterized by comprising the following steps of:
step 1, acquiring a synthesized polarized image dataset;
step 1.1, acquiring a haze-free image J(z) together with its scene depth map d(z) and semantic segmentation map S;
step 1.2, randomly assigning, within certain ranges, the atmospheric scattering factor β, the global atmospheric light A_∞, and the degree of polarization of the global atmospheric light DoP_A, thereby generating the hazy image I(z) at pixel point z using equation (1):
I(z) = T(z) + A(z) = J(z)t(z) + A_∞(1 − t(z))    (1)
in equation (1), z represents the spatial coordinates of the pixel, T(z) and A(z) respectively represent the transmitted light and the atmospheric light at pixel point z, J(z) represents the haze-free image, and t(z) represents the transmission map at pixel point z, with t(z) = e^(−βd(z)), where d(z) represents the scene depth map at pixel point z;
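Equation (1) can be checked with a direct NumPy implementation (a sketch; the function name and broadcasting choice are illustrative, not from the patent):

```python
import numpy as np

def synthesize_haze(J, d, beta, A_inf):
    """Equation (1): I(z) = J(z) t(z) + A_inf (1 - t(z)), with t(z) = exp(-beta d(z))."""
    t = np.exp(-beta * d)
    if J.ndim == t.ndim + 1:      # broadcast the transmission map over colour channels
        t = t[..., None]
    return J * t + A_inf * (1.0 - t)
```

At zero depth the transmission is 1 and I(z) = J(z); as βd(z) grows, the transmission decays and I(z) tends to the global atmospheric light A_∞.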
step 1.3, calculating the degree of polarization of the transmitted light T as DoP_T = g(S), where S represents the semantic segmentation map and g represents a random mapping function;
step 1.4, calculating the polarization degree DoP of the foggy day image I by using the formula (2):
I·DoP=T·DoP T +A·DoP A (2)
in formula (2), doP A The polarization degree of the atmospheric light a;
step 1.5, calculating the polarized image at each polarization angle using equation (3);
in equation (3), the angle represents the direction of a polarizer that transmits the component parallel to the plane of incidence;
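The patent's equation (3) is not reproduced in this text; the standard relation for the intensity behind a linear polarizer at angle φ, given the total intensity I, degree of polarization DoP, and angle of polarization, is shown below as an assumption standing in for it (all names illustrative):

```python
import numpy as np

def polarized_component(I, dop, aop, phi):
    """Intensity measured through a linear polarizer oriented at angle phi (radians):
    I_phi = (I / 2) * (1 + dop * cos(2 * (phi - aop))).
    This Malus-type relation is an assumption standing in for the patent's equation (3)."""
    return 0.5 * I * (1.0 + dop * np.cos(2.0 * (phi - aop)))
```

Pairs of orthogonal angles reconstruct the total intensity (I_0 + I_90 = I), which is why the four angles 0°, 45°, 90°, 135° suffice to recover I, DoP, and the angle of polarization.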
Step 2, constructing a 3D-convolution-based polarized image defogging model based on the U-Net architecture, which comprises: a POL-3D encoder, a spatial redundancy reduction module SSR, and a POL decoder;
step 2.1, constructing the POL-3D encoder composed of M 3D convolution layers, where the mth 3D convolution layer sequentially comprises: a convolution layer, an instance normalization layer, and a ReLU activation function layer; the kernel size of each convolution layer is a tuple of 3 integers, representing the kernel sizes along the depth, height, and width dimensions;
step 2.2, performing a dimension-raising operation on the polarized images at the different polarization angles and then fusing them to obtain a 4-dimensional high-dimensional feature map; the 4-dimensional high-dimensional feature map is input into the POL-3D encoder and passes through the M 3D convolution layers in sequence to obtain M feature maps of different channel numbers, polarization angles, heights, and widths, where the 4 dimensions of the high-dimensional feature map are the number of channels, the polarization angle, the image height, and the image width;
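The dimension-raising and fusion of step 2.2 amounts to stacking the four single-angle images along a new polarization axis (a sketch; the name is illustrative, not from the patent):

```python
import numpy as np

def stack_polarization(images):
    """Stack P single-angle images of shape (C, H, W) into one (C, P, H, W)
    feature map, whose polarization axis is consumed by the 3D convolutions
    of the POL-3D encoder."""
    return np.stack(images, axis=1)
```

A (depth, height, width) 3D kernel then convolves jointly across the polarization-angle axis and the spatial axes, which is how correlations between the four polarization angles are exploited.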
step 2.3, after the spatial redundancy reduction module SSR processes the M feature maps, M effective feature maps F_1, F_2, …, F_m, …, F_M are obtained;
step 2.4, the POL decoder processes the M effective feature maps F_1, F_2, …, F_m, …, F_M and outputs the final defogging prediction map;
step 3, training a polarized image defogging model based on 3D convolution;
training the 3D-convolution-based polarized image defogging model with the ADAM optimizer based on the polarized images and the corresponding real haze-free images, using the mean absolute error L1 loss as the loss function, and computing the loss between the defogging prediction map and the real haze-free map to update the model parameters until the loss function converges, thereby obtaining the optimal 3D-convolution-based polarized image defogging model, which performs defogging on synthesized polarized hazy images and real captured polarized hazy images.
2. The polarized image defogging method based on 3D convolution according to claim 1, wherein the spatial redundancy reduction module SSR in the step 2.3 is composed of M octave convolution layers and corresponding max pooling layers, each octave convolution layer includes a preprocessing block, an octave convolution block and a post-processing block, and is processed according to the following steps:
step 2.3.1, the preprocessing block in the mth octave convolution layer consists of two branches: one branch consists, in order, of a convolution layer and an instance normalization layer and is used for decomposing the high-frequency features; the other branch consists, in order, of an average pooling layer, a convolution layer, and an instance normalization layer and is used for decomposing the low-frequency features; the two convolution layers have the same kernel size but different numbers of output channels; m = 1, 2, …, M;
the mth feature map passes through the two branches of the preprocessing block in the mth octave convolution layer, which correspondingly output the mth high-frequency feature map and the mth low-frequency feature map;
step 2.3.2, the octave convolution block in the mth octave convolution layer consists of four convolution layers, two instance normalization layers, an average pooling layer, and an up-sampling layer;
the mth high-frequency feature map is processed by the first convolution layer of the octave convolution block in the mth octave convolution layer to obtain the mth high-to-high feature map; at the same time, the mth high-frequency feature map is processed by the average pooling layer and then input into the second convolution layer to obtain the mth high-to-low feature map;
the mth low-frequency feature map is processed by the third convolution layer to obtain the mth low-to-low feature map; at the same time, the mth low-frequency feature map is input, in order, into the fourth convolution layer and the up-sampling layer to obtain the mth low-to-high feature map;
the mth high-to-high feature map and the mth low-to-high feature map are fused and input into the first instance normalization layer, which outputs the mth high-frequency feature map;
the mth low-to-low feature map and the mth high-to-low feature map are fused and input into the second instance normalization layer, which outputs the mth low-frequency feature map;
step 2.3.3, the post-processing block in the mth octave convolution layer consists of two convolution layers, an up-sampling layer, and an instance normalization layer;
the mth high-frequency feature map passes through one convolution layer of the post-processing block to obtain the mth high-to-high feature map;
the mth low-frequency feature map passes through the other convolution layer of the post-processing block and is then input into the up-sampling layer to obtain the mth low-to-high feature map;
the mth high-to-high feature map and the mth low-to-high feature map are fused and processed by the instance normalization layer to obtain the mth feature map;
the mth feature map is input into the max pooling layer corresponding to the mth octave convolution layer for processing to obtain the mth effective feature map F_m; thus, after processing by the spatial redundancy reduction module SSR, M effective feature maps F_1, F_2, …, F_m, …, F_M are obtained from the M feature maps of different channel numbers, polarization angles, heights, and widths.
3. The polarized image defogging method based on 3D convolution according to claim 2, wherein the POL decoder in step 2.4 is composed of M deconvolution layers;
when m = 1, 2, …, M−1, each deconvolution layer comprises one 2D convolution layer and one instance normalization layer; when m = M, the deconvolution layer comprises only one 2D convolution layer;
when m = M, the Mth feature map F_M is processed by the bilinear interpolation function and input into the 1st deconvolution layer, which outputs the 1st deconvolution-layer feature map F'_1;
when m = M−1, M−2, …, 2, the mth feature map F_m is fused with the (M−m)th deconvolution-layer feature map F'_{M−m}; the fused feature map is processed by the bilinear interpolation function and then input into the (M−m+1)th deconvolution layer, which outputs the (M−m+1)th deconvolution-layer feature map F'_{M−m+1};
when m = 1, the Mth deconvolution layer processes the 1st feature map to obtain the Mth deconvolution-layer feature map F'_M, which is the final defogging prediction map.
4. An electronic device comprising a memory and a processor, wherein the memory is configured to store a program for supporting the processor to perform the polarized image defogging method of any of claims 1-3, the processor being configured to execute the program stored in the memory.
5. A computer readable storage medium having stored thereon a computer program, characterized in that the computer program when executed by a processor performs the steps of the polarized image defogging method of any of the claims 1-3.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310390770.0A CN116452450A (en) | 2023-04-07 | 2023-04-07 | Polarized image defogging method based on 3D convolution |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310390770.0A CN116452450A (en) | 2023-04-07 | 2023-04-07 | Polarized image defogging method based on 3D convolution |
Publications (1)
Publication Number | Publication Date |
---|---|
CN116452450A true CN116452450A (en) | 2023-07-18 |
Family
ID=87121490
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202310390770.0A Pending CN116452450A (en) | 2023-04-07 | 2023-04-07 | Polarized image defogging method based on 3D convolution |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN116452450A (en) |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN117911282A (en) * | 2024-03-19 | 2024-04-19 | 华中科技大学 | Construction method and application of image defogging model |
CN117911282B (en) * | 2024-03-19 | 2024-05-28 | 华中科技大学 | Construction method and application of image defogging model |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN112184577B (en) | Single image defogging method based on multiscale self-attention generation countermeasure network | |
CN110517203B (en) | Defogging method based on reference image reconstruction | |
CN111709888B (en) | Aerial image defogging method based on improved generation countermeasure network | |
CN110378849B (en) | Image defogging and rain removing method based on depth residual error network | |
CN110288550B (en) | Single-image defogging method for generating countermeasure network based on priori knowledge guiding condition | |
Hsu et al. | Single image dehazing using wavelet-based haze-lines and denoising | |
CN111861939B (en) | Single image defogging method based on unsupervised learning | |
CN110675340A (en) | Single image defogging method and medium based on improved non-local prior | |
CN111986108A (en) | Complex sea-air scene image defogging method based on generation countermeasure network | |
CN112419163B (en) | Single image weak supervision defogging method based on priori knowledge and deep learning | |
CN113284061B (en) | Underwater image enhancement method based on gradient network | |
CN112070688A (en) | Single image defogging method for generating countermeasure network based on context guidance | |
Bansal et al. | A review of image restoration based image defogging algorithms | |
CN116452450A (en) | Polarized image defogging method based on 3D convolution | |
Hsu et al. | Object detection using structure-preserving wavelet pyramid reflection removal network | |
CN115631223A (en) | Multi-view stereo reconstruction method based on self-adaptive learning and aggregation | |
Zhao et al. | A multi-scale U-shaped attention network-based GAN method for single image dehazing | |
WO2024178979A1 (en) | Single-image defogging method based on detail restoration | |
Babu et al. | An efficient image dahazing using Googlenet based convolution neural networks | |
CN117853370A (en) | Underwater low-light image enhancement method and device based on polarization perception | |
CN117011181A (en) | Classification-guided unmanned aerial vehicle imaging dense fog removal method | |
CN117036182A (en) | Defogging method and system for single image | |
CN116757949A (en) | Atmosphere-ocean scattering environment degradation image restoration method and system | |
CN116703750A (en) | Image defogging method and system based on edge attention and multi-order differential loss | |
CN113724156B (en) | Anti-network defogging method and system combining generation of atmospheric scattering model |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||