CN116452450A - Polarized image defogging method based on 3D convolution - Google Patents
Polarized image defogging method based on 3D convolution
- Publication number
- CN116452450A CN116452450A CN202310390770.0A CN202310390770A CN116452450A CN 116452450 A CN116452450 A CN 116452450A CN 202310390770 A CN202310390770 A CN 202310390770A CN 116452450 A CN116452450 A CN 116452450A
- Authority
- CN
- China
- Prior art keywords
- convolution
- layer
- mth
- image
- defogging
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T5/00—Image enhancement or restoration
- G06T5/73—Deblurring; Sharpening
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
- G06N3/0455—Auto-encoder networks; Encoder-decoder networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/0464—Convolutional networks [CNN, ConvNet]
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
- G06N3/084—Backpropagation, e.g. using gradient descent
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T5/00—Image enhancement or restoration
- G06T5/50—Image enhancement or restoration using two or more images, e.g. averaging or subtraction
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/20—Special algorithmic details
- G06T2207/20081—Training; Learning
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/20—Special algorithmic details
- G06T2207/20084—Artificial neural networks [ANN]
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/20—Special algorithmic details
- G06T2207/20212—Image combination
- G06T2207/20221—Image fusion; Image merging
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02A—TECHNOLOGIES FOR ADAPTATION TO CLIMATE CHANGE
- Y02A90/00—Technologies having an indirect contribution to adaptation to climate change
- Y02A90/10—Information and communication technologies [ICT] supporting adaptation to climate change, e.g. for weather forecasting or climate simulation
Abstract
The invention discloses a polarized image defogging method based on 3D convolution, comprising the following steps: 1. acquiring a synthesized polarized image dataset using a polarized image generation pipeline; 2. constructing a 3D-convolution-based deep convolutional neural network for polarized images, taking four polarized images with different polarization angles as input, and training the network to obtain a defogging model; 3. defogging the polarized image to be processed with the trained model to obtain a restored fog-free image. The invention realizes defogging of polarized images based on 3D convolution, effectively improving the defogging effect in complex and changeable scenes and thereby providing clearer images for many high-level vision tasks.
Description
Technical Field
The invention belongs to the fields of computer vision and image processing and analysis, and particularly relates to a defogging method using polarized images based on a 3D convolutional network.
Background
Fog and haze are common atmospheric phenomena. Under such weather conditions, outdoor air contains a large number of tiny suspended particles that refract and scatter atmospheric light; the refracted and scattered light mixes with the light reflected from the target scene, greatly reducing scene visibility. As a result, the contrast of images captured by outdoor acquisition devices drops noticeably, and color distortion and severe loss of detail may even occur. Advanced computer vision tasks, such as object detection and image segmentation, require high-quality images as input. In severe weather with fog or haze, however, the quality of the acquired images degrades, greatly affecting these vision tasks. Image defogging is therefore important.
In recent years, image defogging has received increasing attention from researchers, and many well-performing defogging models have been proposed. Existing frameworks fall broadly into two categories: traditional defogging methods based on handcrafted priors, and defogging methods based on deep learning. Traditional methods rely on priors derived from statistics of clear images and use the atmospheric scattering model to restore a fog-free image. A well-known example is the dark channel prior (DCP), based on the assumption that in a haze-free image almost every pixel has at least one color channel whose value is close to 0. Although traditional methods have made some progress, their assumptions and priors hold only for particular scenes and weather conditions, so their generalization is limited: once the environment changes significantly, the model's defogging ability drops noticeably. Deep-learning-based methods train a defogging model on a large amount of training data and evaluate the trained model on test data. They can be divided into two types: one indirectly recovers the fog-free image by learning the parameters of the atmospheric model with a network, and the other takes the foggy image as input and directly outputs the fog-free image end to end.
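For background, the dark channel prior mentioned above can be sketched in a few lines of NumPy. This is illustrative only, not part of the claimed method; `patch` and `omega` are the conventional DCP parameters, not values from this patent:

```python
import numpy as np

def dark_channel(img, patch=15):
    """Per-pixel minimum over colour channels and a local patch.

    img: H x W x 3 array with values in [0, 1].
    """
    dc = img.min(axis=2)                      # channel-wise minimum
    pad = patch // 2
    padded = np.pad(dc, pad, mode='edge')
    out = np.empty_like(dc)
    for i in range(dc.shape[0]):
        for j in range(dc.shape[1]):
            out[i, j] = padded[i:i + patch, j:j + patch].min()
    return out

def estimate_transmission(hazy, A, omega=0.95, patch=15):
    # DCP transmission estimate: t = 1 - omega * dark_channel(I / A)
    return 1.0 - omega * dark_channel(hazy / A, patch)
```

A real DCP implementation would further refine the transmission map with soft matting or a guided filter before inverting the scattering model.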
However, these learning-based approaches still have drawbacks: 1. most deep-learning methods take a single RGB image as input and train and test with the atmospheric scattering model, yet two key parameters of that model must be estimated simultaneously, leading to ill-posedness and poor generalization; 2. to address the ill-posed problem and improve generalization, more and more multi-image defogging methods have appeared, among which methods using polarized images at different polarization angles can make full use of scene information to good effect; however, most of these methods assume that the transmitted light is not significantly polarized, or require a specific cue such as a sky region or a similar object, which degrades the defogging effect on real-world foggy images.
Disclosure of Invention
To overcome the defects of the prior art, the invention provides a polarized image defogging method based on 3D convolution, so as to improve the quality of images captured in foggy environments and the defogging effect in complex and changeable scenes, thereby providing clearer images that meet the requirements of high-level vision tasks.
In order to solve the technical problems, the invention adopts the following technical scheme:
the invention discloses a polarized image defogging method based on 3D convolution, which is characterized by comprising the following steps:
step 1, acquiring a synthesized polarized image dataset;
step 1.1, acquiring a haze-free image J (z) with a scene depth map d (z) and a semantic segmentation map S;
step 1.2, randomly assigning, within given ranges, the atmospheric scattering factor β, the global atmospheric light A_∞ and the degree of polarization DoP_A of the global atmospheric light, thereby generating a foggy image I(z) at pixel point z using equation (1):
I(z) = T(z) + A(z) = J(z)·t(z) + A_∞·(1 − t(z))   (1)
in equation (1), z denotes the spatial coordinates of the pixel; T(z) and A(z) denote the transmitted light and the atmospheric light at pixel point z; J(z) denotes the haze-free image; t(z) denotes the transmission map at pixel point z, with t(z) = e^(−β·d(z)), where d(z) denotes the scene depth map at pixel point z;
step 1.3, calculating the degree of polarization of the transmitted light T as DoP_T = g(S), where S denotes the semantic segmentation map and g denotes a random mapping function;
step 1.4, calculating the degree of polarization DoP of the foggy image I using equation (2):
I·DoP = T·DoP_T + A·DoP_A   (2)
in equation (2), DoP_A denotes the degree of polarization of the atmospheric light A;
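Equations (1) and (2) together give a simple synthesis routine. The following NumPy sketch is illustrative only (function and argument names are our own); it forms the foggy image by the scattering model and mixes the degrees of polarization weighted by intensity, as equation (2) prescribes:

```python
import numpy as np

def synthesize_foggy(J, d, beta, A_inf, dop_T, dop_A):
    """Synthesize a foggy image and its DoP from a haze-free image.

    J: H x W x 3 haze-free image, d: H x W depth map.
    """
    # equation (1): I = J*t + A_inf*(1 - t), with t = exp(-beta * d)
    t = np.exp(-beta * d)[..., None]
    T = J * t                       # transmitted light
    A = A_inf * (1.0 - t)           # atmospheric light (airlight)
    I = T + A
    # equation (2): I*DoP = T*DoP_T + A*DoP_A, solved for DoP
    dop_I = (T * dop_T + A * dop_A) / np.maximum(I, 1e-8)
    return I, dop_I
```

With β = 0 the transmission is 1 everywhere, so the routine returns the haze-free image and the transmitted-light DoP unchanged, which is a quick sanity check on the implementation.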
step 1.5, calculating the polarized image I_φ at polarizer angle φ using equation (3):
I_φ(z) = (I(z)/2)·(1 + DoP(z)·cos(2(φ − φ_∥)))   (3)
in equation (3), φ_∥ denotes the direction of the polarizer that transmits the component parallel to the plane of incidence;
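The body of equation (3) is not legible in the source text; the sketch below assumes the standard Malus-law form I_φ = (I/2)·(1 + DoP·cos 2(φ − φ_∥)), which is consistent with the surrounding definitions of I, DoP and φ_∥ but is our reconstruction, not a verbatim copy of the patent formula:

```python
import numpy as np

def polarizer_image(I, dop, phi, phi_par):
    """Intensity seen through a linear polarizer at angle phi.

    Assumed Malus-law form: I_phi = (I/2) * (1 + DoP * cos(2*(phi - phi_par))).
    """
    return 0.5 * I * (1.0 + dop * np.cos(2.0 * (phi - phi_par)))
```

A useful consistency property of this form is that two images taken at orthogonal polarizer angles sum back to the total intensity I.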
Step 2, constructing a polarized image defogging model based on 3D convolution based on a U-Net architecture, wherein the method comprises the following steps: a POL-3D encoder, a spatial redundancy reduction module SSR, and a POL decoder;
step 2.1, constructing the POL-3D encoder composed of M 3D convolution layers, where the m-th 3D convolution layer sequentially comprises: a convolution layer, an instance normalization layer and a ReLU activation layer; the kernel size of each convolution layer is a tuple of 3 integers, giving the kernel size along the depth, height and width dimensions;
step 2.2, performing a dimension-raising operation on the polarized images with different polarization angles and then fusing them, thereby obtaining a 4-dimensional feature map whose 4 dimensions are channel number, polarization angle, image height and image width; the 4-dimensional feature map is input into the POL-3D encoder and passes sequentially through the M 3D convolution layers, yielding M feature maps with different channel numbers, polarization angles, heights and widths;
step 2.3, the spatial redundancy reduction module SSR processes the M feature maps to obtain M effective feature maps F_1, F_2, …, F_m, …, F_M;
step 2.4, the POL decoder processes the M effective feature maps F_1, F_2, …, F_m, …, F_M and outputs the final defogging prediction map;
step 3, training a polarized image defogging model based on 3D convolution;
based on the polarized images and the corresponding real fog-free images, the 3D-convolution-based polarized image defogging model is trained with the ADAM optimizer, using the mean absolute error (L1 loss) as the loss function; the loss between the defogging prediction map and the real fog-free image is computed to update the model parameters until the loss function converges, yielding the optimal 3D-convolution-based polarized image defogging model, which is used to defog both synthesized polarized foggy images and real captured polarized foggy images.
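The training objective above (L1 loss minimized with the ADAM optimizer) can be sketched as follows. This toy NumPy version optimizes a bare parameter array rather than network weights, and the class is a minimal single-tensor ADAM, not the patent's training code:

```python
import numpy as np

def l1_loss(pred, target):
    # mean absolute error, the loss used to train the defogging model
    return np.abs(pred - target).mean()

class Adam:
    """Minimal ADAM optimizer for a single parameter array."""
    def __init__(self, lr=1e-3, b1=0.9, b2=0.999, eps=1e-8):
        self.lr, self.b1, self.b2, self.eps = lr, b1, b2, eps
        self.m = self.v = None
        self.t = 0

    def step(self, w, grad):
        if self.m is None:
            self.m = np.zeros_like(w)
            self.v = np.zeros_like(w)
        self.t += 1
        self.m = self.b1 * self.m + (1 - self.b1) * grad
        self.v = self.b2 * self.v + (1 - self.b2) * grad ** 2
        m_hat = self.m / (1 - self.b1 ** self.t)      # bias correction
        v_hat = self.v / (1 - self.b2 ** self.t)
        return w - self.lr * m_hat / (np.sqrt(v_hat) + self.eps)
```

On a toy problem, iterating `w = opt.step(w, np.sign(w - target) / w.size)` (the L1 subgradient) drives the loss toward zero, mirroring how the defogging network's parameters would be updated until convergence.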
The polarized image defogging method based on 3D convolution is also characterized in that the spatial redundancy reduction module SSR in step 2.3 consists of M octave convolution layers and corresponding maximum pooling layers, where each octave convolution layer comprises a preprocessing block, an octave convolution block and a post-processing block, and proceeds as follows:
step 2.3.1, the preprocessing block in the m-th octave convolution layer consists of two branches: one branch consists of a convolution layer followed by an instance normalization layer and decomposes the high-frequency features; the other consists of an average pooling layer, a convolution layer and an instance normalization layer in sequence and decomposes the low-frequency features; the two convolution layers have the same kernel size but different numbers of output channels; m = 1, 2, …, M;
the m-th feature map passes through the two branches of the preprocessing block in the m-th octave convolution layer, which output the m-th high-frequency feature map H_m and the m-th low-frequency feature map L_m, respectively;
Step 2.3.2, the octave convolution block in the mth octave convolution layer consists of four convolution layers, two example normalization layers, an average pooling layer and an up-sampling layer;
the mth high-frequency characteristic diagramThe mth high-frequency to high-frequency characteristic diagram ++is obtained after the treatment of the first convolution layer of the octave convolution blocks in the mth octave convolution layer>At the same time, the mth high-frequency characteristic diagram +.>After being processed by an average pooling layer, the processed data are input into a second convolution layer for processing to obtain an mth high-frequency to low-frequency characteristic diagram +.>
The mth low-frequency characteristic diagramAfter the treatment of the third convolution layer, the mth low-frequency to low-frequency characteristic diagram +.>At the same time, the mth low-frequency characteristic diagram +.>Sequentially inputting into a fourth convolution layer and an up-sampling layer for processing to obtain an mth low-frequency to high-frequency characteristic diagram ++>
Will beAnd->After fusion, input into the first example normalization layer, and output the mth high frequency feature map +.>
Will beAnd->After fusion, the obtained product is input into a second example normalization layer, and an mth low-frequency characteristic diagram is output
Step 2.3.3, the post-processing block in the mth octave convolution layer consists of two convolution layers, an up-sampling layer and an example normalization layer;
the mth high-frequency characteristic diagramAfter passing through a convolution layer of the post-processing block, the mth high-frequency to high-frequency characteristic diagram is obtained>
The mth low-frequency characteristic diagramAfter passing through another convolution layer of the post-processing block, the result is input into an up-sampling layer to obtain the m-th low-frequency to high-frequency characteristic diagram +.>
And->After fusion, the mth feature map ++is obtained after the treatment of an example normalization layer>
Mth feature mapInputting the m-th octave convolution layer to the corresponding maximum pooling for processing to obtain an m-th effective feature map F m The method comprises the steps of carrying out a first treatment on the surface of the Thereby obtaining M effective feature maps F from the feature maps of M different channel numbers, polarization angles, heights and widths after the processing of the space redundancy reduction module SSR 1 ,F 2 ,…,F m ,…,F M 。
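The core idea of the SSR module above, splitting features into a full-resolution high-frequency group and a half-resolution low-frequency group, can be sketched as follows. This NumPy sketch is illustrative only; the convolution, normalization and cross-frequency exchange paths of the real octave block are omitted:

```python
import numpy as np

def avg_pool2(x):
    # halve H and W by 2x2 averaging (x: C x H x W, H and W even)
    C, H, W = x.shape
    return x.reshape(C, H // 2, 2, W // 2, 2).mean(axis=(2, 4))

def upsample2(x):
    # nearest-neighbour upsampling by 2 along H and W
    return x.repeat(2, axis=1).repeat(2, axis=2)

def octave_split(x, alpha=0.5):
    """Split channels into high- and low-frequency groups.

    The low-frequency group is stored at half spatial resolution,
    which is where the parameter/memory saving of octave convolution
    comes from; alpha is the high-frequency channel fraction.
    """
    c_hi = int(x.shape[0] * alpha)
    return x[:c_hi], avg_pool2(x[c_hi:])
```

`upsample2` plays the role of the up-sampling layer that brings low-to-high features back to full resolution before fusion.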
The POL decoder in step 2.4 consists of M deconvolution layers;
when m = 1, 2, …, M−1, each deconvolution layer comprises a 2D convolution layer and an instance normalization layer; when m = M, the deconvolution layer comprises only a 2D convolution layer;
when m = M, the M-th effective feature map F_M is processed with a bilinear interpolation function and input into the 1st deconvolution layer, which outputs the 1st deconvolution feature map F'_1;
when m = M−1, M−2, …, 2, the m-th effective feature map F_m is fused with the (M−m)-th deconvolution feature map F'_{M−m}; the fused feature map is processed with a bilinear interpolation function and input into the (M−m+1)-th deconvolution layer, which outputs the (M−m+1)-th deconvolution feature map F'_{M−m+1};
when m = 1, the M-th deconvolution layer processes the 1st effective feature map to give the M-th deconvolution feature map F'_M, which is the final defogging prediction map.
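The index bookkeeping of the decoder steps above can be sketched with a small helper. This is a hypothetical illustration of the fusion schedule only (the m = 1 case is assumed to follow the same pattern as m = M−1, …, 2):

```python
def decoder_schedule(M):
    """Which inputs feed each of the M deconvolution layers.

    Returns a list of (m, k) pairs: deconv layer i consumes effective
    feature map F_m, fused with decoder output F'_k (k is None for
    the first layer, which takes F_M alone).
    """
    order = [(M, None)]                 # deconv layer 1: F_M only
    for m in range(M - 1, 0, -1):       # deconv layers 2 .. M
        order.append((m, M - m))        # fuse F_m with F'_{M-m}
    return order
```

For M = 5 this yields F_5 alone, then (F_4, F'_1), (F_3, F'_2), (F_2, F'_3), (F_1, F'_4), i.e. the classic U-Net pattern of pairing deep decoder stages with shallow encoder features.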
The electronic device of the invention comprises a memory and a processor, where the memory stores a program that supports the processor in executing any of the above polarized image defogging methods, and the processor is configured to execute the program stored in the memory.
The computer-readable storage medium of the invention stores a computer program which, when run by a processor, performs the steps of any of the above polarized image defogging methods.
Compared with the prior art, the invention has the beneficial effects that:
1. By constructing a 3D-convolution-based polarized image deep neural network that takes polarized images with different polarization angles as input and combines them with a 3D convolution encoder, the invention solves the problem that conventional multi-image defogging networks designed with 2D convolution ignore the correlation between grouped images, thereby improving the quality of the restored fog-free image.
2. The 3D-convolution-based polarized image deep neural network introduces polarization information and uses several polarized images of the same scene at different polarization angles, so richer scene information can be acquired; this solves the poor generalization caused by image features extracted from a single input image being dependent on the training data, thereby improving the generalization ability of the defogging network and allowing it to adapt to complex and changeable environments.
3. The 3D-convolution-based polarized image deep neural network introduces the octave-convolution-based spatial redundancy reduction module SSR, which decomposes the feature maps output by the convolution layers into features of different spatial frequencies; by sharing information between adjacent locations, the spatial resolution of the low-frequency group can be safely reduced, solving the spatial redundancy caused by dense encoder parameters, thereby reducing the number of network parameters and making the network lighter.
4. The 3D-convolution-based polarized image deep neural network aggregates the feature map output by each encoder layer with the feature map produced by the spatial redundancy reduction module to obtain a more refined prediction result, thereby improving the defogging effect.
Drawings
FIG. 1 is a flow chart of defogging using polarized images based on a 3D convolutional defogging network in accordance with the present invention;
FIG. 2 is a schematic diagram of a polarized image defogging depth neural network based on 3D convolution;
FIG. 3 is a graph of defogging results of the method of the present invention and other defogging methods on a synthetic dataset;
fig. 4 is a graph of defogging results on a real world dataset by the present invention and other defogging methods.
Detailed Description
In this embodiment, a polarized image defogging method based on 3D convolution aims to solve the problems that existing networks lack polarized datasets and cannot extract useful information from grouped polarized images (multiple images shot at different polarization angles from the same viewpoint). By constructing a 3D-convolution-based polarized image deep neural network, a defogging model is obtained that can defog effectively without specific cues, which can improve the quality of images captured in foggy environments and meet the picture requirements of high-level vision tasks. Specifically, as shown in fig. 1, the steps are as follows:
step 1, acquiring a synthesized polarized image dataset;
step 1.1, selecting a suitable original dataset for synthesizing the polarized image dataset;
the original data set needs to meet the following two requirements: (1) A haze-free image J (z) with a scene depth map d (z); (2) with a semantic segmentation map S;
step 1.2, randomly assigning, within given ranges, the atmospheric scattering factor β, the global atmospheric light A_∞ and the degree of polarization DoP_A of the global atmospheric light, thereby generating a foggy image I(z) at pixel point z using equation (1):
I(z) = T(z) + A(z) = J(z)·t(z) + A_∞·(1 − t(z))   (1)
in equation (1), z denotes the spatial coordinates of the pixel; T(z) and A(z) denote the transmitted light and the atmospheric light at pixel point z; J(z) denotes the haze-free image; t(z) denotes the transmission map at pixel point z, with t(z) = e^(−β·d(z)), where d(z) denotes the scene depth at pixel point z; in this embodiment, the atmospheric scattering factor β takes values in [0.01, 0.02], the global atmospheric light A_∞ in [0.85, 0.95], and the degree of polarization DoP_A of the global atmospheric light in [0.05, 0.4].
Step 1.3, calculating the degree of polarization DoP of the transmitted light T T =g (S), where S represents a semantic segmentation map and is provided by the original dataset, g represents a random mapping function; in the present embodiment, the polarization degree DoP of transmitted light T The value range of (5) is [0.025,0.2 ]]。
Step 1.4, calculating the polarization degree DoP of the foggy day image I by using the formula (2):
I·DoP=T·DoP T +A·DoP A (2)
in formula (2), doP A The polarization degree of the atmospheric light a; in this embodiment, I, T, A may be decomposed into I // And I ⊥ ,T // And T ⊥ ,A // And A ⊥ Where// and t denote that the component is parallel or perpendicular to the plane of incidence. Thus, the degree of polarization of I, T, A can be defined as
Step 1.5, calculating the polarization angle to be by using the formula (3)Is->
In the formula (3), the amino acid sequence of the compound,representing the direction of a polarizer for transmitting a component parallel to the plane of incidence, an
In this embodiment, because foggy images provided by existing standard datasets cannot satisfy the special requirements of the polarization-based synthetic dataset generation pipeline, while the Foggy Cityscapes-DBF dataset meets all the requirements, the invention uses the haze-free image J and the depth map d provided by that dataset, randomly generates the scattering coefficient β and the global atmospheric light A_∞, generates DoP_T from the semantic segmentation map S, and finally computes the foggy image I.
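The random mapping g from semantic classes to transmitted-light DoP described above might be sketched as follows. `random_dop_map` and its defaults are our own naming; the [0.025, 0.2] range is taken from this embodiment:

```python
import numpy as np

def random_dop_map(seg, lo=0.025, hi=0.2, seed=0):
    """Sketch of g: DoP_T = g(S).

    Assigns each semantic class in the segmentation map seg a random
    transmitted-light degree of polarization drawn from [lo, hi), so
    that all pixels of the same class share one DoP value.
    """
    rng = np.random.default_rng(seed)
    lut = {int(label): rng.uniform(lo, hi) for label in np.unique(seg)}
    return np.vectorize(lambda l: lut[int(l)])(seg)
```

Tying DoP_T to semantic classes gives the synthetic data spatially coherent polarization cues, instead of per-pixel noise.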
Step 2, constructing a polarized image defogging model based on 3D convolution based on a U-Net architecture, wherein the method comprises the following steps: a POL-3D encoder, a spatial redundancy reduction module SSR, and a POL decoder;
step 2.1, constructing a POL-3D encoder consisting of M 3D convolution layers, where the m-th 3D convolution layer sequentially comprises: a convolution layer, an instance normalization layer and a ReLU activation layer; the kernel size of each convolution layer is a tuple of 3 integers, giving the kernel size along the depth, height and width dimensions; in this embodiment, M is 5; when m = 1 or 5, the convolution kernel size is set to (3, 3, 3), and when m = 2, 3, 4, it is set to (2, 3, 3).
step 2.2, performing a dimension-raising operation on the polarized images with different polarization angles and then fusing them, thereby obtaining a 4-dimensional feature map whose 4 dimensions are channel number, polarization angle, image height and image width; the 4-dimensional feature map is input into the POL-3D encoder and passes sequentially through the M 3D convolution layers, yielding M feature maps with different channel numbers, polarization angles, heights and widths; in this embodiment, four polarized images with different polarization angles (0°, 45°, 90°, 135°) are used as input, and the high-dimensional feature map can be expressed as a four-dimensional tensor C × P × H × W, with initial values C = 3, P = 4, H = W = 256. The numbers of output channels of the 3D convolution layers are set to 64, 128, 256 and 512, respectively.
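The input tensor layout described above can be illustrated directly (NumPy sketch; real inputs would be the four captured polarized images rather than zeros):

```python
import numpy as np

# four polarized images of the same scene (0, 45, 90 and 135 degrees),
# each stored channels-first as 3 x 256 x 256, stacked along a new
# polarization axis to form the 4-D encoder input C x P x H x W
angles = [0, 45, 90, 135]
imgs = [np.zeros((3, 256, 256), dtype=np.float32) for _ in angles]
x = np.stack(imgs, axis=1)
print(x.shape)  # (3, 4, 256, 256)
```

A 3D convolution kernel of size (2, 3, 3) or (3, 3, 3) then slides over the polarization, height and width axes jointly, which is how the encoder exploits correlation between the grouped polarized images.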
Step 2.3, constructing a space redundancy reduction module SSR, which consists of M octave convolution layers and corresponding maximum pooling layers, wherein each octave convolution layer comprises a preprocessing block First OctConv, an octave convolution block OctConv and a post-processing block Last OctConv; in this embodiment, the number of input/output channels of the 5 octave convolution layers is (64, 64), (128 ), (256,256), (512 ), and the convolution kernel sizes of the preprocessing block, the octave convolution block, and the post-processing block are (1, 1), (3, 3), (3, 3), respectively.
Step 2.3.1, the preprocessing block in the mth octave convolution layer is composed of two branches, wherein one branch is composed of a convolution layer and an example normalization layer in sequence and is used for decomposing high-frequency characteristics; the other branch sequentially consists of an average pooling layer, a convolution layer and an example normalization layer and is used for decomposing the low-frequency characteristics; the convolution kernels of the two convolution layers are the same in size, and the output channels are different in number; m is M; in this embodiment, the convolution kernels of the two convolution layers are (3, 3), the number of output channels is controlled by a factor α, and the number of output channels of the convolution layer that decomposes the high frequency characteristic is αc out The number of output channels of the convolution layer which decomposes the low frequency characteristic is (1-alpha) c out Alpha is 0.5, the convolution kernel size of the average pooling layer is (1, 2), and the convolution step size is (1, 2).
the m-th feature map passes through the two branches of the preprocessing block in the m-th octave convolution layer, which output the m-th high-frequency feature map and the m-th low-frequency feature map, respectively;
step 2.3.2, the octave convolution block in the mth octave convolution layer consists of four convolution layers, two example normalization layers, an average pooling layer and an up-sampling layer;
the mth high-frequency characteristic diagramThe mth high-frequency to high-frequency characteristic diagram is obtained after the treatment of the first convolution layer in the mth octave convolution layers>At the same time, mth high-frequency characteristic diagram +.>After being processed by an average pooling layer, the processed data is input into a second convolution layer for processing to obtain an mth high-frequency to low-frequency characteristic diagram +.>
The mth low-frequency characteristic diagramRespectively processing the first convolution layer to obtain the m low-frequency to low-frequency characteristic diagram +.>At the same time, mth low-frequency characteristic diagram +.>Sequentially inputting a fourth convolution layer, and processing in an up-sampling layer to obtain an mth low-frequency to high-frequency characteristic diagram ++>In the present embodimentIn the example, the average pooled convolution kernel size is (1, 2) and the convolution step size is (1, 2). The up-sampled amplification factor is (1, 2) and the algorithm uses the nearest algorithm.
The mth high-to-high feature map and the mth low-to-high feature map are fused and input into the first instance normalization layer, which outputs the mth high-frequency feature map;
the mth low-to-low feature map and the mth high-to-low feature map are fused and input into the second instance normalization layer, which outputs the mth low-frequency feature map.
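The four-path information exchange of step 2.3.2 can be sketched in NumPy. The (1, 2) pooling and up-sampling act along the width axis only, and the f_* callables stand in for the four convolution layers (all names here are illustrative, not from the patent):

```python
import numpy as np

def avg_pool_w(x):
    """(1, 2) average pooling: halve the last (width) axis."""
    return x.reshape(*x.shape[:-1], -1, 2).mean(axis=-1)

def upsample_w(x):
    """(1, 2) nearest-neighbour up-sampling: double the last (width) axis."""
    return np.repeat(x, 2, axis=-1)

def octave_exchange(x_high, x_low, f_hh, f_hl, f_ll, f_lh):
    """Cross-frequency exchange of an octave convolution block.

    Returns the fused high- and low-frequency outputs that would then be
    passed through the two instance normalization layers."""
    y_hh = f_hh(x_high)               # first conv: high -> high
    y_hl = f_hl(avg_pool_w(x_high))   # pool, then second conv: high -> low
    y_ll = f_ll(x_low)                # third conv: low -> low
    y_lh = upsample_w(f_lh(x_low))    # fourth conv, then up-sample: low -> high
    return y_hh + y_lh, y_ll + y_hl
```

The low-frequency path runs at half the width of the high-frequency path, which is where the spatial redundancy reduction comes from: half the features are computed on a smaller grid.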
Step 2.3.3, the post-processing block in the mth octave convolution layer consists of two convolution layers, an up-sampling layer, and an instance normalization layer;
the mth high-frequency feature map passes through one convolution layer of the post-processing block to obtain the mth high-to-high feature map;
the mth low-frequency feature map passes through the other convolution layer of the post-processing block and is then input into the up-sampling layer to obtain the mth low-to-high feature map;
the mth high-to-high feature map and the mth low-to-high feature map are fused and processed by the instance normalization layer to obtain the mth feature map.
The mth feature map is input into the max pooling layer corresponding to the mth octave convolution layer for processing to obtain the mth effective feature map F_m; thus, after processing by the spatial redundancy reduction module SSR, M effective feature maps F_1, F_2, …, F_m, …, F_M are obtained from the M feature maps of different channel numbers, polarization angles, heights, and widths;
Step 2.4, constructing a POL decoder consisting of M deconvolution layers; the low-, middle-, and high-level feature maps generated by the POL-3D encoder and processed by the spatial redundancy reduction module are combined with the output of each deconvolution layer using a bilinear interpolation function, so as to output the final defogging prediction map;
when m = 1, 2, …, M−1, each deconvolution layer comprises one 2D convolution layer and one instance normalization layer; when m = M, the deconvolution layer comprises only one 2D convolution layer; in this embodiment, M = 5, the numbers of input/output channels of the 5 deconvolution layers are (512, 512), (1024, 256), (512, 128), (256, 64), and (128, 3), respectively, the convolution kernel size is 3, and the convolution stride is 1.
When m = M, the Mth feature map F_M is processed by the bilinear interpolation function and input into the 1st deconvolution layer, which outputs the 1st deconvolution-layer feature map F'_1;
when m = M−1, M−2, …, 2, the mth feature map F_m is fused with the (M−m)th deconvolution-layer feature map F'_{M−m}; the fused feature map is processed by the bilinear interpolation function and then input into the (M−m+1)th deconvolution layer, which outputs the (M−m+1)th deconvolution-layer feature map F'_{M−m+1};
when m = 1, the Mth deconvolution layer processes the 1st feature map, and the resulting Mth deconvolution-layer feature map F'_M is the final defogging prediction map.
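The skip-connection pairing of step 2.4 can be summarized programmatically; this sketch (names illustrative, not from the patent) returns, for each deconvolution layer, which encoder effective feature map F_m and which earlier decoder map F'_j are fused at its input:

```python
def decoder_schedule(M):
    """Map each deconvolution layer index i (1-based) to the encoder feature
    map F_m and previous decoder feature map F'_j fused at its input
    (None means the layer takes the encoder map alone)."""
    schedule = {1: ("F_%d" % M, None)}        # layer 1: F_M alone
    for m in range(M - 1, 0, -1):             # m = M-1, ..., 1
        i = M - m + 1                         # layer i receives F_m fused with F'_{M-m}
        schedule[i] = ("F_%d" % m, "F'_%d" % (M - m))
    return schedule
```

For the M = 5 embodiment, layer 2 fuses F_4 with F'_1 and layer 5 fuses F_1 with F'_4, whose output is the defogging prediction map.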
Step 3, training a polarized image defogging model based on 3D convolution;
Based on the polarized images and the corresponding real haze-free images, the 3D-convolution-based polarized image defogging model is trained with the ADAM optimizer, using the mean absolute error (L1 loss) as the loss function; the loss between the defogging prediction map and the real haze-free map is computed to update the model parameters until the loss function converges, yielding the optimal 3D-convolution-based polarized image defogging model, which is then used to defog both synthesized polarized hazy images and real captured polarized hazy images. In this embodiment, the network is trained for 300 epochs; the initial learning rate is set to 1×10⁻⁴ and decays by a factor of 0.5 every 50 epochs.
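The loss and learning-rate schedule of this embodiment reduce to a few lines (a sketch; function names are illustrative, not from the patent):

```python
import numpy as np

def l1_loss(pred, target):
    """Mean absolute error between the defogging prediction and the ground truth."""
    return np.abs(pred - target).mean()

def learning_rate(epoch, base_lr=1e-4, decay=0.5, step=50):
    """Step schedule: the learning rate is multiplied by `decay` every `step` epochs."""
    return base_lr * decay ** (epoch // step)
```

Over the 300 training epochs this schedule halves the rate five times, ending at 1×10⁻⁴ × 0.5⁵.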
In this embodiment, 8925 polarized images at 4 different polarization angles (0°, 45°, 90°, 135°) generated by the polarized image generation pipeline, together with their corresponding real haze-free images, are used for training. During training, the input polarized images are randomly cropped to 256×256; the L1 loss is computed between the defogged image output by the 3D-convolution-based defogging model and the ground-truth haze-free image J, and the computed loss, combined with the ADAM optimizer, guides the training of the network, yielding the 3D-convolution-based polarized image defogging model.
In this embodiment, an electronic device comprises a memory and a processor, where the memory stores a program supporting the processor in executing the above method, and the processor is configured to execute the program stored in the memory.
In this embodiment, a computer-readable storage medium stores a computer program that, when executed by a processor, performs the steps of the method described above.
Table 1 shows the performance of the 3D-convolution-based polarized image defogging method of the invention, using PSNR and SSIM as evaluation indices. For a fair comparison, all defogging methods were retrained on the synthetic dataset; during testing, the present invention and the other defogging methods used polarized images and ordinary RGB images, respectively. PSNR, the peak signal-to-noise ratio, is the ratio of the maximum power of a signal to the power of the noise that may affect the fidelity of its representation; a larger value indicates less distortion in image defogging. SSIM, i.e. structural similarity, examines the similarity of images in terms of brightness, contrast, and structure; its value ranges over [0, 1], and the larger the value, the more similar the defogged image is to the real haze-free image. The quantitative analysis in Table 1 shows that the method of the present invention achieves the best results on both criteria.
TABLE 1
Methods | PSNR(↑) | SSIM(↑) |
AOD-Net | 20.58 | 0.80 |
PFFNet | 28.63 | 0.89 |
GridDehazeNet | 29.23 | 0.91 |
4KDehazing-Net | 29.47 | 0.90 |
FFA-Net | 29.93 | 0.92 |
GCANet | 29.98 | 0.91 |
Ours | 30.21 | 0.92 |
FIG. 3 shows the results of the 3D-convolution-based polarized image defogging method of the present invention and other current defogging methods on the synthetic dataset. Here, Ours denotes the 3D-convolution-based polarized image defogging method; AOD-Net, based on a reformulation of the atmospheric scattering model, proposes an end-to-end trainable defogging network that replaces the two key unknowns of the atmospheric scattering model with a single one; PFFNet, inspired by the end-to-end defogging idea, adopts a U-Net-based network and adds a ResNet-based transformation module between the encoder and decoder to improve complex feature learning at different levels; GridDehazeNet, inspired by grid-based image segmentation networks, proposes another end-to-end trainable network independent of the atmospheric model, comprising a preprocessing module, an attention-based multi-scale backbone, and a post-processing module, where the attention mechanism helps the backbone adjust the contributions of multi-scale feature fusion by activating/deactivating parts of the network; 4KDehazing-Net proposes a multi-guided bilateral learning framework for 4K-resolution image defogging that can process 4K (3840×2160) images at 125 frames per second; FFA-Net combines channel attention and pixel attention and emphasizes residual learning and feature fusion; GCANet uses smoothed dilated convolutions and a gating network to aggregate context information with multi-level features.
Fig. 4 shows the results of the 3D-convolution-based polarized image defogging method of the present invention and other current defogging methods on the real dataset. Compared with the results on synthetic data, the method of the present invention shows an even more significant advantage on the real dataset. This is because real-world scattering is a complex, spatially varying physical process; it is difficult to learn such spatial variation from ordinary RGB images, and the lack of physics-based learned features makes these methods prone to artifacts at pixels subject to large spatial variation. In contrast, the method of the present invention alleviates this problem by mining the correlation between the polarized images at the four different polarization angles.
Claims (5)
1. The polarized image defogging method based on 3D convolution is characterized by comprising the following steps of:
step 1, acquiring a synthesized polarized image dataset;
step 1.1, acquiring a haze-free image J(z) together with its scene depth map d(z) and semantic segmentation map S;
step 1.2, randomly assigning, within certain ranges, the atmospheric scattering factor β, the global atmospheric light A_∞, and the degree of polarization of the global atmospheric light DoP_A, thereby generating the hazy image I(z) at pixel point z using equation (1):
I(z) = T(z) + A(z) = J(z)t(z) + A_∞(1 − t(z))    (1)
in equation (1), z represents the spatial coordinates of the pixel, T(z) and A(z) respectively represent the transmitted light and the atmospheric light at pixel point z, J(z) represents the haze-free image, and t(z) represents the transmission map at pixel point z, with t(z) = e^(−βd(z)), where d(z) represents the scene depth map at pixel point z;
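Equation (1) can be checked with a direct NumPy implementation (a sketch; the function name and broadcasting choice are illustrative, not from the patent):

```python
import numpy as np

def synthesize_haze(J, d, beta, A_inf):
    """Equation (1): I(z) = J(z) t(z) + A_inf (1 - t(z)), with t(z) = exp(-beta d(z))."""
    t = np.exp(-beta * d)
    if J.ndim == t.ndim + 1:      # broadcast the transmission map over colour channels
        t = t[..., None]
    return J * t + A_inf * (1.0 - t)
```

At zero depth the transmission is 1 and I(z) = J(z); as βd(z) grows, the transmission decays and I(z) tends to the global atmospheric light A_∞.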
step 1.3, calculating the degree of polarization of the transmitted light T as DoP_T = g(S), where S represents the semantic segmentation map and g represents a random mapping function;
step 1.4, calculating the polarization degree DoP of the foggy day image I by using the formula (2):
I·DoP=T·DoP T +A·DoP A (2)
in formula (2), doP A The polarization degree of the atmospheric light a;
step 1.5, calculating the polarized image at each polarization angle using equation (3);
in equation (3), the angle represents the direction of a polarizer that transmits the component parallel to the plane of incidence;
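The patent's equation (3) is not reproduced in this text; the standard relation for the intensity behind a linear polarizer at angle φ, given the total intensity I, degree of polarization DoP, and angle of polarization, is shown below as an assumption standing in for it (all names illustrative):

```python
import numpy as np

def polarized_component(I, dop, aop, phi):
    """Intensity measured through a linear polarizer oriented at angle phi (radians):
    I_phi = (I / 2) * (1 + dop * cos(2 * (phi - aop))).
    This Malus-type relation is an assumption standing in for the patent's equation (3)."""
    return 0.5 * I * (1.0 + dop * np.cos(2.0 * (phi - aop)))
```

Pairs of orthogonal angles reconstruct the total intensity (I_0 + I_90 = I), which is why the four angles 0°, 45°, 90°, 135° suffice to recover I, DoP, and the angle of polarization.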
Step 2, constructing a 3D-convolution-based polarized image defogging model based on the U-Net architecture, which comprises: a POL-3D encoder, a spatial redundancy reduction module SSR, and a POL decoder;
step 2.1, constructing the POL-3D encoder composed of M 3D convolution layers, where the mth 3D convolution layer sequentially comprises: a convolution layer, an instance normalization layer, and a ReLU activation function layer; the kernel size of each convolution layer is a tuple of 3 integers, representing the kernel sizes along the depth, height, and width dimensions;
step 2.2, performing a dimension-raising operation on the polarized images at the different polarization angles and then fusing them to obtain a 4-dimensional high-dimensional feature map; the 4-dimensional high-dimensional feature map is input into the POL-3D encoder and passes through the M 3D convolution layers in sequence to obtain M feature maps of different channel numbers, polarization angles, heights, and widths, where the 4 dimensions of the high-dimensional feature map are the number of channels, the polarization angle, the image height, and the image width;
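The dimension-raising and fusion of step 2.2 amounts to stacking the four single-angle images along a new polarization axis (a sketch; the name is illustrative, not from the patent):

```python
import numpy as np

def stack_polarization(images):
    """Stack P single-angle images of shape (C, H, W) into one (C, P, H, W)
    feature map, whose polarization axis is consumed by the 3D convolutions
    of the POL-3D encoder."""
    return np.stack(images, axis=1)
```

A (depth, height, width) 3D kernel then convolves jointly across the polarization-angle axis and the spatial axes, which is how correlations between the four polarization angles are exploited.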
step 2.3, after the spatial redundancy reduction module SSR processes the M feature maps, M effective feature maps F_1, F_2, …, F_m, …, F_M are obtained;
step 2.4, the POL decoder processes the M effective feature maps F_1, F_2, …, F_m, …, F_M and outputs the final defogging prediction map;
step 3, training a polarized image defogging model based on 3D convolution;
training the 3D-convolution-based polarized image defogging model with the ADAM optimizer based on the polarized images and the corresponding real haze-free images, using the mean absolute error L1 loss as the loss function, and computing the loss between the defogging prediction map and the real haze-free map to update the model parameters until the loss function converges, thereby obtaining the optimal 3D-convolution-based polarized image defogging model, which performs defogging on synthesized polarized hazy images and real captured polarized hazy images.
2. The polarized image defogging method based on 3D convolution according to claim 1, wherein the spatial redundancy reduction module SSR in the step 2.3 is composed of M octave convolution layers and corresponding max pooling layers, each octave convolution layer includes a preprocessing block, an octave convolution block and a post-processing block, and is processed according to the following steps:
step 2.3.1, the preprocessing block in the mth octave convolution layer consists of two branches: one branch consists, in order, of a convolution layer and an instance normalization layer and is used for decomposing the high-frequency features; the other branch consists, in order, of an average pooling layer, a convolution layer, and an instance normalization layer and is used for decomposing the low-frequency features; the two convolution layers have the same kernel size but different numbers of output channels; m = 1, 2, …, M;
the mth feature map passes through the two branches of the preprocessing block in the mth octave convolution layer, which correspondingly output the mth high-frequency feature map and the mth low-frequency feature map;
step 2.3.2, the octave convolution block in the mth octave convolution layer consists of four convolution layers, two instance normalization layers, an average pooling layer, and an up-sampling layer;
the mth high-frequency feature map is processed by the first convolution layer of the octave convolution block in the mth octave convolution layer to obtain the mth high-to-high feature map; at the same time, the mth high-frequency feature map is processed by the average pooling layer and then input into the second convolution layer to obtain the mth high-to-low feature map;
the mth low-frequency feature map is processed by the third convolution layer to obtain the mth low-to-low feature map; at the same time, the mth low-frequency feature map is input, in order, into the fourth convolution layer and the up-sampling layer to obtain the mth low-to-high feature map;
the mth high-to-high feature map and the mth low-to-high feature map are fused and input into the first instance normalization layer, which outputs the mth high-frequency feature map;
the mth low-to-low feature map and the mth high-to-low feature map are fused and input into the second instance normalization layer, which outputs the mth low-frequency feature map;
step 2.3.3, the post-processing block in the mth octave convolution layer consists of two convolution layers, an up-sampling layer, and an instance normalization layer;
the mth high-frequency feature map passes through one convolution layer of the post-processing block to obtain the mth high-to-high feature map;
the mth low-frequency feature map passes through the other convolution layer of the post-processing block and is then input into the up-sampling layer to obtain the mth low-to-high feature map;
the mth high-to-high feature map and the mth low-to-high feature map are fused and processed by the instance normalization layer to obtain the mth feature map;
the mth feature map is input into the max pooling layer corresponding to the mth octave convolution layer for processing to obtain the mth effective feature map F_m; thus, after processing by the spatial redundancy reduction module SSR, M effective feature maps F_1, F_2, …, F_m, …, F_M are obtained from the M feature maps of different channel numbers, polarization angles, heights, and widths.
3. The polarized image defogging method based on 3D convolution according to claim 2, wherein the POL decoder in step 2.4 is composed of M deconvolution layers;
when m = 1, 2, …, M−1, each deconvolution layer comprises one 2D convolution layer and one instance normalization layer; when m = M, the deconvolution layer comprises only one 2D convolution layer;
when m = M, the Mth feature map F_M is processed by the bilinear interpolation function and input into the 1st deconvolution layer, which outputs the 1st deconvolution-layer feature map F'_1;
when m = M−1, M−2, …, 2, the mth feature map F_m is fused with the (M−m)th deconvolution-layer feature map F'_{M−m}; the fused feature map is processed by the bilinear interpolation function and then input into the (M−m+1)th deconvolution layer, which outputs the (M−m+1)th deconvolution-layer feature map F'_{M−m+1};
when m = 1, the Mth deconvolution layer processes the 1st feature map to obtain the Mth deconvolution-layer feature map F'_M, which is the final defogging prediction map.
4. An electronic device comprising a memory and a processor, wherein the memory is configured to store a program for supporting the processor to perform the polarized image defogging method of any of claims 1-3, the processor being configured to execute the program stored in the memory.
5. A computer readable storage medium having stored thereon a computer program, characterized in that the computer program when executed by a processor performs the steps of the polarized image defogging method of any of the claims 1-3.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310390770.0A CN116452450A (en) | 2023-04-07 | 2023-04-07 | Polarized image defogging method based on 3D convolution |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310390770.0A CN116452450A (en) | 2023-04-07 | 2023-04-07 | Polarized image defogging method based on 3D convolution |
Publications (1)
Publication Number | Publication Date |
---|---|
CN116452450A true CN116452450A (en) | 2023-07-18 |
Family
ID=87121490
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202310390770.0A Pending CN116452450A (en) | 2023-04-07 | 2023-04-07 | Polarized image defogging method based on 3D convolution |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN116452450A (en) |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN117911282A (en) * | 2024-03-19 | 2024-04-19 | 华中科技大学 | Construction method and application of image defogging model |
CN117911282B (en) * | 2024-03-19 | 2024-05-28 | 华中科技大学 | Construction method and application of image defogging model |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN112184577B (en) | Single image defogging method based on multiscale self-attention generation countermeasure network | |
CN110517203B (en) | Defogging method based on reference image reconstruction | |
CN111709888B (en) | Aerial image defogging method based on improved generation countermeasure network | |
CN110378849B (en) | Image defogging and rain removing method based on depth residual error network | |
CN110288550B (en) | Single-image defogging method for generating countermeasure network based on priori knowledge guiding condition | |
Hsu et al. | Single image dehazing using wavelet-based haze-lines and denoising | |
CN111861939B (en) | Single image defogging method based on unsupervised learning | |
CN110675340A (en) | Single image defogging method and medium based on improved non-local prior | |
CN111986108A (en) | Complex sea-air scene image defogging method based on generation countermeasure network | |
CN112419163B (en) | Single image weak supervision defogging method based on priori knowledge and deep learning | |
CN113284061B (en) | Underwater image enhancement method based on gradient network | |
CN112070688A (en) | Single image defogging method for generating countermeasure network based on context guidance | |
Bansal et al. | A review of image restoration based image defogging algorithms | |
CN116452450A (en) | Polarized image defogging method based on 3D convolution | |
Hsu et al. | Object detection using structure-preserving wavelet pyramid reflection removal network | |
CN115631223A (en) | Multi-view stereo reconstruction method based on self-adaptive learning and aggregation | |
Zhao et al. | A multi-scale U-shaped attention network-based GAN method for single image dehazing | |
WO2024178979A1 (en) | Single-image defogging method based on detail restoration | |
Babu et al. | An efficient image dahazing using Googlenet based convolution neural networks | |
CN117853370A (en) | Underwater low-light image enhancement method and device based on polarization perception | |
CN117011181A (en) | Classification-guided unmanned aerial vehicle imaging dense fog removal method | |
CN117036182A (en) | Defogging method and system for single image | |
CN116757949A (en) | Atmosphere-ocean scattering environment degradation image restoration method and system | |
CN116703750A (en) | Image defogging method and system based on edge attention and multi-order differential loss | |
CN113724156B (en) | Anti-network defogging method and system combining generation of atmospheric scattering model |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||