CN116452450A - Polarized image defogging method based on 3D convolution - Google Patents

Polarized image defogging method based on 3D convolution

Info

Publication number
CN116452450A
CN116452450A (application CN202310390770.0A)
Authority
CN
China
Prior art keywords
convolution
layer
mth
image
defogging
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310390770.0A
Other languages
Chinese (zh)
Inventor
王昕
付伟
于海潮
高隽
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hefei University of Technology
Original Assignee
Hefei University of Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hefei University of Technology filed Critical Hefei University of Technology
Priority to CN202310390770.0A priority Critical patent/CN116452450A/en
Publication of CN116452450A publication Critical patent/CN116452450A/en
Pending legal-status Critical Current


Classifications

    • G06T 5/73: Deblurring; Sharpening (Image enhancement or restoration)
    • G06N 3/0455: Auto-encoder networks; Encoder-decoder networks
    • G06N 3/0464: Convolutional networks [CNN, ConvNet]
    • G06N 3/084: Backpropagation, e.g. using gradient descent
    • G06T 5/50: Image enhancement or restoration using two or more images, e.g. averaging or subtraction
    • G06T 2207/20081: Training; Learning
    • G06T 2207/20084: Artificial neural networks [ANN]
    • G06T 2207/20221: Image fusion; Image merging
    • Y02A 90/10: Information and communication technologies [ICT] supporting adaptation to climate change, e.g. for weather forecasting or climate simulation

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Biophysics (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Biomedical Technology (AREA)
  • Health & Medical Sciences (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Image Processing (AREA)

Abstract

The invention discloses a polarized image defogging method based on 3D convolution, comprising the following steps: 1. acquiring a synthesized polarized image dataset using a polarized image generation pipeline; 2. constructing a deep convolutional neural network based on 3D convolution that uses polarized images, taking four polarized images with different polarization angles as input, and training the network to obtain a defogging model; 3. defogging the polarized image to be processed with the trained model to obtain a recovered haze-free image. The invention realizes defogging of polarized images based on 3D convolution, effectively improving the defogging effect in complex and changeable scenes and thereby providing clearer images for many high-level vision tasks.

Description

Polarized image defogging method based on 3D convolution
Technical Field
The invention belongs to the fields of computer vision and image processing and analysis, and particularly relates to a defogging method that uses polarized images with a 3D convolutional network.
Background
Fog and haze are common atmospheric phenomena. Under such weather conditions, outdoor air contains a large number of tiny suspended particles that refract and scatter atmospheric light; the refracted and scattered light mixes with the light reflected from the target scene, greatly reducing scene visibility. As a result, the contrast of images captured by outdoor acquisition devices drops noticeably, and the images may even suffer color distortion and severe loss of detail. High-level computer vision tasks such as object detection and image segmentation require high-quality images as input; in foggy or hazy weather, however, the degraded quality of the acquired images greatly hampers these tasks. Image defogging is therefore important.
In recent years, image defogging has received increasing attention from researchers, and many well-performing defogging models have been proposed. Existing frameworks fall broadly into two categories: traditional methods based on handcrafted prior knowledge, and methods based on deep learning. Traditional methods rely on priors derived from statistics of clear images and use the atmospheric scattering model to restore a haze-free image. A well-known example is the dark channel prior (DCP) algorithm, based on the assumption that in a haze-free image the value of each pixel in at least one color channel is close to 0. Although traditional defogging methods have made progress, their assumptions and priors are tied to particular scenes and weather conditions, so their generalization is limited: once the environment changes significantly, the model's defogging ability drops sharply. Deep-learning-based methods train a defogging model on a large amount of training data and evaluate the trained model on test data. They can be divided into two types: one recovers the haze-free image indirectly by using a network to learn the parameters of the atmospheric model, and the other is end-to-end, taking the foggy image as input and directly outputting the haze-free image.
However, these learning-based approaches still have drawbacks. 1. Most deep-learning methods take a single RGB image as input and train and test with the atmospheric scattering model; since two key parameters of the model must be estimated simultaneously, the problem is ill-posed and generalization is poor. 2. To alleviate this ill-posedness and improve generalization, more and more defogging methods based on multiple images have appeared; among them, methods that use polarized images at different polarization angles can make full use of scene information and achieve good results. However, most of these methods rest on the assumption that the transmitted light is not significantly polarized, or require specific cues such as a sky region or similar objects, so their defogging performance degrades on real-world foggy images.
Disclosure of Invention
To overcome the shortcomings of the prior art, the invention provides a polarized image defogging method based on 3D convolution, so as to improve the quality of images captured in foggy environments and the defogging effect in complex and changeable scenes, thereby meeting the image-quality requirements of high-level vision tasks and providing them with clearer images.
In order to solve the technical problems, the invention adopts the following technical scheme:
the invention discloses a polarized image defogging method based on 3D convolution, which is characterized by comprising the following steps:
step 1, acquiring a synthesized polarized image dataset;
step 1.1, acquiring a haze-free image J(z) with a scene depth map d(z) and a semantic segmentation map S;
step 1.2, randomly assigning, within given ranges, the atmospheric scattering factor β, the global atmospheric light A_∞, and the degree of polarization of the global atmospheric light DoP_A, thereby generating the foggy image I(z) at pixel z using equation (1):
I(z) = T(z) + A(z) = J(z)t(z) + A_∞(1 − t(z))   (1)
In equation (1), z denotes the spatial coordinates of a pixel; T(z) and A(z) denote the transmitted light and the atmospheric light at pixel z; J(z) denotes the haze-free image; t(z) denotes the transmission map at pixel z, with t(z) = e^(−βd(z)), where d(z) denotes the scene depth at pixel z;
step 1.3, calculating the degree of polarization of the transmitted light T as DoP_T = g(S), where S denotes the semantic segmentation map and g denotes a random mapping function;
step 1.4, calculating the degree of polarization DoP of the foggy image I using equation (2):
I·DoP = T·DoP_T + A·DoP_A   (2)
In equation (2), DoP_A denotes the degree of polarization of the atmospheric light A;
step 1.5, calculating the polarized image I_φ at polarization angle φ using equation (3):
I_φ(z) = (I(z)/2)·[1 + DoP(z)·cos(2(φ − φ_∥))]   (3)
In equation (3), φ_∥ denotes the direction of the polarizer that transmits the component parallel to the plane of incidence;
Step 2, constructing a polarized image defogging model based on 3D convolution based on a U-Net architecture, wherein the method comprises the following steps: a POL-3D encoder, a spatial redundancy reduction module SSR, and a POL decoder;
step 2.1, constructing the POL-3D encoder from M 3D convolution layers, where the mth 3D convolution layer consists, in order, of: a convolution layer, an instance normalization layer, and a ReLU activation layer; the kernel size of each convolution layer is a tuple of 3 integers giving the kernel size along the depth, height and width dimensions;
step 2.2, performing dimension-lifting and fusion operations on the polarized images with different polarization angles to obtain a 4-dimensional feature map whose dimensions are channel number, polarization angle, image height and image width; this high-dimensional feature map is input into the POL-3D encoder and passes through the M 3D convolution layers in sequence, yielding M feature maps with different channel numbers, polarization angles, heights and widths;
step 2.3, after the spatial redundancy reduction module SSR processes the M feature maps, M effective feature maps F_1, F_2, …, F_m, …, F_M are obtained;
step 2.4, the POL decoder processes the M effective feature maps F_1, F_2, …, F_m, …, F_M and outputs the final defogging prediction map;
step 3, training a polarized image defogging model based on 3D convolution;
based on the polarized images and the corresponding real haze-free images, the 3D-convolution-based polarized image defogging model is trained with the ADAM optimizer, using the mean absolute error (L1Loss) as the loss function; the loss between the defogging prediction map and the real haze-free image is computed to update the model parameters until the loss function converges, yielding the optimal 3D-convolution-based polarized image defogging model, which is used to defog both synthesized polarized foggy images and real captured polarized foggy images.
The polarized image defogging method based on 3D convolution is also characterized in that the spatial redundancy reduction module SSR in step 2.3 consists of M octave convolution layers and corresponding maximum pooling layers, each octave convolution layer comprising a preprocessing block, an octave convolution block and a post-processing block, processed as follows:
step 2.3.1, the preprocessing block in the mth octave convolution layer consists of two branches: one branch consists, in order, of a convolution layer and an instance normalization layer and decomposes the high-frequency features; the other consists, in order, of an average pooling layer, a convolution layer and an instance normalization layer and decomposes the low-frequency features; the two convolution layers have the same kernel size but different numbers of output channels; m = 1, 2, …, M;
the mth feature map passes through the two branches of the preprocessing block in the mth octave convolution layer, which correspondingly output the mth high-frequency feature map X_m^H and the mth low-frequency feature map X_m^L;
Step 2.3.2, the octave convolution block in the mth octave convolution layer consists of four convolution layers, two example normalization layers, an average pooling layer and an up-sampling layer;
the mth high-frequency characteristic diagramThe mth high-frequency to high-frequency characteristic diagram ++is obtained after the treatment of the first convolution layer of the octave convolution blocks in the mth octave convolution layer>At the same time, the mth high-frequency characteristic diagram +.>After being processed by an average pooling layer, the processed data are input into a second convolution layer for processing to obtain an mth high-frequency to low-frequency characteristic diagram +.>
The mth low-frequency characteristic diagramAfter the treatment of the third convolution layer, the mth low-frequency to low-frequency characteristic diagram +.>At the same time, the mth low-frequency characteristic diagram +.>Sequentially inputting into a fourth convolution layer and an up-sampling layer for processing to obtain an mth low-frequency to high-frequency characteristic diagram ++>
Will beAnd->After fusion, input into the first example normalization layer, and output the mth high frequency feature map +.>
Will beAnd->After fusion, the obtained product is input into a second example normalization layer, and an mth low-frequency characteristic diagram is output
step 2.3.3, the post-processing block in the mth octave convolution layer consists of two convolution layers, an up-sampling layer and an instance normalization layer;
the mth high-frequency characteristic diagramAfter passing through a convolution layer of the post-processing block, the mth high-frequency to high-frequency characteristic diagram is obtained>
The mth low-frequency characteristic diagramAfter passing through another convolution layer of the post-processing block, the result is input into an up-sampling layer to obtain the m-th low-frequency to high-frequency characteristic diagram +.>
And->After fusion, the mth feature map ++is obtained after the treatment of an example normalization layer>
Mth feature mapInputting the m-th octave convolution layer to the corresponding maximum pooling for processing to obtain an m-th effective feature map F m The method comprises the steps of carrying out a first treatment on the surface of the Thereby obtaining M effective feature maps F from the feature maps of M different channel numbers, polarization angles, heights and widths after the processing of the space redundancy reduction module SSR 1 ,F 2 ,…,F m ,…,F M
The POL decoder in step 2.4 consists of M deconvolution layers;
when m = 1, 2, …, M−1, each deconvolution layer comprises one 2D convolution layer and one instance normalization layer; when m = M, the deconvolution layer comprises only one 2D convolution layer;
when m=m, the mth feature map F M Processing by using bilinear interpolation function, inputting into 1 st deconvolution layer, and outputting 1 st deconvolution layer characteristic diagram F' 1
When m=m-1, M-2, …,2, the mth feature map F m With the M-M deconvolution layer feature map F' M-m After fusion, the fused feature images are processed by using a bilinear interpolation function and then are input into an M-m+1th deconvolution layer, and an M-m+1th deconvolution layer feature image F 'is output' M-m+1
When m=1, the Mth deconvolution layer processes the 1 st feature map to obtain an Mth deconvolution layer feature map F' m And the final defogging prediction graph is obtained.
The electronic device of the invention comprises a memory and a processor, wherein the memory stores a program that supports the processor in executing any of the above polarized image defogging methods, and the processor is configured to execute the program stored in the memory.
The invention relates to a computer readable storage medium, on which a computer program is stored, characterized in that the computer program, when being run by a processor, performs the steps of any one of the polarized image defogging methods.
Compared with the prior art, the invention has the beneficial effects that:
1. By constructing a 3D-convolution-based polarized image deep neural network, using polarized images with different polarization angles as input, and combining them with the 3D convolutional encoder, the invention solves the problem that traditional multi-image defogging networks, designed with 2D convolutions, ignore the correlation between grouped images, thereby improving the quality of the restored haze-free image.
2. The 3D-convolution-based polarized image deep neural network constructed by the invention introduces polarization information: using multiple polarized images of the same scene at different polarization angles provides richer scene information and overcomes the poor generalization caused by relying on features extracted from a single input image, thereby improving the generalization of the defogging network and allowing it to adapt to complex and changeable environments.
3. The 3D-convolution-based polarized image deep neural network constructed by the invention introduces the spatial redundancy reduction module SSR based on octave convolution, which decomposes the feature maps output by the convolution layers into features of different spatial frequencies; through information sharing between adjacent positions, the spatial resolution of the low-frequency group can be safely reduced, solving the spatial redundancy caused by the encoder's dense parameters, reducing the network's parameter count and making the network lighter.
4. The 3D-convolution-based polarized image deep neural network constructed by the invention aggregates the feature map output by each encoder layer, after the spatial redundancy reduction module, with the decoder feature maps to obtain a more refined prediction, thereby improving the defogging effect.
Drawings
FIG. 1 is a flow chart of defogging using polarized images based on a 3D convolutional defogging network in accordance with the present invention;
FIG. 2 is a schematic diagram of a polarized image defogging depth neural network based on 3D convolution;
FIG. 3 is a graph of defogging results of the method of the present invention and other defogging methods on a synthetic dataset;
FIG. 4 is a graph of defogging results of the method of the present invention and other defogging methods on a real-world dataset.
Detailed Description
In this embodiment, a polarized image defogging method based on 3D convolution addresses two problems of existing networks: the lack of polarized datasets, and the difficulty of extracting useful information from grouped polarized images (multiple images captured at different polarization angles from the same viewpoint). By constructing a 3D-convolution-based polarized image deep neural network, a defogging model is obtained that defogs effectively without specific cues, improving the quality of images captured in foggy environments and meeting the image requirements of high-level vision tasks. Specifically, as shown in fig. 1, the steps are as follows:
step 1, acquiring a synthesized polarized image dataset;
step 1.1, selecting a suitable original dataset for synthesizing the polarized image dataset;
the original dataset must meet two requirements: (1) it provides a haze-free image J(z) with a scene depth map d(z); (2) it provides a semantic segmentation map S;
step 1.2, randomly assigning, within given ranges, the atmospheric scattering factor β, the global atmospheric light A_∞, and the degree of polarization of the global atmospheric light DoP_A, thereby generating the foggy image I(z) at pixel z using equation (1):
I(z) = T(z) + A(z) = J(z)t(z) + A_∞(1 − t(z))   (1)
In equation (1), z denotes the spatial coordinates of a pixel; T(z) and A(z) denote the transmitted light and the atmospheric light at pixel z; J(z) denotes the haze-free image; t(z) denotes the transmission map at pixel z, with t(z) = e^(−βd(z)), where d(z) denotes the scene depth at pixel z. In this embodiment, the atmospheric scattering factor β takes values in [0.01, 0.02], the global atmospheric light A_∞ in [0.85, 0.95], and the degree of polarization of the global atmospheric light DoP_A in [0.05, 0.4].
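The synthesis of step 1.2 can be sketched in a few lines of NumPy (a minimal illustration; the toy array shapes, random seed and variable names are our own, not part of the patent):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy inputs: J is the haze-free image, d the scene depth map.
H, W = 4, 4
J = rng.uniform(0.0, 1.0, (H, W, 3))   # haze-free image J(z)
d = rng.uniform(10.0, 100.0, (H, W))   # scene depth map d(z)

# Random assignment within the ranges stated in the embodiment
beta = rng.uniform(0.01, 0.02)         # atmospheric scattering factor beta
A_inf = rng.uniform(0.85, 0.95)        # global atmospheric light A_inf
dop_A = rng.uniform(0.05, 0.40)        # degree of polarization of A_inf

t = np.exp(-beta * d)                  # transmission map t(z) = e^(-beta*d(z))
T = J * t[..., None]                   # transmitted light T(z) = J(z)t(z)
A = A_inf * (1.0 - t[..., None])       # atmospheric light A(z) = A_inf(1 - t(z))
I = T + A                              # foggy image, equation (1)
```

Sampling β, A_∞ and DoP_A anew for each image, as the stated ranges suggest, yields foggy images of varying density from a single clear image and depth map.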
Step 1.3, calculating the degree of polarization DoP of the transmitted light T T =g (S), where S represents a semantic segmentation map and is provided by the original dataset, g represents a random mapping function; in the present embodiment, the polarization degree DoP of transmitted light T The value range of (5) is [0.025,0.2 ]]。
Step 1.4, calculating the polarization degree DoP of the foggy day image I by using the formula (2):
I·DoP=T·DoP T +A·DoP A (2)
in formula (2), doP A The polarization degree of the atmospheric light a; in this embodiment, I, T, A may be decomposed into I // And I ,T // And T ,A // And A Where// and t denote that the component is parallel or perpendicular to the plane of incidence. Thus, the degree of polarization of I, T, A can be defined as
Step 1.5, calculating the polarization angle to be by using the formula (3)Is->
In the formula (3), the amino acid sequence of the compound,representing the direction of a polarizer for transmitting a component parallel to the plane of incidence, an
In this embodiment, because of the special requirements of the polarization-based synthetic dataset generation pipeline, the foggy images provided by existing standard datasets cannot be used to generate the dataset of the invention; the Foggy Cityscapes-DBF dataset meets all the requirements, so the invention uses the haze-free image J and the depth map d(z) provided by that dataset, randomly generates the scattering coefficient β and the global atmospheric light A_∞, generates DoP_T using the semantic segmentation map S, and finally computes the foggy image I.
Step 2, constructing a polarized image defogging model based on 3D convolution based on a U-Net architecture, wherein the method comprises the following steps: a POL-3D encoder, a spatial redundancy reduction module SSR, and a POL decoder;
step 2.1, constructing the POL-3D encoder from M 3D convolution layers, where the mth 3D convolution layer consists, in order, of: a convolution layer, an instance normalization layer, and a ReLU activation layer; the kernel size of each convolution layer is a tuple of 3 integers giving the kernel size along the depth, height and width dimensions. In this embodiment M = 5; for m = 1 and m = 5 the kernel size is set to (3, 3, 3), and for m = 2, …, 4 it is set to (2, 3, 3).
step 2.2, performing dimension-lifting and fusion operations on the polarized images with different polarization angles to obtain a 4-dimensional feature map whose dimensions are channel number, polarization angle, image height and image width; this high-dimensional feature map is input into the POL-3D encoder and passes through the M 3D convolution layers in sequence, yielding M feature maps with different channel numbers, polarization angles, heights and widths. In this embodiment, four polarized images at different polarization angles (0°, 45°, 90°, 135°) are used as input, so the high-dimensional feature map can be expressed as a four-dimensional tensor C×P×H×W with initial values C = 3, P = 4, H = 256, W = 256. The numbers of output channels of the five 3D convolution layers are set to 64, 128, 256, 512, 512, respectively.
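The shape bookkeeping for the POL-3D encoder follows the standard 3D-convolution output formula. The helper below is illustrative only; the padding values used in the examples are assumptions, since the patent does not state them:

```python
def conv3d_out_shape(in_shape, kernel, stride=(1, 1, 1), padding=(0, 0, 0)):
    """Output (D, H, W) of a 3D convolution using the standard floor
    formula: out = (in + 2*pad - kernel) // stride + 1 per dimension."""
    return tuple((i + 2 * p - k) // s + 1
                 for i, k, s, p in zip(in_shape, kernel, stride, padding))

# Per-channel spatial dims of the input: P=4 polarization angles, 256x256.
# With an assumed padding of 1 in every dimension, the (3,3,3) kernel of
# layer m=1 preserves the shape:
m1 = conv3d_out_shape((4, 256, 256), (3, 3, 3), padding=(1, 1, 1))  # (4, 256, 256)
# A (2,3,3) kernel (m=2..4) with assumed padding (0,1,1) shrinks only
# the polarization-angle dimension:
m2 = conv3d_out_shape((4, 256, 256), (2, 3, 3), padding=(0, 1, 1))  # (3, 256, 256)
```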
step 2.3, constructing the spatial redundancy reduction module SSR from M octave convolution layers and corresponding maximum pooling layers, each octave convolution layer comprising a preprocessing block (First OctConv), an octave convolution block (OctConv) and a post-processing block (Last OctConv). In this embodiment, the input/output channel numbers of the five octave convolution layers are (64, 64), (128, 128), (256, 256), (512, 512), (512, 512), and the convolution kernel sizes of the preprocessing block, octave convolution block and post-processing block are (1, 1), (3, 3), (3, 3), respectively.
step 2.3.1, the preprocessing block in the mth octave convolution layer consists of two branches: one branch consists, in order, of a convolution layer and an instance normalization layer and decomposes the high-frequency features; the other consists, in order, of an average pooling layer, a convolution layer and an instance normalization layer and decomposes the low-frequency features; the two convolution layers have the same kernel size but different numbers of output channels; m = 1, 2, …, M. In this embodiment, the kernel size of both convolution layers is (3, 3); the number of output channels is controlled by a factor α: the convolution layer decomposing the high-frequency features has α·c_out output channels and the one decomposing the low-frequency features has (1 − α)·c_out, with α = 0.5; the kernel size of the average pooling layer is (1, 2) and its stride is (1, 2).
the mth feature map passes through the two branches of the preprocessing block in the mth octave convolution layer, which correspondingly output the mth high-frequency feature map X_m^H and the mth low-frequency feature map X_m^L;
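The channel split of the preprocessing block can be sketched as follows. This is a NumPy shape illustration only: the convolutions are modelled as random 1x1 channel projections, and all names are hypothetical:

```python
import numpy as np

def octave_split(x, c_out, alpha=0.5):
    """Sketch of the preprocessing block's frequency decomposition: the
    high-frequency branch keeps full resolution with alpha*c_out channels;
    the low-frequency branch is average-pooled 2x along H and W (the
    polarization axis is untouched) with (1-alpha)*c_out channels."""
    c_hi = int(alpha * c_out)          # channels of the high-frequency branch
    c_lo = c_out - c_hi                # channels of the low-frequency branch
    c_in, p, h, w = x.shape
    rng = np.random.default_rng(0)
    w_hi = rng.standard_normal((c_hi, c_in))
    w_lo = rng.standard_normal((c_lo, c_in))
    x_hi = np.einsum('oc,cphw->ophw', w_hi, x)
    # average pool 2x2 over H and W before the low-frequency projection
    pooled = x.reshape(c_in, p, h // 2, 2, w // 2, 2).mean(axis=(3, 5))
    x_lo = np.einsum('oc,cphw->ophw', w_lo, pooled)
    return x_hi, x_lo

x = np.zeros((64, 4, 16, 16))          # toy (C, P, H, W) feature map
hi, lo = octave_split(x, 64, alpha=0.5)
```

With α = 0.5, half the channels live at half the spatial resolution, which is where the module's parameter and memory savings come from.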
step 2.3.2, the octave convolution block in the mth octave convolution layer consists of four convolution layers, two instance normalization layers, an average pooling layer and an up-sampling layer;
the mth high-frequency characteristic diagramThe mth high-frequency to high-frequency characteristic diagram is obtained after the treatment of the first convolution layer in the mth octave convolution layers>At the same time, mth high-frequency characteristic diagram +.>After being processed by an average pooling layer, the processed data is input into a second convolution layer for processing to obtain an mth high-frequency to low-frequency characteristic diagram +.>
The mth low-frequency characteristic diagramRespectively processing the first convolution layer to obtain the m low-frequency to low-frequency characteristic diagram +.>At the same time, mth low-frequency characteristic diagram +.>Sequentially inputting a fourth convolution layer, and processing in an up-sampling layer to obtain an mth low-frequency to high-frequency characteristic diagram ++>In the present embodimentIn the example, the average pooled convolution kernel size is (1, 2) and the convolution step size is (1, 2). The up-sampled amplification factor is (1, 2) and the algorithm uses the nearest algorithm.
Will beAnd->After fusion, input into the first example normalization layer, and output the mth high frequency feature map +.>
Will beAnd->After fusion, the obtained product is input into a second example normalization layer, and an mth low-frequency characteristic diagram is output
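The nearest-neighbour up-sampling used to bring the low-frequency branch back to high-frequency resolution can be sketched with integer-factor repetition (a minimal illustration; the (C, P, H, W) layout follows the embodiment's tensor ordering):

```python
import numpy as np

def upsample_nearest(x, scale=(1, 2, 2)):
    """Nearest-neighbour up-sampling by integer factors over the
    (P, H, W) axes of a (C, P, H, W) feature map: each entry is simply
    repeated, which is exactly what 'nearest' interpolation does for
    integer scale factors."""
    for axis, s in zip((1, 2, 3), scale):
        x = np.repeat(x, s, axis=axis)
    return x

lo = np.arange(8.0).reshape(1, 1, 2, 4)   # toy low-frequency map
hi = upsample_nearest(lo, scale=(1, 2, 2))
```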
Step 2.3.3, the post-processing block in the mth octave convolution layer consists of two convolution layers, an up-sampling layer and an example normalization layer;
the mth high-frequency characteristic diagramAfter passing through a convolution layer of the post-processing block, the mth high-frequency to high-frequency characteristic diagram is obtained>
The mth low-frequency characteristic diagramAfter passing through another convolution layer of the post-processing block, is input into oneIn the up-sampling layer, the mth low-frequency to high-frequency characteristic diagram is obtained>
And->After fusion, the mth feature map ++is obtained after the treatment of an example normalization layer>
Mth feature mapInputting the m-th octave convolution layer to the corresponding maximum pooling for processing to obtain an m-th effective feature map F m The method comprises the steps of carrying out a first treatment on the surface of the Thereby obtaining M effective feature maps F from the feature maps of M different channel numbers, polarization angles, heights and widths after the processing of the space redundancy reduction module SSR 1 ,F 2 ,…,F m ,…,F M
Step 2.4, constructing a POL decoder composed of M deconvolution layers; the low-, middle- and high-level feature maps generated by the POL-3D encoder and processed by the spatial redundancy reduction module are fused with the output of each deconvolution layer and resampled with a bilinear interpolation function, so as to output the final defogging prediction map;
when m = 1, 2, …, M−1, each deconvolution layer comprises one 2D convolution layer and one instance normalization layer; when m = M, the deconvolution layer comprises only one 2D convolution layer. In this embodiment M = 5; the input/output channel numbers of the 5 deconvolution layers are (512, 512), (1024, 256), (512, 128), (256, 64) and (128, 3), respectively; the convolution kernel size is 3 and the convolution stride is 1.
When m = M, the Mth feature map F_M is processed with a bilinear interpolation function and input into the 1st deconvolution layer, which outputs the 1st deconvolution-layer feature map F'_1.
When m = M−1, M−2, …, 2, the mth feature map F_m is fused with the (M−m)th deconvolution-layer feature map F'_(M−m); the fused feature map is processed with a bilinear interpolation function and input into the (M−m+1)th deconvolution layer, which outputs the (M−m+1)th deconvolution-layer feature map F'_(M−m+1).
When m=1, the Mth deconvolution layer is used for processing the 1 st feature map, and the obtained Mth deconvolution layer feature map F' M And the final defogging prediction graph is obtained.
Step 3, training a polarized image defogging model based on 3D convolution;
based on the polarized images and the corresponding real haze-free images, the 3D-convolution-based polarized image defogging model is trained with the ADAM optimizer, using the mean absolute error (L1 loss) as the loss function; the loss between the defogging prediction map and the real haze-free map is computed to update the model parameters until the loss function converges, yielding the optimal 3D-convolution-based polarized image defogging model, which is then used to defog both synthesized polarized foggy images and really photographed polarized foggy images. In this embodiment, the network is trained for 300 epochs, with the initial learning rate set to 1e-4 and decayed by a factor of 0.5 every 50 epochs.
In this embodiment, 8925 polarized images at 4 different polarization angles (0°, 45°, 90°, 135°) generated by the polarized-image generation pipeline, together with their corresponding real haze-free images, are used for training. During training, the input polarized images are randomly cropped to 256×256; the L1 loss is computed between the defogged image output by the 3D-convolution-based defogging model and the ground-truth haze-free image J, and the computed loss, combined with the ADAM optimizer, guides the network training to obtain the 3D-convolution-based polarized image defogging model.
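The training procedure of step 3 (ADAM optimizer, L1 loss, 300 epochs, initial learning rate 1e-4 halved every 50 epochs) can be sketched as follows; the model and data loader here are placeholders, not the patented network.

```python
import torch
import torch.nn as nn

def train(model, loader, epochs=300, device='cpu'):
    """Sketch of step 3: minimize the mean absolute error (L1 loss) between
    the defogging prediction and the ground-truth haze-free image."""
    opt = torch.optim.Adam(model.parameters(), lr=1e-4)
    sched = torch.optim.lr_scheduler.StepLR(opt, step_size=50, gamma=0.5)
    l1 = nn.L1Loss()
    for _ in range(epochs):
        for hazy, clear in loader:          # polarized hazy stack, clean image
            pred = model(hazy.to(device))
            loss = l1(pred, clear.to(device))
            opt.zero_grad()
            loss.backward()
            opt.step()
        sched.step()                        # halve the learning rate every 50 epochs
    return model
```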
In this embodiment, an electronic device comprises a memory and a processor, wherein the memory stores a program that supports the processor in executing the above method, and the processor is configured to execute the program stored in the memory.
In this embodiment, a computer-readable storage medium stores a computer program that, when executed by a processor, performs the steps of the method described above.
Table 1 evaluates the 3D-convolution-based polarized image defogging method of the invention using "PSNR" and "SSIM" as metrics. For a fair comparison, all defogging methods were retrained on the synthetic dataset; during testing, the present method used polarized images while the other defogging methods used ordinary RGB images. "PSNR" (peak signal-to-noise ratio) is the ratio of the maximum power of a signal to the noise power that may affect its fidelity; a larger value indicates less distortion in the defogged image. "SSIM" (structural similarity) measures the similarity of two images in brightness, contrast and structure, with a value range of [0, 1]; the larger the value, the more similar the defogged image is to the real haze-free image. The quantitative analysis in Table 1 shows that the method of the invention achieves the best results on both metrics.
TABLE 1
Methods PSNR(↑) SSIM(↑)
AOD-Net 20.58 0.80
PFFNet 28.63 0.89
GridDehazeNet 29.23 0.91
4KDehazing-Net 29.47 0.90
FFA-Net 29.93 0.92
GCANet 29.98 0.91
Ours 30.21 0.92
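The PSNR metric reported in Table 1 follows its standard definition; a minimal NumPy sketch is given below (SSIM, which requires windowed luminance/contrast/structure statistics, is omitted for brevity).

```python
import numpy as np

def psnr(pred, target, max_val=1.0):
    """Peak signal-to-noise ratio: ratio (in dB) of the maximum signal power
    to the mean-squared reconstruction error; higher means less distortion."""
    mse = np.mean((pred.astype(np.float64) - target.astype(np.float64)) ** 2)
    if mse == 0:
        return float('inf')
    return 10.0 * np.log10(max_val ** 2 / mse)
```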
FIG. 3 shows the results of the 3D-convolution-based polarized image defogging method of the invention and other current defogging methods on the synthetic dataset, where "Ours" denotes the 3D-convolution-based polarized image defogging method. AOD-Net reformulates the atmospheric scattering model and proposes an end-to-end trainable defogging network that replaces the two key unknowns of the atmospheric scattering model with a single one. PFFNet, inspired by the end-to-end defogging idea, adopts a U-Net-based network and adds a ResNet-based transformation module between the encoder and the decoder to improve the learning of complex features at different levels. GridDehazeNet, inspired by grid-based image segmentation networks, proposes another end-to-end trainable network independent of the atmospheric model, comprising a preprocessing module, an attention-based multi-scale backbone and a post-processing module; the attention mechanism helps the backbone adjust the contributions of multi-scale feature fusion by activating/deactivating parts of the network. 4KDehazing-Net proposes a multi-guided bilateral-learning framework that can defog 4K (3840×2160) resolution images at 125 frames per second. FFA-Net combines channel attention with pixel attention and emphasizes residual learning and feature fusion. GCANet uses smoothed dilated convolutions and a gated network to aggregate context information with multi-level features.
Fig. 4 shows the results of the 3D-convolution-based polarized image defogging method of the invention and other current defogging methods on the real dataset. Compared with the results on the synthetic data, the method of the invention shows a more pronounced advantage on the real dataset. This is because real-world scattering is a complex, spatially varying physical process; such spatial variation is difficult to learn from ordinary RGB images, and the lack of physics-based learned features makes these methods prone to artifacts at pixels undergoing large spatial variation. In contrast, the method of the invention alleviates this problem by mining the correlation among the polarized images of four different polarization angles.
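The synthetic-data pipeline of step 1 (foggy-image generation via formula (1) and per-angle polarized images via formulas (2) and (3)) can be sketched in NumPy as follows. The Malus/Stokes form of the per-angle intensity stands in for the unrendered formula (3) and is an assumption, as are all parameter values.

```python
import numpy as np

def synthesize_haze(J, d, beta, A_inf):
    """Step 1.2: I(z) = J(z)*t(z) + A_inf*(1 - t(z)), t(z) = exp(-beta*d(z))."""
    t = np.exp(-beta * d)                          # transmission map
    return J * t + A_inf * (1.0 - t)

def polarized_images(I, DoP, phi_par, angles=(0, 45, 90, 135)):
    """Step 1.5 (assumed Malus/Stokes form): intensity through a polarizer at
    angle phi, I_phi = (I/2) * (1 + DoP * cos(2*phi - 2*phi_par)), where
    phi_par is the polarizer direction transmitting the parallel component."""
    return {a: 0.5 * I * (1.0 + DoP * np.cos(2 * np.deg2rad(a) - 2 * phi_par))
            for a in angles}
```

As sanity checks: zero depth leaves the image unchanged, infinite depth yields pure atmospheric light, and the two orthogonal polarized images sum back to the total intensity I.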

Claims (5)

1. The polarized image defogging method based on 3D convolution is characterized by comprising the following steps of:
step 1, acquiring a synthesized polarized image dataset;
step 1.1, acquiring a haze-free image J(z) with a scene depth map d(z) and a semantic segmentation map S;
step 1.2, randomly assigning, within certain ranges, the atmospheric scattering factor β, the global atmospheric light A∞, and the degree of polarization DoP_A of the global atmospheric light, thereby generating a foggy image I(z) at pixel point z using equation (1):
I(z) = T(z) + A(z) = J(z)t(z) + A∞(1 − t(z))  (1)
in equation (1), z denotes the spatial coordinates of a pixel; T(z) and A(z) denote the transmitted light and the atmospheric light at pixel point z, respectively; J(z) denotes the haze-free image; t(z) denotes the transmission map at pixel point z, with t(z) = e^(−βd(z)), where d(z) denotes the scene depth map at pixel point z;
step 1.3, calculating the degree of polarization of the transmitted light T as DoP_T = g(S), where S denotes the semantic segmentation map and g denotes a random mapping function;
step 1.4, calculating the polarization degree DoP of the foggy day image I by using the formula (2):
I·DoP = T·DoP_T + A·DoP_A  (2)
in equation (2), DoP_A is the degree of polarization of the atmospheric light A;
step 1.5, calculating the polarized image I_φ at polarization angle φ using equation (3):
I_φ = (I/2)·(1 + DoP·cos(2φ − 2φ∥))  (3)
in equation (3), φ∥ denotes the direction of the polarizer that transmits the component parallel to the plane of incidence;
Step 2, constructing a polarized image defogging model based on 3D convolution based on a U-Net architecture, wherein the method comprises the following steps: a POL-3D encoder, a spatial redundancy reduction module SSR, and a POL decoder;
step 2.1, constructing the POL-3D encoder formed by M 3D convolution layers, wherein the mth 3D convolution layer sequentially comprises: a convolution layer, an instance normalization layer and a ReLU activation function layer; the convolution kernel size of each convolution layer is a tuple of 3 integers, representing the convolution kernel sizes in the depth, height and width dimensions, respectively;
step 2.2, performing dimension-raising operations on the polarized images of different polarization angles and then fusing them to obtain a 4-dimensional feature map, whose 4 dimensions are channel number, polarization angle, image height and image width; the 4-dimensional feature map is input into the POL-3D encoder and passes sequentially through the M 3D convolution layers, yielding M feature maps of different channel number, polarization angle, height and width;
step 2.3, after the spatial redundancy reduction module SSR processes the M feature maps, M effective feature maps F_1, F_2, …, F_m, …, F_M are obtained;
step 2.4, the POL decoder processes the M effective feature maps F_1, F_2, …, F_m, …, F_M and outputs the final defogging prediction map;
step 3, training a polarized image defogging model based on 3D convolution;
training the 3D-convolution-based polarized image defogging model with the ADAM optimizer based on the polarized images and the corresponding real haze-free images, using the mean absolute error (L1 loss) as the loss function; the loss between the defogging prediction map and the real haze-free map is computed to update the model parameters until the loss function converges, thereby obtaining the optimal 3D-convolution-based polarized image defogging model, which performs defogging on both synthesized polarized foggy images and really photographed polarized foggy images.
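The dimension-raising and stacking of step 2.2 and one 3D convolution layer of step 2.1 can be sketched as follows; the kernel size over the (angle, height, width) dimensions and the channel counts are assumptions for illustration, not the claimed values.

```python
import torch
import torch.nn as nn

class Conv3DLayer(nn.Module):
    """One POL-3D encoder layer: 3D convolution -> instance normalization ->
    ReLU, as in step 2.1; the (3, 3, 3) kernel is an illustrative choice."""
    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv3d(in_ch, out_ch, kernel_size=(3, 3, 3), padding=1),
            nn.InstanceNorm3d(out_ch),
            nn.ReLU(inplace=True))

    def forward(self, x):
        return self.body(x)

def stack_polarized(imgs):
    """Step 2.2: stack the 4 polarization-angle RGB images (each N x 3 x H x W)
    along a new polarization-angle axis, giving an N x 3 x 4 x H x W tensor."""
    return torch.stack(imgs, dim=2)
```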
2. The polarized image defogging method based on 3D convolution according to claim 1, wherein the spatial redundancy reduction module SSR in step 2.3 is composed of M octave convolution layers and corresponding max pooling layers; each octave convolution layer includes a preprocessing block, an octave convolution block and a post-processing block, which process data according to the following steps:
step 2.3.1, the preprocessing block in the mth octave convolution layer consists of two branches: one branch consists, in order, of a convolution layer and an instance normalization layer and is used to decompose the high-frequency features; the other branch consists, in order, of an average pooling layer, a convolution layer and an instance normalization layer and is used to decompose the low-frequency features; the two convolution layers have the same kernel size but different numbers of output channels; m = 1, 2, …, M;
the mth feature map passes through the two branches of the preprocessing block in the mth octave convolution layer, which correspondingly output the mth high-frequency feature map F_m^H and the mth low-frequency feature map F_m^L;
step 2.3.2, the octave convolution block in the mth octave convolution layer consists of four convolution layers, two instance normalization layers, an average pooling layer and an up-sampling layer;
the mth high-frequency feature map F_m^H is processed by the first convolution layer of the octave convolution block in the mth octave convolution layer to obtain the mth high-frequency-to-high-frequency feature map F_m^(H→H); at the same time, the mth high-frequency feature map F_m^H is processed by the average pooling layer and then input into the second convolution layer to obtain the mth high-frequency-to-low-frequency feature map F_m^(H→L);
the mth low-frequency feature map F_m^L is processed by the third convolution layer to obtain the mth low-frequency-to-low-frequency feature map F_m^(L→L); at the same time, the mth low-frequency feature map F_m^L is sequentially input into the fourth convolution layer and the up-sampling layer to obtain the mth low-frequency-to-high-frequency feature map F_m^(L→H);
F_m^(H→H) and F_m^(L→H) are fused and input into the first instance normalization layer, which outputs the updated mth high-frequency feature map G_m^H;
F_m^(L→L) and F_m^(H→L) are fused and input into the second instance normalization layer, which outputs the updated mth low-frequency feature map G_m^L;
step 2.3.3, the post-processing block in the mth octave convolution layer consists of two convolution layers, an up-sampling layer and an instance normalization layer;
the updated mth high-frequency feature map G_m^H passes through one convolution layer of the post-processing block to obtain the mth high-frequency-to-high-frequency feature map;
the updated mth low-frequency feature map G_m^L passes through the other convolution layer of the post-processing block and is then input into an up-sampling layer to obtain the mth low-frequency-to-high-frequency feature map;
the mth high-frequency-to-high-frequency feature map and the mth low-frequency-to-high-frequency feature map are fused and processed by an instance normalization layer to obtain the mth fused feature map;
the mth fused feature map is input into the max pooling layer corresponding to the mth octave convolution layer to obtain the mth effective feature map F_m; thus, after processing by the spatial redundancy reduction module SSR, the M feature maps of different channel number, polarization angle, height and width yield the M effective feature maps F_1, F_2, …, F_m, …, F_M.
3. The polarized image defogging method based on 3D convolution according to claim 2, wherein the POL decoder in step 2.4 is composed of M deconvolution layers;
when m = 1, 2, …, M−1, each deconvolution layer comprises one 2D convolution layer and one instance normalization layer; when m = M, the deconvolution layer comprises only one 2D convolution layer;
when m = M, the Mth feature map F_M is processed with a bilinear interpolation function and input into the 1st deconvolution layer, which outputs the 1st deconvolution-layer feature map F'_1;
when m = M−1, M−2, …, 2, the mth feature map F_m is fused with the (M−m)th deconvolution-layer feature map F'_(M−m); the fused feature map is processed with a bilinear interpolation function and input into the (M−m+1)th deconvolution layer, which outputs the (M−m+1)th deconvolution-layer feature map F'_(M−m+1);
when m = 1, the Mth deconvolution layer processes the fused 1st feature map, and the resulting Mth deconvolution-layer feature map F'_M is the final defogging prediction map.
4. An electronic device comprising a memory and a processor, wherein the memory is configured to store a program supporting the processor in performing the polarized image defogging method of any of claims 1-3, and the processor is configured to execute the program stored in the memory.
5. A computer readable storage medium having stored thereon a computer program, characterized in that the computer program when executed by a processor performs the steps of the polarized image defogging method of any of the claims 1-3.
CN202310390770.0A 2023-04-07 2023-04-07 Polarized image defogging method based on 3D convolution Pending CN116452450A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310390770.0A CN116452450A (en) 2023-04-07 2023-04-07 Polarized image defogging method based on 3D convolution

Publications (1)

Publication Number Publication Date
CN116452450A true CN116452450A (en) 2023-07-18

Family

ID=87121490

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310390770.0A Pending CN116452450A (en) 2023-04-07 2023-04-07 Polarized image defogging method based on 3D convolution

Country Status (1)

Country Link
CN (1) CN116452450A (en)


Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117911282A (en) * 2024-03-19 2024-04-19 华中科技大学 Construction method and application of image defogging model
CN117911282B (en) * 2024-03-19 2024-05-28 华中科技大学 Construction method and application of image defogging model

Similar Documents

Publication Publication Date Title
CN112184577B (en) Single image defogging method based on multiscale self-attention generation countermeasure network
CN110517203B (en) Defogging method based on reference image reconstruction
CN111709888B (en) Aerial image defogging method based on improved generation countermeasure network
CN110378849B (en) Image defogging and rain removing method based on depth residual error network
CN110288550B (en) Single-image defogging method for generating countermeasure network based on priori knowledge guiding condition
Hsu et al. Single image dehazing using wavelet-based haze-lines and denoising
CN111861939B (en) Single image defogging method based on unsupervised learning
CN110675340A (en) Single image defogging method and medium based on improved non-local prior
CN111986108A (en) Complex sea-air scene image defogging method based on generation countermeasure network
CN112419163B (en) Single image weak supervision defogging method based on priori knowledge and deep learning
CN113284061B (en) Underwater image enhancement method based on gradient network
CN112070688A (en) Single image defogging method for generating countermeasure network based on context guidance
Bansal et al. A review of image restoration based image defogging algorithms
CN116452450A (en) Polarized image defogging method based on 3D convolution
Hsu et al. Object detection using structure-preserving wavelet pyramid reflection removal network
CN115631223A (en) Multi-view stereo reconstruction method based on self-adaptive learning and aggregation
Zhao et al. A multi-scale U-shaped attention network-based GAN method for single image dehazing
WO2024178979A1 (en) Single-image defogging method based on detail restoration
Babu et al. An efficient image dahazing using Googlenet based convolution neural networks
CN117853370A (en) Underwater low-light image enhancement method and device based on polarization perception
CN117011181A (en) Classification-guided unmanned aerial vehicle imaging dense fog removal method
CN117036182A (en) Defogging method and system for single image
CN116757949A (en) Atmosphere-ocean scattering environment degradation image restoration method and system
CN116703750A (en) Image defogging method and system based on edge attention and multi-order differential loss
CN113724156B (en) Anti-network defogging method and system combining generation of atmospheric scattering model

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination