CN115937048A - Illumination controllable defogging method based on non-supervision layer embedding and vision conversion model - Google Patents

Illumination controllable defogging method based on non-supervision layer embedding and vision conversion model

Info

Publication number
CN115937048A
Authority
CN
China
Prior art keywords
network
defogging
controllable
illumination
fog
Prior art date
Legal status
Pending
Application number
CN202310059502.0A
Other languages
Chinese (zh)
Inventor
丛晓峰
桂杰
Current Assignee
Southeast University
Original Assignee
Southeast University
Priority date
Filing date
Publication date
Application filed by Southeast University filed Critical Southeast University
Priority to CN202310059502.0A priority Critical patent/CN115937048A/en
Publication of CN115937048A publication Critical patent/CN115937048A/en
Pending legal-status Critical Current

Classifications

    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02B: CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO BUILDINGS, e.g. HOUSING, HOUSE APPLIANCES OR RELATED END-USER APPLICATIONS
    • Y02B 20/00: Energy efficient lighting technologies, e.g. halogen lamps or gas discharge lamps
    • Y02B 20/40: Control techniques providing energy savings, e.g. smart controller or presence detection

Landscapes

  • Image Processing (AREA)

Abstract

The invention discloses an illumination controllable defogging method based on an unsupervised layer embedding and vision conversion model. The defogging method comprises four modules, namely an illumination controllable defogging network, a fog synthesis network, a defogging discrimination network and a fog synthesis discrimination network; after training, the illumination controllable defogging network can produce high-quality defogged images. The illumination controllable defogging network and the fog synthesis network are composed of window-based multi-head self-attention modules; the defogging discrimination network and the fog synthesis discrimination network are composed of convolution modules with residual links; the illumination controllable module is constructed according to the visual layer (Retinex) model; during training, the dark channel prior is used to guide the illumination controllable defogging network; the four components are trained jointly in an unsupervised manner, and the network parameters are updated by fusing a prior loss, an image reconstruction loss and a discrimination loss. The invention can be used for traffic safety, information security, photography and intelligent robots.

Description

Illumination controllable defogging method based on non-supervision layer embedding and vision conversion model
Technical Field
The invention relates to an illumination controllable defogging technology based on a non-supervision layer embedding and vision conversion model, and belongs to the technical field of computer vision and image processing.
Background
The image defogging task is a popular research problem in the fields of computer vision and image processing and has received wide attention from researchers. Under the influence of fog, image quality degrades, the visibility of objects in the scene is reduced, and the image appears visually blurred; in addition, the colors of the image shift to different degrees depending on the fog density. On the one hand, fog negatively affects photography, so that captured images fail to meet aesthetic requirements; on the other hand, fog brings adverse factors to production and social activities, for example by reducing target detection accuracy in automatic driving and degrading the clarity of vehicles and pedestrians captured by road traffic monitoring. An image defogging algorithm aims to eliminate the fog in a foggy image, thereby improving the overall image quality and enhancing visual clarity.
Research in the field of image defogging has made preliminary progress: by feeding a foggy image collected by a camera into an image defogging algorithm, the fog can be largely removed and the clarity of objects in the image improved to a certain extent, but insufficient color restoration and blurred details remain. Wang et al. proposed an unsupervised defogging method using spectral normalization named SNSPGAN [Wang, Yongzhen, et al. Cycle-SNSPGAN: Towards Real-World Image Dehazing via Cycle Spectral Normalized Soft Likelihood Estimation Patch GAN. IEEE Transactions on Intelligent Transportation Systems, 2022, 23: 20368-20382]; however, the color and detail recovery of the defogged images obtained by this method is insufficient. Three problems of existing image defogging technology deserve deeper study. First, image defogging models built from convolutional neural networks can extract image features and reconstruct defogged images, and can be trained end to end, but such models lack explicit modeling of the correlations among the information inside an image, so designing a defogging model with a window-based local attention mechanism is a challenging problem of significant research value. Second, deep-learning-based defogging algorithms neglect illumination information during defogging, so the brightness of the defogged image is inaccurate and inconsistent with that of a real clear image. Third, existing unsupervised defogging algorithms make little use of prior knowledge to reduce the model's dependence on data.
As image data plays an increasingly important role in social life, high-quality image data has been an important demand for technical development. Therefore, designing a high-quality defogging technology for a foggy image is a problem to be solved urgently by researchers in the field.
Disclosure of Invention
In order to solve the problem of insufficient color and detail restoration of a defogged image obtained by an image defogging model, the invention discloses an illumination controllable defogging method based on an unsupervised layer embedding and vision conversion model.
In order to achieve the purpose, the invention adopts the following technical scheme:
In a first aspect, the core networks required by the image defogging method are constructed; the steps are as follows:
Step S1: construct the core networks and initialize the network parameters. The designed image defogging method comprises four core networks, namely an illumination controllable defogging network, a fog synthesis network, a defogging discrimination network and a fog synthesis discrimination network; the illumination controllable defogging network is the defogging network used in practical application, while the other three networks only assist its training and are not needed in actual use. The illumination controllable defogging network is built from Vision Transformer blocks and comprises two processes, feature encoding and decoding reconstruction; the decoding process produces feature outputs from two branches, the decoding features of the two branches are fused through visual layer embedding (Retinex embedding), and defogged images under different illumination conditions are obtained by controlling the decoding feature weights of the branches. The fog synthesis network is a single-branch encoding-decoding network formed by visual conversion modules; its input is a fog-free image and its output is a foggy image. The defogging discrimination network and the fog synthesis discrimination network are used to guide the training of the illumination controllable defogging network and the fog synthesis network.
In a second aspect, an illumination controllable module is constructed by the following steps:
Step S2: the illumination controllable defogging network controls the decoding feature weights at the inference stage through the illumination controllable module, so as to obtain defogged images with different feature weight proportions. The illumination controllable defogging network comprises one encoder and two decoders; the encoder is denoted Φ and the two decoders are denoted φ_d and φ_r. For input data x, the encoder output feature is Φ(x), and the decoding features obtained by the two decoders are o_d(x) and o_r(x), where o_d(x) denotes the direct output, calculated as follows:

o_d(x) = φ_d(Φ(x))

o_r(x) denotes the embedded output; its calculation uses the visual layer (Retinex) model and element-wise multiplication "⊙":

o_r(x) = R(x) ⊙ L(x)

where the reflectance feature R(x) and the illumination feature L(x) are features obtained through the deep network from Φ(x).
Step S3: after o_d(x) and o_r(x) are obtained, linear fusion and nonlinear mapping are required. The linear fusion process is:

o_f(x) = α × o_d(x) + (1 - α) × o_r(x)

where α denotes a balance factor and o_f(x) is the output of the linear fusion stage.

The nonlinear mapping is realized by the hyperbolic tangent function, and the final defogging output o(x) is obtained as:

o(x) = (e^{o_f(x)} - e^{-o_f(x)}) / (e^{o_f(x)} + e^{-o_f(x)})

where e denotes the natural constant used as the base of the exponential function. The linear fusion and the nonlinear mapping together constitute the illumination controllable module, which is embedded into the illumination controllable defogging network.
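To make steps S2 and S3 concrete, the following is a minimal PyTorch sketch of the illumination controllable module, assuming the two decoder branches already produce image-sized tensors; the class name, the default balance factor of 0.5 and the tensor interface are illustrative choices, not taken from the patent.

```python
import torch
import torch.nn as nn

class IlluminationControlModule(nn.Module):
    """Linear fusion of the direct output o_d(x) and the Retinex-embedded
    output o_r(x), followed by a hyperbolic-tangent non-linearity."""

    def __init__(self, alpha: float = 0.5):
        super().__init__()
        self.alpha = alpha                     # balance factor, adjustable at inference

    def forward(self, o_d: torch.Tensor, o_r: torch.Tensor) -> torch.Tensor:
        # o_f(x) = alpha * o_d(x) + (1 - alpha) * o_r(x)
        o_f = self.alpha * o_d + (1.0 - self.alpha) * o_r
        # o(x) = tanh(o_f(x))
        return torch.tanh(o_f)
```

Changing alpha at inference time re-weights the two decoder branches, which is how defogged images under different illumination conditions would be produced.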
In the third aspect, four core networks are constructed, and the steps are as follows:
and step S4: the illumination controllable defogging network and the fog synthesizing network both comprise two characteristic diagram dimension reducing processes and two characteristic diagram dimension increasing processes. The dimension of an input image and an output image of the illumination controllable defogging network and the fog synthesizing network is H multiplied by W multiplied by 3, wherein H represents the height of the image, W represents the width of the image, 3 represents that the image comprises 3 channels, the number of basic characteristic channels is set to be L in the characteristic calculation process, and the value of L is 64. The operations required by both the illumination controllable defogging network and the fog synthesizing network comprise blocking (Patch Partition), visual feature conversion, block embedding (Patch embedded) and UpSampling (UpSampling). The partitioning and the block embedding are realized by convolution operation, the up-sampling is realized by deconvolution, and the visual characteristic conversion module is realized by a multi-head self-attention mechanism, layer normalization and a multilayer perceptron in a window mode. The calculation flow of the illumination controllable defogging network and the fog synthesizing network is as follows.
Step S5: for the illumination controllable defogging network, the encoding process comprises 4 stages, which are respectively:

Stage 1: output feature dimension H × W × L;

Stage 2: patch embedding layer and visual feature conversion layer, output feature dimension (H/2) × (W/2) × 2L;

Stage 3: patch embedding layer and visual feature conversion layer, output feature dimension (H/4) × (W/4) × 4L;

Stage 4: visual feature conversion layer, output feature dimension (H/4) × (W/4) × 4L.

Step S6: the decoding process comprises the decoding computation of the two decoders; both decoders perform the stage 5 and stage 6 computations, and the network parameters of stage 5 are shared by the two decoders:

Stage 5: upsampling layer and visual feature conversion layer, output feature dimension (H/2) × (W/2) × 2L;

Stage 6: upsampling layer and visual feature conversion layer, output feature dimension H × W × L.
Step S7: for the illumination controllable defogging network, after the 6 stages are completed, the feature map is mapped from dimension H × W × L to dimension H × W × 3 through patch projection, and the final defogging output is obtained through the illumination controllable module. The fog synthesis network comprises only one encoder and one decoder; its encoder comprises the computations of stages 1 to 4 and its decoder the computations of stages 5 and 6, with input and output feature dimensions at each stage identical to those of the illumination controllable defogging network. After the 6 stages are completed, the fog synthesis network directly maps the feature map of dimension H × W × L to the output image of dimension H × W × 3 by patch projection.
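For readers who want to see what one visual feature conversion unit could look like in code, a simplified window-based multi-head self-attention block is sketched below; the window size, head count and MLP expansion ratio are assumed values, and refinements such as shifted windows or relative position bias are omitted.

```python
import torch
import torch.nn as nn

class WindowTransformerBlock(nn.Module):
    """One 'visual feature conversion' unit: window-based multi-head
    self-attention followed by an MLP, each preceded by layer normalization
    and wrapped in a residual connection."""

    def __init__(self, channels: int, window: int = 8, heads: int = 4):
        super().__init__()
        self.window = window
        self.norm1 = nn.LayerNorm(channels)
        self.attn = nn.MultiheadAttention(channels, heads, batch_first=True)
        self.norm2 = nn.LayerNorm(channels)
        self.mlp = nn.Sequential(
            nn.Linear(channels, 4 * channels), nn.GELU(),
            nn.Linear(4 * channels, channels))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (B, C, H, W); H and W are assumed divisible by the window size
        b, c, h, w = x.shape
        ws = self.window
        # Partition the feature map into non-overlapping ws x ws windows
        t = x.view(b, c, h // ws, ws, w // ws, ws)
        t = t.permute(0, 2, 4, 3, 5, 1).reshape(-1, ws * ws, c)
        # Multi-head self-attention restricted to each window
        y = self.norm1(t)
        y, _ = self.attn(y, y, y)
        t = t + y
        # Feed-forward (multilayer perceptron) with residual connection
        t = t + self.mlp(self.norm2(t))
        # Reverse the window partition back to (B, C, H, W)
        t = t.view(b, h // ws, w // ws, ws, ws, c)
        return t.permute(0, 5, 1, 3, 2, 4).reshape(b, c, h, w)
```

Stacking such blocks with strided-convolution patch embedding and deconvolution upsampling layers would yield a stage structure analogous to steps S5 and S6.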
Step S8: the defogging discrimination network and the fog synthesis discrimination network have the same structure. For an input image of dimension H × W × 3, feature mapping is first performed through convolution kernels of size 3 × 3 to obtain a feature map of dimension H × W × L, and the discrimination output is then obtained through residually connected convolution, batch normalization and activation operations.
In a fourth aspect, the prior feature constraint loss is calculated by the following steps:
Step S9: pseudo labels that represent the characteristics of clear images are computed via the dark channel prior, and the training of the illumination controllable defogging network is guided by computing a loss at the feature level. For a given input image x, the pseudo label obtained by dark channel prior computation is θ(x); the output o(x) of the illumination controllable defogging network is constrained with this prior statistical rule, and the corresponding prior feature constraint loss function L_θ is:

L_θ(θ(x), o(x)) = γ ||ψ(θ(x)) - ψ(o(x))||_2

where the function ψ denotes a pre-trained feature extraction network used to estimate the distance between the features of the defogged image o(x) and of the pseudo label θ(x); the parameters of ψ are kept fixed while computing L_θ and are not updated during back propagation. During training of the illumination controllable defogging network, the weight γ of the loss L_θ is updated dynamically: after every complete training epoch, γ is decayed to 0.9 times its previous value.
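The following sketch illustrates one way the dark channel prior pseudo label θ(x) and the prior feature constraint loss L_θ could be computed; the patch size, the atmospheric-light estimation details and the choice of VGG-16 features as the pre-trained extractor ψ are assumptions made for illustration only.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
from torchvision.models import vgg16, VGG16_Weights

def dark_channel(img: torch.Tensor, patch: int = 15) -> torch.Tensor:
    """Dark channel of an RGB batch (B, 3, H, W): per-pixel channel minimum
    followed by a local minimum filter over a patch x patch window."""
    dc = img.min(dim=1, keepdim=True).values
    return -F.max_pool2d(-dc, patch, stride=1, padding=patch // 2)

def dcp_pseudo_label(x: torch.Tensor, omega: float = 0.95, t0: float = 0.1) -> torch.Tensor:
    """Coarse haze-free pseudo label theta(x) recovered with the dark channel prior."""
    b, c, h, w = x.shape
    dc = dark_channel(x)
    # Atmospheric light A: mean color of the brightest 0.1% dark-channel pixels
    k = max(1, int(0.001 * h * w))
    idx = dc.flatten(2).topk(k, dim=2).indices                    # (B, 1, k)
    A = x.flatten(2).gather(2, idx.expand(-1, 3, -1)).mean(2, keepdim=True).unsqueeze(-1)
    # Transmission estimate and scene-radiance recovery
    t = 1.0 - omega * dark_channel(x / A.clamp(min=1e-6))
    return (x - A) / t.clamp(min=t0) + A

class PriorFeatureLoss(nn.Module):
    """L_theta = gamma * || psi(theta(x)) - psi(o(x)) ||_2 with a frozen
    pre-trained feature extractor psi (VGG-16 features used as a stand-in)."""

    def __init__(self, gamma: float = 1.0):
        super().__init__()
        self.gamma = gamma
        self.psi = vgg16(weights=VGG16_Weights.IMAGENET1K_V1).features[:16].eval()
        for p in self.psi.parameters():
            p.requires_grad_(False)            # psi is never updated by back-propagation

    def forward(self, pseudo: torch.Tensor, output: torch.Tensor) -> torch.Tensor:
        return self.gamma * torch.norm(self.psi(pseudo) - self.psi(output), p=2)
```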
In a fifth aspect, the network losses are calculated and the parameters are updated by the following steps:
Step S10: the image defogging model is trained in an unsupervised manner; the illumination controllable defogging network and the defogging discrimination network are used for defogging training, the fog synthesis network and the fog synthesis discrimination network are used for fog synthesis training, and the two trainings are constrained by a cycle consistency loss. The illumination controllable defogging network and the fog synthesis network are denoted G_J and G_I, and the defogging discrimination network and the fog synthesis discrimination network are denoted D_J and D_I. During sampling, a foggy image x is randomly sampled from the foggy image domain I, and a fog-free image y is randomly sampled from the fog-free image domain J. The training loss of the image defogging model comprises adversarial losses, a reconstruction constraint loss and the prior feature constraint loss function L_θ, where the reconstruction constraint loss comprises a cycle consistency loss and an identity loss. The adversarial losses are calculated as follows:

L(G_J, D_J) = E_{y~p_y}[log D_J(y)] + E_{x~p_x}[log(1 - D_J(G_J(x)))]

L(G_I, D_I) = E_{x~p_x}[log D_I(x)] + E_{y~p_y}[log(1 - D_I(G_I(y)))]

where p_x and p_y denote the distributions of x and y respectively, E denotes the mathematical expectation, and L(G_J, D_J) and L(G_I, D_I) are both adversarial losses.

Step S11: the generated images are further constrained using a cycle consistency loss and an identity loss, as follows:

L_r(G_I, G_J) = E_{x~p_x}[||G_I(G_J(x)) - x||_1] + E_{y~p_y}[||G_J(G_I(y)) - y||_1] + E_{y~p_y}[||G_J(y) - y||_1] + E_{x~p_x}[||G_I(x) - x||_1]

where L_r(G_I, G_J) denotes the reconstruction constraint loss.

Step S12: the overall loss function is the weighted sum of the adversarial losses, the reconstruction constraint loss and the prior feature constraint loss function L_θ:

L_all = L(G_J, D_J) + L(G_I, D_I) + L_r(G_I, G_J) + λ L_θ
in the above formula, λ is a balance factor.
And after the calculation of the loss function is completed, updating the parameters of the four networks through a gradient descent algorithm.
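Putting steps S10 to S12 together, the generator-side loss of one training iteration might be computed as follows; the non-saturating adversarial form, the L1 norms for the reconstruction terms and the default weight are illustrative choices, and dcp_pseudo_label / PriorFeatureLoss refer to the helpers sketched earlier.

```python
import torch
import torch.nn.functional as F

def generator_losses(G_J, G_I, D_J, D_I, prior_loss, x, y, lam: float = 0.1):
    """Generator-side losses for one unsupervised iteration: x is a foggy batch
    from domain I, y a fog-free batch from domain J; G_J dehazes, G_I adds fog,
    D_J / D_I return real/fake logits."""
    fake_clear = G_J(x)                      # defogged image
    fake_foggy = G_I(y)                      # synthesized foggy image

    def adv(logits):                         # non-saturating log-form adversarial term
        return F.binary_cross_entropy_with_logits(logits, torch.ones_like(logits))

    adversarial = adv(D_J(fake_clear)) + adv(D_I(fake_foggy))

    # Reconstruction constraint L_r: cycle-consistency plus identity terms
    cycle = F.l1_loss(G_I(fake_clear), x) + F.l1_loss(G_J(fake_foggy), y)
    identity = F.l1_loss(G_J(y), y) + F.l1_loss(G_I(x), x)

    # Prior feature constraint against the dark-channel pseudo label
    l_theta = prior_loss(dcp_pseudo_label(x), fake_clear)

    return adversarial + cycle + identity + lam * l_theta
```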
The invention can effectively remove the fog contained in a foggy image while restoring color and detail, and a clear defogged image is obtained end to end in practical application. The invention can be used for data security, image processing, weather monitoring and robotics.
Drawings
FIG. 1 is a diagram of an illumination controlled defogging network;
FIG. 2 is a diagram of a mist synthesis network;
fig. 3 is a graph comparing the defogging results of the indoor images.
Fig. 4 is a comparison graph of defogging results of outdoor images.
Detailed Description
The present invention will be further illustrated with reference to the accompanying drawings and specific embodiments, which are to be understood as merely illustrative of the invention and not as limiting the scope of the invention. It should be noted that the terms "front," "back," "left," "right," "upper" and "lower" used in the following description refer to directions in the drawings, and the terms "inner" and "outer" refer to directions toward and away from, respectively, the geometric center of a particular component.
The illumination controllable defogging method based on the unsupervised visual layer embedding and visual conversion model requires no supervised loss during training, so paired foggy and fog-free pictures of the same scenes are not needed. The training process uses the indoor fog data set ITS [Li, Boyi, et al. Benchmarking Single-Image Dehazing and Beyond. IEEE Transactions on Image Processing, 2018, 28(1): 492-505] and the outdoor fog data set 4KID [Zheng, Zhuoran, et al. Ultra-High-Definition Image Dehazing via Multi-Guided Bilateral Learning. IEEE Conference on Computer Vision and Pattern Recognition, 2021: 16180-16189], respectively; the specific implementation comprises the following steps.
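To make the unpaired training setup concrete, one possible data-loading sketch is given below, in which a foggy image and an independently sampled fog-free image are returned together; the directory layout, file extension and image size are assumptions and do not reflect the actual packaging of ITS or 4KID.

```python
import random
from pathlib import Path

from PIL import Image
from torch.utils.data import Dataset
from torchvision.transforms.functional import to_tensor

class UnpairedHazeDataset(Dataset):
    """Returns a foggy image and an independently sampled fog-free image,
    so no pixel-aligned ground truth is ever required."""

    def __init__(self, foggy_dir: str, clear_dir: str, size: int = 256):
        self.foggy = sorted(Path(foggy_dir).glob("*.png"))
        self.clear = sorted(Path(clear_dir).glob("*.png"))
        self.size = size

    def __len__(self):
        return len(self.foggy)

    def _load(self, path: Path):
        img = Image.open(path).convert("RGB").resize((self.size, self.size))
        return to_tensor(img)                          # (3, H, W), values in [0, 1]

    def __getitem__(self, i: int):
        x = self._load(self.foggy[i])                  # sample from foggy domain I
        y = self._load(random.choice(self.clear))      # unpaired sample from domain J
        return x, y
```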
In a first aspect, the core networks required by the image defogging method are constructed; the steps are as follows:
Step S1: construct the core networks and initialize the network parameters. The designed image defogging method comprises four core networks, namely an illumination controllable defogging network, a fog synthesis network, a defogging discrimination network and a fog synthesis discrimination network; the illumination controllable defogging network is the defogging network used in practical application, while the other three networks only assist its training and are not needed in actual use. The initialization parameters of the networks are generated from a Gaussian distribution. The illumination controllable defogging network is built from Vision Transformer blocks and comprises two processes, feature encoding and decoding reconstruction; the decoding process produces feature outputs from two branches, the decoding features of the two branches are fused through visual layer embedding (Retinex embedding), and defogged images under different illumination conditions are obtained by controlling the decoding feature weights of the branches. The fog synthesis network is a single-branch encoding-decoding network formed by visual conversion modules; its input is a fog-free image and its output is a foggy image. The defogging discrimination network and the fog synthesis discrimination network are used to guide the training of the illumination controllable defogging network and the fog synthesis network.
In a second aspect, an illumination controllable module is constructed and added to the illumination controllable defogging model by the following steps:
Step S2: the illumination controllable defogging network controls the decoding feature weights at the inference stage through the illumination controllable module, so as to obtain defogged images with different feature weight proportions. The illumination controllable defogging network comprises one encoder and two decoders; the encoder is denoted Φ and the two decoders are denoted φ_d and φ_r. For input data x, the encoder output feature is Φ(x), and the decoding features obtained by the two decoders are o_d(x) and o_r(x), where o_d(x) denotes the direct output, calculated as follows:

o_d(x) = φ_d(Φ(x))

o_r(x) denotes the embedded output; its calculation uses the visual layer (Retinex) model and element-wise multiplication "⊙":

o_r(x) = R(x) ⊙ L(x)

where the reflectance feature R(x) and the illumination feature L(x) are features obtained through the deep network from Φ(x).
Step S3: after o_d(x) and o_r(x) are obtained, linear fusion and nonlinear mapping are required. The linear fusion process is:

o_f(x) = α × o_d(x) + (1 - α) × o_r(x)

where α denotes a balance factor and o_f(x) is the output of the linear fusion stage. The nonlinear mapping is realized by the hyperbolic tangent function, and the final defogging output o(x) is obtained as:

o(x) = (e^{o_f(x)} - e^{-o_f(x)}) / (e^{o_f(x)} + e^{-o_f(x)})

The linear fusion and the nonlinear mapping together constitute the illumination controllable module, which is embedded into the illumination controllable defogging network; the illumination controllable module is shown at the output end of FIG. 1 (the structure of the illumination controllable defogging network).
In the third aspect, four core networks are constructed, and the steps are as follows:
and step S4: the illumination controllable defogging network and the fog synthesizing network both comprise a characteristic diagram dimension reduction process twice and a characteristic diagram dimension increase process twice. The dimension of an input image and an output image of the illumination controllable defogging network and the fog synthesizing network is H multiplied by W multiplied by 3, wherein H represents the height of the image, W represents the width of the image, 3 represents that the image comprises 3 channels, the number of basic characteristic channels is set to be L in the characteristic calculation process, and the value of L is 64. The operations required by both the illumination controllable defogging network and the fog synthesis network comprise partitioning (Patch Partition), visual feature conversion, block embedding (Patch Embedded) and UpSampling (UpSampling). The partitioning and the block embedding are realized by convolution operation, the up-sampling is realized by deconvolution, and the visual characteristic conversion module is realized by a multi-head self-attention mechanism, layer normalization and a multilayer perceptron in a window mode. The calculation flow of the illumination controllable defogging network and the fog synthesis network is as follows.
Step S5: for the illumination controllable defogging network, the encoding process comprises 4 stages, which are respectively:

Stage 1: output feature dimension H × W × L;

Stage 2: patch embedding layer and visual feature conversion layer, output feature dimension (H/2) × (W/2) × 2L;

Stage 3: patch embedding layer and visual feature conversion layer, output feature dimension (H/4) × (W/4) × 4L;

Stage 4: visual feature conversion layer, output feature dimension (H/4) × (W/4) × 4L.

Step S6: the decoding process comprises the decoding computation of the two decoders; both decoders perform the stage 5 and stage 6 computations, and the network parameters of stage 5 are shared by the two decoders:

Stage 5: upsampling layer and visual feature conversion layer, output feature dimension (H/2) × (W/2) × 2L;

Stage 6: upsampling layer and visual feature conversion layer, output feature dimension H × W × L.
Step S7: for the illumination controllable defogging network, after the 6 stages are completed, the feature map is mapped from dimension H × W × L to dimension H × W × 3 through patch projection, and the final defogging output is obtained through the illumination controllable module. The fog synthesis network comprises only one encoder and one decoder; its encoder comprises the computations of stages 1 to 4 and its decoder the computations of stages 5 and 6, with input and output feature dimensions at each stage identical to those of the illumination controllable defogging network. After the 6 stages are completed, the fog synthesis network directly maps the feature map of dimension H × W × L to the output image of dimension H × W × 3 by patch projection. The structure of the illumination controllable defogging network is shown in FIG. 1, and the structure of the fog synthesis network is shown in FIG. 2.
Step S8: the defogging discrimination network and the fog synthesis discrimination network have the same structure. For an input image of dimension H × W × 3, feature mapping is first performed through convolution kernels of size 3 × 3 to obtain a feature map of dimension H × W × L; the discrimination output is then obtained through residually connected convolution, batch normalization and activation (ReLU) operations. After each residual block (convolution + batch normalization + activation function), a convolution with stride 2 reduces the spatial dimension of the feature map; the defogging discrimination network and the fog synthesis discrimination network each contain three such residual blocks and stride-2 convolutions.
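A possible PyTorch rendering of the shared discriminator structure of step S8 is sketched below; the final single-channel projection layer and the exact channel width are assumptions.

```python
import torch.nn as nn

class ResBlock(nn.Module):
    """Residual block: convolution + batch normalization + ReLU with a skip connection."""

    def __init__(self, ch: int):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(ch, ch, 3, padding=1),
            nn.BatchNorm2d(ch),
            nn.ReLU(inplace=True))

    def forward(self, x):
        return x + self.body(x)

class Discriminator(nn.Module):
    """Shared structure of the two discrimination networks: a 3x3 stem mapping
    the image to L channels, then three residual blocks, each followed by a
    stride-2 convolution that halves the spatial resolution."""

    def __init__(self, base: int = 64):
        super().__init__()
        layers = [nn.Conv2d(3, base, 3, padding=1)]
        for _ in range(3):
            layers += [ResBlock(base), nn.Conv2d(base, base, 3, stride=2, padding=1)]
        layers += [nn.Conv2d(base, 1, 3, padding=1)]   # per-patch real/fake score
        self.net = nn.Sequential(*layers)

    def forward(self, x):
        return self.net(x)
```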
In a fourth aspect, the prior feature constraint loss is calculated by the following steps:
Step S9: pseudo labels that represent the characteristics of clear images are computed via the dark channel prior, and the training of the illumination controllable defogging network is guided by computing a loss at the feature level. For a given input image x, the pseudo label obtained by dark channel prior computation is θ(x); the output o(x) of the illumination controllable defogging network is constrained with this prior statistical rule, and the corresponding prior feature constraint loss function L_θ is:

L_θ(θ(x), o(x)) = γ ||ψ(θ(x)) - ψ(o(x))||_2

where the function ψ denotes a pre-trained feature extraction network used to estimate the distance between the features of the defogged image o(x) and of the pseudo label θ(x); the parameters of ψ are kept fixed while computing L_θ and are not updated during back propagation. During training of the illumination controllable defogging network, the weight γ of the loss L_θ is updated dynamically: after every complete training epoch, γ is decayed to 0.9 times its previous value.
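The dynamic update of the weight γ amounts to a one-line scheduler called at the end of each epoch, assuming the loss object exposes a mutable gamma attribute as in the PriorFeatureLoss sketch above.

```python
def decay_prior_weight(loss_module, factor: float = 0.9) -> None:
    """Call once after every complete training epoch so that the weight gamma
    of L_theta becomes 0.9 times its previous value."""
    loss_module.gamma *= factor

# e.g. at the end of each epoch: decay_prior_weight(prior_loss)
```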
In a fifth aspect, the network losses are calculated and the parameters are updated by the following steps:
Step S10: the image defogging model is trained in an unsupervised manner; the illumination controllable defogging network and the defogging discrimination network are used for defogging training, the fog synthesis network and the fog synthesis discrimination network are used for fog synthesis training, and the two trainings are constrained by a cycle consistency loss. The illumination controllable defogging network and the fog synthesis network are denoted G_J and G_I, and the defogging discrimination network and the fog synthesis discrimination network are denoted D_J and D_I. During sampling, a foggy image x is randomly sampled from the foggy image domain I, and a fog-free image y is randomly sampled from the fog-free image domain J. The training loss of the image defogging model comprises adversarial losses, a reconstruction constraint loss and the prior feature constraint loss function L_θ, where the reconstruction constraint loss comprises a cycle consistency loss and an identity loss. The adversarial losses are calculated as follows:

L(G_J, D_J) = E_{y~p_y}[log D_J(y)] + E_{x~p_x}[log(1 - D_J(G_J(x)))]

L(G_I, D_I) = E_{x~p_x}[log D_I(x)] + E_{y~p_y}[log(1 - D_I(G_I(y)))]
step S11: the generated image is further constrained using a cyclic consistency penalty and an identity penalty as follows:
L_r(G_I, G_J) = E_{x~p_x}[||G_I(G_J(x)) - x||_1] + E_{y~p_y}[||G_J(G_I(y)) - y||_1] + E_{y~p_y}[||G_J(y) - y||_1] + E_{x~p_x}[||G_I(x) - x||_1]

Step S12: the overall loss function is the weighted sum of the adversarial losses, the reconstruction constraint loss and the prior feature constraint loss function L_θ:

L_all = L(G_J, D_J) + L(G_I, D_I) + L_r(G_I, G_J) + λ L_θ
and after the calculation of the loss function is completed, updating the parameters of the four networks through a gradient descent algorithm.
Two quantitative evaluation indexes are adopted to assess the performance of the defogging algorithm: the first is the Peak Signal-to-Noise Ratio (PSNR) and the second is the Structural Similarity (SSIM); for both indexes, larger values indicate a better defogging effect. The evaluation compares three existing image defogging methods, CycleDehaze [Engin, Deniz, et al. Cycle-Dehaze: Enhanced CycleGAN for Single Image Dehazing. IEEE Conference on Computer Vision and Pattern Recognition Workshops, 2018: 825-833], ZID [Li, Boyun, et al. Zero-Shot Image Dehazing. IEEE Transactions on Image Processing, 2020, 29: 8457-8466] and SNSPGAN. The results in Tables 1 and 2 show that the quantitative evaluation results obtained with the defogging method proposed by the invention are higher. The defogging visual effects in FIG. 3 and FIG. 4 show that the proposed method obtains better visual results, the defogged images being closer to the reference images. Both the visual and quantitative results demonstrate the effectiveness of the designed defogging network.
Table 1: Results of quantitative defogging evaluation on the indoor data set

Method          SSIM     PSNR
CycleDehaze     0.810    18.870
ZID             0.835    19.830
SNSPGAN         0.788    17.747
The invention   0.894    22.558
Table 2: Results of quantitative defogging evaluation on the outdoor data set

Method          SSIM     PSNR
CycleDehaze     0.886    21.067
ZID             0.506    12.499
SNSPGAN         0.714    14.045
The invention   0.929    24.562
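For reference, the two quantitative indexes reported above could be computed as follows for image tensors scaled to [0, 1]; the SSIM here uses a uniform box window instead of the standard Gaussian window, so it is only an approximate stand-in.

```python
import torch
import torch.nn.functional as F

def psnr(pred: torch.Tensor, target: torch.Tensor, max_val: float = 1.0) -> float:
    """Peak Signal-to-Noise Ratio for image tensors scaled to [0, max_val]."""
    mse = F.mse_loss(pred, target)
    return (10.0 * torch.log10(max_val ** 2 / mse)).item()

def ssim(pred, target, max_val=1.0, window=11, k1=0.01, k2=0.03) -> float:
    """Single-scale Structural Similarity with a uniform (box) window; the
    standard definition uses a Gaussian window, so this is an approximation."""
    c1, c2 = (k1 * max_val) ** 2, (k2 * max_val) ** 2
    mu_x = F.avg_pool2d(pred, window, stride=1)
    mu_y = F.avg_pool2d(target, window, stride=1)
    sigma_x = F.avg_pool2d(pred * pred, window, stride=1) - mu_x ** 2
    sigma_y = F.avg_pool2d(target * target, window, stride=1) - mu_y ** 2
    sigma_xy = F.avg_pool2d(pred * target, window, stride=1) - mu_x * mu_y
    num = (2 * mu_x * mu_y + c1) * (2 * sigma_xy + c2)
    den = (mu_x ** 2 + mu_y ** 2 + c1) * (sigma_x + sigma_y + c2)
    return (num / den).mean().item()
```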
The technical means disclosed by the invention are not limited to those disclosed in the above embodiments, but also include technical solutions formed by any combination of the above technical features.

Claims (7)

1. An illumination controllable defogging method based on a non-supervision layer embedding and vision conversion model, characterized in that the designed image defogging method comprises four core networks, namely an illumination controllable defogging network, a fog synthesis network, a defogging discrimination network and a fog synthesis discrimination network, wherein the illumination controllable defogging network is the defogging network used in practical application, and the other three networks are only used to assist the training of the illumination controllable defogging network and are not needed in the practical application process; the illumination controllable defogging network is formed by visual conversion modules and comprises two processes of feature encoding and decoding reconstruction, wherein feature outputs of two branches are obtained in the decoding process, the decoding features of the two branches are fused through visual layer embedding, and defogged images under different illumination conditions are obtained by controlling the decoding feature weights of the branches; the fog synthesis network is a single-branch encoding-decoding network formed by visual conversion modules, its input being a fog-free image and its output a foggy image; the defogging discrimination network and the fog synthesis discrimination network are used to guide the training process of the illumination controllable defogging network and the fog synthesis network.
2. The illumination controllable defogging method based on the non-supervision layer embedding and vision conversion model is characterized in that the illumination controllable defogging network controls the decoding feature weights at the inference stage through an illumination controllable module, so as to obtain defogged images with different feature weight proportions; the illumination controllable defogging network comprises one encoder and two decoders, the encoder being denoted Φ and the two decoders φ_d and φ_r; for input data x, the encoder output feature is Φ(x), and the decoding features obtained by the two decoders are o_d(x) and o_r(x), where o_d(x) denotes the direct output, calculated as follows:

o_d(x) = φ_d(Φ(x))

o_r(x) denotes the visual layer embedding output; its calculation uses the visual layer (Retinex) model and element-wise multiplication "⊙":

o_r(x) = R(x) ⊙ L(x)

where the reflectance feature R(x) and the illumination feature L(x) are features obtained through the deep network; after o_d(x) and o_r(x) are obtained, linear fusion and nonlinear mapping are required, and the linear fusion process is as follows:
o_f(x) = α × o_d(x) + (1 - α) × o_r(x)

where α denotes a balance factor and o_f(x) is the output of the linear fusion stage; the nonlinear mapping is realized by the hyperbolic tangent function, and the final defogging output o(x) is obtained as:

o(x) = (e^{o_f(x)} - e^{-o_f(x)}) / (e^{o_f(x)} + e^{-o_f(x)})

where e denotes the natural constant used as the base of the exponential function; the linear fusion and the nonlinear mapping constitute the illumination controllable module, which is embedded into the illumination controllable defogging network.
3. The illumination controllable defogging method based on the non-supervision layer embedding and vision conversion model is characterized in that the illumination controllable defogging network and the fog synthesis network each comprise two feature map downsampling processes and two feature map upsampling processes; the input and output images of the illumination controllable defogging network and the fog synthesis network have dimension H × W × 3, where H denotes the image height, W the image width and 3 the number of channels; the number of basic feature channels is set to L in the feature calculation process, with L = 64; the operations required by the illumination controllable defogging network and the fog synthesis network comprise patch partition, visual feature conversion, patch embedding and upsampling; patch partition and patch embedding are realized by convolution operations, upsampling is realized by deconvolution, and the visual feature conversion module is realized by a window-based multi-head self-attention mechanism, layer normalization and a multilayer perceptron.
4. The illumination controllable defogging method based on the non-supervision layer embedding and vision conversion model as claimed in claim 3, characterized in that the calculation flow of the illumination controllable defogging network and the fog synthesis network is as follows:
first, for the illumination controllable defogging network, the encoding process comprises 4 stages, respectively: stage 1: output feature dimension H × W × L; stage 2: patch embedding layer and visual feature conversion layer, output feature dimension (H/2) × (W/2) × 2L; stage 3: patch embedding layer and visual feature conversion layer, output feature dimension (H/4) × (W/4) × 4L; stage 4: visual feature conversion layer, output feature dimension (H/4) × (W/4) × 4L;
the decoding process comprises the decoding calculation of the two decoders, both of which perform the stage 5 and stage 6 calculations, wherein the network parameters of stage 5 are shared by the two decoders, respectively:
stage 5: upsampling layer and visual feature conversion layer, output feature dimension (H/2) × (W/2) × 2L;
stage 6: upsampling layer and visual feature conversion layer, output feature dimension H × W × L.
Secondly, after the 6 stages of the illumination controllable defogging network are completed, the feature map is mapped from dimension H × W × L to dimension H × W × 3 through patch projection, and the final defogging output is obtained through the illumination controllable module; the fog synthesis network comprises only one encoder and one decoder, the encoder comprising the calculations of stages 1 to 4 and the decoder the calculations of stages 5 and 6, the input and output feature dimensions of each stage of the fog synthesis network being consistent with those of the illumination controllable defogging network; after the 6 stages are completed, the fog synthesis network directly maps the feature map of dimension H × W × L to the output image of dimension H × W × 3 by patch projection.
5. The illumination controllable defogging method based on the non-supervision layer embedding and vision conversion model is characterized in that the defogging discrimination network and the fog synthesis discrimination network have the same structure: for an input image of dimension H × W × 3, feature mapping is first performed through convolution kernels of size 3 × 3 to obtain a feature map of dimension H × W × L, and the discrimination output is then obtained through residually connected convolution, batch normalization and activation operations.
6. The illumination controllable defogging method based on the non-supervision layer embedding and vision conversion model is characterized in that pseudo labels representing the characteristics of clear images are computed via the dark channel prior, and the training process of the illumination controllable defogging network is guided by computing a loss at the feature level; for a given input image x, the pseudo label obtained by dark channel prior computation is θ(x), the output o(x) of the illumination controllable defogging network is constrained with this prior statistical rule, and the corresponding prior feature constraint loss function L_θ is:

L_θ(θ(x), o(x)) = γ ||ψ(θ(x)) - ψ(o(x))||_2

where the function ψ denotes a pre-trained feature extraction network used to estimate the distance between the features of the defogged image o(x) and of the pseudo label θ(x); the parameters of ψ are kept fixed while computing L_θ and are not updated during back propagation; during training of the illumination controllable defogging network, the weight γ of the loss L_θ is updated dynamically, being decayed to 0.9 times its previous value after every complete training epoch.
7. The method of claim 1, wherein the designed image defogging model is trained in an unsupervised manner, the illumination controllable defogging network and the defogging discrimination network are used for defogging training, the fog synthesis network and the fog synthesis discrimination network are used for fog synthesis training, and the defogging training and the fog synthesis training are constrained by a cycle consistency loss; the illumination controllable defogging network and the fog synthesis network are denoted G_J and G_I, and the defogging discrimination network and the fog synthesis discrimination network are denoted D_J and D_I; during sampling, a foggy image x is randomly sampled from the foggy image domain I, and a fog-free image y is randomly sampled from the fog-free image domain J; the training loss of the image defogging model comprises adversarial losses, a reconstruction constraint loss and the prior feature constraint loss function L_θ, wherein the reconstruction constraint loss comprises a cycle consistency loss and an identity loss; the adversarial losses are calculated as follows:

L(G_J, D_J) = E_{y~p_y}[log D_J(y)] + E_{x~p_x}[log(1 - D_J(G_J(x)))]

L(G_I, D_I) = E_{x~p_x}[log D_I(x)] + E_{y~p_y}[log(1 - D_I(G_I(y)))]

where p_x and p_y denote the distribution functions of x and y respectively, E denotes the mathematical expectation, and L(G_J, D_J) and L(G_I, D_I) are both adversarial losses;
the generated image is further constrained using a cyclic consistency penalty and an identity penalty as follows:
L_r(G_I, G_J) = E_{x~p_x}[||G_I(G_J(x)) - x||_1] + E_{y~p_y}[||G_J(G_I(y)) - y||_1] + E_{y~p_y}[||G_J(y) - y||_1] + E_{x~p_x}[||G_I(x) - x||_1]
where L_r(G_I, G_J) denotes the reconstruction constraint loss;
the overall loss function is the weighted sum of the adversarial losses, the reconstruction constraint loss and the prior feature constraint loss function L_θ:

L_all = L(G_J, D_J) + L(G_I, D_I) + L_r(G_I, G_J) + λ L_θ

where λ is a balance factor; finally, the network parameters are updated through a gradient descent algorithm.
CN202310059502.0A 2023-01-17 2023-01-17 Illumination controllable defogging method based on non-supervision layer embedding and vision conversion model Pending CN115937048A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310059502.0A CN115937048A (en) 2023-01-17 2023-01-17 Illumination controllable defogging method based on non-supervision layer embedding and vision conversion model

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310059502.0A CN115937048A (en) 2023-01-17 2023-01-17 Illumination controllable defogging method based on non-supervision layer embedding and vision conversion model

Publications (1)

Publication Number Publication Date
CN115937048A true CN115937048A (en) 2023-04-07

Family

ID=86656206

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310059502.0A Pending CN115937048A (en) 2023-01-17 2023-01-17 Illumination controllable defogging method based on non-supervision layer embedding and vision conversion model

Country Status (1)

Country Link
CN (1) CN115937048A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116228608A (en) * 2023-05-10 2023-06-06 耕宇牧星(北京)空间科技有限公司 Processing network for defogging remote sensing image and defogging method for remote sensing image


Similar Documents

Publication Publication Date Title
Yu et al. Underwater-GAN: Underwater image restoration via conditional generative adversarial network
CN111784602B (en) Method for generating countermeasure network for image restoration
CN111915530B (en) End-to-end-based haze concentration self-adaptive neural network image defogging method
CN110796009A (en) Method and system for detecting marine vessel based on multi-scale convolution neural network model
CN110503613B (en) Single image-oriented rain removing method based on cascade cavity convolution neural network
Hu et al. Underwater image restoration based on convolutional neural network
CN110288550B (en) Single-image defogging method for generating countermeasure network based on priori knowledge guiding condition
CN113344806A (en) Image defogging method and system based on global feature fusion attention network
CN109584188B (en) Image defogging method based on convolutional neural network
CN111476213A (en) Method and device for filling covering area of shelter based on road image
Zhang et al. CNN cloud detection algorithm based on channel and spatial attention and probabilistic upsampling for remote sensing image
CN111861925A (en) Image rain removing method based on attention mechanism and gate control circulation unit
CN110349093B (en) Single image defogging model construction and defogging method based on multi-stage hourglass structure
CN109410171A (en) A kind of target conspicuousness detection method for rainy day image
CN114943893B (en) Feature enhancement method for land coverage classification
Liang et al. An improved DualGAN for near-infrared image colorization
Ahn et al. EAGNet: Elementwise attentive gating network-based single image de-raining with rain simplification
CN111539246A (en) Cross-spectrum face recognition method and device, electronic equipment and storage medium thereof
CN115578280A (en) Construction method of double-branch remote sensing image defogging network
CN115937048A (en) Illumination controllable defogging method based on non-supervision layer embedding and vision conversion model
CN115861380A (en) End-to-end unmanned aerial vehicle visual target tracking method and device in foggy low-light scene
CN112164010A (en) Multi-scale fusion convolution neural network image defogging method
CN111598793A (en) Method and system for defogging image of power transmission line and storage medium
Li et al. An end-to-end system for unmanned aerial vehicle high-resolution remote sensing image haze removal algorithm using convolution neural network
CN114155165A (en) Image defogging method based on semi-supervision

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination