CN115937048A - Illumination controllable defogging method based on non-supervision layer embedding and vision conversion model - Google Patents

Illumination controllable defogging method based on non-supervision layer embedding and vision conversion model

Info

Publication number
CN115937048A
Authority
CN
China
Prior art keywords
network
defogging
controllable
illumination
fog
Prior art date
Legal status
Pending
Application number
CN202310059502.0A
Other languages
Chinese (zh)
Inventor
丛晓峰
桂杰
Current Assignee
Southeast University
Original Assignee
Southeast University
Priority date
Filing date
Publication date
Application filed by Southeast University filed Critical Southeast University
Priority to CN202310059502.0A priority Critical patent/CN115937048A/en
Publication of CN115937048A publication Critical patent/CN115937048A/en
Pending legal-status Critical Current

Classifications

    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02B: CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO BUILDINGS, e.g. HOUSING, HOUSE APPLIANCES OR RELATED END-USER APPLICATIONS
    • Y02B 20/00: Energy efficient lighting technologies, e.g. halogen lamps or gas discharge lamps
    • Y02B 20/40: Control techniques providing energy savings, e.g. smart controller or presence detection

Landscapes

  • Image Processing (AREA)

Abstract

The invention discloses an illumination controllable defogging method based on an unsupervised layer embedding and vision conversion model. The defogging method comprises four modules, namely an illumination controllable defogging network, a fog synthesis network, a defogging discrimination network and a fog synthesis discrimination network; after training, the illumination controllable defogging network can produce high-quality defogged images. The illumination controllable defogging network and the fog synthesis network are composed of window-based multi-head self-attention modules; the defogging discrimination network and the fog synthesis discrimination network are composed of convolution modules with residual links; the illumination controllable module is constructed according to the visual layer (Retinex) model; during training, the dark channel prior is used to guide the illumination controllable defogging network; the four components are trained jointly in an unsupervised manner, and the network parameters are updated by fusing a prior loss, an image reconstruction loss and a discrimination loss. The invention can be used for traffic safety, information security, photography and intelligent robots.

Description

Illumination controllable defogging method based on non-supervision layer embedding and vision conversion model
Technical Field
The invention relates to an illumination controllable defogging technology based on a non-supervision layer embedding and vision conversion model, and belongs to the technical field of computer vision and image processing.
Background
The image defogging task is a popular research problem in the fields of computer vision and image processing and has received wide attention from researchers. Under the influence of fog, image quality degrades, the visibility of objects in the scene is reduced, and the image appears visually blurred; in addition, the colors of the image shift to different degrees depending on the fog density. On the one hand, fog negatively affects photography, so that captured images fail to meet aesthetic requirements; on the other hand, fog brings adverse factors to production and social activities, for example by reducing target detection accuracy in automatic driving and degrading the clarity of vehicles and pedestrians captured by road traffic monitoring. An image defogging algorithm aims to eliminate the fog in a foggy image, thereby improving the overall image quality and enhancing visual clarity.
Research in the field of image defogging has made preliminary progress: by feeding a foggy image collected by a camera into an image defogging algorithm, the fog can be largely removed and the clarity of objects in the image improved to a certain extent, but insufficient color restoration and blurred details remain. Wang et al. proposed an unsupervised defogging method using spectral normalization named SNSPGAN [Wang, Yongzhen, et al. Cycle-SNSPGAN: Towards Real-World Image Dehazing via Cycle Spectral Normalized Soft Likelihood Estimation Patch GAN. IEEE Transactions on Intelligent Transportation Systems, 2022, 23: 20368-20382]; however, the color and detail recovery of the defogged images obtained by this method is insufficient. Three problems of existing image defogging technology deserve deeper study. First, image defogging models built from convolutional neural networks can extract image features and reconstruct defogged images, and can be trained end to end, but such models lack explicit modeling of the correlations among the information inside an image, so designing a defogging model with a window-based local attention mechanism is a challenging problem of significant research value. Second, deep-learning-based defogging algorithms neglect illumination information during defogging, so the brightness of the defogged image is inaccurate and inconsistent with that of a real clear image. Third, existing unsupervised defogging algorithms make little use of prior knowledge to reduce the model's dependence on data.
As image data plays an increasingly important role in social life, high-quality image data has been an important demand for technical development. Therefore, designing a high-quality defogging technology for a foggy image is a problem to be solved urgently by researchers in the field.
Disclosure of Invention
In order to solve the problem of insufficient color and detail restoration of a defogged image obtained by an image defogging model, the invention discloses an illumination controllable defogging method based on an unsupervised layer embedding and vision conversion model.
In order to achieve the purpose, the invention adopts the following technical scheme:
In a first aspect, the core networks required by the image defogging method are constructed; the steps are as follows:
Step S1: construct the core networks and initialize the network parameters. The designed image defogging method comprises four core networks, namely an illumination controllable defogging network, a fog synthesis network, a defogging discrimination network and a fog synthesis discrimination network; the illumination controllable defogging network is the defogging network used in practical application, while the other three networks only assist its training and are not needed in actual use. The illumination controllable defogging network is built from Vision Transformer blocks and comprises two processes, feature encoding and decoding reconstruction; the decoding process produces feature outputs from two branches, the decoding features of the two branches are fused through visual layer embedding (Retinex embedding), and defogged images under different illumination conditions are obtained by controlling the decoding feature weights of the branches. The fog synthesis network is a single-branch encoding-decoding network formed by visual conversion modules; its input is a fog-free image and its output is a foggy image. The defogging discrimination network and the fog synthesis discrimination network are used to guide the training of the illumination controllable defogging network and the fog synthesis network.
In a second aspect, an illumination controllable module is constructed by the following steps:
Step S2: the illumination controllable defogging network controls the decoding feature weights at the inference stage through the illumination controllable module, so as to obtain defogged images with different feature weight proportions. The illumination controllable defogging network comprises one encoder and two decoders; the encoder is denoted Φ and the two decoders are denoted φ_d and φ_r. For input data x, the encoder output feature is Φ(x), and the decoding features obtained by the two decoders are o_d(x) and o_r(x), where o_d(x) denotes the direct output, calculated as follows:

o_d(x) = φ_d(Φ(x))

o_r(x) denotes the embedded output; its calculation uses the visual layer (Retinex) model and element-wise multiplication "⊙":

o_r(x) = R(x) ⊙ L(x)

where the reflectance feature R(x) and the illumination feature L(x) are features obtained through the deep network from Φ(x).
Step S3: after o_d(x) and o_r(x) are obtained, linear fusion and nonlinear mapping are required. The linear fusion process is:

o_f(x) = α × o_d(x) + (1 - α) × o_r(x)

where α denotes a balance factor and o_f(x) is the output of the linear fusion stage.

The nonlinear mapping is realized by the hyperbolic tangent function, and the final defogging output o(x) is obtained as:

o(x) = (e^{o_f(x)} - e^{-o_f(x)}) / (e^{o_f(x)} + e^{-o_f(x)})

where e denotes the natural constant used as the base of the exponential function. The linear fusion and the nonlinear mapping together constitute the illumination controllable module, which is embedded into the illumination controllable defogging network.
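To make steps S2 and S3 concrete, the following is a minimal PyTorch sketch of the illumination controllable module, assuming the two decoder branches already produce image-sized tensors; the class name, the default balance factor of 0.5 and the tensor interface are illustrative choices, not taken from the patent.

```python
import torch
import torch.nn as nn

class IlluminationControlModule(nn.Module):
    """Linear fusion of the direct output o_d(x) and the Retinex-embedded
    output o_r(x), followed by a hyperbolic-tangent non-linearity."""

    def __init__(self, alpha: float = 0.5):
        super().__init__()
        self.alpha = alpha                     # balance factor, adjustable at inference

    def forward(self, o_d: torch.Tensor, o_r: torch.Tensor) -> torch.Tensor:
        # o_f(x) = alpha * o_d(x) + (1 - alpha) * o_r(x)
        o_f = self.alpha * o_d + (1.0 - self.alpha) * o_r
        # o(x) = tanh(o_f(x))
        return torch.tanh(o_f)
```

Changing alpha at inference time re-weights the two decoder branches, which is how defogged images under different illumination conditions would be produced.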
In the third aspect, four core networks are constructed, and the steps are as follows:
and step S4: the illumination controllable defogging network and the fog synthesizing network both comprise two characteristic diagram dimension reducing processes and two characteristic diagram dimension increasing processes. The dimension of an input image and an output image of the illumination controllable defogging network and the fog synthesizing network is H multiplied by W multiplied by 3, wherein H represents the height of the image, W represents the width of the image, 3 represents that the image comprises 3 channels, the number of basic characteristic channels is set to be L in the characteristic calculation process, and the value of L is 64. The operations required by both the illumination controllable defogging network and the fog synthesizing network comprise blocking (Patch Partition), visual feature conversion, block embedding (Patch embedded) and UpSampling (UpSampling). The partitioning and the block embedding are realized by convolution operation, the up-sampling is realized by deconvolution, and the visual characteristic conversion module is realized by a multi-head self-attention mechanism, layer normalization and a multilayer perceptron in a window mode. The calculation flow of the illumination controllable defogging network and the fog synthesizing network is as follows.
Step S5: for the illumination controllable defogging network, the encoding process comprises 4 stages, which are respectively:

Stage 1: output feature dimension H × W × L;

Stage 2: patch embedding layer and visual feature conversion layer, output feature dimension (H/2) × (W/2) × 2L;

Stage 3: patch embedding layer and visual feature conversion layer, output feature dimension (H/4) × (W/4) × 4L;

Stage 4: visual feature conversion layer, output feature dimension (H/4) × (W/4) × 4L.

Step S6: the decoding process comprises the decoding computation of the two decoders; both decoders perform the stage 5 and stage 6 computations, and the network parameters of stage 5 are shared by the two decoders:

Stage 5: upsampling layer and visual feature conversion layer, output feature dimension (H/2) × (W/2) × 2L;

Stage 6: upsampling layer and visual feature conversion layer, output feature dimension H × W × L.
Step S7: for the illumination controllable defogging network, after the 6 stages are completed, the feature map is mapped from dimension H × W × L to dimension H × W × 3 through patch projection, and the final defogging output is obtained through the illumination controllable module. The fog synthesis network comprises only one encoder and one decoder; its encoder comprises the computations of stages 1 to 4 and its decoder the computations of stages 5 and 6, with input and output feature dimensions at each stage identical to those of the illumination controllable defogging network. After the 6 stages are completed, the fog synthesis network directly maps the feature map of dimension H × W × L to the output image of dimension H × W × 3 by patch projection.
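For readers who want to see what one visual feature conversion unit could look like in code, a simplified window-based multi-head self-attention block is sketched below; the window size, head count and MLP expansion ratio are assumed values, and refinements such as shifted windows or relative position bias are omitted.

```python
import torch
import torch.nn as nn

class WindowTransformerBlock(nn.Module):
    """One 'visual feature conversion' unit: window-based multi-head
    self-attention followed by an MLP, each preceded by layer normalization
    and wrapped in a residual connection."""

    def __init__(self, channels: int, window: int = 8, heads: int = 4):
        super().__init__()
        self.window = window
        self.norm1 = nn.LayerNorm(channels)
        self.attn = nn.MultiheadAttention(channels, heads, batch_first=True)
        self.norm2 = nn.LayerNorm(channels)
        self.mlp = nn.Sequential(
            nn.Linear(channels, 4 * channels), nn.GELU(),
            nn.Linear(4 * channels, channels))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (B, C, H, W); H and W are assumed divisible by the window size
        b, c, h, w = x.shape
        ws = self.window
        # Partition the feature map into non-overlapping ws x ws windows
        t = x.view(b, c, h // ws, ws, w // ws, ws)
        t = t.permute(0, 2, 4, 3, 5, 1).reshape(-1, ws * ws, c)
        # Multi-head self-attention restricted to each window
        y = self.norm1(t)
        y, _ = self.attn(y, y, y)
        t = t + y
        # Feed-forward (multilayer perceptron) with residual connection
        t = t + self.mlp(self.norm2(t))
        # Reverse the window partition back to (B, C, H, W)
        t = t.view(b, h // ws, w // ws, ws, ws, c)
        return t.permute(0, 5, 1, 3, 2, 4).reshape(b, c, h, w)
```

Stacking such blocks with strided-convolution patch embedding and deconvolution upsampling layers would yield a stage structure analogous to steps S5 and S6.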
Step S8: the defogging discrimination network and the fog synthesis discrimination network have the same structure. For an input image of dimension H × W × 3, feature mapping is first performed through convolution kernels of size 3 × 3 to obtain a feature map of dimension H × W × L, and the discrimination output is then obtained through residually connected convolution, batch normalization and activation operations.
In a fourth aspect, the prior feature constraint loss is calculated by the following steps:
Step S9: pseudo labels that represent the characteristics of clear images are computed via the dark channel prior, and the training of the illumination controllable defogging network is guided by computing a loss at the feature level. For a given input image x, the pseudo label obtained by dark channel prior computation is θ(x); the output o(x) of the illumination controllable defogging network is constrained with this prior statistical rule, and the corresponding prior feature constraint loss function L_θ is:

L_θ(θ(x), o(x)) = γ ||ψ(θ(x)) - ψ(o(x))||_2

where the function ψ denotes a pre-trained feature extraction network used to estimate the distance between the features of the defogged image o(x) and of the pseudo label θ(x); the parameters of ψ are kept fixed while computing L_θ and are not updated during back propagation. During training of the illumination controllable defogging network, the weight γ of the loss L_θ is updated dynamically: after every complete training epoch, γ is decayed to 0.9 times its previous value.
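The following sketch illustrates one way the dark channel prior pseudo label θ(x) and the prior feature constraint loss L_θ could be computed; the patch size, the atmospheric-light estimation details and the choice of VGG-16 features as the pre-trained extractor ψ are assumptions made for illustration only.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
from torchvision.models import vgg16, VGG16_Weights

def dark_channel(img: torch.Tensor, patch: int = 15) -> torch.Tensor:
    """Dark channel of an RGB batch (B, 3, H, W): per-pixel channel minimum
    followed by a local minimum filter over a patch x patch window."""
    dc = img.min(dim=1, keepdim=True).values
    return -F.max_pool2d(-dc, patch, stride=1, padding=patch // 2)

def dcp_pseudo_label(x: torch.Tensor, omega: float = 0.95, t0: float = 0.1) -> torch.Tensor:
    """Coarse haze-free pseudo label theta(x) recovered with the dark channel prior."""
    b, c, h, w = x.shape
    dc = dark_channel(x)
    # Atmospheric light A: mean color of the brightest 0.1% dark-channel pixels
    k = max(1, int(0.001 * h * w))
    idx = dc.flatten(2).topk(k, dim=2).indices                    # (B, 1, k)
    A = x.flatten(2).gather(2, idx.expand(-1, 3, -1)).mean(2, keepdim=True).unsqueeze(-1)
    # Transmission estimate and scene-radiance recovery
    t = 1.0 - omega * dark_channel(x / A.clamp(min=1e-6))
    return (x - A) / t.clamp(min=t0) + A

class PriorFeatureLoss(nn.Module):
    """L_theta = gamma * || psi(theta(x)) - psi(o(x)) ||_2 with a frozen
    pre-trained feature extractor psi (VGG-16 features used as a stand-in)."""

    def __init__(self, gamma: float = 1.0):
        super().__init__()
        self.gamma = gamma
        self.psi = vgg16(weights=VGG16_Weights.IMAGENET1K_V1).features[:16].eval()
        for p in self.psi.parameters():
            p.requires_grad_(False)            # psi is never updated by back-propagation

    def forward(self, pseudo: torch.Tensor, output: torch.Tensor) -> torch.Tensor:
        return self.gamma * torch.norm(self.psi(pseudo) - self.psi(output), p=2)
```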
In a fifth aspect, the network losses are calculated and the parameters are updated by the following steps:
Step S10: the image defogging model is trained in an unsupervised manner; the illumination controllable defogging network and the defogging discrimination network are used for defogging training, the fog synthesis network and the fog synthesis discrimination network are used for fog synthesis training, and the two trainings are constrained by a cycle consistency loss. The illumination controllable defogging network and the fog synthesis network are denoted G_J and G_I, and the defogging discrimination network and the fog synthesis discrimination network are denoted D_J and D_I. During sampling, a foggy image x is randomly sampled from the foggy image domain I, and a fog-free image y is randomly sampled from the fog-free image domain J. The training loss of the image defogging model comprises adversarial losses, a reconstruction constraint loss and the prior feature constraint loss function L_θ, where the reconstruction constraint loss comprises a cycle consistency loss and an identity loss. The adversarial losses are calculated as follows:

L(G_J, D_J) = E_{y~p_y}[log D_J(y)] + E_{x~p_x}[log(1 - D_J(G_J(x)))]

L(G_I, D_I) = E_{x~p_x}[log D_I(x)] + E_{y~p_y}[log(1 - D_I(G_I(y)))]

where p_x and p_y denote the distributions of x and y respectively, E denotes the mathematical expectation, and L(G_J, D_J) and L(G_I, D_I) are both adversarial losses.

Step S11: the generated images are further constrained using a cycle consistency loss and an identity loss, as follows:

L_r(G_I, G_J) = E_{x~p_x}[||G_I(G_J(x)) - x||_1] + E_{y~p_y}[||G_J(G_I(y)) - y||_1] + E_{y~p_y}[||G_J(y) - y||_1] + E_{x~p_x}[||G_I(x) - x||_1]

where L_r(G_I, G_J) denotes the reconstruction constraint loss.

Step S12: the overall loss function is the weighted sum of the adversarial losses, the reconstruction constraint loss and the prior feature constraint loss function L_θ:

L_all = L(G_J, D_J) + L(G_I, D_I) + L_r(G_I, G_J) + λ L_θ
in the above formula, λ is a balance factor.
And after the calculation of the loss function is completed, updating the parameters of the four networks through a gradient descent algorithm.
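Putting steps S10 to S12 together, the generator-side loss of one training iteration might be computed as follows; the non-saturating adversarial form, the L1 norms for the reconstruction terms and the default weight are illustrative choices, and dcp_pseudo_label / PriorFeatureLoss refer to the helpers sketched earlier.

```python
import torch
import torch.nn.functional as F

def generator_losses(G_J, G_I, D_J, D_I, prior_loss, x, y, lam: float = 0.1):
    """Generator-side losses for one unsupervised iteration: x is a foggy batch
    from domain I, y a fog-free batch from domain J; G_J dehazes, G_I adds fog,
    D_J / D_I return real/fake logits."""
    fake_clear = G_J(x)                      # defogged image
    fake_foggy = G_I(y)                      # synthesized foggy image

    def adv(logits):                         # non-saturating log-form adversarial term
        return F.binary_cross_entropy_with_logits(logits, torch.ones_like(logits))

    adversarial = adv(D_J(fake_clear)) + adv(D_I(fake_foggy))

    # Reconstruction constraint L_r: cycle-consistency plus identity terms
    cycle = F.l1_loss(G_I(fake_clear), x) + F.l1_loss(G_J(fake_foggy), y)
    identity = F.l1_loss(G_J(y), y) + F.l1_loss(G_I(x), x)

    # Prior feature constraint against the dark-channel pseudo label
    l_theta = prior_loss(dcp_pseudo_label(x), fake_clear)

    return adversarial + cycle + identity + lam * l_theta
```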
The invention can effectively remove the fog contained in a foggy image while restoring color and detail, and a clear defogged image is obtained end to end in practical application. The invention can be used for data security, image processing, weather monitoring and robotics.
Drawings
FIG. 1 is a diagram of an illumination controlled defogging network;
FIG. 2 is a diagram of a mist synthesis network;
fig. 3 is a graph comparing the defogging results of the indoor images.
Fig. 4 is a comparison graph of defogging results of outdoor images.
Detailed Description
The present invention will be further illustrated with reference to the accompanying drawings and specific embodiments, which are to be understood as merely illustrative of the invention and not as limiting the scope of the invention. It should be noted that the terms "front," "back," "left," "right," "upper" and "lower" used in the following description refer to directions in the drawings, and the terms "inner" and "outer" refer to directions toward and away from, respectively, the geometric center of a particular component.
The illumination controllable defogging method based on the unsupervised visual layer embedding and visual conversion model requires no supervised loss during training, so paired foggy and fog-free pictures of the same scenes are not needed. The training process uses the indoor fog data set ITS [Li, Boyi, et al. Benchmarking Single-Image Dehazing and Beyond. IEEE Transactions on Image Processing, 2018, 28(1): 492-505] and the outdoor fog data set 4KID [Zheng, Zhuoran, et al. Ultra-High-Definition Image Dehazing via Multi-Guided Bilateral Learning. IEEE Conference on Computer Vision and Pattern Recognition, 2021: 16180-16189], respectively; the specific implementation comprises the following steps.
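To make the unpaired training setup concrete, one possible data-loading sketch is given below, in which a foggy image and an independently sampled fog-free image are returned together; the directory layout, file extension and image size are assumptions and do not reflect the actual packaging of ITS or 4KID.

```python
import random
from pathlib import Path

from PIL import Image
from torch.utils.data import Dataset
from torchvision.transforms.functional import to_tensor

class UnpairedHazeDataset(Dataset):
    """Returns a foggy image and an independently sampled fog-free image,
    so no pixel-aligned ground truth is ever required."""

    def __init__(self, foggy_dir: str, clear_dir: str, size: int = 256):
        self.foggy = sorted(Path(foggy_dir).glob("*.png"))
        self.clear = sorted(Path(clear_dir).glob("*.png"))
        self.size = size

    def __len__(self):
        return len(self.foggy)

    def _load(self, path: Path):
        img = Image.open(path).convert("RGB").resize((self.size, self.size))
        return to_tensor(img)                          # (3, H, W), values in [0, 1]

    def __getitem__(self, i: int):
        x = self._load(self.foggy[i])                  # sample from foggy domain I
        y = self._load(random.choice(self.clear))      # unpaired sample from domain J
        return x, y
```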
In a first aspect, the core networks required by the image defogging method are constructed; the steps are as follows:
Step S1: construct the core networks and initialize the network parameters. The designed image defogging method comprises four core networks, namely an illumination controllable defogging network, a fog synthesis network, a defogging discrimination network and a fog synthesis discrimination network; the illumination controllable defogging network is the defogging network used in practical application, while the other three networks only assist its training and are not needed in actual use. The initialization parameters of the networks are generated from a Gaussian distribution. The illumination controllable defogging network is built from Vision Transformer blocks and comprises two processes, feature encoding and decoding reconstruction; the decoding process produces feature outputs from two branches, the decoding features of the two branches are fused through visual layer embedding (Retinex embedding), and defogged images under different illumination conditions are obtained by controlling the decoding feature weights of the branches. The fog synthesis network is a single-branch encoding-decoding network formed by visual conversion modules; its input is a fog-free image and its output is a foggy image. The defogging discrimination network and the fog synthesis discrimination network are used to guide the training of the illumination controllable defogging network and the fog synthesis network.
In a second aspect, an illumination controllable module is constructed and added to the illumination controllable defogging model by the following steps:
Step S2: the illumination controllable defogging network controls the decoding feature weights at the inference stage through the illumination controllable module, so as to obtain defogged images with different feature weight proportions. The illumination controllable defogging network comprises one encoder and two decoders; the encoder is denoted Φ and the two decoders are denoted φ_d and φ_r. For input data x, the encoder output feature is Φ(x), and the decoding features obtained by the two decoders are o_d(x) and o_r(x), where o_d(x) denotes the direct output, calculated as follows:

o_d(x) = φ_d(Φ(x))

o_r(x) denotes the embedded output; its calculation uses the visual layer (Retinex) model and element-wise multiplication "⊙":

o_r(x) = R(x) ⊙ L(x)

where the reflectance feature R(x) and the illumination feature L(x) are features obtained through the deep network from Φ(x).
Step S3: after o_d(x) and o_r(x) are obtained, linear fusion and nonlinear mapping are required. The linear fusion process is:

o_f(x) = α × o_d(x) + (1 - α) × o_r(x)

where α denotes a balance factor and o_f(x) is the output of the linear fusion stage. The nonlinear mapping is realized by the hyperbolic tangent function, and the final defogging output o(x) is obtained as:

o(x) = (e^{o_f(x)} - e^{-o_f(x)}) / (e^{o_f(x)} + e^{-o_f(x)})

The linear fusion and the nonlinear mapping together constitute the illumination controllable module, which is embedded into the illumination controllable defogging network; the illumination controllable module is shown at the output end of FIG. 1 (the structure of the illumination controllable defogging network).
In the third aspect, four core networks are constructed, and the steps are as follows:
and step S4: the illumination controllable defogging network and the fog synthesizing network both comprise a characteristic diagram dimension reduction process twice and a characteristic diagram dimension increase process twice. The dimension of an input image and an output image of the illumination controllable defogging network and the fog synthesizing network is H multiplied by W multiplied by 3, wherein H represents the height of the image, W represents the width of the image, 3 represents that the image comprises 3 channels, the number of basic characteristic channels is set to be L in the characteristic calculation process, and the value of L is 64. The operations required by both the illumination controllable defogging network and the fog synthesis network comprise partitioning (Patch Partition), visual feature conversion, block embedding (Patch Embedded) and UpSampling (UpSampling). The partitioning and the block embedding are realized by convolution operation, the up-sampling is realized by deconvolution, and the visual characteristic conversion module is realized by a multi-head self-attention mechanism, layer normalization and a multilayer perceptron in a window mode. The calculation flow of the illumination controllable defogging network and the fog synthesis network is as follows.
Step S5: for the illumination controllable defogging network, the encoding process comprises 4 stages, which are respectively:

Stage 1: output feature dimension H × W × L;

Stage 2: patch embedding layer and visual feature conversion layer, output feature dimension (H/2) × (W/2) × 2L;

Stage 3: patch embedding layer and visual feature conversion layer, output feature dimension (H/4) × (W/4) × 4L;

Stage 4: visual feature conversion layer, output feature dimension (H/4) × (W/4) × 4L.

Step S6: the decoding process comprises the decoding computation of the two decoders; both decoders perform the stage 5 and stage 6 computations, and the network parameters of stage 5 are shared by the two decoders:

Stage 5: upsampling layer and visual feature conversion layer, output feature dimension (H/2) × (W/2) × 2L;

Stage 6: upsampling layer and visual feature conversion layer, output feature dimension H × W × L.
Step S7: for the illumination controllable defogging network, after the 6 stages are completed, the feature map is mapped from dimension H × W × L to dimension H × W × 3 through patch projection, and the final defogging output is obtained through the illumination controllable module. The fog synthesis network comprises only one encoder and one decoder; its encoder comprises the computations of stages 1 to 4 and its decoder the computations of stages 5 and 6, with input and output feature dimensions at each stage identical to those of the illumination controllable defogging network. After the 6 stages are completed, the fog synthesis network directly maps the feature map of dimension H × W × L to the output image of dimension H × W × 3 by patch projection. The structure of the illumination controllable defogging network is shown in FIG. 1, and the structure of the fog synthesis network is shown in FIG. 2.
Step S8: the defogging discrimination network and the fog synthesis discrimination network have the same structure. For an input image of dimension H × W × 3, feature mapping is first performed through convolution kernels of size 3 × 3 to obtain a feature map of dimension H × W × L; the discrimination output is then obtained through residually connected convolution, batch normalization and activation (ReLU) operations. After each residual block (convolution + batch normalization + activation function), a convolution with stride 2 reduces the spatial dimension of the feature map; the defogging discrimination network and the fog synthesis discrimination network each contain three such residual blocks and stride-2 convolutions.
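A possible PyTorch rendering of the shared discriminator structure of step S8 is sketched below; the final single-channel projection layer and the exact channel width are assumptions.

```python
import torch.nn as nn

class ResBlock(nn.Module):
    """Residual block: convolution + batch normalization + ReLU with a skip connection."""

    def __init__(self, ch: int):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(ch, ch, 3, padding=1),
            nn.BatchNorm2d(ch),
            nn.ReLU(inplace=True))

    def forward(self, x):
        return x + self.body(x)

class Discriminator(nn.Module):
    """Shared structure of the two discrimination networks: a 3x3 stem mapping
    the image to L channels, then three residual blocks, each followed by a
    stride-2 convolution that halves the spatial resolution."""

    def __init__(self, base: int = 64):
        super().__init__()
        layers = [nn.Conv2d(3, base, 3, padding=1)]
        for _ in range(3):
            layers += [ResBlock(base), nn.Conv2d(base, base, 3, stride=2, padding=1)]
        layers += [nn.Conv2d(base, 1, 3, padding=1)]   # per-patch real/fake score
        self.net = nn.Sequential(*layers)

    def forward(self, x):
        return self.net(x)
```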
In a fourth aspect, the prior feature constraint loss is calculated by the following steps:
Step S9: pseudo labels that represent the characteristics of clear images are computed via the dark channel prior, and the training of the illumination controllable defogging network is guided by computing a loss at the feature level. For a given input image x, the pseudo label obtained by dark channel prior computation is θ(x); the output o(x) of the illumination controllable defogging network is constrained with this prior statistical rule, and the corresponding prior feature constraint loss function L_θ is:

L_θ(θ(x), o(x)) = γ ||ψ(θ(x)) - ψ(o(x))||_2

where the function ψ denotes a pre-trained feature extraction network used to estimate the distance between the features of the defogged image o(x) and of the pseudo label θ(x); the parameters of ψ are kept fixed while computing L_θ and are not updated during back propagation. During training of the illumination controllable defogging network, the weight γ of the loss L_θ is updated dynamically: after every complete training epoch, γ is decayed to 0.9 times its previous value.
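The dynamic update of the weight γ amounts to a one-line scheduler called at the end of each epoch, assuming the loss object exposes a mutable gamma attribute as in the PriorFeatureLoss sketch above.

```python
def decay_prior_weight(loss_module, factor: float = 0.9) -> None:
    """Call once after every complete training epoch so that the weight gamma
    of L_theta becomes 0.9 times its previous value."""
    loss_module.gamma *= factor

# e.g. at the end of each epoch: decay_prior_weight(prior_loss)
```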
In a fifth aspect, the network losses are calculated and the parameters are updated by the following steps:
Step S10: the image defogging model is trained in an unsupervised manner; the illumination controllable defogging network and the defogging discrimination network are used for defogging training, the fog synthesis network and the fog synthesis discrimination network are used for fog synthesis training, and the two trainings are constrained by a cycle consistency loss. The illumination controllable defogging network and the fog synthesis network are denoted G_J and G_I, and the defogging discrimination network and the fog synthesis discrimination network are denoted D_J and D_I. During sampling, a foggy image x is randomly sampled from the foggy image domain I, and a fog-free image y is randomly sampled from the fog-free image domain J. The training loss of the image defogging model comprises adversarial losses, a reconstruction constraint loss and the prior feature constraint loss function L_θ, where the reconstruction constraint loss comprises a cycle consistency loss and an identity loss. The adversarial losses are calculated as follows:

L(G_J, D_J) = E_{y~p_y}[log D_J(y)] + E_{x~p_x}[log(1 - D_J(G_J(x)))]

L(G_I, D_I) = E_{x~p_x}[log D_I(x)] + E_{y~p_y}[log(1 - D_I(G_I(y)))]
step S11: the generated image is further constrained using a cyclic consistency penalty and an identity penalty as follows:
L_r(G_I, G_J) = E_{x~p_x}[||G_I(G_J(x)) - x||_1] + E_{y~p_y}[||G_J(G_I(y)) - y||_1] + E_{y~p_y}[||G_J(y) - y||_1] + E_{x~p_x}[||G_I(x) - x||_1]

Step S12: the overall loss function is the weighted sum of the adversarial losses, the reconstruction constraint loss and the prior feature constraint loss function L_θ:

L_all = L(G_J, D_J) + L(G_I, D_I) + L_r(G_I, G_J) + λ L_θ
and after the calculation of the loss function is completed, updating the parameters of the four networks through a gradient descent algorithm.
Two quantitative evaluation indexes are adopted to assess the performance of the defogging algorithm: the first is the Peak Signal-to-Noise Ratio (PSNR) and the second is the Structural Similarity (SSIM); for both indexes, larger values indicate a better defogging effect. The evaluation compares three existing image defogging methods, CycleDehaze [Engin, Deniz, et al. Cycle-Dehaze: Enhanced CycleGAN for Single Image Dehazing. IEEE Conference on Computer Vision and Pattern Recognition Workshops, 2018: 825-833], ZID [Li, Boyun, et al. Zero-Shot Image Dehazing. IEEE Transactions on Image Processing, 2020, 29: 8457-8466] and SNSPGAN. The results in Tables 1 and 2 show that the quantitative evaluation results obtained with the defogging method proposed by the invention are higher. The defogging visual effects in FIG. 3 and FIG. 4 show that the proposed method obtains better visual results, the defogged images being closer to the reference images. Both the visual and quantitative results demonstrate the effectiveness of the designed defogging network.
Table 1: Results of quantitative defogging evaluation on the indoor data set

Method          SSIM     PSNR
CycleDehaze     0.810    18.870
ZID             0.835    19.830
SNSPGAN         0.788    17.747
The invention   0.894    22.558
Table 2: Results of quantitative defogging evaluation on the outdoor data set

Method          SSIM     PSNR
CycleDehaze     0.886    21.067
ZID             0.506    12.499
SNSPGAN         0.714    14.045
The invention   0.929    24.562
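For reference, the two quantitative indexes reported above could be computed as follows for image tensors scaled to [0, 1]; the SSIM here uses a uniform box window instead of the standard Gaussian window, so it is only an approximate stand-in.

```python
import torch
import torch.nn.functional as F

def psnr(pred: torch.Tensor, target: torch.Tensor, max_val: float = 1.0) -> float:
    """Peak Signal-to-Noise Ratio for image tensors scaled to [0, max_val]."""
    mse = F.mse_loss(pred, target)
    return (10.0 * torch.log10(max_val ** 2 / mse)).item()

def ssim(pred, target, max_val=1.0, window=11, k1=0.01, k2=0.03) -> float:
    """Single-scale Structural Similarity with a uniform (box) window; the
    standard definition uses a Gaussian window, so this is an approximation."""
    c1, c2 = (k1 * max_val) ** 2, (k2 * max_val) ** 2
    mu_x = F.avg_pool2d(pred, window, stride=1)
    mu_y = F.avg_pool2d(target, window, stride=1)
    sigma_x = F.avg_pool2d(pred * pred, window, stride=1) - mu_x ** 2
    sigma_y = F.avg_pool2d(target * target, window, stride=1) - mu_y ** 2
    sigma_xy = F.avg_pool2d(pred * target, window, stride=1) - mu_x * mu_y
    num = (2 * mu_x * mu_y + c1) * (2 * sigma_xy + c2)
    den = (mu_x ** 2 + mu_y ** 2 + c1) * (sigma_x + sigma_y + c2)
    return (num / den).mean().item()
```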
The technical means disclosed by the invention are not limited to those disclosed in the above embodiments, but also include technical solutions formed by any combination of the above technical features.

Claims (7)

1. An illumination controllable defogging method based on a non-supervision layer embedding and vision conversion model, characterized in that the designed image defogging method comprises four core networks, namely an illumination controllable defogging network, a fog synthesis network, a defogging discrimination network and a fog synthesis discrimination network, wherein the illumination controllable defogging network is the defogging network used in practical application, and the other three networks are only used to assist the training of the illumination controllable defogging network and are not needed in the practical application process; the illumination controllable defogging network is formed by visual conversion modules and comprises two processes of feature encoding and decoding reconstruction, wherein feature outputs of two branches are obtained in the decoding process, the decoding features of the two branches are fused through visual layer embedding, and defogged images under different illumination conditions are obtained by controlling the decoding feature weights of the branches; the fog synthesis network is a single-branch encoding-decoding network formed by visual conversion modules, its input being a fog-free image and its output a foggy image; the defogging discrimination network and the fog synthesis discrimination network are used to guide the training process of the illumination controllable defogging network and the fog synthesis network.
2. The illumination controllable defogging method based on the non-supervision layer embedding and vision conversion model is characterized in that the illumination controllable defogging network controls the decoding feature weights at the inference stage through an illumination controllable module, so as to obtain defogged images with different feature weight proportions; the illumination controllable defogging network comprises one encoder and two decoders, the encoder being denoted Φ and the two decoders φ_d and φ_r; for input data x, the encoder output feature is Φ(x), and the decoding features obtained by the two decoders are o_d(x) and o_r(x), where o_d(x) denotes the direct output, calculated as follows:

o_d(x) = φ_d(Φ(x))

o_r(x) denotes the visual layer embedding output; its calculation uses the visual layer (Retinex) model and element-wise multiplication "⊙":

o_r(x) = R(x) ⊙ L(x)

where the reflectance feature R(x) and the illumination feature L(x) are features obtained through the deep network; after o_d(x) and o_r(x) are obtained, linear fusion and nonlinear mapping are required, and the linear fusion process is as follows:
o_f(x) = α × o_d(x) + (1 - α) × o_r(x)

where α denotes a balance factor and o_f(x) is the output of the linear fusion stage; the nonlinear mapping is realized by the hyperbolic tangent function, and the final defogging output o(x) is obtained as:

o(x) = (e^{o_f(x)} - e^{-o_f(x)}) / (e^{o_f(x)} + e^{-o_f(x)})

where e denotes the natural constant used as the base of the exponential function; the linear fusion and the nonlinear mapping constitute the illumination controllable module, which is embedded into the illumination controllable defogging network.
3. The illumination controllable defogging method based on the non-supervision layer embedding and vision conversion model is characterized in that the illumination controllable defogging network and the fog synthesis network each comprise two feature map downsampling processes and two feature map upsampling processes; the input and output images of the illumination controllable defogging network and the fog synthesis network have dimension H × W × 3, where H denotes the image height, W the image width and 3 the number of channels; the number of basic feature channels is set to L in the feature calculation process, with L = 64; the operations required by the illumination controllable defogging network and the fog synthesis network comprise patch partition, visual feature conversion, patch embedding and upsampling; patch partition and patch embedding are realized by convolution operations, upsampling is realized by deconvolution, and the visual feature conversion module is realized by a window-based multi-head self-attention mechanism, layer normalization and a multilayer perceptron.
4. The illumination controllable defogging method based on the non-supervision layer embedding and vision conversion model as claimed in claim 3, characterized in that the calculation flow of the illumination controllable defogging network and the fog synthesis network is as follows:
first, for the illumination controllable defogging network, the encoding process comprises 4 stages, respectively: stage 1: output feature dimension H × W × L; stage 2: patch embedding layer and visual feature conversion layer, output feature dimension (H/2) × (W/2) × 2L; stage 3: patch embedding layer and visual feature conversion layer, output feature dimension (H/4) × (W/4) × 4L; stage 4: visual feature conversion layer, output feature dimension (H/4) × (W/4) × 4L;
the decoding process comprises the decoding calculation of the two decoders, both of which perform the stage 5 and stage 6 calculations, wherein the network parameters of stage 5 are shared by the two decoders, respectively:
stage 5: upsampling layer and visual feature conversion layer, output feature dimension (H/2) × (W/2) × 2L;
stage 6: upsampling layer and visual feature conversion layer, output feature dimension H × W × L.
Secondly, after the 6 stages of the illumination controllable defogging network are completed, the feature map is mapped from dimension H × W × L to dimension H × W × 3 through patch projection, and the final defogging output is obtained through the illumination controllable module; the fog synthesis network comprises only one encoder and one decoder, the encoder comprising the calculations of stages 1 to 4 and the decoder the calculations of stages 5 and 6, the input and output feature dimensions of each stage of the fog synthesis network being consistent with those of the illumination controllable defogging network; after the 6 stages are completed, the fog synthesis network directly maps the feature map of dimension H × W × L to the output image of dimension H × W × 3 by patch projection.
5. The illumination controllable defogging method based on the non-supervision layer embedding and vision conversion model is characterized in that the defogging discrimination network and the fog synthesis discrimination network have the same structure: for an input image of dimension H × W × 3, feature mapping is first performed through convolution kernels of size 3 × 3 to obtain a feature map of dimension H × W × L, and the discrimination output is then obtained through residually connected convolution, batch normalization and activation operations.
6. The illumination controllable defogging method based on the non-supervision layer embedding and vision conversion model is characterized in that pseudo labels representing the characteristics of clear images are computed via the dark channel prior, and the training process of the illumination controllable defogging network is guided by computing a loss at the feature level; for a given input image x, the pseudo label obtained by dark channel prior computation is θ(x), the output o(x) of the illumination controllable defogging network is constrained with this prior statistical rule, and the corresponding prior feature constraint loss function L_θ is:

L_θ(θ(x), o(x)) = γ ||ψ(θ(x)) - ψ(o(x))||_2

where the function ψ denotes a pre-trained feature extraction network used to estimate the distance between the features of the defogged image o(x) and of the pseudo label θ(x); the parameters of ψ are kept fixed while computing L_θ and are not updated during back propagation; during training of the illumination controllable defogging network, the weight γ of the loss L_θ is updated dynamically, being decayed to 0.9 times its previous value after every complete training epoch.
7. The method of claim 1, wherein the designed image defogging model is trained in an unsupervised manner, the illumination controllable defogging network and the defogging discrimination network are used for defogging training, the fog synthesis network and the fog synthesis discrimination network are used for fog synthesis training, and the defogging training and the fog synthesis training are constrained by a cycle consistency loss; the illumination controllable defogging network and the fog synthesis network are denoted G_J and G_I, and the defogging discrimination network and the fog synthesis discrimination network are denoted D_J and D_I; during sampling, a foggy image x is randomly sampled from the foggy image domain I, and a fog-free image y is randomly sampled from the fog-free image domain J; the training loss of the image defogging model comprises adversarial losses, a reconstruction constraint loss and the prior feature constraint loss function L_θ, wherein the reconstruction constraint loss comprises a cycle consistency loss and an identity loss; the adversarial losses are calculated as follows:

L(G_J, D_J) = E_{y~p_y}[log D_J(y)] + E_{x~p_x}[log(1 - D_J(G_J(x)))]

L(G_I, D_I) = E_{x~p_x}[log D_I(x)] + E_{y~p_y}[log(1 - D_I(G_I(y)))]

where p_x and p_y denote the distribution functions of x and y respectively, E denotes the mathematical expectation, and L(G_J, D_J) and L(G_I, D_I) are both adversarial losses;
the generated image is further constrained using a cyclic consistency penalty and an identity penalty as follows:
L_r(G_I, G_J) = E_{x~p_x}[||G_I(G_J(x)) - x||_1] + E_{y~p_y}[||G_J(G_I(y)) - y||_1] + E_{y~p_y}[||G_J(y) - y||_1] + E_{x~p_x}[||G_I(x) - x||_1]
where L_r(G_I, G_J) denotes the reconstruction constraint loss;
the overall loss function is the weighted sum of the adversarial losses, the reconstruction constraint loss and the prior feature constraint loss function L_θ:

L_all = L(G_J, D_J) + L(G_I, D_I) + L_r(G_I, G_J) + λ L_θ

where λ is a balance factor; finally, the network parameters are updated through a gradient descent algorithm.
CN202310059502.0A 2023-01-17 2023-01-17 Illumination controllable defogging method based on non-supervision layer embedding and vision conversion model Pending CN115937048A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310059502.0A CN115937048A (en) 2023-01-17 2023-01-17 Illumination controllable defogging method based on non-supervision layer embedding and vision conversion model

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310059502.0A CN115937048A (en) 2023-01-17 2023-01-17 Illumination controllable defogging method based on non-supervision layer embedding and vision conversion model

Publications (1)

Publication Number Publication Date
CN115937048A true CN115937048A (en) 2023-04-07

Family

ID=86656206

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310059502.0A Pending CN115937048A (en) 2023-01-17 2023-01-17 Illumination controllable defogging method based on non-supervision layer embedding and vision conversion model

Country Status (1)

Country Link
CN (1) CN115937048A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116228608A (en) * 2023-05-10 2023-06-06 耕宇牧星(北京)空间科技有限公司 Processing network for defogging remote sensing image and defogging method for remote sensing image


Similar Documents

Publication Publication Date Title
Yu et al. Underwater-GAN: Underwater image restoration via conditional generative adversarial network
CN111784602B (en) Method for generating countermeasure network for image restoration
CN111915530B (en) End-to-end-based haze concentration self-adaptive neural network image defogging method
CN110796009A (en) Method and system for detecting marine vessel based on multi-scale convolution neural network model
CN110503613B (en) Single image-oriented rain removing method based on cascade cavity convolution neural network
Hu et al. Underwater image restoration based on convolutional neural network
CN110288550B (en) Single-image defogging method for generating countermeasure network based on priori knowledge guiding condition
CN113344806A (en) Image defogging method and system based on global feature fusion attention network
CN109584188B (en) Image defogging method based on convolutional neural network
CN111476213A (en) Method and device for filling covering area of shelter based on road image
Zhang et al. CNN cloud detection algorithm based on channel and spatial attention and probabilistic upsampling for remote sensing image
CN111861925A (en) Image rain removing method based on attention mechanism and gate control circulation unit
CN110349093B (en) Single image defogging model construction and defogging method based on multi-stage hourglass structure
CN109410171A (en) A kind of target conspicuousness detection method for rainy day image
CN114943893B (en) Feature enhancement method for land coverage classification
Liang et al. An improved DualGAN for near-infrared image colorization
Ahn et al. EAGNet: Elementwise attentive gating network-based single image de-raining with rain simplification
CN111539246A (en) Cross-spectrum face recognition method and device, electronic equipment and storage medium thereof
CN115578280A (en) Construction method of double-branch remote sensing image defogging network
CN115937048A (en) Illumination controllable defogging method based on non-supervision layer embedding and vision conversion model
CN115861380A (en) End-to-end unmanned aerial vehicle visual target tracking method and device in foggy low-light scene
CN112164010A (en) Multi-scale fusion convolution neural network image defogging method
CN111598793A (en) Method and system for defogging image of power transmission line and storage medium
Li et al. An end-to-end system for unmanned aerial vehicle high-resolution remote sensing image haze removal algorithm using convolution neural network
CN114155165A (en) Image defogging method based on semi-supervision

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination