CN113066025B - Image defogging method based on incremental learning and feature and attention transfer - Google Patents



Publication number
CN113066025B
CN113066025B (application CN202110304663.2A)
Authority
CN
China
Prior art keywords
network
layer
feature
convolution
defogging
Prior art date
Legal status
Active
Application number
CN202110304663.2A
Other languages
Chinese (zh)
Other versions
CN113066025A (en)
Inventor
王科平
李冰锋
韦金阳
杨艺
李新伟
崔立志
Current Assignee
Henan University of Technology
Original Assignee
Henan University of Technology
Priority date
Filing date
Publication date
Application filed by Henan University of Technology
Priority to CN202110304663.2A
Publication of CN113066025A
Application granted
Publication of CN113066025B

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06T: IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 7/00: Image analysis
    • G06T 7/90: Determination of colour characteristics
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00: Pattern recognition
    • G06F 18/20: Analysing
    • G06F 18/21: Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F 18/214: Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00: Computing arrangements based on biological models
    • G06N 3/02: Neural networks
    • G06N 3/04: Architecture, e.g. interconnection topology
    • G06N 3/045: Combinations of networks
    • G06T 5/00: Image enhancement or restoration

Abstract

The invention discloses an image defogging method based on incremental learning and feature and attention transfer, which comprises the following steps: S1, constructing a self-encoder network as a teacher network and extracting a first intermediate-layer feature map and a first feature attention map; S2, constructing a defogging network as a student network, outputting a second intermediate-layer feature map and a second feature attention map to fit the first intermediate-layer feature map and first feature attention map, and enhancing the corresponding features with the third feature attention map obtained after fitting; S3, training the teacher network with multiple groups of paired identical images; S4, optimizing the student network with multiple groups of paired foggy and clear images; S5, training the student network under the joint action of an SSIM loss function and a Smooth L1 loss function; and S6, performing an incremental operation on the student network's data set to improve the defogging capability of the defogging network on other data.

Description

Image defogging method based on incremental learning and feature and attention transfer
Technical Field
The invention relates to the field of image processing, in particular to an image defogging method based on incremental learning and feature and attention transfer.
Background
In recent years, air quality has deteriorated and hazy weather has become more frequent owing to industrial production, automobile emissions and other causes. Suspended particles in the air absorb and scatter light, so foggy images acquired by imaging equipment suffer from low contrast, color distortion and blurring. Haze directly degrades the visual quality of an image and limits high-level computer vision tasks that take images as their processing object, so research on haze image sharpening is of great significance in the field of computer vision.
Image restoration methods based on the atmospheric scattering model and image defogging methods based on deep learning are currently the mainstream. However, restoration methods based on the atmospheric scattering model suffer from residual fog and image distortion caused by inaccurate estimation of intermediate parameters, while deep-learning defogging methods generalize poorly because of data set limitations. A method that improves both the defogging and the generalization ability of a defogging network is therefore needed to solve these problems.
Disclosure of Invention
The invention aims to solve the above problems by providing an image defogging method based on incremental learning and feature and attention transfer that is simple to operate and improves the defogging effect.
In order to achieve this purpose, the technical scheme of the invention is as follows:
an image defogging method based on incremental learning and feature and attention transfer comprises the following steps:
s1, constructing a self-encoder network serving as a teacher network, and extracting first intermediate layer feature graphs and first feature attention graphs of different dimensions in the self-encoder network for subsequent training of a student network;
s2, constructing a defogging network serving as a student network, wherein the defogging network consists of a residual block and two layers of convolutions, and outputting a second intermediate layer characteristic diagram with different dimensions and a first intermediate layer characteristic diagram extracted from a coder network by using a Smooth L1 loss function to constrain the residual block, and fitting a first characteristic attention diagram and a first intermediate layer characteristic diagram by using a third characteristic attention diagram obtained after fitting as weights to perform enhancement operation on corresponding characteristics;
s3, using a plurality of groups of same images in pairs as input and labels of a teacher network to train the teacher network;
s4, using a plurality of groups of paired fog images and clear images as input and labels of the student network to carry out optimization training on the student network;
s5, using a Smooth L1 loss function as a loss function between labels and defogging results in a teacher network and a student network, simultaneously using an SSIM loss function as a loss function between a first middle layer characteristic diagram and a second middle layer characteristic diagram, using the Smooth L1 loss function as a loss function between a first attention diagram and a second attention diagram, and training the student network under the joint action of the SSIM loss function and the Smooth L1 loss function;
and S6, performing incremental operation on the data set in the student network, and improving the defogging capacity of the defogging network on other data.
Further, the self-encoder network in step S1 consists of a convolution module and an up-sampling module. The convolution module comprises four convolution layers: the first layer uses 64 3x3 convolution kernels with stride 2 and pad 1, i.e. f1=3, c1=64, and may be written Conv1(3, 64, 3); the second layer uses 128 3x3 kernels with stride 1 and pad 1, i.e. f2=3, c2=128, written Conv2(64, 128, 3); the third layer uses 256 3x3 kernels with stride 2 and pad 1, i.e. f3=3, c3=256, written Conv3(128, 256, 3); the fourth layer uses 512 3x3 kernels with stride 1 and pad 1, i.e. f4=3, c4=512, written Conv4(256, 512, 3).
The up-sampling module corresponds to the convolution module and comprises four deconvolution layers: the first layer uses 256 4x4 kernels with stride 2 and pad 1, i.e. f1'=4, c1'=256, written TranConv1(512, 256, 4); the second uses 128 1x1 kernels with stride 1 and pad 0, i.e. f2'=1, c2'=128, written TranConv2(256, 128, 1); the third uses 64 4x4 kernels with stride 2 and pad 1, i.e. f3'=4, c3'=64, written TranConv3(128, 64, 4); the fourth uses 3 1x1 kernels with stride 1 and pad 0, i.e. f4'=1, c4'=3, written TranConv4(64, 3, 1).
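As a check on the layer specification above, the spatial sizes and channel counts can be traced through the four convolution and four deconvolution layers with the standard output-size formulas (a sketch; the 256x256 RGB input resolution and the helper names are illustrative assumptions, not values from the patent):

```python
def conv_out(size, kernel, stride, pad):
    # standard convolution output size: floor((size + 2*pad - kernel) / stride) + 1
    return (size + 2 * pad - kernel) // stride + 1

def deconv_out(size, kernel, stride, pad):
    # transposed-convolution output size: (size - 1) * stride - 2*pad + kernel
    return (size - 1) * stride - 2 * pad + kernel

size, ch = 256, 3  # assume a 256x256 RGB input
for kernel, stride, pad, channels in [(3, 2, 1, 64), (3, 1, 1, 128),
                                      (3, 2, 1, 256), (3, 1, 1, 512)]:
    size, ch = conv_out(size, kernel, stride, pad), channels
assert (size, ch) == (64, 512)  # the two stride-2 layers halve the map twice

for kernel, stride, pad, channels in [(4, 2, 1, 256), (1, 1, 0, 128),
                                      (4, 2, 1, 64), (1, 1, 0, 3)]:
    size, ch = deconv_out(size, kernel, stride, pad), channels
assert (size, ch) == (256, 3)  # original resolution and channel count restored
```

The 4x4/stride-2 deconvolution kernels exactly invert the size halving of the 3x3/stride-2 convolutions, which is why the decoder restores the input resolution.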
Further, the residual block in step S2 uses two 3x3 convolution layers with pad 1 and stride 1, keeping the input and output dimensions unchanged; each residual block follows the Conv-ReLU-Conv-ReLU-Add format. In addition, a convolution layer with a 3x3 kernel, stride 2 and pad 1 is added before the first and the third residual block to perform downsampling.
Further, in step S2 the third feature attention map is used by the feature enhancement module to enhance the features, and the enhanced features are then input into the student network.
Further, in step S5 a Smooth L1 loss is used as the loss function between the output and the label and between the first and second feature attention maps. The Smooth L1 loss function is obtained by improving the L1 norm loss function, whose mathematical formula is:

$$L_1(J,\hat{J})=\frac{1}{N}\sum_{i=1}^{N}\left|J_i-\hat{J}_i\right|$$

where $J$ is the label, $\hat{J}$ is the network estimation result, and $N$ is the number of samples;

the mathematical formula of the Smooth L1 loss function is:

$$L_{\text{Smooth L1}}=\frac{1}{N}\sum_{i=1}^{N}\operatorname{smooth}_{L1}\left(J_i-\hat{J}_i\right)$$

where

$$\operatorname{smooth}_{L1}(x)=\begin{cases}0.5x^{2}, & |x|<1\\ |x|-0.5, & \text{otherwise.}\end{cases}$$
further, the mathematical formula of the SSIM loss function in step S5 is:
Figure BDA0002987602170000043
wherein x is the intermediate characteristic of the foggy image of the student network learning,
Figure BDA0002987602170000044
intermediate feature, mu, of fog-free image output for teacher's network x
Figure BDA0002987602170000045
Respectively the average value of the second intermediate layer characteristic diagram and the first intermediate layer characteristic diagram,
Figure BDA0002987602170000046
the variances of the second intermediate layer characteristic diagram and the first intermediate layer characteristic diagram are respectively,
Figure BDA0002987602170000047
the covariance of the second interlayer feature map and the first interlayer feature map; c. C 1 =(k 1 L) 2 、c 1 =(k 2 L) 2 Is a constant number, k 1 Is 0.01,k 2 Is 0.03 and L is the pixel value dynamic range of the image.
Further, the incremental operation on the data set in the student network in step S6 comprises the following steps:
S61, selecting an indoor fog image data set to train the self-encoder network;
S62, inputting the indoor fog image data set as a training data set into the defogging network, while inputting the clear image corresponding to each fog image into the self-encoder network;
and S63, on the basis of the defogging network parameters from step S62, retaining part of the indoor fog images, adding part of the outdoor fog images as the training data set, and retraining the defogging network.
Compared with the prior art, the invention has the advantages and positive effects that:
the invention provides an image defogging method based on incremental learning and feature and attention transfer, which can effectively improve the defogging and generalization capabilities of a defogging network; a double-network model is adopted on a network structure, a self-encoder is used as a teacher network, a middle-layer characteristic diagram and a characteristic attention diagram of a fog-free image are extracted to increase the constraint of a loss function and guide the learning of a defogging network (a student network), an incremental learning method idea is adopted on a training mode, the defogging network is trained by using an indoor fog diagram data set, a small sample data set including an indoor fog diagram and an outdoor fog diagram is used after the training is finished, the network is retrained, the forgetting of the defogging network to the original knowledge is reduced under the combined action of the guidance of the teacher network and the retention of a small number of data sets, and the defogging effect of the image is improved.
The invention has strong defogging ability on indoor image data. When the defogging effect on an outdoor image data set needs to be improved, only a small amount of image data is required for incremental learning of the network; the network does not have to be retrained on a large amount of data, which saves considerable time. The method performs well on both data sets and compares favorably with other state-of-the-art defogging methods.
Drawings
In order to illustrate the embodiments of the present invention or the technical solutions in the prior art more clearly, the drawings needed in the description of the embodiments or the prior art are briefly introduced below. Obviously, the drawings in the following description are only some embodiments of the present invention, and those skilled in the art can obtain other drawings from them without creative effort.
FIG. 1 is a general block diagram of the network of the present invention;
FIG. 2 is a schematic structural diagram of the feature enhancement module.
Detailed Description
The technical solutions in the embodiments of the present invention are described below clearly and completely with reference to the drawings. Obviously, the described embodiments are only a part, not all, of the embodiments of the present invention. All other embodiments obtained by those skilled in the art from the embodiments of the present invention without creative effort fall within the protection scope of the present invention.
The invention provides a dual-network defogging method that combines attention, incremental learning and other techniques, comprising the following steps:
(1) Construct a self-encoder network as the teacher network to reconstruct clear images, and extract the network's intermediate-layer feature maps and feature attention maps of different dimensions for training the subsequent student network. The teacher network comprises a down-sampling module and an up-sampling module.
(2) Construct a defogging network as the student network to sharpen foggy images. The network consists of residual blocks with skip connections; a Smooth L1 loss function constrains the feature maps of different dimensions output by the residual blocks to fit the teacher network's feature maps and attention maps, and the fitted attention map is used as a weight to enhance the corresponding features. The dimension of each layer's output feature map in the student network corresponds to the teacher network.
(3) Train the teacher network using multiple groups of paired identical images as its input and labels.
(4) Optimize the student network using multiple groups of paired foggy and clear images as its input and labels.
(5) Use a Smooth L1 norm as the loss function measuring the difference between the labels and the network defogging results of the teacher and student networks; further use an SSIM loss function between the feature maps of the teacher and student networks and a Smooth L1 loss function between the first and second attention maps. The two loss functions act jointly to train the student network.
(6) The student network fits the intermediate features of the teacher network as closely as possible, and attention enhances these features, improving the student network's feature extraction ability and hence its defogging ability.
(7) Incrementing the student network's data set improves the network's defogging ability on other data and strengthens its generalization ability.
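The joint loss in steps (3) to (5) can be sketched in numpy as follows (an illustrative version; the weighting factors `lam_f` and `lam_a` and the single global SSIM window are our assumptions, since the patent does not specify loss weights):

```python
import numpy as np

def smooth_l1(a, b):
    # Smooth L1: quadratic near zero, linear elsewhere
    d = np.abs(a - b)
    return np.where(d < 1.0, 0.5 * d ** 2, d - 0.5).mean()

def ssim_global(x, y, L=255.0, k1=0.01, k2=0.03):
    # single-window SSIM between two feature maps
    c1, c2 = (k1 * L) ** 2, (k2 * L) ** 2
    mx, my = x.mean(), y.mean()
    cov = ((x - mx) * (y - my)).mean()
    return ((2 * mx * my + c1) * (2 * cov + c2)) / (
        (mx ** 2 + my ** 2 + c1) * (x.var() + y.var() + c2))

def student_loss(dehazed, label, feat_s, feat_t, att_s, att_t,
                 lam_f=1.0, lam_a=1.0):
    # Loss1 (result vs. label) + LOSS_F (feature maps) + LOSS_A (attention maps)
    return (smooth_l1(dehazed, label)
            + lam_f * (1.0 - ssim_global(feat_s, feat_t))
            + lam_a * smooth_l1(att_s, att_t))

x = np.full((4, 4), 100.0)
assert abs(student_loss(x, x, x, x, x, x)) < 1e-9  # perfect student incurs no loss
```

Using 1 - SSIM as the feature-map term makes all three terms vanish when the student exactly matches both the label and the teacher's intermediate outputs.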
In step (1), the self-encoder consists of four convolution layers and an up-sampling module. The first convolution layer uses 64 3x3 kernels with stride 2 and pad 1, i.e. f1=3, c1=64, written Conv1(3, 64, 3); the second uses 128 3x3 kernels with stride 1 and pad 1, i.e. f2=3, c2=128, written Conv2(64, 128, 3); the third uses 256 3x3 kernels with stride 2 and pad 1, i.e. f3=3, c3=256, written Conv3(128, 256, 3); the fourth uses 512 3x3 kernels with stride 1 and pad 1, i.e. f4=3, c4=512, written Conv4(256, 512, 3). To prevent the grid effect that appears when deconvolution restores an image from a feature map that is too small, the invention down-samples only twice while expanding the channels, which aids network training and prevents information loss; after each convolution layer the output is activated with ReLU to increase the nonlinearity of the network. The up-sampling operation corresponds to the convolution module: four deconvolution layers restore the image to its original size and channel count. Specifically, the first deconvolution layer uses 256 4x4 kernels with stride 2 and pad 1 to up-sample once, i.e. f1'=4, c1'=256, reducing 512 channels to 256, written TranConv1(512, 256, 4); the second uses 128 1x1 kernels with stride 1 and pad 0, leaving the feature map size unchanged, i.e. f2'=1, c2'=128, written TranConv2(256, 128, 1); the third uses 64 4x4 kernels with stride 2 and pad 1 to double the feature map as in the first layer, i.e. f3'=4, c3'=64, written TranConv3(128, 64, 4); the fourth uses 3 1x1 kernels with stride 1 and pad 0, leaving the size unchanged, i.e. f4'=1, c4'=3, written TranConv4(64, 3, 1).
In step (2), the defogging network uses residual blocks as its backbone; their "identity mapping" reduces the loss of feature information during feature extraction, retains more information, aids network training, and prevents gradient "explosion". The defogging network uses four residual blocks, with two convolution layers performing down-sampling during feature extraction. To keep each layer's feature map the same dimension as the teacher network's, each residual block uses two 3x3 convolution layers with pad 1 and stride 1, so the input and output dimensions are unchanged; each block follows the Conv-ReLU-Conv-ReLU-Add format with no BN layer, since experiments show that a BN layer causes color distortion in the image. A convolution with a 3x3 kernel, stride 2 and pad 1 is added before the first and the third residual block to perform down-sampling. In addition, the features of each layer of the teacher and student networks are input into a feature enhancement (FE) module, and the enhanced features are fed into the next layer of the student network. Finally, an up-sampling operation, identical to the teacher network's, is applied to the feature map to recover a clear fog-free image.
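The Conv-ReLU-Conv-ReLU-Add pattern described above can be sketched for a single-channel feature map as follows (a minimal numpy illustration with a naive 3x3 convolution; the kernel values and map size are arbitrary, and the real residual blocks operate on multi-channel tensors):

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def conv3x3(x, k):
    # 3x3 convolution with stride 1 and pad 1: output keeps the spatial size of x
    padded = np.pad(x, 1)
    out = np.zeros_like(x)
    for i in range(x.shape[0]):
        for j in range(x.shape[1]):
            out[i, j] = np.sum(padded[i:i + 3, j:j + 3] * k)
    return out

def residual_block(x, k1, k2):
    # Conv-ReLU-Conv-ReLU-Add: the identity mapping preserves feature information
    return relu(conv3x3(relu(conv3x3(x, k1)), k2)) + x

rng = np.random.default_rng(0)
x = rng.normal(size=(8, 8))
k1, k2 = rng.normal(size=(3, 3)), rng.normal(size=(3, 3))
y = residual_block(x, k1, k2)
assert y.shape == x.shape  # input and output dimensions unchanged, as the text requires
```

Because the block ends with an addition of the unmodified input, gradients can flow through the identity path even when the convolutions saturate, which is what makes deeper stacks trainable.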
The self-encoder adopts an encoding-decoding structure to reconstruct images; it is chosen as the teacher because it learns the mapping from an original image back to itself. Based on the idea that features extracted from a fog-free clear image are more representative, and more suitable for recovering a fog-free image, than those extracted from a foggy one, the invention takes a well-trained self-encoder as the teacher network, extracts the fog-free feature maps and attention maps from each intermediate layer of the network, computes loss functions against the corresponding feature maps and attention maps of the defogging network (student network), and fits the feature maps extracted by the defogging network to the teacher network's feature maps. In addition, the method transforms the feature map through the Sigmoid function to obtain an attention map; because of the characteristics of the Sigmoid function, important feature pixels map to larger values, i.e. larger weights, so the neural network pays more attention to the important features.
The Sigmoid function maps feature values into (0, 1); multiplying these values element-wise with the original feature map as attention weights would gradually shrink the feature values, which is unfavorable for network training. To address this, the invention adds an identity mapping: the original feature map and the weighted feature map are added element-wise, achieving feature enhancement. This is realized by the FE (feature enhancement) module in FIG. 1. The invention selects feature maps extracted by different convolution layers for feature enhancement so that the features extracted by the defogging network fit the teacher network's features more comprehensively; the FE structure is shown in FIG. 2.
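A minimal sketch of the FE module's computation, assuming element-wise Sigmoid attention plus the identity mapping described above (the function names are ours):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def feature_enhance(f):
    # attention map from the Sigmoid transform: large activations get weight near 1
    a = sigmoid(f)
    # identity mapping: add the original map back, so repeated weighting by
    # values in (0, 1) does not shrink the features during training
    return f + a * f

f = np.array([[-2.0, 0.0, 4.0]])
enhanced = feature_enhance(f)
assert np.all(np.sign(enhanced) == np.sign(f))  # f*(1+a) keeps each feature's sign
```

Since the output is f*(1+a) with a in (0, 1), every feature's magnitude is preserved or amplified by up to a factor of two, never attenuated.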
In step (5), the Smooth L1 loss function measures the difference between the clear image output by the network and the real clear image, and the network is trained by minimizing this loss. The Smooth L1 loss function is obtained by improving the L1 norm loss function, whose mathematical formula can be expressed as:

$$L_1(J,\hat{J})=\frac{1}{N}\sum_{i=1}^{N}\left|J_i-\hat{J}_i\right|$$

where $J$ is the label, $\hat{J}$ is the network estimation result, and $N$ is the number of samples. The L1 loss function has good robustness, but its central point is a break point where it is not smooth, which leads to unstable solutions. To address this problem, the Smooth L1 loss function was proposed as an improvement of the L1 loss; its mathematical formula can be expressed as:

$$L_{\text{Smooth L1}}=\frac{1}{N}\sum_{i=1}^{N}\operatorname{smooth}_{L1}\left(J_i-\hat{J}_i\right)$$

where

$$\operatorname{smooth}_{L1}(x)=\begin{cases}0.5x^{2}, & |x|<1\\ |x|-0.5, & \text{otherwise.}\end{cases}$$
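The two formulas above correspond to the following numpy sketch (the helper name is ours):

```python
import numpy as np

def smooth_l1_loss(estimate, label):
    # piecewise form: quadratic near zero (smooth at the break point of L1),
    # linear elsewhere (robust to outliers, like L1)
    d = np.abs(estimate - label)
    per_element = np.where(d < 1.0, 0.5 * d ** 2, d - 0.5)
    return per_element.mean()

# small residuals are penalized quadratically, large ones linearly
assert smooth_l1_loss(np.array([0.5]), np.array([0.0])) == 0.125
assert smooth_l1_loss(np.array([2.0]), np.array([0.0])) == 1.5
```

The two branches meet at |x| = 1 with matching value and slope, which removes the non-smooth break point of the plain L1 loss.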
in step (6), besides calculating the Loss Loss1 of the fog map and the fog-free map, the intermediate output of the teacher network is used as a soft label of the student network, the Loss LOSS _ F between the feature maps of each layer and the Loss LOSS _ A between the attention maps are added, smooth L1 Loss is used as a Loss function between the foggy image and the estimated fogless image and between two network intermediate attention maps, and Structural Similarity Index (SSIM) is used as a Loss function between the intermediate characteristic maps output by the two networks. The structural similarity loss is used for measuring the structural similarity between two images, the structural similarity is compared from three aspects of brightness, contrast and structure, the evaluation standard of SSIM is similar to the visual system of human, the sensing of local structural change is sensitive, the detail processing is more perfect, and the network performance is greatly improved due to the constraint of a multi-loss function. The SSIM mathematical expression is:
Figure BDA0002987602170000094
wherein x is the intermediate characteristic of the fog map of the student network learning,
Figure BDA0002987602170000095
intermediate fog-free map feature, mu, output for teacher's network x
Figure BDA0002987602170000096
Are the average values of the characteristic maps respectively,
Figure BDA0002987602170000097
respectively, the variance of the feature map is obtained,
Figure BDA0002987602170000098
is the feature map covariance. c. C 1 =(k 1 L) 2 、c 1 =(k 2 L) 2 Is a constant number, k 1 、k 2 The dynamic ranges of the pixel values of the images are 0.01 and 0.03 by default, and the value of the method is 255.
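The SSIM expression can be sketched in numpy with a single global window (real SSIM implementations typically use a sliding local window, e.g. 11x11 Gaussian; the global simplification is our assumption):

```python
import numpy as np

def ssim(x, y, L=255.0, k1=0.01, k2=0.03):
    # global (single-window) SSIM between two feature maps
    c1, c2 = (k1 * L) ** 2, (k2 * L) ** 2
    mu_x, mu_y = x.mean(), y.mean()
    var_x, var_y = x.var(), y.var()
    cov_xy = ((x - mu_x) * (y - mu_y)).mean()
    return ((2 * mu_x * mu_y + c1) * (2 * cov_xy + c2)) / \
           ((mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2))

rng = np.random.default_rng(0)
a = rng.uniform(0, 255, size=(16, 16))
assert np.isclose(ssim(a, a), 1.0)  # identical maps give SSIM = 1
assert ssim(a, 255.0 - a) < 1.0     # dissimilar maps score lower
```

As a training objective, 1 - SSIM is minimized so that identical feature maps incur zero loss.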
In step (7), to strengthen the network's generalization ability, the learning stage adopts incremental learning, structured as shown in FIG. 1 (right). Training proceeds in three steps. First, train the self-encoder (teacher network), selecting indoor fog images as the data set so that the teacher network has good indoor image reconstruction and feature extraction ability. Second, train the defogging network (student network) on the indoor fog image data set: the foggy image is fed to the defogging network while its corresponding clear image is fed to the self-encoder, giving the defogging network the ability to remove indoor haze. Third, starting from the defogging network parameters of the second step, retain a small number of indoor fog maps, add a small number of outdoor fog maps as the data set, and retrain the network. The drawback of incremental learning is that old knowledge is forgotten: when the network learns new knowledge, part of the existing knowledge is lost. The teacher network proposed by the invention both reduces the student network's forgetting of existing knowledge and improves its performance on the new knowledge.
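The data-set construction for the third training step can be sketched as follows (the 20% retention ratio and the file-name scheme are illustrative assumptions; the patent only says "a small amount"):

```python
import random

def build_incremental_set(indoor, outdoor, keep_ratio=0.2, seed=0):
    """Retain a small share of the old (indoor) set and mix in the new
    (outdoor) set, so that retraining forgets less of the original knowledge.
    keep_ratio is an illustrative choice, not a value from the patent."""
    rng = random.Random(seed)
    retained = rng.sample(indoor, int(len(indoor) * keep_ratio))
    mixed = retained + list(outdoor)
    rng.shuffle(mixed)
    return mixed

indoor = [f"indoor_{i}" for i in range(100)]
outdoor = [f"outdoor_{i}" for i in range(30)]
train_set = build_incremental_set(indoor, outdoor)
assert len(train_set) == 50  # 20 retained indoor images + 30 new outdoor images
```

The retained indoor samples act as rehearsal data: together with the teacher network's guidance, they counteract catastrophic forgetting during the outdoor retraining pass.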
The experimental results were compared on the ITS data set, with objective evaluation indexes compared in Table 1.
TABLE 1. Objective evaluation index comparison
[table rendered as an image in the original]
The results were also compared on the OTS data set, with objective evaluation indexes compared in Table 2.
TABLE 2. Objective evaluation index comparison
[table rendered as an image in the original]
As is apparent from Tables 1 and 2, the technical scheme of the invention effectively improves the defogging and generalization ability of the defogging network. The invention adopts a dual-network model on the network structure: a self-encoder serves as the teacher network, and the intermediate-layer feature maps and feature attention maps it extracts from fog-free images add loss-function constraints that guide the learning of the defogging network (student network). The training procedure adopts the idea of incremental learning: the defogging network is first trained on an indoor fog image data set and, after training, retrained on a small-sample data set. Under the joint action of the teacher network and the retention of a small number of samples, the defogging network forgets less of its original knowledge and the image defogging effect is improved.

Claims (7)

1. An image defogging method based on incremental learning and feature and attention transfer, characterized in that the method comprises the following steps:
S1, constructing a self-encoder network serving as a teacher network, and extracting first intermediate-layer feature maps and first feature attention maps of different dimensions from the self-encoder network for subsequent training of a student network;
S2, constructing a defogging network serving as a student network, the defogging network consisting of residual blocks and two convolution layers; a Smooth L1 loss function constrains the residual blocks so that the second intermediate-layer feature maps and second feature attention maps of different dimensions they output fit the first intermediate-layer feature maps and first feature attention maps extracted from the self-encoder network, and a third feature attention map obtained by converting the fitted features is used as a weight to enhance the corresponding features;
S3, using multiple groups of paired identical clear images as both the input and the label of the teacher network to train the teacher network;
S4, using multiple groups of paired fog images and clear images as the input and labels of the student network to optimize the student network;
S5, using a Smooth L1 loss function as the loss function between the labels and the defogging results of the teacher network and the student network, an SSIM loss function as the loss function between the first intermediate-layer feature maps and the second intermediate-layer feature maps, and a Smooth L1 loss function as the loss function between the first feature attention maps and the second feature attention maps, the SSIM and Smooth L1 loss functions acting jointly to train the student network;
S6, performing an incremental operation on the data set in the student network to improve the defogging capability of the defogging network on other data.
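The claims leave the construction of the feature attention maps open. One common choice, borrowed from activation-based attention transfer and used here purely as an illustrative assumption, collapses a feature map's channels into a normalized spatial map that can then weight (enhance) the corresponding features as described in step S2. A minimal NumPy sketch:

```python
import numpy as np

def feature_attention(feat, p=2, eps=1e-8):
    """Collapse a (C, H, W) feature map into an (H, W) spatial attention
    map by summing the p-th power of absolute activations over channels,
    then normalizing. This construction is an assumption (activation-based
    attention transfer); the claims do not fix a formula."""
    att = (np.abs(feat) ** p).sum(axis=0)
    return att / (np.linalg.norm(att) + eps)

def enhance(feat, att):
    """Step-S2-style enhancement: use the attention map as a
    per-pixel weight, broadcast over channels."""
    return feat * att[None, :, :]
```

The same attention construction would be applied to both teacher and student features before the Smooth L1 attention loss of step S5 is computed.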
2. The image defogging method based on incremental learning and feature and attention transfer as claimed in claim 1, wherein: the self-encoder network in step S1 consists of a convolution module and an up-sampling module; the convolution module comprises four convolution layers; the first layer uses 64 convolution kernels of 3x3 with stride 2 and pad 1, i.e. f1 = 3, c1 = 64, so the first layer can be denoted Conv1(3, 64, 3); the second layer uses 128 convolution kernels of 3x3 with stride 1 and pad 1, i.e. f2 = 3, c2 = 128, denoted Conv2(64, 128, 3); the third layer uses 256 convolution kernels of 3x3 with stride 2 and pad 1, i.e. f3 = 3, c3 = 256, denoted Conv3(128, 256, 3); the fourth layer uses 512 convolution kernels of 3x3 with stride 1 and pad 1, i.e. f4 = 3, c4 = 512, denoted Conv4(256, 512, 3);
The up-sampling module mirrors the convolution module and comprises four deconvolution layers; the first layer uses 256 convolution kernels of 4x4, up-sampling with stride 2 and pad 1, i.e. f'1 = 4, c'1 = 256, denoted TranConv1(512, 256, 4); the second layer uses 128 convolution kernels of 1x1 with stride 1 and pad 0, i.e. f'2 = 1, c'2 = 128, denoted TranConv2(256, 128, 1); the third layer uses 64 convolution kernels of 4x4, up-sampling with stride 2 and pad 1, i.e. f'3 = 4, c'3 = 64, denoted TranConv3(128, 64, 4); the fourth layer uses 3 convolution kernels of 1x1 with stride 1 and pad 0, i.e. f'4 = 1, c'4 = 3, denoted TranConv4(64, 3, 1).
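The layer specification of claim 2 can be transcribed directly. A PyTorch sketch follows; the ReLU activations between layers are an assumption, since the claim lists only the convolution and deconvolution parameters:

```python
import torch
import torch.nn as nn

class TeacherAutoencoder(nn.Module):
    """Sketch of the claim-2 self-encoder: four convolutions
    (Conv1..Conv4) followed by four transposed convolutions
    (TranConv1..TranConv4). Activation choice is an assumption."""
    def __init__(self):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 64, 3, stride=2, padding=1),     # Conv1(3, 64, 3)
            nn.ReLU(inplace=True),
            nn.Conv2d(64, 128, 3, stride=1, padding=1),   # Conv2(64, 128, 3)
            nn.ReLU(inplace=True),
            nn.Conv2d(128, 256, 3, stride=2, padding=1),  # Conv3(128, 256, 3)
            nn.ReLU(inplace=True),
            nn.Conv2d(256, 512, 3, stride=1, padding=1),  # Conv4(256, 512, 3)
            nn.ReLU(inplace=True),
        )
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(512, 256, 4, stride=2, padding=1),  # TranConv1
            nn.ReLU(inplace=True),
            nn.ConvTranspose2d(256, 128, 1, stride=1, padding=0),  # TranConv2
            nn.ReLU(inplace=True),
            nn.ConvTranspose2d(128, 64, 4, stride=2, padding=1),   # TranConv3
            nn.ReLU(inplace=True),
            nn.ConvTranspose2d(64, 3, 1, stride=1, padding=0),     # TranConv4
        )

    def forward(self, x):
        return self.decoder(self.encoder(x))
```

With these strides the two 4x4 stride-2 deconvolutions exactly undo the two stride-2 convolutions, so a 3-channel input of spatial size H x W is reconstructed at the same size.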
3. The image defogging method based on incremental learning and feature and attention transfer as claimed in claim 2, wherein: the residual block in step S2 adopts two 3x3 convolution layers with pad 1 and stride 1, keeping the input and output dimensions unchanged, i.e. each residual block follows the Conv-ReLU-Add format; in addition, a 3x3 convolution layer with stride 2 and pad 1 is added before the first and third residual blocks respectively to perform down-sampling.
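A sketch of the claim-3 block in PyTorch. The exact placement of the ReLUs within the "Conv-ReLU-Add format" is an interpretation, since the claim does not spell it out:

```python
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    """Claim-3 residual block: two 3x3 convolutions, stride 1, pad 1,
    dimensions preserved, with an identity shortcut (the Add)."""
    def __init__(self, channels):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, 3, stride=1, padding=1)
        self.conv2 = nn.Conv2d(channels, channels, 3, stride=1, padding=1)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        out = self.relu(self.conv1(x))   # Conv-ReLU
        out = self.conv2(out)
        return self.relu(out + x)        # Add the identity shortcut

def downsample(in_ch, out_ch):
    """The 3x3 stride-2 pad-1 convolution the claim places before
    the first and third residual blocks; it halves H and W."""
    return nn.Conv2d(in_ch, out_ch, 3, stride=2, padding=1)
```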
4. The image defogging method based on incremental learning and feature and attention transfer as claimed in claim 3, wherein: in step S2, the third feature attention map performs the feature enhancement operation on the corresponding features in a feature enhancement module, and the enhanced features are then input into the student network.
5. The image defogging method based on incremental learning and feature and attention transfer as claimed in claim 4, wherein: in step S5, a Smooth L1 loss is used as the loss function between the output and the label and between the first feature attention map and the second feature attention map; the Smooth L1 loss function is an improvement on the L1 norm loss function, whose mathematical formula is:

$$L_{1} = \frac{1}{N}\sum_{i=1}^{N}\left|J_{i}-\hat{J}_{i}\right|$$

where $J$ is the label, $\hat{J}$ is the network estimation result, and $N$ is the number of samples;
the mathematical formula for the Smooth L1 loss function is:
Figure FDA0003796591380000033
wherein the content of the first and second substances,
Figure FDA0003796591380000034
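Claim 5's two losses can be checked numerically. A NumPy sketch; the piecewise threshold of 1 follows the standard Smooth L1 definition:

```python
import numpy as np

def l1_loss(J, J_hat):
    """Plain L1 norm loss: mean absolute error over the samples."""
    return float(np.mean(np.abs(J - J_hat)))

def smooth_l1_loss(J, J_hat):
    """Smooth L1: quadratic for small residuals (|x| < 1), linear
    otherwise, which damps the influence of outlier pixels."""
    x = J - J_hat
    per_elem = np.where(np.abs(x) < 1, 0.5 * x**2, np.abs(x) - 0.5)
    return float(np.mean(per_elem))
```

For a residual of 0.5 the loss contribution is 0.5 * 0.5^2 = 0.125 (quadratic branch); for a residual of 2 it is 2 - 0.5 = 1.5 (linear branch), versus 2.0 under the plain L1 loss.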
6. The image defogging method based on incremental learning and feature and attention transfer as claimed in claim 5, wherein: the mathematical formula of the SSIM loss function in step S5 is:

$$L_{SSIM} = 1-\frac{\left(2\mu_{x}\mu_{\hat{x}}+c_{1}\right)\left(2\sigma_{x\hat{x}}+c_{2}\right)}{\left(\mu_{x}^{2}+\mu_{\hat{x}}^{2}+c_{1}\right)\left(\sigma_{x}^{2}+\sigma_{\hat{x}}^{2}+c_{2}\right)}$$

where $x$ is the intermediate feature of the fog image learned by the student network and $\hat{x}$ is the intermediate feature of the fog-free image output by the teacher network; $\mu_{x}$ and $\mu_{\hat{x}}$ are the means of the second intermediate-layer feature map and the first intermediate-layer feature map respectively; $\sigma_{x}^{2}$ and $\sigma_{\hat{x}}^{2}$ are their variances; $\sigma_{x\hat{x}}$ is their covariance; $c_{1}=(k_{1}l)^{2}$ and $c_{2}=(k_{2}l)^{2}$ are constants, where $k_{1}=0.01$, $k_{2}=0.03$, and $l$ is the dynamic range of the image's pixel values.
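A minimal numeric sketch of the claim-6 loss. It uses global (whole-map) statistics rather than the windowed statistics of common SSIM implementations; that simplification is an assumption, since the claim does not specify windowing:

```python
import numpy as np

def ssim_loss(x, x_hat, l=1.0, k1=0.01, k2=0.03):
    """1 - SSIM between a student feature map x and a teacher feature
    map x_hat, computed from global means, variances and covariance.
    l is the dynamic range of the values; k1, k2 as in claim 6."""
    c1, c2 = (k1 * l) ** 2, (k2 * l) ** 2
    mu_x, mu_y = x.mean(), x_hat.mean()
    var_x, var_y = x.var(), x_hat.var()
    cov = ((x - mu_x) * (x_hat - mu_y)).mean()
    ssim = ((2 * mu_x * mu_y + c1) * (2 * cov + c2)) / \
           ((mu_x**2 + mu_y**2 + c1) * (var_x + var_y + c2))
    return float(1.0 - ssim)
```

Identical feature maps give SSIM = 1 and hence a loss of 0; the loss grows as the student's intermediate features diverge in structure from the teacher's.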
7. The image defogging method based on incremental learning and feature and attention transfer as claimed in claim 6, wherein the incremental operation on the data set in the student network in step S6 comprises the following steps:
S61, selecting an indoor fog image data set for training together with the self-encoder network;
S62, inputting the indoor fog image data set as the training data set into the defogging network, while inputting the clear image corresponding to each fog image in the indoor fog image data set into the self-encoder network;
S63, on the basis of the defogging network parameters obtained in step S62, retaining part of the indoor fog images and adding part of the outdoor fog images as the training data set to retrain the defogging network.
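The incremental step of claim 7 amounts to fine-tuning from the existing weights on a mixed data set that rehearses part of the old domain. A schematic plain-Python sketch; the 20% retention ratio is an illustrative assumption, as the claim says only "part of" the indoor images:

```python
import random

def build_incremental_set(indoor_pairs, outdoor_pairs, keep_ratio=0.2, seed=0):
    """Retain a fraction of the original indoor (fog, clear) pairs as a
    rehearsal set and mix in the new outdoor pairs, so retraining the
    defogging network (from its step-S62 parameters) does not forget
    the indoor domain. keep_ratio is not fixed by the claim."""
    rng = random.Random(seed)
    n_keep = max(1, int(len(indoor_pairs) * keep_ratio))
    kept = rng.sample(list(indoor_pairs), n_keep)
    mixed = kept + list(outdoor_pairs)
    rng.shuffle(mixed)
    return mixed
```

The defogging network would then be trained on the returned mixed set starting from the weights of step S62 rather than from random initialization.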
CN202110304663.2A 2021-03-23 2021-03-23 Image defogging method based on incremental learning and feature and attention transfer Active CN113066025B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110304663.2A CN113066025B (en) 2021-03-23 2021-03-23 Image defogging method based on incremental learning and feature and attention transfer


Publications (2)

Publication Number Publication Date
CN113066025A CN113066025A (en) 2021-07-02
CN113066025B true CN113066025B (en) 2022-11-18

Family

ID=76562797

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110304663.2A Active CN113066025B (en) 2021-03-23 2021-03-23 Image defogging method based on incremental learning and feature and attention transfer

Country Status (1)

Country Link
CN (1) CN113066025B (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113592830A (en) * 2021-08-04 2021-11-02 航天信息股份有限公司 Image defect detection method and device and storage medium
CN113592742A (en) * 2021-08-09 2021-11-02 天津大学 Method for removing image moire
CN114004315A (en) * 2021-12-31 2022-02-01 北京泰迪熊移动科技有限公司 Method and device for incremental learning based on small sample

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9177363B1 (en) * 2014-09-02 2015-11-03 National Taipei University Of Technology Method and image processing apparatus for image visibility restoration
CN111598793A (en) * 2020-04-24 2020-08-28 云南电网有限责任公司电力科学研究院 Method and system for defogging image of power transmission line and storage medium
CN111681178A (en) * 2020-05-22 2020-09-18 厦门大学 Knowledge distillation-based image defogging method
CN112184577A (en) * 2020-09-17 2021-01-05 西安理工大学 Single image defogging method based on multi-scale self-attention generation countermeasure network


Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
Distilling Image Dehazing With Heterogeneous Task Imitation;Ming Hong et al.;《2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)》;20200805;第1-10页 *
Knowledge Transfer Dehazing Network for NonHomogeneous Dehazing;Haiyan Wu et al.;《2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW)》;20200728;第1-9页 *
Uneven Image Dehazing by Heterogeneous Twin Network;KEPING WANG et al.;《IEEE Access》;20200707;第8卷;第118485-118496页 *
Research on Single Image Dehazing Algorithms Based on Deep Learning; Zhao Yinhu; China Masters' Theses Full-text Database, Information Science and Technology; 20210215 (No. 2); pp. I138-1768 *

Also Published As

Publication number Publication date
CN113066025A (en) 2021-07-02

Similar Documents

Publication Publication Date Title
CN110738697B (en) Monocular depth estimation method based on deep learning
CN113066025B (en) Image defogging method based on incremental learning and feature and attention transfer
CN108492271B (en) Automatic image enhancement system and method fusing multi-scale information
CN109299274B (en) Natural scene text detection method based on full convolution neural network
CN108921799B (en) Remote sensing image thin cloud removing method based on multi-scale collaborative learning convolutional neural network
CN110020989B (en) Depth image super-resolution reconstruction method based on deep learning
CN111915530B (en) End-to-end-based haze concentration self-adaptive neural network image defogging method
CN112184577B (en) Single image defogging method based on multiscale self-attention generation countermeasure network
CN109035251B (en) Image contour detection method based on multi-scale feature decoding
CN109035172B (en) Non-local mean ultrasonic image denoising method based on deep learning
CN111222519B (en) Construction method, method and device of hierarchical colored drawing manuscript line extraction model
CN114048822A (en) Attention mechanism feature fusion segmentation method for image
CN111931857B (en) MSCFF-based low-illumination target detection method
CN110807744B (en) Image defogging method based on convolutional neural network
CN114936605A (en) Knowledge distillation-based neural network training method, device and storage medium
CN116311254B (en) Image target detection method, system and equipment under severe weather condition
CN111127354A (en) Single-image rain removing method based on multi-scale dictionary learning
CN111402138A (en) Image super-resolution reconstruction method of supervised convolutional neural network based on multi-scale feature extraction fusion
CN110738660A (en) Spine CT image segmentation method and device based on improved U-net
CN114638768B (en) Image rain removing method, system and equipment based on dynamic association learning network
WO2023212997A1 (en) Knowledge distillation based neural network training method, device, and storage medium
CN112686830B (en) Super-resolution method of single depth map based on image decomposition
CN116452469B (en) Image defogging processing method and device based on deep learning
CN111612803B (en) Vehicle image semantic segmentation method based on image definition
CN104123707B (en) Local rank priori based single-image super-resolution reconstruction method

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant